Serving for Deep Learning

Deploy any TensorFlow, Keras, PMML, or Deeplearning4j model as an enterprise-ready microservice

Introduction

Data science teams wanting to move from research to production need performance, security and logging, scalability, and feedback. The Skymind Intelligence Layer (SKIL) bridges data science to deployment and gives you the tools and support to deploy in a professional business environment. SKIL deploys on any infrastructure with support for bare metal and Kubernetes.

Major Libraries Supported

Upload, Deploy, and Query in less than 10 lines of code

API clients are available in 8+ languages, with all of the functions you need to interact with a SKIL cluster.

          import base64

          import skil_client
          from skil_client import DeployModel, INDArray, Prediction

          # client: a configured SKIL API client; deployment_id and x_in are defined elsewhere
          uploads = client.upload("tensorflow_rnn.pb")
          new_model = DeployModel(name="recommender_rnn", scale=30, file_location=uploads[0].path)
          model = client.deploy_model(deployment_id, new_model)

          # Encode the input array and request a prediction from the deployed model
          ndarray = INDArray(array=base64.b64encode(x_in))
          request = Prediction(id=1234, prediction=ndarray, needsPreProcessing=False)
          result = client.predict(request, "production", "recommender_rnn")

Integrated with Hadoop and Spark, SKIL is designed to be used in business environments on distributed GPUs and CPUs on-prem, in the cloud, or hybrid.
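As the snippet above suggests, input arrays are base64-encoded before being sent to the model server. A minimal sketch of that round trip, assuming NumPy on the client side (the dtype and shape handling here are illustrative, not part of the SKIL API):

```python
import base64
import numpy as np

# Client side: serialize a float32 input array to a base64 string
x_in = np.array([[0.1, 0.2, 0.3]], dtype=np.float32)
payload = base64.b64encode(x_in.tobytes()).decode("ascii")

# Server side: recover the array (shape and dtype must be known out of band,
# e.g. from the model's input signature)
raw = base64.b64decode(payload)
x_back = np.frombuffer(raw, dtype=np.float32).reshape(1, 3)

assert np.array_equal(x_in, x_back)
```

Note that base64 carries only the raw buffer; the shape and dtype travel separately, which is why the client wraps the encoded bytes in a typed `INDArray` object.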

Benefits

  • Configuring
  • Training
  • Collaborating
  • Versioning
  • Deploying
  • Serving
  • Management UI

1-Click Deployment

  • Faster time to market
  • Track model progress and share experiments, then deploy the best to production
  • AI model server with load balancer, zero-downtime updates, timed updates

Critical integrations

  • API clients in 8 languages, including Python, C#, and Java
  • Apache ZooKeeper for fault tolerance and state
  • Query using HTTP or Thrift RPC
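For the HTTP path, a request body can be assembled with any standard JSON library. A hypothetical sketch of such a payload; the field names mirror the client snippet above and are illustrative, not the official wire format:

```python
import base64
import json

# Hypothetical predict-request body; mirrors the fields used by the API client
# (id, prediction, needsPreProcessing). Not the official SKIL wire schema.
body = {
    "id": "1234",
    "prediction": {"array": base64.b64encode(b"\x00\x00\x80?").decode("ascii")},
    "needsPreProcessing": False,
}
encoded = json.dumps(body).encode("utf-8")
# POST 'encoded' to the deployment's predict endpoint with any HTTP client
```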

AWS-like ML platform

  • Bare-metal or Cloud deployment
  • Managed Spark/GPU Cluster
  • Distributed, Hybrid CPU/GPU Resource Management
  • Multi-region self-healing fault tolerance

Continuous Deployment and Versioning

Script your workflow just like a continuous integration server. Set up versioning of trained models, periodically retrain on feedback data, and roll back when new versions underperform.

  • Versioning of deployed models
  • Store evaluation results
  • Rollback when performance degrades
  • "Cron jobs" for online learning/batch training
  • Integrate feedback data for retraining
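The version-and-rollback workflow above can be sketched in a few lines. This is a hypothetical in-memory model registry for illustration only; SKIL itself stores versions and evaluation results server-side:

```python
# Hypothetical sketch of versioning with evaluation-driven rollback.
class ModelRegistry:
    def __init__(self):
        self.versions = []    # (version, accuracy) evaluation history
        self.deployed = None  # version currently being served

    def register(self, version, accuracy):
        # Store an evaluation result alongside the version
        self.versions.append((version, accuracy))

    def deploy(self, version):
        self.deployed = version

    def rollback_if_degraded(self, live_accuracy, tolerance=0.02):
        # Revert to the best evaluated version when the live model underperforms
        best_version, best_acc = max(self.versions, key=lambda v: v[1])
        if live_accuracy < best_acc - tolerance:
            self.deployed = best_version
        return self.deployed

registry = ModelRegistry()
registry.register("v1", 0.91)
registry.register("v2", 0.94)
registry.deploy("v3")                          # new version pushed to production
registry.rollback_if_degraded(0.88)            # v3 underperforms; revert to v2
assert registry.deployed == "v2"
```

A "cron job" for online learning would periodically call `register` with fresh evaluation results on feedback data, then let the degradation check decide whether the current deployment survives.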

Scalability and Fault Tolerance

The cluster architecture follows best practices for protecting system state from node failure. It integrates with best-in-class tooling such as Apache ZooKeeper to support leader election and high availability.

  • Recover from node failures
  • Automatic load balancing across instances
  • Leader election and automatic standby
  • Any JDBC-compatible integration for backup
  • Continuous heartbeat and process checking
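A toy sketch of the heartbeat-based failure detection and leader election described above. In production this is delegated to ZooKeeper; the class, timeout, and lowest-id election rule here are illustrative only:

```python
import time

# Hypothetical heartbeat tracker: a node is considered dead once it has not
# reported within `timeout` seconds, and the leader is the lowest-id live node.
class Cluster:
    def __init__(self, timeout=5.0):
        self.timeout = timeout
        self.last_beat = {}  # node id -> timestamp of last heartbeat

    def heartbeat(self, node, now=None):
        self.last_beat[node] = time.time() if now is None else now

    def live_nodes(self, now=None):
        now = time.time() if now is None else now
        return sorted(n for n, t in self.last_beat.items() if now - t <= self.timeout)

    def leader(self, now=None):
        # Deterministic election: every node agrees on the lowest live id
        live = self.live_nodes(now)
        return live[0] if live else None

cluster = Cluster(timeout=5.0)
cluster.heartbeat("node-a", now=0.0)
cluster.heartbeat("node-b", now=0.0)
print(cluster.leader(now=1.0))        # node-a
cluster.heartbeat("node-b", now=6.0)  # node-a stops reporting
print(cluster.leader(now=8.0))        # node-b takes over automatically
```

ZooKeeper implements the same idea with ephemeral sequential znodes, which also gives the standby nodes a reliable signal for when to take over.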

Architecture

Compare

Feature availability differs between the Community and Enterprise editions:

Supported Libraries

  • Deeplearning4j: Deep Learning for the JVM on Hadoop & Spark
  • TensorFlow: Importing Pre-Trained Models from TensorFlow
  • Keras: Importing Pre-Trained Models from Keras
  • DataVec: Data ETL, Normalization, and Vectorization
  • ND4J: High-Performance Linear Algebra CPU and GPU Library on the JVM
  • RL4J: Reinforcement Learning Algorithms

SKIL Platform

  • Model Server: Integrated Model Hosting, Management, and Version Control (LIMITED in Community)
  • Model Import: Importing Pre-Trained Models from TensorFlow and Keras
  • Workspaces: Notebook System for Model Construction and Collaboration (LIMITED in Community)
  • Hardware Acceleration: Managed CUDA for GPU and MKL for CPU
  • Integration Tooling: Native Integration with CDH and HDP

Application

  • Somatic: Sensor, Vision, and Control Integration for Robotics

Support

  • Online Community: Access to Community Forum, Videos, and Documentation
  • Development Support: General Feature Engineering and Model Tuning Advice
  • SLA: Guaranteed Uptime and Response Times

Cost: Free (Community); Contact Us (Enterprise)