ML Ops for the Edge

Add professional AI integration to your standalone system

Why Skymind?

Healthcare, scientific, and engineering hardware are often complex embedded systems that cannot be supported like consumer products. Skymind's SKIL gives you enterprise tooling and support for deploying and debugging professional machine learning models at the edge. Add critical machine learning operations to firewalled environments, along with tooling for debugging on-site deployments.

Major Libraries Supported

Tooling and debugging for the edge

API clients are available in 8 languages, so you can tightly integrate and debug in remote and inaccessible environments.

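          import base64
          import skil_client

          # Assumes `client` is a configured SKIL API client, `x_in` holds the raw
          # input bytes, and `expected` holds the ground-truth output for feedback.
          ndarray = skil_client.INDArray(array=base64.b64encode(x_in))
          request = skil_client.Prediction(id=1234, prediction=ndarray, needsPreProcessing=False)
          result = client.predict(request, "production", "recommender_rnn")
          feedback = client.feedback(result.id, result.array, request, expected)
          logs = client.logs()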
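
The snippet above base64-encodes the input before sending it. As a rough illustration of that step, the sketch below packs a list of floats into bytes and encodes them as base64 text. The little-endian float32 layout and the helper names `encode_vector` and `decode_vector` are assumptions made for this example; the byte layout SKIL actually expects is not stated here.

```python
import base64
import struct


def encode_vector(values):
    """Pack floats as little-endian float32 (assumed layout) and return base64 text."""
    raw = struct.pack("<%df" % len(values), *values)
    return base64.b64encode(raw).decode("ascii")


def decode_vector(text):
    """Reverse encode_vector: base64 text back to a list of floats."""
    raw = base64.b64decode(text)
    count = len(raw) // 4  # 4 bytes per float32
    return list(struct.unpack("<%df" % count, raw))


# Values chosen to be exactly representable as float32, so the round trip is lossless.
x_in = [0.5, 1.0, -2.0]
encoded = encode_vector(x_in)
decoded = decode_vector(encoded)
```

Base64 keeps binary tensor data safe inside text-based payloads such as JSON, at the cost of roughly a third more bytes on the wire.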
        

SKIL comes in multiple distribution flavours, with custom support for integrating into custom hardware environments, supplying fault tolerance, and connecting to other embedded storage systems.

Benefits


Configuring, Training, Collaborating, Versioning, Deploying, Serving

Management UI

1-Click Deployment

  • Faster time to market
  • Track model progress and share experiments, then deploy the best to production
  • AI model server with load balancer, zero-downtime updates, and timed updates

Critical integrations

  • API clients in 8 languages, including Python, C#, and Java
  • Apache ZooKeeper for fault tolerance and state
  • Query using HTTP or Thrift RPC
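For the HTTP option listed above, a query can be sketched with nothing but the Python standard library. The URL layout, host name, payload fields, and the helper `build_predict_request` below are hypothetical, chosen only for illustration; check your SKIL installation for the real endpoint paths and authentication scheme.

```python
import json
import urllib.request


def build_predict_request(base_url, deployment, model, payload, token=None):
    """Build (but do not send) a JSON POST request for a model prediction.

    The path "/endpoints/<deployment>/model/<model>/predict" is a
    hypothetical layout used for this sketch, not a documented SKIL route.
    """
    url = "{}/endpoints/{}/model/{}/predict".format(
        base_url.rstrip("/"), deployment, model
    )
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    if token:
        # Bearer-token auth is an assumption for this example.
        headers["Authorization"] = "Bearer " + token
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_predict_request(
    "http://skil.example.local/",  # placeholder host
    "production",
    "recommender_rnn",
    {"id": 1234, "needsPreProcessing": False},
)
```

Passing the request to `urllib.request.urlopen(request, timeout=5)` would send it; building and sending are kept separate here so the request can be inspected or logged first.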

AWS-like ML platform

  • Bare-metal or Cloud deployment
  • Managed Spark/GPU Cluster
  • Distributed, Hybrid CPU/GPU Resource Management
  • Multi-region self-healing fault tolerance

Continuous Deployment and Versioning

Script your workflow just like a continuous integration server. Set up versioning of trained models, periodically train on feedback data, and roll back when new versions underwhelm.

  • Versioning of deployed models
  • Store evaluation results
  • Roll back when performance degrades
  • "Cron jobs" for online learning/batch training
  • Integrate feedback data for retraining
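The versioning and rollback workflow above can be pictured with a small in-memory sketch. The `ModelRegistry` class, its method names, and the scores are all hypothetical and stand in for whatever storage and evaluation metrics a real deployment uses; this is not SKIL's implementation.

```python
class ModelRegistry:
    """Toy in-memory registry. Versions are 1-based and never deleted."""

    def __init__(self):
        self.models = {}    # version -> model object
        self.scores = {}    # version -> latest evaluation score (higher is better)
        self.active = None  # version currently being served

    def deploy(self, model, score):
        """Store a new version with its evaluation score and make it active."""
        version = len(self.models) + 1
        self.models[version] = model
        self.scores[version] = score
        self.active = version
        return version

    def record_evaluation(self, version, score):
        """Overwrite a version's score, e.g. after evaluating on feedback data."""
        self.scores[version] = score

    def rollback_if_degraded(self, tolerance=0.0):
        """Switch to the best earlier version if the active one scores worse
        than it by more than `tolerance`. Returns the active version."""
        earlier = [v for v in self.models if v < self.active]
        if not earlier:
            return self.active
        best = max(earlier, key=lambda v: self.scores[v])
        if self.scores[self.active] + tolerance < self.scores[best]:
            self.active = best
        return self.active


registry = ModelRegistry()
registry.deploy("model-a", 0.91)
registry.deploy("model-b", 0.93)      # looked better offline, so it goes live
registry.record_evaluation(2, 0.85)   # feedback data shows it underperforms
active_version = registry.rollback_if_degraded(tolerance=0.01)
```

In this run the registry falls back to version 1, because the re-evaluated score of version 2 trails it by more than the tolerance. A scheduled job could call `record_evaluation` and `rollback_if_degraded` after each retraining cycle.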

Scalability and Fault Tolerance

The built-in cluster architecture replicates models across nodes, so the system keeps serving when an individual node fails. SKIL uses the tried and trusted Apache ZooKeeper to ensure high availability.

  • Recover from node failures
  • Automatic load balancing across instances
  • Leader election and automatic standby
  • Any JDBC-compatible integration for backup
  • Continuous heartbeat and process checking
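To make the heartbeat, leader election, and load balancing ideas above concrete, here is a single-process simulation. It is only an illustration: the `Cluster` class, the node names, the timeout, and the "lowest live node id wins" rule are assumptions for this sketch, and it does not use ZooKeeper or reflect SKIL's internals.

```python
class Cluster:
    """Toy cluster view driven by explicit timestamps (seconds)."""

    def __init__(self, node_ids, timeout=3.0):
        self.timeout = timeout
        # Time of the last heartbeat received from each node.
        self.last_seen = {node: 0.0 for node in node_ids}

    def heartbeat(self, node, now):
        """Record that `node` reported in at time `now`."""
        self.last_seen[node] = now

    def live_nodes(self, now):
        """Nodes whose last heartbeat is within the timeout, sorted by id."""
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen <= self.timeout
        )

    def leader(self, now):
        """Elect the live node with the lowest id, or None if all are down."""
        live = self.live_nodes(now)
        return live[0] if live else None

    def route(self, request_number, now):
        """Round-robin a request across the nodes that are currently live."""
        live = self.live_nodes(now)
        if not live:
            raise RuntimeError("no live nodes available")
        return live[request_number % len(live)]


cluster = Cluster(["node-a", "node-b", "node-c"], timeout=3.0)
for node in ("node-a", "node-b", "node-c"):
    cluster.heartbeat(node, now=10.0)
first_leader = cluster.leader(now=11.0)

# node-a goes silent; the other two keep reporting in.
cluster.heartbeat("node-b", now=14.0)
cluster.heartbeat("node-c", now=14.0)
new_leader = cluster.leader(now=15.0)
targets = [cluster.route(i, now=15.0) for i in range(3)]
```

Once `node-a` misses its heartbeat window, leadership moves to `node-b` and requests are spread over the two remaining nodes. A production system delegates this coordination to a service such as ZooKeeper, so that every node agrees on who is alive and who leads.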

Architecture

Compare

Feature | Description | Community | Enterprise

Supported Libraries
Deeplearning4j | Deep Learning for the JVM on Hadoop & Spark
TensorFlow | Importing Pre-Trained Models from TensorFlow
Keras | Importing Pre-Trained Models from Keras
DataVec | Data ETL Normalization and Vectorization
ND4J | High Performance Linear Algebra CPU and GPU Library on the JVM
RL4J | Reinforcement Learning Algorithms

SKIL Platform
Model Server | Integrated Model Hosting, Management, and Version Control | LIMITED
Model Import | Importing Pre-Trained Models from TensorFlow and Keras
Workspaces | Notebook System for Model Construction and Collaboration | LIMITED
Hardware Acceleration | Managed CUDA for GPU and MKL for CPU
Integration Tooling | Native Integration with CDH and HDP

Application
Somatic | Sensor Vision and Control Integration for Robotics

Support
Online Community | Access to Community Forum, Videos, and Documentation
Development Support | General Feature Engineering and Model Tuning Advice
SLA | Guaranteed Uptime and Response Times

Cost | | Free | Contact Us