SKIL Community Edition 1.0.0

Getting Started

Welcome

SKIL Community Edition (SKIL CE) gives developers an easy way to train deep learning models and deploy them to production quickly.

SKIL CE is an on-premise, AWS-like platform for machine learning where data scientists and engineers can use an open-source stack of machine learning and big data tools. It provides a managed Spark/GPU cluster as well as a managed model server for experiment tracking and model deployment, accessible through notebooks and a GUI. The platform is extensible and also serves as a job runner for machine learning apps. SKIL CE's public beta was released on November 2. We welcome feedback on the GitHub issues page. The next release is slated for early December.

In this quick start, you’ll learn how to:

  • Download and install SKIL CE 1.0.0
  • Create a sample workspace notebook (using Scala) and train a model
  • Deploy the model to the SKIL model server
  • Get a prediction from the REST endpoint
    • Via Web browser
    • Via Java code

Currently, SKIL CE supports the CentOS 7 and Red Hat Enterprise Linux (RHEL) 7 operating systems.

A simple way to set up SKIL CE on AWS is to use the official CentOS 7 images from:
https://wiki.centos.org/Cloud/AWS
Specific details: CentOS Linux 7 1701_01 (2018-Mar-31), us-east-1 - AMI: ami-ae7bfdb8 (x86_64)
You will also need to install git and Apache Maven on your Linux image for some parts of this quick start.

Download and Install SKIL CE 1.0.0 Packages via YUM

To get the SKIL Community Edition (SKIL CE) application installed locally, use yum to download and install the RPM files from a remote repository.

The yum repositories provide packages for RHEL, CentOS, and Fedora-based distributions.

Add the repository by creating a file named skymind.repo in your /etc/yum.repos.d/ directory with the following contents:

[Skymind]
name=Skymind repository
baseurl=http://packages.skymind.io/rpm/1.0
gpgcheck=0

Once yum is locally configured for access to the RPM files, you can start the install process.

Downloading SKIL CE RPMs via YUM on the Command Line

First, clear the yum caches:

$ sudo yum clean all

The repository is now ready for use. You can install SKIL CE with this command:

$ sudo yum install skil-server

Set Up and Start SKIL CE

First, you must disable SELinux. Run sudo setenforce 0 to disable it temporarily (this switches SELinux to permissive mode until the next reboot), or follow this guide to disable it permanently:

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Security-Enhanced_Linux/sect-Security-Enhanced_Linux-Enabling_and_Disabling_SELinux-Disabling_SELinux.html

Make sure ports 9008 and 8080 are open (for example, in the host firewall and, on AWS, in the instance's security group).
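Once SKIL is started (see the next step), a quick way to confirm that these ports accept connections from a given machine is a plain TCP connect. The following Java sketch is our own illustration, not part of SKIL; the class name PortCheck and the 2-second timeout are arbitrary choices. A failed connection means either that nothing is listening yet or that a firewall is blocking the port.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isPortOpen(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unreachable: nothing listening, or a firewall in the way
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to localhost; pass the SKIL host name as the first argument otherwise
        String host = args.length > 0 ? args[0] : "localhost";
        for (int port : new int[] {9008, 8080}) {
            String status = isPortOpen(host, port, 2000) ? "accepting connections" : "NOT reachable";
            System.out.println("Port " + port + ": " + status);
        }
    }
}
```

Run it from the machine whose access you care about (your laptop for the web UI, the client host for REST calls), since a port can be open locally but blocked from outside.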

To start SKIL CE locally, run the following command:

$ sudo systemctl start skil

Wait about a minute for SKIL to fully start. Everything is up and running when the output of jps looks like:

$ jps
38802 ZeppelinInterpreterMain
43044 Jps
38341 ModelHistoryServer
38295 ProcessLauncherDaemon
38363 ZeppelinMain
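If you want to script this check instead of reading the jps output by eye, a minimal Java sketch could run jps and look for the four process names listed above. The helper name JpsCheck is hypothetical, and the expected names are simply copied from the sample output; they may differ in other SKIL versions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class JpsCheck {
    // Process names taken from the sample jps output in this guide
    static final List<String> EXPECTED = Arrays.asList(
            "ZeppelinInterpreterMain", "ModelHistoryServer", "ProcessLauncherDaemon", "ZeppelinMain");

    /** Parses "PID Name" lines of jps output and returns the expected names that are missing. */
    static List<String> missingProcesses(String jpsOutput) {
        Set<String> running = new HashSet<>();
        for (String line : jpsOutput.split("\\R")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length >= 2) {
                running.add(parts[1]);
            }
        }
        return EXPECTED.stream().filter(name -> !running.contains(name)).collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // jps ships with the JDK and only lists JVMs visible to the user running it
        Process jps = new ProcessBuilder("jps").redirectErrorStream(true).start();
        String output = new String(jps.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        jps.waitFor();
        List<String> missing = missingProcesses(output);
        System.out.println(missing.isEmpty() ? "All SKIL processes are running" : "Still waiting for: " + missing);
    }
}
```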

Once SKIL CE is running locally, you can log in to build and deploy your first model.

Log In to SKIL

To log in to your locally installed copy of SKIL CE, visit http://[host]:9008 in a web browser. This brings up the login screen, as seen below.

For the Community Edition of SKIL we have a single account:

Login: “admin”
Password: “admin”

SKIL CE is meant to be used and evaluated by a single data scientist, so this quick start does not cover role or account management.

Now let’s build our first notebook and deploy it to the model server.


Taking a Model from Notebook to Deployment

Overview

With SKIL and SKIL CE, the goal is to quickly prototype deep learning applications and deploy the best models to a production-class model server. Deep learning models can be complex to construct and unwieldy to deploy; SKIL takes away these pain points for both data scientists and infrastructure engineering teams. Deep learning has a wide range of applications and touches nearly every industry. Here are a few deep learning models that can be integrated into applications:

  • ICU Patient Mortality
  • Computer Vision applications
  • Anomaly Detection
  • Generating Shakespeare or Beer Reviews

Beyond just building the aforementioned models, we also want to bring their predictions to bear on real-world applications such as:

  • A Tomcat Web application
  • A Wildfly application
  • A mobile application
  • A streaming system (StreamSets)
  • Robotics applications (Somatic, Lego Mindstorms)

Skymind’s SKIL Platform helps you build the above deep learning models and integrate these models into applications easily with a workbench and a model server.

You can track multiple variants and versions of a model, compare how each performed after training, and deploy the best model to production on SKIL’s AI model server with a single click. With SKIL CE, developers can start building real, state-of-the-art deep learning applications without worrying about infrastructure and plumbing. In the diagram below, we can see a general overview of how the SKIL Workspace system and SKIL’s AI model server work together to provide an enterprise-class platform for operationalizing deep learning applications.

Here’s how to build your first SKIL model with SKIL CE.

Getting Started Building Models with SKIL Workspaces

To get started, create a new workspace in the SKIL user interface by following these steps:

1) Click on the workspaces tab on the left side of the main SKIL interface to bring up the workspaces screen, as seen in the image below.

Every new workspace is a place to conduct a set of “experiments” centered around a particular project.

In SKIL-speak, “experiments” are different configurations of neural net models and data pipelines applied to a given problem, and a workspace is effectively a lab that data scientists can use to test which neural networks perform best on that problem. An experiment in SKIL can be contained in a Zeppelin notebook, which allows us to save, clone, and run a specific modeling job as needed.

Each notebook is tracked and indexed by SKIL and can send its trained model over to SKIL’s AI model server. A SKIL Workspace can have many different notebooks for different experiments conducted as data scientists seek the best model. Let’s say that after a few experiments, you find a model that performs well and you want to integrate it into a production application.

2) Create a Workspace

After clicking on the workspaces tab on the left, you simply click on the “Create Workspace” button on the right side of the page (see below).

Clicking on the “Create Workspace” button brings up this dialog window:

In this window, you name the workspace to distinguish it from other workspaces. For this tutorial, call this workspace “First Sensor Project.” Optionally, you can add a few labels to the workspace to help identify it later (e.g., “side_project” or “prod_job”). When you’re ready to finalize the new workspace, click “Create Workspace” in the lower right corner of the window.

A new workspace should appear in the list of workspaces:

If we click on the workspace name we just created (“First Sensor Project”) we’ll see the workspace details, like so:

Now you can create your first experiment in the new workspace.

Create an Experiment in the Workspace

Inside this workspace, you can create experiments with Zeppelin notebooks. You and your team can create, run, and clone these experiment notebooks to improve collaboration and speed up time to insight. By clicking on the “Create New Experiment” button (above), you’ll bring up the “Create Experiment” dialog window (below).

Give this experiment a unique, descriptive name that will make it easy to find later, using the input box under “Experiment Name.” Select the only listed option for “Model History Server ID” and “Zeppelin Server ID” (in the present version of SKIL CE, there will be only one option for each of them). You can also provide a distinct notebook name that will apply within the Zeppelin notebook storage system.

Once you’re done setting up the experiment and its notebook, click the “Create Experiment Notebook” button. The new experiment should appear in the list of experiments for the current workspace, as seen below.

With the experiment created, you can check out the associated experiment notebook by clicking the “Open Notebook” button for the new experiment. That will bring up the embedded notebook system (below).

Note: The first time you use a notebook in SKIL, you need to initialize the SKIL system. Click “Login” within the Zeppelin window and use “admin” as both the username and the password.

Once logged in, click the “Notebook” dropdown and select your experiment’s notebook.

Each notebook starts out with a generic template containing DL4J code that would serve as the basis of a typical project.

For this example, you’re going to build an LSTM model based on sensor data. The code for this notebook is here:

uci_quickstart_notebook.scala

For this example, delete the blocks of template code, then copy and paste the blocks of code from the GitHub link above into the notebook. The notebook should autosave, but at any point you can make a specific commit into the Zeppelin version control system with the “version control” button, as shown in the image below.

The user can click on the version control button inside Zeppelin and optionally add a commit message to save the current state. Once the notebook code is entered into the notebook itself and saved, you’re ready to execute this notebook and produce the model.

Running the Experiment

To run the experiment notebook, click on the “play” icon on the top toolbar inside the Zeppelin embedded notebook UI. This will run all of the code paragraphs inside the current notebook.

The notebook will take some time to run. Once it’s complete, the output of the notebook will be visible in the notebook itself.

Now we’ll dig into what happened in this example by highlighting a few specific sections in the code. The first paragraph in the notebook runs all the needed imports (in Scala) and sets the other paragraphs up to be ready to execute. There are four major functional areas within this notebook:

  1. UCI data download and data prep / ETL
  2. Neural network configuration
  3. Network Training loop
  4. Registering modeling results with SKIL model server

The first area of the code we’ll highlight is where we download the training data and then perform some basic ETL on it. In this section of the code, the raw CSV data is:

  1. Loaded from disk
  2. Converted into sequences
  3. Statistics about the data are collected
  4. Sequences are then normalized/standardized
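The notebook performs these steps with its own code; as a plain-Java illustration of steps 2 to 4, the sketch below parses sequences, collects a global mean and standard deviation, and standardizes every value to a z-score. It assumes one comma-separated sequence per line and a single feature per time step (matching the network's nIn(1)); the class and method names are hypothetical. As is usual practice, the statistics should be computed on the training data only and then reused for the test data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SequenceStandardizer {
    /** Step 2 (assumed format): one sequence per line, values separated by commas. */
    static double[] parseSequence(String csvLine) {
        return Arrays.stream(csvLine.trim().split(","))
                .mapToDouble(s -> Double.parseDouble(s.trim()))
                .toArray();
    }

    /** Step 3: mean and population standard deviation over every value of every sequence. */
    static double[] meanAndStd(List<double[]> sequences) {
        double sum = 0.0;
        long count = 0;
        for (double[] seq : sequences) {
            for (double v : seq) {
                sum += v;
                count++;
            }
        }
        double mean = sum / count;
        double squaredDiffs = 0.0;
        for (double[] seq : sequences) {
            for (double v : seq) {
                squaredDiffs += (v - mean) * (v - mean);
            }
        }
        return new double[] {mean, Math.sqrt(squaredDiffs / count)};
    }

    /** Step 4: z-score each value; a zero std is replaced by 1 to avoid dividing by zero. */
    static List<double[]> standardize(List<double[]> sequences, double mean, double std) {
        double divisor = std == 0.0 ? 1.0 : std;
        List<double[]> result = new ArrayList<>();
        for (double[] seq : sequences) {
            double[] z = new double[seq.length];
            for (int t = 0; t < seq.length; t++) {
                z[t] = (seq[t] - mean) / divisor;
            }
            result.add(z);
        }
        return result;
    }

    public static void main(String[] args) {
        List<double[]> train = Arrays.asList(parseSequence("1,2,3"), parseSequence("4,5,6"));
        double[] stats = meanAndStd(train); // fit statistics on training data only
        List<double[]> z = standardize(train, stats[0], stats[1]);
        System.out.println(Arrays.toString(z.get(0)));
    }
}
```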

The next area of code is where we use the DL4J API to define the neural network topology. In this example, we’re using a variant of a recurrent neural network called a long short-term memory (LSTM) network.

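        // Imports (in the notebook these live in the first paragraph)
        import org.deeplearning4j.nn.api.OptimizationAlgorithm
        import org.deeplearning4j.nn.conf.{ComputationGraphConfiguration, GradientNormalization, NeuralNetConfiguration}
        import org.deeplearning4j.nn.conf.graph.rnn.LastTimeStepVertex
        import org.deeplearning4j.nn.conf.inputs.InputType
        import org.deeplearning4j.nn.conf.layers.{GravesLSTM, OutputLayer}
        import org.deeplearning4j.nn.weights.WeightInit
        import org.nd4j.linalg.activations.Activation
        import org.nd4j.linalg.learning.config.Nesterovs
        import org.nd4j.linalg.lossfunctions.LossFunctions.LossFunction

        // numLabelClasses (6 for this data set) is defined earlier in the notebook

        // Configure the network: LSTM layer -> last time step -> softmax output
        val conf: ComputationGraphConfiguration = new NeuralNetConfiguration.Builder()
            .seed(123)                                    // fixed seed for reproducible weight init
            .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
            .iterations(1)
            .weightInit(WeightInit.XAVIER)
            .updater(new Nesterovs(0.005, 0.9))           // learning rate 0.005, momentum 0.9
            .gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
            .gradientNormalizationThreshold(0.5)          // clip each gradient element to [-0.5, 0.5]
            .graphBuilder()
            .addInputs("input")
            .setInputTypes(InputType.recurrent(1))        // one feature per time step
            .addLayer("lstm", new GravesLSTM.Builder().activation(Activation.TANH).nIn(1).nOut(10).build(), "input")
            // Keep only the LSTM output at the final time step (using the mask of "input", if any)
            .addVertex("pool", new LastTimeStepVertex("input"), "lstm")
            .addLayer("output", new OutputLayer.Builder(LossFunction.MCXENT)
                   .activation(Activation.SOFTMAX).nIn(10).nOut(numLabelClasses).build(), "pool")
            .setOutputs("output")
            .pretrain(false)
            .backprop(true)
            .build()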
      

The network has a single LSTM hidden layer (Graves variant), a LastTimeStepVertex that passes only the LSTM's output at the final time step onward, and a softmax output layer that gives probabilities across the six classes of time series data. The training loop runs for 40 epochs (each epoch is a complete pass over all records in the training set).

        val nEpochs: Int = 40
        for (i <- 0 until nEpochs) {
            // One epoch: a full pass over the training iterator
            network_model.fit(trainData)
            // Evaluate on the test set (eval is defined elsewhere in the notebook)
            val evaluation = eval(testData)
            val accuracy = evaluation.accuracy()
            val f1 = evaluation.f1()
            println(s"Test set evaluation at epoch $i: Accuracy = $accuracy, F1 = $f1")
            // Rewind both iterators so the next epoch starts from the beginning
            testData.reset()
            trainData.reset()
        }

Right below the training loop in the notebook, a few debug lines show how to query the trained LSTM network.

        import org.nd4j.linalg.factory.Nd4j
        import org.nd4j.linalg.util.ArrayUtil

        // Test one record (label should be 1)
        // Nested as [examples = 1][features = 1][time steps = 60]
        val record = Array(Array(Array(
            -1.65, 1.38, 1.37, 2.56, 2.72, 0.64, 0.76, 0.45, -0.28, -2.72, -2.85, -2.27, -1.23, -1.42, 0.90,
            1.81, 2.77, 1.12, 2.25, 1.26, -0.23, -0.27, -1.74, -1.90, -1.56, -1.35, -0.54, 0.41, 1.20, 1.59,
            1.66, 0.75, 0.96, 0.07, -0.70, -0.32, -1.13, -0.77, -0.96, -0.55, 0.39, 0.56, 0.52, 0.98, 0.91,
            0.23, -0.13, -0.31, -0.98, -0.73, -0.85, -0.77, -0.80, -0.04, 0.64, 0.77, 0.50, 0.98, 0.40, 0.24
        )))
        val flattened = ArrayUtil.flattenDoubleArray(record)
        // Recurrent input shape: [miniBatchSize, inputSize, timeSeriesLength], C (row-major) order
        val input = Nd4j.create(flattened, Array(1, 1, 60), 'c')
        // A ComputationGraph returns one INDArray per output; output(0) holds the class probabilities
        val output = network_model.output(input)
        val label = Nd4j.argMax(output(0), -1)
        println(s"Label: $label")
      
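To make the last steps concrete, here is a plain-Java sketch of what happens between the LSTM layer and the printed label: the last time step of the LSTM output is kept (the role of LastTimeStepVertex), a softmax turns scores into class probabilities, and argmax picks the label (the role of Nd4j.argMax in the notebook). It is an illustration only: it skips the output layer's learned weights and bias, ignores masking, and the class name PredictionSketch is hypothetical.

```java
import java.util.Arrays;

public class PredictionSketch {
    /**
     * Mimics LastTimeStepVertex for one unmasked example: activations[unit][t] is the
     * LSTM output of each hidden unit at each time step; only the final step is kept.
     */
    static double[] lastTimeStep(double[][] activations) {
        double[] last = new double[activations.length];
        for (int unit = 0; unit < activations.length; unit++) {
            last[unit] = activations[unit][activations[unit].length - 1];
        }
        return last;
    }

    /** Softmax: exponentiate (shifted by the max for numerical stability), then normalize to sum to 1. */
    static double[] softmax(double[] scores) {
        double max = Arrays.stream(scores).max().orElse(0.0);
        double[] probs = new double[scores.length];
        double sum = 0.0;
        for (int i = 0; i < scores.length; i++) {
            probs[i] = Math.exp(scores[i] - max);
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) {
            probs[i] /= sum;
        }
        return probs;
    }

    /** Index of the largest value: the predicted class label. */
    static int argMax(double[] values) {
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] > values[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical output-layer scores for the six classes
        double[] scores = {0.2, 2.1, -0.5, 0.3, 0.0, -1.0};
        double[] probs = softmax(scores);
        int label = argMax(probs);
        System.out.println("Predicted label: " + label + ", probability: " + probs[label]);
    }
}
```

Because softmax preserves the ordering of the scores, the argmax of the probabilities is the same as the argmax of the raw scores; the probabilities matter when you also want a confidence value, like the one the model server reports alongside each prediction.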

The label that is returned from the network prediction should be “1”, which we’ll hand check from the client side in a moment. The notebook ends with a block of code that collects the model just trained, and catalogs it in the model history tracking system.

Each experiment needs to send a model and its evaluation metrics to the model server to be registered and archived. Each SKIL notebook must include a small amount of code, explained below, to make sure the model gets stored in the right place.

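        // z is the Zeppelin context; attach the trained model to this experiment in SKIL
        val modelId = skilContext.addModelToExperiment(z, network_model)
        // Catalog the evaluation metrics under the returned model ID. `evaluation` must be an
        // Evaluation in scope here; the val declared inside the training loop is not visible outside it.
        val evalId = skilContext.addEvaluationToModel(z, modelId, evaluation)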
      

In addition to the correct import headers and the skilContext object created near the top of the notebook, these lines of code are all that’s needed to connect this notebook with the rest of the SKIL system. The first line attaches the specific model to the experiment in SKIL. The second line catalogs the evaluation metric results under the model ID, so the model can be compared against other models in the UI later.
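The accuracy and F1 values printed in the training loop and cataloged here both derive from a confusion matrix of actual versus predicted labels. The Java sketch below shows that arithmetic, using macro-averaged F1 (per-class F1, then a plain average over classes). It is a hypothetical illustration: DL4J's Evaluation class computes these metrics itself, and its default averaging for f1() may differ between DL4J versions.

```java
public class ClassificationMetrics {
    /** Fraction of examples on the diagonal; confusion[actual][predicted] holds counts. */
    static double accuracy(int[][] confusion) {
        long correct = 0;
        long total = 0;
        for (int actual = 0; actual < confusion.length; actual++) {
            for (int predicted = 0; predicted < confusion[actual].length; predicted++) {
                total += confusion[actual][predicted];
                if (actual == predicted) {
                    correct += confusion[actual][predicted];
                }
            }
        }
        return total == 0 ? 0.0 : (double) correct / total;
    }

    /** Macro-averaged F1: F1 = 2PR / (P + R) for each class, averaged over classes. */
    static double macroF1(int[][] confusion) {
        int k = confusion.length;
        double sumF1 = 0.0;
        for (int c = 0; c < k; c++) {
            long tp = confusion[c][c];
            long fp = 0; // predicted c, actually another class
            long fn = 0; // actually c, predicted another class
            for (int other = 0; other < k; other++) {
                if (other != c) {
                    fp += confusion[other][c];
                    fn += confusion[c][other];
                }
            }
            double precision = tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
            double recall = tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
            sumF1 += precision + recall == 0 ? 0.0 : 2 * precision * recall / (precision + recall);
        }
        return sumF1 / k;
    }

    public static void main(String[] args) {
        int[][] confusion = {{3, 1}, {2, 4}}; // hypothetical two-class counts
        System.out.println("Accuracy = " + accuracy(confusion) + ", F1 = " + macroF1(confusion));
    }
}
```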

Cataloging a Model in the Model Server

The AI model server plays a major role in how SKIL stores deep learning models and integrates them with user applications. It stores all of the model revisions for an experiment and lets you choose which model to mark as “active” or to deploy. Deploying a model means it becomes the model that serves predictions to production applications querying the REST endpoint.

After the above notebook example is done running in SKIL, click on the “Models” sub-tab in the experiment page to see the new model listed in the table (as below). This may require a page refresh.

Now that you’ve built a model with the notebook and made sure the model was cataloged in the model server, let’s see how to expose the model to the rest of the world through the REST interfaces by deploying the model.

Deploy the Model to the SKIL Model Server

Once the model is indexed in the model history server, it will show up in the list of models that can be deployed to production and handle new data inference requests.

In the “Models” sub-tab in the experiment page, there’s a list of all models produced by notebook runs for this experiment, as seen above. Clicking on one of the models in the list will bring up specific model details (below).

For each model in this list, we have two operations we can perform:

  1. Mark Best
  2. Deploy

If we mark a model as “Best” (i.e., the one we prefer most), it is pinned to the top of the model list, as seen above. If we click the “Deploy Wizard” button, a “deploy model” dialog window comes up (below).

Within SKIL, a “deployment” is a “logical group of models, transforms, and KNN endpoints”. It helps us logically group deployed components to track what goes together and manage the system better.

As explained in the dialog, this wizard will make your model available via a REST API. Then you’ll be able to expose the ETL process as a transform endpoint, and configure the model. You also have the option to update an existing model in place. Clicking “next” will let you either create a new deployment or replace an existing one, as seen in the dialog below.

Let’s create a new deployment and name it “OurFirstDeployment”. In the dialog window that comes up after pressing the “Next” button, you see the option to “deploy the current ETL json as a transform”.

This relates to the ETL portion of vectorizing data into the correct vector format, which we won’t deal with in this specific quick start article. So for now, you can leave that checkbox unchecked. Clicking “Next” again takes us to the final deployment wizard screen (below).

The “name” option here is different from the “deployment name”: it distinguishes the model within the deployment group, and it is required. We’ll use the name “lstm_model” for this example. You can also see the static file path of the physical model file in the local filesystem. The “Scale” option tells the system how many model servers to start for model replication; SKIL CE is limited to one model server, so you don’t need to change it. The next option lets you provide additional JVM arguments. The last option, “Endpoint Urls”, lets you house multiple models under the same URI; we won’t set it in this quick start. Accept “New Model” as the “Deployment Mode”, then click “Deploy” to finalize the deployment.

Once the deployment is finalized by clicking the “Deploy” button, the model will be listed in the “Deployments” screen (below).

Clicking on the entry for this newly deployed model brings up the deployment details (below).

NOTE: If the model is not deployed, click the “Start” button on the model:

This deployment includes all vectorization transforms deployed in support of the model deployments. It also lists the endpoint URLs, relative to the model server, in the “endpoint” column. Once the deployment is running, you can get live predictions from this model via its REST interface, as described in “Get Predictions from the Model Server.”

Troubleshooting

If at any point you get a “No JWT present or has expired” error (for example, while deploying a model), do the following:

Leave the current browser tab open. Create a new tab and log in again. Close the new tab now that you’re logged back in. Retry the original action that caused the error.


Get Predictions from the Model Server

Once you have a deployment created and launched, you need to connect the “last mile”. This involves actually serving live predictions from this newly created model to a real application.

In this section, you’ll learn how to set up a sample Java client (which could easily be integrated into a Tomcat or Java application) to query the model you just built with the notebook. The model server is running locally and it has exposed a REST endpoint at:

http://[host]:9008/endpoints/ourfirstdeployment/model/lstmmodel/default/
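In this example, the deployment “OurFirstDeployment” and the model “lstm_model” appear in the URL as ourfirstdeployment and lstmmodel. The sketch below encodes that observation as a hypothetical helper: the class EndpointUrl and its lowercase, alphanumeric-only naming rule are our assumptions inferred from this single example, not a documented SKIL rule, and the trailing default segment is copied as-is. Always prefer the actual URL shown in the deployment's “endpoint” column.

```java
import java.util.Locale;

public class EndpointUrl {
    /** ASSUMPTION: URL segment = name lowercased with non-alphanumerics removed (inferred, not documented). */
    static String slug(String name) {
        return name.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", "");
    }

    /** Builds the endpoint URL following the pattern of the single example in this guide. */
    static String modelEndpoint(String host, String deploymentName, String modelName) {
        return "http://" + host + ":9008/endpoints/" + slug(deploymentName)
                + "/model/" + slug(modelName) + "/default/";
    }

    public static void main(String[] args) {
        System.out.println(modelEndpoint("localhost", "OurFirstDeployment", "lstm_model"));
    }
}
```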

Next, configure the client to “speak REST” properly and send a query using input data that you select. The REST Java client sample code below shows how to get these predictions.

Get Prediction via Java REST Code Example

To get a working SKIL model server client, clone the project stored at

https://github.com/SkymindIO/SKIL_CE_1.0.0_Examples

with the following command:

$ git clone https://github.com/SkymindIO/SKIL_CE_1.0.0_Examples.git

Build the client application JAR with the following commands:

$ cd SKIL_CE_1.0.0_Examples
$ cd sample-api
$ mvn clean package
      

Once you build the client application, you can use the JAR to make a REST call to SKIL’s AI model server with the following command:

$ java -jar target/skil-ce-examples-1.0.0.jar quickstart http://[host]:9008/[skil_endpoint_name]
      

Note: Replace [skil_endpoint_name] with the endpoint your model was deployed to.

The output of the client example code should look like:

Inference response: Inference.Response.Classify{results{[1]}, probabilities{[0.9729845]}
  Label expected: 1
Inference response: Inference.Response.Classify{results{[4]}, probabilities{[0.8539419]}
  Label expected: 4
Inference response: Inference.Response.Classify{results{[5]}, probabilities{[0.9414516]}
  Label expected: 5
Inference response: Inference.Response.Classify{results{[5]}, probabilities{[0.94135857]}
  Label expected: 5

Each response shows the label predicted by the SKIL AI model server and its probability, followed by the label expected for that input. In this run, all four predictions match the expected labels.
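Rather than comparing the lines by eye, you can count the matches programmatically. A minimal sketch, assuming the client's output uses exactly the line format shown above; ResponseCheck is a hypothetical helper, not part of the examples project.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ResponseCheck {
    // Matches the predicted label in lines like "...Classify{results{[1]}, probabilities{[0.97]}"
    private static final Pattern PREDICTED = Pattern.compile("results\\{\\[(\\d+)\\]\\}");
    // Matches lines like "  Label expected: 1"
    private static final Pattern EXPECTED = Pattern.compile("Label expected: (\\d+)");

    /** Pairs each response line with the "Label expected" line after it; returns {matches, pairs}. */
    static int[] countMatches(String clientOutput) {
        int matches = 0;
        int pairs = 0;
        Integer predicted = null;
        for (String line : clientOutput.split("\\R")) {
            Matcher p = PREDICTED.matcher(line);
            Matcher e = EXPECTED.matcher(line);
            if (p.find()) {
                predicted = Integer.valueOf(p.group(1));
            } else if (e.find() && predicted != null) {
                pairs++;
                if (predicted == Integer.parseInt(e.group(1))) {
                    matches++;
                }
                predicted = null;
            }
        }
        return new int[] {matches, pairs};
    }

    public static void main(String[] args) {
        // Pipe the quickstart client's output into this program's standard input
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8));
        String output = in.lines().collect(Collectors.joining("\n"));
        int[] result = countMatches(output);
        System.out.println(result[0] + " of " + result[1] + " predictions match the expected label");
    }
}
```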

With that, you’ve built your first deep learning application with SKIL, from notebook to deployed production model. We hope you enjoyed the experience. Watch this space for future tutorials on more advanced applications of deep learning in production.