- AI vs. ML vs. DL
- Apache Spark & Deep Learning
- Attention Mechanisms & Memory Networks
- Automated Machine Learning & AI
- AI & Autonomous Vehicles
- Backpropagation
- Bag of Words & TF-IDF
- Bayes' Theorem & Naive Bayes
- Clojure AI
- Comparison of AI Frameworks
- Convolutional Neural Network (CNN)
- Data for Deep Learning
- Datasets and Machine Learning
- Decision Tree
- Deep Autoencoders
- Deep-Belief Networks
- Deep Reinforcement Learning
- Deep Learning Resources
- Deeplearning4j
- Denoising Autoencoders
- Machine Learning DevOps
- Differentiable Programming
- Eigenvectors, Eigenvalues, PCA, Covariance and Entropy
- Evolutionary & Genetic Algorithms
- Fraud and Anomaly Detection
- Generative Adversarial Network (GAN)
- Glossary
- Gluon
- Graph Analytics
- Hopfield Networks
- Hyperparameter
- Wiki Home
- Java AI
- Jumpy
- Logistic Regression
- LSTMs & RNNs
- Machine Learning Algorithms
- Machine Learning Demos
- Machine Learning Software
- Machine Learning Operations (MLOps)
- Machine Learning Research Groups & Labs
- Machine Learning Workflows
- Machine Learning
- Markov Chain Monte Carlo
- Multilayer Perceptron
- Natural Language Processing (NLP)
- ND4J
- Neural Network Tuning
- Neural Networks
- Open Datasets
- Python AI
- Questions When Applying Deep Learning
- Radial Basis Function Networks
- Random Forest
- Recurrent Network (RNN)
- Recursive Neural Tensor Network
- Restricted Boltzmann Machine (RBM)
- Robotic Process Automation (RPA) & AI
- Scala AI
- Single-layer Network
- Skynet, or How to Regulate AI
- Spiking Neural Networks
- Stacked Denoising Autoencoder (SDA)
- Strong AI vs. Weak AI
- Supervised Learning
- Symbolic Reasoning
- Text Analysis
- Thought Vectors
- Unsupervised Learning
- Deep Learning Use Cases
- Variational Autoencoder (VAE)
- Word2Vec, Doc2Vec and Neural Word Embeddings

Broadly speaking, neural networks are used for the purpose of clustering through unsupervised learning, classification through supervised learning, or regression. That is, they help group unlabeled data, categorize labeled data or predict continuous values.

While classification typically uses a form of logistic regression in the net’s final layer to convert continuous data into dummy variables like 0 and 1 – e.g. given someone’s height, weight and age you might bucket them as a heart-disease candidate or not – true regression maps one set of continuous inputs to another set of continuous outputs.

For example, given the age and floor space of a house and its distance from a good school, you might predict how much the house would sell for: continuous to continuous. No dummy variables like one finds in classification, just mapping independent variables `x`

to a continuous `y`

.

Reasonable people can disagree about whether using neural networks for regression is overkill. The point of this post is just to explain how it can be done (it’s pretty easy).

In the diagram above, `x`

stands for input, the features passed forward from the network’s previous layer. Many x’s will be fed into each node of the last hidden layer, and each `x`

will be multiplied by a corresponding weight, `w`

.

The sum of those products is added to a bias and fed into an activation function. In this case the activation function is a *rectified linear unit* (ReLU), commonly used and highly useful because it doesn’t saturate on shallow gradients as sigmoid activation functions do.

For each hidden node, ReLU outputs an activation, `a`

, and the activations are summed going into the output node, which simply passes the activations’ sum through.

That is, a neural network performing regression will have one output node, and that node will just multiply the sum of the previous layer’s activations by 1. The result will be `ŷ`

, “y hat”, the network’s estimate, the dependent variable that all your `x`

’s map to.

To perform backpropagation and make the network learn, you simply compare `ŷ`

to the ground-truth value of `y`

and adjust the weights and biases of the network until error is minimized, much as you would with a classifier. Root-means-squared-error (RMSE) could be the loss function.

In this way, you can use a neural network to get the function relating an arbitrary number of independent variables x to a dependent variable y that you’re trying to predict.