*To propagate* is to transmit something (light, sound, motion or information) in a particular direction or through a particular medium. When we discuss backpropagation in deep learning, we are talking about the transmission of information, and that information relates to the error produced by the neural network.

Neural networks are like new-born babies: They are created ignorant of the world, and it is only through exposure to the world, experiencing it, that their ignorance is slowly revised. Algorithms experience the world through data. So by training a neural network on a relevant dataset, we seek to decrease its ignorance. The way we measure progress is by monitoring the error produced by the network.

The knowledge of a neural network with regard to the world is captured by its weights, the parameters that alter input data as the signal flows through the neural network towards the final layer that will make a decision about that input. Those decisions are often wrong, because the parameters transforming the signal into a decision are wrong.

So the parameters of the neural network have a relationship with the error the net produces, and when the parameters change, presumably the error does, too. We change the parameters using an optimization algorithm called gradient descent, which is useful for finding the minimum of a function. We are seeking to minimize the error, which is also known as the *loss function* or the *objective function*.
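As a minimal sketch of the idea, the snippet below runs gradient descent on a single parameter. The quadratic `loss`, its minimum at `w = 3`, the learning rate and the step count are all assumptions chosen for illustration; a real network's loss depends on many parameters and on the training data.

```python
def loss(w):
    """Hypothetical error curve with its minimum at w = 3."""
    return (w - 3.0) ** 2


def loss_gradient(w):
    """Derivative of the hypothetical loss with respect to w."""
    return 2.0 * (w - 3.0)


def gradient_descent(w, learning_rate=0.1, steps=50):
    """Repeatedly nudge the parameter in the direction that lowers the loss."""
    for _ in range(steps):
        w = w - learning_rate * loss_gradient(w)
    return w


w_start = 0.0
w_final = gradient_descent(w_start)
print(f"start: w={w_start}, loss={loss(w_start)}")
print(f"end:   w={w_final:.5f}, loss={loss(w_final):.2e}")
```

Each step moves the parameter a little way downhill, so the loss shrinks as `w` approaches the minimum.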

So a neural network propagates the signal of the input data forward through its parameters towards the moment of decision, and then *backpropagates* information about the error through the network so that it can alter the parameters one step at a time.

A *gradient* is a slope whose angle we can measure. Like all slopes, it can be expressed as a relationship between two variables: “y over x”, or *rise over run*. In this case, the `y` is the error produced by the neural network, and `x` is the parameter. So the gradient tells us the change we can expect in `y` with regard to `x`.

To obtain this information, we must use differential calculus, which enables us to measure *instantaneous rates of change*. In this case, that rate is the slope of the tangent to the curve relating the parameter to the net’s error, a slope that changes from point to point along the curve.
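To make the link between rise over run and an instantaneous rate of change concrete, here is a small sketch. The error curve `error(x) = x ** 2` is a hypothetical stand-in for a network's error as a function of one parameter. As the run shrinks, the average slope approaches the slope of the tangent at that point.

```python
def error(x):
    """Hypothetical error as a function of a single parameter x."""
    return x ** 2


def rise_over_run(f, x, run):
    """Average slope of f between x and x + run."""
    return (f(x + run) - f(x)) / run


x = 2.0
runs = (1.0, 0.1, 0.001)
slopes = [rise_over_run(error, x, run) for run in runs]

for run, slope in zip(runs, slopes):
    print(f"run={run}: slope is about {slope:.4f}")

# The exact derivative of x ** 2 is 2 * x, which equals 4 at x = 2.
```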

A neural network has many parameters, so what we’re really measuring are the *partial derivatives* of the error with respect to each parameter: how much the error changes when that one parameter changes and all the others are held fixed.
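The following sketch computes partial derivatives for the smallest possible case: a model with two parameters, a weight `w` and a bias `b`, and a squared error on a single example. The model, the error function and the numbers are assumptions chosen for illustration, not a description of any particular network.

```python
def squared_error(w, b, x, target):
    """Error of a hypothetical one-weight, one-bias model on one example."""
    prediction = w * x + b
    return (prediction - target) ** 2


def partial_derivatives(w, b, x, target):
    """Partial derivatives of the squared error with respect to w and b."""
    residual = (w * x + b) - target
    d_error_d_w = 2.0 * residual * x  # b is held fixed
    d_error_d_b = 2.0 * residual      # w is held fixed
    return d_error_d_w, d_error_d_b


grad_w, grad_b = partial_derivatives(w=0.5, b=0.0, x=2.0, target=3.0)
print(grad_w, grad_b)
```

Both values are negative here, which says that increasing either parameter slightly would reduce the error on this example.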

What’s more, neural networks have parameters that process the input data sequentially, one layer after another. Therefore, backpropagation first establishes the relationship between the neural network’s error and the parameters of the net’s last layer; then it establishes the relationship between the parameters of the last layer and those of the second-to-last layer, and so forth, in an application of the *chain rule of calculus*.
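The layer-by-layer use of the chain rule can be sketched with a tiny network that has one weight per layer. Everything here is an illustrative assumption: the sigmoid hidden unit, the squared error, and the names `w1` and `w2`. Real networks apply the same steps to matrices of weights.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w1, w2):
    """Forward pass: the input flows through layer 1, then layer 2."""
    h = sigmoid(w1 * x)  # hidden activation
    y = w2 * h           # network output
    return h, y


def error(x, target, w1, w2):
    _, y = forward(x, w1, w2)
    return (y - target) ** 2


def backward(x, target, w1, w2):
    """Backward pass: start at the error and work back one layer at a time."""
    h, y = forward(x, w1, w2)
    d_error_d_y = 2.0 * (y - target)          # error with respect to output
    d_error_d_w2 = d_error_d_y * h            # last layer's parameter
    d_error_d_h = d_error_d_y * w2            # pass the error back to the hidden unit
    d_h_d_z = h * (1.0 - h)                   # derivative of the sigmoid
    d_error_d_w1 = d_error_d_h * d_h_d_z * x  # first layer's parameter
    return d_error_d_w1, d_error_d_w2


grad_w1, grad_w2 = backward(x=1.5, target=1.0, w1=0.4, w2=-0.7)
print(grad_w1, grad_w2)
```

Note that the gradient for `w1` reuses `d_error_d_y`, which was already computed for the last layer. That reuse of later-layer results is what makes backpropagation efficient.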