A.I. Wiki

Accuracy, Precision, Recall, & F1

When a data scientist has chosen a target variable - the “column” in a spreadsheet they wish to predict - and has done the prerequisite work of transforming the data and building a model, one of the most important steps in the process is evaluating the model’s performance.

Confusion Matrix

Choosing a performance metric often depends on the business problem being solved. Let’s say you have 100 examples in your data and you’ve fed each one to your model and received a classification. The predicted vs. actual classification can be charted in a table called a confusion matrix.

                    Negative (predicted)   Positive (predicted)
Negative (actual)   98                     0
Positive (actual)   1                      1

The table above describes an output of negative vs. positive. These two outcomes are the “classes” of each example. Because there are only two classes, the model used to generate the confusion matrix is a binary classifier.

To better interpret the table, you can also see it in terms of true positives, false negatives, etc.

                    Negative (predicted)   Positive (predicted)
Negative (actual)   true negative          false positive
Positive (actual)   false negative         true positive
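The four cells above can be tallied directly from a model's predictions. Here is a minimal sketch in pure Python; the `actual` and `predicted` label lists are hypothetical examples, with 1 for the positive class and 0 for the negative class.

```python
# Tally the four confusion-matrix cells for a binary classifier.
def confusion_counts(actual, predicted):
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return tp, tn, fp, fn

# Hypothetical labels: 1 = positive, 0 = negative
actual    = [0, 0, 1, 1, 0]
predicted = [0, 1, 1, 0, 0]
print(confusion_counts(actual, predicted))  # (1, 2, 1, 1)
```

In practice a library routine (such as scikit-learn's confusion matrix utilities) would do this counting, but the logic is exactly this comparison of predicted vs. actual labels.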


Overall, how often is our model correct?

Accuracy = (True Positives + True Negatives) / Total Predictions

As a heuristic, accuracy can immediately tell us whether a model is being trained correctly and roughly how it will perform in general. However, it gives no detail about the kinds of errors the model makes: it weighs false positives and false negatives equally, which is rarely true of the underlying business problem.
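Plugging in the cell counts from the 100-example confusion matrix above (TN = 98, FP = 0, FN = 1, TP = 1):

```python
# Accuracy = correct predictions / all predictions,
# using the cell counts from the confusion matrix above.
tp, tn, fp, fn = 1, 98, 0, 1
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)  # 0.99
```

An accuracy of 0.99 looks excellent, yet this model caught only one of the two actual positives - which is why the metrics below matter.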


When the model predicts positive, how often is it correct?

Precision = True Positives / (True Positives + False Positives)

Precision matters when the cost of false positives is high. Let’s say the business problem involves detecting skin cancer. A model with very low precision tells many patients they have melanoma when they do not, and lots of extra tests and needless stress are at stake.
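For the confusion matrix above (TP = 1, FP = 0), precision works out as:

```python
# Precision = TP / (TP + FP): of all positive predictions, how many were right?
tp, fp = 1, 0  # counts from the confusion matrix above
precision = tp / (tp + fp)
print(precision)  # 1.0
```

Precision is perfect here only because the model made zero false-positive predictions; that tells us nothing about the positives it missed.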


When the actual class is positive, how often does the model predict positive?

Recall = True Positives / (True Positives + False Negatives)

Recall matters when the cost of false negatives is high. What if our problem requires screening for a fatal virus such as Ebola? If many infected patients are told they don’t have Ebola, the likely result is widespread infection of the population and an epidemiological crisis.
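For the same confusion matrix (TP = 1, FN = 1), recall exposes the weakness that accuracy hid:

```python
# Recall = TP / (TP + FN): of all actual positives, how many did we catch?
tp, fn = 1, 1  # counts from the confusion matrix above
recall = tp / (tp + fn)
print(recall)  # 0.5
```

The model found only half of the actual positives, despite its 99% accuracy.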

F1 Score

F1 = 2 × (Precision × Recall) / (Precision + Recall)

F1 is the harmonic mean of precision and recall, so it balances the two metrics in a single score. An F1 score of 1 is perfect, and a score of 0 is a total failure.
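Combining the precision (1.0) and recall (0.5) computed from the matrix above:

```python
# F1 = harmonic mean of precision and recall.
precision, recall = 1.0, 0.5  # values derived from the confusion matrix above
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))  # 0.667
```

Because the harmonic mean is dominated by the smaller of the two values, the weak recall drags F1 well below the 0.99 accuracy, giving a more honest summary of this model.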
