Naive Bayes

Naive Bayes is a machine learning method for classification. While more complex methods may outperform it on very complex data, Naive Bayes performs remarkably well despite its simplicity. There are several variants of the algorithm, yet they all rest on one simple ("naive") assumption: that the features are independent of one another given the class. It estimates the probability that a given data point belongs to each of the classes, and then returns the class with the highest probability as the answer.
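To make the idea concrete, here is a minimal sketch of a categorical Naive Bayes classifier written from scratch with only the standard library. The toy weather data, the function names `train` and `predict`, and the smoothing parameter `alpha` are all assumptions made for illustration; they are not taken from the notebooks in this guide, and a real project would more likely use a library implementation.

```python
import math
from collections import Counter, defaultdict


def train(rows, labels, alpha=1.0):
    """Count how often each class and each (feature, value) pair occurs."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    vocab = defaultdict(set)             # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab, alpha


def predict(model, row):
    """Return the most probable class and the log score of every class."""
    class_counts, value_counts, vocab, alpha = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        # Start from the class prior, P(class).
        log_p = math.log(n / total)
        # The "naive" step: treat features as independent given the class,
        # so the per-feature probabilities multiply (log probabilities add).
        for i, value in enumerate(row):
            count = value_counts[(i, label)][value]
            # Additive (Laplace) smoothing avoids zero probabilities.
            log_p += math.log((count + alpha) / (n + alpha * len(vocab[i])))
        scores[label] = log_p
    return max(scores, key=scores.get), scores


# Hypothetical toy data: (outlook, windy) -> decision
rows = [
    ("sunny", "no"), ("sunny", "yes"), ("rainy", "yes"),
    ("rainy", "no"), ("sunny", "no"), ("rainy", "yes"),
]
labels = ["play", "play", "stay", "stay", "play", "stay"]

model = train(rows, labels)
label, scores = predict(model, ("sunny", "yes"))
print(label, {k: round(math.exp(v), 3) for k, v in scores.items()})
```

The scores are unnormalized: each one is the class prior multiplied by the smoothed per-feature probabilities, which is enough to pick the highest-scoring class. Working in log space is a common design choice because multiplying many small probabilities can underflow.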

While Naive Bayes uses some notions from Bayes’ Theorem, you don’t need to know much about Bayesianism to use Naive Bayes for machine learning.

Common Applications

Common Problem Types

  • Classification

  • Tasks where "good enough" accuracy is acceptable

  • Tasks where the Curse of Dimensionality is a problem (Naive Bayes doesn’t suffer from it as much as other methods)

  • Analysis where quick and computationally efficient predictions matter (such as real-time speech processing)

A Brief History

While Bayes’ Theorem itself has been around for a few hundred years, the origins of Naive Bayes are a bit obscure. The assumption that variables are independent was thought to be absurd, and more complex alternatives were preferred, especially in the past century. However, as classification tasks became more common with the rise of artificial intelligence, Naive Bayes (sometimes called Independence Bayes, Idiot’s Bayes, etc.) was given a second look. While there are certainly superior methods for complex data, the surprisingly simple and fast classification that Naive Bayes offers was rediscovered, especially with the rise of machine learning in the past 30 years. For further reading, we suggest this paper.

Code Examples

All of the code examples are written in Python, unless otherwise noted.

Containers

These are code examples in the form of Jupyter notebooks running in a container that comes with all the data, libraries, and code you’ll need to run them. Click here to learn why you should be using containers, along with how to do so.
Quickstart: Download Docker, then run the commands below in a terminal.
#pull container, only needs to be run once
docker pull ghcr.io/thedatamine/starter-guides:naive-bayes

#run container
docker run -p 8888:8888 -it ghcr.io/thedatamine/starter-guides:naive-bayes

Need help implementing any of this code? Feel free to reach out to [email protected] and we can help!