Let's say you are given a fruit that is yellow, sweet, and long, and you have to identify the class to which it belongs.
Step 1: Create a frequency table of the features against the classes from the training data.
Step 2: Draw the likelihood table for the features against the classes.
Name | Yellow | Sweet | Long | Total
Mango | 350/800 = P(Mango|Yellow) | 450/850 | 0/400 | 650/1200 = P(Mango)
Banana | 400/800 | 300/850 | 350/400 | 400/1200
Others | 50/800 | 100/850 | 50/400 | 150/1200
Total | 800 (P(Yellow) = 800/1200) | 850 | 400 | 1200
Step 3: Calculate the conditional probabilities for all the classes. With the naive independence assumption, for each class [latex]C_i[/latex],
[latex]\displaystyle P(C_i|x_1, x_2,\ldots, x_n)=\frac{P(x_1|C_i)P(x_2|C_i)\cdots P(x_n|C_i)P(C_i)}{P(x_1)P(x_2)\cdots P(x_n)}[/latex]
The denominator is the same for every class, so it is enough to compare the numerators, which we can read off the frequency counts above:
[latex]P(\text{Banana})P(\text{Yellow}|\text{Banana})P(\text{Sweet}|\text{Banana})P(\text{Long}|\text{Banana})=\frac{400}{1200}\cdot\frac{400}{400}\cdot\frac{300}{400}\cdot\frac{350}{400}\approx 0.219[/latex]
[latex]P(\text{Mango})P(\text{Yellow}|\text{Mango})P(\text{Sweet}|\text{Mango})P(\text{Long}|\text{Mango})=\frac{650}{1200}\cdot\frac{350}{650}\cdot\frac{450}{650}\cdot\frac{0}{650}=0[/latex]
[latex]P(\text{Others})P(\text{Yellow}|\text{Others})P(\text{Sweet}|\text{Others})P(\text{Long}|\text{Others})=\frac{150}{1200}\cdot\frac{50}{150}\cdot\frac{100}{150}\cdot\frac{50}{150}\approx 0.009[/latex]
Step 4: Calculate [latex]\displaystyle\max_{i}{P(C_i|x_1, x_2,\ldots, x_n)}[/latex]. In our example, the maximum probability is for the class banana; therefore, a fruit that is long, sweet, and yellow is classified as a banana by the Naive Bayes algorithm. In a nutshell, a new element is assigned to the class with the maximum conditional probability described above.
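To check the arithmetic, here is a small Python snippet (in the same interpreter style as the code later in this article) that evaluates the three numerators directly from the likelihood table above:
>>> # Proportional posterior per class: P(C) * P(Yellow|C) * P(Sweet|C) * P(Long|C)
>>> p_banana = (400/1200) * (400/400) * (300/400) * (350/400)
>>> p_mango = (650/1200) * (350/650) * (450/650) * (0/650)
>>> p_others = (150/1200) * (50/150) * (100/150) * (50/150)
>>> print(round(p_banana, 3), round(p_mango, 3), round(p_others, 3))
0.219 0.0 0.009
The banana class has the largest value, which matches the conclusion of Step 4.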
Variations of the Naive Bayes algorithm
There are multiple variations of the Naive Bayes algorithm, depending on the distribution assumed for [latex]P(x_j|C_i)[/latex]. Three of the commonly used variations are:
- Gaussian: The Gaussian Naive Bayes algorithm assumes that the features follow a Gaussian (normal) distribution, i.e.,
[latex]\displaystyle P(x_j|C_i)=\frac{1}{\sqrt{2\pi\sigma_{C_i}^2}}\exp{\left(-\frac{(x_j-\mu_{C_i})^2}{2\sigma_{C_i}^2}\right)}[/latex]
- Multinomial: The Multinomial Naive Bayes algorithm is used when the data is multinomially distributed, i.e., when the number of occurrences of each feature matters, as with word counts in a document.
- Bernoulli: The Bernoulli Naive Bayes algorithm is used when the features in the data set are binary-valued. It is helpful in spam filtration and adult content detection techniques.
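All three variants are available in scikit-learn. As a brief sketch, they are instantiated as follows (alpha is the additive-smoothing parameter, shown here at its library default):
>>> from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB
>>> gnb = GaussianNB()              # continuous features, normal likelihoods
>>> mnb = MultinomialNB(alpha=1.0)  # count features, e.g., word frequencies
>>> bnb = BernoulliNB(alpha=1.0)    # binary features, e.g., word presence/absence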
Pros and Cons of Naive Bayes algorithm
Every coin has two sides. So does the Naive Bayes algorithm. It has advantages as well as disadvantages, and they are listed below:
Pros
- It is a relatively easy algorithm to build and understand.
- It is faster to predict classes using this algorithm than many other classification algorithms.
- It can be easily trained using a small data set.
Cons
- If a given class and a feature never occur together in the training data, then the conditional probability estimate for that combination comes out as 0. This is known as the "Zero Conditional Probability Problem," and it is serious because the zero wipes out all the information in the other probabilities when they are multiplied together. Several sample-correction techniques fix this problem, such as the "Laplacian correction"; a short illustration follows this list.
- Another disadvantage is the very strong assumption it makes that the features are independent of one another given the class. Data sets satisfying this assumption are nearly impossible to find in real life.
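To illustrate the Laplacian correction mentioned above, here is a minimal sketch using scikit-learn's MultinomialNB, whose alpha parameter applies exactly this kind of additive smoothing; the toy count data is invented for demonstration:
>>> import numpy as np
>>> from sklearn.naive_bayes import MultinomialNB
>>> # Invented toy counts: the second feature never occurs in class 0
>>> X = np.array([[2, 0], [3, 0], [1, 4], [2, 3]])
>>> y = np.array([0, 0, 1, 1])
>>> # alpha=1.0 adds one to every count, so no estimated conditional
>>> # probability P(feature|class) is exactly zero
>>> clf = MultinomialNB(alpha=1.0).fit(X, y)
>>> print(np.exp(clf.feature_log_prob_))  # smoothed estimates, no zero entries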
Naive Bayes with Python and R
Let us see how we can build a basic model using the Naive Bayes algorithm in R and in Python.
R Code
To start training a Naive Bayes classifier in R, we need to load the e1071 package:
library(e1071)
To split the data set into training and test data, we will use the caTools package:
library(caTools)
The predefined function used for the implementation of Naive Bayes in R is called naiveBayes(). There are only a few parameters that are of use:
naiveBayes(formula, data, laplace = 0, subset, na.action = na.pass)
- formula: The traditional formula [latex]Y\sim X_1+X_2+\ldots+X_n[/latex]
- data: The data frame containing numeric or factor variables
- laplace: Provides a smoothing effect
- subset: Helps in using only a subset of the data, selected by some Boolean filter
- na.action: Helps in determining what is to be done when a missing value in the data set is encountered
> library(e1071)
> library(caTools)
> data(iris)
> iris$spl = sample.split(iris$Species, SplitRatio = 0.7)
# sample.split() creates a vector of TRUE and FALSE values; passing the label column
# iris$Species and setting SplitRatio to 0.7 splits the original iris data set of
# 150 rows into 70% training and 30% testing data, stratified by class.
> train = subset(iris, iris$spl == TRUE)   # rows of the iris data set for which spl == TRUE
> test = subset(iris, iris$spl == FALSE)   # rows for which spl == FALSE
> nB_model <- naiveBayes(train[,1:4], train[,5])
> table(predict(nB_model, test[,-5]), test[,5]) #returns the confusion matrix
             setosa versicolor virginica
  setosa         17          0         0
  versicolor      0         17         2
  virginica       0          0        14
Python Code
We will use the Python library scikit-learn to build the Naive Bayes algorithm.
>>> from sklearn.naive_bayes import GaussianNB
>>> from sklearn.naive_bayes import MultinomialNB
>>> from sklearn import datasets
>>> from sklearn.metrics import confusion_matrix
>>> from sklearn.model_selection import train_test_split
>>> iris = datasets.load_iris()
>>> X = iris.data
>>> y = iris.target
# Split the data into a training set and a test set
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
>>> gnb = GaussianNB()
>>> mnb = MultinomialNB()
>>> y_pred_gnb = gnb.fit(X_train, y_train).predict(X_test)
>>> cnf_matrix_gnb = confusion_matrix(y_test, y_pred_gnb)
>>> print(cnf_matrix_gnb)
[[16 0 0]
[ 0 18 0]
[ 0 0 11]]
>>> y_pred_mnb = mnb.fit(X_train, y_train).predict(X_test)
>>> cnf_matrix_mnb = confusion_matrix(y_test, y_pred_mnb)
>>> print(cnf_matrix_mnb)
[[16 0 0]
[ 0 0 18]
[ 0 0 11]]
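As a quick follow-up, the test-set accuracy can be computed from the same predictions with sklearn.metrics.accuracy_score. Given the two matrices above, it works out to 45/45 = 1.0 for the Gaussian model and (16 + 0 + 11)/45 = 0.6 for the multinomial one:
>>> from sklearn.metrics import accuracy_score
>>> print(accuracy_score(y_test, y_pred_gnb), accuracy_score(y_test, y_pred_mnb))
1.0 0.6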
Applications
The Naive Bayes algorithm is used in multiple real-life scenarios, such as:
- Text classification: It is used as a probabilistic learning method for text classification. The Naive Bayes classifier is one of the most successful known algorithms for classifying text documents, i.e., deciding whether a document belongs to one or more categories (classes). A small sketch follows this list.
- Spam filtration: This is an instance of text classification that has become a popular mechanism for distinguishing spam email from legitimate email. Several modern email services implement Bayesian spam filtering, and many server-side email filters, such as DSPAM, SpamBayes, SpamAssassin, Bogofilter, and ASSP, use this technique.
- Sentiment analysis: It can be used to analyze the tone of tweets, comments, and reviews, i.e., whether they are negative, positive, or neutral.
- Recommendation systems: The Naive Bayes algorithm, in combination with collaborative filtering, is used to build hybrid recommendation systems that predict whether a user will like a given resource.
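To make the text-classification use case concrete, here is a minimal sketch that combines scikit-learn's CountVectorizer with Multinomial Naive Bayes; the four-document corpus and its spam labels are invented purely for illustration:
>>> from sklearn.feature_extraction.text import CountVectorizer
>>> from sklearn.naive_bayes import MultinomialNB
>>> texts = ["win money now", "cheap pills win", "meeting at noon", "project report attached"]
>>> labels = [1, 1, 0, 0]  # invented labels: 1 = spam, 0 = legitimate
>>> vectorizer = CountVectorizer()  # bag-of-words word counts
>>> X = vectorizer.fit_transform(texts)
>>> clf = MultinomialNB().fit(X, labels)
>>> print(clf.predict(vectorizer.transform(["win cheap money"])))
[1]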