Dr. O. Aly
Computer Science
Introduction
This discussion provides examples of how Bayesian analysis can be used in the context of social media and summarizes studies that applied it there. It begins with Naïve Bayes Classifiers for Mining, followed by the Importance of Social Media, Social Media Mining, Social Media Mining Techniques, the Social Media Mining Process, the Data Modelling Step, and Twitter Mining Using Naïve Bayes with R. It ends with additional examples of Bayesian analysis methods in social media.
Naïve Bayes Classifiers for Mining
Naïve Bayes classifiers are probabilistic classifiers built on Bayes' theorem (Kumar & Paul, 2016). Naïve Bayes is also described as a prior probability and class-conditional probability classifier, since it combines the prior probability of each class with the class-conditional probabilities of the features to generate a posterior probability distribution over the classes (Kumar & Paul, 2016).
The Naïve Bayes classifier makes the following assumptions about the data (Kumar & Paul, 2016):
- All the features in the dataset are independent of each other, given the class.
- All the features are equally important.
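Under these assumptions, the posterior probability of a class factors into a product of per-feature terms. The expression below is the standard textbook form of the Naïve Bayes rule rather than a formula quoted from Kumar and Paul (2016); the symbols c (a class) and x_1, ..., x_n (the features) are notation introduced here for illustration.

```latex
P(c \mid x_1, \dots, x_n)
  = \frac{P(c)\,\prod_{i=1}^{n} P(x_i \mid c)}{P(x_1, \dots, x_n)}
  \;\propto\; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```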
Though these assumptions may not hold in a real-world scenario, Naïve Bayes is still used in many text classification applications, such as (Kumar & Paul, 2016):
- Spam filtering for email applications.
- Social media mining, such as finding the sentiments in a given text.
- Computer network security applications.
As Kumar and Paul (2016) indicate, the classifier has several strengths:
- Naïve Bayes classifiers are highly scalable and need fewer computational cycles than more advanced and sophisticated classifiers.
- A vast number of features can be taken into consideration.
- Naïve Bayes classifiers work well when there is missing data and when the dimensionality of the inputs is high.
- Naïve Bayes classifiers need only a small amount of training data.
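To make the mechanics concrete, the following is a minimal sketch of a word-count Naïve Bayes text classifier. It is written in Python purely for illustration and is not taken from Kumar and Paul (2016). The function names, the add-one (Laplace) smoothing, and the four training sentences are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Count class frequencies and per-class word frequencies.

    labeled_docs: iterable of (list_of_words, label) pairs.
    """
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for words, label in labeled_docs:
        class_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary


def predict(words, class_counts, word_counts, vocabulary):
    """Return the label with the highest log-posterior score."""
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in class_counts.items():
        # Log prior: share of training documents carrying this label.
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for word in words:
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one smoothing (an assumption here) avoids zero probabilities.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, made-up training data (illustrative only).
training = [
    ("great phone love it".split(), "positive"),
    ("love this great service".split(), "positive"),
    ("terrible phone hate it".split(), "negative"),
    ("hate this awful service".split(), "negative"),
]
model = train_naive_bayes(training)
print(predict("love this phone".split(), *model))
```

Working in log space keeps the product of many small probabilities from underflowing, and training is a single counting pass over the data, which is one reason the method scales well.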
The Importance of Social Media
Traditional media such as radio, newspapers, and television facilitate one-way communication with a limited scope of reach and usability. Although the audience can interact with channels such as radio, the quality and frequency of such communication are limited (Ravindran & Garg, 2015). Internet-based social media, on the other hand, offers multi-way communication with features such as immediacy and permanence (Ravindran & Garg, 2015).
Social media is an approach to communication using online tools such as Twitter, Facebook, and LinkedIn (Ravindran & Garg, 2015). Andreas Kaplan and Michael Haenlein, as cited in Ravindran and Garg (2015), define social media as follows: “A group of Internet-based applications that build on the ideological and technological foundations of Web 2.0 and that allow the creation and exchange of user-generated content.”
Social media spans various Internet-based platforms that facilitate human interactions such as:
- Networking, such as Facebook and LinkedIn.
- Microblogging, such as Twitter and Tumblr.
- Photo sharing, such as Instagram and Flickr.
- Video sharing, such as YouTube and Vimeo.
- Stack exchanging, such as Stack Overflow and GitHub.
- Instant messaging, such as WhatsApp and Hike.
The marketing industry is maturing in its understanding of the promise and impact of social media (Ravindran & Garg, 2015). While social media is regarded as a great tool for banner advertisement in terms of cost and reach, it can turn out to be more influential in the long term (Ravindran & Garg, 2015). Organizations need to learn the opinions of consumers by mining social networks (Ravindran & Garg, 2015). By collecting information on these opinions, they can understand the current and potential outlook of consumers, and such information can guide business decisions that, in the long run, influence the fate of the business (Ravindran & Garg, 2015).
Social Media Mining
Social media mining is the systematic analysis of information generated from social media. The tools and techniques used to mine such information are collectively called data mining techniques and, in the context of social media, Social Media Mining (SMM) (Ravindran & Garg, 2015). There has been much research across multiple areas of social media, such as modeling behavior, predictive analysis, and recommending content (Ravindran & Garg, 2015).
Social Media Mining (SMM) Techniques
Graph mining is one SMM technique. It is described as “the process of extracting useful knowledge (patterns, outliers and so on), from a social relationship between the community members [that] can be represented as a graph” (Ravindran & Garg, 2015). The most influential example of graph mining is Facebook Graph Search (Ravindran & Garg, 2015). Text mining is another SMM technique; it extracts meaning from the unstructured text data present in social media. The primary targets of this type of mining are blogs and microblogs such as Twitter (Ravindran & Garg, 2015).
Social Media Mining Process
The social media mining process includes the following five steps (Ravindran & Garg, 2015):
- Getting authentication from the social website.
- Data Visualization.
- Cleaning and pre-processing.
- Data modeling using standard algorithms such as opinion mining, clustering, anomaly/spam detection, correlations and segmentation, recommendations.
- Result Visualization.
Data Modelling Step
Data modeling is the fourth step in the social media mining process and involves applying mining algorithms (Ravindran & Garg, 2015). Standard mining algorithms include opinion mining, or sentiment mining, in which the opinion or sentiment present in a given phrase is assessed (Ravindran & Garg, 2015). Although classifying sentiment is not simple, various classification algorithms have been employed to aid opinion mining. These algorithms range from simple probabilistic classifiers such as Naïve Bayes, which assumes that all features are independent and does not use any prior information, to more advanced classifiers such as Maximum Entropy, which uses prior information to a certain extent (Ravindran & Garg, 2015). Other classifiers, such as Support Vector Machines (SVM) and Neural Networks (NN), have also been used to classify sentiments (Ravindran & Garg, 2015). Additional methods include anomaly/spam detection, also called social spammer detection (Ravindran & Garg, 2015). Fake profiles created with malicious intent are known as spam or anomalous profiles (Ravindran & Garg, 2015).
Twitter Mining Using Naïve Bayes with R
In this application, R packages are used to assess the sentiment of tweets. The first step is to obtain authorization using the following calls (r-project.org, 2018; Ravindran & Garg, 2015):
- getTwitterOAuth(consumer_key, consumer_secret)
- registerTwitterOAuth(OAuth)
- source("authenticate.R")
Next, tweets are collected as a corpus using the searchTwitter() function in R. Once the Twitter data has been cleaned, a few R packages are used to assess the sentiments in the tweets. In this example, the first step is a naïve algorithm that scores a sentence by the number of times positive or negative words occur in it (Ravindran & Garg, 2015). To estimate the sentiment further, Naïve Bayes is used to decide on the emotion present in a tweet. The Naïve Bayes method requires the R packages Rstem and sentiment; the sentiment package provides functions such as classify_emotion(). These packages have since been removed from the CRAN repository, but they can still be downloaded from the archive at https://cran.r-project.org/src/contrib/Archive/. Figure 1 illustrates an example of the result when applying the Naïve Bayes method in R (Ravindran & Garg, 2015).

Figure 1. Example of the Result When Applying Naïve Bayes Method in R (Ravindran & Garg, 2015).
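The word-count scoring step described above can be sketched as follows. This is an illustration in Python, not the R code from Ravindran and Garg (2015). The two small word lists and the function name lexicon_score are hypothetical; a real analysis would use a much larger opinion lexicon.

```python
import re

# Hypothetical mini-lexicons for illustration only.
POSITIVE_WORDS = {"good", "great", "love", "happy", "excellent"}
NEGATIVE_WORDS = {"bad", "terrible", "hate", "sad", "awful"}


def lexicon_score(text):
    """Return (number of positive words) minus (number of negative words)."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE_WORDS for word in words)
    negatives = sum(word in NEGATIVE_WORDS for word in words)
    return positives - negatives


for tweet in [
    "I love this great phone",
    "Terrible service, I hate waiting",
    "It arrived today",
]:
    print(tweet, "->", lexicon_score(tweet))
```

A score above zero suggests positive sentiment, below zero negative, and zero neutral. The approach ignores negation and context, which is one motivation for following it with a probabilistic classifier such as Naïve Bayes.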
In summary, the Naïve Bayes method can be applied to social media data, as the Twitter example shows. Certain R packages must be installed to complete the Naïve Bayes analysis on a Twitter dataset.
Additional Examples of Bayesian Analysis methods in Social Media
Another application of the Naïve Bayes method to social media is reported by Singh, Singh, and Singh (2017). The Naïve Bayes classifier, a popular supervised classifier, provides a way to express positive, negative, and neutral feelings in web text. It uses conditional probability to classify words into their respective categories (Singh et al., 2017). The benefit of using Naïve Bayes for text classification is that it needs only a small dataset for training (Singh et al., 2017). The raw data from the web undergoes pre-processing, which removes numerals, foreign words, HTML tags, and special symbols to yield a set of words (Singh et al., 2017). Human experts then manually tag the words with positive, negative, or neutral labels (Singh et al., 2017). This pre-processing produces word-category pairs for the training set (Singh et al., 2017).
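A pre-processing pass of the kind described above might look like the sketch below. It is a Python illustration under stated assumptions, not the pipeline used by Singh et al. (2017). The function name clean_text and the regular expressions are choices made for this example. Foreign-word removal, which would need a dictionary lookup, is not implemented.

```python
import html
import re


def clean_text(raw):
    """Strip HTML tags, digits, and special symbols; return lowercase words."""
    text = html.unescape(raw)                 # decode entities such as &amp;
    text = re.sub(r"<[^>]+>", " ", text)      # drop HTML tags
    # Keep only ASCII letters and whitespace. This also discards accented
    # characters, a simplification that a real pipeline may not want.
    text = re.sub(r"[^A-Za-z\s]", " ", text)
    return text.lower().split()


print(clean_text("<p>Loved it!!! 10/10 &amp; would buy again :)</p>"))
```

The resulting word list is what a human annotator would then label to build the word-category training pairs.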
Singh et al. (2017) focused on four text classifiers used for sentiment analysis: Naïve Bayes, J48, BFTree, and OneR. Naïve Bayes was found to be quite fast in learning, whereas OneR appeared more promising, achieving 91.3% precision, 97% F-measure, and 92.34% correctly classified instances (Singh et al., 2017).
Another application of Bayesian analysis is reported by Volkova and Van Durme (2015), who proposed two approaches to mining streaming social media. They studied iterative, incremental retraining in batch and current settings, with and without iterative annotations. They treated each new message as independent evidence, which is combined into an incremental user-prediction model by applying Bayes' rule, and they explored model training in parallel with its application rather than assuming a previously existing labeled dataset. They applied Bayes' rule updates to dynamically revise posterior probability estimates of the attribute value in question (Volkova & Van Durme, 2015).
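The idea of revising a posterior one message at a time can be sketched as below. This is a generic Bayes' rule update written in Python for illustration, not the model of Volkova and Van Durme (2015). The attribute values "A" and "B", the uniform starting belief, and the per-message likelihoods are all made up for this example.

```python
def update_posterior(prior, likelihoods):
    """One Bayes' rule step: posterior is proportional to prior times likelihood.

    prior: dict mapping attribute value -> current probability.
    likelihoods: dict mapping attribute value -> P(message | value).
    """
    unnormalized = {value: prior[value] * likelihoods[value] for value in prior}
    evidence = sum(unnormalized.values())
    return {value: weight / evidence for value, weight in unnormalized.items()}


# Hypothetical binary user attribute with a uniform starting belief.
belief = {"A": 0.5, "B": 0.5}

# Made-up likelihoods P(message | value), e.g. produced by a text model.
message_likelihoods = [
    {"A": 0.6, "B": 0.4},
    {"A": 0.7, "B": 0.3},
    {"A": 0.4, "B": 0.6},
]

for likelihoods in message_likelihoods:
    # Each message is treated as independent evidence; the previous
    # posterior becomes the prior for the next update.
    belief = update_posterior(belief, likelihoods)
    print(belief)
```

Because each update needs only the current belief and the newest message, the estimate can be kept up to date on a stream without reprocessing earlier messages.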
References
Kumar, A., & Paul, A. (2016). Mastering Text Mining with R. Packt Publishing Ltd.
r-project.org. (2018). Package ‘twitteR’. Retrieved from https://cran.r-project.org/web/packages/twitteR/twitteR.pdf
Ravindran, S. K., & Garg, V. (2015). Mastering Social Media Mining with R. Packt Publishing Ltd.
Singh, J., Singh, G., & Singh, R. (2017). Optimization of sentiment analysis using machine learning classifiers. Human-centric Computing and Information Sciences, 7(1), 32.
Volkova, S., & Van Durme, B. (2015). Online Bayesian Models for Personal Analytics in Social Media.
A few papers have been written on the use of Bayesian methods for social media analysis.
Below are links to a couple of them for further insight:
Pachinko Prediction: A Bayesian method for event prediction from social media data: https://arxiv.org/pdf/1809.08427.pdf
A Bayesian Approach for Predicting the Popularity of Tweets: https://projecteuclid.org/journals/annals-of-applied-statistics/volume-8/issue-3/A-Bayesian-approach-for-predicting-the-popularity-of-tweets/10.1214/14-AOAS741.pdf