Sentiment analysis and opinion mining are emerging techniques with
applications in many areas. The process depends on collecting data, analyzing
it, and identifying variation in it. The paper under review discusses Twitter
microblogging and the processes for classifying tweets as positive, negative,
or irrelevant. Its main objective was to identify the applications and
precision of the research, along with the limitations faced in data
processing. Twitter is a microblogging service that lets users share short
messages as tweets. Previous research has found that opinions shared by users
on Twitter can be applied to real-life problems [1].
The prime objective of the present work was to analyze the effectiveness of
these processes, how suitable Twitter is for identifying solutions to
problems, and how sentiment in tweets is classified. The approach used in the
research depends on the subject areas and geographical locations selected.
The paper is organized into sections covering previous work on the selection
process, the techniques used in processing and preprocessing the data, the
classification approach, the results of the research, and proposed directions
for future research [1].
Problem statement of Sentiment analysis on the Twitter Data stream
As technology and the use of the internet grow, microblogging is also growing
rapidly. Extensive previous research has addressed the identification of
sentiment expressions and the impact of these expressions in tweets.
Researchers have employed different approaches to identify lexical terms,
lexical resources, and trends in tweets. Some researchers worked with unigram
and bigram models to represent the data collected in their research. Twitter
uses several kinds of syntax, such as hashtags, exclamation marks,
punctuation, symbols, emoticons, and retweets. The research under review
measured the influence of opinions and emotions in tweets [1].
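The unigram and bigram models mentioned above can be illustrated with a minimal sketch. The tokenizer below is an assumption made for illustration (it keeps hashtags and @mentions as single tokens); it is not the tokenization used in the reviewed paper.

```python
import re

def tokenize(tweet):
    """Lowercase a tweet and split it into word-like tokens.

    Assumption: hashtags and @mentions are kept as single tokens.
    """
    return re.findall(r"[#@]?\w+", tweet.lower())

def ngrams(tokens, n):
    """Return all n-grams (as tuples) over a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = tokenize("Loving the new phone #happy")
unigrams = ngrams(tokens, 1)  # single words
bigrams = ngrams(tokens, 2)   # adjacent word pairs
```

Unigrams treat each word independently, while bigrams keep some local word order, for example the pair ("new", "phone").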
Data mining methodology of Sentiment analysis on the Twitter Data stream
The methodology applied by the researchers is based on two disjoint datasets.
All of the data was collected from Twitter and then labeled according to the
sentiment expressed in relation to the query. Positive and negative
classification, based on the expressions in the tweets, was used when
collecting the data. The skewness in the data was reduced [1]. An application
programming interface (API) was used to handle data from different domains.
The collected tweets were classified as neutral, irrelevant, or polar
according to their nature. Special attention was paid to the privacy of the
public. The collection technique was accurate enough to handle different
types of data and large datasets [1].
Data pre-processing of Sentiment analysis on the Twitter Data stream
The preprocessing stage extracted the data based on the classification and
information relevant to sentiment analysis and microblogging. After
collection, the next step was to extract the data as a series, providing a
message string for conversion. Preprocessing was based on classifying the
quality and features of the collected data, and it increased the performance
of the research. The whole process can be divided into steps: replacement of
emoticons, identification of uppercase text, conversion to lowercase, URL
extraction, detection of hashtags and pointers, identification of
punctuation, compression of words, and finally removal of skewness from the
dataset. Classifying and sampling the data improves the performance of the
research and makes it possible to determine variation in the data and
samples. Two sampling strategies were identified: undersampling and
oversampling. The research used the synthetic minority over-sampling
technique (SMOTE) to address skewness in the data by generating synthetic
samples.
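The text-cleaning steps listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the emoticon table, the tag names `emo_pos` and `emo_neg`, and the rule of squeezing repeated characters down to two are all hypothetical choices. Rebalancing the dataset is a separate step and is not shown here.

```python
import re

URL_RE = re.compile(r"https?://\S+")
# Hypothetical emoticon table; the reviewed paper's list is not reproduced here.
EMOTICONS = {":-)": "emo_pos", ":)": "emo_pos", ":-(": "emo_neg", ":(": "emo_neg"}

def preprocess(tweet):
    """Clean one raw tweet and return the cleaned text plus simple features."""
    # URL extraction: record the links, then drop them from the text.
    urls = URL_RE.findall(tweet)
    text = URL_RE.sub(" ", tweet)

    # Uppercase identification: all-caps words of two or more letters.
    shouted = re.findall(r"\b[A-Z]{2,}\b", text)

    # Emoticon replacement: swap each emoticon for a word-like tag.
    for emoticon, tag in EMOTICONS.items():
        text = text.replace(emoticon, f" {tag} ")

    # Lowercasing.
    text = text.lower()

    # Hashtag and pointer (@mention) detection.
    hashtags = re.findall(r"#\w+", text)
    mentions = re.findall(r"@\w+", text)

    # Punctuation identification: count emphatic marks, then strip punctuation.
    emphatic = len(re.findall(r"[!?]", text))
    text = re.sub(r"[^\w\s#@]", " ", text)

    # Word compression: squeeze runs of 3+ identical characters down to 2.
    text = re.sub(r"(\w)\1{2,}", r"\1\1", text)

    return {
        "text": " ".join(text.split()),
        "urls": urls,
        "shouted": shouted,
        "hashtags": hashtags,
        "mentions": mentions,
        "emphatic": emphatic,
    }

result = preprocess("@bob This phone is SOOOO good!!! :) #happy http://t.co/xyz")
```

The order of the steps matters in this sketch: URLs are removed before emoticon replacement, and uppercase words are detected before lowercasing so that the emphasis signal is not lost.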
The dataset was evaluated through several processes, including the
experimental methodology, the building of a model from training data, and
training with Naïve Bayes, random forest, support vector machines (SVMs), the
sequential minimal optimization (SMO) algorithm, and the J48 algorithm [1].
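To make one of the classifiers named above concrete, the following is a minimal sketch of a multinomial Naïve Bayes classifier with add-one smoothing. The class name `NaiveBayes`, the toy training tweets, and the labels are assumptions made for illustration; the sketch does not reproduce the toolkit or the data used in the reviewed paper.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior of the class.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in doc.split():
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word]
                # Smoothed log likelihood of the word given the class.
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy training data, invented for this example.
train_docs = ["great phone love it", "love this great service",
              "terrible phone hate it", "hate this awful service"]
train_labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayes().fit(train_docs, train_labels)
```

Calling `model.predict("love great")` scores each class by its prior and the smoothed word likelihoods, and returns the class with the higher total.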
Summary of results of Sentiment analysis on the Twitter Data stream
A graphical representation was used to express the relation between the
performance of each algorithm and the classes of data: neutral, polar, and
irrelevant. Surprisingly, the results suggested that some widely used
algorithms failed to deliver satisfactory performance. On the other hand,
notable methods including Naïve Bayes and Lazy IBK produced results with
average accuracy. An accuracy of 80%, well over half, was achieved by the
Bayesian classifiers, random forest, and SMO. The maximum accuracy was
provided by the SMO classifier. The tree-based J48 failed to produce
consistent results. Although SMOTE was employed to reduce skewness, the main
issue encountered in the results was still the skewness of the data. Higher
accuracy would require reducing the variation in the data; resolving the
skewness problem would reduce the issues faced during analysis and improve
its performance [1].
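The core idea of SMOTE, which the paper used against skewness, can be sketched as interpolation between a minority-class point and one of its nearest minority neighbours. The function below is a simplified illustration under stated assumptions: the name `smote`, its parameters, and the toy points are hypothetical, the points are assumed to be numeric feature vectors, and at least two minority points are required.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Sort the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        # Pick one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # Random position along the segment from base to neighbour.
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Toy minority-class feature vectors, invented for this example.
minority = [(1.0, 1.0), (1.5, 1.2), (1.2, 1.8), (2.0, 1.5)]
new_points = smote(minority, n_new=6)
```

Because every synthetic point lies on a segment between two real minority points, the new samples stay inside the region the minority class already occupies. This balances class counts but cannot add information that the original minority samples lack, which may be one reason skewness remained an issue.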
Critical analysis of Sentiment analysis on the Twitter Data stream
The main theme of the research was to identify the impact of sentiment in
tweets. The whole process was carried out using different methods and
techniques of data analysis. The variation in the samples and the skewness in
the data were also measured. A critical reading of the research suggests that
the effect of negative tweets on users' sentiments was not identified.
The research encountered several problems related to sentiment classification
and the details of the data. The preprocessing method for raw Twitter
messages was explained at length. Structural defects were also identified,
yet the research provides insufficient information about the processes used
to reduce data skewness. The percentage accuracy of each method was measured,
but the process for identifying deviation in the results was not stated
clearly. SMOTE was employed to reduce the skewness of the datasets and to
improve the accuracy of the results. Imbalance in a dataset introduces
uncertainty into the results, so an appropriate technique was needed to
measure the dimensions of the dataset and to overcome the issues caused by
skewness. The research said too little about the privacy concerns of users. A
large proportion of Twitter users rely on the privacy settings for tweets,
but the researchers did not define what privacy techniques Twitter uses or
how many people consider them sufficient. The data accessible to followers
should be limited, particularly data about the emotional state of the user.
The authors acknowledged the use of different techniques and how they could
be developed for data analysis, but some important information remains
undescribed. Much of the research concerns the conditions and assumptions for
the classified dataset, but a blended approach combining two methods is also
possible. For instance, an SVM and a filtered classifier with J48 could be
combined to obtain more accurate results. In addition, the research
classified different international relations and expressions of users, but it
neglected the impact of language on emotional expression.
Conclusion of Sentiment analysis on the Twitter Data stream
Based on the results, it can be concluded that the best method for analyzing
expressions in tweets was the filtered classifier. The emotions and
sentiments in tweets had a strong impact on readers. The research proposed
that accurate implementation of classification algorithms for the dataset can
improve the accuracy of the results and reduce skewness in the datasets. The
research faced the challenges of natural language processing; future work
could therefore be extended to measure the impact of different languages on
the emotions expressed in tweets.
References of Sentiment analysis on the Twitter Data stream
[1] B. Gokulakrishnan, P. Priyanthan, T. Ragavan, N. Prasath and A. Perera,
"Opinion Mining and Sentiment Analysis on a Twitter Data Stream," The
International Conference on Advances in ICT for Emerging Regions, vol. 01,
no. 01, pp. 182-188, 2012.