Final Presentation

Presentation

Paper 7 Q and A

Welcome to the CSCE 6350 Paper # 7

Questions and Answers Posted Here

Paper 7 Presentation

Proposal

Project Proposal

Presentation

Paper Summaries

Presentation

1. Bermingham, Adam and Smeaton, Alan F. (2010) Crowdsourced real-world sensing: sentiment analysis and the real-time web. In: AICS 2010 - Sentiment Analysis Workshop at Artificial Intelligence and Cognitive Science, 30 August - 1 September 2010, Galway, Ireland.

Bermingham and Smeaton discuss the recent prevalence of user-generated content from sources such as product and service reviews, blogs and micro-blogs, social networks, forums, and wikis. Much of this new data is released to the web in “real time,” meaning it is visible almost instantaneously. Another interesting property of the data is that much of it is generated in reaction to real-world events. What makes this data interesting is also what makes it difficult to study: it appears in real time, without a schedule, from a variety of sources. Very little is known about the authors themselves.

The researchers discuss the background of sentiment analysis as well as other efforts to mine the data for new information. They also discuss the motivation for analyzing the data: the possibility of determining the mood, opinions, or feelings of the general public. This information could be used in a variety of ways, including public polling, advertising, and discovering newly developing stories more quickly.

The challenges of the data have much to do with its dynamic nature, large number of authors, and noise. The researchers discuss different approaches to filtering out the noise, but they make it very clear that “sentiment analysis in real-time poses a significant challenge for research methodologies.” They conclude that while user-generated data is exciting and has high potential, the field is not yet able to analyze it effectively.

2. Bollen, Johan, Alberto Pepe, and Huina Mao. Modeling public mood and emotion: Twitter sentiment and socio-economic phenomena. (2009): n. pag. Web. 7 Feb 2011. http://arxiv.org/abs/0911.1583 .

Bollen, Pepe, and Mao introduce the reader to the concept of micro-blogging and, more specifically, Twitter. They explain that tweets are text updates of 140 characters or fewer that are generally used to share information or convey an idea, but many of them also contain information about the author's mood. The researchers analyze over 9 million publicly broadcast tweets from August 1 to December 20, 2008, which are preprocessed and fed into a profiling system.

The Profile of Mood States (POMS) is a well-established psychometric instrument. It measures six dimensions of mood (Tension, Depression, Anger, Vigour, Fatigue, and Confusion). POMS works by looking for key terms that relate to each of these moods, but the researchers extended the list of search terms to make it more applicable to this data. The researchers consider only tweets that explicitly express a feeling in some form of “I am feeling,” and they strip out non-alphanumeric and white-space characters. This filtering leaves them with around 1.1 million tweets to analyze.
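The filtering and term-matching idea described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the lexicon entries, the `FEELING_RE` pattern, and the function names are hypothetical placeholders, not the actual POMS adjectives or the authors' extended term list, and the authors' real scoring is more involved than a raw count.

```python
import re

# Hypothetical mini-lexicon: illustrative placeholders only, NOT the
# actual POMS terms or the extended term list used in the paper.
MOOD_TERMS = {
    "tension": {"tense", "nervous", "anxious"},
    "depression": {"sad", "hopeless", "unhappy"},
    "anger": {"angry", "furious", "annoyed"},
    "vigour": {"energetic", "lively", "cheerful"},
    "fatigue": {"tired", "exhausted", "weary"},
    "confusion": {"confused", "muddled", "bewildered"},
}

# Assumed pattern for tweets that explicitly state a feeling.
FEELING_RE = re.compile(r"\bi(?: am|'m) feeling\b", re.IGNORECASE)


def expresses_feeling(tweet):
    """Return True if the raw tweet contains an 'I am feeling' style phrase."""
    return FEELING_RE.search(tweet) is not None


def normalize(tweet):
    """Lower-case the tweet, drop non-alphanumeric characters, split into terms."""
    cleaned = re.sub(r"[^a-z0-9\s]", "", tweet.lower())
    return cleaned.split()


def mood_vector(tweets):
    """Count lexicon hits per mood dimension over the feeling-bearing tweets."""
    counts = {mood: 0 for mood in MOOD_TERMS}
    for tweet in tweets:
        if not expresses_feeling(tweet):
            continue  # skip tweets that do not state a feeling
        for term in normalize(tweet):
            for mood, terms in MOOD_TERMS.items():
                if term in terms:
                    counts[mood] += 1
    return counts


sample = [
    "I am feeling tired and confused today!",
    "Great game last night",
    "I'm feeling cheerful :)",
]
print(mood_vector(sample))
```

Computing one such vector per day would give a time series for each mood dimension, which is the kind of signal the summary describes being compared against events.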

The researchers' goal is to see what kinds of mood changes accompany large events such as the 2008 Olympics, the presidential election, and Thanksgiving. They also compare the mood changes against the closing value of the Dow Jones. They are able to recognize short-term spikes in vigour around Thanksgiving, as well as higher levels of depression and confusion on November 3 (before the election). They are also able to recognize long-term trends in depression, anger, and tension that are consistent with the worst recession since World War II. They conclude that tweets provide an interesting source for investigating public mood and emotive trends.

3. Mario Cataldi, Luigi Di Caro, and Claudio Schifanella. 2010. Emerging topic detection on Twitter based on temporal and social terms evaluation. In Proceedings of the Tenth International Workshop on Multimedia Data Mining (MDMKDD '10). ACM, New York, NY, USA, Article 4, 10 pages. DOI=10.1145/1814245.1814249 http://doi.acm.org/10.1145/1814245.1814249

Cataldi, Di Caro, and Schifanella use data acquired from Twitter, but view the data in a different light than those who are interested in capturing the mood of the public. They view Twitter's primary role as a vehicle for discovering emerging topics of public interest. The researchers feel that Twitter's major advantage over traditional news outlets is the speed with which an emerging topic can be discovered.

For topic discovery they use a five-step process. First, they extract terms from tweets, formalize them, and weight their values. Second, they build a directed graph of authors and rank the authors using the PageRank algorithm. Third, they give each term a life cycle in which its value can rise and fall relative to its recent usage. Fourth, they select emerging terms depending on their life cycle status. Finally, they create a topic graph that links the extracted emerging terms with their co-occurring terms in order to obtain a set of emerging topics. This allows emerging events to bubble up from the noise that makes up most tweets.
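A rough Python sketch of steps two through four is given here to make the process concrete. It is a simplification under stated assumptions, not the authors' method: the function names, the `ratio` threshold, and the rule "current weight exceeds a multiple of the average past weight" are all hypothetical stand-ins for the paper's life cycle model, and the final topic-graph step is omitted.

```python
from collections import defaultdict


def pagerank(follows, damping=0.85, iterations=50):
    """Power-iteration PageRank over a follower graph.

    follows maps each author to the list of authors they follow, so
    edges point from follower to followed author.
    """
    nodes = set(follows) | {t for targets in follows.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = follows.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling author (follows nobody): spread rank evenly.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


def term_weights(tweets, authority):
    """Weight each term by the summed authority of the authors using it.

    tweets is a list of (author, text) pairs from one time interval.
    """
    weights = defaultdict(float)
    for author, text in tweets:
        for term in set(text.lower().split()):
            weights[term] += authority.get(author, 0.0)
    return dict(weights)


def emerging_terms(history, current, ratio=2.0):
    """Return terms whose current weight exceeds ratio times their past average.

    history is a list of weight dicts for earlier intervals. The ratio
    threshold is an assumed, illustrative selection rule.
    """
    emerging = []
    for term, weight in current.items():
        past = [interval.get(term, 0.0) for interval in history]
        baseline = sum(past) / len(past) if past else 0.0
        if weight > ratio * baseline:
            emerging.append(term)
    return sorted(emerging)


follows = {"a": ["c"], "b": ["c"], "c": []}
authority = pagerank(follows)
earlier = [term_weights([("a", "good morning"), ("b", "good morning")], authority)]
now = term_weights([("c", "earthquake downtown"), ("a", "good morning")], authority)
print(emerging_terms(earlier, now))
```

Weighting term usage by author rank means a term used by a few highly followed authors can count for more than the same term used by many low-ranked accounts, which is one way to separate emerging events from background chatter.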

The researchers also note that emerging topics tend to be regional at first and, if they grow larger, become global emerging topics. An easy example of this would be the Iran elections, which many young people were tweeting about. They also note that a term such as 'morning' can appear to be an emerging topic because many people use it in the morning; terms like this surface simply because they follow a natural, recurring pattern. For this reason, the researchers note that emerging topics should be supervised to help filter out topics or terms that are not actually emerging. They conclude that this method does indeed help discover emerging topics amid the noise of Twitter.

4. Bollen, Johan, Huina Mao, and Xiao-Jun Zeng. "Twitter mood predicts the stock market." Journal of Computational Science, 2011 n. pag. Web. 7 Feb 2011. http://arxiv.org/abs/1010.3003v1

5. Krisztian Balog, Gilad Mishne, and Maarten de Rijke. 2006. Why are they excited?: identifying and explaining spikes in blog mood levels. In Proceedings of the Eleventh Conference of the European Chapter of the Association for Computational Linguistics: Posters & Demonstrations (EACL '06). Association for Computational Linguistics, Stroudsburg, PA, USA, 207-210.