
Online Reputation Management
How we monitored Blizzard's Diablo III launch

May 18, 2012
Online Reputation Management can roughly be described as the process of collecting and analyzing online opinions. A plethora of sources are available for these opinions: blogs, personal homepages, fora, and in the case of our application, Twitter. With a team of students, I spent these past two months working on an application we dubbed Twitter-Sentiments. The source can be found on Google Code, along with a build, the report and posters. A short summary:

Methods
There are four distinct parts to the application:

  • Gather relevant tweets.

  • Analyze their sentiment with respect to a search query.

  • Determine their respective impacts.

  • Output the aggregated result in a meaningful manner.


Of course, each of these steps is fraught with its own particular challenges. This is how we encountered and solved them, from a developer's perspective:

Gathering relevant tweets
Twitter offers a search API which can find all tweets related to a given keyword. We limited the search (and therefore our application) to English tweets: since natural language processing is language-specific, there was no use in searching globally. To interface with the API, we used Twitter4J, a Java library capable of executing our search requests and returning complete Tweet objects. We extended these Tweet objects to LocalTweet objects, which offered additional functionality for us, such as storing a list of sentiment scores and a pagerank. It would also strip 4-byte UTF-8 characters from the text, since until quite recently MySQL was incapable of storing them.
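As a rough illustration of that last step, here is a minimal sketch (class and method names are mine, not the project's) that strips characters outside the Basic Multilingual Plane so the text fits in a pre-5.5.3 MySQL utf8 column:

    public final class TweetSanitizer {

        /** Replaces 4-byte UTF-8 characters (supplementary code points, e.g. many emoji). */
        public static String toBmpOnly(String text) {
            StringBuilder sb = new StringBuilder(text.length());
            int i = 0;
            while (i < text.length()) {
                int cp = text.codePointAt(i);
                if (Character.isSupplementaryCodePoint(cp)) {
                    sb.append('\uFFFD');          // Unicode replacement character
                } else {
                    sb.appendCodePoint(cp);
                }
                i += Character.charCount(cp);
            }
            return sb.toString();
        }
    }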

All tweets are stored in and retrieved from a local database as well. This increases speed, prevents hitting the rate limit (you can only make 350 Twitter API requests per hour) and allows us to go further back than eight days (Twitter's search retention limit).
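A cache-first lookup along these lines might look as follows; the table layout, column names and class name are hypothetical, not taken from the project:

    import java.sql.*;
    import java.util.ArrayList;
    import java.util.List;

    public class TweetCache {
        private final Connection conn;

        public TweetCache(Connection conn) {
            this.conn = conn;
        }

        /** Returns cached tweet texts for a query; the Twitter API is only hit on a miss. */
        public List<String> findCached(String query) throws SQLException {
            List<String> texts = new ArrayList<>();
            String sql = "SELECT text FROM tweets WHERE query = ? ORDER BY created_at DESC";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setString(1, query);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        texts.add(rs.getString("text"));
                    }
                }
            }
            return texts;
        }

        /** Stores a freshly fetched tweet so later searches stay under the rate limit. */
        public void store(long id, String query, String text, Timestamp createdAt) throws SQLException {
            // INSERT IGNORE (MySQL) silently skips tweets we already cached
            String sql = "INSERT IGNORE INTO tweets (id, query, text, created_at) VALUES (?, ?, ?, ?)";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setLong(1, id);
                ps.setString(2, query);
                ps.setString(3, text);
                ps.setTimestamp(4, createdAt);
                ps.executeUpdate();
            }
        }
    }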


Performing Sentiment Analysis
An accurate sentiment assessment is difficult to come by. Our main method was to simply count sentiment words in the tweet. Words like 'great', 'good', 'bad' and 'terrible' all carry a certain sentiment value: some are positive, some negative, all to varying degrees. Using a lexicon of almost 12,000 value-laden words, we are able to find some sentiment words in about half the tweets we encounter.
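In its simplest form this boils down to a lexicon lookup per token. A sketch, with a toy lexicon standing in for the real 12,000-word list:

    import java.util.*;

    public class LexiconScorer {
        private final Map<String, Double> lexicon;

        public LexiconScorer(Map<String, Double> lexicon) {
            this.lexicon = lexicon;
        }

        /** Sums the values of every sentiment word found among the tokens. */
        public double score(List<String> tokens) {
            double total = 0.0;
            for (String token : tokens) {
                Double value = lexicon.get(token.toLowerCase(Locale.ENGLISH));
                if (value != null) {
                    total += value;
                }
            }
            return total;
        }

        public static void main(String[] args) {
            Map<String, Double> lex = new HashMap<>();
            lex.put("great", 1.0);
            lex.put("good", 0.5);
            lex.put("bad", -0.5);
            lex.put("terrible", -1.0);
            LexiconScorer scorer = new LexiconScorer(lex);
            System.out.println(scorer.score(Arrays.asList("the", "servers", "are", "terrible")));
            // prints -1.0
        }
    }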

To further increase this number, we perform preprocessing steps: words are normalized, sentences are split (using the Lucene Tokenizer), and smilies are extracted. Next, we apply modifiers to these sentiment words: "not good" has the opposite meaning of "good", so "not" flips the value of the token that follows it. Similarly, words like "very" increase the value and words like "hardly" diminish it. A list of modifiers is therefore parsed as well.
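A sketch of this modifier step; the modifier words and multipliers below are illustrative, not the project's actual list:

    import java.util.*;

    public class ModifierScorer {
        private static final Map<String, Double> MODIFIERS = new HashMap<>();
        static {
            MODIFIERS.put("not", -1.0);     // flips the sign of the next sentiment word
            MODIFIERS.put("very", 1.5);     // amplifies it
            MODIFIERS.put("hardly", 0.5);   // diminishes it
        }

        /** Returns per-token sentiment values after applying any preceding modifier. */
        public static double[] apply(List<String> tokens, Map<String, Double> lexicon) {
            double[] values = new double[tokens.size()];
            double pendingModifier = 1.0;
            for (int i = 0; i < tokens.size(); i++) {
                String token = tokens.get(i).toLowerCase(Locale.ENGLISH);
                if (MODIFIERS.containsKey(token)) {
                    pendingModifier = MODIFIERS.get(token);   // applies to the token that follows
                    continue;
                }
                Double base = lexicon.get(token);
                if (base != null) {
                    values[i] = base * pendingModifier;
                }
                pendingModifier = 1.0;                        // a modifier only reaches one token
            }
            return values;
        }
    }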

Lastly, the distance from these sentiment-carrying phrases to our search target is calculated. If the first word in a tweet is the one we searched for, and the last word, two sentences away, is positive, it will probably not have much bearing on our search word. However, if the sentiment word directly precedes our search query (e.g. "I love Blizzard"), it is probably very strongly related to it. A normal distribution over word distance determines the effect of each sentiment word in the tweet on the search query.
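The weighting can be sketched as follows; the standard deviation is an assumed parameter and the class name is again mine:

    import java.util.List;

    public class DistanceWeightedScorer {
        private static final double SIGMA = 3.0;   // assumed spread, measured in words

        /** Gaussian weight for a sentiment word `distance` tokens away from the query term. */
        static double weight(int distance) {
            return Math.exp(-(distance * distance) / (2 * SIGMA * SIGMA));
        }

        /** Aggregates per-token sentiment values toward the position of the query term. */
        public static double scoreForQuery(List<String> tokens, double[] tokenValues, String queryTerm) {
            int queryPos = -1;
            for (int i = 0; i < tokens.size(); i++) {
                if (tokens.get(i).equalsIgnoreCase(queryTerm)) {
                    queryPos = i;
                    break;
                }
            }
            if (queryPos < 0) {
                return 0.0;                         // query term not in this tweet
            }
            double score = 0.0;
            for (int i = 0; i < tokenValues.length; i++) {
                if (tokenValues[i] != 0.0) {
                    score += tokenValues[i] * weight(Math.abs(i - queryPos));
                }
            }
            return score;
        }
    }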

A happy side effect of this is that we can assign different sentiment scores to different queries in the same tweet. Consider for example the sentence: "I love Blizzard, but I really hate Diablo". If "love" has a value of +1 and "hate" has a value of -1, then "Blizzard" will get a positive sentiment score, while "Diablo" will get a negative score - just as the sentence indicates.
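Running that example through the hypothetical scorer sketched above shows the effect; the per-token values are what the (illustrative) lexicon and modifier steps would produce:

    import java.util.Arrays;
    import java.util.List;

    public class PerQueryExample {
        public static void main(String[] args) {
            List<String> tokens = Arrays.asList(
                    "i", "love", "blizzard", "but", "i", "really", "hate", "diablo");
            // per-token sentiment values after the lexicon and modifier steps
            double[] values = { 0, +1, 0, 0, 0, 0, -1, 0 };
            // "love" sits right next to "blizzard", "hate" right next to "diablo", so:
            System.out.println(DistanceWeightedScorer.scoreForQuery(tokens, values, "blizzard")); // > 0
            System.out.println(DistanceWeightedScorer.scoreForQuery(tokens, values, "diablo"));   // < 0
        }
    }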


Determining a Tweet's impact
There are several ways to determine how much impact a tweet has: the numbers of followers, retweets and mentions all say something about how many people are reading it, and how much they are influenced by it. Unfortunately, obtaining this information requires several requests per tweet. With a 350-requests-per-hour limit, that presents us with a bit of a problem: we would hit the rate limit after a single search! To solve this, only the top ten most sentimental tweets are analyzed, and the rest are all assigned an average impact score.
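A sketch of that last part; the LocalTweet stand-in only carries the two fields this step needs, and its field names are assumptions rather than the project's actual class:

    import java.util.List;

    public class ImpactAssigner {

        /** Minimal stand-in for the project's LocalTweet. */
        public static class LocalTweet {
            double sentiment;   // aggregated sentiment score
            double impact;      // filled in by this step

            LocalTweet(double sentiment) {
                this.sentiment = sentiment;
            }
        }

        public static void assignImpacts(List<LocalTweet> tweets) {
            // strongest (most positive or most negative) sentiment first
            tweets.sort((a, b) -> Double.compare(Math.abs(b.sentiment), Math.abs(a.sentiment)));

            int top = Math.min(10, tweets.size());
            double sum = 0.0;
            for (int i = 0; i < top; i++) {
                double impact = lookupImpact(tweets.get(i));   // costs extra API requests
                tweets.get(i).impact = impact;
                sum += impact;
            }
            double average = top > 0 ? sum / top : 1.0;
            for (int i = top; i < tweets.size(); i++) {
                tweets.get(i).impact = average;                // no API calls for the rest
            }
        }

        private static double lookupImpact(LocalTweet tweet) {
            // placeholder: the real application combines follower, retweet and
            // mention counts obtained from the Twitter API
            return 1.0;
        }
    }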

Displaying the results
Once the tweets are analyzed, we show the user two graphs (generated through the Google Charts API). One displays the breakdown of tweets: how many are positive, how many are negative, and how many are neutral? This allows the user to see at a glance what the overall sentiment is. The second graph is more informative and shows the aggregated sentiment progression over time: it starts neutral, and each positive tweet increases the sentiment while each negative tweet decreases it.
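Both graphs are easy to produce once the per-tweet scores are in. A sketch, using the Google Image Charts URL format for the breakdown pie chart and a simple running sum for the timeline data; the parameter values are illustrative, and the project's actual chart setup may differ:

    public class ChartSketch {

        /** URL for a positive/neutral/negative pie chart via Google Image Charts. */
        public static String breakdownPieUrl(int positive, int neutral, int negative) {
            return "https://chart.googleapis.com/chart"
                    + "?cht=p"                                       // pie chart
                    + "&chs=400x200"                                 // width x height in pixels
                    + "&chds=a"                                      // auto-scale the data values
                    + "&chd=t:" + positive + "," + neutral + "," + negative
                    + "&chl=Positive|Neutral|Negative";
        }

        /** Running sum of per-tweet sentiment, ordered by time: the timeline series. */
        public static double[] cumulativeSentiment(double[] perTweetSentiment) {
            double[] timeline = new double[perTweetSentiment.length];
            double total = 0.0;                                      // starts neutral
            for (int i = 0; i < perTweetSentiment.length; i++) {
                total += perTweetSentiment[i];
                timeline[i] = total;
            }
            return timeline;
        }

        public static void main(String[] args) {
            System.out.println(breakdownPieUrl(120, 300, 80));
        }
    }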

An example of this timeline:

[Timeline chart: aggregated sentiment for the "Blizzard" query around the Diablo III launch]

This shows the search result for "Blizzard". We had some early data from testing about a week ago; as we can see, the results for this early period are neutral to positive. Then, with Diablo III's launch, sentiment scores start to drop: during the first few hours after launch, Diablo's servers crashed, went offline, and a number of bugs on top of that caused a lot of anger (culminating in this comic). Several hours later, these problems were fixed, and we can see a steady incline in sentiment scores again - cool, right?

Full report available here.

FragFrog out!
