Twitter Analytics: These words may be affecting your popularity

May 12, 2009

Text Mining techniques can identify specific words that are correlated with Twitter accounts having high or low popularity. This can be done in two ways: (1) by analyzing the text of each user’s Tweets, and (2) by analyzing the text of each user’s biography.

Let’s start with the results of the first type of analysis, which uses data from user Tweets. Pay attention only to the cells highlighted in red, their category column (LOWFOLLOWERS, HIGHFOLLOWERS) and the word at the beginning of each row. The results show which words appear to be important, but because the affinity shown here is only moderate, use them as possible clues only.

The results so far show us that:

hate, bed: correlated with low popularity
top, online, send, list, web, media, join: correlated with high popularity

Here is another portion of the results table:

The pattern should be evident by now: words that express a negative attitude appear to influence a user’s follower count negatively. As also shown above, foul language appears to work negatively as well. Several other insights were found, such as specific phrases that are correlated with low popularity (“watching TV”) and others (“stay tuned”) that are correlated with popular accounts. The number shown in parentheses quantifies the strength of each word’s association, which lets us order words by importance.
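To make the idea of an association score concrete, here is a minimal sketch in Python. The post does not say which measure produced the numbers in the table, so the smoothed log-odds ratio below is an assumption chosen for illustration, and the six sample Tweets are invented. The names `TWEETS`, `tokenize` and `log_odds` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy data: (tweet text, follower category).
# These examples are invented and are not the data behind the post.
TWEETS = [
    ("i hate mondays", "LOWFOLLOWERS"),
    ("stuck in bed all day", "LOWFOLLOWERS"),
    ("hate this boring class", "LOWFOLLOWERS"),
    ("join our online list", "HIGHFOLLOWERS"),
    ("top web media news", "HIGHFOLLOWERS"),
    ("send this to your list", "HIGHFOLLOWERS"),
]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def log_odds(tweets, target="HIGHFOLLOWERS"):
    """Smoothed log-odds ratio of each word: target category vs. the rest.

    Positive scores lean towards the target category, negative scores
    lean away from it. This measure is an assumption, not the post's method.
    """
    in_target, in_other = Counter(), Counter()
    n_target = n_other = 0
    for text, label in tweets:
        words = set(tokenize(text))  # count each word once per tweet
        if label == target:
            in_target.update(words)
            n_target += 1
        else:
            in_other.update(words)
            n_other += 1

    scores = {}
    for word in set(in_target) | set(in_other):
        # Add-0.5 smoothing keeps the ratio finite for words seen in one group only.
        p_target = (in_target[word] + 0.5) / (n_target + 1.0)
        p_other = (in_other[word] + 0.5) / (n_other + 1.0)
        scores[word] = (math.log(p_target / (1 - p_target))
                        - math.log(p_other / (1 - p_other)))
    return scores


scores = log_odds(TWEETS)
# Order words by the magnitude of their association, strongest first.
ranked = sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
for word, score in ranked[:5]:
    print(f"{word:10s} {score:+.2f}")
```

Sorting by absolute value mirrors the ordering of words by importance described above; with real data, a significance test or a minimum frequency threshold would be needed before trusting any single word.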

Some of the words (and their synonyms) that were found to be associated with very low follower counts are:

– Sleep, Hate, Damn, Feeling, Homework, Class, Boring, Stuck


A total of 63 words and 25 phrases were found to have either a positive or a negative association with follower count. Interestingly, phrases that communicate any kind of opportunity are also associated with a high number of followers, and “Thank you” is strongly associated with high popularity.

Here comes the interesting part: once the Text Mining analysis is completed, a predictive model can be built and used to score future Tweets. Let’s assume that you are about to send the following two Tweets:

1) ‘Today i feel like sleeping all day. Yawn…’
2) ‘@xyz Your website traffic can be increased with good marketing’

Before you post, however, you decide to feed these two sentences to a predictive model. For every Tweet, the model returns the predicted result (GOOD or BAD) and the associated probability. Here are the results for these two examples from an actual run:

In other words:

1) The first Tweet may have a negative effect, with a probability of 83.5%
2) The second Tweet may have a positive effect, with a probability of 99.9%
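For readers who want to see how such a score could be produced, here is a minimal sketch of a Naive Bayes text classifier in plain Python. This is an assumption about the kind of model involved: the post does not name the algorithm or tool behind the actual run. The training Tweets are invented, the names `TRAINING`, `train` and `predict` are hypothetical, and a toy model like this will not reproduce the probabilities quoted above.

```python
import math
import re
from collections import Counter

# Hypothetical hand-labeled training Tweets; not the data behind the post.
TRAINING = [
    ("i hate homework and this boring class", "BAD"),
    ("feel like sleeping all day", "BAD"),
    ("stuck in bed feeling sick", "BAD"),
    ("join our online marketing list", "GOOD"),
    ("thank you for the website traffic tips", "GOOD"),
    ("top media news stay tuned", "GOOD"),
]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """Count words per label and Tweets per label."""
    word_counts = {}
    doc_counts = Counter()
    vocab = set()
    for text, label in examples:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in tokenize(text):
            counts[word] += 1
            vocab.add(word)
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return (label, probability) for the most likely label."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # class prior
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log((word_counts[label][word] + 1)
                                  / (total_words + len(vocab)))
        log_scores[label] = score
    # Turn log scores into probabilities that sum to 1.
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())


model = train(TRAINING)
for tweet in [
    "Today i feel like sleeping all day. Yawn…",
    "@xyz Your website traffic can be increased with good marketing",
]:
    label, probability = predict(model, tweet)
    print(f"{label} ({probability:.1%}): {tweet}")
```

A real model would be trained on many thousands of Tweets and evaluated on held-out data before any of its probabilities could be taken seriously.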

Note that:

  • A predictive model is able to consider combinations of words, not just single words. This considerably raises the accuracy of any prediction.
  • In any real-world application of Text Mining, 100% prediction accuracy cannot be achieved. Although the figure is application-specific, 72-78% accuracy may be achieved with considerable effort. Of course, many more things matter for achieving high popularity, and the example above is given merely to discuss what techniques currently exist. A combination of analytical techniques is the best option, and this will be discussed in a future post.
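Both notes can be illustrated with a short Python sketch. One common way to let a model see word combinations is to add word pairs (bigrams) to its features, and accuracy is simply the share of predictions that match the true labels. Whether the model in the post used bigrams is not stated, so this is an assumption; `ngram_features` and `accuracy` are hypothetical helper names, and the labels in the usage lines are invented.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_features(text, max_n=2):
    """Return all word sequences of length 1..max_n found in the text."""
    tokens = tokenize(text)
    features = []
    for n in range(1, max_n + 1):
        for start in range(len(tokens) - n + 1):
            features.append(" ".join(tokens[start:start + n]))
    return features


def accuracy(predicted, actual):
    """Share of predictions that match the true labels."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


# Single words plus adjacent pairs, so "stay tuned" becomes one feature.
print(ngram_features("Stay tuned for more"))

# Invented labels, only to show how an accuracy figure is computed.
print(accuracy(["GOOD", "BAD", "GOOD", "BAD"], ["GOOD", "BAD", "BAD", "BAD"]))
```

With bigrams, a phrase such as “stay tuned” is scored as a unit rather than as two unrelated words, which is what allows phrases to carry their own association with popularity.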

Several other types of analysis can extract similarly interesting insights. Let’s not forget that Tweets contain the emotions, beliefs and values of users: what people want and what they don’t want. See “Clustering the thoughts of Twitter Users” and “Know your customers the Twitter way” for further discussion.

There will be more to say shortly about Text Mining and how PR agencies and marketing companies can put it to use, with practical examples.
