Twitter Analytics: Words that make a difference

April 28, 2009
Predictive analytics is already widely used on Twitter data to extract potentially interesting insights. In previous posts we discussed:

  • Sentiment Analysis and Ontologies
  • Analyzing the biographies of Twitter users and identifying clusters of similar users.
  • Cluster Analysis on the thoughts of Twitter users
  • Identifying the values and beliefs of Twitter users.

Another interesting insight is what makes a Twitter user attract many followers. Consider the following questions:

  • Are there words that could potentially decrease the popularity of a Twitter account?
  • How important is it to have an actual photo (and not the default o_O photo)?
  • Which interests or professions tend to be associated with many followers?
  • How important is it to have at least a short biography?

To answer these questions, data from 100,000 Twitter users was collected over the past few weeks. The information collected includes the number of followers, number of friends, total updates, number of retweets (per 20 tweets), number of replies to other users, number of links to external URLs, number of months that the user has been on Twitter, etc. Here is what the data looks like:

You will notice that the caret ‘^’ is used as the separator. The first portion of each line contains the user name, the date of account creation, the months elapsed since account creation, the number of friends, the number of retweets, etc.
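As a minimal sketch of how one such line could be read, the snippet below splits on the caret and picks out the leading fields named above. The field order, the `UserRecord` class, the `parse_line` helper and the sample line are all assumptions made for illustration; the actual files contain more columns and may be laid out differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UserRecord:
    # Hypothetical layout: only the fields the post names, in assumed order.
    user_name: str
    created: str            # account creation date, kept as raw text
    months_on_twitter: int
    friends: int
    retweets: int
    rest: List[str]         # any remaining columns, left unparsed


def parse_line(line: str, sep: str = "^") -> UserRecord:
    """Split one separator-delimited line into a UserRecord."""
    parts = line.rstrip("\n").split(sep)
    if len(parts) < 5:
        raise ValueError(f"expected at least 5 fields, got {len(parts)}")
    return UserRecord(
        user_name=parts[0],
        created=parts[1],
        months_on_twitter=int(parts[2]),
        friends=int(parts[3]),
        retweets=int(parts[4]),
        rest=parts[5:],
    )


# Made-up sample line, not taken from the collected data.
record = parse_line("example_user^2008-11-02^5^120^3^extra")
print(record.user_name, record.friends)
```

Keeping the unparsed tail in `rest` means the sketch does not have to guess at columns the post does not describe.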

The first analysis identified whether specific keywords in user biographies seem to be associated with a large number of followers. A second analysis used only numeric data (such as the number of retweets, number of user replies, number of updates, etc.). A third analysis uses both a vector of keywords and the numeric data. Since a lot of work is needed, the process (but not all of the results) will be presented over the next posts.
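One simple way to look at the first kind of question is to compare the median follower count of users whose biography contains a keyword with that of users whose biography does not. The sketch below does this with the standard library only; the `keyword_follower_gap` function and the toy data are hypothetical, and this is not necessarily the method used for the results in this series.

```python
import re
from statistics import median


def keyword_follower_gap(users, keyword):
    """Return (median followers with keyword, median followers without).

    `users` is an iterable of (biography, followers) pairs.
    Either value is None when its group is empty.
    """
    # Whole-word, case-insensitive match of the keyword in the biography.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    with_kw, without_kw = [], []
    for bio, followers in users:
        (with_kw if pattern.search(bio) else without_kw).append(followers)
    return (
        median(with_kw) if with_kw else None,
        median(without_kw) if without_kw else None,
    )


# Made-up toy data, for illustration only.
toy_users = [
    ("Photographer and traveller", 900),
    ("so bored at work", 40),
    ("Bored. Always bored.", 25),
    ("Marketing consultant", 1500),
]
print(keyword_follower_gap(toy_users, "bored"))
```

The median is used rather than the mean because follower counts are typically very skewed, so a handful of very popular accounts would otherwise dominate the comparison.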

FYI: Users who frequently use the words “boredom”, “boring” or “bored” tend to reduce their chances of being popular.
