Twitter Analytics : Which usage behavior attracts many followers?

May 5, 2009
This is the first part of a series of posts in which Data Mining and Text Mining are applied to extract potentially useful facts about how Twitter is used, and to draw conclusions such as what makes a Twitter account interesting to other users.

The conclusions presented here come from an analysis of 3651 Twitter accounts and are meant to show how Predictive Analytics can help. Please note that the results are shown for informational purposes only.

First, the data used can be summarized with the following table :

You can immediately see problems in the ranges of the data, especially in the number of “followers” and “following”. This is to be expected, since the users captured include Jack Dorsey (co-founder of Twitter), Sen. McCain and George Stephanopoulos, users who obviously have a huge number of followers.

Before finding which usage behavior attracts many followers, we need to define what exactly a “popular Twitter account” is. Is it just the absolute number of followers? It could be equally important, or at least interesting, to also look at :

1) The followers/following ratio

2) The number of followers per day
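
To make the two alternative measures concrete, here is a minimal Python sketch. The Account record and its field names are hypothetical and are not taken from the data set analysed in this post; the handling of accounts that follow nobody, or that are less than a day old, is also just one possible choice.

```python
from dataclasses import dataclass


@dataclass
class Account:
    """Hypothetical record; field names are illustrative only."""
    followers: int
    following: int
    elapsed_days: float  # days since the account was created


def follower_ratio(acc: Account) -> float:
    """Followers divided by following."""
    if acc.following == 0:
        # Assumption: an account that follows nobody gets an infinite ratio
        # if it has any followers, and 0.0 otherwise.
        return float("inf") if acc.followers > 0 else 0.0
    return acc.followers / acc.following


def followers_per_day(acc: Account) -> float:
    """Average number of followers gained per day of account age."""
    if acc.elapsed_days <= 0:
        # Assumption: treat a brand-new account as one day old.
        return float(acc.followers)
    return acc.followers / acc.elapsed_days


example = Account(followers=1000, following=250, elapsed_days=200)
print(follower_ratio(example), followers_per_day(example))
```

An account with few followers can still score well on either measure, which is why the choice of criterion matters.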

For our example the absolute number of followers was used as the only criterion of a successful Twitter account. The results can be summarized with the following decision tree :

Some usage patterns that raise the chance of having a successful Twitter account are the following :

  • Having a bio is an absolute must : 82.3% of unsuccessful Twitter accounts have no biography information.

  • Provide more than 3 links per 20 tweets and post more than 0.960 updates per day.

  • If you don’t want to provide more than 3 links per 20 tweets, then try to post more than 5.857 updates per day.

  • Users who post more than 3 links per 20 tweets but no more than 0.960 updates per day need more than 222.5 days of usage to get an adequate number of followers.
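
The three threshold-based patterns above can be written down as a small Python function. This is only a sketch of the listed patterns, not a reproduction of the full decision tree: the function and parameter names are made up for illustration, the bio statistic is left out because the text does not say where it sits in the tree, and a False result only means "none of the listed patterns applies".

```python
def matches_listed_pattern(links_per_20_tweets: float,
                           updates_per_day: float,
                           elapsed_days: float) -> bool:
    """Return True if the account fits one of the patterns listed in the post.

    Thresholds are copied from the bullet points; everything else
    (names, structure) is an assumption made for this sketch.
    """
    if links_per_20_tweets > 3:
        if updates_per_day > 0.960:
            # Many links and a reasonable posting rate.
            return True
        # Many links but few updates: only long-lived accounts qualify.
        return elapsed_days > 222.5
    # Few links: a high posting rate is needed instead.
    return updates_per_day > 5.857


print(matches_listed_pattern(4, 1.0, 10))    # many links, enough updates
print(matches_listed_pattern(4, 0.5, 300))   # many links, few updates, old account
print(matches_listed_pattern(2, 3.0, 1000))  # few links, moderate updates
```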

By using Feature Selection we can also look at the relative importance of each parameter in achieving many followers. Here are the results of Feature Selection using the ChiSquare, GainRatio and InfoGain attribute evaluators, in that order :

=== Attribute selection 10 fold cross-validation (stratified), seed: 1 ===

average merit         average rank    attribute
362.743 +- 10.419     1   +- 0        4 numberOfLinks
319.397 +- 10.133     2.4 +- 0.49     6 hasBlankProfile?
311.661 +- 8.612      2.6 +- 0.49     7 updatesPerDay
192.525 +- 7.481      4.1 +- 0.3      3 retweetsNumber
178.236 +- 5.963      4.9 +- 0.3      1 elapsedDays
36.148  +- 3.579      6   +- 0        2 otherUsersTalk
17.843  +- 4.475      7   +- 0        5 questionsAsked

average merit         average rank    attribute
0.1   +- 0.003        1   +- 0        6 hasBlankProfile?
0.042 +- 0.001        2.4 +- 0.49     4 numberOfLinks
0.039 +- 0.002        3.2 +- 0.6      3 retweetsNumber
0.04  +- 0.004        3.4 +- 0.92     7 updatesPerDay
0.025 +- 0.001        5   +- 0        1 elapsedDays
0.011 +- 0.001        6   +- 0        2 otherUsersTalk
0.005 +- 0.001        7   +- 0        5 questionsAsked

average merit         average rank    attribute
0.082 +- 0.002        1   +- 0        4 numberOfLinks
0.074 +- 0.003        2.1 +- 0.3      6 hasBlankProfile?
0.071 +- 0.002        2.9 +- 0.3      7 updatesPerDay
0.044 +- 0.002        4.1 +- 0.3      3 retweetsNumber
0.041 +- 0.001        4.9 +- 0.3      1 elapsedDays
0.008 +- 0.001        6   +- 0        2 otherUsersTalk
0.004 +- 0.001        7   +- 0        5 questionsAsked

We see that all three attribute evaluators agree that the number of links provided in tweets and whether the user's profile is filled in are the two most important parameters in achieving many followers. Notice also that sending messages to other users (otherUsersTalk) and asking questions (questionsAsked) are not as important as one would expect.

The analysis above gives many insights, but it does not take into account what users say and how this affects the popularity of a Twitter account. Text Mining will try to answer this question and also identify which keywords in Twitter profiles seem to be associated with many followers.
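
For readers who want to see what these three scores measure, here is a rough Python sketch of the textbook definitions of information gain, gain ratio and the chi-square statistic for a discrete attribute. It is not the tool that produced the output above: it works only on discrete values (a numeric attribute such as updatesPerDay would first have to be discretized), it does not reproduce the 10 fold cross-validation averaging, and the four-account toy data is invented purely for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def info_gain(attribute, labels):
    """Reduction in label entropy after splitting on the attribute."""
    total = len(labels)
    remainder = 0.0
    for value in set(attribute):
        subset = [lab for a, lab in zip(attribute, labels) if a == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder


def gain_ratio(attribute, labels):
    """Information gain normalised by the entropy of the attribute itself."""
    split_info = entropy(attribute)
    return info_gain(attribute, labels) / split_info if split_info > 0 else 0.0


def chi_square(attribute, labels):
    """Pearson chi-square statistic of the attribute-by-label table."""
    total = len(labels)
    observed = Counter(zip(attribute, labels))
    stat = 0.0
    for a, n_a in Counter(attribute).items():
        for lab, n_lab in Counter(labels).items():
            expected = n_a * n_lab / total
            stat += (observed[(a, lab)] - expected) ** 2 / expected
    return stat


# Invented toy data: four accounts, blank profile vs. many followers.
has_blank_profile = [True, True, False, False]
many_followers = [False, False, True, True]
print(info_gain(has_blank_profile, many_followers),
      gain_ratio(has_blank_profile, many_followers),
      chi_square(has_blank_profile, many_followers))
```

In this toy case the attribute separates the two classes perfectly, so all three scores reach their maximum for four accounts; an attribute unrelated to the label would score zero on each.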
