Twitter Analytics: Cluster Analysis reveals similar Twitter Users

May 26, 2009
So far we have seen various examples of using analytics to gain insights from Twitter. Cluster analysis is a personal favorite: it lets us identify common groups of users, and in this post we look at a segmentation based on user biography keywords. This analysis was presented in an older post, but some readers asked me to elaborate on it.
Biography information allows us to segment Twitter users into groups of similar interests, professions and qualities. More interesting, however, is that we can identify the words each segment appears to be associated with. Let’s see an example of words that tend to co-occur with the phrase “social media” in the biographies of Twitter users:

Looking at the column named “social_media”, we see associated keywords such as addiction (standing in for addicted, junkie, etc.), evangelist, enthusiast and analytics.
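
The co-occurrence idea can be illustrated with a minimal sketch. It is not the tool used for the analysis above; the sample biographies, the stopword list and the function name `cooccurring_words` are all assumptions made up for illustration. It simply counts which words show up in biographies that contain a given phrase.

```python
from collections import Counter
import re

# Hypothetical sample biographies, invented for illustration only.
SAMPLE_BIOS = [
    "Social media evangelist and analytics enthusiast",
    "Linux developer, gaming and photography",
    "Social media junkie, marketing enthusiast",
    "Married, father of two boys",
    "Entrepreneur and founder, social media enthusiast",
]

# A tiny, assumed stopword list; a real analysis would use a fuller one.
STOPWORDS = {"and", "of", "the", "a", "two"}


def cooccurring_words(bios, phrase):
    """Count words appearing in biographies that contain `phrase`."""
    phrase = phrase.lower()
    phrase_words = set(phrase.split())
    counts = Counter()
    for bio in bios:
        text = bio.lower()
        if phrase not in text:
            continue  # only biographies mentioning the phrase are counted
        words = set(re.findall(r"[a-z]+", text))
        # Count each word once per biography, excluding the phrase itself.
        counts.update(words - phrase_words - STOPWORDS)
    return counts


top = cooccurring_words(SAMPLE_BIOS, "social media")
print(top.most_common(3))
```

With this toy data, "enthusiast" is the most frequent companion of "social media". A real run would also need stemming or synonym grouping, for example to merge "addicted" and "junkie" under one keyword.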

Other groups found and their associated words were:

The Geeks: developer, Linux, Mac, gaming, photography
The Parents: married, boys, girls, Christian, conservative
The Business Owners: CEO, entrepreneur, marketing, founder, lifestyle

Note that “The Geeks” have Mac as an associated keyword, which of course refers to the Apple Macintosh. This is an example suggesting a possibly strong bond between a brand and a specific customer segment.
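
To show how biography keywords can drive a segmentation, here is a minimal sketch. It is an assumption-laden stand-in, not the clustering method used for the groups above: it links users whose keyword sets have a Jaccard similarity above a threshold and treats each linked group as a cluster (a simple single-linkage scheme). The users, biographies, the 0.3 threshold and all function names are hypothetical.

```python
import re

# Hypothetical users and biographies, invented for illustration only.
USER_BIOS = {
    "user_a": "Linux developer, gaming, photography",
    "user_b": "Mac developer, photography",
    "user_c": "CEO, founder, entrepreneur",
    "user_d": "Entrepreneur, marketing, founder",
    "user_e": "Married, Christian, conservative",
}


def keywords(bio):
    """Lowercased set of words found in a biography."""
    return set(re.findall(r"[a-z]+", bio.lower()))


def jaccard(a, b):
    """Share of keywords two sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(bios, threshold=0.3):
    """Group users linked by keyword similarity >= threshold."""
    kw = {user: keywords(bio) for user, bio in bios.items()}
    clusters = []  # each cluster is a sorted list of user names
    for user in kw:
        # Existing clusters that contain at least one similar user.
        linked = [
            c for c in clusters
            if any(jaccard(kw[user], kw[other]) >= threshold for other in c)
        ]
        merged = [user]
        for c in linked:
            merged.extend(c)
            clusters.remove(c)
        clusters.append(sorted(merged))
    return sorted(clusters)


segments = cluster(USER_BIOS)
print(segments)
```

On this toy data the two developers end up together, the two founders end up together, and `user_e` stays alone. Single-member groups like that one are a crude way to spot individuals who do not fit any segment.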

Now imagine running a similar analysis for other segments such as Single Dads and Mothers, Teenage Girls, Nice Guys, IT Developers, VIPs or any other “segment” you prefer (see this entry, posted Jan. 2009, for more).

On a personal note: having used text mining on Twitter over the past 6 months, I have realized that each new cycle of analysis mostly turns up things I already know. But apart from the expected results, some of the fine details of people’s lives also appear, such as the implications of a life-changing event, the joy of owning something new or the plain fact of “watching TV and feeling bored”. Many of the insights found during these months, although not discussed here on purpose, are highly thought-provoking.

Perhaps Twitter Analytics could also give us some possible clues on:

  • Whether a specific profession could be a risk factor for being single.
  • How important fashion is for girls.
  • How mobile phone user requirements change according to the “segment” they belong to.
  • Finding individuals that do not fit any “segment”.

But the list of potential applications does not end here: using a technique called Association Rule Learning (or Association Discovery), we can extract emotions or thoughts that appear to co-occur, as well as emotions that seem to be associated with specific events. Classification analysis can also play an important part (more on these techniques soon).
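
As a rough illustration of association rule learning, the sketch below scores simple "A implies B" rules by support (how often A and B appear together) and confidence (how often B appears when A does). Everything in it is an assumption for illustration: the tagged tweets are invented, the thresholds are arbitrary, and it brute-forces item pairs rather than using a production algorithm such as Apriori or FP-growth.

```python
from itertools import permutations

# Hypothetical tweets reduced to sets of event/emotion tags (invented data).
TRANSACTIONS = [
    {"monday", "tired", "coffee"},
    {"monday", "tired"},
    {"friday", "happy"},
    {"monday", "coffee"},
    {"friday", "happy", "party"},
]


def association_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) tuples."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        both = sum(1 for t in transactions if a in t and b in t)
        has_a = sum(1 for t in transactions if a in t)
        support = both / n          # share of tweets containing a and b
        confidence = both / has_a   # share of a-tweets that also contain b
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


rules = association_rules(TRANSACTIONS)
for a, b, support, confidence in rules:
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

On this toy data, "tired" always comes with "monday", so that rule gets a confidence of 1.0. Such a rule only says the two tend to appear together; it says nothing about one causing the other.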

Each technique looks at the world of social media data from a different perspective. Usage behavior, cluster membership, emotions and thoughts, and also the tweets that users seem to prefer most (using data from sites such as repeets.com) may be combined. What we can potentially achieve from a combined analysis of this kind will be discussed in later posts.

As already stated in previous posts, the methods described so far enable us to form hypotheses, but in no way is it assumed that the associations found are the definite cause of a specific event.
