Know your customers - The Twitter way

The more I analyze tweets on Twitter, the more interesting I find the whole process. First it was a clustering analysis of specific thoughts expressed by Twitter users, and then it was sentiment mining for Amazon’s Kindle. It was only a matter of time before I felt the urge to analyze tweets from a broader perspective.

So I decided to perform a segmentation of Twitter users: extract common groups of users, this time not around specific thoughts or specific products but on a more generic basis.

I had two goals in this clustering analysis :

1) Cluster the biographies of users
2) Cluster the tweets of the users.

I then decided that the more information I could collect the better, so the first thing I did was write a ‘spider’ program to extract 10,000 Twitter user names. For each Twitter user, the software then visits his/her page and extracts:

a) The user’s bio
b) Number of followers
c) Number of people following
d) Number of updates
e) 20 latest Tweets
f) Number of re-tweets
g) Number of replies to other users (e.g. when the @user directive exists)
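
To make the list above concrete, here is a minimal sketch of how one collected user could be represented. The spider itself is not shown in this post, so the class name, the field names and the counting rules (a tweet starting with "RT @" counts as a re-tweet, a tweet starting with "@" counts as a reply) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TwitterUser:
    """One record per user; every field name here is hypothetical."""
    username: str
    bio: str
    followers: int
    following: int
    updates: int
    latest_tweets: list = field(default_factory=list)  # up to 20 tweets

    def retweet_count(self):
        # Assumption: re-tweets follow the "RT @user ..." convention.
        return sum(1 for t in self.latest_tweets if t.startswith("RT @"))

    def reply_count(self):
        # Assumption: a reply is a tweet that opens with an @user directive.
        return sum(1 for t in self.latest_tweets if t.startswith("@"))
```

Counting both figures from the 20 latest tweets keeps them comparable across users, whatever the total number of updates.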

Let’s see now what we could potentially do with such information:

1) Clustering analysis on user bios

2) Clustering analysis on user tweets

3) Classification analysis for identifying the common characteristics of users with many followers

4) Associations discovery between products : Which products tend to be mentioned together in each user’s tweets?

5) Identification of common keywords per cluster : If we identify a cluster of users that we characterize as the “Parents”, what keywords do “Parents” tend to use more? What about the “Tech junkies” cluster?
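
As an illustration of idea 4, the sketch below counts how often two products are mentioned by the same user. It is only a starting point under stated assumptions: the product list is supplied by hand, matching is a plain case-insensitive substring test, and the function name and input layout are hypothetical rather than part of the analysis described here.

```python
from collections import Counter
from itertools import combinations


def product_pairs(tweets_by_user, products):
    """Count, per pair of products, how many users mention both."""
    pair_counts = Counter()
    for tweets in tweets_by_user.values():
        text = " ".join(tweets).lower()
        # Naive substring match; a real analysis would need proper matching.
        mentioned = sorted(p for p in products if p.lower() in text)
        pair_counts.update(combinations(mentioned, 2))
    return pair_counts
```

Pairs with a high count relative to how often each product appears on its own are the candidates worth a closer look.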

But let’s start with the first analysis : Clustering the biographies of Twitterers. The analysis generated 30 clusters of users. Some of them are :

1) The Parents
2) The computer Geeks
3) The students
4) The social media addicts
5) The entrepreneurs
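
The post does not say which clustering algorithm produced these groups, so the following is only a sketch of one common way to cluster short texts such as bios: TF-IDF weighted word vectors grouped by k-means with cosine similarity. All function names are hypothetical, and the deterministic "farthest first" choice of starting centroids is an assumption made to keep the example reproducible.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(docs):
    """Turn each text into a unit-length {word: weight} dictionary."""
    token_lists = [tokenize(d) for d in docs]
    df = Counter(t for toks in token_lists for t in set(toks))
    n = len(docs)
    vectors = []
    for toks in token_lists:
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1)
               for t, c in Counter(toks).items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    # Both inputs are unit length, so the dot product is the cosine.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def mean_vector(vectors):
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
    return {t: w / norm for t, w in total.items()}


def kmeans(vectors, k, iterations=20):
    """Return one cluster label per vector."""
    # Start from the first text, then add the text least similar
    # to the centroids chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            min(vectors, key=lambda v: max(cosine(v, c) for c in centroids)))
    labels = []
    for _ in range(iterations):
        new_labels = [max(range(k), key=lambda j: cosine(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = mean_vector(members)
    return labels
```

With the bios in a list, `kmeans(tfidf_vectors(bios), 30)` would return a cluster number for each user; names such as “Parents” still have to be assigned by reading the members of each cluster.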

I looked at the “Parents” cluster more closely to find the keywords this cluster is associated with: “single” and “Jesus” were some of them.

So we immediately identify one of the many customer groups: the parents, a significant percentage of whom are single. The “Parents” cluster also expresses one of its values: Christianity.

By moving on to each generated cluster and finding its associated keywords, I was able to retrieve the values and beliefs of each cluster. Knowledge extraction at its best…
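
One simple way to find the keywords associated with a cluster is to compare how often a word appears in that cluster's bios with how often it appears in all bios. The sketch below does this with a "lift" ratio; the scoring rule, the function names and the `min_count` threshold are assumptions for illustration, not necessarily the method used for the results above.

```python
import re
from collections import Counter


def tokens(text):
    # Each word counts once per bio.
    return set(re.findall(r"[a-z']+", text.lower()))


def cluster_keywords(bios, labels, cluster, top_n=5, min_count=2):
    """Words over-represented in one cluster compared with all bios."""
    overall = Counter()
    inside = Counter()
    n_inside = 0
    for bio, label in zip(bios, labels):
        toks = tokens(bio)
        overall.update(toks)
        if label == cluster:
            inside.update(toks)
            n_inside += 1
    n_all = len(bios)
    # lift = share of cluster bios with the word / share of all bios with it
    lift = {t: (c / n_inside) / (overall[t] / n_all)
            for t, c in inside.items() if c >= min_count}
    return sorted(lift, key=lambda t: (-lift[t], t))[:top_n]
```

A lift well above 1 means the word is much more typical of the cluster than of Twitter users in general, which is the kind of signal that keywords such as “single” give for the “Parents” cluster.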
