Know your customers – The Twitter way

February 17, 2009
The more I analyze tweets on Twitter, the more interesting I find the whole process. First it was a clustering analysis of specific thoughts expressed by Twitter users, and then it was sentiment mining for Amazon’s Kindle. It was only a matter of time before I felt the urge to analyze tweets from a broader perspective.

So I decided to segment Twitter users: to extract common groups of users, but this time not around specific thoughts or specific products. Instead, the segmentation would be on a more generic basis.

I had two goals in this clustering analysis:

1) Cluster the biographies of users
2) Cluster the tweets of the users.

I then decided that the more information I could collect, the better, so the first thing I did was to write a ‘spider’ program to extract 10,000 Twitter user names. Then, for each Twitter user, the software visits his/her page and extracts:

a) The user’s bio
b) Number of followers
c) Number of people following
d) Number of updates
e) 20 latest Tweets
f) Number of re-tweets
g) Number of replies to other users (e.g. when an @user directive exists)
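The fields above map naturally onto a per-user record. The post doesn't show the spider's code, so what follows is a minimal Python sketch under stated assumptions: the `TwitterUser` class and its field names are hypothetical, a retweet is assumed to be a tweet that starts with the informal “RT @user” convention, and a reply is any other tweet containing an @user directive.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class TwitterUser:
    """Hypothetical record for one crawled user; field names are illustrative."""
    name: str
    bio: str
    followers: int
    following: int
    updates: int
    tweets: List[str] = field(default_factory=list)  # up to the 20 latest tweets


# Assumption: a retweet starts with the informal "RT @user" convention.
RETWEET_RE = re.compile(r"^\s*RT\s+@\w+", re.IGNORECASE)
# An "@user" directive anywhere in the text.
MENTION_RE = re.compile(r"@\w+")


def count_retweets(user: TwitterUser) -> int:
    """Number of tweets that look like retweets."""
    return sum(1 for t in user.tweets if RETWEET_RE.match(t))


def count_replies(user: TwitterUser) -> int:
    """Number of non-retweet tweets containing at least one @user directive."""
    return sum(
        1 for t in user.tweets
        if not RETWEET_RE.match(t) and MENTION_RE.search(t)
    )


if __name__ == "__main__":
    u = TwitterUser("alice", "Mom of two. Coffee lover.", 120, 80, 950,
                    ["RT @bob: great post", "@carol thanks!", "Off to work"])
    print(count_retweets(u), count_replies(u))  # prints: 1 1
```

Counting every @mention as a reply follows the post’s wording; a stricter rule would only count tweets that begin with @user.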

Let’s now see what we could potentially do with such information:

1) Clustering analysis on user bios

2) Clustering analysis on user tweets

3) Classification analysis for identifying the common characteristics of users with many followers

4) Association discovery between products: which products tend to be mentioned together in each user’s tweets?

5) Identification of common keywords per cluster: if we identify a cluster of users that we characterize as the “Parents”, what keywords do “Parents” tend to use more often? What about the “Tech junkies” cluster?
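Item 4 is a classic association-discovery problem. A minimal sketch, assuming each product is a single lowercase word on a hypothetical watch list: record which products each user mentions anywhere in their tweets, then score every product pair by support (share of users mentioning both) and confidence (share of users mentioning A who also mention B). This is a brute-force count over 2-item sets rather than a full Apriori implementation, and the function names and the `min_support` threshold are illustrative.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Set, Tuple

Rule = Tuple[str, str, float, float]  # (antecedent, consequent, support, confidence)


def products_mentioned(tweets: Iterable[str], products: Set[str]) -> Set[str]:
    """Watch-list products (assumed lowercase) that appear as a word in any tweet."""
    words = {w.strip(".,!?:;\"'()").lower() for t in tweets for w in t.split()}
    return products & words


def association_rules(
    users_tweets: List[List[str]],
    products: Set[str],
    min_support: float = 0.1,  # placeholder threshold
) -> List[Rule]:
    """Pairwise rules A -> B with support and confidence, highest confidence first."""
    item_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for tweets in users_tweets:
        found = sorted(products_mentioned(tweets, products))
        item_counts.update(found)
        pair_counts.update(combinations(found, 2))
    n = len(users_tweets)
    rules: List[Rule] = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


if __name__ == "__main__":
    users = [  # made-up tweets, one list per user
        ["Loving my new Kindle", "reading on the iPhone too"],
        ["kindle + iphone = happy"],
        ["Just bought an iPhone"],
    ]
    for rule in association_rules(users, {"kindle", "iphone", "zune"}):
        print(rule)
```

On a real crawl, `min_support` is what keeps rare, coincidental pairs out of the result; the 0.1 default is an arbitrary placeholder.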

But let’s start with the first analysis: clustering the biographies of Twitterers. The analysis generated 30 clusters of users, some of which are:

1) The Parents
2) The computer geeks
3) The students
4) The social media addicts
5) The entrepreneurs
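The post doesn't say which clustering algorithm produced the 30 clusters. One common choice for short texts such as bios is k-means on TF-IDF vectors; the sketch below uses a cosine-similarity (“spherical”) variant written with the standard library only. The function names, the smoothed IDF formula, and seeding the centroids with the first k bios are simplifications of this sketch, not details of the original analysis.

```python
import math
import re
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]  # sparse vector: term -> weight


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z']+", text.lower())


def tfidf(docs: List[str]) -> List[Vector]:
    """L2-normalised TF-IDF vectors with a smoothed IDF: log((1+n)/(1+df)) + 1."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def dot(a: Vector, b: Vector) -> float:
    """Cosine similarity, since all vectors here have unit length."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def kmeans(vectors: List[Vector], k: int, iterations: int = 20) -> List[int]:
    """Spherical k-means: assign by cosine similarity, then re-normalise centroids."""
    if not vectors:
        return []
    k = min(k, len(vectors))
    centroids = [dict(v) for v in vectors[:k]]  # simplification: first k docs as seeds
    labels: List[int] = []
    for _ in range(iterations):
        new_labels = [max(range(k), key=lambda c: dot(v, centroids[c])) for v in vectors]
        for c in range(k):
            summed: Vector = {}
            for v, lab in zip(vectors, new_labels):
                if lab == c:
                    for t, w in v.items():
                        summed[t] = summed.get(t, 0.0) + w
            if summed:  # no terms assigned: keep the previous centroid
                norm = math.sqrt(sum(w * w for w in summed.values())) or 1.0
                centroids[c] = {t: w / norm for t, w in summed.items()}
        if new_labels == labels:
            break
        labels = new_labels
    return labels


if __name__ == "__main__":
    bios = [  # made-up bios, for illustration only
        "Proud mom of two kids",
        "Geek who loves Linux and code",
        "Dad of three kids and proud",
        "Linux code geek and gamer",
    ]
    print(kmeans(tfidf(bios), k=2))
```

With 10,000 bios the call would be `kmeans(tfidf(bios), 30)`; a real run would also want stop-word removal and several random initialisations, since k-means only finds a local optimum.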

I looked at the “Parents” cluster more closely, wanting to find the keywords this cluster is associated with: “Single” and “Jesus” were among them.

So we immediately identify one of the many customer groups: the parents, a significant percentage of whom are single. The “Parents” cluster also expresses one of its values: Christianity.

By going through each generated cluster and finding its associated keywords, I was able to retrieve the values and beliefs of each cluster. Knowledge extraction at its best…
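Finding keywords such as “Single” and “Jesus” for a cluster can be framed as lift: how much more often a term appears in the cluster’s bios than in the collection as a whole. The sketch below is one plausible way to do it, not necessarily the method used here; `cluster_keywords`, the `min_share` cut-off, and the ranking rule are hypothetical, and `labels` is assumed to come from a clustering step such as k-means.

```python
import re
from collections import Counter
from typing import Dict, List, Set, Tuple


def doc_terms(text: str) -> Set[str]:
    """Distinct lowercase words in a bio (no stemming or stop-word removal)."""
    return set(re.findall(r"[a-z']+", text.lower()))


def cluster_keywords(
    bios: List[str],
    labels: List[int],
    top_n: int = 5,
    min_share: float = 0.1,  # hypothetical cut-off: ignore terms in <10% of a cluster
) -> Dict[int, List[Tuple[str, float, float]]]:
    """Per cluster, (term, share_in_cluster, lift) sorted by lift, then share."""
    terms = [doc_terms(b) for b in bios]
    n = len(bios)
    overall = Counter(t for ts in terms for t in ts)  # bios containing each term
    result: Dict[int, List[Tuple[str, float, float]]] = {}
    for c in sorted(set(labels)):
        members = [ts for ts, lab in zip(terms, labels) if lab == c]
        in_cluster = Counter(t for ts in members for t in ts)
        scored = []
        for t, cnt in in_cluster.items():
            share = cnt / len(members)
            if share < min_share:
                continue
            # lift = (cnt / cluster size) / (overall[t] / n), computed from integers
            lift = (cnt * n) / (len(members) * overall[t])
            scored.append((t, share, lift))
        scored.sort(key=lambda x: (-x[2], -x[1], x[0]))
        result[c] = scored[:top_n]
    return result


if __name__ == "__main__":
    bios = [  # made-up bios, for illustration only
        "Single mom, loves Jesus",
        "Single dad of two",
        "Mom of three, Jesus follower",
        "Linux geek",
        "Python and Linux hacker",
    ]
    labels = [0, 0, 0, 1, 1]
    for cluster, words in cluster_keywords(bios, labels).items():
        print(cluster, [w for w, _, _ in words])
```

On this toy input every term is exclusive to its cluster, so the lifts tie and the share decides the order; on real data, stop words such as “of” would need filtering before the ranking is meaningful.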
