© 2008-23 SmartData Collective. All Rights Reserved.

Key Words Through Graph Entropy Hierarchical Clustering

cristian mesiano
Last updated: 2012/10/24 at 6:49 PM

In the last post I showed how to extract key words from a text through a principle called graph entropy.
Today I’m going to show another application of graph entropy: extracting clusters of key words.

Why
The key words of a document depict its main topic, but a long document often contains many different subtopics related to the main one.

From this perspective, clusters of key words should make it easier for the reader to identify the key points of a document.

Moreover, imagine implementing a search engine based on clusters of relevant words instead of the usual indexing of atomic words: it would enable document comparison, taxonomy definition, and much more!

How
The definition of graph entropy I’m working on assigns to each word of the document a relevance score and a subgraph of words topologically close to it.
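The post does not spell out the graph entropy formula, so what follows is only a minimal sketch under explicit assumptions: words are linked in a co-occurrence graph (two words share an edge when they appear within a small window of each other), the relevance score is taken to be the Shannon entropy of a word’s normalized edge weights, and the subgraph “topologically close” to a word is its neighborhood within a given number of hops. The window size, the entropy-based score, and the helper names (`cooccurrence_graph`, `relevance`, `neighborhood`) are hypothetical stand-ins, not the author’s definition.

```python
import math
from collections import defaultdict


def cooccurrence_graph(tokens, window=2):
    """Build an undirected weighted graph: the weight of edge (u, v) counts how
    often two distinct words occur within `window` positions of each other.
    (Hypothetical construction; the post does not define its graph.)"""
    graph = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            other = tokens[j]
            if other != word:
                graph[word][other] += 1
                graph[other][word] += 1
    return graph


def relevance(graph, word):
    """Hypothetical relevance score: Shannon entropy (in nats) of the word's
    normalized edge weights. Words absent from the graph score 0."""
    weights = list(graph.get(word, {}).values())
    total = sum(weights)
    if total == 0:
        return 0.0
    return -sum((w / total) * math.log(w / total) for w in weights)


def neighborhood(graph, word, radius=1):
    """Hypothetical 'close' subgraph: every word within `radius` hops of `word`."""
    seen = {word}
    frontier = {word}
    for _ in range(radius):
        # expand one hop, keeping only words not reached before
        frontier = {v for u in frontier for v in graph.get(u, {})} - seen
        seen |= frontier
    return seen


g = cooccurrence_graph("a b c a b".split(), window=1)
print(relevance(g, "c"))  # ln 2
print(sorted(neighborhood(g, "a")))
```

On the toy token list `a b c a b` with `window=1`, the word `c` has two equally weighted neighbors, so its score is ln 2; a word with many evenly spread neighbors scores higher than one tied to a single partner.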

The clustering should maximize the relevance score obtained by merging two words into the same cluster.

It’s easy to see that this is a combinatorial maximization problem.
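To give a feel for how quickly the search space grows: the number of ways to partition n words into any number of non-empty clusters is the Bell number B_n. The short sketch below is a standard Bell-triangle computation, not something from the original post; already B_10 = 115975, so exhaustive search over partitions of the 100 words used in the experiment is out of the question.

```python
def bell_numbers(n):
    """Return [B_0, B_1, ..., B_n], where B_k counts the partitions of k items,
    computed row by row with the Bell triangle."""
    bells = [1]
    row = [1]
    for _ in range(n):
        # each new row starts with the last entry of the previous row
        new_row = [row[-1]]
        for x in row:
            new_row.append(new_row[-1] + x)
        row = new_row
        bells.append(row[0])
    return bells


print(bell_numbers(10))  # [1, 1, 2, 5, 15, 52, 203, 877, 4140, 21147, 115975]
print(len(str(bell_numbers(100)[100])), "digits in B_100")
```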

The idea is to use simulated annealing (slightly revisited and adapted to the purpose) to find a sub-optimal merging solution at each step of the merging phase of the hierarchical clustering.
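Below is a minimal sketch of the idea, assuming that each merge step of an agglomerative (bottom-up) clustering searches over pairs of current clusters with simulated annealing and then merges the best pair it has seen. The merge gain (`linkage`, here the average pairwise similarity between two clusters), the caller-supplied similarity function `sim`, the neighbor move, and the parameters `steps`, `t0`, and `alpha` are all hypothetical choices for illustration; the post does not specify how its revisited annealing works.

```python
import math
import random


def linkage(a, b, sim):
    """Hypothetical merge gain: average pairwise similarity between clusters a and b."""
    return sum(sim(x, y) for x in a for y in b) / (len(a) * len(b))


def anneal_pair(clusters, sim, steps=200, t0=1.0, alpha=0.97, rng=random):
    """Search for a high-gain pair of clusters to merge with simulated annealing.
    Returns the indices of the best pair visited (sub-optimal in general)."""
    n = len(clusters)
    i, j = rng.sample(range(n), 2)
    current = linkage(clusters[i], clusters[j], sim)
    best = (current, i, j)
    t = t0
    for _ in range(steps):
        k = rng.randrange(n)
        if k not in (i, j):
            # neighbor move: keep one cluster of the pair, replace the other with k
            ni, nj = (i, k) if rng.random() < 0.5 else (k, j)
            candidate = linkage(clusters[ni], clusters[nj], sim)
            delta = candidate - current
            # Metropolis rule for maximization: always accept improvements,
            # accept worse pairs with probability exp(delta / t)
            if delta >= 0 or rng.random() < math.exp(delta / t):
                i, j, current = ni, nj, candidate
                if current > best[0]:
                    best = (current, i, j)
        t = max(t * alpha, 1e-12)  # geometric cooling, floored to avoid division by zero
    return best[1], best[2]


def annealing_hierarchical_clustering(words, sim, n_clusters, **kwargs):
    """Start from singleton clusters and repeatedly merge the pair chosen by
    anneal_pair until n_clusters clusters remain."""
    clusters = [frozenset([w]) for w in words]
    while len(clusters) > max(n_clusters, 1):
        i, j = anneal_pair(clusters, sim, **kwargs)
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters
```

For example, with toy tokens `x1 x2 x3 y1 y2 y3` and a similarity of 1 for tokens sharing a first letter (0 otherwise), `annealing_hierarchical_clustering(words, sim, 2, rng=random.Random(0))` should recover the `x` and `y` groups. In a real run `sim` could be derived from the co-occurrence weights or relevance scores; keeping only the best pair visited makes each merge greedy, which is why the overall result is sub-optimal.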

Experiment
As a test document I chose the complete version of the file used in the last post: Nuclear_weapon.
Here are the clusters obtained from the first 100 relevant words extracted:

The three clusters obtained.
 

A few observations are worth highlighting:

  • The first cluster merged words such as “material, uranium, plutonium, isotope” and “war, attack, arm“, as well as “proliferation, movement, control, development“.
  • The second cluster (which has the lowest rank) aggregates words such as “japan, japanese, place, israel, iraq, american“, and “ton, tnt, yield”.
  • The third cluster (which has the highest rank) describes the primary topic quite well, merging all the most important words of the document!

Of course, the procedure is still in the “incubator” phase, and the accuracy of the clusters depends on the performance of the annealing clustering (…other algorithms may perform better in this context… but to show a rough solution I guess it’s enough :D)

This is the optimization process for the last merging stage (I suspect the temperature schedule needs some adjustment):
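For reference, a standard simulated annealing formulation for a maximization problem (not necessarily the exact variant used here) accepts a candidate whose score differs from the current one by Δ according to the Metropolis rule, while the temperature T_k is lowered by a schedule such as geometric or logarithmic cooling (c and α are tuning constants):

$$
P(\text{accept}) =
\begin{cases}
1 & \Delta \ge 0 \\
e^{\Delta / T_k} & \Delta < 0
\end{cases},
\qquad \Delta = S(\text{candidate}) - S(\text{current})
$$

$$
T_{k+1} = \alpha\, T_k,\quad 0 < \alpha < 1 \ \ \text{(geometric)},
\qquad
T_k = \frac{c}{\log(k + 2)} \ \ \text{(logarithmic)}
$$

A schedule that cools too fast freezes the search in a poor merge, while one that cools too slowly wastes iterations; that trade-off is what adjusting the schedule is about.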

Optimization curve through Simulated Annealing Hierarchical Clustering (last merging stage)


Next steps:
I look forward to receiving comments and suggestions.
…It would be interesting to use such a methodology to create a new kind of full-text search engine, totally independent of word frequency and visit frequency.

The doc
Here is the document, parsed and colored according to the cluster assignment (only the first 100 relevant features, ranked by the Graph Entropy method, are highlighted).
Stay tuned
cristian.

