Text Mining on Financial News

ThemosKalafatis

As discussed previously, an analyst should pay particular attention to problem representation, especially when dealing with text data. This post discusses one way to do this. However, something has to give: there is no perfect solution for this task.


First, we have to find a source of news. It could be a financial news site such as Bloomberg or the Financial Times, or RSS feed URLs such as the ones provided by MarketWatch. RSS feeds might be the better solution because the news is already categorized to some degree by feed type, which can be a great help to some analysts.
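As a rough illustration of reading headlines from a feed, here is a minimal sketch using only Python's standard library. It assumes a standard RSS 2.0 layout (`item` elements with `title` and `pubDate` children). The feed text in `SAMPLE_FEED` and the function name `extract_headlines` are made up for this example and do not come from any real provider.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed content, used here instead of a real download.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Markets Feed</title>
    <item>
      <title>Oil slips as recession fears grow</title>
      <pubDate>Tue, 04 Mar 2008 14:30:00 GMT</pubDate>
    </item>
    <item>
      <title>ECB holds rates steady</title>
      <pubDate>Thu, 06 Mar 2008 12:45:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def extract_headlines(rss_xml):
    """Return a list of (pubDate, title) pairs from RSS 2.0 text."""
    root = ET.fromstring(rss_xml)
    headlines = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        pub_date = item.findtext("pubDate", default="").strip()
        if title:  # skip items that carry no headline
            headlines.append((pub_date, title))
    return headlines


for pub_date, title in extract_headlines(SAMPLE_FEED):
    print(pub_date, "|", title)
```

In practice the XML text would be downloaded from the feed URL first; that step is left out so the sketch runs without network access.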

After finding the news sources and writing the necessary code to collect the actual information, we could end up with the following text file:


You can see that I use a ‘^’ separator to differentiate between:

1) A date stamp
2) A date string
3) The news string
4) A characterization of the news (important or unimportant)
5) A categorization of the financial news
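A record with the five fields listed above could be read back with something like the sketch below. The sample line, the `NewsRecord` class, and the field names are assumptions made for illustration. The exact format of the date stamp is not shown here, so it is kept as a plain string.

```python
from dataclasses import dataclass


@dataclass
class NewsRecord:
    date_stamp: str   # field 1 (format unknown, so kept as text)
    date_text: str    # field 2: human-readable date string
    headline: str     # field 3: the news string
    importance: str   # field 4: "important" or "unimportant"
    category: str     # field 5: category of the financial news


def parse_line(line, sep="^"):
    """Split one '^'-separated line into a NewsRecord."""
    parts = [p.strip() for p in line.rstrip("\n").split(sep)]
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    stamp, date_text, headline, importance, category = parts
    return NewsRecord(stamp, date_text, headline, importance.lower(), category)


# Hypothetical line, not taken from the actual file.
sample = "20080304^Tue, 04 Mar 2008^Oil slips as recession fears grow^Important^Commodities"
record = parse_line(sample)
print(record.headline, "->", record.importance, "/", record.category)
```

Rejecting lines that do not have exactly five fields catches headlines that happen to contain the separator character.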

This simple file could serve as the basis of a training file for text categorization. Assuming that we have trained algorithms to classify news automatically, we could use a first classifier to label news as important or unimportant and pass only the important news to a second classifier, which performs the detailed classification.
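The two-stage idea can be sketched as a small pipeline. The functions `toy_importance` and `toy_category` below are keyword-based placeholders standing in for trained classifiers, and the headlines are invented. Only the control flow (filter first, then categorize) reflects the approach described above.

```python
from typing import Callable, Iterable, List, Tuple

Classifier = Callable[[str], str]


def cascade(headlines: Iterable[str],
            importance_clf: Classifier,
            category_clf: Classifier) -> List[Tuple[str, str]]:
    """Categorize only the headlines that the first classifier marks as important."""
    labelled = []
    for headline in headlines:
        if importance_clf(headline) != "important":
            continue  # unimportant news never reaches the second classifier
        labelled.append((headline, category_clf(headline)))
    return labelled


# Placeholder classifiers (assumptions): real ones would be trained on the file above.
def toy_importance(headline: str) -> str:
    return "important" if "recession" in headline.lower() else "unimportant"


def toy_category(headline: str) -> str:
    return "commodities" if "oil" in headline.lower() else "economy"


news = [
    "Oil falls on recession fears",
    "Analyst wins industry award",
    "US recession looms",
]
for headline, category in cascade(news, toy_importance, toy_category):
    print(category, "|", headline)
```

Passing the classifiers in as plain functions keeps the pipeline independent of whichever learning algorithm is eventually used for each stage.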

Another option is to use clustering. The solution detailed above involves a tremendous amount of work, depending on how much data you plan to collect: more data means more work, while less data usually (but not always) means less accuracy.

But how could clustering be performed on such data? Simply: we use field number (4) of our training text file to train a clustering algorithm and then see what ‘classes’ the algorithm comes up with.

So let’s look at a small clustering example. This is a capture from WEKA just before the clustering process:


As you can see, I have produced a training file which essentially contains the ‘buzzwords’ of financial news: barrel, recession, Yen, Euro, ECB, price, consumer, etc. The file is then analyzed by the K-Means algorithm to group news headers that share similar ‘buzzwords’. Each cluster is assigned a number, so each news header ultimately falls into one cluster number.
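To make the idea concrete, here is a minimal plain-Python sketch of K-Means over buzzword presence vectors. It is not the WEKA setup used for this post. The four toy headers, the choice of two clusters, and all function names are assumptions for illustration, and only the buzzwords themselves are taken from the list above.

```python
import random

VOCABULARY = ["barrel", "recession", "yen", "euro", "ecb", "price", "consumer"]


def to_vector(words, vocabulary=VOCABULARY):
    """Encode a header as 1.0/0.0 flags, one per buzzword."""
    present = {w.lower() for w in words}
    return [1.0 if term in present else 0.0 for term in vocabulary]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=20, seed=0):
    """Return one cluster number per point (basic Lloyd iteration)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(list(points), k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Hypothetical headers, already reduced to their buzzwords.
headers = [
    ["barrel", "price", "consumer"],
    ["barrel", "price"],
    ["Yen", "Euro", "ECB"],
    ["Euro", "ECB"],
]
clusters = k_means([to_vector(h) for h in headers], k=2)
for words, cluster in zip(headers, clusters):
    print(cluster, words)
```

The cluster numbers themselves carry no meaning; what matters is which headers end up sharing a number.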

After running the K-Means algorithm I ended up with 16 clusters. Let’s look at two instances that K-Means placed in cluster ‘6’:

Instance_number : 130.0

Fear
Decrease
US
Economy
Futures

and

Instance_number : 174.0

Fear
Decrease
US
Price
Oil
Banking
Recession

So the first instance is about fear of a drop in the US economy, which results in US futures dropping, and the second instance must be something about a decrease in oil prices and banking stocks because of fear of a US recession. Not bad at all…

But not so fast: clustering presents a lot of problems later in the process. Remember that what we are after is to combine text mining and data mining to better understand how the markets react. Should one use classification or clustering? There are many more things to take into consideration and, for obvious reasons, I cannot disclose all the details of such a project, but I hope to give the interested reader a good enough introduction to the subject.
