© 2008-25 SmartData Collective. All Rights Reserved.

Text Mining on Financial News

ThemosKalafatis

As discussed previously, an analyst should pay specific attention to problem representation, particularly when dealing with text data. We are going to discuss one way to do this. However, something has to give: there is no perfect solution for this task.

First of all, we have to find the source of the news. It could be financial news sites such as Bloomberg or the Financial Times, or RSS feed URLs such as the ones provided by MarketWatch. RSS feeds might be the better solution, because the news is already categorized to some extent according to the feed type, and this can be a great help for some analysts.
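As a rough illustration of the RSS route, the sketch below pulls the publication date and headline out of RSS 2.0 text using only the Python standard library. The feed content, its title, and the `parse_rss_items` helper are all hypothetical placeholders; a real feed would be downloaded first and may carry extra fields or namespaces that this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 snippet standing in for a downloaded feed.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Markets Feed</title>
    <item>
      <title>Placeholder headline one</title>
      <pubDate>Mon, 01 Jan 2024 09:30:00 GMT</pubDate>
    </item>
    <item>
      <title>Placeholder headline two</title>
      <pubDate>Mon, 01 Jan 2024 10:15:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Return (feed_title, [(pub_date, headline), ...]) from RSS 2.0 text."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    feed_title = channel.findtext("title", default="")
    items = []
    for item in channel.findall("item"):
        pub_date = item.findtext("pubDate", default="")
        headline = item.findtext("title", default="")
        items.append((pub_date, headline))
    return feed_title, items


feed_title, items = parse_rss_items(SAMPLE_RSS)
for pub_date, headline in items:
    print(feed_title, "|", pub_date, "|", headline)
```

The feed title is kept alongside each headline because it is the "predetermined categorization" that makes RSS feeds attractive as a source.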

After finding the news sources and writing the necessary code to retrieve the actual information, we could end up with the following text file:

[Image: sample of the resulting text file]

You can see that I use a ‘^’ separator to differentiate between:

1) A date stamp
2) A date string
3) The news string
4) A characterization of the news (important or unimportant)
5) A categorization of the financial news
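A file laid out this way can be read with a few lines of Python. The sketch below assumes exactly five ‘^’-separated fields per line, in the order listed above. The sample rows, the date formats, and the `NewsRecord` and `parse_line` names are hypothetical, since the original file is not available here.

```python
from typing import NamedTuple


class NewsRecord(NamedTuple):
    date_stamp: str
    date_string: str
    headline: str
    importance: str   # "important" or "unimportant"
    category: str


def parse_line(line, sep="^"):
    """Split one line into the five fields described above."""
    parts = [part.strip() for part in line.rstrip("\n").split(sep)]
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    return NewsRecord(*parts)


# Hypothetical rows; the real date formats and categories may differ.
SAMPLE_LINES = [
    "20240101^Mon 01 Jan 2024^Placeholder headline about oil prices^important^commodities",
    "20240101^Mon 01 Jan 2024^Placeholder headline about an office party^unimportant^other",
]

records = [parse_line(line) for line in SAMPLE_LINES]
for record in records:
    print(record.importance, "|", record.category, "|", record.headline)
```

Rejecting lines with the wrong field count is a deliberate choice: a headline that itself contains ‘^’ would otherwise shift the labels silently.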

This simple file could provide the basis for a training file for text categorization. Assuming that we have trained algorithms to classify news automatically, we could use one classifier to first categorize news as important or unimportant, and pass only the important news to a second classifier that does the detailed classification.
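The two-stage idea can be sketched as follows. This is only an illustration under stated assumptions: the classifier is a tiny hand-written multinomial Naive Bayes (the post does not say which algorithm is used), and the training rows, labels, and the `classify` helper are hypothetical.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over whitespace tokens."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter(labels)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Words never seen in training are ignored.
        tokens = [t for t in text.lower().split() if t in self.vocab]
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.label_counts):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.label_counts[label] / total_docs)
            for token in tokens:
                score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training rows: (headline, importance, category).
ROWS = [
    ("oil price falls on recession fear", "important", "commodities"),
    ("ecb holds euro rates steady", "important", "central_banks"),
    ("oil barrel price rises again", "important", "commodities"),
    ("ecb signals euro rate cut", "important", "central_banks"),
    ("company opens new office cafeteria", "unimportant", None),
    ("ceo attends charity golf event", "unimportant", None),
]

# Stage 1 is trained on every row; stage 2 only on the important ones.
importance_clf = TinyNaiveBayes().fit(
    [row[0] for row in ROWS], [row[1] for row in ROWS])
important_rows = [row for row in ROWS if row[1] == "important"]
category_clf = TinyNaiveBayes().fit(
    [row[0] for row in important_rows], [row[2] for row in important_rows])


def classify(headline):
    """Return (importance, category); category is None for unimportant news."""
    if importance_clf.predict(headline) != "important":
        return ("unimportant", None)
    return ("important", category_clf.predict(headline))


print(classify("oil price drops"))
print(classify("ceo attends golf event"))
```

Training the second classifier only on important rows mirrors the cascade described above: it never has to learn what unimportant news looks like.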

Another option is to use clustering. You can imagine that the solution detailed above involves a tremendous amount of work, depending on how much data you plan to collect: more data means more work, while less data could mean (usually, but not always) less accuracy.

But how could clustering be performed on such data? Simply put, we feed the news string (field 3 above) from our training text file to a clustering algorithm and then see what ‘classes’ the algorithm comes up with.

So let’s see a small example of clustering. This is a capture from WEKA just before the clustering process:

[Image: WEKA capture before clustering]

As you can see, I have produced a training file which essentially contains the ‘buzzwords’ of financial news: barrel, recession, Yen, Euro, ECB, price, consumer, etc. The file is then analyzed by the K-means algorithm to extract clusters of headlines that share the same ‘buzzwords’. Each cluster is assigned a number, so each news header ultimately falls into one cluster number.
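To make the buzzword representation concrete, here is a minimal pure-Python sketch of K-means over binary buzzword vectors. It is not the WEKA setup used for this post: the headlines are hypothetical, only two clusters are requested, and the centroids are naively initialized from the first `k` points so that the run is deterministic.

```python
BUZZWORDS = ["barrel", "recession", "yen", "euro", "ecb", "price", "consumer"]


def to_vector(headline):
    """Mark each buzzword as present (1.0) or absent (0.0) in the headline."""
    tokens = set(headline.lower().split())
    return [1.0 if word in tokens else 0.0 for word in BUZZWORDS]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=20):
    """Plain K-means; the first k points serve as initial centroids."""
    centroids = [list(point) for point in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(point, centroids[c]))
            for point in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


# Hypothetical headlines, used only to exercise the code.
HEADLINES = [
    "oil barrel price jumps",
    "ecb keeps euro rates unchanged",
    "barrel price slides on recession worries",
    "euro weakens after ecb statement",
]

vectors = [to_vector(headline) for headline in HEADLINES]
assignments, centroids = k_means(vectors, k=2)
for headline, cluster in zip(HEADLINES, assignments):
    print(cluster, headline)
```

With real data the initialization matters a great deal, so random restarts or a smarter seeding scheme would be a sensible replacement for the first-`k` shortcut.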

After running the K-Means algorithm, I ended up with 16 clusters. Let’s look at two instances that K-Means placed in cluster ‘6’:

Instance_number : 130.0

Fear
Decrease
US
Economy
Futures

and

Instance_number : 174.0

Fear
Decrease
US
Price
Oil
Banking
Recession

So the first instance is about fear of a drop in the US economy, which results in US futures dropping, and the second instance must be something about a decrease in oil prices and banking stocks because of fear of a US recession. Not bad at all…
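One quick way to see what the two instances listed above have in common is to intersect their buzzword sets. The overlap score below (Jaccard similarity) is only an illustrative measure chosen for this sketch; it is not the distance that K-means itself uses.

```python
# Buzzwords of the two instances from cluster 6, as listed above.
instance_130 = {"Fear", "Decrease", "US", "Economy", "Futures"}
instance_174 = {"Fear", "Decrease", "US", "Price", "Oil", "Banking", "Recession"}

shared = instance_130 & instance_174
union = instance_130 | instance_174

# Jaccard similarity: shared buzzwords divided by all distinct buzzwords.
jaccard = len(shared) / len(union)

print(sorted(shared))
print(round(jaccard, 3))
```

The three shared terms (Fear, Decrease, US) are a plausible reason for the two headlines landing in the same cluster, although the remaining terms show that they are about different markets.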

But not so fast: clustering presents a lot of problems later in the process. Remember that what we are after is to combine text mining and data mining to better understand how the markets react. Should one use classification or clustering? There are many more things to take into consideration, and for obvious reasons I cannot disclose all the details of such a project, but I hope to give the interested reader a good enough introduction to the subject.
