Cookies help us display personalized product recommendations and ensure you have great shopping experience.

By using this site, you agree to the Privacy Policy and Terms of Use.
Accept
SmartData CollectiveSmartData Collective
  • Analytics
    AnalyticsShow More
    predictive analytics risk management
    How Predictive Analytics Is Redefining Risk Management Across Industries
    7 Min Read
    data analytics and gold trading
    Data Analytics and the New Era of Gold Trading
    9 Min Read
    composable analytics
    How Composable Analytics Unlocks Modular Agility for Data Teams
    9 Min Read
    data mining to find the right poly bag makers
    Using Data Analytics to Choose the Best Poly Mailer Bags
    12 Min Read
    data analytics for pharmacy trends
    How Data Analytics Is Tracking Trends in the Pharmacy Industry
    5 Min Read
  • Big Data
  • BI
  • Exclusive
  • IT
  • Marketing
  • Software
Search
© 2008-25 SmartData Collective. All Rights Reserved.
Reading: Text Mining on Financial News
Share
Notification
Font ResizerAa
SmartData CollectiveSmartData Collective
Font ResizerAa
Search
  • About
  • Help
  • Privacy
Follow US
© 2008-23 SmartData Collective. All Rights Reserved.
SmartData Collective > Uncategorized > Text Mining on Financial News
Uncategorized

Text Mining on Financial News

ThemosKalafatis
ThemosKalafatis
5 Min Read
SHARE

As discussed previously, an analyst should give specific attention to problem representation particularly when we are dealing with text data. We are going to discuss a way to do this. However, something has to give and there is no perfect solution for this task.

First of all we have to find the source of the news : It could be financial news sites such as Bloomberg, Financial Times, or RSS Feeds URLs such as the ones provided by MarketWatch. RSS …


As discussed previously, an analyst should give specific attention to problem representation particularly when we are dealing with text data. We are going to discuss a way to do this. However, something has to give and there is no perfect solution for this task.

First of all we have to find the source of the news : It could be financial news sites such as Bloomberg, Financial Times, or RSS Feeds URLs such as the ones provided by MarketWatch. RSS Feeds might be a better solution because there is already some predetermined categorization of news according to the feed type and this can be great help for some analysts.

After finding the news sources and making the necessary code to get the actual information we could end up with the following text file :


You can see that i use a ‘^’ separator to differentiate between :

1) A date stamp,
2) A date string
3) The news string
4) A characterization of the news (important or unimportant)
5) A categorization of the financial news.

This simple file could provide the basis for a training file for text categorization. Assuming that we have trained algorithms to automatically classify news, we could use a news classifier to first categorize news to important or unimportant and pass only the important news to a second classifier which will do the detailed classification of the news.

Another option is to use clustering : You can imagine that the solution detailed above has a tremendous amount of work depending on how much data you are planning to collect…so too much data means too much work, less data could mean -usually but not always- less accuracy.

But how could clustering be performed on such data? Simply, we just use field number (4) on our training text file to train a clustering algorithm and then see what ‘classes’ the algorithm has come up with.

So let’s see a small example about clustering : This is a capture from WEKA just before the clustering process :


As you can see i have produced a training file which essentially contains the ‘buzzwords’ of financial news : barrel, recession, Yen, Euro, ECB, price, consumer, etc. The file is then analyzed by K-means algorithm to extract clusters of the same ‘buzzwords’. Each cluster is assigned a number so each news header ultimately falls onto one cluster number.

After running the K-Means algorithm i ended up with 16 clusters. Let’s see two instances that K-Means decided that they should fall under cluster ‘6’ :

Instance_number : 130.0

Fear
Decrease
US
Economy
Futures

and

Instance_number : 174.0

Fear
Decrease
US
Price
Oil
Banking
Recession

So the first instance is about fear of drop in US Economy which results in Futures in US dropping and the second instance must be -something about- a decrease of Oil prices and Banking stocks because of the fear of US recession. Not bad at all…

But not so fast : Clustering presents a lot of problems later in the process. Remember that what we are after, is to combine text mining and data mining together to better understand how the markets react. Should one use classification or clustering? There are many more things to take under consideration and for obvious reasons i cannot disclose all the details of such a project…but i am hoping to give to the interested reader a good enough introduction on the subject.

Link to original post

More Read

Tax Notice
Email Marketing Continues to Thrive, Say the Stats
Guerrilla Social Media Marketing: Features and Benefits List
On A Smarter Planet … Some Organizations Will Be Smarter-er Than Others
Forrester Releases Report on Impact of Snowden Revelations
Share This Article
Facebook Pinterest LinkedIn
Share

Follow us on Facebook

Latest News

street address database
Why Data-Driven Companies Rely on Accurate Street Address Databases
Big Data Exclusive
predictive analytics risk management
How Predictive Analytics Is Redefining Risk Management Across Industries
Analytics Exclusive Predictive Analytics
data analytics and gold trading
Data Analytics and the New Era of Gold Trading
Analytics Big Data Exclusive
student learning AI
Advanced Degrees Still Matter in an AI-Driven Job Market
Artificial Intelligence Exclusive

Stay Connected

1.2kFollowersLike
33.7kFollowersFollow
222FollowersPin

You Might also Like

Investing in Data Center Efficiencies: Part Two

6 Min Read

Criminals Leverage Apple Pay for Fraud: Banks Boost Authentication Security

6 Min Read

The WOW Factor – Day One at PARTNERS 2008

3 Min Read

Hunch Has Launched

1 Min Read

SmartData Collective is one of the largest & trusted community covering technical content about Big Data, BI, Cloud, Analytics, Artificial Intelligence, IoT & more.

AI chatbots
AI Chatbots Can Help Retailers Convert Live Broadcast Viewers into Sales!
Chatbots
AI and chatbots
Chatbots and SEO: How Can Chatbots Improve Your SEO Ranking?
Artificial Intelligence Chatbots Exclusive

Quick Link

  • About
  • Contact
  • Privacy
Follow US
© 2008-25 SmartData Collective. All Rights Reserved.
Go to mobile version
Welcome Back!

Sign in to your account

Username or Email Address
Password

Lost your password?