Text Mining and Pronouns

mekkin

If there’s one piece of advice I can offer you, both for better text mining and better writing, it is this: please, please, please with a cherry on top, be clear with your pronouns.

There’s nothing that makes me sadder than a lost, lonely pronoun separated from its antecedent, the noun to which the pronoun refers. The process of determining pronoun ownership, and thereby determining who or what is being spoken about in a particular phrase, is something we call anaphora resolution, and it’s something we’ve been working on for a long time. If we were to say “Jenny wanted to try something new, so she went to yoga class,” anaphora resolution would identify the pronoun “she” as referring to Jenny.

Text mining engines have varying degrees of success depending on the pronouns involved. Most of these engines use a general model that looks at two qualities to determine pronoun ownership:

1. How far apart are the pronoun and its antecedent?

2. Do the pronoun and its antecedent look alike?

Distance is a good way to track down the antecedent. Good writers, like good pet owners, keep their pronouns on a short leash. If the pronoun is “she”, for example, in all likelihood the person being referred to will be the last woman introduced by name, and will be in the previous sentence, or at least in the same paragraph. For example: “Jenny met a guy in yoga class. She is going on a date tonight.” This is obviously easier to figure out if the woman in question has what is traditionally understood as a female name like Jenny, which brings us to our second strategy for anaphora resolution.
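
To make the short-leash idea concrete, here is a minimal Python sketch of the distance heuristic. It is only an illustration, not how any particular engine works: the `KNOWN_NAMES` set and the `nearest_preceding_name` function are hypothetical, and a real system would rely on proper named-entity recognition rather than a hand-written list.

```python
import re

# Hypothetical list of names the engine already recognizes as people.
KNOWN_NAMES = {"Jenny", "Dave"}

def nearest_preceding_name(text, pronoun):
    """Return the known name closest before the first use of `pronoun`."""
    tokens = re.findall(r"\w+", text)
    # Compare in lowercase so "She" and "she" both match.
    lowered = [token.lower() for token in tokens]
    if pronoun.lower() not in lowered:
        return None
    pronoun_pos = lowered.index(pronoun.lower())
    # Walk backwards from the pronoun: the first name we meet is the closest.
    for token in reversed(tokens[:pronoun_pos]):
        if token in KNOWN_NAMES:
            return token
    return None

example = "Jenny met a guy in yoga class. She is going on a date tonight."
print(nearest_preceding_name(example, "she"))  # Jenny
```

Walking backwards from the pronoun means the closest candidate always wins, which is exactly why a pronoun that has strayed far from home gets attached to the wrong owner.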

If pronouns and nouns “look alike”, they’ll share a certain quality that allows us to rule out other antecedents. As in the instance above, the pronoun “she” is most likely attached to a woman’s name. Ex. “Jenny and Dave went on a date, but she faked food poisoning to get out early.” In this instance, “she” is probably not referring to Dave, because Dave is a traditionally male name. 
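
The look-alike check can be sketched as a filter on top of the distance walk. Everything here is assumed for illustration: `NAME_GENDER` and `PRONOUN_GENDER` are tiny hypothetical lookup tables, and `resolve_with_agreement` is a made-up helper, not a function from any real library.

```python
import re

# Hypothetical lookup tables; a real engine would use far larger resources.
NAME_GENDER = {"Jenny": "female", "Dave": "male"}
PRONOUN_GENDER = {"she": "female", "her": "female", "he": "male", "him": "male"}

def resolve_with_agreement(text, pronoun):
    """Return the closest preceding name whose gender matches the pronoun."""
    tokens = re.findall(r"\w+", text)
    lowered = [token.lower() for token in tokens]
    if pronoun.lower() not in lowered:
        return None
    pronoun_pos = lowered.index(pronoun.lower())
    wanted = PRONOUN_GENDER.get(pronoun.lower())
    # Walk backwards, skipping names that do not "look like" the pronoun.
    for token in reversed(tokens[:pronoun_pos]):
        if token in NAME_GENDER and NAME_GENDER[token] == wanted:
            return token
    return None

example = "Jenny and Dave went on a date, but she faked food poisoning to get out early."
print(resolve_with_agreement(example, "she"))  # Jenny
```

Distance alone would have picked Dave, the closer name, so the agreement filter is doing the real work in this sentence.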

The same principle holds true for nominal pronouns. Nominal pronouns are the kind of pronoun we employ when we write “the company” in an article when we are referring to, say, Google. We know which company is being referred to, because it is the topic of discussion, but we aren’t using the proper noun. In this case the look-alike strategy works very well. For example, Lexalytics’ text mining engine Salience knows Google is a company, and so can easily attach it to the nominal pronoun “the company”. 
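
The same look-alike matching can be sketched for nominal pronouns by comparing entity types instead of genders. This is an assumption-laden toy, not a description of how Salience works internally: `ENTITY_TYPES`, `NOMINAL_TYPES`, and `resolve_nominal` are hypothetical names, and the example sentence is made up.

```python
import re

# Hypothetical catalogue mapping known entities to a type.
ENTITY_TYPES = {"Google": "company", "Lexalytics": "company", "Jenny": "person"}
# Hypothetical nominal phrases and the entity type each can stand in for.
NOMINAL_TYPES = {"the company": "company", "the firm": "company"}

def resolve_nominal(text, nominal):
    """Return the closest preceding entity whose type matches the nominal phrase."""
    wanted = NOMINAL_TYPES.get(nominal.lower())
    position = text.lower().find(nominal.lower())
    if wanted is None or position == -1:
        return None
    # Only entities mentioned before the nominal phrase are candidates.
    preceding = re.findall(r"\w+", text[:position])
    for token in reversed(preceding):
        if ENTITY_TYPES.get(token) == wanted:
            return token
    return None

example = "Google announced a new phone today. Jenny says the company will ship it in May."
print(resolve_nominal(example, "the company"))  # Google
```

Jenny is closer to “the company” than Google is, but she is a person, so the type check rules her out.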

Salience also looks for other qualities to ensure the best anaphora resolution possible. For instance, it looks for quotation marks in conjunction with the use of the pronoun “I”, so that it doesn’t confuse the author with someone else who is being quoted. For example: “‘I’ll have to find a new yoga class,’ says Jenny.” The “I” is inside quotation marks, so it belongs to Jenny, the speaker. However, when I use “I” without quotation marks, the pronoun belongs to me, Mekkin, the author.
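
Here is a rough sketch of the quotation-mark check, written under several assumptions of my own: quotes use curly double quotation marks, the speaker is named by a “says NAME” tag right after the quote, and `attribute_first_person` is a hypothetical helper rather than anything taken from Salience.

```python
import re

def attribute_first_person(text, author):
    """Return the owner of each standalone "I" in `text`, in reading order."""
    # Collect each curly-quoted span and, if present, the name after "says".
    quotes = [
        (m.start(1), m.end(1), m.group(2))
        for m in re.finditer(r"“([^”]*)”(?:\s+says\s+(\w+))?", text)
    ]
    owners = []
    for match in re.finditer(r"\bI\b", text):
        owner = author  # default: an unquoted "I" belongs to the author
        for start, end, speaker in quotes:
            if start <= match.start() < end:
                owner = speaker or "unknown speaker"
                break
        owners.append(owner)
    return owners

example = "“I’ll have to find a new yoga class,” says Jenny. I hope she finds one."
print(attribute_first_person(example, author="Mekkin"))  # ['Jenny', 'Mekkin']
```

The first “I” sits inside the quoted span and goes to Jenny, while the second sits outside any quote and falls back to the author.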

But it’s not all fun and games. Not when it comes to the unfathomable “it”. With its lack of defining characteristics, it can cause text mining mayhem. For that reason, most text mining engines choose to ignore it altogether.

That’s all for today. Please remember to be a responsible writer: spay and neuter your expletives, and keep your pronouns close to home.
