SmartData Collective
© 2008-25 SmartData Collective. All Rights Reserved.
Text Analytics

Text Mining and Pronouns

mekkin
5 Min Read

If there’s one piece of advice I can offer you, both for better text mining and better writing, it is this: please, please, please with a cherry on top, be clear with your pronouns.

There’s nothing that makes me sadder than a lost, lonely pronoun separated from its antecedent, the noun to which the pronoun refers. The process of determining pronoun ownership, and thereby determining who or what is being spoken about in a particular phrase, is something we call anaphora resolution, and it’s something we’ve been working on for a long time. If we were to say “Jenny wanted to try something new, so she went to yoga class”, anaphora resolution would identify the pronoun “she” as referring to Jenny.

Text mining engines have varying degrees of success depending on the pronouns involved. Most of these engines use a general model that looks at two qualities to determine pronoun ownership:

1. How far apart are the pronoun and its antecedent?

2. Do the pronoun and its antecedent look alike?

Distance is a good way to track down the entity being referred to. Good writers, like good pet owners, keep their pronouns on a short leash. If the pronoun is “she”, for example, in all likelihood the person being referred to will be the last woman introduced by name, and she will appear in the previous sentence, or at least in the same paragraph. Take “Jenny met a guy in yoga class. She is going on a date tonight.” This is obviously easier to figure out if the woman in question has what is traditionally understood as a female name like Jenny, which brings us to our second strategy for anaphora resolution.
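To make the distance heuristic concrete, here is a minimal Python sketch. It assumes names are found by looking them up in a small, hypothetical list (`KNOWN_NAMES`) rather than by a real named-entity recognizer, and it links each personal pronoun to the nearest preceding name, searching the current sentence and at most one earlier sentence. It ignores gender entirely, and the function names are illustrative only, not any engine's API.

```python
import re

# Hypothetical stand-in for a named-entity recognizer: only these
# strings are treated as names.
KNOWN_NAMES = {"Jenny", "Dave"}
PERSONAL_PRONOUNS = {"he", "she", "him", "her", "his", "hers"}


def tokenize(text):
    """Split text into sentences, then each sentence into word tokens."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [re.findall(r"[A-Za-z']+", s) for s in sentences]


def resolve_by_distance(text, max_sentences_back=1):
    """Link each personal pronoun to the nearest preceding known name.

    Searches the words before the pronoun in its own sentence, then up to
    `max_sentences_back` earlier sentences (the "short leash").
    Returns (pronoun, antecedent) pairs; antecedent is None if no name
    is found within that window.
    """
    sentences = tokenize(text)
    links = []
    for s_idx, words in enumerate(sentences):
        for w_idx, word in enumerate(words):
            if word.lower() not in PERSONAL_PRONOUNS:
                continue
            antecedent = None
            for back in range(max_sentences_back + 1):
                j = s_idx - back
                if j < 0:
                    break
                # Same sentence: only words before the pronoun count.
                window = words[:w_idx] if back == 0 else sentences[j]
                names = [w for w in window if w in KNOWN_NAMES]
                if names:
                    antecedent = names[-1]  # closest name in this window
                    break
            links.append((word, antecedent))
    return links


print(resolve_by_distance(
    "Jenny met a guy in yoga class. She is going on a date tonight."
))  # [('She', 'Jenny')]
```

If the nearest name sits more than `max_sentences_back` sentences away, the sketch returns `None` instead of guessing, which mirrors the idea that a pronoun far from its antecedent is a weak link.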

If a pronoun and a noun “look alike”, they share a certain quality that allows us to rule out other antecedents. As in the instance above, the pronoun “she” is most likely attached to a woman’s name. Take “Jenny and Dave went on a date, but she faked food poisoning to get out early.” Here, “she” is probably not referring to Dave, because Dave is a traditionally male name.
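A sketch of the look-alike check, under the same simplifying assumptions: a hypothetical name-to-gender lexicon (`NAME_GENDER`) filters out candidates whose gender conflicts with the pronoun before the nearest remaining name is chosen. On the Jenny and Dave sentence, pure distance would pick Dave, the closer name; the agreement filter rules him out.

```python
import re

# Hypothetical lexicon for illustration only.
NAME_GENDER = {"Jenny": "female", "Dave": "male"}
PRONOUN_GENDER = {
    "she": "female", "her": "female", "hers": "female",
    "he": "male", "him": "male", "his": "male",
}


def resolve_with_agreement(sentence):
    """Link each gendered pronoun to the nearest preceding name whose
    gender in NAME_GENDER matches the pronoun's; names that disagree
    are skipped instead of being chosen by distance alone."""
    words = re.findall(r"[A-Za-z']+", sentence)
    links = []
    for i, word in enumerate(words):
        gender = PRONOUN_GENDER.get(word.lower())
        if gender is None:
            continue  # not a gendered pronoun
        # Scan backward from the pronoun, keeping only agreeing names.
        match = next(
            (w for w in reversed(words[:i]) if NAME_GENDER.get(w) == gender),
            None,
        )
        links.append((word, match))
    return links


print(resolve_with_agreement(
    "Jenny and Dave went on a date, but she faked food poisoning to get out early."
))  # [('she', 'Jenny')]
```

A hard lookup like this is the simplest possible version; since many names are used for more than one gender, a name's gender is better treated as a likelihood than a rule.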

The same principle holds true for nominal pronouns. Nominal pronouns are the kind of pronoun we employ when we write “the company” in an article to refer to, say, Google. We know which company is being referred to because it is the topic of discussion, but we aren’t using the proper noun. In this case the look-alike strategy works very well. For example, Lexalytics’ text mining engine Salience knows Google is a company, and so can easily attach “the company” to Google.
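The look-alike idea extends to nominal pronouns by matching on entity type instead of gender. This sketch is not how Salience works internally; `ENTITY_TYPES` and `NOMINAL_TYPES` are hypothetical stand-ins for the output of an entity extractor. Each phrase like “the company” is linked to the most recent preceding entity of the matching type, so in “Google hired Jenny. The company is growing.” it skips past Jenny, the closer mention, to reach Google.

```python
import re

# Hypothetical entity types, standing in for a real entity extractor.
ENTITY_TYPES = {"Google": "company", "Jenny": "person"}
# Hypothetical map from nominal phrases to the entity type they refer to.
NOMINAL_TYPES = {"the company": "company", "the woman": "person"}


def resolve_nominal(text):
    """Link nominal phrases like "the company" to the most recent
    preceding entity whose type matches the phrase's type."""
    # Longest phrases first so multi-word phrases win over shorter ones.
    phrases = sorted(list(ENTITY_TYPES) + list(NOMINAL_TYPES), key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, phrases)) + r")\b", re.IGNORECASE
    )
    seen = []   # entities mentioned so far, in document order
    links = []
    for match in pattern.finditer(text):
        mention = match.group(0)
        if mention in ENTITY_TYPES:          # exact-case entity name
            seen.append(mention)
        elif mention.lower() in NOMINAL_TYPES:
            wanted = NOMINAL_TYPES[mention.lower()]
            antecedent = next(
                (e for e in reversed(seen) if ENTITY_TYPES[e] == wanted), None
            )
            links.append((mention, antecedent))
        # Lowercased names (e.g. "google") match neither branch and are skipped.
    return links


print(resolve_nominal("Google hired Jenny. The company is growing."))
# [('The company', 'Google')]
```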

Salience also looks for other cues to ensure the best anaphora resolution possible. For instance, it checks whether the pronoun “I” appears inside quotation marks, so that it doesn’t confuse the author with someone else who is being quoted. For example: “‘I’ll have to find a new yoga class,’ says Jenny.” The “I” is inside quotation marks, so it belongs to Jenny, the speaker. However, when I use “I” without quotation marks, the pronoun belongs to me, Mekkin, the author.
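Here is one way such a quotation check might look, as a hypothetical sketch rather than Salience’s actual logic. It assumes quotations use double quotes (straight or curly), because single curly quotes collide with apostrophes such as the one in “I’ll”, and that the speaker is named in an attribution like `says Jenny` right after the closing quote. The names `QUOTE` and `resolve_first_person` are illustrative. Any “I” inside such a quotation goes to the speaker; every other “I” goes to the author.

```python
import re

# Assumed format: a double-quoted passage followed by "says Name" or "said Name".
QUOTE = re.compile(
    r'[“"](?P<quote>[^”"]*)[”"]\s*,?\s*(?:says|said)\s+(?P<speaker>[A-Z]\w+)'
)
# Standalone "I", including the "I" in contractions like "I’ll" or "I'm".
FIRST_PERSON = re.compile(r"\bI\b")


def resolve_first_person(text, author):
    """Return (offset, owner) for each "I": the quoted speaker if the "I"
    falls inside an attributed quotation, otherwise the author."""
    spans = [(m.start("quote"), m.end("quote"), m.group("speaker"))
             for m in QUOTE.finditer(text)]
    links = []
    for m in FIRST_PERSON.finditer(text):
        owner = next((who for start, end, who in spans
                      if start <= m.start() < end), author)
        links.append((m.start(), owner))
    return links


text = "“I’ll have to find a new yoga class,” says Jenny. I think she will."
print([who for _, who in resolve_first_person(text, "Mekkin")])
# ['Jenny', 'Mekkin']
```

An unattributed quotation falls back to the author in this sketch, which a fuller implementation would need to handle.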

But it’s not all fun and games. Not when it comes to the unfathomable “it”. With its lack of defining characteristics, it can cause text mining mayhem. For that reason, most text mining engines choose to ignore it altogether.

That’s all for today. Please remember to be a responsible writer: spay and neuter your expletives, and keep your pronouns close to home.
