SmartData Collective

Text Mining and Pronouns

mekkin

If there’s one piece of advice I can offer you, both for better text mining and better writing, it is this: please, please, please with a cherry on top, be clear with your pronouns.

There’s nothing that makes me sadder than a lost, lonely pronoun separated from its antecedent, the noun to which the pronoun refers. The process of determining pronoun ownership, and thereby determining who or what is being spoken about in a particular phrase, is called anaphora resolution, and it’s something we’ve been working on for a long time. If we were to say “Jenny wanted to try something new, so she went to yoga class”, anaphora resolution would identify the pronoun “she” as referring to Jenny.

Text mining engines have varying degrees of success depending on the pronouns involved. Most of them rely on a general model that looks at two qualities to determine pronoun ownership:

1. How far apart are the pronoun and the noun it refers to?

2. Do the pronoun and the noun it refers to look alike?

Distance is a good first clue to the antecedent. Good writers, like good pet owners, keep their pronouns on a short leash. If the pronoun is “she”, for example, in all likelihood the person being referred to will be the last woman introduced by name, and will be in the previous sentence, or at least in the same paragraph. Ex. “Jenny met a guy in yoga class. She is going on a date tonight.” This is obviously easier to figure out if the woman in question has what is traditionally understood as a female name like Jenny, which brings us to our second strategy for anaphora resolution.

If pronouns and nouns “look alike”, they’ll share a certain quality that allows us to rule out other antecedents. As in the instance above, the pronoun “she” is most likely attached to a woman’s name. Ex. “Jenny and Dave went on a date, but she faked food poisoning to get out early.” In this instance, “she” is probably not referring to Dave, because Dave is a traditionally male name. 
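To make the two strategies concrete, here is a minimal Python sketch that combines them: it walks backwards from each pronoun (distance) and keeps the nearest name whose gender agrees (look-alike). The tiny `NAME_GENDER` and `PRONOUN_GENDER` lexicons and the `resolve_pronouns` function are hypothetical, invented for this illustration; a real engine would rely on much larger resources or a trained model.

```python
import re

# Hypothetical toy lexicons, for illustration only.
NAME_GENDER = {"Jenny": "female", "Dave": "male"}
PRONOUN_GENDER = {"she": "female", "her": "female", "he": "male", "him": "male"}


def resolve_pronouns(text):
    """Return (pronoun, antecedent) pairs using recency plus gender agreement."""
    mentions = []  # names seen so far, in order of appearance
    resolved = []
    for token in re.findall(r"[A-Za-z']+", text):
        if token in NAME_GENDER:
            mentions.append(token)
            continue
        gender = PRONOUN_GENDER.get(token.lower())
        if gender is None:
            continue  # not a pronoun this sketch knows about
        # Walk backwards so the closest name with a matching gender wins.
        antecedent = next(
            (name for name in reversed(mentions) if NAME_GENDER[name] == gender),
            None,
        )
        resolved.append((token, antecedent))
    return resolved


print(resolve_pronouns(
    "Jenny and Dave went on a date, but she faked food poisoning to get out early."
))
```

Note that Dave is the closest name to “she” in that sentence, so distance alone would pick the wrong antecedent; the agreement check is what rules him out.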

The same principle holds true for nominal pronouns. Nominal pronouns are the kind of pronoun we employ when we write “the company” in an article when we are referring to, say, Google. We know which company is being referred to, because it is the topic of discussion, but we aren’t using the proper noun. In this case the look-alike strategy works very well. For example, Lexalytics’ text mining engine Salience knows Google is a company, and so can easily link Google to the nominal pronoun “the company”.
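The look-alike idea for nominal pronouns can be sketched the same way: match the phrase to the nearest earlier entity of the same type. This is only an illustration and not how Salience is implemented; the `ENTITY_TYPES` and `NOMINAL_TYPES` tables and the `resolve_nominals` function are hypothetical stand-ins for the entity knowledge a real engine would have.

```python
import re

# Hypothetical toy knowledge base, for illustration only.
ENTITY_TYPES = {"Google": "company", "Jenny": "person"}
NOMINAL_TYPES = {"the company": "company", "the firm": "company"}


def resolve_nominals(text):
    """Link each nominal phrase to the nearest earlier entity of the same type."""
    events = []  # (position, kind, surface form)
    for name in ENTITY_TYPES:
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            events.append((m.start(), "entity", name))
    for phrase in NOMINAL_TYPES:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for m in re.finditer(pattern, text, re.IGNORECASE):
            events.append((m.start(), "nominal", phrase))
    events.sort()  # process mentions in reading order

    seen = []
    links = []
    for _, kind, surface in events:
        if kind == "entity":
            seen.append(surface)
            continue
        wanted = NOMINAL_TYPES[surface]
        # Nearest earlier entity whose type matches the nominal phrase.
        match = next(
            (e for e in reversed(seen) if ENTITY_TYPES[e] == wanted), None
        )
        links.append((surface, match))
    return links


print(resolve_nominals("Google hired Jenny last year. The company says she is thriving."))
```

Here “Jenny” sits closer to “The company” than “Google” does, but she is typed as a person, so the type check skips her.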

Salience also looks for other qualities to ensure the best anaphora resolution possible. For instance, it looks for quotation marks in conjunction with the use of the pronoun “I”, so that it doesn’t confuse the author with someone else who is being quoted. For example, “‘I’ll have to find a new yoga class,’ says Jenny”. The “I” is inside quotation marks, so it belongs to Jenny, the speaker. However, when I use “I” outside quotation marks, the pronoun belongs to me, Mekkin, the author.
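A rough sketch of the quotation check might look like the following. It is an assumption-laden toy, not Salience’s actual logic: it only handles double quotation marks followed by “says” or “said” and a capitalized name, and the `attribute_first_person` function and its `author` parameter are hypothetical.

```python
import re

# A double-quoted span followed by "says"/"said" and a capitalized speaker name.
QUOTE = re.compile(r'["“]([^"”]*)["”]\s*,?\s*(?:says|said)\s+([A-Z][a-z]+)')
FIRST_PERSON = re.compile(r"\bI\b")


def attribute_first_person(text, author):
    """Return (character offset, owner) for each standalone "I" in the text."""
    # Record where each quoted span starts and ends, and who said it.
    quoted = [(m.start(1), m.end(1), m.group(2)) for m in QUOTE.finditer(text)]
    owners = []
    for m in FIRST_PERSON.finditer(text):
        # An "I" inside a quoted span belongs to that speaker, otherwise to the author.
        speaker = next(
            (who for start, end, who in quoted if start <= m.start() < end),
            author,
        )
        owners.append((m.start(), speaker))
    return owners


sample = '"I\'ll have to find a new yoga class," says Jenny. I think she will.'
print([owner for _, owner in attribute_first_person(sample, author="Mekkin")])
```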

But it’s not all fun and games. Not when it comes to the unfathomable “it”. With its lack of defining characteristics, it can cause text mining mayhem. For that reason, most text mining engines choose to ignore it altogether.

That’s all for today. Please remember to be a responsible writer: spay and neuter your expletives, and keep your pronouns close to home.
