The future and trends of Text Analytics

I recently attended a GATE seminar at the University of Sheffield. Having used GATE for quite some time now, I was happy to see that the GATE team is committed to developing the GATE Text Analysis Workbench by constantly adding new functionality.

Although many of the participants were PhD students, I was also happy to see people from companies that now wish to leverage the hidden knowledge in unstructured text. Whether it was analysis of patent information, intelligent search over photo captions for a large news agency or understanding what a customer wants, Text Analytics is becoming an important tool for making better decisions.

I also had the opportunity to speak with several people about the future of Text Analytics. What are we likely to see happening in Information Extraction and Text Analytics in the coming years?

First, we have to understand how Text Analytics delivers results. For a computer to ‘understand’ unstructured text, it has to be ‘taught’ that the word ‘Dollar’ is the currency of a country called ‘US’, and also that US, United States, USA and U.S.A. all refer to the same concept. This means that hundreds of thousands of concepts and synonyms have to be specified so that a computer can identify them in unstructured text. This process is called Text Annotation.
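To make the idea concrete, here is a minimal Python sketch of dictionary-based (gazetteer) annotation. It assumes a tiny hand-built `GAZETTEER` in which every surface form maps to a concept type and a canonical name; both the gazetteer and the `annotate` function are hypothetical illustrations, not GATE's API. A real system would need the hundreds of thousands of entries mentioned above, plus case handling and disambiguation.

```python
import re

# Hypothetical gazetteer: each surface form (synonym) maps to (concept type, canonical name).
GAZETTEER = {
    "US": ("Country", "United States"),
    "USA": ("Country", "United States"),
    "U.S.A.": ("Country", "United States"),
    "United States": ("Country", "United States"),
    "Dollar": ("Currency", "US Dollar"),
}

def annotate(text, gazetteer=GAZETTEER):
    """Return (start, end, surface, type, concept) for every gazetteer match in text."""
    # Try longer surface forms first so "USA" is preferred over its prefix "US".
    forms = sorted(gazetteer, key=len, reverse=True)
    # Lookarounds stop matches inside longer words (e.g. "US" inside "BUSINESS").
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(form) for form in forms) + r")(?!\w)"
    )
    annotations = []
    for match in pattern.finditer(text):  # matching is case-sensitive
        kind, concept = gazetteer[match.group(1)]
        annotations.append((match.start(), match.end(), match.group(1), kind, concept))
    return annotations

text = "The Dollar is the currency of the USA, also called the United States."
for start, end, surface, kind, concept in annotate(text):
    print(start, end, surface, kind, concept)
```

Note that "USA" and "United States" both resolve to the same canonical concept, which is exactly the synonym knowledge the computer has to be given.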

The gold standard of Text Annotation is annotation done by humans: a computer sifts through the text of a web page and annotates it with concepts, and these annotations are then checked against annotations made by humans on the same text. This shows how accurately the computer ‘understands’ the text and the concepts and entities it contains.
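Checking a computer's annotations against a human gold standard largely comes down to counting matches. The sketch below is a hypothetical `evaluate` function that assumes each annotation is a `(start, end, type)` tuple and that only exact matches count; it computes precision, recall and F-measure. Some evaluation tools also give credit for partial (overlapping) matches, which this sketch does not.

```python
def evaluate(system, gold):
    """Exact-match precision, recall and F1 of system annotations against gold ones.

    Each annotation is a hashable tuple, here assumed to be (start, end, type).
    """
    system, gold = set(system), set(gold)
    true_positives = len(system & gold)  # annotations the computer got exactly right
    precision = true_positives / len(system) if system else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
    return precision, recall, f1

# Hypothetical example: human (gold) annotations vs. a computer's annotations.
gold = {(4, 10, "Currency"), (34, 37, "Country"), (55, 68, "Country")}
system = {(4, 10, "Currency"), (34, 37, "Country"), (18, 26, "Currency")}
p, r, f1_score = evaluate(system, gold)
print(f"precision={p:.2f} recall={r:.2f} F1={f1_score:.2f}")
```

Here the computer found two of the three human annotations and added one spurious one, so precision, recall and F1 all come out at two thirds.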

So what does the future hold? First of all, as more unstructured text becomes available, there will be a greater need for ‘annotation farms’: groups of people who manually annotate free text, identifying an ever-growing number of company, manager and politician names, or anything else that has to be ‘taught’ to a computer. Annotation farms already exist, but the demand for this service will grow.

The second trend in Text Analytics could be something like what we have seen with the Netflix Prize. Suppose that you own a company that produces Brand ‘X’ and you wish to track your product’s reputation online. You would then submit a sample of your product’s mentions to various companies that analyze text and have them compete against each other on, for example, precision and recall. The one that consistently produces the best metrics (whether precision and recall, the kappa statistic or the F-measure) gets the job.
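Of the metrics listed, the kappa statistic is the least self-explanatory: it measures how far two sets of labels agree beyond what chance alone would produce. Below is a minimal sketch of Cohen's kappa used to pick a winner in such a competition. It assumes each vendor labels the same sample of Brand ‘X’ mentions as positive, negative or neutral; the labels and the vendor names are hypothetical.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two label sequences, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both sides.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both sides labelled at random with their own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sides used the same single label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical human labels for six mentions of Brand 'X', and two vendors' outputs.
human = ["pos", "neg", "neu", "pos", "neg", "pos"]
vendors = {
    "Vendor A": ["pos", "neg", "pos", "pos", "neu", "pos"],
    "Vendor B": ["pos", "neg", "neu", "pos", "neg", "pos"],
}
scores = {name: cohens_kappa(human, labels) for name, labels in vendors.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)
```

Vendor A agrees with the human labels on four of six mentions, but after correcting for chance its kappa is only 3/7 (about 0.43), while Vendor B matches perfectly and gets the job.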

A third trend could be the development of text analytics for specific concepts: Sentiment Analysis and Named Entity Recognition are hard work if one wants to produce sound and accurate results. So it is plausible that Text Analytics experts will choose a specific concept, for example the reputation of banks, and then focus on analyzing this very specific concept so that they achieve better metrics.
