On Text Analytics vs Machine Translation

March 15, 2012

 

I’ve made an interesting observation recently while talking to people about Thinkudo Enlighten: Text Analytics is often confused with Machine (automated) Translation. More than once, people have asked “How did you do the Chinese translation?” when I mentioned that Enlighten handles Sentiment Analysis in Chinese. So in this post, I’d like to clarify the difference between the two.

Each to Their Own

 


First and foremost, Text Analytics and Machine Translation both fall under the field of Natural Language Processing (NLP). Whether or not Machine Translation should be a sub-field of Text Analytics, I will leave to readers within academia to discuss. Personally, I would claim that Text Analytics covers the techniques that extract and normalize text into measurable data. These include topic extraction, word-cloud formation, text classification, and, of course, sentiment analysis. The normalized data can then be fed into other systems for analysis, visualization, and more.
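To make "extract and normalize text into measurable data" concrete, here is a minimal sketch in Python. It is only an illustration, not how Enlighten works: the function name `term_frequencies` and the simple English-only tokenizer are assumptions made for this example.

```python
import re
from collections import Counter

def term_frequencies(text):
    """Turn free text into measurable data: a count for each word token.

    Hypothetical helper for illustration; the tokenizer only handles
    plain English letters and apostrophes.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

counts = term_frequencies("Great phone, great price. Not a great battery.")
print(counts.most_common(3))
```

The resulting counts are the kind of normalized output that a word cloud, a classifier, or a sentiment model could take as input.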

Machine Translation, on the other hand, is a language-specific application of NLP techniques for a very human need. Instead of extracting information from the text, it transforms the text into another form. Granted, Machine Translation might use techniques similar to those of Text Analytics, such as term correlation, to achieve its goal. However, the problems they solve are different: one extracts information from text, the other rewrites it.

Misunderstanding

The misunderstanding might have occurred because most text analytics studies and results are geared toward the English language. This may lead to the misconception that English text is a requirement for Text Analytics problems. However, that is just not true. In fact, many of the theories and models proposed for English Text Analytics are applicable to other languages with modifications. To do so, domain knowledge of the target language is necessary to embed its grammar rules and text behaviors into the language model. As the n-gram study I shared in my post on Chinese segmentation shows, with appropriate preprocessing the underlying statistical patterns can still be observed and used in non-English languages. For us, most of the headaches are indeed in text preprocessing, which may include segmentation, homographs, encoding, and other challenges.
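As a rough illustration of n-gram statistics on non-English text, the sketch below counts character bigrams in a Chinese sentence without translating it. This is not the method from the segmentation post; the function name `char_ngrams` and the sample sentence are made up for this example, and splitting on punctuation stands in for real preprocessing.

```python
import re
from collections import Counter

def char_ngrams(text, n=2):
    """Count overlapping character n-grams within each run of word characters.

    Hypothetical helper for illustration. In Python 3, \\w matches CJK
    characters, so punctuation such as '，' and '。' acts as a boundary
    and no n-gram spans two clauses.
    """
    counts = Counter()
    for run in re.findall(r"\w+", text):
        for i in range(len(run) - n + 1):
            counts[run[i:i + n]] += 1
    return counts

# "I like this phone, I like this price."
sample = "我喜欢这个手机，我喜欢这个价格。"
bigrams = char_ngrams(sample)
print(bigrams.most_common(4))
```

Frequent bigrams such as 喜欢 ("like") emerge from the counts alone, which is the kind of signal a segmentation or sentiment model can build on directly in Chinese.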

 

 

Images extracted from Cross Validated

The two fields are solving fundamentally different problems, with Machine Translation having more direct, human-facing use cases than Text Analytics. Going forward, both have irreplaceable value in understanding human communication and expression. However, we should not confuse them or combine them without understanding the implications. If you are interested in finding out how their fusion can go wrong, my previous post covers that topic.

 
