On Text Analytics vs Machine Translation

March 15, 2012

I made an interesting observation recently while talking to people about Thinkudo Enlighten: Text Analytics is often confused with Machine (automated) Translation. More than once, people have asked “How did you do the Chinese translation?” when I mentioned that Enlighten handles Sentiment Analysis in Chinese. So in this post, I’d like to clarify the difference between the two.

Each to Their Own

 

First and foremost, Text Analytics and Machine Translation both fall under the field of Natural Language Processing (NLP). Whether Machine Translation should be considered a subfield of Text Analytics is a question I will leave to readers in academia. Personally, I would say that Text Analytics covers techniques that extract information from text and normalize it into measurable data. These include topic extraction, word-cloud formation, text classification, and, of course, sentiment analysis. The normalized data can then be fed into other systems for analysis, visualization, and more.
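To make "measurable data" concrete, here is a minimal sketch in Python. It is not how Enlighten works; the tiny word lists and the `analyze` function are hypothetical, made up purely for illustration. The point is only that the output is a record of numbers and terms rather than another piece of text.

```python
from collections import Counter

# Hypothetical toy lexicon, for illustration only (not a real sentiment resource).
POSITIVE = {"great", "love", "good"}
NEGATIVE = {"bad", "slow", "broken"}

def analyze(text):
    """Turn free text into a small record of measurable values."""
    # Naive whitespace tokenization with basic punctuation stripping.
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    counts = Counter(tokens)
    positive_hits = sum(counts[w] for w in POSITIVE)
    negative_hits = sum(counts[w] for w in NEGATIVE)
    return {
        "tokens": len(tokens),
        "top_terms": counts.most_common(2),
        "sentiment": positive_hits - negative_hits,
    }

record = analyze("Great phone, love the screen. The battery is bad.")
print(record)
```

A record like this can be stored, aggregated, or charted, which is what distinguishes it from the output of a translation system.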

Machine Translation, on the other hand, is a language-specific application of NLP techniques for a very human need. Instead of extracting information from the text, it transforms the text into another form. Granted, Machine Translation might use techniques similar to those of Text Analytics, such as term correlation, to achieve its goal. However, the two approach text from opposite directions: one reduces it to data, while the other re-expresses it in another language.

Misunderstanding

The misunderstanding might have occurred because most text analytics studies and results are geared toward the English language. This may lead to the misconception that English text is a requirement for Text Analytics. However, that is just not true. In fact, many of the theories and models developed for English Text Analytics are applicable to other languages with some modifications. Doing so requires domain knowledge of the target language, so that its grammar rules and text behaviors can be embedded into the language model. As the n-gram study in my post on Chinese segmentation showed, with appropriate preprocessing the underlying statistical models can still be observed and utilized for non-English languages. For us, most of the headaches are indeed in the text preprocessing, which may include segmentation, homographs, encoding, and other challenges.
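As a rough illustration of statistics gathered directly on Chinese text, with no translation step, the sketch below counts character bigrams. It is a simplification and not the method from the segmentation post: the sample sentence and the `char_ngrams` helper are assumptions made for this example, and real segmentation involves far more than counting.

```python
import re
from collections import Counter

def char_ngrams(text, n=2):
    """Return overlapping character n-grams, never spanning punctuation."""
    grams = []
    # \w+ matches runs of word characters, including CJK ideographs,
    # so punctuation such as the full-width comma splits the text into runs.
    for run in re.findall(r"\w+", text):
        grams.extend(run[i:i + n] for i in range(len(run) - n + 1))
    return grams

# Hypothetical sample: "I like this product, this product is very good"
sample = "我喜欢这个产品，这个产品很好"
grams = char_ngrams(sample, 2)
counts = Counter(grams)
print(counts.most_common(3))
```

Bigrams that recur, such as 产品 ("product"), surface as candidate words from the counts alone, which hints at why the same statistical ideas carry over once the preprocessing respects the language.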

 

 

Images extracted from Cross Validated

The two fields solve fundamentally different problems, with Machine Translation having more direct, human-facing use cases than Text Analytics. Going forward, both have irreplaceable value in understanding human communication and expression. However, we should not confuse them or combine them without understanding the implications. If you are interested in finding out how their fusion can go wrong, my previous post covers that topic.

 
