On Text Analytics vs Machine Translation

March 15, 2012

 

I’ve made an interesting observation recently while talking to people about Thinkudo Enlighten. It concerns the confusion between Text Analytics and Machine (automated) Translation. More than once, people have asked “How did you do the Chinese translation?” when I mentioned that Enlighten handles Sentiment Analysis in Chinese. So in this post, I’d like to clarify the difference between the two.

Each to Their Own

 


First and foremost, Text Analytics and Machine Translation both fall under the field of Natural Language Processing (NLP). Whether Machine Translation should be considered a subfield of Text Analytics is a question I will leave to readers within academia. Personally, I would say that Text Analytics covers the techniques that extract information from text and normalize it into measurable data. These include topic extraction, word-cloud formation, text classification, and, of course, sentiment analysis. The normalized data can then be fed into other systems for analysis, visualization, and more.
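To make "normalizing text into measurable data" concrete, here is a minimal Python sketch. It is not how Enlighten works; the tiny word lists and the function names are made up purely for illustration. It turns a sentence into term counts (the raw material of a word cloud) and a crude lexicon-based sentiment score.

```python
import re
from collections import Counter

# Hypothetical toy lexicons, for illustration only (not a real sentiment resource).
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "poor", "hate"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def term_counts(text):
    """Count how often each token occurs (the numbers behind a word cloud)."""
    return Counter(tokenize(text))


def sentiment_score(text):
    """Number of positive tokens minus number of negative tokens."""
    tokens = tokenize(text)
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    return positives - negatives


review = "Great phone, great battery, bad screen"
counts = term_counts(review)
score = sentiment_score(review)
```

Both outputs are plain numbers keyed by terms, which is exactly the kind of data that can be handed to a downstream system for charting or aggregation.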

Machine Translation, on the other hand, is a language-specific application of NLP techniques for a very human need. Instead of extracting information from the text, it transforms the text into another form. Granted, Machine Translation might use techniques similar to those of Text Analytics, for instance term correlation, to achieve its goal. However, the problems they solve come from two separate directions.

Misunderstanding

The misunderstanding might have occurred because most text analytics studies and results are geared toward the English language. This may lead to the misinterpretation that English text is a requirement for Text Analytics problems. However, that is just not true. In fact, many of the theories and models proposed for English Text Analytics are applicable to other languages with modifications. Doing so requires domain knowledge of the target language, so that its grammar rules and text behaviors can be embedded into the language model. As with the n-gram study I shared in my post on Chinese segmentation, with the appropriate preprocessing the underlying statistical models can still be observed and utilized for non-English languages. For us, most of the headaches are indeed in the text preprocessing, which may include segmentation, homographs, encoding, and other challenges.
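As a rough illustration of that point, the sketch below counts character n-grams in Chinese text without translating anything. It is a simplified stand-in, not the method from my segmentation post: it assumes the input is already decoded Unicode, it treats only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) as text, and the function names are hypothetical.

```python
import re
from collections import Counter

# Assumption: only the basic CJK Unified Ideographs block counts as text;
# punctuation, Latin letters, and whitespace act as boundaries.
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def char_ngrams(text, n=2):
    """Return overlapping character n-grams from each run of Chinese characters."""
    grams = []
    for run in CJK_RUN.findall(text):
        # Runs shorter than n yield an empty range and contribute nothing.
        grams.extend(run[i:i + n] for i in range(len(run) - n + 1))
    return grams


def ngram_counts(texts, n=2):
    """Aggregate n-gram frequencies over a collection of texts."""
    counts = Counter()
    for text in texts:
        counts.update(char_ngrams(text, n))
    return counts


corpus = ["我喜欢这个手机,很好用", "我喜欢这个屏幕"]
bigrams = ngram_counts(corpus, n=2)
```

Once the text has been cut into units this way, the frequency table has the same shape as one built from English words, so the same statistical models can be applied on top of it. The language-specific work sits in the preprocessing step, not in the statistics.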

 

 

Images extracted from Cross Validated

The two fields solve fundamentally different problems, with Machine Translation having more direct and human-applicable use cases than Text Analytics. Going forward, both have irreplaceable value in understanding human communication and expression. However, we should not confuse them or combine them without understanding the implications. If you are interested in finding out how their fusion can go wrong, my previous post covers that topic.

 

