Foreign languages and data streams

June 30, 2010

When you listen to a foreign language you know, it’s like you’re hard of hearing. I was thinking about that yesterday. I was sitting here in my living room, a video camera pointed at me, and an iPhone taped to a pillow off camera to my right. I was participating in a Spanish-language show called Oppenheimer Presenta, discussing Numerati themes with others from Mexico, Colombia and Argentina. The sound quality was iffy, at best. While I kept my eyes glued to the camera, I struggled to follow the conversation.

If it had been American English, I would have had no trouble at all. The reason, I’ve learned, is that we carry templates in our heads. Potential sentences are dancing around there just waiting to be activated. We need only a sparse stream of data to fill them out. (That’s why I can easily listen to a baseball game through a blizzard of static, while a lecture about the political situation in East Timor through the same connection would be lost to me.)

I’m thinking a lot about these language issues as I write the book about Watson, IBM’s Jeopardy-playing computer. For Watson, language is foreign. It needs lots of surrounding words in a sentence to provide context for each one.

We also need additional data when listening to foreigners speaking English. I remember a Spanish woman telling me once that Americans pretended they didn’t understand her when she spoke English. I asked for an example. Once she was in an airplane, she said, and politely asked the stewardess for a glass of milk. All she got was a blank stare. I asked how she asked for that milk, and she said, with two of the shortest syllables imaginable: “Meelk please.” I told her that two syllables weren’t much to go on. If she had said, “I would like a glass of cold milk, please,” it would have been a cinch, no matter how she pronounced “milk.”

Back to Watson (which is the way my brain is working these days). Sometimes in Jeopardy, the clues have too many words for Watson, which can lead to confusion. Take this clue, from a 2005 show. Under the category “Andrew Jackson’s Hermitage,” it reads:

To get to the Hermitage from Nashville, you take a road called this, same as Jackson’s nickname

The computer may be tempted to start scouring its geographic data to look for highways around Nashville. My hunt on Google Maps shows that the road from Nashville is Lebanon Pike, and that the answer to the clue, Old Hickory (Blvd), lies beyond Jackson’s Hermitage. Long story short: lots of confusion for the computer. But if the clue had simply asked for Jackson’s nickname, it would have been a piece of cake.

Old Hickory’s Hermitage
