Foreign languages and data streams

June 30, 2010
When you listen to a foreign language you know, it’s like you’re hard of hearing. I was thinking about that yesterday. I was sitting here in my living room, a video camera pointed at me, and an iPhone taped to a pillow off camera to my right. I was participating in a Spanish-language show called Oppenheimer Presenta, discussing Numerati themes with others from Mexico, Colombia and Argentina. The sound quality was iffy, at best. While I kept my eyes glued to the camera, I struggled to follow the conversation.

If it had been American English, I would have had no trouble at all. The reason, I’ve learned, is that we carry templates in our head. Potential sentences are dancing around there just waiting to be activated. We need only a sparse stream of data to fill them out. (That’s why I can easily listen to a baseball game through a blizzard of static, while a lecture about the political situation in East Timor through the same connection would be lost to me.)

I’m thinking a lot about these language issues as I write the book about Watson, IBM’s Jeopardy-playing computer. For Watson, language is foreign. It needs lots of surrounding words in a sentence to provide context for each one.

We also need additional data when listening to foreigners speaking English. I remember a Spanish woman telling me once that Americans pretended they didn’t understand her when she spoke English. I asked for an example. Once she was in an airplane, she said, and politely asked the stewardess for a glass of milk. All she got was a blank stare. I asked how she asked for that milk, and she said, with two of the shortest syllables imaginable: “Meelk please.” I told her that two syllables weren’t much to go on. If she had said, “I would like a glass of cold milk, please,” it would have been a cinch, no matter how she pronounced “milk.”

Back to Watson (which is the way my brain is working these days). Sometimes in Jeopardy, the clues have too many words for Watson, which can lead to confusion. Take this clue, from a 2005 show. Under the category “Andrew Jackson’s Hermitage,” it reads:

To get to the Hermitage from Nashville, you take a road called this, same as Jackson’s nickname

The computer may be tempted to start scouring its geographic data to look for highways around Nashville. My hunt on Google Maps shows that the road from Nashville is Lebanon Pike, and that the answer to the clue, Old Hickory (Blvd), lies beyond Jackson’s Hermitage. Long story short: lots of confusion for the computer. But if the clue had simply asked for Jackson’s nickname, it would have been a piece of cake.

Old Hickory’s Hermitage