This past weekend’s Washington Post had an interesting article on the future of translation featuring Google and everybody’s favorite U.S. defense technology lab, DARPA. The article took a look at some of the iPod-sized translation devices that DARPA is testing in Iraq, as well as the phenomenon of Google Translate which, if you haven’t tried, you really should because it’s pretty damn good.
The article was a good introduction to the topic, but it really only glossed over the issue and pretty much entirely missed the translation link between Google and DARPA, which is good because that's something I go into in depth in my book. In a nutshell, researchers have been trying to get computers to do automatic translation since the fifties, with little success. That's because they typically programmed computers to interpret a language's grammatical rules, which was not only excruciatingly time-consuming but also generally produced laughable results, because people rarely speak with proper grammar.
After a while, a small number of smarty-pants researchers decided there must be a better way, and so a concept known as "statistical machine translation" was born. It's got a boring-sounding name, but wait – it's really cool! The idea was, rather than having a computer work off a predetermined set of grammatical rules, why not let it make its own decisions based on how language is really used? The computer learns how a language works by digesting actual documents, which it scans for patterns. The more documents it has, the more accurate its statistical model of the language becomes. In other words, if you give the computer a handful of documents in, say, Arabic and English, its translation won't be very good. But if you give it thousands or millions, it can statistically analyze the language and translate it into another with a fairly high degree of accuracy.
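If you're curious what "scanning parallel documents for patterns" looks like in practice, here's a toy sketch in Python. It implements the classic building block of statistical MT – an IBM Model 1 word-alignment trainer – on a made-up three-sentence French-English "corpus." This is just an illustration of the statistical idea, not Google's actual system, and the tiny corpus is invented for the example: the program is never told any grammar, yet it figures out from co-occurrence alone which words translate to which.

```python
from collections import defaultdict

def train_ibm_model1(pairs, iterations=10):
    """Estimate word-translation probabilities t(e|f) from a
    sentence-aligned parallel corpus, via IBM Model 1's EM loop."""
    corpus = [(f.split(), e.split()) for f, e in pairs]
    e_vocab = {e for _, es in corpus for e in es}
    # Start uniform: every co-occurring word pair is equally plausible.
    t = {(e, f): 1.0 / len(e_vocab)
         for fs, es in corpus for f in fs for e in es}
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (e, f) alignments
        total = defaultdict(float)  # expected alignments involving f
        for fs, es in corpus:
            for e in es:
                # How strongly does each foreign word "explain" e?
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    delta = t[(e, f)] / norm
                    count[(e, f)] += delta
                    total[f] += delta
        # Re-normalize the expected counts into fresh probabilities.
        t = {(e, f): c / total[f] for (e, f), c in count.items()}
    return t, e_vocab

# A toy parallel "corpus" -- three sentence pairs, no grammar anywhere.
pairs = [
    ("la maison", "the house"),
    ("la fleur", "the flower"),
    ("maison bleue", "blue house"),
]
t, e_vocab = train_ibm_model1(pairs)

def best_translation(f):
    return max(e_vocab, key=lambda e: t.get((e, f), 0.0))

print(best_translation("maison"))  # -> house
print(best_translation("fleur"))   # -> flower
```

Note what's happening: "maison" appears alongside both "the house" and "blue house," so the statistics pile up on "house"; give the model millions of sentence pairs instead of three and you get something like Google Translate.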
DARPA liked this approach and, just as it did with its robot car races, held contests starting in 2002 to spur interest and advances. One of the contest winners was a German fellow named Franz Josef Och, who used zillions and zillions of United Nations documents – which are all human-translated into the UN's six official languages – to create his own model. Soon after winning the DARPA prize, Och found himself poached by Google. When I interviewed him out at the Googleplex in Mountain View back in January, Och explained that the company and DARPA have a similar-yet-reverse interest in translation: the military wants to be able to translate languages such as Arabic and Chinese into English, while Google wants to be able to translate the English-dominant web into other languages so it can expand its advertising business. How's that for a nifty military-commercial link?
So statistical machine translation holds a world of promise for finally bridging the world's language barriers, as the Washington Post story details. But there's a far more amazing possibility. What if you applied statistical machine translation not just to languages, but to something like a personality? What if you could feed a computer enough raw digital data about a person – their e-mails, text messages, banking records, financial transactions, photos, purchases, video game scores and so on – to have it create a reasonable estimation of what that person is really like? If you've seen Caprica, the prequel to the most awesome show ever, Battlestar Galactica, this is exactly how the first Cylon artificial intelligence was created (I am really jazzed for the new series, which begins in January). People are already becoming wary of Google's size and power, but what if the company ends up becoming the one to finally create a fully functional artificial intelligence? How cool and/or scary would that be?
As luck would have it, I’m talking to Och again later today. I’ll run this somewhat far-out idea by him and post his response.
UPDATE: So I spoke to Franz Josef Och the other day and brought up the question of whether Google could use this sort of machine translation to, in fact, create a Cylon. While he was iffy on actual killer robots, he essentially endorsed the idea. “The technology we’re using in machine translation I could very well imagine would be useful in areas where we need to learn about the meaning of words and the meaning of things and in general where we need to correlate a lot of events that might or might not be related to each other,” he said. “Many people see different things in the term ‘artificial intelligence,’ but it will definitely lead to more intelligent software.” For more on this concept, and how Google sees intelligent software evolving, check out a blog post the company published last year. Fascinating stuff.