Google blasts through computer's language barrier - SantaFeNewMexican.com
Health and Science

Google blasts through computer's language barrier


Teaching a computer to understand languages isn't rocket science — it's not nearly that easy, said Peter Norvig, director of research at Google Inc.

It takes a limited number of calculations to send a spacecraft to the moon, Mars or other planets. And while the calculations aren't so simple, they are fairly easily managed by a computer, he said.

But learning what words mean, how they fit together and how they translate into other languages is much more challenging, he said.

"In physics, we've been able to use computers very well for a long time. We can get our spacecraft to the moon or Mars very accurately," Norvig said. "But part of the problem with language is there's lots and lots of rules, and there are lots and lots of exceptions to those rules."

About two years ago, rather than relying on grammar rules, Google began taking a different approach to teaching a computer to understand languages, one closer to the way humans learn them, he said.

That method is something Norvig plans to describe in detail at a free lecture Wednesday night, called "Practice Makes Perfect: How Billions of Examples Lead to Better Models of Language, Pictures and Other Things," sponsored by the Santa Fe Institute.

What the strategy comes down to is programming the computer to learn through examples. Exposed to an abundance of texts in a specific language, the computer can learn to pick out patterns, Norvig said.

And if you teach it to compare two different languages side by side, it can figure out which words or characters generally correspond to one another.

"Most of the answer to how you do this is counting — it's just the fancy phrase for counting is 'probability theory,' " Norvig said.

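As a rough illustration of "counting" as probability, the sketch below estimates how likely one word is to follow another by tallying adjacent word pairs in a handful of sentences. The function name `bigram_probabilities` and the three-sentence corpus are made up for this example; the article does not describe Google's actual code.

```python
from collections import Counter


def bigram_probabilities(sentences):
    """Estimate P(next word | word) by counting adjacent word pairs."""
    pair_counts = Counter()
    first_word_counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        # Walk over each word and the word that follows it.
        for first, second in zip(words, words[1:]):
            pair_counts[(first, second)] += 1
            first_word_counts[first] += 1
    # Turn raw counts into conditional probabilities.
    return {
        pair: count / first_word_counts[pair[0]]
        for pair, count in pair_counts.items()
    }


# Hypothetical toy corpus; a real system would read billions of words.
corpus = [
    "the dog barks",
    "the dog sleeps",
    "the cat sleeps",
]
probs = bigram_probabilities(corpus)
print(probs[("the", "dog")])  # share of times "dog" followed "the"
```

With only three sentences the estimates are crude. The same counting becomes more reliable as the number of examples grows.
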
Google's language tools let you, for example, search for a word or phrase in English. The tools then find results for that search among Web sites written in Spanish and translate them, so the English-language user can sort through those links in English.

So far, it works with about 15 languages, but the hope is to add more soon, he said.

The tools also let you translate Web pages and text, among other things.

The key to building the language tools program was to feed it lots and lots of texts, gathering them from groups that already have documents translated into several languages, such as international news sites and United Nations archives, Norvig said.

"Then we build a model that says, 'Here's all these translations, and we know this page is a translation of that page, but we don't know exactly which corresponds to which,' " Norvig said. "What we have, though, is probabilities. Like the first sentence in English is similar to the first sentence in Chinese, but it could be the first two sentences, the first three, or it could be one to one."

After one example, the computer is still confused. But after a million examples, it starts to make associations that make sense, he said.

For instance, a Chinese character may come up often in relation to the English word "dog" or "terrier." And from that the computer learns to make a connection, he said.
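
The article does not say which model Google's system uses. As a hedged illustration of how repeated examples can tie a foreign word to "dog," here is a minimal sketch of IBM Model 1, a textbook word-alignment method, trained on three made-up English-Spanish sentence pairs. The function names and the toy data are assumptions for illustration only.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate t[(e, f)] = P(English word e | foreign word f) with EM."""
    english_vocab = {e for english, _ in pairs for e in english}
    # Start with no preference: every pairing is equally likely.
    t = defaultdict(lambda: 1.0 / len(english_vocab))
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for english, foreign in pairs:
            for e in english:
                # Split one unit of credit for e among the foreign words
                # in the same sentence pair, using the current estimates.
                norm = sum(t[(e, f)] for f in foreign)
                for f in foreign:
                    share = t[(e, f)] / norm
                    count[(e, f)] += share
                    total[f] += share
        # Re-estimate the probabilities from the accumulated credit.
        for (e, f), c in count.items():
            t[(e, f)] = c / total[f]
    return dict(t)


def best_english(t, foreign_word):
    """Return the English word most strongly tied to foreign_word."""
    candidates = {e: p for (e, f), p in t.items() if f == foreign_word}
    return max(candidates, key=candidates.get)


# Hypothetical sentence pairs, already split into words.
pairs = [
    (["the", "dog"], ["el", "perro"]),
    (["the", "cat"], ["el", "gato"]),
    (["a", "dog"], ["un", "perro"]),
]
table = train_ibm_model1(pairs)
print(best_english(table, "perro"))
```

No dictionary is supplied. The link between "perro" and "dog" emerges only because the two words keep appearing in the same sentence pairs, which is the kind of association the article describes.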

"We've been able to do this, and our translation software is usually right at the top of a search," Norvig said. "And we've even been able to do this in some languages where nobody on the team speaks the language."

The resulting translations aren't perfect, but they do get the general point across, he added.

"They come out understandable, but you don't go more than three or four sentences before you realize this was not written by a native speaker," Norvig said.

Still, the more examples the software gets, the better it translates. Norvig said he suspects there is a ceiling to how well the approach will work, but that ceiling is still pretty far away.

Google is also working on a similar method to sort through images.

Right now, image search programs generally just look at words around images.

But the Google program will collect some features of images, like horizontal or vertical lines that might be similar in a million pictures that come up when somebody searches for an image of dolphins, Norvig said.

"It collects a range, then looks at which pictures are nearest to the center of that range," Norvig said. "And in a search we try to bring up the center of that range first."

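One simple way to read "nearest to the center of that range" is to average the feature values of all candidate pictures and rank each picture by its distance from that average. The sketch below does that under stated assumptions: the two-number feature vectors, the file names and the function `rank_by_centrality` are hypothetical, and the article does not say how Google measures image features or distance.

```python
import math


def rank_by_centrality(features):
    """Sort image names so those closest to the average features come first."""
    vectors = list(features.values())
    dims = len(vectors[0])
    # The "center" here is the plain average of every feature.
    center = [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]

    def distance_to_center(name):
        return math.dist(features[name], center)

    return sorted(features, key=distance_to_center)


# Hypothetical features: [share of horizontal lines, share of vertical lines]
images = {
    "dolphin_a.jpg": [0.80, 0.20],
    "dolphin_b.jpg": [0.70, 0.30],
    "dolphin_c.jpg": [0.75, 0.25],
    "car.jpg": [0.10, 0.90],
}
ranking = rank_by_centrality(images)
print(ranking)
```

In this toy data the picture that looks least like the others lands at the bottom of the ranking. With only four pictures, though, a single outlier still pulls the average toward itself, which is one reason such a method would need many examples.
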
The image programming is still a work in progress. That part is even harder than language, Norvig said.

"The vision stuff is not quite there yet," he said, adding that, with more examples, accurate image searches may not be far behind his company's language translation program.

Contact Sue Vorenberg at 986-3072 or svorenberg@sfnewmexican.com.