The challenges of translation for Facebook

Source: Slator
Story flagged by: Jared Tabor

On April 19–20, 2017, Necip Fazil Ayan, Engineering Manager at Facebook, gave a 20-minute update at the F8 Developer Conference on the current state of machine translation at the social networking giant.

Slator reported in June 2016 on Facebook’s big expectations for NMT. Then, Alan Packer, Engineering Director and head of the Language Technology team at Facebook, predicted that “statistical or phrase-based MT has kind of reached the end of its natural life” and the way to go was NMT.

Ten months on and Facebook says it is halfway there. The company claims that more than 50% of machine translations across the company’s three platforms — Facebook, Instagram, and Workplace — are powered by NMT today.

Facebook says it started exploring migrating from phrase-based MT to neural MT two years ago and deployed the first system (German to English) using the neural net architecture in June 2016.

Since then, Ayan said, 15 systems have been deployed, covering high-traffic language pairs such as English to Spanish, English to French, and Turkish to English.

No tech presentation would be complete without a healthy dose of very large numbers. Ayan said Facebook now supports translation in more than 45 languages (2,000 language combinations), generates two billion “translation impressions” per day, and serves translations to 500 million people daily and 1.3 billion monthly (that is, everyone, basically).

Ayan admitted that translation continues to be a very hard problem. He pointed to informal language as being one of the biggest obstacles, highlighting odd spellings, hashtags, urban slang, dialects, hybrid words, and emoticons as issues that can throw language identification and machine translation systems off balance.
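
Informal input of this kind typically gets some normalization before it reaches language identification or translation. The following is a purely illustrative Python sketch of that idea; identify_language and the specific regular expressions are hypothetical placeholders, not Facebook's actual pipeline.

```python
import re

# Illustrative only: strip a few kinds of social-media noise (hashtag marks,
# simple emoticons, stretched spellings) before language identification.
# `identify_language` is a placeholder, not a real Facebook or library API.

def identify_language(text: str) -> str:
    """Stand-in for a real language identification model."""
    return "und"  # 'undetermined' in this sketch

def normalize_informal_text(text: str) -> str:
    text = re.sub(r"#(\w+)", r"\1", text)                    # '#tbt' -> 'tbt'
    text = re.sub(r"[:;=8][-o*']?[)(\]\[dDpP/]", "", text)   # drop simple emoticons
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)               # 'soooo' -> 'soo'
    return text.strip()

print(identify_language(normalize_informal_text("soooo good #blessed :)")))
```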

Another key challenge for Facebook: low-resource languages. Ayan admitted Facebook has very limited resources for the majority of the languages it translates.

“For most of these languages, we don’t have enough data,” he said, meaning parallel data or high-quality translation corpora. What is available, even for many low-resource languages, are large corpora of monolingual data.
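
As the full article explains (and as one commenter quotes below), Facebook's workaround is to run that monolingual text through machine translation and treat the output as an artificial parallel corpus for training NMT. Below is a minimal, hypothetical Python sketch of that idea; translate() is a stand-in for whatever MT system is available, not a real API.

```python
# Hypothetical sketch: manufacture a synthetic parallel corpus from monolingual
# text by machine-translating each sentence. `translate` is a placeholder.

def translate(sentence: str, src: str, tgt: str) -> str:
    """Stand-in for an existing MT system (phrase-based or neural)."""
    return sentence.upper()  # dummy output so the sketch runs end to end

def build_synthetic_corpus(monolingual_sentences, src, tgt):
    """Pair each monolingual source sentence with its machine translation,
    producing (source, synthetic target) examples for NMT training."""
    return [(s, translate(s, src, tgt)) for s in monolingual_sentences]

if __name__ == "__main__":
    mono = ["kumusta ka?", "magandang umaga"]  # monolingual low-resource text
    for src_sent, tgt_sent in build_synthetic_corpus(mono, "fil", "en"):
        print(src_sent, "=>", tgt_sent)
```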

Read the full article >>

Comments about this article


LilianNekipelov (United States, Russian to English)   Apr 26, 2017

Yeah, it will be hard, I guess, even if the technology got better, because language is basically a collection of idiolects with some shared features. The only solution would be to forbid slang anywhere on the internet and create glossaries of the words to use. A book with simplified grammar instructions might also be useful. Don't take it seriously, please. At least not now. I cannot see any other solution, though.

 
Michele Fauble (United States, Norwegian to English)   Apr 26, 2017

Really?

"So, Ayan explained, Facebook takes monolingual data (i.e., text), runs it through machine translation, and, voilà, an artificial corpus of bilingual data is created. Apparently, using a large machine-generated parallel corpus to train an NMT system is still better than using a high quality but only small corpus."

 
