
How Neural Machine Learning is Changing our Words
Exploring the advances in transcription and translation
Human speech is intricate. In the past, software that transcribed the spoken word to text, or translated text into a different language, was only moderately accurate.
According to research by Cornell University, the poor performance of such software had two causes. First, a massive amount of data (words, phrases and sentences in different contexts) would have to be input. Second, extensive labeling across wide fields of interest would be needed to increase accuracy.
Advances in artificial intelligence have helped move speech conversion from classical rule-based systems to neural machine learning models that learn from examples. Neural machine learning examines whole sentences in context, as opposed to previous technology that selected individual words or phrases. The structure is actually simpler than that of phrase-based models. There is no separate language model, translation model and reordering model, just a single sequence model that predicts one word at a time. Each prediction is conditioned on the entire source sentence and on the entire target sequence produced so far.
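To make the "one word at a time" idea concrete, here is a minimal sketch of greedy decoding in Python. It is an illustration only: the names `greedy_decode`, `toy_scores` and `TOY_LEXICON` are hypothetical, and the tiny lookup table stands in for a trained neural network, which a real system would use to score every candidate word.

```python
from typing import Callable, Dict, List, Sequence

END = "</s>"  # hypothetical end-of-sentence marker

# A scorer receives the whole source sentence and the target words produced
# so far, and returns a score for each candidate next word.
Scorer = Callable[[Sequence[str], Sequence[str]], Dict[str, float]]


def greedy_decode(source: Sequence[str], next_word_scores: Scorer,
                  max_len: int = 20) -> List[str]:
    """Build the target sentence one word at a time, always taking the top-scoring word."""
    target: List[str] = []
    for _ in range(max_len):
        # Each prediction sees the entire source and the entire target so far.
        scores = next_word_scores(source, target)
        word = max(scores, key=scores.get)
        if word == END:
            break
        target.append(word)
    return target


# Toy stand-in for a trained model (an assumption for illustration only).
TOY_LEXICON = {"the": "le", "cat": "chat", "sleeps": "dort"}


def toy_scores(source: Sequence[str], target: Sequence[str]) -> Dict[str, float]:
    position = len(target)
    if position >= len(source):
        return {END: 1.0}
    guess = TOY_LEXICON.get(source[position], source[position])
    return {guess: 0.9, END: 0.1}


print(greedy_decode(["the", "cat", "sleeps"], toy_scores))
```

The single `next_word_scores` function plays the role of the one sequence model described above: there is no separate language, translation or reordering component, only repeated next-word prediction.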
While neural machine learning spans different applications, many are based on the technology’s ability to transcribe or translate human speech.
Transcription
One of the most reliable services using neural machine learning technology for transcription is Silicon Valley start-up Otter.ai. Developed by Sam Liang, a Stanford-educated electrical engineer who was a member of the original team that designed Google Maps, the company has caught the attention of corporations, journalists and students because of its high degree of accuracy in transcribing speech.
For example, spoken words from a lecture or interview can now be converted to text with minimal errors. Or you can record ideas, compose letters or make lists while driving your car. Since notes are digital, they become searchable and shareable. You can even snap a pic of a speaker or whiteboard and Otter.ai will insert it into your notes.
Otter.ai credits improvements in software technology with making accurate transcription possible. Under ideal circumstances, the company is achieving accuracy rates approaching 95 percent. Liang also says advances in data compression have helped. A huge amount of input is still needed for accurate transcription to take place, but the storage that data requires has been minimized. To put this in perspective, all of the words you have spoken in your entire life can be compressed to under two terabits of storage.
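The article does not say how Otter.ai measures accuracy. One common convention, assumed here, is word accuracy: one minus the word error rate (WER), where WER is the word-level edit distance between a reference transcript and the machine transcript, divided by the number of reference words. The Python sketch below uses a hypothetical function name and made-up example sentences.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Made-up 20-word example with a single misrecognized word ("three" -> "tree").
reference = ("please send the meeting notes to the whole team before noon "
             "and remind everyone that the demo starts at three")
hypothesis = reference.replace("three", "tree")

wer = word_error_rate(reference, hypothesis)
print(f"word error rate: {wer:.2%}, word accuracy: {1 - wer:.2%}")
```

Under this assumed definition, a 95 percent accuracy rate corresponds to about one wrong, missing or extra word in every 20 spoken.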

For any neural network to work, huge quantities of human speech still need to be captured. While this technology allows us to use human language in ways that were unthinkable in the past, it also brings up privacy concerns for the future. Otter.ai and other software like it will ask for permission to use clients’ uploaded data to help train neural network programs. How will we ensure that the speech shared so machines can learn is kept private? We’ve seen how well data privacy promises are kept in other applications.
Also, governments or nefarious actors could potentially monitor conversations on a street corner or in a private setting, transcribe and translate the speech from any language, and digitally distribute the resulting text anywhere in the world within moments.
If you would like to try Otter.ai transcription, perhaps for your non-sensitive audio files, the app is free to use for up to 600 minutes a month. A paid version expands that to 6,000 minutes a month for $9.95.
Translation
Having a machine interpret speech is challenging given the inherent ambiguity and flexibility of human language. Ironically, one of the first tasks for early computers was to learn how to translate words into a different language. Today, neural machine learning is bridging the gap between previous machine versions and human translation, with a high degree of accuracy.
One leading example is Google Translate. The company launched an updated system, Google Neural Machine Translation (GNMT), in 2016, and it is now becoming more widely available. The tool is approaching human-level translation, using sophisticated artificial intelligence to produce startlingly accurate results.
As with transcription, neural translation is a huge leap over prior systems, taking advantage of machine learning progress. GNMT looks at an entire sentence as a whole, figuring out the broader context and the most relevant translation. GNMT then rearranges and adjusts the sentence using proper grammar.
Google has already applied GNMT across its platform, including online at translate.google.com, through Google search and the Google Search app, and in the Google Translate apps for iOS and Android. There are paid versions of Google Translate, both as an API for developers and as a plug-in for platforms such as WordPress, with results editable by the user. The system learns over time, making self-improvements that result in better translations the longer it works.
Human evaluators rated these improved machine translations as very close to human translations. For example, on a scale of 0 to 6, English to Spanish was rated 5.43, where human translators were rated 5.5. For the more difficult Chinese to English translation, GNMT scored 4.3 and humans scored 4.6. Overall, Google reported that GNMT reduced translation errors by about 60 percent compared with its previous phrase-based system.
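As a quick piece of arithmetic on the ratings quoted above, the Python sketch below expresses each GNMT score as a share of the corresponding human score. The function name and the dictionary are illustrative only, and the numbers are the ones given in this article.

```python
# Ratings on the 0-to-6 scale quoted in the article: (GNMT, human).
ratings = {
    "English to Spanish": (5.43, 5.5),
    "Chinese to English": (4.3, 4.6),
}


def share_of_human_score(machine: float, human: float) -> float:
    """Return the machine rating as a fraction of the human rating."""
    return machine / human


for pair, (machine, human) in ratings.items():
    share = share_of_human_score(machine, human)
    print(f"{pair}: {share:.1%} of the human rating (gap of {human - machine:.2f} points)")
```

On these figures, GNMT reaches roughly 98.7 percent of the human rating for English to Spanish and roughly 93.5 percent for Chinese to English.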
That’s not to say the system is without flaws, however. GNMT still occasionally drops a word or mistranslates rarely used terms. There also seems to be difficulty with ambiguous English words like “it.”
Other Applications
Improvements in speech recognition technology can be attributed to the same neural machine technology, including in virtual speech assistants such as Apple’s Siri, Amazon’s Alexa, Google Voice, Microsoft Cortana and others. There have been privacy concerns here too, and not just about assistants that may be listening in. A complaint was filed with the FTC against Samsung in 2015 because its smart TVs were capturing and storing conversations occurring in the room.
The Future
We have only begun to tap into the potential of neural machine learning as it relates to speech, and the devices that are listening to and interpreting our words. As these AI programs get smarter, they will also push privacy concerns into new frontiers at home and in the workplace.