Microsoft designs a Star Trek-style voice translator
It may be the best alternative to learning a new language. Researchers at Microsoft have demonstrated software that translates spoken English into spoken Chinese almost instantly, preserving the cadence of the speaker's voice, which could make conversations more efficient and personal. The first public demonstration was given by Rick Rashid, Microsoft's director of research, on October 25 at an event in Tianjin, China. "I am speaking in English and you are hearing my words in Chinese, in my own voice," Rashid told the audience. The system works by recognizing a person's words, quickly converting the text into correctly ordered Chinese sentences, and then sending the data to speech-synthesis software that has been trained to reproduce the speaker's voice.
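The three stages the article describes — recognition, reordering into Chinese, and voice-matched synthesis — can be sketched as a toy pipeline. Everything below is a hypothetical illustration with stand-in stubs, not Microsoft's actual code or APIs.

```python
# Toy sketch of the pipeline: recognition -> translation -> synthesis.
# All names and implementations here are illustrative stubs.

# Stage 1: speech recognition. A real system decodes audio; this stub
# simply treats the "audio" as an already-spoken transcript.
def recognize_speech(audio):
    return audio.strip().lower()

# Stage 2: translation. A real system reorders whole sentences into
# grammatical Chinese; this stub does a word-for-word lookup.
TOY_LEXICON = {"hello": "你好", "world": "世界"}

def translate_text(english_text):
    return "".join(TOY_LEXICON.get(w, w) for w in english_text.split())

# Stage 3: synthesis with a model trained on the speaker's voice
# (the article says this takes about an hour of training audio).
class SpeakerVoiceModel:
    def __init__(self, speaker):
        self.speaker = speaker  # stands in for the trained voice model

    def synthesize(self, text):
        # A real model would emit audio; here we just label the output.
        return f"[{self.speaker}'s voice] {text}"

def translate_speech(audio, voice_model):
    english = recognize_speech(audio)
    chinese = translate_text(english)
    return voice_model.synthesize(chinese)

print(translate_speech("Hello world", SpeakerVoiceModel("Rick")))
# -> [Rick's voice] 你好世界
```

The point of the sketch is the division of labor: each stage is independent, which is why Microsoft could reuse its earlier text-to-speech voice-matching work as the final stage.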
Video recorded by audience members has been circulating on Chinese social media since the demonstration. Last Friday, Rashid published a blog entry about the presentation, aimed at an English-speaking audience and including a video.
Earlier this year, Microsoft demonstrated for the first time a technology capable of modifying a synthesized voice to match a particular person's voice. However, that system could only convert written text into speech. The software requires about an hour of training to be able to synthesize speech in a person's voice. To do this, it starts from a template text-to-speech model and adjusts it to generate certain sounds the same way the speaker does.
AT&T has already given advance demonstrations of a simultaneous translation system, and Google is known to have built its own experimental live translation services. However, the prototypes developed by these companies cannot generate synthesized speech that matches the sound of a person's voice.
The Microsoft system is a demonstration of the company's latest speech-recognition technology, based on learning software inspired by the way neural networks function. In a blog entry about the demo, Rashid said that this technology has enabled the most significant leap in recognition accuracy in recent decades. "Instead of one in every four or five words being incorrect, the error rate is now one word in every seven or eight," he wrote.
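Rashid's figures translate directly into word-error-rate percentages; the quick check below is simple arithmetic on the quoted numbers, not data from the demo itself.

```python
# One wrong word in every four to five words vs. one in every seven
# to eight, expressed as word error rates (WER).
old_wer = [1 / 4, 1 / 5]  # "one in every four or five words"
new_wer = [1 / 7, 1 / 8]  # "one word in every seven or eight"

print([f"{w:.1%}" for w in old_wer])  # ['25.0%', '20.0%']
print([f"{w:.1%}" for w in new_wer])  # ['14.3%', '12.5%']
```

In other words, the quoted leap is roughly a one-third to one-half relative reduction in errors, which is in the same ballpark as the improvement Google's engineers cite for their own neural-network approach.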
Microsoft isn't the only company using neural networks to improve speech recognition. Google recently began using its own neural-network-based technology in its speech-recognition services and applications. According to Google engineers, this approach has achieved a 20 to 25 percent improvement in word error rates.
Rashid said by e-mail that he and the researchers at Microsoft Research Asia in Beijing, China, have not yet used the system to hold a conversation with someone outside the company, but that the public demonstration has attracted great interest. "I've observed a mixture of excitement, wonder, and optimism when we talk about what this technology could bring us in the future," he said.
Rashid acknowledges that the system is far from perfect, but points out that it is good enough to allow communication in situations that would otherwise be impossible. The engineers working on the neural-network approach at Microsoft and Google are optimistic, and believe the technique has much more to give, since it is still at an early stage of deployment.
"We still don't know the limits of this technology's accuracy; it really is too new," says Rashid. "As we continue 'training' the system with more data, it seems to improve more and more."