With HeyGen’s AI translator, is the war between the pro-VOST and pro-VF camps finally over?

The feud is almost as old as cinema itself. On one side, the purists, who swear by the subtitled original version (VOST), the “real voices” of Will Smith, Benedict Cumberbatch and company, and who deride sloppy lip synchronization. On the other, viewed with disdain by the first camp, the more mainstream fans of the French dub (VF) and of a cinema that lets the brain rest, without worrying about translation flaws. Divided families, broken friendships… Could AI bring everyone back together?

The company HeyGen has just developed a technology that not only translates what a person says on video but also adapts the movement of their lips to match. A tool bordering on the deepfake, translation category. How does this technology work? Which sectors can take advantage of it? Can such a tool be kept from being used to create fake news? Claire Larsonneur, linguist and lecturer at the University of Paris-8, sheds light for 20 Minutes on the issues surrounding this technology.

Translating speech and adapting lip movement: how does it work?

We already know how to translate a text, whether written or spoken. How to transform the movement of a body part, like the lips, too. Even how to make someone say things they have never said while keeping their voice, as in the fakes surrounding political figures or the song covers by Internet stars that were this summer’s little fad. The novelty of the HeyGen tool lies in “combining all these tools that already exist”, explained Aurélien Capdecomme, head of new technologies at 20 Minutes, when we first showed him the demo video.

In this case, HeyGen superimposes “three layers”, explains Claire Larsonneur. First, “the translation itself, with an engine trained on a corpus”, as Google Translate or DeepL do. Then, a “speech-to-text then text-to-speech transcription” to voice the translation aloud, like Apple’s Translate tool, so handy for ordering at a restaurant on vacation.
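To make those first two layers concrete, here is a minimal sketch of such a pipeline, assuming common open-source stand-ins (Whisper for speech-to-text, a Hugging Face translation model, gTTS for the voice). HeyGen’s actual stack is not public, and it notably clones the speaker’s own voice, which gTTS does not.

```python
# A minimal sketch of the first two "layers" Claire Larsonneur describes,
# built from open-source stand-ins, not HeyGen's unpublished stack.
import whisper                     # speech-to-text (openai-whisper)
from transformers import pipeline  # machine translation
from gtts import gTTS              # text-to-speech (generic voice only)

def translate_clip(audio_path: str, out_path: str) -> None:
    # Layer 1: transcribe the original speech to text.
    stt = whisper.load_model("base")
    text = stt.transcribe(audio_path)["text"]

    # Layer 2: translate with an engine trained on a corpus,
    # as Google Translate or DeepL would.
    mt = pipeline("translation_en_to_fr", model="Helsinki-NLP/opus-mt-en-fr")
    translated = mt(text)[0]["translation_text"]

    # Re-voice ("oralize") the translation. HeyGen clones the speaker's
    # own voice; gTTS uses a generic one, so this is only a stand-in.
    gTTS(translated, lang="fr").save(out_path)

translate_clip("ceo_message.wav", "ceo_message_fr.mp3")
```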

HeyGen’s “little extra” is to draw on “a database that associates sound with lip movement” in several languages. The result thus “corresponds to what is pronounced in the target language”. The technology’s downside: “it has an insane carbon footprint” and “it is very expensive in energy, bandwidth and storage”, warns the linguist.
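One way to picture that sound-to-lip association is as a lookup from phonemes to visemes (mouth shapes), which then drives the video renderer. The phoneme set, viseme names and timings below are purely illustrative, not HeyGen’s data.

```python
# A toy illustration of a "sound-to-lip-movement" database: a lookup
# from phonemes to visemes. All names and values here are illustrative.
PHONEME_TO_VISEME = {
    "p": "lips_closed", "b": "lips_closed", "m": "lips_closed",
    "f": "teeth_on_lip", "v": "teeth_on_lip",
    "o": "rounded",      "u": "rounded",
    "a": "wide_open",    "i": "spread",
}

def lip_keyframes(phonemes: list[tuple[str, float]]) -> list[tuple[float, str]]:
    """Turn (phoneme, start_time) pairs from the target-language audio
    into (time, mouth_shape) keyframes for the video renderer."""
    return [(t, PHONEME_TO_VISEME.get(ph, "neutral")) for ph, t in phonemes]

# "Bonjour" roughly segmented as b-on-jou:
print(lip_keyframes([("b", 0.00), ("o", 0.08), ("u", 0.30)]))
```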

In which sectors can this technology be useful?

“Generative AI is experiencing strong growth,” says Aurélien Capdecomme, who sees a future for HeyGen’s tool in luxury marketing and advertising. Claire Larsonneur leans towards a more immediate use in corporate communication. “Imagine the CEO of Stellantis who wants to record himself in his own language for a shareholders’ meeting or the launch of a new model,” she suggests.

“The advantage of this type of tool is that the person recording will be more comfortable in their mother tongue,” explains the linguist, pointing to “infraverbal signals”. Less preoccupied with the words they have to say, users can put “more force, more warmth” into their message, which comes through even after translation. Matching lip movement to the translation also repairs a “strangeness”: the familiar desynchronization between image and sound in the French dubs of films and video games. “It’s a question of visual comfort” for the viewer, she explains.

Can we prevent such a tool from being used to create fake news?

Despite the machine’s possible “hallucinations”, the term for the enormous and sometimes improbable errors produced by AI, the first danger remains that such a tool might fall “into the wrong hands”, warns the linguist. For several years now, more or less subtle deepfakes have flourished, dressing the Pope in a luxury down jacket or having Emmanuel Macron evoke a very real war. “For uninformed users, there is a real issue.”

While it is technically possible to demonstrate that an image has been retouched (that is, after all, part of some journalists’ job), the verification time is far too long to contain the damage. “Google has proposed putting information in the metadata to indicate when an image was created by AI,” notes Claire Larsonneur, who argues for “putting a stamp or a watermark, one that cannot be removed, on the videos produced”, including with the HeyGen tool.
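As a rough sketch of the metadata approach, here is how a provenance tag could be written into a PNG’s text chunks with Pillow. The key names are invented for illustration, not a published standard, and the linguist’s caveat applies in full: unlike a true watermark, metadata like this is trivially stripped.

```python
# A minimal sketch of provenance metadata, using Pillow's PNG text chunks.
# The key names are illustrative; such tags can be removed, which is why
# Larsonneur argues for an irremovable stamp or watermark instead.
from PIL import Image
from PIL.PngImagePlugin import PngInfo

def tag_as_ai_generated(src: str, dst: str, tool: str = "HeyGen") -> None:
    img = Image.open(src)
    meta = PngInfo()
    meta.add_text("ai_generated", "true")  # hypothetical key
    meta.add_text("generator", tool)       # hypothetical key
    img.save(dst, pnginfo=meta)

tag_as_ai_generated("frame.png", "frame_tagged.png")
```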

Is this the end of the VOST/VF feud?

“The real question is who watches VOST,” the linguist gently quips. Those who make the effort “are interested in the language for a reason,” she explains, and will no doubt keep preferring the original version. “The fact that it’s a machine doesn’t change anything”: lip synchronization or not, those who want to choose their language on Netflix will do so, according to the specialist. She also notes that on some platforms “certain films are only offered in France in French”, a matter of storage and bandwidth.

The world of dubbing, moreover, should not tremble just yet. “There is the question of cost-benefit” depending on the type of film: for a low-budget production, AI will do, but “if we want to convey emotions”, it is hard to do without the work of dubbing actors, argues Claire Larsonneur.

Because even if HeyGen reuses your voice to produce a fairly natural result, its underlying database is built from voice actors, which gives the French translations, for example, a slight Quebecois accent. Moreover, “only 20 languages are supported by HeyGen, while there are around 7,000 languages in the world,” points out the linguist. Rest assured: your favorite masterpiece of Hungarian cinema will not be flattened by a robotic translation just yet.
