SAN FRANCISCO, June 22 — After virtual reality, Meta is now moving into audio. The American tech giant has unveiled Voicebox, an AI tool that turns text into speech in six languages. For the time being, Meta has decided not to release it to the general public.

In a blog post, Mark Zuckerberg’s social networking giant describes Voicebox as “a generative AI model that can help with audio editing, sampling and styling.”

More natural voices

First and foremost, Voicebox handles text-to-speech generation: it transforms written text into spoken audio using a synthetic voice. It also supports cross-lingual style transfer. “Given a sample of speech and a passage of text in English, French, German, Spanish, Polish, or Portuguese, Voicebox can produce a reading of the text in that language,” says Meta.

Even more impressive is Voicebox’s ability to reproduce an audio style from a sample just two seconds long and then use it to generate other audio content. Because the style is taken from a real speech sample, the result is closer to the way people speak in everyday life, which makes it sound more natural and more pleasing to the ear.

In addition to transforming text into audio and reproducing an audio style, Voicebox can edit an existing clip. The user can remove an unwanted sound or any other part of an audio track to clean up the content without having to make a new recording.

“We trained Voicebox with more than 50,000 hours of recorded speech and transcripts from public domain audiobooks in English, French, Spanish, German, Polish, and Portuguese. Voicebox is trained to predict a speech segment when given the surrounding speech and the transcript of the segment,” explains Meta.

However, the American group is not the first to take an interest in synthetic voices. TikTok caused a buzz with its own text-to-speech feature when that feature launched in 2020. The Chinese giant even made it possible to have text read aloud in the voices of Disney movie characters such as Rocket Raccoon from Guardians of the Galaxy, C-3PO from Star Wars and Stitch from Lilo & Stitch.

Seen as more engaging and more inclusive, synthetic voices continue to appeal to users and to the major players in social networking. For Meta, “this type of technology could be used in the future to help creators easily edit audio tracks, allow visually impaired people to hear written messages from friends in their voices, and enable people to speak any foreign language in their own voice.” For the company, it is a way of strengthening ties between users and attracting new ones. — ETX Studio