Voicebox is Meta’s first entry into the generative AI space for speech
Meta has unveiled Voicebox, its first generative AI tool for speech. Thanks to its in-context learning ability, Voicebox can handle a variety of speech-generation tasks that it was not explicitly trained for.
In a blog post, Meta described what Voicebox can do, including:
- In-context text-to-speech: Given a voice sample as short as two seconds, it can match the sample’s audio style and use that style to generate speech from text.
- Speech editing and noise reduction: It can fix speech errors or remove unwanted noises by regenerating the affected parts of the audio without requiring a new recording.
- Cross-lingual style transfer: It can read any text in one of the six languages it supports (English, French, German, Spanish, Polish, or Portuguese) using the voice and style of any speech sample in any of those languages.
- Diverse speech sampling: It can produce diverse and natural-sounding speech samples from the same text using data from different speakers and regions.
Meta says Voicebox is part of its ongoing research on generative AI and that it has many potential applications. For example, Voicebox could provide realistic voices for virtual assistants and characters in the metaverse, let visually impaired people hear written messages from friends read aloud in those friends’ own voices, and give creators easy and powerful tools to create and edit audio tracks for their videos.