Alibaba wants you to compare its new AI video generator to OpenAI's Sora. Otherwise, why use it to make Sora's most famous creation belt out a Dua Lipa song?
On Tuesday, an organization called the "Institute for Intelligent Computing" within the Chinese e-commerce juggernaut Alibaba released a paper about an intriguing new AI video generator it has developed that's shockingly good at turning still images of faces into passable actors and charismatic singers. The system is called EMO, a fun backronym supposedly drawn from the words "Emote Portrait Alive" (though, in that case, why is it not called "EPA"?).
EMO is a peek into a future where a system like Sora makes video worlds, and rather than being populated by attractive mute people just kinda looking at each other, the "actors" in these AI creations say stuff — or even sing.
Alibaba put demo videos on GitHub to show off its new video-generating framework. These include a video of the Sora lady — famous for walking around AI-generated Tokyo just after a rainstorm — singing "Don't Start Now" by Dua Lipa and getting pretty funky with it.
The demos also reveal how EMO can, to cite one example, make Audrey Hepburn speak the audio from a viral clip of Riverdale's Lili Reinhart talking about how much she loves crying. In that clip, Hepburn's head maintains a rather soldier-like upright position, but her whole face — not just her mouth — really does seem to emote the words in the audio.
In contrast to this uncanny version of Hepburn, Reinhart in the original clip moves her head a whole lot, and she also emotes quite differently, so EMO doesn't seem to be a riff on the sort of AI face-swapping that went viral back in the mid-2010s and led to the rise of deepfakes in 2017.
Over the past few years, applications designed to generate facial animation from audio have cropped up, but they haven't been all that inspiring. For instance, the NVIDIA Omniverse software package touts an audio-to-facial-animation app called "Audio2Face" — which relies on 3D animation for its outputs rather than simply generating photorealistic video like EMO does.
Despite Audio2Face only being two years old, the EMO demo makes it look like an antique. In a video that purports to show off its ability to mimic emotions while talking, the 3D face it depicts looks more like a puppet in a facial expression mask, while EMO's characters seem to express the shades of complex emotion that come across in each audio clip.
It's worth noting at this point that, as with Sora, we're assessing this AI framework based on a demo provided by its creators, and we don't actually have our hands on a usable version to test. So it's tough to imagine that, right out of the gate, this piece of software could churn out such convincingly human facial performances from audio without significant trial and error or task-specific fine-tuning.
The characters in the demos mostly aren't expressing speech that calls for extreme emotion — faces screwed up in rage, or melting down in tears, for instance — so it remains to be seen how EMO would handle heavy emotion with audio alone as its guide. What's more, despite being made in China, it's depicted as a total polyglot, capable of picking up on the phonics of English and Korean, and making the faces form the appropriate phonemes with decent — though far from perfect — fidelity. In other words, it would be nice to see how well EMO performed if you fed it audio of a very angry person speaking a lesser-known language.
Also fascinating are the little embellishments between phrases — pursed lips or a downward glance — that insert emotion into the pauses rather than just the times when the lips are moving. These are examples of how a real human face emotes, and it's tantalizing to see EMO get them so right, even in such a limited demo.
According to the paper, EMO's model relies on a large dataset of audio and video (once again: from where?) to give it the reference points necessary to emote so realistically. And its diffusion-based approach apparently doesn't involve an intermediate step in which 3D models do part of the work. A reference-attention mechanism and a separate audio-attention mechanism are paired by EMO's model to provide animated characters whose facial animations match what comes across in the audio while remaining true to the facial characteristics of the provided base image.
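To make that description a little more concrete, here's a minimal, hypothetical sketch (in PyTorch) of what a denoising block that pairs a reference-attention layer with an audio-attention layer could look like. The class name, dimensions, and layer layout are all invented for illustration — this is the general technique as the paper summarizes it, not Alibaba's actual EMO code.

```python
# Hypothetical sketch, not EMO's real code: a diffusion denoising block
# that pairs "reference attention" (conditioning on the still portrait)
# with "audio attention" (conditioning on speech features). All names
# and dimensions below are invented for illustration.
import torch
import torch.nn as nn

class ConditionedBlock(nn.Module):
    def __init__(self, dim: int = 320, n_heads: int = 8):
        super().__init__()
        # Self-attention over the noisy video latents being denoised.
        self.self_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        # Reference attention: latents attend to features of the base
        # image so the generated face stays true to the subject.
        self.ref_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        # Audio attention: latents attend to speech features so the
        # facial motion tracks what is said (or sung) in the clip.
        self.audio_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norms = nn.ModuleList([nn.LayerNorm(dim) for _ in range(3)])

    def forward(self, latents, ref_feats, audio_feats):
        # latents:     (batch, latent_tokens, dim) noisy frame latents
        # ref_feats:   (batch, ref_tokens, dim) from the still portrait
        # audio_feats: (batch, audio_tokens, dim) from an audio encoder
        h = self.norms[0](latents)
        latents = latents + self.self_attn(h, h, h, need_weights=False)[0]
        h = self.norms[1](latents)
        latents = latents + self.ref_attn(h, ref_feats, ref_feats,
                                          need_weights=False)[0]
        h = self.norms[2](latents)
        latents = latents + self.audio_attn(h, audio_feats, audio_feats,
                                            need_weights=False)[0]
        return latents

# Toy usage with random tensors:
block = ConditionedBlock()
out = block(
    torch.randn(1, 64, 320),   # noisy latents
    torch.randn(1, 16, 320),   # reference (portrait) features
    torch.randn(1, 32, 320),   # audio features
)
print(out.shape)  # torch.Size([1, 64, 320])
```

The intuition, at least per the paper's framing, is that the reference pathway anchors the character's identity while the audio pathway drives its expression and lip movement.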
It's an impressive collection of demos, and after watching them it's impossible not to imagine what's coming next. But if you make your money as an actor, try not to imagine too hard, because things get pretty disturbing pretty quick.