Lumiere: Google achieves breakthrough in AI videos

A research team from Google has presented Lumiere, an AI technology and a model of the same name that can be used to create videos from text input or images, or simply to edit existing videos. What makes it special is that the researchers have chosen a completely new approach intended to solve what they call "a central challenge": "synthesizing videos that portray realistic, diverse and coherent motion".

As the team explains in its scientific description of the model (PDF), text-to-image models are now very advanced, but video generation of comparable quality is not yet possible. According to the creators of Lumiere, this is due to the temporal dimension, which introduces numerous problems, above all the error-prone modeling of natural motion. To overcome this, the Lumiere team is taking a new approach.

So far, models for video generation have relied on generating key frames, with the missing temporal dimension added afterwards by a second model stage. A technology presented by Nvidia last year, for example, builds on image generation as in Stable Diffusion and extends it with a temporal component. Lumiere, by contrast, is designed to map the entire space-time volume in its model architecture: instead of assembling a video from individual images after the fact, it generates the clip directly in a single pass, which is supposed to yield coherent motion.
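
To make the difference concrete, here is a minimal, hypothetical sketch in Python/NumPy. The function names (generate_keyframes, temporal_superres, spacetime_pass) and the tensor shapes are illustrative assumptions, not Lumiere's actual API: the cascaded pipeline fills in frames between sparsely generated key frames, while a space-time model operates on the full (T, H, W, C) volume in one run.

```python
import numpy as np

T, H, W, C = 80, 64, 64, 3  # illustrative video shape: 80 frames of 64x64 RGB

def generate_keyframes(num_keyframes: int) -> np.ndarray:
    """Stand-in for a text-to-image model sampling sparse key frames."""
    return np.random.rand(num_keyframes, H, W, C)

def temporal_superres(keyframes: np.ndarray, target_frames: int) -> np.ndarray:
    """Stand-in for a second model stage that fills in the missing frames.
    Here: naive linear interpolation between consecutive key frames."""
    idx = np.linspace(0, len(keyframes) - 1, target_frames)
    lo = np.floor(idx).astype(int)
    hi = np.ceil(idx).astype(int)
    frac = (idx - lo)[:, None, None, None]
    return (1 - frac) * keyframes[lo] + frac * keyframes[hi]

def spacetime_pass(noise: np.ndarray) -> np.ndarray:
    """Stand-in for a space-time model that processes the whole
    (T, H, W, C) volume jointly in a single generation pass."""
    return noise  # a real model would denoise the full volume here

# Cascaded approach: sparse key frames first, temporal dimension added later.
video_cascaded = temporal_superres(generate_keyframes(8), target_frames=T)

# Space-time approach: the full video volume is generated in one run.
video_spacetime = spacetime_pass(np.random.rand(T, H, W, C))

print(video_cascaded.shape, video_spacetime.shape)  # (80, 64, 64, 3) twice
```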

Can be used in many ways

According to the paper, this approach has been overlooked in previous research. Lumiere can generate 80 frames at 16 fps, i.e. 80 / 16 = 5 seconds, which corresponds to the typical shot length in modern videos. In addition to creating short clips from prompts formulated as text, Lumiere is also said to be able to turn still images into simple videos.

In addition, existing videos can be restyled through simple text input, selected image regions can be animated, and so-called inpainting is supported, in which areas within a frame are completely replaced. This makes it possible to fill in missing information or swap out elements in a video; the researchers demonstrate it with clothing that can be exchanged or added.
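
As a rough illustration of the inpainting setup, here is a hypothetical Python/NumPy sketch. The mask convention and the fill_masked_region stand-in are assumptions made for this example, not Lumiere's interface: conceptually, a binary space-time mask marks the region to be replaced, and generated content is written only where the mask is set, while the rest of the video is kept as conditioning.

```python
import numpy as np

T, H, W, C = 80, 64, 64, 3
video = np.random.rand(T, H, W, C)       # the input video to be edited

# Binary space-time mask: 1 marks the region to replace (e.g. a garment),
# 0 marks pixels that must be preserved as conditioning.
mask = np.zeros((T, H, W, 1))
mask[:, 20:44, 16:48, :] = 1.0           # illustrative rectangular region

def fill_masked_region(video: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Stand-in for the generative model: produce new content for the
    masked region (here just noise; a real model samples from a prompt)."""
    return np.random.rand(*video.shape)

# The edit keeps unmasked pixels and takes generated content where mask == 1.
edited = (1 - mask) * video + mask * fill_masked_region(video, mask)
print(edited.shape)  # (80, 64, 64, 3)
```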