DeepMind’s AI model builds Super Mario-like 2D games

When OpenAI recently unveiled its impressive generative model “Sora,” it showed just how broad the possibilities of text-to-video have become. The next leap in artificial intelligence (AI) development didn’t take long to arrive: Google DeepMind now brings us text-to-video games. The new model, called Genie, can turn a short description, a hand-drawn sketch or a photo into a playable video game in the style of classic 2D platformers like Super Mario Bros.


So far, the games are not particularly fast: they run at one frame per second, compared with the typical 30 to 60 frames per second of most modern games. Still, “this is great work,” says Matthew Guzdial, an AI researcher at the University of Alberta, who developed a similar game generator a few years ago that learned from videos to create abstract platform games. Nvidia also used video data to train a model called GameGAN, which could create clones of games like Pac-Man.

In all of these examples, however, the model was trained not only on video footage but also on input actions such as button presses on a controller: a video frame showing Mario jumping, for example, is paired with the corresponding “jump” action. Tagging video footage with input actions in this way is very labor intensive, which limits the amount of usable training data, since the bulk of video on the internet comes without such tags.
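To make the contrast concrete, here is a toy illustration in Python; the field names and file names are invented for this example and do not come from either project:

```python
# Action-labeled training data (as used by GameGAN and similar models):
# each frame pair carries the controller input that produced it, which a
# human or an instrumented emulator had to record.
labeled = {
    "frame": "mario_0042.png",
    "next_frame": "mario_0043.png",
    "action": "jump",          # the labor-intensive part
}

# Raw internet video (Genie's setting): no action tags at all; the model
# must infer the in-between action on its own.
unlabeled = {
    "frame": "mario_0042.png",
    "next_frame": "mario_0043.png",
}
```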

In contrast, Genie was trained on video footage alone: 30,000 hours of video from hundreds of 2D platform games gathered from the internet (the corresponding paper was published on arXiv and has not yet been peer reviewed). From this footage, the model learned which of eight possible actions cause the character in a video to change position. In this way, countless hours of existing online videos became potential training data.
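One way to learn actions from unlabeled video is a so-called latent action model: infer a discrete action from each pair of consecutive frames, and judge the guess by how well it helps predict the next frame. The PyTorch sketch below shows that idea only; the layer sizes, module names and training objective are illustrative assumptions, not DeepMind's actual architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_ACTIONS = 8  # the article's eight possible actions

class LatentActionModel(nn.Module):
    """Guesses which unlabeled action happened between two video frames."""

    def __init__(self, frame_dim=1024, hidden=256):
        super().__init__()
        # Encoder: look at frame t and frame t+1, score the 8 candidate actions.
        self.encoder = nn.Sequential(
            nn.Linear(2 * frame_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, NUM_ACTIONS),
        )
        self.action_embed = nn.Embedding(NUM_ACTIONS, hidden)
        # Decoder: predict frame t+1 from frame t plus the inferred action.
        self.decoder = nn.Sequential(
            nn.Linear(frame_dim + hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, frame_dim),
        )

    def forward(self, frame_t, frame_next):
        logits = self.encoder(torch.cat([frame_t, frame_next], dim=-1))
        # Straight-through Gumbel-softmax keeps the discrete choice trainable.
        one_hot = F.gumbel_softmax(logits, hard=True)
        action_vec = one_hot @ self.action_embed.weight
        pred = self.decoder(torch.cat([frame_t, action_vec], dim=-1))
        # If frame t+1 becomes predictable from (frame t, inferred action),
        # the latent action carries real information; no labels needed.
        loss = F.mse_loss(pred, frame_next)
        return one_hot.argmax(dim=-1), loss
```

The straight-through discrete choice is what lets a model like this be trained end to end on raw video alone.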

Genie generates each new frame of the game depending on the action the player takes. If the player presses “jump,” Genie updates the current image so that the character jumps; if they press “left,” the image changes so that the character moves to the left. The game proceeds action by action, with each new frame being generated from scratch as the player plays.
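That loop can be sketched in a few lines of Python. Everything here is hypothetical: Genie is not publicly available, so genie_model and its predict_next_frame method are stand-ins for whatever the real system does.

```python
def play(genie_model, first_frame, read_player_input, steps=100):
    """Hypothetical game loop: one generated frame per player action."""
    history = [first_frame]            # the model conditions on past frames
    for _ in range(steps):
        action = read_player_input()   # e.g. "jump" or "left"
        # Each new frame is generated from scratch, conditioned on the
        # frame history and the chosen action.
        next_frame = genie_model.predict_next_frame(history, action)
        history.append(next_frame)
        yield next_frame
```

At one frame per second, each pass through a loop like this currently takes about a second; speeding it up is an inference problem, as the next paragraph notes.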

Future versions of Genie could run faster. “There is no fundamental limitation that prevents us from achieving 30 frames per second,” says Tim Rocktäschel, a researcher at Google DeepMind who heads the development team. “Genie uses many of the same technologies as current large language models, where there have been significant advances in improving inference speed.”

Genie even learned and reproduced some common visual quirks of platform games. Many games of this type use parallax scrolling, in which the foreground moves sideways faster than the background. Genie often included this effect in the games it created.
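Parallax scrolling itself is simple to implement. Here is a minimal, self-contained Python sketch; the layer names and factors are illustrative, not something Genie outputs:

```python
LAYERS = [
    # (layer name, parallax factor: 1.0 moves with the camera, 0.0 is static)
    ("sky",        0.1),
    ("mountains",  0.4),
    ("foreground", 1.0),
]

def layer_offsets(camera_x):
    """Horizontal draw offset for each layer at the given camera position."""
    return {name: -camera_x * factor for name, factor in LAYERS}

# After the camera scrolls 100 px to the right, the sky has shifted only
# 10 px while the foreground has shifted the full 100 px.
print(layer_offsets(100))  # {'sky': -10.0, 'mountains': -40.0, 'foreground': -100.0}
```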

Although the model is an internal research project and will not be made public, the Google DeepMind team has said it could one day be turned into a game-creation tool, according to Guzdial. He is working on something similar himself. “I’m definitely excited to see what they develop,” he says.

The researchers at Google DeepMind are not only interested in game development. The Genie team is also working on open-ended learning, in which AI-driven bots are dropped into a virtual environment and must solve various tasks through trial and error (a technique known as reinforcement learning).

In 2021, another DeepMind team developed a virtual playground called XLand in which bots learned to cooperate on simple tasks such as overcoming obstacles. Test environments like XLand will be crucial for training future bots on challenges before deploying them in real-world scenarios. The video-game examples show that Genie could be used to create such virtual playgrounds.
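The trial-and-error loop itself looks the same whatever the environment. Below is a minimal sketch using the Gymnasium API, with the built-in CartPole world standing in for a Genie-generated playground, which does not exist as a public environment:

```python
import gymnasium as gym

# Stand-in environment; imagine a Genie-generated 2D world here instead.
env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)

for step in range(1000):
    action = env.action_space.sample()   # a real agent would learn a policy
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:          # episode over: reset and try again
        obs, info = env.reset()
env.close()
```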

Other researchers have developed similar world-building tools. In 2018, David Ha of Google Brain and Jürgen Schmidhuber of the Swiss AI lab IDSIA built a tool that lets bots be trained inside game-based virtual environments, so-called world models. Unlike with Genie, however, the training data had to contain input actions.

The DeepMind team also demonstrated how useful this ability to learn actions from video could be for robotics. When Genie was shown videos of real robotic arms manipulating a variety of household objects, the model learned what actions the arm could perform and how to control it. Future robots could learn new tasks by watching video tutorials.

“It is difficult to predict which use cases will be possible,” says Rocktäschel. “We hope projects like Genie will give people new tools to express their creativity.”

(jle)
