OpenAI software now creates videos from text prompts

As of: February 16, 2024, 9:53 a.m.

In the future, the maker of the chatbot ChatGPT will be able to create short videos from text prompts. The AI model is initially being tested to explore security risks and the risk of forgeries.

The makers of the chatbot ChatGPT have developed software that generates videos from text prompts. The AI model, called Sora, will initially be made available to selected creatives, OpenAI CEO Sam Altman wrote on the online platform X (formerly Twitter). Experts are also to examine security risks before the program can be used widely.

AI technology that generates moving images from text prompts could transform video production over time. But there are serious concerns that it could be used to create fake videos on a large scale that would be difficult to distinguish from real recordings. The developers of the technology are therefore working on ways to embed unique identifying features, such as watermarks, into the videos.

Errors in the laws of physics

Videos created by Sora can be up to a minute long, and they are supposed to be recognizable as AI-generated. On the software's website, OpenAI published several examples along with the prompts on which they were based. One of them shows a woman walking across a street.

The video was generated entirely by artificial intelligence. The prompt specified that the woman should wear a leather jacket and a red dress, and that the street should be reminiscent of Tokyo, with lots of neon signs reflected in puddles.

Several other companies have already developed software that can create videos from text. OpenAI concedes that Sora still has weaknesses: the model sometimes makes mistakes when simulating the laws of physics. For example, someone might take a bite of a cookie in a video, yet the cookie still looks whole afterwards.

Google reports progress in analysis

In the race for software with artificial intelligence, Google is also reporting an improvement, in this case in video and text analysis. The internet company presented Gemini 1.5, a further development that can, among other things, evaluate longer videos and texts. As a test, the software was asked to find funny moments in the 400-page transcript of conversations from the Apollo 11 moon mission, Google wrote in a blog post. Gemini 1.5 found three of them.

After a drawing of a boot was uploaded without further comment, the software automatically linked it to the moment Neil Armstrong took his first step on the moon.

Gemini 1.5 Pro can ingest and analyze up to one hour of video, up to eleven hours of audio recordings, texts of up to 700,000 words, and up to 30,000 lines of software code, Google explained. Google recently consolidated its AI apps and services under the brand name Gemini. The Gemini 1.5 model will initially be available to developers and enterprise customers before being opened up to everyone.
