Open source DALL-E competition runs on your graphics card


Image: Stable Diffusion


OpenAI’s DALL-E 2 is getting free competition. Behind it are an open-source AI movement and the startup Stability AI.

Artificial intelligence that can generate images from text descriptions has been making rapid progress since early 2021. Back then, OpenAI showed impressive results with DALL-E 1 and CLIP. The open-source community used CLIP for numerous alternative projects over the following year. Then, in 2022, OpenAI released the impressive DALL-E 2, Google showed Imagen and Parti, Midjourney reached millions of people, and Craiyon flooded social media.

The startup Stability AI has now announced the release of Stable Diffusion, another DALL-E 2-like system, which will initially be made available in stages to researchers and other groups via a Discord server.

After a test phase, Stable Diffusion will be released free of charge – the code and a fully trained model will be published as open source. There will also be a hosted version with a web interface through which users can test the system.

Stability AI funds a free DALL-E 2 competitor

Stable Diffusion was created in a collaboration between researchers at Stability AI, RunwayML, LMU Munich, EleutherAI, and LAION. The research collective EleutherAI is known, among other things, for its open-source language models GPT-J-6B and GPT-NeoX-20B, and is also researching multimodal models.

The non-profit LAION (Large-scale Artificial Intelligence Open Network) provided the training data in the form of the open-source dataset LAION-5B, which the team filtered with human feedback in an initial test phase to create the final training dataset, LAION-Aesthetics.

Patrick Esser of Runway and Robin Rombach of LMU Munich led the project, building on their earlier work in the CompVis group at Heidelberg University. That is where the widely used VQGAN and Latent Diffusion originated. The latter, together with research from OpenAI and Google Brain, served as the basis for Stable Diffusion.

The mathematician and computer scientist Emad Mostaque is behind Stability AI, which was founded in 2020. He worked as an analyst for various hedge funds for a number of years before turning to public-interest work. In 2019, he helped found Symmitree, a project that seeks to reduce the cost of smartphones and internet access for vulnerable populations.

With Stability AI and his private fortune, Mostaque wants to promote the open-source AI research community. His startup previously supported the creation of the LAION-5B dataset, for example. For training the Stable Diffusion model, Stability AI provided servers with 4,000 Nvidia A100 GPUs.

“No one but our 75 employees has voting rights — no billionaires, big funds, governments, or anyone else who controls the company or the communities we support. We’re totally independent,” Mostaque told TechCrunch. “We use our computing power to accelerate open-source AI.”

Stable Diffusion is an open source milestone

A beta test of Stable Diffusion is currently underway, with new invitations going out in waves. The results, which can be seen on Twitter, for example, show that a real DALL-E 2 competitor is emerging here.

Stable Diffusion is more diverse than Midjourney, but has a slightly lower resolution than DALL-E 2. | Image: GitHub

Unlike DALL-E 2, Stable Diffusion can generate pictures of prominent people and other motifs that OpenAI prohibits with DALL-E 2. Other systems such as Midjourney or Pixelz.ai can do this as well, but none of them achieves comparable quality with the high variety visible in Stable Diffusion – and none of the other systems is open source.

Stable Diffusion is said to already run on a single graphics card with 5.1 gigabytes of VRAM – the project thus brings to the edge AI technology that was previously only available via cloud services.

Stable Diffusion thus offers researchers and interested parties without access to GPU servers the opportunity to experiment with modern generative AI models. The model is also expected to run on MacBooks with Apple’s M1 chip, although image generation then takes several minutes instead of seconds.
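To illustrate what local use could look like once the weights are released, here is a minimal sketch based on Hugging Face’s diffusers library. The model identifier, the half-precision setting, and the sampling parameters are assumptions for illustration, not details confirmed by Stability AI.

```python
# Minimal sketch: running Stable Diffusion locally with Hugging Face diffusers.
# Assumes the released weights are published under the hypothetical model ID below
# and that the GPU has roughly 5 GB of free VRAM (hence half precision).
import torch
from diffusers import StableDiffusionPipeline

model_id = "CompVis/stable-diffusion-v1-4"  # assumed identifier for the public release

# Load the text-to-image pipeline in float16 to reduce VRAM usage on consumer GPUs.
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")  # use "mps" on Apple M1 Macs, where generation is much slower

# Generate a single image from a text prompt.
prompt = "a photograph of an astronaut riding a horse"
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("astronaut.png")
```

In this sketch, the half-precision weights are the main lever for fitting the model into the stated ~5 GB of VRAM; on CPUs or Apple Silicon, the same call works but takes considerably longer, consistent with the article’s note about MacBooks.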

OpenAI’s DALL-E 2 gets open-source competition. Behind it are the open-source community and the startup Stability AI. | Image: GitHub

Stability AI itself also wants to enable companies to train their own variants of Stable Diffusion. Multimodal models are thus following the path that large language models have already taken: away from a single provider and toward the wide availability of numerous alternatives through open source.

Runway is already researching text-to-video editing enabled by Stable Diffusion.

Stable Diffusion: Pandora’s Box and Net Benefits

Of course, with open access and the ability to run the model on a widely available consumer GPU, the potential for misuse increases dramatically.

“A certain percentage of people are just awkward and weird, but that’s human,” Mostaque said. “We are convinced that this technology will take off and the paternalistic and somewhat condescending attitude of many AI aficionados is a mistake because they do not trust society.”

However, Mostaque emphasizes that the free availability enables the community to develop countermeasures.

“We take extensive safety measures, including the development of state-of-the-art tools, to mitigate potential harm across the release and our own services. With hundreds of thousands of people working on this model, we are confident that the net benefits will be immensely positive, and with billions of people using this technology, the harms will fade into the background.”

More information is available in the Stable Diffusion GitHub repository. Many examples of Stable Diffusion’s image generation capabilities can be found in the Stable Diffusion subreddit, and the beta signup for Stable Diffusion is available here.
