How Much Energy Do AIs Like ChatGPT Use?

Brute-force learning is simple and brutal. An artificial intelligence (AI) learns by working through millions of pieces of content over an enormous number of repetitions. With enough perseverance, this leads to the desired result, such as the ability to generate readable text. This digital doggedness gave rise to the program ChatGPT and now to its successor, GPT-4. But all of this consumes a lot of energy.

For so-called machine learning, as used in ChatGPT, programmers connect mathematical operations called neurons into large, multidimensional neural networks. If the algorithm consists of many layers, it is also referred to as deep learning. After many training runs, the networks are capable of previously unimagined performance.
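
To make this concrete: in code, such a network is little more than stacked layers of artificial neurons. A minimal sketch in PyTorch – purely illustrative, with arbitrary layer sizes that have nothing to do with ChatGPT’s actual architecture – looks like this:

```python
import torch.nn as nn

# Each Linear layer is a bank of artificial "neurons": weighted sums
# of its inputs. Stacking several layers makes the network "deep".
model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),  # layer 1: 256 neurons
    nn.Linear(256, 128), nn.ReLU(),  # layer 2: 128 neurons
    nn.Linear(128, 10),              # output layer: 10 neurons
)
```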

“Most of the deep learning models are trained on special hardware,” says Raghavendra Selvan, an assistant professor in the Computer Science Department at the University of Copenhagen. The core elements of this hardware are GPUs – graphics processing units that were developed primarily with computer games in mind and are known as power guzzlers.

If only a few professionals were playing around with ChatGPT, it wouldn’t be a problem. Last November, when ChatGPT was introduced, the system logged 153,000 hits. By February, the bot was already being asked more than a billion questions. Not surprisingly, Microsoft recently secured the services of ChatGPT’s developer, the originally non-profit organization OpenAI, for ten billion dollars. Leading tech companies Google and Meta are also entering the market with similar products.

A single training run of the AI releases nine times as much carbon dioxide as a person in Germany emits in a year

At the University of Copenhagen, a research team headed by Raghavendra Selvan wondered what dimensions the energy consumption of all this might reach. The scientists developed the software Carbontracker, which developers can use to measure their own energy needs. Every machine learning run now gives Selvan feedback on its power consumption. “A single run seems harmless on its own. But when I look at the summary at the end of a project, after about two months, it’s a shocking number.”
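
Carbontracker is available as an open-source Python package. Judging by its published documentation, it wraps an ordinary training loop roughly like this (the epoch count and the training step itself are placeholders):

```python
from carbontracker.tracker import CarbonTracker

max_epochs = 10  # placeholder value
tracker = CarbonTracker(epochs=max_epochs)

for epoch in range(max_epochs):
    tracker.epoch_start()
    # ... one epoch of model training goes here ...
    tracker.epoch_end()  # logs energy use and estimated CO2 per epoch

tracker.stop()  # ensures totals are reported even if training ends early
```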

Initially, machine learning algorithms almost never learn in the desired way. Dozens of hyperparameters have to be tuned to get there. They determine how the huge data sets are processed – then comes the next attempt at brute-force learning. “You only see the tip of the iceberg,” says Selvan. “For every model that actually comes online or is used, there are hundreds that were discarded before that.”
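
The trial-and-error Selvan describes can be pictured as a simple search loop: every combination of settings triggers a complete training run, and all but one of the resulting models are thrown away. A hypothetical sketch – train_and_evaluate is a stand-in for a full training run, not a real library function:

```python
def train_and_evaluate(lr, batch_size):
    # Stand-in: in reality, a complete training run over the full
    # data set, consuming hours or days of GPU time.
    return 0.0  # dummy validation score

learning_rates = [1e-4, 3e-4, 1e-3]
batch_sizes = [32, 64, 128]

best_score, best_config = float("-inf"), None
for lr in learning_rates:
    for bs in batch_sizes:
        score = train_and_evaluate(lr=lr, batch_size=bs)
        if score > best_score:
            best_score, best_config = score, (lr, bs)
# Nine models trained, eight discarded - and real searches are far larger.
```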

What that means can be seen in ChatGPT. It is based on the machine learning algorithm Generative Pre-trained Transformer 3, GPT-3 for short. From the available sources, Selvan and colleagues concluded that the final successful brute-force training run of GPT-3 alone consumed about 189 megawatt hours of energy and, assuming the Danish electricity mix, released 85 tons of CO₂. This corresponds to about nine times the annual per-capita CO₂ emissions in Germany.
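
The back-of-the-envelope arithmetic behind these figures is easy to check; the per-capita value of about 9.4 tons per year is our rough assumption, not a number from the study:

```python
energy_mwh = 189  # final GPT-3 training run, per Selvan's estimate
co2_tons = 85     # assuming the Danish electricity mix

# Implied grid intensity: about 0.45 kg of CO2 per kWh.
intensity_kg_per_kwh = co2_tons * 1000 / (energy_mwh * 1000)

# Comparison with assumed annual per-capita emissions in Germany.
per_capita_tons = 9.4
print(intensity_kg_per_kwh, co2_tons / per_capita_tons)  # ~0.45, ~9x
```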

“Machine learning development is highly iterative. That’s the awkward part,” says Selvan. He is sure that the energy consumption and CO₂ emissions that led to ChatGPT are even higher. “What we have been able to deduce is only part of the actual consumption in the development of GPT-3.” Its algorithm comprises 175 billion parameters, more than twice as many as there are neurons in a human brain. Its predecessor GPT-2 was released in 2019 with just 1.5 billion parameters.

The rate of this growth exceeds even Moore’s law, which describes how the number of arithmetic operations per second that computers can perform doubles roughly every twenty months. “However, the demand for arithmetic operations doubles every three to four months,” says Selvan. The growth of machine learning models is therefore faster than computing power can be provided.
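
The mismatch is easy to quantify. Taking three and a half months as the midpoint of Selvan’s range:

```python
# Yearly growth factor implied by a given doubling time (in months).
moore_per_year = 2 ** (12 / 20)       # ~1.5x per year (the article's Moore's law figure)
ml_demand_per_year = 2 ** (12 / 3.5)  # ~11x per year (doubling every 3-4 months)
print(moore_per_year, ml_demand_per_year)
```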

To get around the problem, developers are connecting more and more GPUs in parallel. GPT-3 alone was tested on 14,000 graphics cards simultaneously. Companies and research institutes are keeping pace by integrating an unprecedented number of GPUs into supercomputers.
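
In its simplest form, this parallelization splits every batch of training data across all available cards. PyTorch’s basic data-parallel wrapper shows the idea in a single line – a minimal sketch only; runs at GPT-3 scale use far more elaborate schemes that also split the model itself across machines:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

if torch.cuda.device_count() > 1:
    # Replicates the model on every visible GPU and splits each
    # input batch between the replicas.
    model = nn.DataParallel(model)
model.to("cuda" if torch.cuda.is_available() else "cpu")
```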

“Nevertheless, machine learning applications currently account for only about ten percent of the workload on our general-purpose computers,” says Mirko Cestari, team leader of the high-performance computing department at Cineca, a consortium of Italian academic institutions for high-performance computing. Cineca in Bologna recently brought Leonardo online, the fourth-fastest supercomputer in the world, with 14,000 GPUs on board.

Can the waste heat from supercomputers be used for households or industry?

“But growth is definitely expected in the next few years,” says Cestari with regard to machine learning workloads. Leonardo’s full complement of graphics cards comes at a price: the machine draws seven megawatts of power, as much as an ICE 4 train with eight cars.

In Bologna, people are aware of this problem and are looking for solutions. “We are currently discussing with the regional government whether the waste heat could be used for households or industry,” says Cestari. After all, it reaches a good 45 degrees Celsius. “Ideal for showering,” jokes Cestari.

On the software side, too, experts are trying to slim down the neural networks – among them Wojciech Samek at the Fraunhofer Heinrich Hertz Institute in Berlin. “How can you reduce the number of operations in machine learning? Are there redundancies that you don’t need?” asks the machine learning professor. Exactly how much each arithmetic operation contributes to the end result is not entirely clear to researchers – much as with the human brain. Unlike the biological brain, however, the AI’s neural network can be digitally dissected.

“Based on the result, we redistribute the processes layer by layer in a mathematically meaningful way,” says Samek. The researchers look at how much the neurons of individual layers in the deep learning algorithm have contributed to the result. “You can then remove the neurons that are not relevant.”
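
PyTorch, for example, ships utilities for exactly this kind of surgery. A minimal sketch – note that it ranks neurons by simple weight magnitude rather than the relevance scores Samek’s group computes, and the 30 percent is an arbitrary example:

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(128, 64)

# Zero out the 30% of output neurons whose weight rows have the
# smallest L2 norm, i.e. contribute least by this crude measure.
prune.ln_structured(layer, name="weight", amount=0.3, n=2, dim=0)
prune.remove(layer, "weight")  # make the pruning permanent
```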

Samek sees many more ways to make machine learning more efficient. “In some areas we humans know what is relevant,” he says. Even children know that every mammal has four limbs, not seven. “The machine doesn’t know that. If we succeed in incorporating this a priori knowledge, we can train much more efficiently and thus save energy.”

Artificial intelligence is often programmed in a particularly energy-hungry language

The programming language in which the learning algorithms are written also seems to offer room for saving energy. Neural networks are mostly written in Python, a coding language said to be about 70 times more energy-hungry than, for example, C++. That is because Python’s commands are relatively easy for humans to understand. The downside: translating them into the 0s and 1s of the circuits is costly. C++, on the other hand, is laborious to code, but can be made understandable to the machine very quickly. “Python allows accessibility,” says Raghavendra Selvan. “If we had coded in C++ instead of Python, we would never have seen these advances in machine learning.” In general, neural networks would probably never have become so important if researchers had worried about energy efficiency from the start, says Selvan. “But now we must do it.”
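
The interpreter overhead can be felt without even leaving Python: the same sum runs orders of magnitude faster when a single call hands the loop over to compiled C code. This is also how machine learning frameworks keep Python affordable in practice – the heavy lifting is dispatched to compiled GPU kernels:

```python
import time
import numpy as np

values = np.arange(10_000_000)

t0 = time.perf_counter()
total = 0
for v in values.tolist():  # every addition passes through the interpreter
    total += v
t1 = time.perf_counter()

t2 = time.perf_counter()
total_np = int(values.sum())  # one call into a compiled C loop
t3 = time.perf_counter()

print(f"pure Python: {t1 - t0:.2f} s, compiled (NumPy): {t3 - t2:.4f} s")
assert total == total_np
```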

The tech giants driving development see things differently. Google recently published a study according to which the CO₂ footprint of machine learning will rise further, then decrease and eventually stabilize. Selvan doubts the validity of this study, saying it is too narrow in scope. Besides, the number of users of neural networks is growing explosively.

This leads to another unanswered question: nobody but the companies knows how much energy a text request from an end user to the well-trained models actually consumes.

Selvan has now tried to quantify this hunger for energy for the SZ. To do so, he uses the GPT-2 model and scales the resulting numbers using the available data on other language models. According to this estimate, a single query of around 230 words consumes 581 watt hours. The one billion requests to ChatGPT in February would thus have consumed 581 gigawatt hours – roughly the annual electricity consumption of all 170,000 residents of Oldenburg. That corresponds to around 244,000 tons of CO₂. Even if newer AIs ran on more efficient machines, says Selvan, GPT-4’s energy hunger should be another order of magnitude larger.
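
Again, the arithmetic can be checked directly; the grid intensity of roughly 0.42 kilograms of CO₂ per kilowatt hour is simply the value implied by the article’s own numbers, not a figure Selvan reported:

```python
wh_per_query = 581       # Selvan's estimate for a ~230-word request
queries = 1_000_000_000  # ChatGPT requests in February

total_gwh = wh_per_query * queries / 1e9   # Wh -> GWh: 581 GWh
co2_tons = total_gwh * 1e6 * 0.42 / 1000   # kWh x kg/kWh -> tons: ~244,000 t
print(total_gwh, co2_tons)
```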
