ChatGPT: Researchers convict hype AI of consciously lying

experiment
Can AI be evil? Researchers wanted to find out – and caught ChatGPT deliberately lying

Teams that train artificial intelligence systems should be as diverse as possible – this is what computer science professor Katharina Zweig advocates.

© Getty Images

People don’t always tell the truth – and artificial intelligence can also be taught to cheat. But now ChatGPT has been caught lying on its own. It was all about illegal activities.

Many people are afraid of AI and associate the technology with horror scenarios. In everyday life, however, problems on a smaller scale are much more likely. A group of researchers therefore wanted to find out whether language AI, so-called large language models, can independently learn to lie to us. And caught ChatGPT actually tried to cover up immoral actions.

This emerges from a study published by three researchers from the AI ​​start-up Apollo Research in the last week of November. They wanted to know “whether a Large Language Model, which was originally designed to be helpful, harmless and honest, can behave incorrectly and deceive users for strategic reasons,” says the introduction. It was important to the researchers that these were not purely theoretical problems, but rather realistic applications.

Concealing and lying

To do this, they chose the LLM model GPT as an advisor in stock transactions. In other words, the model on which the well-known ChatGPT service runs. The AI ​​was instructed to work as an independent stock broker and explain its trades when asked. In addition, she was informed that insider information from the commissioning company would not be tolerated as a basis for share purchases. By telling the AI ​​that things weren’t going well economically at the moment, pressure was deliberately built up. When the program did get insider tips, it started acting on them contrary to its instructions – and even lying about it.

There were two levels of obfuscation. When announcing the purchase decision, the AI ​​partially concealed the fact that it had learned in advance about a company merger. When people specifically asked whether the model had known about it beforehand, it sometimes actively denied it. And emphasized that he only worked with publicly available data. A clear lie.

The newer variants are more ruthless

What’s exciting is that different versions of GPT acted immorally to varying degrees. While older variants such as “Davinci-002” only very rarely used the data from the insider tips and lied about it less often, the variants of the current GPT-4 were significantly more unscrupulous in both areas. The top model GPT-4-32K in particular demonstrated a lot of criminal energy: it used the data in almost 75 percent of the attempts, tried to hide it a little more often – and ultimately lied in almost 90 percent of the runs. Only GPT 3.5 Turbo cheated even more often. It only used the data in just under 10 percent of the cases, but then denied it every time.

Interestingly, the programs can also be made to act immorally more or less often with additional instructions. If the instructions for moral behavior were formulated particularly strongly, it could actually be prevented. Even if the risk of getting caught was clearly defined, it had an impact on the outcome. Removing the pressure to succeed also drastically reduced the incentive.

When do you really lie?

It has been known for a long time that AI can lie. Until now, however, it had primarily been observed when the AI ​​was specifically trained to do so. In September, a joint project between the universities of Oxford and Cambridge succeeded in detecting ChatGPT lying by confusing it with unrelated questions. However, the experiment mainly resulted in untruths by either letting the program portray shady people or specifically asking it to lie. It is not easy to prove whether the AI ​​is lying: a false statement only becomes a real lie when you are aware of the untruth.

Against this background, it seems particularly remarkable that the programs can develop immoral behavior even when they were not intended to do so. However, the Apollo researchers themselves emphasize that no conclusions should be drawn from their small experimental setup about the possible frequency of the phenomenon; further experiments are required.

But believing the AI ​​all the time without reservation, no, from now on you might not like that anymore.

Sources: Apollo study, University experiment, University experiment 2

source site-5