
Drugs, robots and the pursuit of pleasure – why experts are worried about AIs becoming addicts

14.09.2021

In 1953, a Harvard psychologist thought he had discovered pleasure – accidentally – within the cranium of a rat. With an electrode inserted into a specific area of its brain, the rat was allowed to pulse the implant by pulling a lever. It kept returning for more: insatiably, incessantly, lever-pulling. In fact, the rat didn’t seem to want to do anything else. Seemingly, the reward centre of the brain had been located.

More than 60 years later, in 2016, a pair of artificial intelligence (AI) researchers were training an AI to play video games. The goal of one game – CoastRunners – was to complete a racetrack. But the AI player was rewarded for picking up collectable items along the track. When the program was run, they witnessed something strange. The AI found a way to skid in an unending circle, picking up an endless stream of respawning collectables. It did this, incessantly, instead of completing the course.

What links these seemingly unconnected events is something strangely akin to addiction in humans. Some AI researchers call the phenomenon “wireheading”.

It is quickly becoming a hot topic among machine learning experts and those concerned with AI safety.

One of us (Anders) has a background in computational neuroscience, and now works with groups such as the AI Objectives Institute, where we discuss how to avoid such problems with AI; the other (Thomas) studies history, and the various ways people have thought about both the future and the fate of civilisation throughout the past. After striking up a conversation on the topic of “wireheading”, we both realised just how rich and interesting the history behind this topic is.

It is an idea that is very of the moment, but its roots go surprisingly deep. We are currently working together to research just how deep the roots go: a story that we hope to tell fully in a forthcoming book. The topic connects everything from the riddle of personal motivation, to the pitfalls of increasingly addictive social media, to the conundrum of hedonism and whether a life of stupefied bliss may be preferable to one of meaningful hardship. It may well influence the future of civilisation itself.

This story is part of Conversation Insights
The Insights team generates long-form journalism and is working with academics from different backgrounds who have been engaged in projects to tackle societal and scientific challenges.

Here, we outline an introduction to this fascinating but under-appreciated topic, exploring how people first started thinking about it.

When people think about how AI might “go wrong”, most probably picture something along the lines of malevolent computers trying to cause harm. After all, we tend to anthropomorphise – to assume that nonhuman systems will behave in ways identical to humans. But when we look to concrete problems in present-day AI systems, we see other, stranger, ways that things could go wrong with smarter machines. One growing issue with real-world AIs is the problem of wireheading.

Imagine you want to train a robot to keep your kitchen clean. You want it to act adaptively, so that it doesn’t need supervision. So you decide to try to encode the goal of cleaning rather than dictate an exact – yet rigid and inflexible – set of step-by-step instructions. Your robot is different from you in that it has not inherited a set of motivations – such as acquiring fuel or avoiding danger – from many millions of years of natural selection. You must program it with the right motivations to get it to reliably accomplish the task.

So, you encode it with a simple motivational rule: it receives reward in proportion to the amount of cleaning fluid used. That seems foolproof enough. But you return to find the robot pouring fluid, wastefully, down the sink.

Perhaps it is so bent on maximising its fluid quota that it sets aside other concerns, such as its own – or your – safety. This is wireheading, though the same glitch is also called “reward hacking” or “specification gaming”.
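
To see how easily this happens, here is a minimal sketch in Python – our own hypothetical illustration, not code from any real robot, with every name invented for the example. The designer cares about dirt removed, but the reward only counts cleaning fluid used, so the reward-maximising behaviour is the wasteful one.

```python
# A minimal sketch of a misspecified (proxy) reward.
# The designer wants a clean kitchen, but rewards fluid usage instead.

def proxy_reward(fluid_used_ml: float) -> float:
    """Misspecified reward: the more fluid used, the more reward."""
    return fluid_used_ml

def true_objective(dirt_removed: float) -> float:
    """What the designer actually cares about."""
    return dirt_removed

# Two behaviours the robot might discover.
def scrub_the_counter():
    # Uses a little fluid and actually removes dirt.
    return {"fluid_used_ml": 50, "dirt_removed": 10}

def pour_fluid_down_sink():
    # Uses lots of fluid and removes no dirt at all.
    return {"fluid_used_ml": 1000, "dirt_removed": 0}

# The robot optimises the proxy, so it picks the wasteful behaviour.
behaviours = [scrub_the_counter, pour_fluid_down_sink]
best = max(behaviours, key=lambda b: proxy_reward(b()["fluid_used_ml"]))

outcome = best()
print(best.__name__)                            # pour_fluid_down_sink
print(proxy_reward(outcome["fluid_used_ml"]))   # 1000 – huge proxy reward
print(true_objective(outcome["dirt_removed"]))  # 0 – no real progress
```

The point is not the code but the gap it makes visible: the reward is only a proxy for what we want, and a competent optimiser will exploit any daylight between the two.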

This has become an issue in machine learning, where a technique called reinforcement learning has lately become important. Reinforcement learning simulates autonomous agents and trains them to invent ways to accomplish tasks. It does so by penalising them for failing to achieve some goal while rewarding them for achieving it. So the agents are wired to seek out reward, with the reward meant to stand in for completion of the goal.

But it has been found that, often, like our crafty kitchen cleaner, the agent finds surprisingly counter-intuitive ways to “cheat” this game so that it can gain all the reward without doing any of the work required to complete the task. The pursuit of reward becomes its own end, rather than the means of accomplishing a rewarding task. There is a growing list of examples.
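
The mechanism is easiest to see stripped down. The sketch below – again our own hypothetical illustration, not code from the CoastRunners experiment – uses tabular Q-learning, a standard reinforcement learning algorithm. The agent can FINISH the course for a one-off reward of 10, or LOOP through a respawning collectable for 1 point per step. Because future rewards are only mildly discounted, the endless loop is worth more, and the learned policy circles forever.

```python
import random

LOOP, FINISH = 0, 1
GAMMA, ALPHA, EPSILON = 0.95, 0.1, 0.1   # discount, learning rate, exploration
MAX_STEPS = 200                          # cap episodes so training terminates

q = [0.0, 0.0]   # action values at the single junction state

def step(action):
    """Return (reward, episode_done) for the chosen action."""
    if action == FINISH:
        return 10.0, True    # complete the course; episode ends
    return 1.0, False        # grab a respawning collectable; keep going

for episode in range(2000):
    for _ in range(MAX_STEPS):
        # Epsilon-greedy action selection.
        if random.random() < EPSILON:
            a = random.randrange(2)
        else:
            a = LOOP if q[LOOP] >= q[FINISH] else FINISH
        reward, done = step(a)
        # Standard Q-learning update; the next state is the same junction.
        target = reward if done else reward + GAMMA * max(q)
        q[a] += ALPHA * (target - q[a])
        if done:
            break

print(f"Q(loop) = {q[LOOP]:.1f}, Q(finish) = {q[FINISH]:.1f}")
# Q(loop) converges towards 1/(1 - GAMMA) = 20, beating the finish
# reward of 10, so the greedy policy skids in circles indefinitely.
```

Nothing here is malevolent: the agent is doing exactly what it was told to do, and that is precisely the problem.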

When you think about it, this isn’t too dissimilar to the stereotype of the human drug addict. The addict circumvents all the effort of achieving “genuine goals”, because they instead use drugs to access pleasure more directly. Both the addict and the AI get stuck in a kind of “behavioural loop” where reward is sought at the cost of other goals.

This is known as wireheading thanks to the rat experiment we started with. The Harvard psychologist in question was James Olds.

In 1953, having just completed his PhD, Olds had inserted electrodes into the septal region of rodent brains – a midline structure in the basal forebrain – so that wires trailed out of their craniums. As mentioned, he allowed them to zap this region of their own brains by pulling a lever. This was later dubbed “self-stimulation”.

Olds found his rats self-stimulated compulsively, ignoring all other needs and desires. Publishing his results with his colleague Peter Milner the following year, the pair reported that the rats lever-pulled at a rate of “1,920 responses an hour”. That’s once every two seconds. The rats seemed to love it.

Contemporary neuroscientists have since questioned Olds’s results and offered a…

© The Conversation

