‘Hey @Grok put a bikini on her’: what more could an incel want?

A few weeks back, while out with some friends, I got talking to a couple of people who worked at a large tech conglomerate. They were contractors rather than employees of this company. They worked in AI safety, managing a team of fellow contractors whose job it was to ensure that the company’s LLM software could not be used to create content that was either illegal or against the company’s internal ethical guidelines. What this meant, in practice, was that these people spent their entire working day trying to trick the company’s LLM into creating extremely disturbing images.

Their job was to enter prompts into the software, attempting to find workarounds and loopholes in the model’s built-in protections: a process known as red teaming. It’s a pre-emptive form of content moderation, one that aims to make it impossible for a particular kind of content to be produced in the first place. The idea, in this case, is to find hairline cracks in the AI’s defences against illicit usage, so that those cracks can be sealed with code before users can find and exploit them to create disturbing images. The most disturbing of these – images depicting the sexual abuse of children – were the ones the company was most intent on preventing.

And so it was that these contractors spent all day, every day, finding more and more inventive ways of tricking a chatbot into producing such images. These red-teamers, it seemed to me, were doing the Lord’s work in the Devil’s world: carrying out a daily series of arduous and soul-crushing labours that reduced, if only incrementally and marginally, the sum total of evil and depravity in our midst.

The company they contracted for, I need hardly tell you, was not X. Because whatever they’re doing in there nowadays at the social media company formerly known as Twitter, it is by no means the Lord’s work. After my conversation with those two red-team managers, I found myself wondering what the internet........

© The Irish Times