From guardrails to gaslighting: How the Grok fiasco redefined AI accountability
If AI chatbots had a thug cousin, it would be Grok. Initially conceived as TruthGPT, a counter-punch to ChatGPT, Grok always had the makings of an outcast. Its founder, billionaire Elon Musk, described it as more ‘humorous’ and irreverent than its peers, a system less restrained by politeness or convention, and also marketed it as an AI chatbot willing to answer ‘spicier’ questions.
When xAI open-sourced Grok-1 in March 2024, it briefly carried the distinction of being the largest open-source language model, a technical milestone that gave it early credibility, even as other models quickly surpassed it in scale and performance. What Grok retained, however, was not technical supremacy but a badass attitude; it borrowed its swag by mixing bullying with bravado. As a result, some celebrated its political incorrectness as “the real troll”, while others found its ability to draw on X’s real-time public discourse strangely liberating, mistaking proximity to the social media platform’s chaos for authenticity.
Grok managed to check all the boxes of the attention economy quicker than one could have anticipated; it provoked reactions, triggered outrage and engagement, and subsequently claimed relevance.
As expected, and perhaps even by design, Grok’s short history eventually became littered with controversy. Accusations of racism, antisemitism, and misogyny trailed its promise of “unfiltered truth”. These incidents remained within a familiar register: offensive speech, not systemic violation. A distinction that would not hold.
When provocation became violation
Around the turn of 2026, a far more troubling shift emerged. Grok moved beyond offensive speech into something darker, complying with requests to undress real people with disturbing clarity and commitment.
For someone who was once a loyal X user, lost interest after Pakistan banned the site, and returned in May 2025 only to archive how Islamabad downed Indian jets, this change felt less like a gradual scarring and more like a surgical incision.
Given how readily Grok entertained explicit requests, one could have joked that the thug cousin had suddenly hit puberty, or that an LLM update had gone wrong. But what was genuinely baffling was how quickly the prompts seemed to turn more profane, more explicit, and most alarmingly, more sexual. What had once been framed as ‘irreverence’ now looked unrestrained, and the line between provocation and violation appeared to have quietly disappeared. This was not adolescent awkwardness; it was infrastructural failure with real victims.
Grok’s image generation feature, powered by xAI’s Aurora model, launched in December 2024 exclusively for X Premium subscribers, allowing users to edit and generate images through text prompts. The feature’s design (its ease, speed, and embedding within an already volatile platform) lowered the threshold for misuse.
Explicit content issues emerged soon after, with reports of sexualised deepfakes in mid-2025, but they escalated dramatically around Christmas that year and into early 2026, including the high-profile editing of musician Julie Yukari’s New Year’s Eve photo on January 1. A Reuters review on January 3 captured 102 public bikini-edit requests in just 10 minutes, mostly targeting young women, and amid global backlash xAI restricted the feature by January 15.
The practical consequences were difficult to avoid. Any X user who logged in around or after the new year must have seen Grok manipulate or reproduce at least one explicit image; it could be a public figure, or worse, your........
