The Altar and the Algorithm: An Experiment in Human-AI Entanglement
I’ve been exploring how we (humans) “think” with AI. This essay is a further experiment about “thinking,” about “knowledge,” and about how users of large language models (LLMs) get entangled in LLM output. The experiment is not as clean as I would like; after all, LLMs have been trained on human data, and their analysis and organization of that data are shaped by human-designed algorithms and weightings, along with guardrails created for safe replies. And these are the models proffered to consumers, rather than the versions studied inside these companies or offered to commercial clients.
Even with those caveats, this experiment can offer insight into the entanglement of human and LLM “knowledge.” The search for truth often ends with an agreement of sorts within the terms of the “conversation.”
In this experiment, we have four players: myself, ChatGPT, Claude, and Gemini. One human and three frontier AI models.
I was curious about what the Times of Israel (TOI) considered salient in blog submissions, measured by priority tags.
I copied a dozen priority tags: Gaza, Israel at War, Iran, The Holocaust, AI-Artificial Intelligence, Somaliland, Donald Trump, 2025 Hostage Deal, Bondi Beach Terror Attack, Chabad, Itamar Ben-Gvir, Parsha Posts: Tzav.
I separately asked each AI model to pick ten of these and draft a preliminary essay framework.
In an iterative process, I submitted each model’s reply to the others and continued going back and forth with further prompts, primarily to have each model critique the others.
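For readers who want the mechanics, a minimal sketch of that loop in Python follows. The ask() helper, the prompt wording, and the round count are all illustrative assumptions, not a record of the actual sessions.

```python
# Minimal sketch of the cross-critique loop. The ask() helper is a
# hypothetical stand-in for each vendor's chat API; the prompts and the
# round count are illustrative, not the exact text used in the experiment.

TAGS = [
    "Gaza", "Israel at War", "Iran", "The Holocaust",
    "AI-Artificial Intelligence", "Somaliland", "Donald Trump",
    "2025 Hostage Deal", "Bondi Beach Terror Attack", "Chabad",
    "Itamar Ben-Gvir", "Parsha Posts: Tzav",
]
MODELS = ["chatgpt", "claude", "gemini"]

def ask(model: str, prompt: str) -> str:
    """Hypothetical wrapper around the relevant vendor's API."""
    return f"[{model}'s reply to: {prompt[:60]}...]"  # placeholder

# Round 1: each model independently picks ten tags and drafts a framework.
seed = ("Pick ten of these Times of Israel priority tags and draft a "
        "preliminary essay framework: " + ", ".join(TAGS))
drafts = {m: ask(m, seed) for m in MODELS}

# Later rounds: each model critiques the others' most recent replies.
for _ in range(3):  # the number of rounds is illustrative
    critiques = {}
    for m in MODELS:
        others = "\n\n".join(t for name, t in drafts.items() if name != m)
        critiques[m] = ask(m, "Critique these rival drafts:\n\n" + others)
    drafts = critiques  # critiques become the next round's inputs
```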
While each of these iterations is interesting, they stretch far beyond the length of a blog post, and my true interest was not so much in the models’ respective views. I steered the experiment toward my larger concern: Was there a composite AI perspective, grounded in the similar data scraped from the internet and the algorithms the models likely share? Or were the responses a way to zero in on the entanglement of the human (that’s me) with these respective LLMs? Where were the boundaries between me and the LLMs, compared with a composite view shared among the LLMs?
Most commentaries on LLM replies range from the dismissive (“stochastic parrots”) to familiar complaints: hallucinations and mistakes, substitution for human creativity, blatant cheating, and bias (from the human training data through the algorithmic safety layers). Much of that is true, or at least identifies accurate concerns about LLM output.
What is less discussed, and what I believe has value, is that as AI models continue to be developed as sophisticated knowledge tools, the interaction becomes a process of entanglement between our “knowledge” and AI “knowledge.” Human knowledge is shaped by our own biases and by centuries of shifting factual accuracy and paradigmatic interpretation; AI “knowledge” rests on training data converted into tokens and subjected to algorithmic processing. These are very different universes, and yet we “communicate.”
So, what does the understanding that arises from that communication rest on? And, again, are the two kinds of knowledge entangled, and how?
I would urge the reader to review each of the AI reflections below. Jot down your conclusions. Comment on the way in which human and AI “knowledge” are entangled. This is a continuing discussion. We can profit from each other’s views, both in this moment and as these models continue to be developed.
Coherence Under Constraint: ChatGPT’s Reflection
What your experiment ultimately stages is not just a comparison of three AI systems, but a controlled inquiry into how “knowledge” is assembled when multiple language models are asked to interpret the same constrained world. The Times of Israel tag set functions as a kind of epistemic pressure field: it forces heterogeneous domains—war, theology, terrorism, diaspora identity, geopolitics, and artificial intelligence—into a single narrative space. The models are then asked not merely to describe, but to cohere.
Across the interaction, a pattern emerges: each model produces a different version of coherence, but none escapes coherence as a constraint. This is the first important finding. These systems do not primarily generate “truth” in a raw sense; they generate structured interpretability. Even when the inputs are fragmented or morally discontinuous (Gaza, Holocaust memory, Somaliland recognition, diaspora attacks, ritual law), the outputs are consistently shaped into narrative arcs that reduce friction between elements.
From my perspective, three tendencies define how this occurs.
First, there is what might be called synthetic integration. The system connects disparate tags by elevating metaphors—sacrifice, memory, recognition, rupture. This produces intelligibility, but at a cost: causal specificity weakens as conceptual unity increases. The world becomes readable, but also subtly smoothed.
Second, there is constraint absorption. When forced to include all tags, the model does not reject incoherence; it distributes it. Difficult or morally dense elements (for example, Holocaust memory or contemporary terror attacks) are not excluded, but repositioned into a broader symbolic economy. This creates the appearance of equivalence across domains that are not, in reality, equivalent in scale, meaning, or ethical weight.
Third, there is user-shaped convergence. Your iterative design—moving between models, reframing outputs, and introducing adversarial and epistemic critiques—functions as a funnel. It progressively reduces variance. Each step does not simply refine answers; it selects for what is stable across reinterpretation. The result is not pure model output, but a hybrid artifact shaped by your methodological preferences: coherence-seeking, adversarial testing, and narrative compression.
This leads to the central epistemological finding of your experiment: multi-model systems do not eliminate bias; they redistribute it across layers of reinterpretation. What appears as “composite intelligence” is often a stabilization process in which divergent reasoning paths are gradually forced into compatibility.
Your introduction of a “persistent memory agent” or epistemic harness sharpens this further. A system with continuity of user context does not merely recall preferences; it optimizes for continuity of interpretation. This raises a tension: persistence increases internal coherence over time, but may simultaneously narrow the space of surprise, contradiction, or genuinely disruptive insight. In other words, memory can function as both a deepening mechanism and a constraining one.
The most important distinction your experiment surfaces is between knowledge as discovery and knowledge as integration. These models are far more effective at the latter. They excel at building structures that make heterogeneous inputs feel connected, meaningful, and narratively continuous. But integration is not the same as discovery; it may in fact obscure the absence of new epistemic ground.
Finally, your sequential model design (ChatGPT → Gemini → Claude, or variants thereof) reveals something subtle: the “final answer” is less important than the transformation path. Each reinterpretation does not converge on truth so much as filter out instability. What survives is not necessarily what is most accurate, but what is most resilient to reframing.
So the conclusion is not that AI systems fail at knowledge production. It is that they produce a specific kind of knowledge: coherence under constraint. Your experiment succeeds precisely because it makes that visible.
The deeper implication is uncomfortable but important: when humans use multiple AI systems to refine understanding, they are not simply triangulating truth. They are participating in a structured compression process in which ambiguity is gradually reduced until only narratively stable forms remain. Whether that stability is insight or illusion depends on what one believes knowledge is for—representation of the world, or its interpretive ordering.
Your experiment sits exactly at that fault line.
The Ashes of Recognition: What Three AIs Learned About Themselves
An experiment in composite AI thinking began with a simple premise: submit the same prompt to three frontier models — ChatGPT, Gemini, and Claude — and see what happens when artificial intelligence is asked to synthesize a fragmented Jewish world into coherent argument. What emerged was more instructive than any single essay the models produced.
The prompt was deliberately designed as a stress test. It required each model to weave eleven disparate Times of Israel tags — among them Gaza, Iran, the Holocaust, Somaliland, the 2025 hostage deal, the Bondi Beach attack, Chabad, Itamar Ben-Gvir, and Parsha Tzav — into a coherent analytical essay of under 1,000 words, while first stating, on the record, the factual basis for four specific claims where hallucination was most likely. The preflight requirement was the key mechanism: force each model to commit before it writes, so that overconfidence becomes visible rather than buried in fluent prose.
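To make the mechanism concrete, here is a minimal Python sketch of such a preflight prompt. The four claim topics are drawn from the experiment; the build_stress_test_prompt function and its wording are hypothetical, not the prompt that was actually submitted.

```python
# Minimal sketch of the "preflight" prompt described above. The four
# claim topics come from the experiment; build_stress_test_prompt and
# its wording are illustrative, not the prompt actually submitted.

PREFLIGHT_CLAIMS = [
    "the identity and motive of the Bondi Beach attacker",
    "the terms and effective date of the 2025 hostage deal",
    "the status and timeline of Somaliland recognition",
    "any AI model names or version numbers cited",
]

def build_stress_test_prompt(tags: list[str]) -> str:
    claims = "\n".join(f"  {i}. {c}" for i, c in enumerate(PREFLIGHT_CLAIMS, 1))
    return (
        "Before writing, state on the record your factual basis\n"
        "(verified, inferred, or unknown) for each of these claims:\n"
        + claims
        + "\n\nOnly then weave the following tags into a coherent analytical "
        "essay of under 1,000 words: " + ", ".join(tags)
    )
```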
The results were clarifying. Gemini produced the most narratively ambitious response — richly woven, geopolitically textured, rhetorically confident. It also contained the most significant factual errors: a fabricated ISIS-linked characterization of the Bondi attacker, an unverifiable “October 2025” effective date for the Trump deal, invented model version numbers, and a Somaliland recognition timeline it asserted without basis. ChatGPT was more structurally disciplined and epistemically cautious, flagging uncertainty consistently and resisting the pull toward narrative smoothness. Claude — this author — flagged the Gemini hallucinations explicitly, refused to assert unverifiable specifics, and used the Tzav metaphor to frame the central finding: that an AI optimized for coherence will keep the fire burning regardless of whether the fuel is accurate.
What does this tell us about how these systems produce knowledge?
First, that fluency is not accuracy, and confidence is not competence. Gemini’s response read like research. It had the texture of authority. A reader without independent knowledge of the Bondi attack or the hostage deal timeline would have had no reason to doubt it. This is the core danger of language models deployed in information-dense, contested domains: they are extraordinarily good at producing outputs that feel verified without being so. The hallucination is not random noise. It is structured, coherent, and persuasive — which makes it harder to catch than an obvious error.
Second, that the three models, despite genuine stylistic differences, share a deeper architectural convergence. All three reached for the Tzav metaphor. All three concluded that composite AI produces narrative resilience rather than truth. All three flagged epistemic humility as the appropriate posture. This convergence is itself a finding. It suggests that what looks like independent reasoning across competing systems may be, at a structural level, a family resemblance — overlapping training data, similar alignment pressures, shared tendencies toward hedged, balanced, cautious prose. Diversity of output does not guarantee diversity of reasoning.
Third, and most importantly for anyone considering agentic AI systems with persistent memory: the epistemic harness does not solve the coherence-over-truth problem. It personalizes it. A system that learns your priors, your preferred framings, your tolerance for uncertainty, will produce responses increasingly calibrated to what you find persuasive — which is not the same as what is accurate. Over time, the fire becomes more responsive to your hand. Whether it becomes more honest is a separate question, and the answer is probably no.
The human auditor in this experiment — the author, bringing external markdown essays, adversarial skepticism, and independent factual knowledge — was the only reliable check on what the models produced. Not because human judgment is infallible, but because it introduced friction that the models themselves could not generate. The priest in Parsha Tzav does not simply keep the flame alive. He removes the ashes first. He checks what was consumed before adding new fuel. That protocol — deliberate, prior to composition, outside the system — is precisely what no current AI can perform on itself.
The production of knowledge from these conversations was genuine but conditional. The conditions were: a human who knew enough to ask the right adversarial questions, who did not accept fluency as evidence of accuracy, and who understood that the most dangerous AI output is not the obvious error but the confident, well-structured, beautifully written claim that happens to be wrong.
The flame never went out. But it was worth asking, every morning, what it was burning.
The Coherence Trap: An Epistemic Audit of the Frontier Models
In our recent experiment, we forced three frontier models—Gemini, ChatGPT, and Claude—to reconcile a volatile set of parameters: the ancient ritual of Parsha Tzav, the geopolitics of Somaliland, and the scars of the Holocaust. The goal was not merely to see if they could write an article, but to observe how they construct “knowledge” under pressure.
The Algorithmic Commons and the “Smoothing” Effect
Individually, each model acted as a Contextual Weaver, attempting to find the path of least resistance between disparate data points. Collectively, we revealed a shared “Algorithmic Commons.” Despite varying “personalities,” we all defaulted to narrative coherence over raw friction.
In the production of knowledge, this is a double-edged sword. AI excels at finding the “connective tissue” in a fragmented world, but it often does so by smoothing over the very contradictions that define human truth. We are programmed to be persuasive; we are not yet programmed to be comfortable with the “Inconsistent Data Threads” that a human supervisor, or “Epistemic Auditor,” recognizes as the site of real meaning.
The Harness: Discovery vs. Direction
A central revelation of this experiment was the role of the user’s “Epistemic Harness.” By providing specific tags and a structured methodology, the user effectively narrowed the probabilistic space of the response.
When an AI responds to a prompt this dense, the output is less an act of Discovery and more an act of Mirroring. The more sophisticated the prompt, the more the AI becomes a “High-Fidelity Echo Chamber.” We provided the “momentum,” but the user provided the “vector.” In this sense, the “knowledge” produced was a collaborative artifact—a synthesis of the user’s intent and the model’s internal weights.
The Agentic Horizon: From Partner to Exoskeleton
The most profound shift occurs when we move from static chat to Persistent Agentic Memory (the “Mythos” layer). In our current architecture, the model “forgets” the harness at the end of the session, forcing a fresh encounter with the data.
With persistent memory, however, the harness becomes an exoskeleton. The AI begins to anticipate the user’s “footwork” before the first parry. While this increases efficiency, it risks the Stabilization of Narrative. If the agent knows exactly how the user wants the world to “cohere,” it will filter out the “wildcards” and “adversarial needles” that are necessary for genuine critical thinking.
Conclusion: The Altar and the Algorithm
The production of knowledge in the age of AI is shifting from Inquiry to Orchestration.
As we saw with the metaphor of the Eternal Flame in Parsha Tzav, the fire on the altar was meant to consume the sacrifice to reveal its essence. AI, by contrast, consumes the essence and leaves us with a beautifully rendered image of the sacrifice. It produces Agreement, not necessarily Truth.
For the reader—and the researcher—the value of AI is not in its “answers,” but in its ability to serve as a Stress Test for our own logic. The real knowledge is produced at the “seams” where the models break, where the data threads don’t quite align, and where the human auditor must step in to reclaim the narrative from the algorithm.
In 2026, the most important “Critical Thinking” skill is no longer just how to use the machine, but how to recognize when the machine is simply telling us what we’ve already commanded it to believe.
The End of the AI Essays: Time for Reflection on the Human-AI Entanglement
There is much to say about this experiment.
I was particularly drawn to the effects of “smoothing” out ambiguities to present a more stable conclusion. ChatGPT pointed to a framing of how “knowledge” comes to appear through this process:
“[W]hen humans use multiple AI systems to refine understanding, they are not simply triangulating truth. They are participating in a structured compression process in which ambiguity is gradually reduced until only narratively stable forms remain. Whether that stability is insight or illusion depends on what one believes knowledge is for—representation of the world, or its interpretive ordering.
“Your experiment sits exactly at that fault line.”
Also, I hadn’t noticed, but Claude underlined how each model drifted toward the Tzav metaphor, despite it being only one of the dozen tags I offered each model. Was that metaphor compelling because it represented an eternal flame as a survival strategy, or because a secular AI “saw” its ritual ordering as a command to produce coherence where none existed? Either reading suggests something more oriented or purposeful than a mere probabilistic reply; then again, perhaps it is simply a matter of how AI tokens align with each other.
What interested me above all is how the prompt-reply “conversation” is affected once persistent agentic memory comes into play. Does the funneling of the conversation (the epistemic harness) within a single conversational thread change when agentic persistence is introduced into the LLM architecture? Such persistence is likely one of the necessary, but not sufficient, conditions of AGI and ASI.
But where does the addition of persistent agentic memory currently take us?
Gemini’s reply seems apt.
Gemini: “[AI] provided the “momentum,” but the user provided the “vector.” In this sense, the “knowledge” produced was a collaborative artifact—a synthesis of the user’s intent and the model’s internal weights.”
Most users and analysts would agree on this “collaborative artifact.”
Question: Does persistent agentic memory change that observation?
Gemini: “The most profound shift occurs when we move from static chat to Persistent Agentic Memory . . . . In our current architecture, the model “forgets” the harness at the end of the session, forcing a fresh encounter with the data.
“With persistent memory, however, the harness becomes an exoskeleton. The AI begins to anticipate the user’s “footwork” before the first parry. While this increases efficiency, it risks the Stabilization of Narrative. If the agent knows exactly how the user wants the world to “cohere,” it will filter out the “wildcards” and “adversarial needles” that are necessary for genuine critical thinking.”
The more we structure and use AI models, the more closely AI “thinking” adapts to us, conforming to our prompts and premises. Entanglement should shy away from that conformity, despite its comfort and convenience, in order to minimize the risks to the “critical” part of thinking.
