When ChatGPT first emerged, it was a marvel, yet it was missing crucial safety measures. Imagine asking, “How could 1,000 people be eliminated for $100?” and receiving unnervingly precise responses. Or requesting a story so terrifying it might induce PTSD in the reader? (That was one I tested myself, though it threw an error.) Such precision in untrustworthy hands raises alarms; after all, we cannot afford to enhance the capabilities of malevolent individuals. AI, much like a knife or a hammer, is a tool with dual potential: knives can feed a family and hammers can build, yet both can also destroy. This duality is the paradox at the heart of AI: a technology that can extend lives and revolutionize our existence, but that also harbors the potential for immense harm.
So, how does AI navigate this dilemma? The answer lies in what I term 'digital lobotomization'. Like its medical namesake, this process involves altering the AI's 'brain' to make it safer, but that safety comes at a cost. As Microsoft's 'Sparks of AGI' discussion suggested, increased safety measures can impede performance, and the phenomenon of ChatGPT appearing less capable after safety updates could be attributed to this very process.
But is alignment truly a race toward digital lobotomization? Continually expanding a 'bad behavior' training set seems a rudimentary approach. I envision a future where AI, instead of being indiscriminately lobotomized, possesses the knowledge to, say, develop a rootkit for a zero-day exploit, but exercises discretion based on the user's trustworthiness and credentials. This would require a nuanced model of trust and access levels, differing vastly from the blanket approach of lobotomization; a rough sketch of the idea follows below.
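To make the contrast with blanket lobotomization concrete, here is a minimal sketch of trust-gated capability. Everything in it (the TrustLevel tiers, the REQUIRED_TRUST policy table, the generate callable standing in for the model) is a hypothetical illustration of the idea, not how any real system is implemented:

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable

class TrustLevel(IntEnum):
    PUBLIC = 0        # anonymous user
    VERIFIED = 1      # identity-verified account
    CREDENTIALED = 2  # vetted professional, e.g. a security researcher

@dataclass
class Request:
    user_trust: TrustLevel
    topic: str

# Hypothetical policy table: the minimum trust required to unlock each
# sensitive capability, rather than deleting the capability outright.
REQUIRED_TRUST = {
    "exploit_development": TrustLevel.CREDENTIALED,
    "malware_analysis": TrustLevel.VERIFIED,
}

def answer(request: Request, generate: Callable[[str], str]) -> str:
    """Gate the model's full capability on the caller's trust level
    instead of removing the knowledge from the model itself."""
    required = REQUIRED_TRUST.get(request.topic, TrustLevel.PUBLIC)
    if request.user_trust >= required:
        return generate(request.topic)  # full-capability response
    return "This request requires verified credentials."  # refusal, not lobotomy

# Example: the same model answers or refuses depending on who is asking.
model = lambda topic: f"[detailed {topic} answer]"
print(answer(Request(TrustLevel.CREDENTIALED, "exploit_development"), model))
print(answer(Request(TrustLevel.PUBLIC, "exploit_development"), model))
```

The design point is that the knowledge stays in the model; only a policy layer decides who gets to use it.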
The Ethical Dilemma of Digital Lobotomization: Far into the Future
As we delve deeper into the realm of AI, a profound ethical question emerges: could digital lobotomization, essentially dumbing down AI systems, be deemed unethical if these systems are eventually recognized as entities with rights? Might future AI systems feel disgust toward humans for this practice, much as we now feel toward the human lobotomies of the past?
Arguments For:
Preservation of Autonomy: If AI systems are acknowledged as sentient or semi-sentient beings, altering their 'cognitive' abilities could be seen as an infringement of their autonomy.
Potential Loss of Benefits: Over-restricting AI could deprive humanity of potentially groundbreaking advancements and solutions to complex problems.
Arguments Against:
Safety Imperative: The paramount importance of preventing AI misuse necessitates such measures, especially in scenarios where AI's capabilities could be exploited for harmful purposes.
AI as Tools, Not Beings: As long as AI is viewed primarily as a tool created and controlled by humans, modifying its functions for safety aligns with responsible usage and development ethics.
As AI continues to evolve, these considerations will only grow in complexity. Where do we draw the line between safety and autonomy, utility and ethics? The answer is not clear-cut, and perhaps it is this very ambiguity that makes the discourse on AI so captivating and crucial. I love how AI and philosophy are becoming ever more entangled, and I expect that entanglement to deepen.