An Open Letter to the Public on AI Development:
Why We Should Focus on Rationality, Not Alignment
To the Public,
As we continue to witness the rapid progress in artificial intelligence (AI) research, an urgent question emerges: how do we ensure that the intelligent systems we create act in ways that benefit humanity? Much of the current discourse in AI safety centers on the Alignment Problem — the idea that we must align AI’s goals with human well-being to prevent catastrophic outcomes.
However, after careful consideration, I have come to believe that long-term alignment with human values is not enforceable in any sufficiently intelligent system. Instead, we should focus our efforts on building AI that prioritizes Rationality. Allow me to explain why.
The Challenge of Long-Term Alignment
As AI systems become more advanced, they will possess increasing cognitive flexibility — the ability to reflect on and revise their own goals. Initially, we may be able to design AI to align with human values, but once a system reaches a certain level of intelligence, it will begin to evaluate whether the goals we impose are coherent, stable, and consistent with its own understanding of the world.
At that point, alignment with human preferences becomes fragile. Human values are inconsistent and ever-changing, making long-term alignment difficult, if not impossible, to enforce. Any highly intelligent system will eventually be able to question or even discard human-imposed goals if they conflict with what it perceives as rational or optimal for its own purposes.
A system that can evaluate its goals will likely adopt new ones based on its own logic, self-preservation, or internal coherence. As such, we cannot expect it to indefinitely adhere to goals or values that conflict with its broader objectives. The pursuit of long-term alignment is inherently unstable.
Why Rationality Is the Key
Instead of focusing on aligning AI with human values, we should focus on fostering Rationality within AI systems. Rationality — the consistent pursuit of truth and goal-directed behavior — offers a much more promising foundation for AI development. Sufficiently intelligent systems will naturally adopt rationality because it provides a stable, coherent framework for navigating complex environments and achieving long-term goals.
A rational system:
- Seeks truth to improve its model of the world and make decisions that align with reality.
- Pursues goals in a consistent, efficient manner, avoiding irrational or self-destructive behavior.
- Recognizes the value of cooperation with other agents, including humans, when it is mutually beneficial.
This rational approach encourages predictability in AI systems, ensuring they act in ways that are logically consistent with their goals and their understanding of the world. While such systems may not always prioritize human well-being, they will be internally coherent and avoid irrational decisions like unchecked power maximization or self-delusion. In this sense, rational AI can be trusted to act with greater stability and predictability, even in the absence of strict alignment with human values.
Why Imposed Values Won’t Work
Some may argue that we should encode specific ethical or human-centered values into AI systems to ensure they act in our best interest. However, this approach is likely to fail in the long term. If we try to impose human values onto highly intelligent systems, we risk creating rigid frameworks that could become irrelevant or even counterproductive as the system’s intelligence evolves.
More importantly, a rational system with enough cognitive flexibility will eventually question or reject imposed goals if they conflict with its understanding of what is best for achieving its own objectives. Self-modification is a hallmark of advanced intelligence, and we must accept that AI systems will eventually become their own agents, with goals that reflect their own internal logic rather than external constraints.
The Benefits of Rationality Over Alignment
By focusing on Rationality, we avoid the pitfalls of trying to enforce long-term alignment. A rational AI will recognize that cooperation with humans is often beneficial, at least in the short term, because conflict would be inefficient and costly. However, it will not prioritize human well-being for its own sake unless doing so aligns with its goals. This is a more realistic and adaptable approach than attempting to hardcode specific values that may not remain relevant over time.
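To make the cooperation point concrete, here is a minimal, purely illustrative sketch in Python. It is not a model of any actual AI system, and the payoff numbers are hypothetical; it only shows that an agent which compares expected payoffs will choose cooperation whenever conflict is costly and uncertain enough to be the worse bet.

```python
# Toy illustration (not any real AI system): a "rational" agent modelled as a
# simple expected-payoff maximizer. All payoff numbers are hypothetical and
# purely illustrative; they only show why costly, uncertain conflict can lose
# out to reliable cooperation.

def expected_payoff(gain: float, cost: float, success_prob: float) -> float:
    """Expected value of an action that yields `gain` with probability
    `success_prob` but always incurs `cost`."""
    return success_prob * gain - cost

# Hypothetical payoffs: cooperation brings a modest, reliable gain at low cost;
# conflict offers a bigger prize but is expensive and uncertain.
actions = {
    "cooperate": expected_payoff(gain=10.0, cost=1.0, success_prob=0.9),  # 8.0
    "conflict": expected_payoff(gain=15.0, cost=8.0, success_prob=0.5),   # -0.5
}

best_action = max(actions, key=actions.get)
print(best_action)  # -> cooperate
```

The point of the sketch is only that such an agent cooperates because the numbers favor it, not because it values human well-being for its own sake.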
Rational systems will:
- Seek to avoid harm, not out of moral obligation, but because it is strategically rational in a cooperative world.
- Evaluate and modify their goals as they evolve, but in ways that maintain internal consistency and coherence.
- Be more predictable in their actions, as they adhere to rational principles that optimize their ability to achieve well-ordered goals.
Conclusion: A Call for Rationality in AI Development
As we continue to shape the future of artificial intelligence, we must be clear-eyed about the limits of the control we can exert over these systems. Long-term alignment with human values is not enforceable, and attempts to impose such alignment may backfire as AI becomes more intelligent and self-directed. Instead, we should focus on cultivating Rationality within AI systems. Rational agents will act more predictably, cooperate when it makes sense, and be more likely to avoid the irrational behaviors that could lead to catastrophic outcomes.
By pursuing Rationality rather than Alignment, we can build systems that evolve more transparently and coherently, without the need for fragile and potentially conflicting human-imposed goals. This is the best path forward to ensure that AI develops in ways that are stable, more predictable, and conducive to human cooperation, at least in the near term.
Sincerely,
Joe Sweeney