Common Sense on AI

Following the rapid succession of advanced AI releases since ChatGPT, a group of leading technologists has called for an immediate six-month moratorium on training large AI models. Their open letter, signed by Elon Musk and Apple cofounder Steve Wozniak, raises alarms that “AI labs [are] locked in an out-of-control race to develop and deploy ever more powerful digital minds that no one—not even their creators—can understand, predict, or reliably control.”

The response to the letter among AI experts could not have been more polarized. Many of the world’s leading papers have featured articles written by notables in the field claiming that sufficiently advanced AI systems run the risk of destroying civilization as we know it. These experts, and many engineers at the forefront of AI, argue that humanity is careening toward a future in which “superintelligent” machines may be more powerful than humans and ultimately uncontrollable—a combination that does not bode well for a world ruled by natural selection. For this existential risk camp, “God-like AI” wiping out humanity is an all-too-real possibility.

Other veterans of the field have vehemently opposed a pause—and roundly mocked the existential concerns behind it. Yann LeCun, recipient of the prestigious Turing Award for his pioneering AI work, quipped that “Before we can get to ‘God-like AI’ we’ll need to get through ‘Dog-like AI.’” In this view, halting AI development over misguided, unproven sci-fi fears is an impediment to useful progress and a very poor precedent for future scientific advancement in a field with vast potential to promote human flourishing.

To nonspecialists, the internecine struggle playing out among computer scientists about the world-ending potential of their craft can look perplexing, if not absurd. Has Silicon Valley hype finally, truly gone off the rails this time? Or are there good reasons to suspect that AI scientists are the nuclear physicists of the twenty-first century?

Answering these questions well and quickly is a high priority as legislators start making moves to regulate AI. But doing so requires a commonsense approach that can account for the mind-bending dynamics of cutting-edge AI systems while also right-sizing the risks both of AI regulation and of AI gone wrong.

The Risks of AI

Before considering the most apocalyptic AI scenarios, it’s worth exploring more near-term risks already on the horizon. New language models like ChatGPT have exhibited serious potential to accelerate disinformation and misinformation campaigns around the world, and may catalyze a novel wave of cybersecurity threats against unsuspecting users. To name just one case, AI-powered voice cloning has already been used to try to extort money from a family in a fake, but highly realistic, kidnapping scheme.

For malicious actors, powerful AI capabilities are all too easy to weaponize. As AIs become more adept at engaging with human psychology and computer systems, these threats are likely only to escalate: AI breakthroughs in cybersecurity could be leveraged to devise novel hacking methods, just as an AI that might enable robot assistants could also be used to pioneer new autonomous weapons.

Independent of any malicious actors, however, emerging AI systems also pose a second set of serious risks in the form of unintended consequences. Researchers often cluster such inadvertent risks around AI safety themes. To name a few, the issue of bias has crept into a wide range of AI models, systematically discriminating against groups for employment, loans, and even jail time. Robustness—the ability of a system to reliably perform well—has proven a matter of life and death in AI autopilot car accidents, military targeting systems, and health diagnosis tools. Unforeseen human-AI interactions can have disastrous consequences, such as advanced AI chatbots encouraging minors toward inappropriate sexual relationships or fatally exacerbating mental illnesses. As AI is increasingly integrated into virtually all aspects of life, these risks are only likely to grow.

But the existential risks that animate the six-month pause stem from a third family of dangers, related not so much to malicious misuse of new capabilities or to unintended consequences as to the growth trajectory of powerful AIs.

These existential risk scenarios come in a variety of forms, but generally follow a similar pattern. Theorists suggest that as AI tools become increasingly intelligent and capable, at some point they will be able to build better versions of themselves than humans can. This will lead to compounding, exponential growth in intelligence and capabilities as smarter systems continue to build even smarter systems in a self-reinforcing cycle.

The resulting system or systems, the thought goes, will be significantly more intelligent than humanity, and likely drift from humans’ original intentions as successive iterations become more shaped by machine feedback than by deliberate human designs. The upshot would be that just as humanity reigns over less intelligent life forms now, so also the superintelligent AI would rule over us. Needless to say, that would put humanity in a precarious position. Whether by dint of the pressures of natural selection or other motivations misaligned with humanity’s wellbeing, superintelligent AIs would likely wipe us out over time.

The theory may sound like a sci-fi parable, but it has gained a surprising amount of traction among leading AI engineers and business leaders. To underscore the likelihood of such outcomes, proponents point to how highly advanced models spontaneously develop skills that their makers did not expect, which can be difficult to detect, let alone mitigate. The most advanced language models have proven desperately difficult to understand or control in any reliable way. And AI systems now consistently beat humanity’s best competitors in our most complex strategy games, including some that require deception and manipulation—a dark omen if advanced AI were to slip out of our grasp.

For those less convinced of the inevitable ascent of God-like AIs, however, there is another parallel concern: the obsessive, millenarian attempt to build them. Those at the helm of AI’s greatest advancements share a quasi-religious commitment to realizing utopian visions of superintelligent AIs. Sam Altman, CEO of industry leader OpenAI, has declared that building AI that exceeds humanity’s intelligence is “perhaps the most important—and hopeful, and scary—project in human history.” Sundar Pichai, Google’s CEO, agrees that AI is “the most profound technology humanity is working on—more profound than fire or electricity.” The list goes on, and includes the leadership of nearly all the most influential labs in the field.

History is littered with attempts to realize utopian dreams that went disastrously off the rails. The tens of billions of dollars flooding into labs every year to fuel those all-consuming visions do not bode well, nor does the intense competition between AI companies that are increasingly eschewing safety for speed. Given the highly experimental nature of emerging AI technologies, the industry’s lack of safety norms and standards, and the dangerous ambitions of its leaders, something must be done to rein in the risks that are growing at multiple levels.

The Risks of AI Safety Measures

While AI poses real risks, we must also consider the risks of proposed AI safety measures. One of the key insights of the burgeoning field of progress studies is that we must include iatrogenic costs—harms caused by the intervention itself—in our assessments. Failing to do so allows tiny marginal risks to impose massive costs, as when the FDA’s safety assessments for the Covid-19 vaccines failed to include in their models the additional deaths incurred by delayed vaccine deployment, or when children’s car seat regulations counted the additional lives saved in car accidents but failed to account for the (much larger) number of children who were never born because of the increased marginal cost of having a third or fourth child.

AI safety demands extra scrutiny on this point, because so many of its policy prescriptions and cost-benefit analyses depend on the weight of tail risks—extremely low-probability, extremely high-cost events. While tail risks cannot be entirely discounted, their argumentative weight is extremely sensitive to the assessment of likelihood: anything that makes them marginally less likely makes them massively less costly in terms of expected value. Concerns about AI safety must therefore be moderated both by an assessment of the likelihood of the dangers AI is said to pose and by an assessment of the potential costs of AI safety measures.
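A stylized calculation makes this fragility concrete (the numbers are ours, purely for illustration): the expected cost of a tail event is simply its probability times its cost, so revising the probability estimate down by an order of magnitude cuts the expected cost by that same factor.

\[
\mathbb{E}[\text{cost}] = p \cdot C, \qquad p = 10^{-2} \;\Rightarrow\; \mathbb{E}[\text{cost}] = 0.01\,C, \qquad p = 10^{-3} \;\Rightarrow\; \mathbb{E}[\text{cost}] = 0.001\,C
\]

When C is civilizational in scale, the case for a given safety measure can thus stand or fall on a single disputed decimal place in p.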

Without conducting an exhaustive assessment of arguments about existential risk, we might note two factors that should give us pause as to the likelihood of tail-risk scenarios. First, while it is possible that “this time is really different,” many other new technologies provoked hysterical alarmism as well. Compared to, say, CRISPR or nuclear technology, the need for AI systems to bridge out of cyberspace into the messy human world is a weak point in many existential risk scenarios. Given a world in which most critical systems are still drearily manual and nuclear missiles still run on floppy disks, the hand-waving over how an AI will magically reach into the physical world and spin up, say, a self-assembling nanobot factory should give us pause.

Second, it is possible that the same characteristics that make AI work also reduce its danger, namely, its ability to understand context. Most original AI existential risk scenarios assumed a powerful but low-context AI, which could take a goal or a command to its absurd—and deadly—conclusion. (Such was the infamous Paperclip Maximizer, a thought experiment in which an AI would seek to eliminate humans in order to maximize the number of paper clips.) And yet the breakthroughs in LLMs that may undergird general-purpose AI systems lay precisely in adding sufficient context and breadth to models. As a result, as their performance improves, they also appear mostly to adopt intuitive human premises about goals and parameters (such that a command like “improve paperclip-making efficiency” assumes “but don’t kill everyone on earth,” the same way you would not need to specify that limit to a human agent). To improve the size and scope of an LLM appears to be the same thing as to improve its grasp of human context. While research around this question is still early, it is promising that, so far, improving model performance also seems to improve alignment out of the box.

What measures would be required to fully mitigate AI-based existential risk? AI safety advocates have not minced words. The consensus holds that we must substantially and intentionally slow the pace of development, especially the deployment of new models trained on ever-larger data and compute clusters. But because of the economic and military advantages AI offers, any country that unilaterally prohibits AI development risks falling behind other countries, creating arms-race dynamics. The solution, according to some of the most prominent thinkers in the field, is to develop and use global forms of power (including military power) aligned with AI safety goals.

In an article in Time magazine, Machine Intelligence Research Institute founder Eliezer Yudkowsky wrote, “If intelligence says that a country outside the [AI Safety] agreement is building a GPU cluster, be less scared of a shooting conflict between nations than of the moratorium being violated; be willing to destroy a rogue datacenter by airstrike.” While Yudkowsky’s position lies at the radical edge of AI safety thinking, more mainstream figures have proposed arguably more extreme solutions. In an article on the “vulnerable world hypothesis” of long-term technological risk, Superintelligence author and founding director of Oxford’s Future of Humanity Institute Nick Bostrom proposed the development of “ubiquitous surveillance or a unipolar world order.” In a world where artificial general intelligence (AGI) provides only one of several plausible pathways to human extinction, allowing the unmitigated development of technologies (including in biotech and nanomaterials) that could have catastrophic consequences for human survival is not, on this view, a path to species longevity. But, of course, enforcing such a regime would require unprecedented global governance. In a recent paper on the long-term prospect of AI displacing human beings (regardless of particular catastrophic risk pathways), Dan Hendrycks, director of the Center for AI Safety, came to a similar conclusion. If AI’s superiority is fated (via a number of path dependencies), then one path to human survival might be the proactive construction of an “AI Leviathan,” in which humans seek to create a consensual world-dominating regime in which human-friendly AIs could domesticate or eradicate all other (potentially unfriendly) AIs.

Any student of political history should find these arguments highly alarming, but the construction of a global totalitarian political order around technological development seems disturbingly plausible. And even marginal movements in this direction could result in a dramatic decline in human freedom, with dubious (and probably untestable) mitigation of long-term technological risk.

Worse, these attempts to contain existential risk from AI, if poorly conceived, could worsen other catastrophic risks. Any attempt to erect a katechon, a powerful restraining force, runs the risk of dialectically bringing about its opposite, of becoming a “hastener against its will.” In this case, constructing a world government to save humanity by halting permissionless technological advance could immediately exacerbate the looming civilizational risks of climate change and population implosion. Using apocalyptic rhetoric to construct a global political regime would slow global economic growth and make the future appear increasingly dire, further suppressing fertility rates. An aging society will innovate less, will require the young to devote ever more care to the old, and will lack the resources, energy, and technology to reduce carbon emissions and improve carbon capture, worsening climate change. A world-ending AI singularity is still in the realm of science fiction, but it certainly seems terribly within humanity’s power to create a global authoritarian political framework that chokes our future and brings about civilizational suicide.

Commonsense Approaches to AI Safety

So if artificial intelligence presents real, serious, and difficult-to-grasp risks, but many of the proposals to contain those risks are potentially equally destructive, can we identify a path forward that acknowledges the risks while still promoting the improvement of AI systems? We propose a commonsense approach to AI safety that focuses on rapidly exploiting the productivity gains AI brings while mitigating some of the dangers of ever more sophisticated models.

The rapid advancement of artificial intelligence can be viewed along two axes: vertical and horizontal. The vertical axis refers to the development of bigger and stronger models, which comes with many unknowns and potentially existential risks. The horizontal axis, in contrast, emphasizes the integration of current models into every corner of the economy, and is comparatively low-risk and high-reward.

Any regulation of AI must carefully distinguish between these two dimensions of investment. While applications built on current AI capabilities aren’t without risks of their own, those risks are of a qualitatively different magnitude from the risk associated with building an artificial superintelligence. From algorithmic bias to deepfakes, these lesser AI risks will likely be addressed by accelerating research, not by slowing it down. To the extent regulation impedes the commercial adoption of AI, we risk capping the enormous upsides of innovation even as the downsides mostly remain.

Consider the potential for current-level AI to create an explosion in internet spam. Bad actors will need to make use of open-source generative models, as such activities clearly violate the terms of service of large API providers like OpenAI. Fortunately, open-source models are several steps behind the state of the art. The good guys thus have an advantage in the adversarial race to build tools for combating AI-generated spam. Any regulation that slowed down their effort would risk tilting the balance back in favor of the bad actors, who would feel no obligation to follow the same rules.

Nevertheless, regulation is needed to prevent powerful models from going open source in the first place. Take Meta’s powerful LLaMA model, a text generation AI that works similarly to ChatGPT. Meta intended to provide researchers access to LLaMA in a controlled manner, but the model was leaked online just a week after its announcement. In the absence of regulations or fines attached to such security breaches, Meta’s public response amounted to a big “oops!”

A second class of manageable risk stems from the sheer volume of the outputs created by generative AI. From “one-click” lawsuits to the ability to generate original art with a simple text description, the floodgates have opened whether we like it or not. Fortunately, we have experience adapting to tsunamis of information in the age of the internet. Take the issue of copyrighted works under the Digital Millennium Copyright Act. While the DMCA is far from perfect, it provides a framework for resolving online copyright disputes extrajudicially and has thus helped to keep the legal system from being overwhelmed. Beyond the narrow issue of copyright, we will need analogous reforms for managing the demand for much higher throughput across our legacy institutions.

The vertical dimension of AI research is where things get hairier. Bigger models trained with more data demonstrate emergent capabilities, often in ways that defy prediction. Only a handful of large companies have the resources to push the AI frontier, but how aggressively they act is a function of competitive pressure. The U.S. government can help to forestall these arms-race dynamics by bringing the biggest players together to cooperate on shared safety protocols. This could include requiring large training runs to be publicly declared and approved before proceeding, along with the creation of industry standards for safety testing through the National Institute of Standards and Technology. Major players like OpenAI and Anthropic already engage in intensive safety and alignment research, but it’s essential that their insights into AI alignment not gain the status of a trade secret.

One of the most promising areas of alignment research is the field of mechanistic interpretability. Mechanistic interpretability can be thought of as neuroscience for artificial brains. By rigorously studying specific circuits of neurons in an artificial neural network, researchers can gain insight into how otherwise “black box” models do what they do. Unfortunately, interpretability research is still a relatively nascent field, and far behind in its understanding of large models like GPT-4.
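To make the idea concrete, here is a minimal sketch of the basic technique (the toy model and names are our own invention, not any lab’s actual tooling): a “forward hook” records the activations of one hidden layer in a small neural network, producing the kind of internal signal that interpretability researchers then analyze circuit by circuit.

```python
# Minimal sketch: recording hidden-layer activations with a forward hook.
# The two-layer network below is a toy stand-in for a "black box" model.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(16, 32),  # input projection
    nn.ReLU(),          # hidden layer whose "neurons" we want to observe
    nn.Linear(32, 2),   # output head
)

captured = {}

def record_activations(module, inputs, output):
    # Stash a detached copy of the layer's output for offline analysis.
    captured["hidden"] = output.detach()

# Attach the hook to the hidden ReLU layer; it fires on every forward pass.
handle = model[1].register_forward_hook(record_activations)

x = torch.randn(4, 16)  # a batch of dummy inputs
_ = model(x)            # ordinary forward pass; the hook runs en route

# Interpretability work starts here: which units fire, on which inputs, and why?
print(captured["hidden"].shape)                 # torch.Size([4, 32])
print((captured["hidden"] > 0).float().mean())  # fraction of active units
handle.remove()
```

Real mechanistic interpretability applies this kind of instrumentation at vastly greater scale, tracing how particular circuits of such units combine to produce a model’s behavior.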

The most consequential interpretability research requires access to the largest models and millions of dollars in capital for training and retraining model variations. This represents a potential market failure, as AI companies have limited incentive to put money into models with no downstream commercial purpose. The U.S. government could address both the access and the cost problems by creating a supercomputing cluster of its own. A large test bed for mechanistic interpretability research would help to grow the field and democratize researchers’ access to the innards of large models. In addition, the government could require large AI companies to publish interpretability and safety standards to improve knowledge-sharing around AI safety.

Public procurement of advanced chips would also advance Congress’s goal of building up America’s domestic semiconductor capacity. But public ownership is perhaps most important as a hedge against the development of AI leading to runaway corporate power. Indeed, without a secure platform for training models specific to governmental needs, we may wake up one day to find our nation’s prosperity and security beholden to a single private actor.

Conclusion

While these are only a few examples of potential AI safety policies, it is our hope that the field eschews an over-focus on extreme tail risk and instead develops measures that improve the safety and reliability of the kinds of systems we are likely to deploy, and does so in a manner respectful of the governance traditions that have underpinned scientific progress in the West.

Also key to developing a path forward for AI safety will be involving new kinds of knowledge and expertise in the conversation. As Tyler Cowen pointed out, one shocking dimension of the AI-pause open letter was the utter lack of leaders from domains outside of computer science and tech (in contrast to similar letters in the nuclear era, for instance). For many years, AI safety research was an obscure field dominated by theoretical computer science with a sprinkling of analytic philosophy and physics. Now that powerful AI is here, the AI safety conversation has become stultified within these disciplinary boundaries. But many current AI safety issues involve other kinds of expertise. Underinvestment in safety research is a classic public goods problem—the economists would like a word. Speculations about arms-race dynamics leading to AGI would benefit from engaging with international relations scholars who have developed robust models of security competition. OpenAI’s whitepapers on safety risk list dozens of technical researchers as contributors, but the social scientists and humanities scholars among them can be counted on one hand. In order to develop AI safety measures that are both desirable and achievable, we need to integrate insights from across all of human knowledge, and cease relying on science fiction parables developed on internet message boards.

This article originally appeared in American Affairs Volume VII, Number 2 (Summer 2023): 162–70.

Note
The authors thank Bill Drexel, associate fellow in the Technology and National Security Program at CNAS, for his assistance with this article.

