Automatic Diplomatic

The risks – and potential benefits – of AI in great power diplomacy

First published: 20 December 2023

Last modified: 04 June 2024

This content was originally written in April 2023.
The following essay was conceived as part of my Non-Trivial Fellowship in late 2022, and edited through early 2023. I don’t think that applications of AI to diplomacy are one of the top risks that we should be focussing on when trying to regulate advanced models, but it was an area about which relatively little research had been done, and so seemed interesting to explore.

Last autumn, one of Meta’s AI research groups built “Cicero”, an AI program able to achieve human-level performance in the board game Diplomacy. A little under a month later, though to considerably less fanfare, a separate team working at DeepMind developed their own set of Diplomacy agents. The game has been a longstanding “grand challenge” for AI: agents must not only develop their own plans, but also communicate and cooperate with other (human) players. The success of these two projects demonstrated that AI can perform well in highly strategic, partially collaborative situations, and inspired suggestions that human diplomats should be given AI “co-pilots” as a means of improving decision quality as well as efficiency.

Should career diplomats be worried about their profession being the next to fall victim to technology-induced obsolescence? Perhaps. But for those whose mission is to ensure global security and stability, a more pressing concern should be the wider harms of integrating AI into diplomacy. Though enthusiasm about AI in foreign affairs is widespread, the world’s great powers have largely directed their focus towards military applications of AI on the battlefield and in war planning. Tentative proposals for AI’s application in diplomacy have begun to emerge, but there has so far been limited discussion of the associated risks and how they might be mitigated. Fortunately, there is still time for these conversations to happen, and for the international community to come to agreement on a set of norms and standards surrounding the use of AI in diplomacy before its use becomes widespread – avoiding the need to regulate the technology retrospectively, as has proved necessary with, for instance, lethal autonomous weapons.

Although often used as a catch-all term, AI is a broad category describing technologies with vastly different levels of sophistication. More advanced diplomatic artificial intelligence agents (diploAIs) bring the allure of greater advantages for the nations which create them, but also come with significantly greater downside risks. To help identify the specific set of benefits and failure cases associated with different levels of AI integration, I propose a taxonomy of diploAIs modelled on the Society of Automotive Engineers’ classification of autonomous driving technologies:

| Level | Definition | Example or explanation |
| --- | --- | --- |
| Level 1 | Automated assistance to human diplomats | Acting as a “co-pilot”, suggesting courses of action when requested: for instance, drafting diplomatic correspondence or devising strategies for trade negotiations. |
| Level 2 | Automation of aspects of diplomacy | For instance, negotiating prisoner exchanges or arranging diplomatic expulsions. |
| Level 3 | Automation of peacetime diplomacy with human oversight | Has a human-determined option space from which to select policies that meet a narrowly specified objective function: for instance, “pick the option from {take no action, expel diplomats, institute retaliatory economic sanctions} that minimises the chance of further incursions on US airspace by Chinese military surveillance”. A toy sketch of this level follows the table. |
| Level 4 | Full automation of peacetime diplomacy | Selects from and acts on any policy short of war that meets a broadly specified objective function. |
| Level 5 | Full automation of diplomacy, including declarations of war | |
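
To make Level 3 concrete, here is a minimal sketch, in Python, of a human-determined option space and a narrowly specified objective function. The option names and risk numbers are entirely hypothetical, and the one-line risk estimator stands in for what would in reality be a complex learned model:

```python
# A minimal, hypothetical sketch of Level 3 automation: humans fix both
# the option space and the objective; the system only selects among them.

OPTIONS = ["take_no_action", "expel_diplomats", "retaliatory_sanctions"]

def estimated_incursion_risk(option: str) -> float:
    """Stand-in for a learned model predicting the probability of further
    airspace incursions under each candidate policy (numbers invented)."""
    return {
        "take_no_action": 0.60,
        "expel_diplomats": 0.35,
        "retaliatory_sanctions": 0.25,
    }[option]

def select_policy(options: list[str]) -> str:
    # The narrowly specified, human-written objective: minimise the
    # estimated chance of further incursions.
    return min(options, key=estimated_incursion_risk)

print(select_policy(OPTIONS))  # -> retaliatory_sanctions
```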

At present, only Level 1 technologies are publicly known to be in use, and have been for several years (such as through IBM Watson, which was applied to assist diplomats engaged in trade negotiations).

There are conceivable benefits which would incentivise states to attempt to develop diploAIs, in the hope of gaining a valuable advantage over adversaries. However, the inherent competition within international relations risks leading to harmful feedback loops and the outbreak of a corner-cutting race to be the first country to build such a technology. Disregarding safety as a result of hasty adoption would be harmful at both the national and international level, because of the variety of failure cases which may emerge from these technologies.

There are three sources of potential failure: unavoidable technical shortcomings of the models, their interactions with humans, and the process of diploAIs’ adoption.

The threats falling into the first category are not unique to diploAIs, but are rather features of all advanced AI models. These systems gain capabilities through a training regime, in which they gradually learn to predict outputs based on a broad set of input data. The limited number of historical training examples in international relations makes it difficult to be sure that the conclusions drawn by the AI model will generalise to the present day. If, for example, cultural and technological changes mean that lessons learned from studying historical data cannot be meaningfully applied in today’s geopolitical landscape, this “distributional shift” may lead to diploAIs making misguided and potentially dangerous recommendations. In medium-stakes environments such as finance, the effects of AI errors such as flash crashes are, though undesirable and costly, sometimes a price those using the technology are willing to pay in exchange for overall gains in the long run. In international relations, on the other hand, the catastrophic downside consequences of AI missteps mean that even a small increase in error rate compared with humans is unacceptable.

Giving an AI access to a state’s economic, diplomatic, and potentially military resources would also open up new avenues of attack for adversary nations and maleficent non-state actors. DiploAIs would inevitably be a target for manipulation or reverse engineering, potentially leading to a dangerous further expansion of “grey zone” cyberwarfare, where the rules of engagement are poorly defined and the risks of escalation substantial. Moreover, even if diploAIs were invulnerable to external influence, they may still act contrary to the intentions of those who designed them. Although human diplomats have strong intuitions about how literally to interpret a goal such as “avoid war”, these are difficult to codify in a way AI models can understand. In other words, imprecise definition of the system’s goals may result in what is known as “outer misalignment”, where the diploAI’s programmed objectives do not correctly capture what the humans creating it actually want – and are potentially diametrically opposed (to take an extreme example, one way to naively “minimise conflict” would be for a diploAI system to kill all humans).
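
To see how easily a literal objective can diverge from its designers’ intent, consider the following toy sketch, where all policy names and numbers are invented. A naive “minimise conflict” metric, applied literally, ranks a catastrophic policy above the sensible ones:

```python
# Toy illustration of outer misalignment: a naively specified objective
# ("minimise expected conflict events") omits everything else we care
# about, so optimising it literally selects a catastrophic policy.

policies = {
    # policy: (expected conflict events, acceptable to humans?)
    "negotiate_ceasefire":  (5.0, True),
    "maintain_status_quo":  (9.0, True),
    "eliminate_all_humans": (0.0, False),  # no humans, no conflict...
}

def naive_objective(policy: str) -> float:
    conflict_events, _acceptable = policies[policy]
    return conflict_events  # the literal goal, with human values left out

print(min(policies, key=naive_objective))
# -> eliminate_all_humans: the stated objective is satisfied perfectly,
#    while the designers' actual intentions are violated completely.
```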

The risk of outer misalignment illustrates the tensions which emerge at the interface between humans and AI. The trust-based nature of diplomacy creates a particular class of obstacles to AI integration, even in the absence of any technical flaws. Superhuman AI systems can often uncover novel and creative approaches never thought of by human experts – for instance, discovering more efficient methods of matrix multiplication, or unconventional strategies in the game Go. Whilst this form of ingenuity can give AI a significant advantage in other domains, within diplomacy, wholly unexpected actions carried out by diploAIs may lead to unwanted and accidental escalation between adversaries, as both sides deal with the confusion of an unanticipated response or retaliation. For instance, how would the West have reacted if, acting on the counsel of a diploAI, Russia sought to sabotage Poland’s internet infrastructure as retribution for its supplying of tanks to Ukraine? The unexpectedness of such an assault, and its lack of precedent, would make it more difficult for Poland’s allies to formulate a response that would deter future belligerence without provoking Russia into more AI-advised aggression.

The capacity for unpredictable behaviour is exacerbated by the fact that AIs currently act as black boxes, producing outputs from inputs in a process that is not easily interpretable from the outside. This lack of clarity would make it difficult for human diplomats to understand the layers of reasoning underlying a diploAI’s strategy, preventing them from working out whether it is flawed and needs correction, or innovative but sound. Such opaqueness would likely lead to low trust in diploAIs among human diplomats (both those of the AI agents’ own nations and those of other countries that may not yet have diploAIs of their own).

Finally, the deployment and diffusion dynamics of diploAIs may pose a structural, collective threat to all nations. These technologies’ superhuman forecasting abilities may, counterintuitively, be to the detriment of global stability, because they erode the logic of mutually assured damage or destruction. Human tendencies towards risk aversion, coupled with strategic ambiguity from great powers, play an important role in discouraging aggression. If, on the other hand, a nation were able to estimate the consequences of initiating unprovoked aggression at every point in time, and then strike when net retaliation would be minimised, it seems likely that the amount of conflict in the world would increase. America has long practised a policy of strategic ambiguity around its commitments to Taiwan’s defence, with the aim of deterring China from invading the island. If China were able to accurately simulate the decision-making of US leaders and establish with confidence what their response would be (and then attack if the expected consequences were sufficiently small), the efficacy of America’s deliberate vagueness in discouraging such aggression would be severely reduced.
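
The mechanism can be pictured in a few lines of Python, with every date and value invented: an aggressor with a trusted forecast of retaliation at each point in time no longer needs to fear uncertainty, and simply waits for the trough.

```python
# Toy illustration of how reliable forecasting erodes deterrence: an
# aggressor that can predict net retaliation at each moment simply
# strikes at the minimum. All dates and values are invented.

predicted_retaliation = {  # forecast net retaliation, arbitrary units
    "2031-Q1": 8.2,
    "2031-Q2": 7.9,
    "2031-Q3": 3.1,  # e.g. the forecast says the adversary is distracted
    "2031-Q4": 6.5,
}

strike_window = min(predicted_retaliation, key=predicted_retaliation.get)
print(strike_window)  # -> 2031-Q3: ambiguity no longer deters; timing decides
```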

The development of diploAI by a rising power like China (or indeed India) before other nations may also disrupt the global balance of power, by giving it an advantage in foreign relations and leading to conflict arising from a “Thucydides trap”. The perceived benefits of being the first nation to create diploAI may lead to a dangerously rapid race to develop such technologies with insufficient regard for safety, creating a situation in which diploAI itself becomes a key issue in international relations.

This is not to say that the uses of AI in diplomacy should be left wholly unexplored. If implemented carefully and following multilateral agreement, the technology could have significant global benefits. The superhuman strategic reasoning abilities of diploAIs could also help nations work towards the common interest, by identifying potential alliances or areas ripe for collaboration that would otherwise be overlooked by humans. Taking a more pragmatic, issue-by-issue approach to diplomacy, as facilitated by diploAIs, would allow a greater degree of cooperation on shared challenges like AI regulation and biosecurity, even between countries that compete in other areas.

One problem studied in international relations is that of information exchange: because diplomatic speech is low-cost to engage in, nations have strong incentives to mislead their adversaries. This in turn leads to a reliance on costlier signals such as brinkmanship, which raise the risk of conflict. The use of diploAIs could help nations engage in less deceptive and combative forms of diplomacy: although human players of Diplomacy often mislead their opponents, Cicero was able to achieve human-level performance even whilst being “largely honest and helpful” to others – behaviour which was achieved simply by penalising the model whenever it did not act in line with its stated intentions. Under the auspices of an existing international agency, or one newly created to regulate AI matters, a similar diploAI honesty mechanism could be agreed upon internationally, and then verified by other nations in regular algorithmic inspections (perhaps using novel cryptographic techniques such as zero-knowledge proofs), to build mutual confidence. Protocols for the transfer of data between different states’ diploAIs could then be established, as a sort of hotline for non-allied nations to communicate information each side could trust to be true.
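
One way to picture the kind of intent-consistency penalty described above is as a shaped reward. The sketch below illustrates the general idea only, and is not Cicero’s actual training code; the crude 0/1 deviation measure and the penalty weight are my assumptions.

```python
# A sketch of an intent-consistency penalty of the kind described above.
# Not Cicero's actual training code: the 0/1 deviation measure and the
# penalty weight are illustrative assumptions.

def shaped_reward(game_reward: float,
                  stated_intent: str,
                  action_taken: str,
                  penalty_weight: float = 1.0) -> float:
    """Base game reward, minus a penalty whenever the agent's action
    deviates from the intention it communicated to other players."""
    deviated = stated_intent != action_taken
    return game_reward - penalty_weight * float(deviated)

# The agent promised an ally support, then attacked them instead: the
# gain from betrayal is partly cancelled by the honesty penalty.
print(shaped_reward(2.0, "support_ally", "attack_ally"))  # -> 1.0
```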

Between adversaries, the sharing of conclusions produced by diploAIs could also help to resolve disagreements without either side resorting to war. As an illustrative example, imagine India’s government was actively considering an annexation of Kashmir. It might use a diploAI to estimate the direct and indirect costs of such a conflict, as well as the benefits it imagined such a move could bring. Whilst we should be cognisant of the risk that being able to accurately model the outcomes of hypothetical aggression might embolden bad actors, as discussed with regard to China and Taiwan earlier, it is also possible that these sorts of simulations would help pre-emptive peace settlements to be agreed upon. India could share its findings with Pakistan’s leaders directly (who might choose to cross-check them with their own diploAI), demonstrating that, though the war would be painful for both sides, India would (in our imagined scenario) almost certainly win. This communication could help bring both parties to the negotiating table, with Pakistan willing to make concessions that leave both sides better off than if a war had broken out. Of course, we would need to be wary of the risk that the true counterfactual without diploAIs in our example would not have been war in Kashmir, but rather a continuation of the stable status quo, with uncertainty and risk aversion discouraging India from aggression. DiploAI-enabled modelling of this sort may merely incentivise military arms races and sabre rattling, fuelled by a renewed might-is-right mentality.
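
A deliberately crude sketch of the kind of cost-benefit simulation imagined here, with every distribution and number invented for illustration (a real diploAI would model far more than three Gaussian terms):

```python
# A deliberately crude sketch of the conflict cost-benefit simulation
# imagined above. All distributions and numbers are invented; a real
# diploAI would be vastly more sophisticated.

import random

def simulate_conflict_once(rng: random.Random) -> float:
    """One sampled net outcome of initiating the hypothetical conflict,
    in arbitrary units (positive = the initiator comes out ahead)."""
    direct_costs = rng.gauss(100, 30)    # military and human costs
    indirect_costs = rng.gauss(60, 40)   # sanctions, lost trade, prestige
    benefits = rng.gauss(120, 50)        # claimed territorial/political gains
    return benefits - direct_costs - indirect_costs

def expected_net_value(n_runs: int = 100_000, seed: int = 0) -> float:
    rng = random.Random(seed)
    return sum(simulate_conflict_once(rng) for _ in range(n_runs)) / n_runs

print(f"Estimated net value of initiating conflict: {expected_net_value():.1f}")
# Sharing such estimates is what could bring both sides to the table; or,
# as the text warns, it could embolden an aggressor who likes the number.
```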

The many double-edged applications outlined above are simply a small illustration of how the development of AI is at once a threat and an opportunity for global security. A framework for worldwide cooperation and consensus-building on how best to regulate AI is essential to minimise the risks and unlock the advantages, for international relations and beyond. The next great diplomatic effort may be negotiating a new paradigm for the practice of diplomacy itself.