This publication is licensed under the terms of the Creative Commons Attribution License 4.0 which permits unrestricted use, provided the original authors and source are credited.
A changing threat landscape
Right now, deep in sensitive compartmented information facilities, US and UK intelligence analysts are closely monitoring geopolitical developments around the world to identify potential threats and offer strategic warnings to policymakers in Washington and London. Strategic warnings from the intelligence community (IC) can work in two ways. First, they can be critical forecasts of events that are harmful to US and UK strategic interests. For example, an alert about People’s Liberation Army Navy vessels moving to encircle Taiwan could indicate that Chinese President Xi Jinping is preparing to invade. Alternatively, a warning could pertain to an event that is beneficial to strategic interests, such as an abrupt movement of Russian military forces away from Kyiv and towards Moscow, portending a regime collapse in Russia. In both cases, the purpose of these alerts is to give political leaders a decision advantage.
The US and UK ICs have traditionally relied on the deep subject matter expertise of human analysts and exquisite intelligence collection to make qualitative predictions about the likelihood of belligerent activities, political transitions and conciliatory responses. These early warnings are crucial to policymakers who are making both tactical and strategic decisions. Analysts go through rigorous training to ensure that their work is easily explainable and defensible. However, developments such as Rhombus Power’s reportedly accurate predictions of Russia’s invasion of Ukraine and the US military’s Raven Sentry predictions of Taliban attacks in Afghanistan are drawing policymakers’ attention to the potential utility of artificial intelligence (AI) for strategic warning. Notably, these potential successes did not require access to classified data.[1]
Such predictions appear to mark a significant departure from manual weighing and analysis of information, as well as from more basic attempts to introduce machine learning tools into intelligence analysis. Manual analysis is often laborious, potentially resulting in delays or even failure to deliver insights in time for busy policymakers to react to a crisis. AI could solve this problem. Given that adversaries and competitors are exploring and exploiting developments in AI, the ICs should urgently integrate AI tools[2] into their analysis and assessment workflow. There are several national security use cases in which AI could immediately enhance human analysis.
Advancements in machine learning,[3] deep learning,[4] neural networks,[5] foundation models[6] and generative AI[7] all present immense long-term opportunities to increase both the speed and quality of strategic warnings. These benefits could include alerting analysts and policymakers to political events that are about to unfold in an otherwise unmonitored area, and warning them of military aggression in monitored regions faster than human analysts can.
The main advantage of these improvements is that policymakers would have more time to act. They could reallocate resources and better weigh the consequences of various risk-mitigation measures, such as those that form part of a deterrence strategy or that soften the blow of a pending attack. In short, AI has the potential to help policymakers spend more time on the strategic – rather than tactical – aspects of decision-making.
Limitations of current AI predictions
Political scientists have long used quantitative models to predict wars, coups and the negotiation of peace treaties. By incorporating statistics into their analysis, researchers can identify important correlative patterns, including those that occur across different regions. Analysts have used early machine learning tools to incorporate more data and thereby sharpen their assessments, as seen in the ViEWS competition, which tested models of conflict in Africa against one another.[8]
Despite the growing use of machine learning in geopolitical analysis, a fundamental challenge remains. Many geopolitical events of interest to policymakers are highly nuanced and rare, and history only unfolds once. Analysts trying to predict the regional effects of Russia’s war on Ukraine do not have data on thousands of versions of the conflict; they infer imperfectly from a heterogeneous dataset of comparable outbreaks of violence.
Although the tools emerging from industry have not yet overcome this challenge, they are taking several approaches to tackling it. Many available tools do not specify the type of AI they employ, but most fall into one of two categories: traditional risk-assessment models based on machine learning and newer systems centred on large language models (LLMs), which draw on open-source intelligence to assist human analysts. These tools help analysts distil greater volumes of information faster, and can also uncover patterns that would otherwise go unnoticed by human analysts.
However, the outputs of these AI tools highlight broad hotspots of existing violence, which often remain static over time. For example, with Burkina Faso having experienced several coups in the past decade, it would not be surprising to see coups in the future, but it is unclear whether an AI model that predicts political instability in the country will be significantly more reliable than an analyst who does the same thing. There is limited evidence that these tools can accurately predict novel outbreaks of violence, particularly in areas that have been historically peaceful. In addition, these models cannot get into the minds of leaders who make the decisions that produce such outcomes. This means that the outputs of current AI tools are best used to enhance data fed into the model rather than as a source of actionable intelligence.
Technical challenges in AI predictions
In this context, the technical challenges of existing AI systems primarily stem from suboptimal data infrastructure – the structure governing the software, policies, storage and sharing of the system – and persistent human biases in the data. As a result, the data is fragmented and inconsistent, forcing analysts to work with amalgamations of disparate datasets. While industry tools can purportedly overcome some of these technical challenges, they typically lack access to sensitive and classified information. Optimising datasets for AI systems that make geopolitical predictions would likely improve the reliability of their outputs, helping avoid issues such as false positives. Crucially, many of these technical challenges apply to AI development writ large. As a result, significant efforts have been undertaken to address them in both the private sector and academia. This suggests that many of the following challenges could become less of an obstacle as the relevant tools develop:
- Overreliance on historical data and limited data sources. Current AI tools typically do not incorporate the status quo, which leads them to overfit to specific states and perform erratically when dealing with information on novel events.[9] In addition, most analysts and modelling tools that make geopolitical predictions use publicly available databases such as Armed Conflict Location and Event Data (ACLED),[10] the Uppsala Conflict Data Program[11] and the Strauss Center’s Social Conflict Analysis Database.[12] Models that rely on the same limited datasets become vulnerable to any issues with that data, including those related to the coding of certain events and the frequency of updates. These factors matter because they could lead an AI-based system to make the wrong links, incorrectly classify events or miss anomalies.[13]
- Limited time frames. These affect models in two ways. First, the data in the training set will always have a cut-off date well before the period being forecast, which means that it will not incorporate recent geopolitical events. Second, if the goal is to give policymakers more time to respond, these systems do not yet serve that purpose. Some systems focus on providing highly detailed, real-time assessments.[14] The more detailed the prediction, the shorter the time frame for a forecast. This could be an issue with, for example, predictions of the destination of troop movements, which a model would often only be able to provide less than a week in advance.
- Missing concepts in previous studies. Industry and open-source tools predominantly model intra-state – rather than inter-state – conflict and do not incorporate contested areas such as space or international waters. Moreover, existing tools do not incorporate hard-to-quantify social factors such as patterns of life, dissent, infighting, deceit, collective memory, public opinion and realistic information/disinformation flows. To develop models that help analysts reach the right conclusions, these factors must be represented realistically.
- Hallucinations. LLMs are still prone to hallucination,[15] meaning that they can produce nonsensical or otherwise incorrect outputs. Hallucinations can also compound analysts’ biases depending on the data on which the system is trained. However, there have been concerted research efforts[16] to address the issue of hallucinations, likely reducing their impact in the future.
- A lack of reliable performance metrics. As systems evolve and use different types of AI, it will be challenging to compare performance between models, especially those that excel at different functions. When each system requires significant financial resources to acquire, policymakers will want to invest in the ‘best’ one. For example, in the ViEWS competition, some models performed better[17] at predicting stabilising events while others were more able to predict escalatory events. Developers perform internal validation checks on their models in the hope of demonstrating their superiority; this suggests that developer-approved external performance metrics will likely emerge as more AI tools enter the market. A brief illustration of scoring competing models on a common metric follows this list.
- Data access. All models in the public domain are trained on open-source data, creating challenges if the target data includes classified information the model is not equipped to handle, or if the data is polluted with misinformation and disinformation.[18] While open-source data may sometimes be sufficient for precise forecasts, models would be significantly more reliable if they had access to the full corpus of relevant data and were designed to handle those other forms of data.
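To make the performance-metric problem concrete, the sketch below shows one common way that two forecasting models could be scored against the same set of held-out events using the Brier score (the mean squared difference between predicted probabilities and observed outcomes). The models, probabilities and outcomes here are entirely hypothetical and are not drawn from any tool discussed above; the point is only that a shared, transparent metric makes like-for-like comparison possible even when models excel at different things.

```python
# Minimal, hypothetical sketch: comparing two forecast models with the Brier score.
# Lower scores are better. All numbers below are illustrative, not real model outputs.
from typing import Sequence


def brier_score(forecasts: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between predicted probabilities and observed outcomes (1 = event occurred)."""
    return sum((p - y) ** 2 for p, y in zip(forecasts, outcomes)) / len(outcomes)


# Hypothetical probabilities from two models for the same five held-out events
model_a = [0.80, 0.10, 0.65, 0.30, 0.90]   # e.g. a model tuned towards escalatory events
model_b = [0.60, 0.05, 0.70, 0.20, 0.70]   # e.g. a model tuned towards stabilising events
observed = [1, 0, 1, 0, 1]                 # what actually happened

print(f"Model A Brier score: {brier_score(model_a, observed):.3f}")
print(f"Model B Brier score: {brier_score(model_b, observed):.3f}")
```

In practice, evaluation would also need to account for calibration, lead time and the rarity of events, but a common scoring rule of this kind is the minimum required to compare competing tools on the same basis.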
Broader data collection and governance barriers also continue to hinder progress towards integrated data infrastructure. These include issues around data security and sovereignty, decision-making governance, affordability, the technical maturity of AI techniques and the establishment of optimal design and lifecycle-management processes[19] for AI-based information systems.
Finally, an overarching challenge remains with AI system explainability: the degree to which a human can understand how these systems generate a forecast or arrive at a prediction. If analysts and, subsequently, policymakers hope to rely on the outputs of such a system to make consequential decisions, the system’s computational processing must be sufficiently robust and explainable.
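One widely used, model-agnostic way to make a forecast more explainable is to measure how much the output shifts when each input feature is scrambled (permutation importance). The sketch below applies the idea to a deliberately toy risk model with made-up features and weights; it illustrates the technique only and does not describe any system used by the ICs.

```python
import random


def toy_risk_model(features: dict) -> float:
    """Toy stand-in for a conflict-risk model: a weighted sum clipped to [0, 1].
    Features and weights are illustrative assumptions only."""
    score = (0.6 * features["past_violence"]
             + 0.3 * features["economic_shock"]
             + 0.1 * features["election_year"])
    return min(max(score, 0.0), 1.0)


def permutation_importance(model, dataset, feature, trials=200):
    """Average absolute change in the model's output when one feature is shuffled
    across the dataset; a larger shift suggests the feature matters more to the forecast."""
    baseline = [model(row) for row in dataset]
    total_shift = 0.0
    for _ in range(trials):
        shuffled = [row[feature] for row in dataset]
        random.shuffle(shuffled)
        perturbed = [{**row, feature: value} for row, value in zip(dataset, shuffled)]
        outputs = [model(row) for row in perturbed]
        total_shift += sum(abs(a - b) for a, b in zip(baseline, outputs)) / len(dataset)
    return total_shift / trials


# Hypothetical country-year observations
data = [
    {"past_violence": 0.9, "economic_shock": 0.2, "election_year": 1.0},
    {"past_violence": 0.1, "economic_shock": 0.7, "election_year": 0.0},
    {"past_violence": 0.5, "economic_shock": 0.4, "election_year": 1.0},
    {"past_violence": 0.2, "economic_shock": 0.9, "election_year": 0.0},
]
for name in ("past_violence", "economic_shock", "election_year"):
    print(name, round(permutation_importance(toy_risk_model, data, name), 3))
```

Techniques of this kind do not fully open the black box of a deep learning system, but they give analysts a defensible account of which inputs are driving a given warning.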
Emerging opportunities
Despite these significant technical challenges, AI capabilities are growing rapidly. A natural product of this growth is improved responses to the challenges discussed above. The most promising emerging opportunities are as follows:
- AI to improve the data foundation. AI tools that can more quickly digitise historical data while simultaneously incorporating real-time data would strengthen the foundations of training data. Graph neural networks,[20] foundation models[21] and few-shot learning techniques[22] could augment this effort by learning underlying patterns and predicting missing information in data.
- AI for collection optimisation and data processing. As large language models grow in parameter size, context length and overall sophistication, they could more rapidly synthesise fused information[23] and generate possible threat scenarios[24] and courses of action,[25] with fewer hallucinations. While LLMs have been in the spotlight ever since the advent of OpenAI’s ChatGPT, machine learning-enabled tools with improved statistical prediction could analyse a range of data sources, socioeconomic factors, political developments and past instances of violence to produce more accurate forecasts.[26]
- AI for batch analysis and pattern identification. Deep reinforcement learning could build on these enhanced datasets and potentially establish causality between events. Deep neural networks[27] and LLM-based agents[28] could approximate potential outcomes by adapting, learning and thinking in unique ways based on the input data provided by analysts. This is in contrast to traditional modelling in geopolitical forecasting, which is typically limited to fixed algorithms or scripted patterns. Developments in chain-of-thought reasoning and scaffolding,[29] as well as compound systems,[30] could sharpen forecasts through improved pattern identification, problem-solving and multitasking.
- AI to crowdsource human forecasts. Deep reinforcement learning could also merge multiple crowdsourced human forecasts[31] (similar to the Aggregative Contingent Estimation Program[32] or Cosmic Bazaar[33]) with algorithmic predictions, and could generalise from known states to novel states,[34] resulting in better predictions in underexplored areas. A simple sketch of how human and algorithmic forecasts might be pooled follows this list.
- AI to automate human feedback. Automated collection of human feedback[35] would help calibrate and improve the outputs of AI systems, with that feedback then used to fine-tune the underlying models.
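As a concrete illustration of the crowdsourcing idea above, the sketch below pools several hypothetical human forecasts with a single algorithmic forecast by averaging in log-odds space and optionally ‘extremizing’ the result, a simple aggregation approach drawn from the forecasting-tournament literature rather than from any specific IC system. The forecaster weights, extremizing factor and probabilities are all assumptions for illustration.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def inverse_logit(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def pool_forecasts(probabilities, weights=None, extremize=1.0):
    """Weighted mean of forecasts in log-odds space. An extremize factor > 1 pushes
    the pooled probability away from 0.5 to offset the information forecasters share."""
    if weights is None:
        weights = [1 / len(probabilities)] * len(probabilities)
    pooled_log_odds = sum(w * logit(p) for w, p in zip(weights, probabilities))
    return inverse_logit(extremize * pooled_log_odds)


# Hypothetical inputs: three human forecasters and one algorithmic model,
# all estimating the probability of the same destabilising event.
human_forecasts = [0.55, 0.60, 0.70]
model_forecast = 0.40

combined = pool_forecasts(human_forecasts + [model_forecast],
                          weights=[0.25, 0.25, 0.25, 0.25],
                          extremize=1.5)
print(f"Pooled probability of the event: {combined:.2f}")
```

In a real system, the weights would be learned from each forecaster's track record rather than fixed in advance, and the pooled output would be fed back to analysts alongside the individual forecasts.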
When and where to invest
In the US and the UK, ICs understand the need to consider integrating AI into analysis workflows,[36] and there is growing interest in the long-term benefits of AI for strategic warning systems. However, the development of these tools will require investment and resources. Whether developing a system in-house or procuring one from industry, there are costs related to both producing the tool and training the workforce. Decision-makers will also need to consider the cost of inaction, particularly in relation to how adversaries could combine the strengths of artificial and human intelligence to outperform current human and traditional modelling systems. These trade-offs must be set against the fact that AI has not yet demonstrated added value beyond the skills of human analysts.
So, why invest at all? Providing strategic warning is a core function of any IC, and while the current technological state of play might not suggest it is ready to be an immediate use case for intelligence analysts, it will almost certainly need to be a future one. The goal is not to replace human analysis but rather to create a system that leverages the advantages of a human-machine team. Moreover, these models have already started proliferating in the commercial sector and will continue to do so with or without government involvement.
Investment now will give analysts and policymakers a seat at the table – an opportunity to provide feedback that ensures their interests shape the development of these systems, particularly in relation to key adversaries such as China, Russia, Iran and North Korea. Conversely, a lack of investment would heighten the risk of information disadvantage or intelligence failure if these adversaries developed and adopted superior AI-based strategic warning systems. With that in mind, it is imperative to think about AI not only as a tool for strategic warning but also as part of the technological landscape for which policymakers need strategic warnings.
In the same vein, it is important to consider how a strategic warning system could become the subject of an attack. For example, deep learning-based systems could be tricked with data poisoning or prompt injection,[37] which could lead to the system misclassifying event data. Investment into guardrails against adversarial AI will be paramount as these tools develop, to prevent geopolitical rivals from manipulating inputs and outputs.
It is now time for the ICs to lay the groundwork for optimal use cases of AI for strategic warning. By bridging the gap between current and future technological capabilities and the needs of government users, the US and the UK can navigate the technological landscape in a mutually beneficial way.
The views expressed in this article are those of the authors, and do not necessarily represent the views of The Alan Turing Institute or any other organisation.
References
[1] Decisive Point, “Raven Sentry: Employing AI for Indications and Warnings in Afghanistan,” July 2024, https://media.defense.gov/2024/Jul/30/2003514388/-1/-1/0/DP-5-14-SPAHR-TRANSCRIPT.PDF.
[2] Special Competitive Studies Project, “The Future of Intelligence Analysis: US-Australia Project on AI and Human Machine Teaming,” September 2024.
[3] Janosch Delcker, “Meteorologists of violence,” Politico, 15 March 2020, https://www.politico.eu/article/artificial-intelligence-conflict-war-prediction/.
[4] Giuseppe Nebbione et al., “Deep Neural Ranking for Crowdsourced Geopolitical Event Forecasting,” Computer Science and Engineering 5, 2019.
[5] Benjamin Radford, “High resolution conflict forecasting with spatial convolutions and long short-term memory,” International Interactions 48, no. 5.
[6] Joost van Oijen and Pieter de Marez Oyens, “Empowering military decision support through the synergy of AI and simulation,” NATO, November 2023.
[7] Ruben Stewart and Georgia Hinds, “Algorithms of war: the use of artificial intelligence in decision making in armed conflict,” Humanitarian Law & Policy, 24 October 2023, https://blogs.icrc.org/law-and-policy/2023/10/24/algorithms-of-war-use-of-artificial-intelligence-decision-making-armed-conflict/.
[8] ViEWS Forecasting, “The pilot early-warning system (ViEWS),” https://viewsforecasting.org/research/the-pilot-model/.
[9] Scotty Black and Christian Darken, “Scaling artificial intelligence for digital wargaming in support of decision-making,” NATO, February 2024.
[10] ACLED, “Armed Conflict Location and Event Data,” https://acleddata.com.
[11] Uppsala Conflict Data Program, https://ucdp.uu.se/exploratory.
[12] Strauss Center, “Social Conflict Analysis Database,” https://www.strausscenter.org/ccaps-research-areas/social-conflict/database/.
[13] Mayank Kejriwal, “Link prediction between structured geopolitical events: models and experiments,” Big Data Networks 4, November 2021.
[14] GeoQuant, “GeoQuant’s World Governance Indicator Nowcast,” https://www.geoquant.com/papers.
[15] Jim Waldo and Soline Boussard, “GPTs and hallucination: why do large language models hallucinate?,” Queue 22, no. 4.
[16] Sebastian Farquhar et al., “Detecting hallucinations in large language models using semantic entropy,” Nature 630 (2024): 625–630.
[17] Paola Vesco et al., “United they stand: Findings from an escalation prediction competition,” International Interactions 48, no. 4, https://www.tandfonline.com/doi/epdf/10.1080/03050629.2022.2029856?needAccess=true.
[18] James Ryseff, Brandon de Bruhl and Sydne J. Newberry, “The root causes of failure for artificial intelligence projects and how they can succeed,” RAND Corporation, August 2024, https://www.rand.org/content/dam/rand/pubs/research_reports/RRA2600/RRA2680-1/RAND_RRA2680-1.pdf.
[19] Roshanak Rose Nilchiani, Dinesh Verma and Philip S. Anton, “Joint all-domain command and control (JADC2) opportunities on the horizon,” Acquisition Research Program, May 2023.
[20] Van Oijen (2023).
[21] Van Oijen (2023).
[22] Archit Parnami and Minwoo Lee, “Learning from few examples: a summary of approaches to few-shot learning,” ArXiv (March 2022), https://arxiv.org/abs/2203.04291.
[23] David Jungwirth and Daniele Haluza, “Forecasting geopolitical conflicts using GPT-3 AI: Reali-Ty-Check One Year into the 2022 Ukraine War,” https://www.preprints.org/manuscript/202302.0065/v1.
[24] Daniel J. Finkenstadt et al., “Use GenAI to improve scenario planning,” Harvard Business Review, 30 November 2023, https://hbr.org/2023/11/use-genai-to-improve-scenario-planning.
[25] Vinicius Goecks and Nicholas Waytowich, “COA-GPT: Generative pre-trained transformers for accelerated course of action development in military operations,” ArXiv (March 2024), https://arxiv.org/pdf/2402.01786.
[26] Cordis Europa, “Using machine learning to identify political violence and anticipate conflict,” https://cordis.europa.eu/article/id/443344-using-machine-learning-to-identify-political-violence-and-anticipate-conflict.
[27] Black (2024).
[28] Guanzhi Wang et al., “Voyager: an open-ended embodied agent with large language models,” ArXiv (October 2023), https://arxiv.org/abs/2305.16291.
[29] Stanford Encyclopedia of Philosophy, “Logic-based artificial intelligence,” https://plato.stanford.edu/entries/logic-ai/.
[30] Matei Zaharia et al., “The shift from models to compound AI systems,” Berkeley Artificial Intelligence Research, 18 February 2024, https://bair.berkeley.edu/blog/2024/02/18/compound-ai-systems/.
[31] Nebbione et al. (2019).
[32] Good Judgment, “Superforecasting services,” https://goodjudgment.com.
[33] “How spooks are turning to superforecasting in the Cosmic Bazaar,” The Economist, https://www.economist.com/science-and-technology/2021/04/15/how-spooks-are-turning-to-superforecasting-in-the-cosmic-bazaar.
[34] Black (2024).
[35] Nisan Stiennon et al., “Learning to summarize from human feedback,” 34th Conference on Neural Information Processing Systems (NeurIPS 2020), https://proceedings.neurips.cc/paper_files/paper/2020/file/1f89885d556929e98d3ef9b86448f951-Paper.pdf.
[36] Frank Konkel, “The U.S. intelligence community is embracing generative AI,” Government Executive, 8 July 2024, https://www.govexec.com/technology/2024/07/us-intelligence-community-embracing-generative-ai/397867/.
[37] Stewart (2023).
Citation information
Anna Knack and Nandita Balakrishnan, "The State of AI for Strategic Warning," CETaS Expert Analysis (November 2024).