The Alan Turing Institute

Applying AI to Strategic Warning

Modelling instability risks and stabilisation factors for intelligence and national security

Research Report

Anna Knack, Nandita Balakrishnan, Timothy Clancy

27 March 2025

Abstract

This joint report from the Special Competitive Studies Project (SCSP) and CETaS explores the potential of AI systems to make assessments about geopolitical events, and the path ahead for applying AI to strategic warning for national security and intelligence. With strategic AI competition intensifying and technological change accelerating, deliberate action and foresight are indispensable. A performant AI system could give decision-makers in the US and the UK more time to respond to crises and effectively allocate resources. If AI is to make precise predictions about geopolitical events, it will need to overcome challenges relating to data scarcity and inconsistency, and to modelling the decisions of individuals. While there is currently no AI system that can accurately predict geopolitical flashpoints or forecast their implications, the advent of artificial general intelligence could change the playing field. The two most promising use cases identified in this research are using AI to track conflict risk indicators by leveraging increased quantities and types of data, and to identify possible outcomes and scenarios immediately after a shock occurs. Any project to unlock the transformative potential of AI for strategic warning will be expensive, time-consuming and politically sensitive. Yet it could help the US and the UK maintain decision advantage over aggressors in future conflicts and geopolitical crises.

This work is licensed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 License, which permits use provided the original authors and source are credited. The license is available at: https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode.

Executive Summary

This joint report from SCSP and CETaS explores the potential of AI systems to make assessments about geopolitical events, and the path ahead for applying AI to strategic warning for national security and intelligence.

Strategic surprise – an unforeseen event or development often driven by an adversary – can jeopardise human lives and impose substantial security and financial costs. Consequently, national decision-makers are continuously seeking ways to improve their detection of changes to the geopolitical picture, adversarial activities and intentions, and exogenous strategic shocks that may impact their core interests. The rapid improvements in AI systems’ ability to digest and analyse enormous amounts of data, and reports of commercial AI systems successfully predicting events leading up to the invasion of Ukraine, have increased interest in using AI to improve early warning and assessments of geopolitical events.

Furthermore, strategic AI competition between Western democracies and the so-called Axis of Disruptors (China, Russia, North Korea and Iran) has intensified amid the emergence of Chinese AI models, which claim to be closing the gap on US dominance in AI technology while using fewer resources. Against this backdrop, cultivating a performant AI system that gives decision-makers more time to respond to crises and effectively allocate resources would be a significant contribution to decision advantage.

Our research has found that, while there is currently no AI system that can reliably predict geopolitical flashpoints or forecast their implications with high accuracy and precision, the state of the technology is changing rapidly and the advent of artificial general intelligence (AGI),[1] which some experts predict could arrive within two to three years, could change the playing field. The next generation of AGI systems could provide a considerable uplift to strategic warning in several ways in the medium term, giving decision-makers more time to respond to crises. The two most promising use cases identified in this research are using AI to:

  1. Track conflict risk indicators more accurately, by leveraging increased quantities and types of data.
  2. Identify possible outcomes and scenarios immediately after a shock or trigger happens. This could be particularly valuable for regions or topics that usually receive scant attention from intelligence services.

Currently, the two main challenges inhibiting AI from making effective predictions about geopolitical events are:

  1. Data scarcity and inconsistency, since triggers of geopolitical events of interest are rare. In some regions, data may not be collected at all or may not exist in a form that is suitable for training an AI model. Historically, conflict data has been gathered in an unsystematic way and is not consistently parameterised by conflict risks, conditions, triggers and tipping points.
  2. Modelling the decisions of individuals, whose intentions may sometimes be random, impulsive or opportunistic – and which are challenging to deduce even with high-quality classified intelligence. This includes decisions by state leaders, their advisers, dissenters, individuals with deceptive intentions and the actions of individuals that lead to tipping points – such as the fruit seller who set himself on fire and triggered the Arab Spring.

Building on the findings of this project, we recommend the US and UK national security communities embark on an ambitious, collaborative three-phase project that takes the critical step of addressing systemic data infrastructure challenges and building organisational muscle memory – and that will make an eventual leap to an integrated simulation of all possible risk and stabilisation factors more likely to succeed. Many of the technical challenges will be overcome not by more AI research and development, but by a moonshot project that radically tackles the data challenges of the intelligence community (IC). This project should enable the national security community to leverage available best-in-class AI models and build confidence in system reliability, without being delayed by technical bottlenecks and unfeasible costs.

  • Phase 1: Establish AI training and testing data foundations by optimising geopolitical event data collection practices. Establish shared standards to ensure consistency in geopolitical event risk data collection and assessment, including incorporating non-traditional data sources and quantifying complex behavioural and decision-making data such as decision-makers’ and groups’ intentions, grievances and biases.

  • Phase 2: Nurture a suite of best-in-class models that assist analysts, while prototyping an initial generalisable model of how geopolitical events of interest materialise. The suite of government and industry models would assist in analytic tradecraft and be trained on different types and classifications of data to circumvent the major data fusion, traceability, AI security, governance and cost challenges that directly pursuing an integrated simulation of the world would likely entail. The outputs of models would be data points for the analyst rather than a source of finished intelligence, and these data points would be compared and triangulated. The models would then be continuously retrained and re-weighted while methods of scientifically validating and measuring data quality are developed.

  • Phase 3: Develop an integrated AI-based simulation platform for strategic warning. The platform should represent geopolitical risk and stabilisation factors in high fidelity with high-quality datasets, augmented by synthetic data. The simulation would run these factors thousands of times to understand the different ways conflict might erupt.

There are three key policy considerations associated with this proposed initiative:

  • Cost. This will be an expensive endeavour, with initial costs for set-up equipment, data engineering, workforce training and supporting infrastructure. But these costs must be balanced against the opportunity cost and potential consequences of not adopting AI for strategic warning.

  • AI sovereignty. There will be benefits and trade-offs around whether to pursue such a tool unilaterally or multilaterally. The involvement of allies and partners could reduce duplication, spread risks and reduce individual costs, but could also introduce delays and challenges in harmonising regulatory frameworks and compliance requirements.

  • Industry collaboration. Policymakers must consider whether they wish to prioritise autonomy by building such systems in-house – which would incur more direct research and development costs – or sacrifice a degree of control and autonomy by contracting out research and development (R&D) processes to trusted industry partners.

We assess that radical enhancements are possible, but will only be achieved by addressing the upfront costs of the systemic data infrastructure challenges in the UK and US national security communities. Unlocking the transformative potential that AI offers for strategic warning will not be achieved by procuring the cheapest technically feasible tool. This will be an expensive, time-consuming and politically sensitive project. But it could prove pivotal in maintaining decision advantage over aggressors in future conflicts or geopolitical crises.

1. Introduction

In the fraught aftermath of the 9/11 attacks against the United States, the world witnessed how a few missed warning signs could claim thousands of lives, trigger decades of conflict, and alter the national security landscape for a generation. Intelligence failures[2] will always be a risk for any state’s IC because human analysts are pressed for time, work with limited data and can become wedded to existing mental models of how the world works.

The stakes of strategic surprises speak to the primary role of the intelligence analytic community: to give policymakers strategic warnings about key events of interest. For US policymakers, the primary national security concern is understanding what is happening within the Axis of Disruptors (China, Russia, Iran and North Korea), because the actions and policies of these adversaries have the highest impact on US national security priorities. These policymakers turn to their IC to give strategic warnings as early as possible about the possibility of Chinese troops moving in the Taiwan Strait, Russia escalating (or de-escalating) its engagement in Ukraine, the threat of a nuclear launch from North Korea towards Seoul, or the possibility of a political transition in Iran.

For UK counterparts, their analysts must thoroughly assess how middle-power states and internal conflicts could evolve, ensuring they anticipate and address potential impacts on the nation’s broader security landscape. Given the UK’s status as a global financial hub with strong security partnerships, the ascendance or destabilisation of middle-power states – particularly in the Commonwealth – can directly impact British interests and regional stability. Moreover, aggressive actions by China, Russia, Iran or North Korea in these states could amplify threats such as conflict spillover, terrorism or unchecked weapons proliferation, all of which pose significant concerns for the UK’s foreign policy and homeland security.

For analysts across the Atlantic, the goal of monitoring adversarial activities, intentions and capabilities is to give policymakers as much time as possible to craft a response, whether that is executing a deterrence strategy, shaping diplomatic engagement or taking robust offensive or defensive measures. Early warning enables the government to allocate resources effectively and strengthen alliances, while also safeguarding US and UK interests and global stability. Such foresight not only reduces the risk of strategic surprise but also helps maintain a competitive edge in a rapidly changing geopolitical landscape.

Right now, this approach is primarily human-executed; while some AI tools are used for bespoke purposes,[3] they have not been adopted on a large scale. The benefit is that such assessments are easily explainable and the chain of accountability is clear. However, this comes at the expense of speed and limits the volume of intelligence sources that can be processed, since a human analyst can only work through so much manually. Analysts could misinterpret data (e.g. signals or classified intelligence), misjudge the data’s credibility or even fail to incorporate relevant data altogether, often due to a lack of knowledge of a data source’s existence. Furthermore, an analyst can struggle to manually incorporate both quantitative and qualitative data; as a result, there can be restrictions on the conclusions they can draw from the data, particularly when it comes to identifying anomalies or causal links. And this is all assuming that analysts have access to all the relevant data that they need and that an IC has access to all the personnel that it needs – both of which they increasingly lack.

Today, the stakes of those potential failures could not be higher. The attack surface has increased with threats from not only state adversaries but also technologically sophisticated non-state actors who have a wide range of tools that can be deployed against US and UK interests. While conventional attacks with military-grade weapons are always a threat, grey-zone or hybrid conflict, as well as cyberattacks on critical infrastructure or financial systems, can be just as grave a threat.[4] Therefore, it is imperative that analysts are fully equipped with the tools necessary to give warnings on all these possible threats. This is where AI tools, especially those that help humans process and analyse data faster, could be indispensable in helping US and UK analysts and policymakers maintain their decision advantage.

While the two ICs might already be experimenting with adopting AI tools into their workstreams, they must think about longer-term benefits and capabilities as this technology evolves. Intelligence analysis offers both an easy access point[5] and an opportunity for high impact. The ICs should be looking to leverage any tool, system or technology that allows for quicker and more comprehensive indicators and warnings of geopolitical events of interest. Research and development both in academia and the commercial sector demonstrate that this is a growing area of interest, making it a prudent avenue to explore, especially in the context of a human-machine team. AI has the potential to help policymakers spend more time on the strategic – rather than tactical – aspects of decision-making.

Within the current technological state of play for geopolitical strategic warning, there is no unified AI-powered system that can perfectly predict outcomes or forecast security implications of interest – even for the Axis of Disruptors, where the most attention is being paid. However, investments in the technological developments already underway could allow the ICs to make significant strides in improving crucial components of the strategic warning process. With the technological landscape impacting the threat landscape, the traditional scope of strategic warning is changing. Moreover, the ICs will need to start incorporating technological vulnerabilities and opportunities into strategic warning assessments, especially where there is the expectation that adversaries are making similar investments. To be able to effectively understand how to give strategic warnings about this technology, the ICs need to be using this technology.

1.1 Research methodology

1.1.1 Research Aims

The primary research agenda is to examine the extent to which artificial intelligence (AI) and intelligent automation can enhance the ICs’ ability to forecast key geopolitical events of interest – including leadership decisions, political transitions and military aggression – such that intelligence analysts can provide policymakers with the most timely and actionable intelligence. To achieve this goal, there are three main objectives to address within the scope of the project:

  1. Establishing the current technological state of play in the field of AI and geopolitical forecasting to highlight existing models in use and under development. An expanded version of this chapter, highlighting specific technological limitations and opportunities, was published in November 2024.[6]
  2. Identifying AI technical challenges that would need to be overcome to push the current state of play towards the ideal case of a single, integrated platform.
  3. Exploring the costs, both quantifiable and non-quantifiable, of developing such a system and contextualising these costs against the benefits.

1.1.2 Methodology

In order to answer these questions, the study team conducted:

  1. A review of academic and grey literature from the past three years to establish the state of the art in global efforts to apply AI to strategic warning, the current data infrastructure for existing tools, criteria for optimal use cases, ideal dataset requirements, technical challenges and technical solutions, policy challenges and predictable costs.
  2. Industrial analysis of the state of play in AI-based conflict modelling tools to understand applications of AI to strategic warning.
  3. Semi-structured interviews with UK and US government producers and consumers of conflict and political instability modelling, academic experts, open-source intelligence (OSINT) analysis providers, and technical experts from industry to gather input on the state of play for AI that produces assessments of geopolitical events, optimal use-cases for AI and the ideal dataset requirements, and how to develop a method for calculating the overall cost of the infrastructure and capability.
  4. Analysis of the cost of developing an AI system applied to strategic warning based on methodology developed from interview participants’ input.

The findings of the literature review and industrial analysis are published in the CETaS Expert Analysis, “The State of AI for Strategic Warning.”[7]

2. Can Current AI Systems Make Assessments about Geopolitical Events of Interest?

For analysts and policymakers, an AI-powered geopolitical prediction tool would likely serve two distinct but not mutually exclusive purposes. First, it could allow analysts to better focus on key countries of concern, such as the Axis of Disruptors, by augmenting the close monitoring already done by human analysts. For example, a 24/7 monitoring system that produced real-time assessments of Chinese or Russian troop activity would allow analysts to get insights to policymakers even more quickly. A second opportunity would be to apply geospatial AI innovation to monitor the rest of the world, quantify causal mechanisms for conflict,[8] or look at historical conflict data to identify conflict risk indicator patterns that human analysts may have missed.

Together, this could contribute to improved early warning that would help enable the IC to focus human personnel and other resources on their most pressing priorities, and to reallocate resources as necessary when the system detected anomalies in less closely observed countries. In either case, an AI-enabled tool would be most helpful if it could help analysts better predict discrete events, such as an impending coup or military attack, and forecast the implications of those events, such as assessing how the economy and geographic neighbours will respond.

Political scientists have long used quantitative models and simulations to predict the propensity for political stability or instability, which have allowed them to identify important correlative patterns, including those that occur across different regions and periods of time.[9] Analysts have used early machine learning (ML) tools to incorporate more data and thereby sharpen their assessments, as seen in the ViEWS competition,[10] which tested models of conflict in Africa against one another. As computing power has expanded, these models have been able to incorporate even more data and address missing data issues, and have allowed for more sophisticated techniques that have even lent themselves to some causal inferences.[11]
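To illustrate the kind of panel model this tradition produces, below is a minimal sketch of a conflict-onset classifier trained on country-month indicator data. The CSV file, column names and indicators are hypothetical placeholders, not any specific dataset or competition entry:

```python
# A minimal sketch of the panel-model tradition described above: a simple
# classifier trained on country-month indicators to estimate conflict-onset
# risk. File, columns and indicators are hypothetical placeholders.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

panel = pd.read_csv("country_month_indicators.csv")   # hypothetical dataset
features = ["gdp_growth", "infant_mortality", "regime_age",
            "protest_count", "neighbour_conflict"]    # illustrative predictors
X, y = panel[features], panel["conflict_onset_next_6m"]

# Hold out the most recent periods rather than shuffling randomly, so the
# evaluation mimics genuine out-of-sample forecasting.
test = panel["month"] >= "2020-01"                    # assumes "YYYY-MM" strings
model = LogisticRegression(max_iter=1000).fit(X[~test], y[~test])
auc = roc_auc_score(y[test], model.predict_proba(X[test])[:, 1])
print(f"Out-of-sample AUC: {auc:.2f}")
```

The temporal hold-out matters: evaluating on randomly shuffled rows would leak future information into training and overstate the model’s forecasting skill.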

There is currently no AI-powered system that can perfectly predict geopolitical flashpoints or forecast their implications. There are AI-powered open-source intelligence platforms that fuse text data or satellite imagery for real-time monitoring of geopolitical events in order to identify emerging threats. Some platforms, which are used both by ICs and the commercial sector, scan social media to track potential flashpoints of violence, but their primary role is to give contemporaneous and immediate updates to users rather than to make actual predictions of events. However, when it comes to predictive analytics, there are several models and tools available that address important components.

For the purposes of this study, we focus on models that aim to predict or forecast geopolitical events. However, it is worth noting that there are similar tools under development that instead focus on the scenarios and the implications of geopolitical events.

In both the commercial and academic sectors, we are seeing three broad categories of models that profess to model geopolitical events as a dependent variable:

  1. The prediction of geopolitical risk: These models draw on large datasets to compute a quantitative risk score of political instability for a given geographic region, usually a state. The product of such models is often what is known as a heat map, which helps analysts make comparisons across regions and pinpoint areas of focus by assessing the conditions that make a geopolitical event more likely. The claimed benefit of such models is that they can be globally focused and make longer-term predictions (with some models on the market claiming to predict conflict risk two years in the future)[12] while still helping users detect important shifts. However, it can be difficult to articulate how these quantitative outputs meaningfully add to the qualitative assessments human analysts or subject matter experts in the field are already able to provide.[13] For example, if an AI model predicts relative stability in Denmark two years out and relative instability in Niger in that same timeframe, most policymakers would likely say they did not need the model to make those assessments. Some experts say that if we are monitoring early warning for tactical changes, then ground sentiment and human experts can more efficiently identify the risk of a novel conflict outbreak than AI can at present.[14]
  2. The forecasting of certain shorter-term outcomes of interest: Drawing on lessons from models like those for economic and weather forecasting – for which rich data can help analysts forecast an event over a slightly longer timeframe – some experts were optimistic. They believe existing models can help forecast uncertain outcomes of more expected events in situations where they have better access to robust, labelled data. One illustrative example of this is election forecasting using socioeconomic indicators coupled with polling data. Such models also draw on historical data to forecast the implications of these outcomes. These models are susceptible to fluctuations in input data, which is why they tend to perform best over a shorter time window – weeks, as opposed to months or years.[15]
  3. The prediction of the specific location of events: One of the key questions analysts are trying to address is the exact location and timing of a particular event. Improvements in the collection of data, such as imagery data coupled with social media and traditional media information, have led to improvements in predicting the geolocation of political activity. A common use of such models is producing better surveys of where troops are located in order to make assessments about where they might go next. One of the key benefits of such models is that, with improved data quality, they allow analysts to speak to subnational events of interest. These models are also increasingly being deployed to address geopolitical tail risk – rare but highly impactful events – but they come at the expense of time. These models have the shortest window of time – often kept to a week or less – and are currently deployed on very targeted geographic areas of interest, such as Ukraine and the Taiwan Strait.[16]

Model Output                          Time Window

Political instability score           Years
Uncertain outcome of expected event   Months
Location of events                    Days to weeks

The overall benefit of using AI for all three model types is clear. The explosion of big data, especially through web scraping and language translation, has expanded capabilities beyond what a human analyst could manage manually. In addition, satellite data and mobility tracking can help analysts detect micro-level changes that precede larger instability events. The increasing amount of data supports improved anomaly detection,[17] more replicable forecasting over time, and high-resolution monitoring.[18]
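As a concrete illustration of the anomaly detection this enables, the sketch below flags outlier days in a simulated activity-count stream. The data is synthetic; a real pipeline would ingest satellite, mobility or other GEOINT feeds:

```python
# Illustrative anomaly detection over an activity-count stream (e.g. daily
# vehicle detections from satellite imagery). Data is simulated here.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(seed=0)
counts = rng.poisson(lam=20, size=(365, 1)).astype(float)  # a "normal" year
counts[350:] += 40   # simulated build-up preceding an instability event

detector = IsolationForest(contamination=0.05, random_state=0).fit(counts)
flags = detector.predict(counts)                   # -1 marks anomalous days
print("Anomalous days:", np.where(flags == -1)[0])
```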

One of the conventional criticisms of global coverage models is that important nuances of cultural contexts might get lost – for example, how the communist legacies of China, Russia and North Korea impact them differently.[19] However, as these models have become increasingly sophisticated, it has become possible to account for as much nuance as there is data available, and for analysts to extract specific variables that are driving changes. In short, analysts do not need to know in advance which indicators to feed into the model but can allow these models to help them understand which indicators to prioritise. In addition, as data at the subnational level gets better, the model types that have been traditionally focused on country-level outputs could give more refined outputs.[20]

Another promising avenue of development is the use and incorporation of generative AI (GenAI) in this field, especially in scenario generation.[21] As mentioned in reference to intelligence failure, one of the key challenges for analysts is understanding what to do with the detection of an anomaly, especially if an existing mindset stands counter to the new data. GenAI is less susceptible to the cognitive biases that affect human analysts, and there is extensive work being done to reduce the impact of hallucinations.[22] The incorporation of GenAI into these models could help analysts with scenario development and hypothesis testing, and even policymakers with decision support, to ensure users know how to contextualise and act on the outputs.

2.1 Use by an intelligence analyst

The models focus on providing assessments of geopolitical outcomes, but how do analysts then turn that into strategic warning? When analysts have the capacity to deliver quicker, more comprehensive assessments about events of interest, the work moves from the tactical to the strategic. Some interviewees highlighted the goal of strategic warning, noting that analysts need to be able to inform policymakers when our strategic landscape has changed rather than simply predicting a specific event’s time or place.[23] In its immediate use case, analysts can leverage AI in the collection and synthesis of data[24] to highlight anomalies that need more attention.

Some interviewees were deeply sceptical that AI would ever be able to make accurate predictions of a conflict years in advance or identify the moment a conflict would break out,[25] because of two key challenges: data limitations and the challenge of predicting leaders’ decision-making.

As a result, outputs are likely to be a source of data as opposed to finished intelligence.[26] But there can be an upside: the system can leverage, at much greater speed, many pieces of information that a human analyst would otherwise have to input manually. Human analysts will always be required to validate information and demonstrate knowledge of the provenance of information,[27] but if humans can play a role later in the process rather than needing touchpoints throughout, this can free up personnel to achieve other goals.

2.2 Challenge 1: Data scarcity, inconsistency and fusion

AI systems designed for geopolitical forecasting face a range of technical challenges, many of which stem from fragmented data infrastructure and persistent human bias. Take the example of weather forecasting: one reason weather forecasting is much easier than geopolitical forecasting is the rich data available. Even the rare events in weather modelling, like cyclones, are merely extensions of normal storms – about which much is known.[28] Rare geopolitical events are much more nuanced and driven by far less linear relationships. In some relevant regions, which could nevertheless have enormous conflict escalation repercussions, the US and UK ICs may not be gathering data at all. Very little data on factors that provide protection from risks, or that create stability after a crisis, is systematically gathered with AI training data in mind.[29]

Overreliance on limited or historical datasets can lead to overfitting and missed anomalies,[30] especially as most publicly available models lack access to sensitive or classified information. These constraints are compounded by short forecasting windows, meaning that recent geopolitical shifts may not be reflected in the training data. In addition, many existing tools fail to capture key social or inter-state factors, often relying on incomplete or outdated information that can reinforce analyst biases.

Evaluating and comparing different systems also remains difficult, due to inconsistent performance metrics.[31] The more disparate the datasets being integrated, the more likely we are to encounter traceability, AI security and data validation challenges, as models are layered with increasing opacity regarding the assumptions and limitations behind each.

Broader problems – such as those with data governance, security and affordability[32] – further hinder the creation of integrated data infrastructure. Lack of suitable data to train AI models is particularly common in organisations applying AI for the first time or to a new use case.[33] Data on conflict or instability risk is not collected with AI training in mind, so data engineers may have issues with the quality of testing data.

2.3 Challenge 2: Modelling the decisions of individuals

Predictions of specific ‘trigger’ events will always be challenging, especially for events in which the element of surprise is paramount.[34] For example, analysts might be able to assess the overall likelihood of a coup or an international war occurring based on overall risk factors coupled with factors that suggest an event could be imminent (e.g. troop locations), but face difficulty in predicting the exact event for two reasons. First, although the CIA has made attempts to use AI to help its analysts understand and anticipate leadership decision-making,[35] it is not possible to get into the head of a leader perfectly even with very good classified information. Second, it is not possible to perfectly account for random,[36] impulsive or opportunistic decisions by individuals. This could include influential factors such as dissenting voices in a government, advisers to a state or military leader, or psychological instability. Individuals with deceptive intentions are difficult to capture as training data for a model. Furthermore, the decisions of individuals who may not appear sufficiently high priority to track, but who could act in ways that trigger regional instability (e.g. the self-immolating fruit vendor who triggered the Arab Spring), cannot feasibly be captured in geopolitical modelling tools. Therefore, while an AI model can identify when a situation or country is highly volatile, it will face challenges predicting the exact moment an event of interest will occur.

Despite these challenges, it is important to remember that these technologies are rapidly maturing. Next-generation AI tools will not only monitor broader swaths of activity to detect anomalies early but also generate new scenarios that analysts may not have considered, leading to more comprehensive and nuanced threat assessments. Most importantly, they offer a time advantage: even a marginally earlier alert can be the difference between proactive action and reactive response – an edge that becomes more vital as adversaries also invest heavily in AI. The growing involvement of commercial innovators means that intelligence services need not build everything from scratch, allowing them to integrate and fine-tune AI systems more quickly. Over the short term, AI can augment human analysts as an additional source of intelligence, and in the longer term, this human-machine teaming will steadily refine warning mechanisms, producing faster, richer and more reliable insights for decision-makers.

3. What is the Path Ahead for Applying AI to Strategic Warning in the Ideal Case?

The previous chapter covered the landscape of current tools utilising AI for strategic warning in the present day. This chapter looks to the future and presents the study team’s analysis of the path towards AI-enabled strategic warning tools that will bring the most benefit to the IC.

AI may never be 100% accurate in its predictions because of the inescapable role of randomness, but we can greatly improve the quality and accuracy of the outputs and overall forecasting capabilities of an AI-enabled strategic warning system and secure more time for policymakers to respond to crises. Instead of predicting the time and place where geopolitical flashpoints will occur, experts suggested there would be great utility in helping analysts track conflict risk indicators more accurately by leveraging more data, as well as more rapidly helping analysts identify the possible outcomes of a shock immediately after it happens.[37] Essentially, we can track the tinder and the kindling and predict the path of a forest fire without predicting exactly when the match will light.

While it might be tempting to jump to the idea of a single super-simulation of all the possible factors that could drive conflict and instability – or stability – several projects have tried and failed due to a series of recurring bottlenecks. The simple reality is that while some factors largely predict conflict across the globe, the factors that ultimately lead to the outbreak of violence in a particular state or region are highly nuanced.

Improvements in conflict modelling tools could be pursued in phases, first addressing the most tractable challenges while building confidence and setting the stage for a more ambitious moonshot project: a model that comprehensively simulates all possible instability risk and stabilisation indicators. This moonshot project, broken down in a phased approach, would enable the IC to make pragmatic use of existing models and to start small and scale responsibly, without being delayed by technical bottlenecks and costs.

Figure 1. Overview of the three phases

 


3.1 Phase 1 – Establish training and testing data foundations

In Phase 1, the first priority will be to set solid data foundations: filling in missing conflict warning data, collecting it in priority countries where this does not yet occur, and systematising data collection for future AI systems for strategic warning.[38] There are three categories of data of relevance:

  1. Available measured data – This includes data we already collect (e.g. weather data, stock indices and passive signal emissions),[39] but there is also a need to supplement available measured data by systematically labelling current conflict indicators, as well as historical conflict triggers and tipping points. Automating routine, low-value bureaucratic tasks and obtaining available measured data (e.g. geospatial data, mobile phone data and signals emissions) on the conditions of each conflict of interest will be important to eventually free the analyst’s time for sense-making.[40] Leveraging OSINT to create time maps of unit movements could help analysts more closely investigate actual doctrine.[41]
  2. Quantifiable non-traditional data – This could include a wide gamut of non-traditional data sources (e.g. hacker data such as that from the Discord Leak, in which allegedly leaked classified documents were released on the Discord platform); humanitarian health, famine and economic data (which could build a picture of swelling grievances); and lifestyle apps (e.g. dating app data, which can be used to map troop movements).[42]
  3. Mental model data – This includes data on key decision-makers’ and groups’ beliefs, biases, emotional responses and reasoning patterns, especially from multilingual sources within the Axis of Disruptors.[43] This can also include more systematic capture of anthropological or biographical knowledge for the explicit purpose of developing AI training data sets. 

Today, most repositories catalogue only a few dozen data points, with only dozens or hundreds of samples.[44] Ideally, gathering the data above on tens of thousands of examples of conflict and instability events,[45] parameterised as follows, would help us develop a standard model of conflict – a framework of the assumptions, rules and strategic choices available to the relevant actors in a conflict, allowing predictions to be made. The key parameters are:

  • Conditions – the individual characteristics necessary to describe a region, people, group, leader or a set of key relationships.

  • Trends – how those conditions change dynamically over time.

  • System state – a summary combination of conditions and trends sufficient to describe the state of the region, people, group, leader or set of relationships.

  • Triggers – events that may result in system state changes, from stability to instability or into conflict (e.g. a protest).

  • Tipping points – when conditions and trends favour significant destabilisation (e.g. the protest that led to the Arab Spring).

  • Scenarios – forecasts of what might happen after a trigger, whether it results in a tipping point and what impacts various policies may have on the future.

As part of this, there is a need to continuously identify novel trends that challenge assumptions about conflict risk indicators.[46]
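To make the parameterisation above concrete, here is one hypothetical way these fields might be encoded as a shared record schema. The field names and values are illustrative only, not a proposed inter-agency standard:

```python
# Hypothetical record schema for the parameters above. Field names are
# illustrative - not a proposed data standard.
from dataclasses import dataclass, field

@dataclass
class ConflictRecord:
    region: str
    conditions: dict[str, float]        # e.g. {"youth_unemployment": 0.31}
    trends: dict[str, float]            # rate of change per condition
    system_state: str                   # e.g. "stable", "fragile", "conflict"
    triggers: list[str] = field(default_factory=list)    # e.g. ["protest"]
    tipping_point: bool = False         # did a trigger destabilise the system?
    scenarios: list[str] = field(default_factory=list)   # forecast narratives

record = ConflictRecord(
    region="Country X",
    conditions={"food_price_index": 1.8, "youth_unemployment": 0.31},
    trends={"food_price_index": 0.12},
    system_state="fragile",
    triggers=["street protest"],
)
```

The value of a schema like this is less in any single record than in forcing consistent labelling across agencies, which is what makes tens of thousands of examples comparable for training.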

A critical technical challenge in developing solid data foundations will be the ability to validate data and identify intentionally hidden, poisoned or corrupted data.[47] This upfront investment will help address output and system validation challenges further down the line, including whether the dataset supports the development of a standard model of conflict. A standard model of conflict would explain the social science of conflict with the same high confidence with which physics explains how atoms and molecules behave.

Part of this phase also involves overcoming data sharing challenges across intra- and inter-governmental agencies, as well as across the public and private sectors. Data classifications, proprietary information and data privacy regulations are no trivial challenges and will continue to inhibit the IC from exploiting data-driven capabilities if they are not considered from the outset of any AI-based capability development effort.[48]

Phase 1 will enable the IC to start to build muscle memory of how to create the data infrastructure and the analyst familiarity with these systems[49] that are needed for subsequent phases.

3.2 Phase 2 – Nurture a suite of best-in-class models that assist analysts

This phase describes a suite of large and small, controlled, fast and frugal models that assist analysts with forecasting conflict risks and stabilisation opportunities.[50] The outputs of several models ingesting different types of data that show a different understanding of how the world works could be crowdsourced to triangulate perspectives and contribute towards mitigating bias.[51] The third-party or government developer of each model could create models using different methodologies, applying different types of AI to augment each stage of the intelligence collection and analysis cycle in whichever way they know works.[52] For example, an ML-based system could model the economic forces that affect conflicts globally and a different deep learning-based model could look at links between geopolitical events on a global scale.[53] Both government and third-party developers would create system cards outlining the limitations, assumptions and assurance case for each model. In this phase, the models do not have to be accurate – they simply have to be thought-provoking for the analyst.[54]
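A minimal sketch of how such crowdsourced outputs might be triangulated follows. The model names and scores are invented, and the disagreement threshold is something an analyst team would tune rather than a recommended value:

```python
# Illustrative triangulation of crowdsourced model outputs: each model emits
# an instability probability for the same region; disagreement is surfaced
# to the analyst rather than averaged away. All values are invented.
import statistics

model_outputs = {
    "econ_ml_model": 0.62,           # ML model of global economic stress
    "geo_event_model": 0.41,         # deep learning model of event linkages
    "osint_grievance_model": 0.77,   # vendor model mining open-source dissent
}

consensus = statistics.mean(model_outputs.values())
spread = statistics.stdev(model_outputs.values())
print(f"Consensus risk: {consensus:.2f}, disagreement: {spread:.2f}")
if spread > 0.15:   # a threshold the analyst team would tune
    print("High disagreement - flag for human review, not a single score.")
```

Surfacing the spread, not just the mean, is the point: high disagreement among models trained on different data is itself a signal worth an analyst’s attention.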

Mega models leveraging open-source internet data over large geographic areas could be built by industry suppliers.[55] For example, different vendors could develop separate models mining open-source data and covering slow-building dissent and grievances in China, Russia, Iran or the Middle East and North Africa, while another vendor could develop a model mining open-source conflict risk data on Southeast Asia. Models that track intergroup conflicts and opinion dynamics[56] (e.g. between government forces, foreign fighters and private security groups in Ukraine) could also yield new insight. Government agencies could develop in-house models mining official-sensitive data on smaller areas and routes, secret data on specific groups and top-secret data on specific individuals.[57]

Appropriately cleared human analysts could then synthesise the crowdsourced outputs in addition to traditional intelligence artefacts, developing a better understanding of the conflict picture. This would enable the analyst to do what humans are best at – human reasoning – while taking more data into account with the help of many different AI systems. AI is proficient at recognising patterns, but the state of the art in AI can overlook important nuance and contextual elements. For example, food subsidies in less developed countries are important instability risk factors, but they are not equally important in Nigeria and Egypt: in Egypt, government subsidies on pita bread are more relevant to instability risk, whereas oil is more relevant in Nigeria.[58] Maintaining human-machine teaming is also necessary because current AI-based systems have a tendency to fight the last war they trained on – and human expertise is needed to balance some indicators and warnings, and to detect the potential overfitting of data.[59]

Identifying ways that the different model outputs could be validated is also important. One way of doing this would be to see if a model could have predicted the course of 9/11, Pearl Harbor or the Arab Spring if given the same information that analysts had before the outbreak of each crisis.[60] Operators could also compare the results of the model with the work of an analyst who has applied structured analytic techniques.[61]
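A hedged sketch of that backtesting idea is below. The event structure, the model’s `predict_risk` interface and the archive are assumptions for illustration, not a description of any existing system:

```python
# Sketch of backtesting a model against a historical crisis: replay only
# the information available before the onset and check whether the model
# would have raised its risk estimate. Model interface is assumed.
from datetime import date, timedelta

def backtest(model, events, onset: date, lead: timedelta = timedelta(days=90)):
    """Return the model's risk estimate using only pre-crisis information."""
    visible = [e for e in events if e["date"] <= onset - lead]   # no hindsight
    return model.predict_risk(visible)                           # assumed API

# Example: events known 90 days before the Arab Spring's trigger event
# (the self-immolation of 17 December 2010), drawn from a curated archive:
# risk = backtest(model, archived_events, onset=date(2010, 12, 17))
```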

The family of large and small models would be continuously reviewed, retrained, reweighted and annotated overnight, with a human in the loop, over several cycles until the final models outperform the originals.[62] The cascading models should also feed both bottom-up and top-down feedback signals to continuously refine the weights in each model.[63]
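One simple way a nightly reweighting cycle could work – offered as an illustration, not a prescription – is a multiplicative-weights update driven by each model’s Brier score on newly resolved events:

```python
# Illustrative reweighting step: models with lower Brier scores (better
# calibrated forecasts) gain ensemble weight each cycle. Values are made up.
import math

def brier(prob: float, outcome: int) -> float:
    """Squared error of a probabilistic forecast against a 0/1 outcome."""
    return (prob - outcome) ** 2

def reweight(weights: dict, forecasts: dict, outcome: int, lr: float = 1.0) -> dict:
    raw = {m: w * math.exp(-lr * brier(forecasts[m], outcome))
           for m, w in weights.items()}
    total = sum(raw.values())
    return {m: w / total for m, w in raw.items()}   # renormalise to sum to 1

weights = {"econ_model": 1 / 3, "event_model": 1 / 3, "osint_model": 1 / 3}
forecasts = {"econ_model": 0.62, "event_model": 0.41, "osint_model": 0.77}
print(reweight(weights, forecasts, outcome=1))      # the event occurred
```

The same Brier-style scoring could double as one of the shared performance metrics needed to compare models across the suite.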

Over time, Phase 2 could help the IC develop a standard model of conflict. The suite of models would instrument existing theories and, over time, eliminate theories that are incorrect.

Interviewees discussed the fact that mid-level decision-makers in the defence and security sector can be very old-fashioned and resistant to the adoption of AI,[64] so having a suite of models that are continuously improved could also contribute towards changing this risk-averse culture. In addition, focusing on a multitude of capabilities could stem decision paralysis over which specific capability to prioritise. Maintenance and progress reports on each model could increase confidence in AI over time and help more users understand its strengths and limitations – for example, some models may be better at predicting positive rather than negative changes.[65] Performance metrics for each model[66] could further increase confidence.

In Phase 2, data collection should focus on “filling in dark spots on the map” left over by Phase 1[67] and would be driven by the need to test theories for inclusion or rejection in a standard model of conflict. This may require fusing data streams collected in Phase 1, combining available measured data, quantifiable non-traditional data and mental model data, as well as enabling the collection of new data suggested by novel hypotheses that have emerged.

Another key breakthrough needed in this phase is the development of metrics to compare different models, and of dataset provenance measures to grade and communicate dataset quality.[68]

By early Phase 2, analysts should no longer need to manually sift data for alerts, as automated data streams would ping analysts when key risk parameters moved into outlier conditions – comparable to how data centre administrators do not manually monitor server performance but respond to alerts.[69] These alerts can be tuned over time through human-machine feedback loops, resulting in fewer false positives.[70]
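A minimal sketch of that alerting pattern follows. The z-score rule, threshold and data are illustrative assumptions; a production system would use far richer outlier models and tune the threshold from analyst feedback:

```python
# Illustrative alert rule: ping the analyst only when a tracked risk
# parameter leaves its historical band. The threshold would be tuned over
# time via human feedback to reduce false positives.
import statistics

def outlier_alert(history: list[float], latest: float, z: float = 3.0) -> bool:
    mean = statistics.mean(history)
    sd = statistics.stdev(history)
    return sd > 0 and abs(latest - mean) > z * sd   # outlier condition

weekly_sightings = [12.0, 15.0, 11.0, 14.0, 13.0, 12.0, 16.0, 14.0]  # simulated
if outlier_alert(weekly_sightings, latest=41.0):
    print("ALERT: parameter outside normal band - notify analyst.")
```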

A simplified visualisation of Phase 2 is presented in Figure 2.

Figure 2. Phase 2


It is possible that adversaries may waste time and resources chasing the unachievable, and a pragmatic approach that enables the US and UK ICs to spend valuable resources on more proven methods could generate advantage. Phase 2 would allow the US and UK ICs to test models, theories and datasets against one another, both to identify bad ones and to improve confidence in analysis.

3.3 Phase 3 – Develop an integrated AI-based platform

This phase describes an integrated AI-based simulation platform for strategic warning, built by a single system integrator, which could be used to run and test different hypotheses about conflict.[71] This phase will only be possible after significant developments in agentic AI that can pursue complex tasks, and after a standard model of conflict has been developed – and it will likely require considerable investment in interdisciplinary technical and social science research and development to overcome the fundamental challenges. The technical challenges for each phase are presented in Figure 3.

Figure 3. Technical challenges for each phase


In Phase 2, narrow and siloed models would be generating outputs; but if we expect AI to fuse all the information we know about conflict, then we require not only better data but also better mathematics to establish causality between actors’ decisions and actions, and their outcomes.[72] The rewards would likely be considerable, and an integrated model may be able to elucidate the inter-relationships between conflict indicators.

Phase 3 will likely require more globally and sectorally comprehensive and longitudinal data, and will need to run the data tens of thousands of times in an emulation where we can hypothesise potential outcomes based on known patterns.[73] Phase 3 will also require more hardware and data sharing agreements between industry and government, as well as enormous compute resources.[74]

A method is also needed to validate the Phase 3 system’s methodology because, in addition to validating individual models’ outputs, integrating diverse data streams – each with their own assumptions, limitations and conceptual frameworks – complicates traceability and explainability.[75] For this reason, smaller, more focused models are currently the better architectural choice. There are few human interactions more complex than military aggression and, in simulations of it, models typically face a trade-off between the complexity of the parameters and the performance of the AI.

Developers must understand whether the corpus of human knowledge already possesses the right theories on what is happening in the AI system and what is happening in reality.[76] At the tool development and testing stages, it will be important to ensure it is clear if the model is focusing on the correct relationships, and that the multiple streams of information with different pedigrees have been weighed appropriately.[77] To mitigate bias, developers must also assess if the single, integrated model is trained on data that is representative of all demographic cohorts in the world.[78]

It remains to be seen whether a single integrated system can be flexible enough to address the several types of questions analysts may wish to answer or patterns they may wish to focus on.[79] Temporally, models should be able to operate towards both the past and the future at 6-month, 1-year, 10-year and 20-year horizons. Strategic warning tools may be able to develop a valid methodology, but analysts and senior decision-makers will need configurability to different questions as the strategic landscape develops.

With a standard model of conflict in hand, the dramatic development in Phase 3 is the application of synthetic data to broaden the training set beyond the available historical data.

Modelling techniques in Phase 3 bear more similarity to the frontier model development currently occurring in less complex areas. Large models of tens or hundreds of billions of parameters trained on the optimised datasets built throughout Phases 1 and 2 and synthetically generated in Phase 3 will radically advance AI for strategic warning.

4. The Price and Payoff of Applying AI to Strategic Warning

At the beginning of the study, the team set out to develop a cost-benefit analysis of either completely building an AI-based strategic warning capability within government from the ground up, procuring an industry solution or maintaining the human-led process. As the study progressed, it became apparent that it would take a moonshot effort carried out in phases to develop the data infrastructure that would bring significant performance enhancement in strategic warning, and that a mixture of these elements could appear in each phase.

Traditional cost-benefit analysis in a field where the technology is not yet developed is challenging, given the difficulty of quantifying several unknown variables and limited publicly available information on analogous AI-based capability costs. For example, much of the investment in data infrastructure would benefit not just an AI for strategic warning system but many other analytic priorities as well. Yet it is possible to identify quantifiable rough order of magnitude (ROM) costs and non-quantifiable costs that should be considered.

4.1 Quantifiable ROM costs

If a government customer is simply procuring a system, the cost of the system would be the price of and subscription to the system; but if the system is being developed from the bottom up, there are significantly more investment costs to consider. These include direct, indirect, start-up, sustainment, procurement, salary and benefits costs,[80] such as those for IT infrastructure (e.g. computing hardware, data storage, servers, bandwidth and connectivity, cloud-based services and physical infrastructure), personnel and data.[81] A ROM estimate for data acquisition ranges from $5,000 to $500,000 per year in licences.[82] This might involve some configuration of simulations, models and data, as well as the costs of data rights or intellectual property.[83] The granularity of data labelling activities will also affect how cost-intensive this endeavour is. The IT infrastructure to train foundational models on acquired data is becoming more expensive,[84] and there are other tools to consider (e.g. digital engineering tools, workflow tools, translators and graphics emulators).[85]

As illustrated in the data capture requirements in the previous chapter, an ambitious effort to enable AI for strategic warning is not just about procuring an off-the-shelf, narrow AI-based tool but addressing systemic data challenges. This effort would require research and development at a scale similar to frontier AI efforts. This means that resourcing the effort with strategic-level funding is essential. Many leaders are not prepared for the cost and time it takes to acquire, structure and explore their own organisation’s datasets, and they expect AI adoption to be possible in weeks instead of months.[86]

Industry leaders estimate that the costs of foundational frontier models – tens to hundreds of millions of dollars now[87] – may reach $10 billion by 2026, with $100 billion clusters in 2027; and even those may not be sufficient for the capabilities envisioned by Phase 3 strategic warning AI.[88] Task-specific, fit-for-purpose models with fewer than 10 billion parameters, as envisioned by Phase 2, can take up to 60,000 kWh to train and fine-tune,[89] while multi-purpose frontier models with hundreds of billions of parameters, as envisioned in Phase 3, may exceed 1,500–2,000 MWh (barring energy-saving innovations).[90] Current plans for some data centres even estimate that they may consume as much as 2.8 GWh.[91] An AI-based capability is particularly costly because the high algorithmic complexity and upfront capital requirements make it unlike a typical IT project.[92]
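As a rough sanity check on these figures, the arithmetic below applies an assumed industrial electricity price of $0.10 per kWh (our assumption, not a source figure) to the training-energy estimates quoted above:

```python
# Back-of-envelope energy-cost arithmetic using the figures cited above.
PRICE_PER_KWH = 0.10             # USD per kWh - assumed, not from the report

phase2_kwh = 60_000              # task-specific model (<10B parameters)
phase3_kwh = 2_000 * 1_000       # 2,000 MWh upper bound for a frontier model

print(f"Phase 2 training energy: ~${phase2_kwh * PRICE_PER_KWH:,.0f}")  # ~$6,000
print(f"Phase 3 training energy: ~${phase3_kwh * PRICE_PER_KWH:,.0f}")  # ~$200,000
```

Even at the frontier-scale upper bound, electricity is a small share of the hardware, data and personnel costs discussed in this chapter.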

Once the models are built, inference energy costs – the energy used per inquiry – will become more efficient. Moreover, given the far smaller number of likely inquiries for an AI-based strategic warning capability versus a commercial public model, over time the cost per inquiry will be incremental to the overall costs.[93] However, as inference complexity increases – from generative chat to reasoning to autonomous tasking – inference energy costs increase by an order of magnitude at each step.[94]

The suite of digital tools necessary to manage the training, fine-tuning and other efforts around the models are incremental to the costs of the models themselves, but vital to get right. This is why a diversity of providers is important in Phases 1 and 2 rather than picking one system and forcing all providers to work within the same framework. As organisational muscle memory improves in Phases 1 and 2, however, the knowledge of what digital tools primary integrators will need to succeed in Phase 3 should emerge and solidify as industry best practices. There may also be cost and commercial challenges in concurrently carrying out data acquisition from several models and suppliers. Furthermore, given how sensitive these models are, there will be costs associated with storing them in air-gapped systems to prevent nefarious actors from gaining access.

The workforce needed to staff an AI for Strategic Warning capability goes beyond the analysts who use it. Given the effort it will take to reset the way the IC captures and prepares data for AI, as well as the variety of different roles that will be needed, the costs are more likely to be analogous to when the United States established Space Command – requiring entire new career paths, management structures and facilities.[95] A common concern is the possibility of relegating talented and skilled analysts to the role of data labellers. This could happen if the appropriate workforce requirements for data scientists who understand AI, suitably certified assurance experts who specialise in AI for strategic warning and others are not considered. This workforce will also need appropriate training and development.

Using estimates from a 2019 Congressional Budget Office analysis of Space Command new hires and facilities in 2020,[96] the costs of Phase 1 could be similar to those of a Policy Directorate (40–300 people, <$10m start-up funding and $10m–$60m annually).

Evolving into Phase 2, with its increased focus on governance of managing a tiered ecosystem of models, costs may resemble those of a new Development and Acquisition Agency in size and composition (1,200–1,300 people, $220m–$560m start-up and $240m–$460m annually).

As the Phase 3 capability matures into a single or few primary integrators with more advanced AI capabilities, personnel needs may decrease – and, when combined with other AI capabilities, more closely resemble those of a Combatant Command (400–600 people, $520m–$1,060m start-up and $80m–$120m annually). If the total workforce that requires government funding, military and civilian, ends up being substantially larger – reaching that of a Military Department (5,400–7,400 people, $1,400m–$3,240m start-up and $1,080m–$1,540m annually) or that estimated to be required to support digital engineering of ~40,000 personnel – the costs could be $500m–$800m start-up and $500m annually.[97]

4.2 Non-quantifiable costs

4.2.1 Opportunity costs of not adopting AI

It will simply not be possible to achieve good coverage of conflict risk indicators without AI, and the race for AI for strategic warning might set the conditions for the next several decades of international affairs. At stake is not simply that adversaries might use AI capabilities to outthink us, but that they might out-imagine us and create an environment in which the rules of engagement remain undefined and ever-changing.[98]

At the time of writing, DeepSeek – an AI lab owned by the Chinese hedge fund High-Flyer – had announced the DeepSeek-V3 foundation model, followed shortly afterwards by the DeepSeek-R1 reasoning model that stood “toe-to-toe with the best [from] OpenAI, Google, and Anthropic.”[99] Not only did this represent a large leap in performance, but DeepSeek was allegedly developed for a fraction of the cost ($5.5m) on export-controlled GPUs less powerful than those available to US firms.[100] In the same week, by contrast, a coalition including OpenAI, SoftBank, Oracle and the UAE's MGX announced a $500bn investment in additional US data centres named “Stargate.”[101] An additional concern is that, within a week of the DeepSeek moment, Moonshot AI announced its own frontier reasoning model – competitive with the o1 model OpenAI released in September 2024, and behind o3, but still a remarkable feat. Alongside these two companies are a host of other Chinese firms pursuing similar gains.[102] There are doubts about whether the claimed costs represent true costs, and it is unclear whether High-Flyer, having developed DeepSeek, has the compute power to deploy it broadly.[103] Regardless, the DeepSeek moment highlighted that Chinese AI firms are closing the distance to US firms in frontier model development, training and the ability to innovate creatively.

Forgoing AI also means forgoing potential cost savings and avoidances: enhanced foresight could prevent resource wastage, increase analyst productivity by automating some tasks, and reduce error rates by helping analysts triangulate hypotheses.

There may also be non-quantifiable opportunity costs: forgone improvements in safety outcomes and lives saved, reduced uncertainty, increased strategic choice, reduced redundancy and the achievement of strategic objectives.

Finally, if the US and UK ICs do not pursue AI for strategic warning, they could struggle to understand adversaries’ capabilities and how to counter them. In addition to its commercial sector, China’s intelligence apparatus is rapidly building up its AI capabilities,[104] which means a Chinese AI-based strategic warning system could be on the horizon. Integrating AI tools for high-impact IC purposes is how we keep our competitive advantage, especially as AI itself becomes part of the threat landscape that necessitates strategic warnings.

4.2.2 Unilateral versus multilateral development costs

A single government pursuing this ambitious programme might be able to fund the dollar costs, but could struggle to harness the necessary partners if some of the technical breakthroughs emerge outside that country. There may also be duplicated costs if a nation pursues AI for strategic warning unilaterally.

Given these costs, pooling resources among bilateral or Five Eyes allies could reduce the financial burden on each partner; equally, each partner would have to consider and harmonise different legal and regulatory frameworks and policy requirements. Data access challenges could also be compounded. If a multilateral approach is pursued, data sharing and data sovereignty considerations will need to be addressed upfront.

4.2.3 Preserving autonomy versus vendor lock-in

It may be easier to maintain existing security hierarchies if the majority of models are built in-house, but this would shift the upfront set-up costs to government instead of industry. Furthermore, governments are inhibited by data-sharing constraints and a lack of infrastructure.[105] While there are pockets of excellence, government models are not likely to be the centre of gravity for the future of AI applied to strategic warning.[106]

Instead, governments are turning to industry to lead on research and development to reduce costs, but government involvement at the prototyping stage will remain necessary, and it will remain essential to keep the programme of work classified.[107] Where technology development does not need to be done by someone with a clearance, it is typically more economical to do it outside government.[108]

It is tempting to ‘lock in’ with first-mover companies, hoping to leverage economies of scale by organising a common platform now. But the field of AI is evolving quickly, and the standard model of conflict and the likely sources of technical breakthroughs are still too uncertain to justify selecting a single provider now.

Maintaining a healthy and diverse ecosystem of suppliers could help prevent the cost escalation challenges that defence departments experience with traditional suppliers of major platforms. It is well documented that reduced competition in a market weakens suppliers’ incentives to bear down on costs.[109] Technology companies may charge more for their products if the government has no alternatives.[110]

5. Conclusion

One of the key challenges of moving from the status quo to the human-dominated Phase 1, and then to a more AI-adaptive Phase 2, is knowing where to begin. Many aspects of a strategic warning system – data processing, data cleaning, predictive analytics and scenario generation – are improving constantly. Trying to select one component to focus on while eschewing the others could exacerbate the technological gap and existing vulnerabilities. A concerted, large-scale effort – a true moonshot – is therefore critical to transitioning from Phase 1 to Phase 2, and ultimately to pushing the IC towards a possible Phase 3. Such a comprehensive approach comes at financial cost but provides the necessary structure, resource and, most critically, momentum to accelerate innovation across multiple sectors. By establishing clear milestones and benchmarks within this broader framework, stakeholders can measure progress more effectively and maintain accountability. These explicit goals should reflect technical complexity, ethical considerations and security requirements, ensuring that every step forward is deliberate, transparent and aligned with shared values.

Given the immense scope of this challenge, no single government or organisation can drive the entire process alone. While existing programmes like the US Intelligence Advanced Research Projects Activity (IARPA) and the UK’s Advanced Research and Invention Agency (ARIA) offer pathways for funding and collaboration, the burden is simply too large for any one nation to bear. Consequently, a partnered approach – either bilaterally between the US and the UK or across the Five Eyes alliance – can serve as an optimal foundation. Such a coalition allows for the pooling of resources, expertise and strategic vision. At the same time, it diminishes financial risks by distributing them among multiple stakeholders, increasing the likelihood of sustained progress.

In forging these partnerships, it is crucial to strike a balance between broadening the coalition through public-private partnerships, ensuring the national security community has access to the AI/ML skills needed to build this capability, and preserving a level of agility. Too many participants can complicate decision-making processes, dilute accountability and slow the pace of innovation. Conversely, too few partners risk overlooking the diverse expertise needed for meaningful breakthroughs. The optimal balance therefore lies in assembling a robust yet manageable consortium capable of multiple parallel experiments, data sharing and consistent refinement of best practices.

All these efforts must remain firmly rooted in the principles of human–machine teaming. The interplay between human creativity and machine efficiency provides the strongest foundation for transformative progress, offering advantages in everything from data analysis to operational execution. This cooperative dynamic also highlights the importance of rigorous safeguards to maintain the integrity and security of each partner’s systems. Regulatory frameworks, ethical guidelines and technical standards should evolve in tandem to minimise vulnerabilities and ensure the responsible development of emerging technologies.

With the pace of technological change accelerating, deliberate action and foresight are indispensable. If policymakers receive warnings even earlier than before, they must take full advantage of them: earlier strategic warning always carries the risk that policymakers treat an issue as not yet needing immediate attention. By acting deliberately on earlier warning, stakeholders will position the project – and the broader international community – to move confidently through each phase, culminating in a lasting, transformative impact.

References

[1] Artificial general intelligence is a system that is at least as capable as a human at most tasks, according to Meredith Ringel Morris and her co-authors. Meredith Ringel Morris et al., “Position: Levels of AGI for Operationalizing Progress on the Path to AGI,” Proceedings of the 41st International Conference on Machine Learning, 2024, https://openreview.net/pdf?id=0ofzEysK2D.

[2] National Commission on Terrorist Attacks Upon the United States, The 9/11 Commission Report, 8.

 

[3] Frank Konkel, “The US intelligence community is embracing generative AI,” Government Executive, 8 July 2024, https://www.govexec.com/technology/2024/07/us-intelligence-community-embracing-generative-ai/397867/; IARPA, “Rapid explanation, analysis and sourcing online,” https://www.iarpa.gov/research-programs/reason.

 

[4] Sean Monaghan and Tim McDonald, Campaigning in the Grey Zone (RAND Corporation: October 2024).

 

[5] Special Competitive Studies Project, The Future of Intelligence Analysis: U.S.-Australia Project on AI and Human Machine Teaming, September 2024.

 

[6] Anna Knack and Nandita Balakrishnan, “The State of AI for Strategic Warning,” CETaS Expert Analysis (November 2024), https://cetas.turing.ac.uk/publications/state-ai-strategic-warning.

 

[7] Ibid.

 

[8] Alan Turing Institute, “Global urban analytics for resilient defence,” https://www.turing.ac.uk/research/research-projects/global-urban-analytics-resilient-defence.

 

[9] Michael Mobius et al., “AI-Based Military Decision Support Using Natural Language,” Institute of Electrical and Electronics Engineers, 23 January 2023, https://ieeexplore.ieee.org/abstract/document/10015234; Black and Darken, “Scaling Artificial Intelligence for Digital Wargaming in Support of Decision-Making,” NATO, 8 February 2024, https://arxiv.org/abs/2402.06075; Joost van Oijen and Pieter de Marez Oyens, “Empowering Military Decision Support Through the Synergy of AI and Simulation,” NATO, https://www.researchgate.net/profile/Joost-Van-Oijen/publication/375584489_Empowering_Military_Decision_Support_through_the_Synergy_of_AI_and_Simulation/links/6550ae2fce88b87031df79e5/Empowering-Military-Decision-Support-through-the-Synergy-of-AI-and-Simulation.pdf; Cordis Europa, “Using Machine Learning to Identify Political Violence and Anticipate Conflict,” https://cordis.europa.eu/article/id/443344-using-machine-learning-to-identify-political-violence-and-anticipate-conflict; Mirco Musolesi and Akin Unver, “Computational Modelling of Civil Wars,” Alan Turing Institute, https://www.turing.ac.uk/research/research-projects/computational-modelling-civil-wars.

 

[10] Special Competitive Studies Project, “AI & the Future of Intelligence,” 3 September 2024, https://www.scsp.ai/resource/scsp-aspi-intel/.

 

[11] Rachel Myrick and Cheng Wang, “Domestic Polarization and International Rivalry: How Adversaries Respond to America’s Partisan Politics,” The Journal of Politics 86(1), January 2024, https://www.journals.uchicago.edu/doi/epdf/10.1086/726926.

 

[12] Author interview with industry participant, 15 November 2024; author interview with academic participant, 9 October 2024; author interview with industry participant, 18 September 2024; author interview with academic participant, 24 October 2024.

 

[13] Author interview with academic participant, 9 October 2024.

 

[14] Author interview with academic participant, 9 October 2024.

 

[15] Author interview with industry participant, 15 November 2024; author interview with academic participant, 10 October 2024; author interview with academic participant, 24 October 2024; author interview with academic participant, 15 October 2024.

 

[16] Author interview with industry participant, 18 September 2024; author interview with industry participant, 11 October 2024.

 

[17] Author interview with government participant, 10 October 2024; author interview with industry participant, 11 October 2024; author interview with academic participant, 17 October 2024.

 

[18] Author interview with industry participant, 15 November 2024.

 

[19] Author interview with academic participant, 24 October 2024.

 

[20] Author interview with industry participant, 15 November 2024.

 

[21] Author interview with industry participant, 15 November 2024; author interview with academic participant, 24 October 2024; author interview (2) with industry participant, 8 October 2024; author interview with industry participant, 18 October 2024.

 

[22] Sebastian Farquhar et al., “Detecting hallucinations in large language models using semantic entropy,” Nature, 630: 625–630, 2024.

 

[23] Author interview with academic participant, 24 October 2024.

 

[24] Author interview (2) with industry participant, 8 October 2024; author interview with academic participant, 8 October 2024; author interview with academic participant, 24 October 2024.

 

[25] Author interview with industry participant, 10 October 2024; author interview with academic participant, 15 October 2024.

 

[26] Author interview with industry participant, 18 September 2024.

 

[27] Author interview with industry participant, 8 October 2024; author interview with academic participant, 18 October 2024.

 

[28] Author interview with academic participant, 10 October 2024.

 

[29] CETaS workshop, 4 September 2024.

 

[30] Black and Darken, “Scaling Artificial Intelligence for Digital Wargaming in Support of Decision-Making.”

 

[31] Mayank Kejriwal, “Link Prediction between Structured Geopolitical Events: Models and Experiments,” Frontiers in Big Data, 30 November 2021.

 

[32] Roshanak Rose Nilchiani, Dinesh Verma and Philip Anton, “Joint all-domain command and control (JADC2) opportunities on the horizon,” Acquisition Research Program, May 2023.

 

[33] James Ryseff et al., “The Root Causes of Failure for Artificial Intelligence Projects and How they Can Succeed,” RAND Corporation, 13 August 2024. https://www.rand.org/pubs/research_reports/RRA2680-1.html.

 

[34] Author interview with academic participant, 14 October 2024; author interview with academic participant, 15 October 2024.

 

[35] Scott Nover, “Can the CIA’s AI Chatbot Get Inside the Minds of World Leaders?,” GZeroAI, 21 January 2025, https://www.gzeromedia.com/gzero-ai/can-the-cias-ai-chatbot-get-inside-the-minds-of-world-leaders.

 

[36] Author interview with industry participant, 11 October 2024; author interview with academic participant, 18 October 2024; Benjamin Jones and Benjamin A. Olken, “Hit or Miss? The Effect of Assassinations on Institutions and War,” American Economic Journal: Macroeconomics 1 (2): 55–87, 2009.

 

[37] Author interview (2) with government participant, 9 December 2024.

 

[38] Author interview with industry participant, 15 November 2024; author interview with industry participant, 16 October 2024.

 

[39] Author interview with industry participant, 8 October 2024.

 

[40] Author interview with academic participant, 15 October 2024.

 

[41] Author interview with academic participant, 18 October 2024.

 

[42] Simon Newton, “Tinder Trap: Ukraine and Russia Using Fake Profiles to Trick Soldiers Into Revealing Intel,” BFBS Forces News, 18 October 2024, https://www.forcesnews.com/ukraine/tinder-trap-ukraine-and-russia-using-women-glean-intel-enemy-soldiers.

 

[43] Author interview (2) with industry participant, 8 October 2024; author interview with industry participant, 16 October 2024.

 

[44] Author interview with academic participant, 9 October 2024.

 

[45] Author interview with academic participant, 9 October 2024, author interview with academic participant, 10 October 2024.

 

[46] Author interview with government participant, 10 October 2024.

 

[47] Author interview with academic participant, 15 October 2024.

 

[48] Author interview with academic participant, 18 October 2024.

 

[49] Author interview with academic participant, 15 October 2024.

 

[50] G3 and G4; Daniel M. Benjamin et al., “Hybrid forecasting of geopolitical events,” AI Magazine 44, no.1, March 2023: 112–128.

 

[51] Author interview with academic participant, 24 October 2024.

 

[52] Author interview with academic participant, 8 October 2024.

 

[53] Kejriwal, “Link Prediction between Structured Geopolitical Events: Models and Experiments.”

 

[54] Author interview with academic participant, 9 October 2024.

 

[55] Author interview with academic participant, 8 October 2024.

 

[56] Kejriwal, “Link Prediction between Structured Geopolitical Events: Models and Experiments.”

 

[57] Author interview with academic participant, 8 October 2024.

 

[58] Author interview with academic participant, 24 October 2024.

 

[59] Author interview with academic participant, 15 October 2024.

 

[60] Author interview with government participant, 11 December 2024; author interview with government participant, 4 December 2024.

 

[61] Author interview with academic participant, 24 October 2024.

 

[62] Author interview with academic participant, 24 October 2024.

 

[63] Author interview with academic participant, 8 October 2024.

 

[64] Author interview with academic participant, 9 October 2024.

 

[65] Paolo Vesco et al., “United they stand: Findings from an escalation prediction competition,” International Interactions 48, no. 4, 2022, 860–896, https://www.tandfonline.com/doi/epdf/10.1080/03050629.2022.2029856.

 

[66] Ibid.

 

[67] Author interview with academic participant, 15 October 2024.

 

[68] CETaS workshop, 4 September 2024.

 

[69] Author interview with academic participant, 15 October 2024.

 

[70] Author interview with academic participant, 15 October 2024; author interview with academic participant, 8 October 2024; author interview with industry participant, 18 September 2024.

 

[71] Author interview with academic participant, October 2024.

 

[72] Author interview with academic participant, 14 October 2024.

 

[73] Author interview with academic participant, 9 October 2024; author interview with industry participant, 10 October 2024.

 

[74] Author interview with academic participant, 9 October 2024; author interview with academic participant, 10 October 2024.

 

[75] Author interview (2) with government participant, 9 December 2024.

 

[76] Author interview with academic participant, 14 October 2024.

 

[77] Author interview with government participant, 10 October 2024; Giuseppe Nebbione, “Deep Neural Ranking for Crowdsourced Geopolitical Event Forecasting,” Computer Science and Engineering, May 2019.

 

[78] Author interview with academic participant, 14 October 2024.

 

[79] Author interview with government participant, 10 October 2024.

 

[80] Federal Government of the U.S., U.S. Army Cost Benefit Analysis Guide, Office of the Deputy Assistant Secretary of the Army (Cost and Economics), https://www.asafm.army.mil/Portals/72/Documents/Offices/CE/US%20Army%20Cost%20Benefit%20Analysis.pdf.

 

[81] N. Peter Whitehead et al., A Framework for Assessing the Costs and Benefits of Digital Engineering: A Systems Approach, RAND Corporation, March 2023, https://www.rand.org/pubs/research_reports/RRA2418-1.html.

 

[82] Eugenio Caterino, “What Is AI Training Data? Examples, Datasets and Providers,” Datarade, last modified 13 January 2025, https://datarade.ai/data-categories/ai-ml-training-data.

 

[83] Whitehead et al., A Framework for Assessing the Costs and Benefits of Digital Engineering: A Systems Approach.

 

[84] Gaël Varoquaux et al., “Hype, Sustainability, and the Price of the Bigger-Is-Better Paradigm in AI,” arXiv, September 2024, https://arxiv.org/abs/2409.14160.

 

[85] Whitehead et al., A Framework for Assessing the Costs and Benefits of Digital Engineering: A Systems Approach.

 

[86] Ibid.

 

[87] Dylan Patel and Nathan Lambert, “DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate and AI Megaclusters,” Lex Fridman Podcast, https://lexfridman.com/deepseek-dylan-patel-nathan-lambert-transcript/.

 

[88] Dario Amodei, “Anthropic CEO on Claude, AGI & the Future of AI & Humanity,” Lex Fridman Podcast, https://lexfridman.com/dario-amodei-transcript.

 

[89] Alexandra Sasha Luccioni, Yacine Jernite, and Emma Strubell, “Power Hungry Processing: Watts Driving the Cost of AI Deployment?,” The 2024 ACM Conference on Fairness, Accountability, and Transparency, 2024, 85–99.

 

[90] Alex De Vries, “The Growing Energy Footprint of Artificial Intelligence,” Joule 7, no. 10, October 2023, 2,191–2,194.

 

[91] Patel and Lambert, “DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate and AI Megaclusters.”

 

[92] James Ryseff, Brandon F. De Bruhl and Sydne J. Newberry, The Root Causes of Failure for AI Projects and How They Can Succeed, RAND Corporation, 2024, https://www.rand.org/pubs/research_reports/RRA2680-1.html.

 

[93] Alexandra Sasha Luccioni, Yacine Jernite and Emma Strubell, “Power Hungry Processing: Watts Driving the Cost of AI Deployment,” arXiv, 15 October 2024, https://arxiv.org/abs/2311.16863.

 

[94] Patel and Lambert, “DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate and AI Megaclusters.”

 

[95] Author interview with academic participant, 14 October 2024; author interview with academic participant, 15 October 2024; author interview with academic participant, 24 October 2024.

 

[96] Congressional Budget Office, “The Personnel Requirements and Costs of New Military Space Organizations,” May 2019, https://www.cbo.gov/publication/55178.

 

[97] Whitehead et al., A Framework for Assessing the Costs and Benefits of Digital Engineering: A Systems Approach.

 

[98] Author interview with academic participant, 9 October 2024.

 

[99] Timothy Prickett Morgan, “How Did DeepSeek Train Its AI Model On A Lot Less – And Crippled – Hardware?,” The Next Platform, 27 January 2025.

 

[100] Morgan, “How Did DeepSeek Train Its AI Model On A Lot Less – And Crippled – Hardware?.”

 

[101] Lucinda Shen, “How Is Stargate’s $500B Getting Funded?,” Axios, last modified 22 January 2025, https://www.axios.com/2025/01/23/stargate-trump-open-ai.

 

[102] Scott Singer, “DeepSeek and Other Chinese Firms Converge with Western Companies on AI Promises,” Carnegie Endowment for International Peace, 28 January 2025, https://carnegieendowment.org/research/2025/01/deepseek-and-other-chinese-firms-converge-with-western-companies-on-ai-promises?lang=en.

 

[103] Patel and Lambert, “DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate and AI Megaclusters.”

 

[104] Ibid.

 

[105] Author interview with academic participant, 8 October 2024.

 

[106] Author interview with government participant, 10 October 2024.

 

[107] Author interview with academic participant, 15 October 2024.

 

[108] Author interview (2) with industry participant, 8 October 2024.

 

[109] HM Government, Evidence Summary: The Drivers of Defence Cost Inflation, Ministry of Defence, 2022, https://www.gov.uk/government/publications/evidence-summary-the-drivers-of-defence-cost-inflation.

 

[110] Author interview with academic participant, 10 October 2024; author interview with industry participant, 11 October 2024.

Citation information

Anna Knack, Nandita Balakrishnan and Timothy Clancy, "Applying AI to Strategic Warning," CETaS Research Reports (March 2025).
