When will LLMs be better at Paradox grand strategy games than the in-game AI for NPCs?

Resolution Criteria

This market resolves to the date when Large Language Models (LLMs) are demonstrably better at playing Paradox grand strategy games (such as Europa Universalis, Crusader Kings, Hearts of Iron, Stellaris, or Victoria) than the built-in AI that controls non-player characters (or nations).

The relevant Paradox games are those current at the time of resolution.

If Paradox integrates LLMs into the AI for NPCs, that counts as admitting that LLMs are better at the task, and this market will resolve to the date the relevant game (or patch, or DLC) is released to the public.

Otherwise, this market will resolve when there is publicly available code I can run, alongside a copy of one of the then-current generation of Paradox GSGs, that consistently plays the game well (in single-player mode). It doesn't need to achieve world conquest or anything, or even play as well as any given human player would. But it needs to consistently avoid faceplanting. If it semi-consistently achieves success (relative to its starting position), the way even a significantly less-than-median human player can, that's enough to resolve the market.

The level of skill I'm talking about here is one a human player can reach within tens of hours of play time; this isn't meant to be a high bar.

The LLM-based AI can be specialized for playing Paradox games, or one particular game. It can be fine-tuned to the task, or include e.g. specialized tool-calling. I need to be able to run it against a game running on my computer (or in a virtual machine), but the model itself need not be a local one; i.e. it can call the API of a proprietary hosted LLM like Claude or GPT.

As the resolution criteria are somewhat subjective, I will not bet on this market.

  • Update 2025-06-06 (PST) (AI summary of creator comment): The creator has clarified the allowed input mechanisms for the LLM when evaluating its ability to play the game:

    • The LLM should interact with the game using an interface similar to what a human player uses.

    • Allowed inputs include sensory information a human would receive, such as the screen and audio.

    • Save files are not considered a primary input method for the LLM's ongoing gameplay.

    • The creator mentions the style of "Claude Plays Pokemon" as an example of the intended interaction.


What's the allowed input to the LLM? Screenshots, save files, etc?

I was thinking something in the style of Claude Plays Pokemon: some harness connecting the LLM to the same basic interface that humans use. Not save files, but definitely the screen, audio, and so forth; anything a human would receive while normally playing the game is certainly valid.
