[RFC] Add experimental support for AI/RL agent integration #7974

rouming · 2025-05-04T20:19:42Z

Hi folks,

I've been working on a machine learning project called DevilutionX-AI for a while now purely for fun and personal enjoyment. It is a reinforcement learning framework set in the Diablo environment. The framework uses DevilutionX with some modifications, and in this PR, I'd like to upstream these changes. Currently, this PR is marked as RFC (request for comments) because there's a possibility that something might break, tests might fail, and the tweaks I made to allow an RL agent to run DevilutionX instances, learn, and control the player might not be necessary for the Diablo game. However, I believe some of the fixes might be beneficial for the DevilutionX project, certain gameplay options might be interesting, and the approach to memory sharing in headless mode could be extremely useful for testing automation. So, my primary goal is to receive your feedback.

Moreover, with this PR, I hope to attract people who are interested in reinforcement learning (disclaimer: I don't do any RL research professionally) or those who already have experience in this field, which would be amazing. DevilutionX-AI is still far from completion, and the main goal of an RL agent that can clear at least the first level has not yet been achieved, so there are still many experiments and tweaks ahead.

With this PR, I'd like to ask you to take a look at the patches. I am ready to tidy up everything that aligns with the spirit of the DevilutionX project and anything you'd be interested in merging into the master branch. Here's a brief list of the main changes:

AI-Oriented Gameplay Changes

Shared memory implementation for reinforcement learning agents. Supports external key inputs and game event monitoring. Allows a third-party application to connect to a DevilutionX instance, obtain the state of the game engine, and send key presses. Gives control over the player and game state monitoring.
Enabled line-buffered stdout (setlinebuf) for the AI agent to consume the log from the DevilutionX engine. When many DevilutionX instances are started in parallel, the log from the engine is extremely useful for identifying abnormal situations or even seeing the backtrace of a crash.
Added a headless mode option to start the game in non-windowed mode. This is the main mode for AI training.
Added an option to launch the game directly into a specified dungeon level.
Enables deterministic level and player generation for reproducible training by setting a seed.
Added an option to remove all monsters from the dungeon level to ease the exploration training task.
Added an option to skip most animation ticks to accelerate training speed.
Added an option to disable monster auto-pursuit, so that pressing the primary action button does not make the hero pursue a nearby monster.
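
As a rough illustration of why line-buffered stdout matters to a consumer, here is a minimal Python sketch of a parent process reading a child's log line by line. The child is a stand-in Python one-liner, not the DevilutionX binary, and the log messages are made-up placeholders.

```python
import subprocess
import sys

# Stand-in for a game instance: a child process that prints a few log lines.
# The command and the messages are purely illustrative assumptions.
CHILD_CMD = [
    sys.executable, "-u", "-c",
    "print('level loaded'); print('game paused'); print('game saved')",
]

def collect_log_lines(cmd):
    """Start a process and collect its stdout one line at a time."""
    lines = []
    # bufsize=1 with text=True requests line buffering on the reading side.
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True, bufsize=1) as proc:
        for line in proc.stdout:
            lines.append(line.rstrip("\n"))
    return lines

log_lines = collect_log_lines(CHILD_CMD)
```

If the writer used full buffering instead, the reader would typically see nothing until the buffer filled or the process exited, which makes a stuck or crashed instance harder to spot.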

Various Fixes

Fixed missing events in the main event loop when running in headless mode, which was causing the AI agent to get stuck after an event had been sent, but no reaction occurred.
Fixed access to graphics and audio objects in headless mode. A few bugs were causing random crashes of the DevilutionX instance.

If the DevilutionX project would benefit from an additional "ai/" directory containing all the Python scripts responsible for reinforcement learning training and agent evaluation, I would be happy to upstream this part as well.

AJenbo · 2025-05-04T20:25:38Z

@NiteKat competition?

AJenbo · 2025-05-04T20:39:48Z

@rouming reading through your readme, I would suggest making things work with coordinates relative to the player rather than absolute to the world. That makes it a lot closer to how a player interacts with the game. I'm also not sure why you expose the full world at all times rather than just the currently visible tiles?

AJenbo · 2025-05-04T22:39:18Z

By the way do you know about --timedemo which disables frame limitation and only runs game ticks? Might be useful for speeding up training.

glebm · 2025-05-05T07:56:48Z

Source/utils/mapping.cpp

+struct entry_ptr;
+static struct entry_ptr *tail_entry;
+
+struct entry_ptr {


How is this meant to be used? AFAICT, it's a bunch of pointers in a memory-mapped file.

Is this to allow another application to get/set the values directly?

Why is this file memory-mapped? The things that are being pointed to don't change their address during execution.

Not quite. We place all static objects of interest in the ".shared" section (https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html#index-section-variable-attribute). Everything else is handled by the linker. All we need to ensure is that all objects are properly aligned in the shared file and that there are no gaps between them. This is the only purpose of the entry_ptr structure: every static object is registered by a pointer in a singly linked list, which lets us traverse all the registered static objects (these pointers) during early startup and check that there are no gaps (padding) in between. That check is what the verify_no_padding() function does.

Once everything is settled and no new objects are planned, the entry_ptr structure and the verify_no_padding() function can be safely dropped. I use this verification to catch alignment bugs: when a new object is added and placed with 4 bytes of padding, accessing it from a Python script returns garbage.
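
To make the padding problem concrete, here is a small Python sketch using ctypes. It is not the verify_no_padding() code from the patch; the two structures are hypothetical and only show how a reader that assumes a gap-free layout misreads data written with alignment padding.

```python
import ctypes

class Padded(ctypes.Structure):
    # Native alignment: 3 padding bytes are inserted after `flag`
    # so that `value` starts on a 4-byte boundary.
    _fields_ = [("flag", ctypes.c_uint8), ("value", ctypes.c_uint32)]

class Packed(ctypes.Structure):
    # Same fields, but the reader assumes there is no padding at all.
    _pack_ = 1
    _fields_ = [("flag", ctypes.c_uint8), ("value", ctypes.c_uint32)]

# The "writer" lays the object out with padding...
raw = bytes(Padded(flag=1, value=42))

# ...and the "reader" interprets the same bytes without padding.
misread = Packed.from_buffer_copy(raw)
correct = Padded.from_buffer_copy(raw)

padded_size = ctypes.sizeof(Padded)    # includes the padding bytes
packed_size = ctypes.sizeof(Packed)    # 1 + 4 bytes, no gaps
```

Comparing `Padded.value.offset` with `Packed.value.offset` shows where the two layouts diverge, which is the kind of mismatch a startup check can catch before a script reads garbage.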

Answering your questions:

No, the third-party application does not modify anything; it only reads. The only exception is the ring buffer, where the third-party app advances the write index. But this is not related to your question.

Well, I assume the question would not make sense after you read the description above.

You ask valid questions. That indicates I did not describe any of these mechanics, sorry about that. I have to describe this carefully.

I get it now, thanks for explaining.

Forcing a particular placement of static variables can impede optimization and isn't cross-platform (e.g. the mold linker doesn't support linker scripts).

Have you considered an IPC approach instead, e.g. sending events over a unix socket?
This will be more portable and also decouple internal representation from the communication protocol.

There is ongoing work to change the layout of some of these static structs to make them more flexible for modding, so relying on a particular layout is going to cause quite a bit of churn on your Python script side.

I intentionally avoided using sockets or pipes. These can introduce delays when fetching the latest state due to buffering issues. The questions about a protocol also arise: should I send everything on any small change or just send deltas? etc. While this is all doable, accessing the memory chunk directly is so straightforward that its simplicity outweighs everything else.

I am aware that this approach is not portable, but it can be hidden under a specific macro, like SHARED_STATE, or something. It would be disabled by default and used only when needed, which is already something.

There's also a plan B. I could eliminate these linker tricks and the awkward object placement code in favor of parsing DWARF debug symbols from Python to obtain the addresses of static variables in the running DevilutionX instance. This would involve working with procfs and mapping every static object from /proc/$pid/mem instead of a single big chunk. The main challenge here is DWARF parsing, but instead of implementing it in pure Python, it could be done with extra tools (GDB, dwarfdump, etc.). I have not gone into this in detail; these are just thoughts. The downside of this solution is, as you mentioned, a constantly broken data protocol: someone merges two static arrays in DevilutionX, and the third-party Python script breaks. That's why it would be great to upstream everything and cover it with tests :) Otherwise I'll need to sit in a deep branch chasing the latest master HEAD.

As long as the individual structures do not change shape won't the DWARF approach be more flexible?

It will, it will. But it will also require some tricks, since you can't just mmap a part of an external address space by mapping /proc/$pid/mem, as I had expected from the very beginning. Sigh. I'm still looking for something as fast as plain memory access: no I/O reads, no buffering. I'll share once I have working code, and we'll see whether we can move forward with this approach or not.

I'm thinking out loud. If DevilutionX provides a straightforward way to access its internal engine states, then using SWIG or similar tools, bindings from C++/C headers for any other language could be generated. I'm personally interested in Python. I found it quite annoying to generate these bindings (mostly structures) manually, which I'm using for RL training. Do you guys think there is something practical that can be scripted and automated by accessing the internal engine state? Perhaps for tasks like unit tests or similar?

rouming · 2025-05-05T08:49:39Z

@NiteKat competition?

Awesome! Eager to see the results, @NiteKat, please share.

rouming · 2025-05-05T08:52:27Z

@rouming reading through your readme, I would suggest making things work with coordinates relative to the player rather than absolute to the world. That makes it a lot closer to how a player interacts with the game. I'm also not sure why you expose the full world at all times rather than just the currently visible tiles?

First of all, thanks for reading this through. That's a good question. Simple answer: I did that, and I have the windowed approach in a separate branch. The long answer is that I'm not confident I'm moving in the right direction with all the parameter changes I make, and in order not to lose the performance already achieved ("Does the agent move to the right room? Yay! Oh no, it is stuck in the corner. Sigh" is the kind of mood swing I constantly experience), I take very conservative steps forward. The windowed approach requires introducing memory, that is, another recurrent neural network layer. For example, RLlib, which I'm using, supports the LSTM (long short-term memory) feature, but again, I started with the simplest possible task (no monsters, full dungeon as the observation state) and am slowly increasing complexity. So of course, in the future, a windowed approach is a must, but first I would like to see some signs of life with a simpler task: "Hey agent, here is the full dungeon state! What could be easier? What will be your next action?". Another requirement on the TODO list is for the agent to make decisions based on the screenshot and pixel states, emulating a purely human-like approach, but that's another challenge.
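
For readers unfamiliar with the two observation styles discussed here, the following is a minimal Python sketch of a player-centered window cut out of a full dungeon grid. The function name, the grid encoding, and the fill value are all assumptions made for illustration; they are not taken from DevilutionX-AI.

```python
def crop_window(grid, player_x, player_y, radius, fill=0):
    """Return a (2*radius+1)-square view of `grid` centered on the player.

    Cells outside the dungeon are padded with `fill`, so the player always
    sits at the center of the returned window.
    """
    height = len(grid)
    width = len(grid[0]) if height else 0
    window = []
    for dy in range(-radius, radius + 1):
        row = []
        for dx in range(-radius, radius + 1):
            x, y = player_x + dx, player_y + dy
            inside = 0 <= x < width and 0 <= y < height
            row.append(grid[y][x] if inside else fill)
        window.append(row)
    return window

# Hypothetical encoding: 1 = wall, 0 = floor, 2 = some object of interest.
dungeon = [
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 0, 2, 1],
    [1, 1, 1, 1],
]
corner_view = crop_window(dungeon, player_x=0, player_y=0, radius=1, fill=-1)
center_view = crop_window(dungeon, player_x=2, player_y=2, radius=1)
```

With a window like this the agent no longer sees the whole level at once, which is why the reply above mentions needing some form of memory (for example a recurrent layer) once the observation is restricted.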

rouming · 2025-05-05T08:58:05Z

By the way do you know about --timedemo which disables frame limitation and only runs game ticks? Might be useful for speeding up training.

Thanks for reminding me about that. I saw this mode in the code, and I recently did a timedemo recording for the visibility fix, so I'm aware of it but forgot to thoroughly investigate. I probably don't need the patch I created, which disables animation for player movements. But I remember you mentioned that the timedemo does not always work when certain "timed" objects are activated. Could you remind me which objects those are exactly? Don't want to debug the agent while training, this is quite time consuming.

NiteKat · 2025-05-06T19:16:28Z

@NiteKat competition?

Awesome! Eager to see the results, @NiteKat, please share.

Hello! I don't have anything for DevilutionX specifically, but I wrote an API for use with 1.09 Diablo executable here: https://github.com/nitekat/dapi

It's not at 1.0 status but is functional enough to play a full game, both single player and multiplayer (all 3 difficulties). I regularly stream a rule-based expert system in both modes on my Twitch. The way it works, it isn't immediately compatible with DevilutionX. We've talked about integrating the code into DevilutionX but never really worked on it.

AJenbo · 2025-05-06T19:28:52Z

But I remember you mentioned that the timedemo does not always work when certain "timed" objects are activated.

It was already addressed in the code some time ago; I just forgot. The issue was that things recorded at one speed didn't play back properly at another speed, since some events were real-time based. None of that would likely be relevant to your project anyway.

rouming · 2025-05-07T13:37:24Z

Hello! I don't have anything for DevilutionX specifically, but I wrote an API for use with 1.09 Diablo executable here: https://github.com/nitekat/dapi

It's not at 1.0 status but is functional enough to play a full game, both single player and multiplayer (all 3 difficulties). I regularly stream a rule-based expert system in both modes on my Twitch. The way it works, it isn't immediately compatible with DevilutionX. We've talked about integrating the code into DevilutionX but never really worked on it.

Hey! Nice, thanks for sharing. I like your windows remote process injection tricks :) Have not seen the win api for ages, cool stuff.

A quite legitimate warning: if FMT_EXCEPTIONS is not defined we should return something from the function. Let it be an empty string. Signed-off-by: Roman Penyaev <[email protected]>

`ShowProgress` is called on any custom event, such as when a player warps to a new level. It is designed to handle only custom events and discards others from the queue. This behavior results in lost keystrokes sent by the AI agent. For instance, the issue was reproduced when attempting to pause the game by sending the PAUSE key, which was lost due to the behavior of `ShowProgress`. This change introduces a poller along with the custom handler. For the `ShowProgress` routine, the poller will peek at only custom events, leaving others in the queue. Signed-off-by: Roman Penyaev <[email protected]>
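
The difference between the two behaviors described in this commit message can be sketched in Python with a plain queue. This is a simplified model under assumed event names, not the SDL event handling from the patch: one routine drops everything that is not a custom event, the other handles custom events and leaves the rest queued.

```python
from collections import deque

CUSTOM = "custom"  # hypothetical event kinds, for illustration only
KEY = "key"

def drain_discarding(queue, handle_custom):
    """Behavior described as problematic: non-custom events are dropped."""
    while queue:
        kind, payload = queue.popleft()
        if kind == CUSTOM:
            handle_custom(payload)

def poll_custom_only(queue, handle_custom):
    """Handle custom events only; keep all other events queued, in order."""
    kept = deque()
    while queue:
        event = queue.popleft()
        if event[0] == CUSTOM:
            handle_custom(event[1])
        else:
            kept.append(event)
    queue.extend(kept)

events = [(CUSTOM, "load level"), (KEY, "PAUSE"), (CUSTOM, "level ready")]

lossy_queue, lossy_handled = deque(events), []
drain_discarding(lossy_queue, lossy_handled.append)

safe_queue, safe_handled = deque(events), []
poll_custom_only(safe_queue, safe_handled.append)
```

After the second call the PAUSE key is still waiting in `safe_queue` for the regular event loop, whereas the first variant has already thrown it away.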

The patch includes a CMake compilation test that verifies whether the resulting binary includes the `__bss_start` and `_end` linker symbols. Should be the case for the GNU linker. If the binary contains these symbols, the `HAVE_LINKER_BSS_SYMBOLS` macro will be defined. In the next commit, the `HAVE_LINKER_BSS_SYMBOLS` macro will be used to share the .bss and .data sections with the external application. Stay tuned. Signed-off-by: Roman Penyaev <[email protected]>

Share essential game state (.bss and .data sections) over shared memory for reinforcement learning. Also provides two shared ring queues: one for receiving keys as input and one for sharing events. Sharing of the game state can be enabled with the following option in the ini config:

Share game state via file=<file>

Signed-off-by: Roman Penyaev <[email protected]>
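
As a sketch of how a key queue over a shared file might look from the Python side, here is a minimal single-producer, single-consumer ring buffer on top of mmap. The layout (two 32-bit indices followed by eight 32-bit slots) and the function names are assumptions for illustration and do not reflect the actual layout used by the patch.

```python
import mmap
import struct
import tempfile

CAPACITY = 8                    # number of slots; hypothetical, power of two
HEADER = struct.Struct("<II")   # write_idx, read_idx
SLOT = struct.Struct("<I")      # one key code per slot
SIZE = HEADER.size + CAPACITY * SLOT.size

def push_key(buf, key):
    """Producer side: store a key, then advance the write index."""
    write_idx, read_idx = HEADER.unpack_from(buf, 0)
    if (write_idx - read_idx) & 0xFFFFFFFF == CAPACITY:
        return False  # queue is full
    SLOT.pack_into(buf, HEADER.size + (write_idx % CAPACITY) * SLOT.size, key)
    struct.pack_into("<I", buf, 0, (write_idx + 1) & 0xFFFFFFFF)
    return True

def pop_key(buf):
    """Consumer side: read a key, then advance the read index."""
    write_idx, read_idx = HEADER.unpack_from(buf, 0)
    if write_idx == read_idx:
        return None  # queue is empty
    (key,) = SLOT.unpack_from(buf, HEADER.size + (read_idx % CAPACITY) * SLOT.size)
    struct.pack_into("<I", buf, 4, (read_idx + 1) & 0xFFFFFFFF)
    return key

with tempfile.TemporaryFile() as f:
    f.write(b"\0" * SIZE)
    f.flush()
    with mmap.mmap(f.fileno(), SIZE) as shared:
        for key in (13, 27, 32):
            push_key(shared, key)
        received = [pop_key(shared) for _ in range(4)]
```

Both sides run in one process here to keep the example self-contained. A real cross-process queue would also need to consider memory ordering between writing the slot and publishing the index, which this sketch does not address.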

Patch introduces Gameplay.gameAndPlayerSeed for deterministic creation of new levels. Will be needed for headless mode and RL. Signed-off-by: Roman Penyaev <[email protected]>

This change introduces the Graphics.headless option to run in headless mode by changing the options ini file instead of hard-coding the value. The patch also fixes HeadlessMode:
* Don't create a window
* Immediately start a game (exactly as for demo mode)
* Create a new player with 'AI Agent' name, otherwise fatal error due to a missing save file
* Fully disable audio/sndfx during headless mode
* Init SDL with SDL_INIT_EVENTS, otherwise an error: "The event system has been shut down"

Signed-off-by: Roman Penyaev <[email protected]>

Introduce a new option: Gameplay.gameLevel, which lets you load a specified level on new game start. Will be very useful for headless mode while AI training. Signed-off-by: Roman Penyaev <[email protected]>

This change introduces `Gameplay.noMonsters` option (`Disable all monsters=1` in `diablo.ini` file) which disables all monsters on the level. This can be useful for AI training. Signed-off-by: Roman Penyaev <[email protected]>

This change introduces `Gameplay.skipAnimation` option (`Skip animation=1` in `diablo.ini` file) which disables (or at least minimizes) ticks spent on animation. This should speed up AI training. Signed-off-by: Roman Penyaev <[email protected]>

This commit introduces the `Gameplay.noMonsterAutoPursuing` option in the `diablo.ini` file, which disables monster auto-pursuing on primary actions if set to `1`:

Gameplay.noMonsterAutoPursuing=1

Once the primary action is pressed on a controller/pad, the hero begins pursuing a monster. If there are a lot of monsters around, repeatedly pressing the primary action button makes the hero run around the entire dungeon in a completely autonomous manner. The patch sets the amount of max steps to 0, disabling auto-pursuing. This should be useful for teaching the AI where to move the hero instead of always relying on the primary action button.

Signed-off-by: Roman Penyaev <[email protected]>

diablo_state.py collects stdout prints and redirects them to stdout Signed-off-by: Roman Penyaev <[email protected]>

Mostly events like pause, load, save, etc. Will be collected by the AI worker. Signed-off-by: Roman Penyaev <[email protected]>

rouming · 2025-05-21T18:27:29Z

Hi, folks. With the latest push, I've simplified the state sharing logic by removing the need to know about the internal data and types, making the whole sharing state mechanism very generic. What I do is remap the whole regions containing the .bss and .data sections to a shared file. This should work on Linux where the DevilutionX binary is linked with the GNU linker. On other architectures, there shouldn't be any compilation errors, but just an exit(1) with an error message if someone tries to run the game with the "Share game state via file=" option enabled.

This is not ideal because "help" from Devilution is still required when state sharing is needed. However, the implementation remains very trivial and can be supported on Windows if needed. What is most important is that this approach maintains performance (shared memory vs socket/pipes), and there is no need to implement any protocol wrappers for internals.

What makes me happy is that the Python bindings are now generated automatically by parsing the devilutionx binary with the help of GDB (of course, debug symbols are required). With some small changes to the generator, any other language can be supported as well (up to a certain level of complexity, of course). This generator took me some time to implement, but as a result, I have generated Python structures whose sizes and fields match the original C++ structures (here is the example for those who are curious). These were extracted from the list of static variables I need for AI training or running Diablo in TUI mode, so the list of variables can be modified any time.
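
To show what using such generated structures could look like, here is a minimal Python sketch that overlays a ctypes structure on a memory-mapped file. The `PlayerState` fields, the offset, and the temporary file are hypothetical stand-ins; the real structures and addresses would come from the generated bindings and the shared file.

```python
import ctypes
import mmap
import tempfile

class PlayerState(ctypes.Structure):
    # Hypothetical layout, for illustration only.
    _fields_ = [
        ("x", ctypes.c_int32),
        ("y", ctypes.c_int32),
        ("hit_points", ctypes.c_int32),
    ]

PLAYER_OFFSET = 16   # hypothetical offset of the variable in the shared file
FILE_SIZE = 64

with tempfile.TemporaryFile() as f:
    f.write(b"\0" * FILE_SIZE)
    f.flush()
    with mmap.mmap(f.fileno(), FILE_SIZE) as shared:
        # Writer side: stands in for the engine updating its own memory.
        writer = PlayerState.from_buffer(shared, PLAYER_OFFSET)
        writer.x, writer.y, writer.hit_points = 10, 20, 70
        # Reader side: copy the bytes so the snapshot outlives the mapping.
        snapshot = PlayerState.from_buffer_copy(shared, PLAYER_OFFSET)
        del writer  # release the buffer export before the mmap is closed
```

`from_buffer` gives a live view with no copying, which matches the "just access memory" goal, while `from_buffer_copy` trades a small copy for a consistent snapshot that can be inspected later.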

In order to avoid having the state-sharing feature without practical use in the project, do you have any ideas about what could be tested by sharing the state with an external application (testing script)? As a toy example, I can offer a TUI for Diablo (like here). Diablo in a terminal. Nethack, but fast ;)

AJenbo · 2025-05-22T10:26:53Z

Hm the link failed for me

rouming · 2025-05-22T12:57:16Z

Hm the link failed for me

Strange, I picked the URL from my GitHub repo. It's possible that when you are logged in, the URL becomes personalized. This one?

AJenbo · 2025-05-23T16:28:44Z

No that's also 404, maybe it needs permissions

rouming · 2025-05-23T16:48:41Z

No that's also 404, maybe it needs permissions

Shit. 404 for me as well, the host trashed the GIF. Anyway, I put the picture I was trying to show here in the README.

glebm reviewed May 5, 2025

rouming added 12 commits May 21, 2025 15:32

utils/log.gpp: fix annoying gcc warning

31c2eeb

options,random: introduce Gameplay.gameAndPlayerSeed

7f0bbbd

diablo,options: introduce Gameplay.gameLevel option

03047f6

diablo,options: introduce Gameplay.noMonsters option

343f322

diablo: add setlinebuf for stdout

bd1ed28

diablo: add more logs to print to stdout

c46656d

rouming force-pushed the port-ai-patches branch from 459f4f2 to c46656d Compare May 21, 2025 18:23

rouming mentioned this pull request May 23, 2025

Various of fixes for the HeadlessMode and more #8011

Closed
