[RFC] Add experimental support for AI/RL agent integration #7974
|
@NiteKat competition? |
|
@rouming Reading through your README, I would suggest making things work with coordinates relative to the player rather than absolute to the world. That makes it a lot closer to how a player interacts with the game. I'm also not sure why you expose the full world at all times rather than just the currently visible tiles? |
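To make the suggestion concrete, here is a minimal Python sketch of a player-relative view. It assumes the dungeon is available as a plain 2D list indexed as `dungeon[y][x]`; the names `visible_window` and `to_relative` and the `radius` parameter are hypothetical and not taken from the project.

```python
def visible_window(dungeon, player_xy, radius):
    """Return a (2*radius+1)-square crop centred on the player.

    Tiles outside the map are filled with None so the window always has
    the same shape, which is convenient for a fixed-size observation.
    """
    px, py = player_xy
    height, width = len(dungeon), len(dungeon[0])
    window = []
    for dy in range(-radius, radius + 1):
        row = []
        for dx in range(-radius, radius + 1):
            x, y = px + dx, py + dy
            inside = 0 <= x < width and 0 <= y < height
            row.append(dungeon[y][x] if inside else None)
        window.append(row)
    return window


def to_relative(world_xy, player_xy):
    """Convert an absolute world position to an offset from the player."""
    return (world_xy[0] - player_xy[0], world_xy[1] - player_xy[1])
```

With this shape the observation size no longer depends on the level size, at the cost of the agent needing some form of memory for tiles that have scrolled out of view.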
|
By the way do you know about |
Source/utils/mapping.cpp
Outdated
struct entry_ptr;
static struct entry_ptr *tail_entry;

struct entry_ptr {
How is this meant to be used? AFAICT, it's a bunch of pointers in a memory-mapped file.
- Is this to allow another application to get/set the values directly?
- Why is this file memory-mapped? The things that are being pointed to don't change their address during execution.
Not quite. We place all static objects of interest in the ".shared" [section](https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html#index-section-variable-attribute). Everything else is handled by the linker. All we need to ensure is that all objects are properly aligned in the shared file and that there are no gaps between them. That is the only purpose of the entry_ptr structure: every static object is registered by a pointer in a singly linked list, so that verify_no_padding() can traverse all the registered static objects during early start and check that there are no gaps (padding) in between.
When everything is settled and no new objects are planned, entry_ptr and verify_no_padding() can be safely dropped. I use this verification to catch alignment bugs: a new object gets added, you access it from a Python script and see garbage, because the object was placed with 4 bytes of padding.
Answering your questions:
- No, the 3rd-party application does not modify anything; it only reads. The only exception is the ring buffer, where the 3rd-party app advances the write index. But this is not related to your question.
- Well, I assume the question no longer applies after you read the description above.
You ask valid questions. That indicates I gave no description of all these mechanics, sorry about that. I have to describe this carefully.
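For readers who want the gap check spelled out, this is a rough Python sketch of the same idea, not the C++ code from the patch. It assumes each registered object is described by a hypothetical `(name, address, size)` tuple; `find_gaps` is an illustrative name.

```python
def find_gaps(entries):
    """entries: iterable of (name, address, size) for registered objects.

    Returns a list of (previous_name, next_name, gap_bytes) for every pair
    of neighbours that is not laid out back to back. A negative gap means
    the two objects overlap.
    """
    ordered = sorted(entries, key=lambda entry: entry[1])
    gaps = []
    for (prev_name, prev_addr, prev_size), (name, addr, _size) in zip(ordered, ordered[1:]):
        gap = addr - (prev_addr + prev_size)
        if gap != 0:
            gaps.append((prev_name, name, gap))
    return gaps
```

An empty result means the objects form one contiguous block, which is the property a reader on the Python side relies on when computing offsets by hand.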
I get it now, thanks for explaining.
Forcing a particular placement of static variables can impede optimization and isn't cross-platform (e.g. the mold linker doesn't support linker scripts).
Have you considered an IPC approach instead, e.g. sending events over a unix socket?
This will be more portable and also decouple internal representation from the communication protocol.
There is ongoing work to change the layout of some of these static structs to make them more flexible for modding, so relying on the particular layout is going to cause quite a bit of churn on your Python script side.
I intentionally avoided using sockets or pipes. These can introduce delays when fetching the latest state due to buffering issues. The questions about a protocol also arise: should I send everything on any small change or just send deltas? etc. While this is all doable, accessing the memory chunk directly is so straightforward that its simplicity outweighs everything else.
I am aware that this approach is not portable, but it can be hidden under a specific macro, like SHARED_STATE, or something. It would be disabled by default and used only when needed, which is already something.
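To show how little is needed on the reading side, here is a minimal Python sketch that maps the shared file and decodes one value. The file path, the offset, and the choice of a little-endian `uint32` are all assumptions for illustration; the real layout is whatever the engine places in the file.

```python
import mmap
import struct


def read_u32(path, offset):
    """Map `path` read-only and decode one little-endian uint32 at `offset`."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return struct.unpack_from("<I", view, offset)[0]
```

A long-running agent would keep the mapping open and re-read from it on every step rather than remapping per call, so that fetching the latest state is a plain memory access.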
There's also a plan B. I could eliminate these linker tricks and the awkward object placement code in favor of parsing DWARF debug symbols from Python to obtain the addresses of static variables for the running DevilutionX instance. This would involve working with procfs, mapping every static object from /proc/$pid/mem instead of a single big chunk. The main challenge here is DWARF parsing; instead of implementing it in pure Python, it could be done with extra tools (GDB, dwarfdump, etc.), but I did not go into it in detail, these are just thoughts. The downside of this solution (as you mentioned) is a constantly broken data protocol: someone merges two static arrays in DevilutionX, and the third-party Python script breaks. That's why it would be great to upstream everything and cover it with tests :) Otherwise I'll need to sit in a deep branch chasing the latest master HEAD.
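As a rough illustration of plan B, reading one object from a `/proc/<pid>/mem`-style file could look like the sketch below. The function name is hypothetical, the address is assumed to come from some external symbol lookup that is not shown, and on Linux the caller needs sufficient (ptrace-level) permission to open another process's memory file.

```python
def read_process_bytes(mem_path, address, size):
    """Read `size` bytes at `address` from a /proc/<pid>/mem-style file."""
    with open(mem_path, "rb", buffering=0) as mem:
        mem.seek(address)
        return mem.read(size)
```

Note that every call here is an explicit seek and read, so this variant trades the direct memory access of the shared-file approach for regular IO.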
As long as the individual structures do not change shape, won't the DWARF approach be more flexible?
It will, it will. But it will also require some tricks, since you can't just mmap a part of an external address space, as I expected from the very beginning when mapping /proc/$pid/mem. Sigh. I'm still looking for something as fast as just accessing memory: no IO reads, no buffering. I'll share once I have working code; let's see whether we can move forward with this approach or not.
I'm thinking out loud. If DevilutionX provides a straightforward way to access its internal engine states, then using SWIG or similar tools, bindings from C++/C headers for any other language could be generated. I'm personally interested in Python. I found it quite annoying to generate these bindings (mostly structures) manually, which I'm using for RL training. Do you guys think there is something practical that can be scripted and automated by accessing the internal engine state? Perhaps for tasks like unit tests or similar?
First of all, thanks for reading this through. That's a good question. Simple answer: I did that, and I have the windowed approach in a separate branch. The long answer is that I'm not confident that I'm moving in the right direction with all the parameter changes I make, and in order not to lose the performance already achieved ("Does the agent move to the right room? Yay! Oh no, it is stuck in the corner. Sigh" is the kind of mood swing I constantly experience), I make very conservative steps forward. The windowed approach requires introducing memory, that is, another recurrent neural network layer. For example, RLlib, which I'm using, supports the LSTM (long short-term memory) feature, but again, I started with the simplest possible task (no monsters, full dungeon as the observation state) and am slowly increasing complexity. So of course, in the future, a windowed approach is a must, but first I would like to see some signs of life with a simpler task: "Hey agent, here is the full dungeon state! What could be easier? What will be your next action?". Another requirement on the TODO list is for the agent to make decisions based on the screenshot and pixel states, emulating a purely human-like approach, but that's another challenge. |
Thanks for reminding me about that. I saw this mode in the code, and I recently did a timedemo recording for the visibility fix, so I'm aware of it but forgot to thoroughly investigate. I probably don't need the patch I created, which disables animation for player movements. But I remember you mentioned that the timedemo does not always work when certain "timed" objects are activated. Could you remind me which objects those are exactly? Don't want to debug the agent while training, this is quite time consuming. |
Hello! I don't have anything for DevilutionX specifically, but I wrote an API for use with the 1.09 Diablo executable here: https://github.com/nitekat/dapi. It's not in 1.0 status but is functional enough to play a full game, both single player and multiplayer (all 3 difficulties). I regularly stream a rule-based expert system in both modes on my Twitch. The way it works, it isn't immediately compatible with DevilutionX. We've talked about integrating the code into DevilutionX but never really worked on it. |
It was already addressed in the code some time ago, I just forgot. The issue was that things recorded at one speed didn't play back properly at another speed, since some events were real-time based. None of that would likely be relevant to your project anyway. |
Hey! Nice, thanks for sharing. I like your windows remote process injection tricks :) Have not seen the win api for ages, cool stuff. |
A quite legitimate warning: if FMT_EXCEPTIONS is not defined we should return something from the function. Let it be an empty string. Signed-off-by: Roman Penyaev <[email protected]>
`ShowProgress` is called on any custom event, such as when a player warps to a new level. It is designed to handle only custom events and discards others from the queue. This behavior results in lost keystrokes sent by the AI agent. For instance, the issue was reproduced when attempting to pause the game by sending the PAUSE key, which was lost due to the behavior of `ShowProgress`. This change introduces a poller along with the custom handler. For the `ShowProgress` routine, the poller will peek at only custom events, leaving others in the queue. Signed-off-by: Roman Penyaev <[email protected]>
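The behaviour described in this commit message can be illustrated with a plain-Python stand-in for the engine's event queue. This is only a sketch of the "take what you handle, leave the rest" idea; `take_matching` and the string events are hypothetical and do not mirror the actual C++ change.

```python
from collections import deque


def take_matching(queue, is_wanted):
    """Remove and return events for which is_wanted(event) is true.

    Every other event stays in `queue` in its original order, so a later
    consumer still sees it.
    """
    taken, kept = [], deque()
    while queue:
        event = queue.popleft()
        (taken if is_wanted(event) else kept).append(event)
    queue.extend(kept)
    return taken
```

In this model a PAUSE key sent by the agent survives a progress screen, because the progress handler only ever removes its own custom events.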
The patch includes a CMake compilation test that verifies whether the resulting binary includes the `__bss_start` and `_end` linker symbols. Should be the case for the GNU linker. If the binary contains these symbols, the `HAVE_LINKER_BSS_SYMBOLS` macro will be defined. In the next commit, the `HAVE_LINKER_BSS_SYMBOLS` macro will be used to share the .bss and .data sections with the external application. Stay tuned. Signed-off-by: Roman Penyaev <[email protected]>
Share essential game state (.bss and .data sections) over the shared memory for reinforcement learning. Also provides two shared ring queues for getting keys as input and share possible events. Sharing of the game state can be enabled by the following option in ini config: Share game state via file=<file> Signed-off-by: Roman Penyaev <[email protected]>
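For illustration, one of the shared ring queues could be driven from Python roughly as sketched below. The layout (a `uint32` write index, a `uint32` read index, then fixed-size `uint32` slots) and the capacity are assumptions made for this example, not the layout used by the patch, and a real cross-process queue would additionally need to care about memory ordering.

```python
import struct

U32 = struct.Struct("<I")
WRITE_OFF, READ_OFF, SLOTS_OFF = 0, 4, 8   # hypothetical header layout
CAPACITY = 8                               # number of slots, a power of two


def _load(buf, offset):
    return U32.unpack_from(buf, offset)[0]


def push_key(buf, key):
    """Producer side: store `key`, then advance only the write index.

    Returns False when the ring is full.
    """
    write_idx, read_idx = _load(buf, WRITE_OFF), _load(buf, READ_OFF)
    if ((write_idx - read_idx) & 0xFFFFFFFF) >= CAPACITY:
        return False
    U32.pack_into(buf, SLOTS_OFF + (write_idx % CAPACITY) * U32.size, key)
    U32.pack_into(buf, WRITE_OFF, (write_idx + 1) & 0xFFFFFFFF)
    return True


def pop_key(buf):
    """Consumer side: return the oldest key, or None when the ring is empty."""
    write_idx, read_idx = _load(buf, WRITE_OFF), _load(buf, READ_OFF)
    if read_idx == write_idx:
        return None
    key = _load(buf, SLOTS_OFF + (read_idx % CAPACITY) * U32.size)
    U32.pack_into(buf, READ_OFF, (read_idx + 1) & 0xFFFFFFFF)
    return key
```

`buf` can be a `bytearray` for experiments or a writable `mmap` of the shared file. Each side writes only its own index, which is what lets one producer and one consumer share the queue without a lock.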
Patch introduces Gameplay.gameAndPlayerSeed for deterministic creation of new levels. Will be needed for headless mode and RL. Signed-off-by: Roman Penyaev <[email protected]>
This change introduces Graphics.headless option to run in headless
mode by changing the options ini file and not hard-code the value.
Also patch fixes HeadlessMode:
* Don't create a window
* Immediately start a game (exactly as for demo mode)
* Create a new player with 'AI Agent' name, otherwise fatal error
due to a missing save file
* Fully disable audio/sndfx during headless mode
* Init SDL with SDL_INIT_EVENTS, otherwise an error:
"The event system has been shut down"
Signed-off-by: Roman Penyaev <[email protected]>
Introduce a new option: Gameplay.gameLevel, which lets you load a specified level on new game start. Will be very useful for headless mode while AI training. Signed-off-by: Roman Penyaev <[email protected]>
This change introduces `Gameplay.noMonsters` option (`Disable all monsters=1` in `diablo.ini` file) which disables all monsters on the level. This can be useful for AI training. Signed-off-by: Roman Penyaev <[email protected]>
This change introduces `Gameplay.skipAnimation` option (`Skip animation=1` in `diablo.ini` file) which disables (or at least minimizes) ticks spent on animation. This should speed up AI training. Signed-off-by: Roman Penyaev <[email protected]>
This commit introduces the `Gameplay.noMonsterAutoPursuing` option in the `diablo.ini` file, which disables monster auto-pursuing on primary actions if set to `1`: Gameplay.noMonsterAutoPursuing=1 Once the primary action is pressed on a controller/pad, the hero begins pursuing a monster. If there are a lot of monsters around, repeatedly pressing the primary action button makes the hero run around the entire dungeon in a completely autonomous manner. The patch sets the amount of max steps to 0, disabling auto-pursuing. This should be useful for teaching the AI where to move the hero instead of always relying on the primary action button. Signed-off-by: Roman Penyaev <[email protected]>
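A training script could switch such options on before launching the game. The sketch below uses the two key names quoted in the commit messages above (`Disable all monsters` and `Skip animation`); placing them under a `[Gameplay]` section is an assumption based on the option names, and `enable_training_options` is a hypothetical helper.

```python
import configparser


def enable_training_options(ini_path):
    """Set the monster and animation options in an existing or new ini file."""
    config = configparser.ConfigParser()
    config.optionxform = str  # keep key case and spaces exactly as written
    config.read(ini_path)
    if not config.has_section("Gameplay"):
        config.add_section("Gameplay")
    config["Gameplay"]["Disable all monsters"] = "1"
    config["Gameplay"]["Skip animation"] = "1"
    with open(ini_path, "w") as f:
        config.write(f, space_around_delimiters=False)
```

Rewriting the file through `configparser` drops any comments it contained, so a production script might prefer to edit a dedicated copy of the file per training instance.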
diablo_state.py collects stdout prints and redirects them to stdout Signed-off-by: Roman Penyaev <[email protected]>
Mostly events like pause, load, save, etc. Will be collected by the AI worker. Signed-off-by: Roman Penyaev <[email protected]>
|
Hi, folks. With the latest push, I've simplified the state sharing logic by removing the need to know about the internal data and types, making the whole sharing state mechanism very generic. What I do is remap the whole regions containing the .bss and .data sections to a shared file. This should work on Linux where the DevilutionX binary is linked with the GNU linker. On other architectures, there shouldn't be any compilation errors, but just an This is not ideal because "help" from Devilution is still required when state sharing is needed. However, the implementation remains very trivial and can be supported on Windows if needed. What is most important is that this approach maintains performance (shared memory vs socket/pipes), and there is no need to implement any protocol wrappers for internals. What makes me happy is that the Python bindings are now generated automatically by parsing the In order to avoid having the state-sharing feature without practical use in the project, do you have any ideas about what could be tested by sharing the state with an external application (testing script)? As a toy example, I can offer a TUI for Diablo (like here). Diablo in a terminal. Nethack, but fast ;) |
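To give an idea of what a Python binding over the shared region looks like, here is a small hand-written `ctypes` sketch. The structures `Point` and `PlayerState` are invented for this example and do not correspond to real DevilutionX types; generated bindings would follow the engine's actual definitions.

```python
import ctypes


class Point(ctypes.Structure):
    _fields_ = [("x", ctypes.c_int32), ("y", ctypes.c_int32)]


class PlayerState(ctypes.Structure):
    # Native alignment applies, exactly as it would for the C++ struct.
    _fields_ = [("level", ctypes.c_uint8), ("position", Point)]


def view_player(buf, offset):
    """Overlay PlayerState on a writable buffer without copying."""
    return PlayerState.from_buffer(buf, offset)
```

Because the object is a view rather than a copy, later changes in the underlying buffer (a `bytearray` here, a writable `mmap` in practice) are visible through the same Python object. On common platforms `position` starts at offset 4, not 1, which is the kind of padding that hand-computed offsets tend to get wrong.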
|
Hm the link failed for me |
Strange, I picked the URL from my GitHub repo. It's possible that when you are logged in, the URL becomes personalized. This one? |
|
No that's also 404, maybe it needs permissions |
Shit. 404 for me as well, the host trashed the GIF. Anyway, I put the picture I was trying to show here in the README. |
Hi folks,
I've been working on a machine learning project called DevilutionX-AI for a while now, purely for fun and personal enjoyment. It is a reinforcement learning framework set in the Diablo environment. The framework uses `DevilutionX` with some modifications, and in this PR I'd like to upstream these changes. Currently, this PR is marked as RFC (request for comments) because there's a possibility that something might break, tests might fail, and the tweaks I made to allow an RL agent to run `DevilutionX` instances, learn, and control the player might not be necessary for the Diablo game. However, I believe some of the fixes might be beneficial for the `DevilutionX` project, certain gameplay options might be interesting, and the approach to memory sharing in headless mode could be extremely useful for testing automation. So, my primary goal is to receive your feedback.

Moreover, with this PR, I hope to attract people who are interested in reinforcement learning (disclaimer: I don't do any RL research professionally) or those who already have experience in this field, which would be amazing. `DevilutionX-AI` is still far from completion, and the main goal of an RL agent that can clear at least the first level has not yet been achieved, so there are still many experiments and tweaks ahead.

With this PR, I'd like to ask you to take a look at the patches. I am ready to tidy up everything that aligns with the spirit of the `DevilutionX` project and anything you'd be interested in merging into the master branch. Here's a brief list of the main changes:

AI-Oriented Gameplay Changes
- Shared memory implementation for reinforcement learning agents. Supports external key inputs and game event monitoring. Allows a third-party application to connect to a `DevilutionX` instance, obtain the state of the game engine, and send key presses. Gives control over the player and game state monitoring.
- Enabled line-buffered stdout (`setlinebuf`) for the AI agent to consume the log from the `DevilutionX` engine. When many `DevilutionX` instances are started in parallel, the log from the engine is extremely useful for identifying abnormal situations or even seeing the backtrace of a crash.
- Added a `headless` mode option to start the game in non-windowed mode. This is the main mode for AI training.
- Added an option to launch the game directly into a specified dungeon level.
- Enabled deterministic level and player generation for reproducible training by setting a seed.
- Added an option to remove all monsters from the dungeon level to ease the exploration training task.
- Added an option to skip most animation ticks to accelerate training.
- Added an option to disable monster auto-pursuit, so that pressing a primary action button does not lead to the pursuit of a nearby monster.
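As an illustration of why line-buffered stdout matters to the agent side, a supervising script might consume an instance's log like this. It is a generic sketch: `stream_log` is a hypothetical helper and the command line to start the game is left to the caller.

```python
import subprocess


def stream_log(argv):
    """Start a process and yield its stdout one line at a time."""
    with subprocess.Popen(argv, stdout=subprocess.PIPE, text=True, bufsize=1) as proc:
        for line in proc.stdout:
            yield line.rstrip("\n")
```

Each line becomes available to the reader as soon as the child flushes it, which is what line buffering on the engine side provides; with full buffering the same lines could arrive only in large, delayed chunks.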

Various Fixes
- Fixed missing events in the main event loop when running in headless mode, which was causing the AI agent to get stuck: an event had been sent, but no reaction occurred.
- Fixed access to graphics and audio objects in `headless` mode. A few bugs were causing random crashes of the `DevilutionX` instance.

If the `DevilutionX` project would benefit from an additional "ai/" directory with all the Python scripts responsible for reinforcement learning training and evaluation of an agent, I would be happy to upstream this part as well.