Double DQN From Scratch

This project implements a Double DQN agent from scratch, including custom neural networks and GPU support in CUDA C++.

Why could this be interesting to you?

This implementation is easy to read and tweak, self-contained (no external packages), and reasonably fast thanks to NVIDIA GPU support. That makes it well suited to learning, experiments, and especially off-policy workflows like Q-learning, because the data does not have to be loaded in the traditional way: it already lives on the GPU.

  • Self-contained & zero-deps: Everything is written in CUDA C++, with no dependencies beyond the CUDA Toolkit, which makes modifications straightforward.
  • Heavy lifting on the GPU: Most computations are offloaded to the GPU.
  • Everything stays in VRAM: Keeps the working set in GPU memory, reducing copy overhead and exposing ample parallelism (see the sketch after this list).
  • Fast baseline: With supervised training, MNIST can be trained in under a second (once the data is loaded).
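
To make the "everything stays in VRAM" point concrete, here is a minimal, hypothetical sketch of gathering a minibatch of states from a GPU-resident replay buffer. The kernel name, signature, and memory layout are invented for illustration and are not this repository's actual API; rewards, actions, and next states would be gathered the same way, so no transition crosses the PCIe bus during training.

```cuda
#include <cuda_runtime.h>

// Hypothetical sketch: gather a minibatch of states from a replay buffer that
// lives entirely in VRAM. The sampled indices are also device-resident, so the
// copy is device-to-device and no transition ever touches host memory.
__global__ void sample_batch(const float* buf_states,   // [capacity x state_dim]
                             const int*   indices,      // [batch] sampled rows
                             float*       batch_states, // [batch x state_dim]
                             int batch, int state_dim) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < batch * state_dim) {
        int b = i / state_dim;
        int d = i % state_dim;
        batch_states[b * state_dim + d] = buf_states[indices[b] * state_dim + d];
    }
}
```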

Implementation

General NN

  • Allocates all GPU memory upfront at the start of training
  • Gradient and value clipping supported
  • Optimizers: SGD and Adam (the update step is sketched after this list)
  • Full-batch training (entire batch at once)
  • Runs on CUDA with custom kernels, and uses Tensor Cores via cuBLAS
  • Built-in NN inference
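
Since the network supports both gradient clipping and plain SGD, here is a minimal, hypothetical sketch of how the two can be fused into a single kernel over parameters that already live in VRAM. The kernel name and signature are invented for illustration, not taken from this repository.

```cuda
#include <cuda_runtime.h>

// Hypothetical sketch: one fused SGD step with per-element gradient clipping.
// Parameters and gradients already live in VRAM, so the update never touches
// host memory.
__global__ void sgd_clip_step(float* params, const float* grads,
                              int n, float lr, float clip) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Clamp each gradient component to [-clip, clip] before the update.
        float g = fminf(fmaxf(grads[i], -clip), clip);
        params[i] -= lr * g;
    }
}

// Example launch: 256 threads per block, enough blocks to cover all n values.
// sgd_clip_step<<<(n + 255) / 256, 256>>>(d_params, d_grads, n, 1e-3f, 1.0f);
```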

Q-Learning

  • DQN (vanilla) and Double DQN (the target computation is sketched after this list)
  • Replay buffer fully on the GPU
  • Environment simulation fully on the GPU
  • Epsilon-greedy exploration policy
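
The core of Double DQN is decoupling action selection from action evaluation: the online network picks the greedy action for the next state, and the target network scores it, which reduces the overestimation bias of vanilla DQN. Below is a minimal, hypothetical CUDA sketch of that target computation; the names and memory layout are invented for illustration and the repository's actual kernels may differ.

```cuda
#include <cuda_runtime.h>

// Hypothetical sketch of the Double DQN target for a batch of transitions:
//   y = r + gamma * Q_target(s', argmax_a Q_online(s', a)),  y = r if s' is terminal.
// q_online_next / q_target_next hold [batch x num_actions] Q-values for s',
// already computed on the GPU; all buffers live in VRAM.
__global__ void ddqn_targets(const float* q_online_next,
                             const float* q_target_next,
                             const float* rewards, const int* done,
                             float* targets, int batch, int num_actions,
                             float gamma) {
    int b = blockIdx.x * blockDim.x + threadIdx.x;
    if (b >= batch) return;

    // Online network selects the greedy action for s'.
    int best = 0;
    for (int a = 1; a < num_actions; ++a)
        if (q_online_next[b * num_actions + a] >
            q_online_next[b * num_actions + best]) best = a;

    // Target network evaluates that action; terminal states contribute nothing.
    float q_next = done[b] ? 0.0f : q_target_next[b * num_actions + best];
    targets[b] = rewards[b] + gamma * q_next;
}
```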

Setup

Requirements

  • Linux
  • NVIDIA driver (matching your CUDA Toolkit)
  • CUDA Toolkit (including cuBLAS support)
  • C++17 toolchain (nvcc + gcc or clang)
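
With those requirements in place, a build can be as simple as one nvcc invocation. The source file name, output name, and -arch value below are assumptions for illustration, not taken from this repository:

```sh
# Hypothetical build line; set -arch to match your GPU (sm_70 = Volta).
nvcc -std=c++17 -O2 -arch=sm_70 main.cu -o ddqn -lcublas
```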

For Tensor Core support, use a GPU with Compute Capability ≥ 7.0 and enable FP16/TF32 math modes as appropriate.
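
As a hedged illustration of what enabling a Tensor Core math mode can look like: cublasSetMathMode and CUBLAS_TF32_TENSOR_OP_MATH are real cuBLAS symbols, but the helper below is only an illustrative sketch, not code from this repository.

```cuda
#include <cublas_v2.h>

// Hypothetical sketch: create a cuBLAS handle that opts in to TF32 Tensor
// Core math for subsequent GEMMs. TF32 requires Compute Capability >= 8.0
// (Ampere and newer); on 7.x (Volta/Turing), Tensor Cores are reached via
// FP16 inputs instead.
cublasHandle_t make_tensor_core_handle() {
    cublasHandle_t handle;
    cublasCreate(&handle);
    cublasSetMathMode(handle, CUBLAS_TF32_TENSOR_OP_MATH);
    return handle;
}
```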

Known Issues

  • Contains some dead code, as the project was never fully finished: system errors caused data loss and, with it, a loss of motivation.
  • The most relevant parts are located in:
    • Native/Optimization/ – core training logic and GPU kernels
    • Native/Inference/ – inference examples and minimal usage
  • Everything listed under Implementation works correctly when used as intended.
  • One original motivation was maximum speed by staying fully on the GPU, but in many cases modern frameworks like PyTorch are still faster, thanks to years of engineering and heavy optimization.
