This project implements a Double DQN agent from scratch, including custom neural networks and GPU support in CUDA C++.
This implementation is easy to read and tweak, self-contained (no external packages), and reasonably fast thanks to NVIDIA GPU support. It is well suited to learning, experiments, and especially off-policy workflows like Q-learning, because training data does not have to be loaded the traditional way: the replay buffer and environment already live in GPU memory.
- Self-contained & zero-deps: Everything is written in CUDA C++, which makes modifications straightforward.
- Heavy lifting on the GPU: Most computations are offloaded to the GPU.
- Everything stays in VRAM: Keeps the working set in GPU memory, reducing copy overhead and enabling great parallelism.
- Fast baseline: With supervised training, MNIST can be trained in under a second (once the data is loaded).
- Allocates all GPU memory upfront at the start of training
- Gradient and value clipping supported
- Optimizers: SGD and Adam (an illustrative Adam update is sketched below)
- Full-batch training (entire batch at once)
- Runs on CUDA with custom kernels and uses Tensor Cores via cuBLAS
- Built-in NN inference
- DQN (vanilla) and Double DQN (see the target-computation sketch below)
- Replay buffer fully on the GPU (see the buffer layout sketch below)
- Environment simulation fully on the GPU
- Epsilon-greedy exploration policy (see the action-selection sketch below)
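The replay-buffer and upfront-allocation items can be pictured as a plain struct of device pointers that is sized once before the training loop starts. This is only a minimal sketch under assumed names and layout, not the structures actually used in `Native/Optimization/`:

```cuda
#include <cuda_runtime.h>

// Illustrative layout for a replay buffer that lives entirely in VRAM.
// All device memory is allocated once, before the training loop starts.
struct GpuReplayBuffer {
    float* states;       // [capacity, state_dim]
    int*   actions;      // [capacity]
    float* rewards;      // [capacity]
    float* next_states;  // [capacity, state_dim]
    float* dones;        // [capacity], 1.0f if the transition ended the episode
    int    capacity;
    int    state_dim;
};

inline GpuReplayBuffer allocate_replay_buffer(int capacity, int state_dim) {
    GpuReplayBuffer b{};
    b.capacity  = capacity;
    b.state_dim = state_dim;
    size_t n = static_cast<size_t>(capacity);
    cudaMalloc(&b.states,      sizeof(float) * n * state_dim);
    cudaMalloc(&b.actions,     sizeof(int)   * n);
    cudaMalloc(&b.rewards,     sizeof(float) * n);
    cudaMalloc(&b.next_states, sizeof(float) * n * state_dim);
    cudaMalloc(&b.dones,       sizeof(float) * n);
    return b;
}
```

With this layout, sampling a minibatch reduces to generating random indices on the device and gathering rows, so no transition has to cross the PCIe bus during training.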
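For the Double DQN item, the bootstrapped target can be computed entirely on the GPU with one thread per transition: the online network selects the next action and the target network evaluates it. The kernel name, parameter names, and row-major `[batch, num_actions]` layout below are illustrative assumptions, not the repository's exact code:

```cuda
// Illustrative sketch: one thread per transition.
// Double DQN: the online network picks argmax a', the target network evaluates it.
__global__ void double_dqn_targets(const float* __restrict__ q_online_next,  // [batch, num_actions]
                                   const float* __restrict__ q_target_next,  // [batch, num_actions]
                                   const float* __restrict__ rewards,        // [batch]
                                   const float* __restrict__ dones,          // [batch], 1.0f if terminal
                                   float*       __restrict__ targets,        // [batch]
                                   int batch, int num_actions, float gamma)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= batch) return;

    // argmax over the online network's Q-values for the next state
    const float* q_on = q_online_next + i * num_actions;
    int best = 0;
    for (int a = 1; a < num_actions; ++a)
        if (q_on[a] > q_on[best]) best = a;

    // evaluate the chosen action with the target network
    float q_next = q_target_next[i * num_actions + best];

    targets[i] = rewards[i] + gamma * (1.0f - dones[i]) * q_next;
}
```

The only difference from vanilla DQN is that `best` would be taken from `q_target_next` itself instead of from the online network.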
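Epsilon-greedy action selection maps just as naturally to one thread per parallel environment. The sketch below assumes per-environment cuRAND states initialised with `curand_init` elsewhere; names and layout are again placeholders:

```cuda
#include <curand_kernel.h>

// Illustrative epsilon-greedy selection, one thread per parallel environment.
__global__ void select_actions(const float* __restrict__ q_values,  // [num_envs, num_actions]
                               int* __restrict__ actions,           // [num_envs]
                               curandState* states,
                               int num_envs, int num_actions, float epsilon)
{
    int e = blockIdx.x * blockDim.x + threadIdx.x;
    if (e >= num_envs) return;

    curandState local = states[e];
    int action;
    if (curand_uniform(&local) < epsilon) {
        // explore: uniformly random action
        action = static_cast<int>(curand(&local) % num_actions);
    } else {
        // exploit: greedy action w.r.t. the online network's Q-values
        const float* q = q_values + e * num_actions;
        action = 0;
        for (int a = 1; a < num_actions; ++a)
            if (q[a] > q[action]) action = a;
    }
    actions[e] = action;
    states[e] = local;  // persist the RNG state for the next step
}
```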
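Finally, an Adam step with element-wise gradient clipping is a simple per-parameter kernel. The hyperparameter names and the clipping scheme below are assumptions for illustration, not the project's exact implementation:

```cuda
// Illustrative Adam update with element-wise gradient clipping, one thread per parameter.
__global__ void adam_step(float* __restrict__ w,        // parameters
                          const float* __restrict__ g,  // gradients
                          float* __restrict__ m,        // first-moment estimate
                          float* __restrict__ v,        // second-moment estimate
                          int n, int t, float lr, float beta1, float beta2,
                          float eps, float clip)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // clip each gradient component into [-clip, clip]
    float grad = fminf(fmaxf(g[i], -clip), clip);

    // exponential moving averages of the gradient and its square
    m[i] = beta1 * m[i] + (1.0f - beta1) * grad;
    v[i] = beta2 * v[i] + (1.0f - beta2) * grad * grad;

    // bias-corrected moments (t is the 1-based step counter)
    float m_hat = m[i] / (1.0f - powf(beta1, (float)t));
    float v_hat = v[i] / (1.0f - powf(beta2, (float)t));

    w[i] -= lr * m_hat / (sqrtf(v_hat) + eps);
}
```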
- Linux
- NVIDIA driver (matching your CUDA Toolkit)
- CUDA Toolkit (including cuBLAS support)
- C++17 toolchain (nvcc + gcc or clang)
For Tensor Core support, use a GPU with Compute Capability ≥ 7.0 (FP16) or ≥ 8.0 (TF32) and enable the corresponding math modes.
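As a rough illustration of what enabling Tensor Core math through cuBLAS can look like (the wrapper function, matrix shapes, and column-major convention below are placeholders, not code from this repository):

```cuda
#include <cublas_v2.h>

// Illustrative: run an FP32 GEMM on Tensor Cores via TF32 (CUDA 11+, CC >= 8.0).
// FP16 inputs with an FP32 compute type are the Tensor Core path available from CC 7.0.
void gemm_tf32(cublasHandle_t handle,
               const float* A, const float* B, float* C,
               int m, int n, int k)
{
    const float alpha = 1.0f, beta = 0.0f;

    // Allow cuBLAS to use TF32 Tensor Core kernels for FP32 data.
    cublasSetMathMode(handle, CUBLAS_TF32_TENSOR_OP_MATH);

    // Column-major GEMM: C[m x n] = A[m x k] * B[k x n]
    cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                 m, n, k,
                 &alpha,
                 A, CUDA_R_32F, m,
                 B, CUDA_R_32F, k,
                 &beta,
                 C, CUDA_R_32F, m,
                 CUBLAS_COMPUTE_32F_FAST_TF32,
                 CUBLAS_GEMM_DEFAULT);
}
```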
- Contains some dead code, as the project was not fully finished due to system errors that resulted in data loss and loss of motivation.
- The most relevant parts are located in:
  - `Native/Optimization/` – core training logic and GPU kernels
  - `Native/Inference/` – inference examples and minimal usage
- Everything mentioned under Implementation works correctly if used properly.
- One original motivation was maximizing speed by keeping everything on the GPU, but in many cases modern frameworks like PyTorch are still faster, thanks to years of engineering and heavy optimization.