AlphaQuoridor

In Japan, the board game Quoridor is not well-known. This article applies AlphaZero, a powerful deep reinforcement learning method, to Quoridor. Despite its simple design, AlphaZero has demonstrated the ability to outperform professional players in games like Go and Shogi. The goal is to deepen understanding of both the theoretical and practical aspects of AlphaZero through this application.

What is Quoridor?

According to Wikipedia, Quoridor is a French abstract board game where players aim to reach the opposite side of the board by moving their pieces and placing walls.

Quoirdor — Wikipedia

The setup and rules are as follows:

Setup: For two players, each starts at opposite ends of the board and has ten walls. For four players, each starts in their own corner with five walls.
Gameplay: Players take turns either moving their piece one square in any direction or placing a wall on the board. Pieces can jump over other pieces but cannot cross walls. Walls cannot completely block a player’s path.
Winning: The first player to reach the opposite side wins.

The simple rule that walls cannot completely block paths makes the game exciting until the end.

Theory of AlphaZero

AlphaZero [1, 2] combines deep learning, search, and reinforcement learning. Let’s explore each component:

Deep Learning:

AlphaZero uses deep learning for intuition, similar to how professional players think about their best moves. It employs ResNet [3], a convolutional neural network used in image analysis. The game board, like an image, is a 2D array of information. The network can be represented as:

$$ \mathrm{ResNet(s)} = \pi(a|s), v(s) $$

where s is the game state, $\pi(a∣s)$ is the policy (probability of action a given state $s$), and $v(s)$ is the value ($+1$ if win,$-1$ if loss) of state $s$.

Search:

Games like Shogi, Go, Othello, and Quoridor are two-player zero-sum games with perfect information. The optimal strategy can be found using the minimax method, but it is impractical due to the large number of possible moves. Instead, AlphaZero uses Monte Carlo Tree Search (MCTS) to predict future moves.

MCTS decides the next move based on the Upper Confidence Bound (UCB) value:

$$ \mathrm{UCB}(s,a) = Q(s,a) + c\cdot\pi(a \mid s)\cdot\frac{\sqrt{N(s)}}{1 + N(s,a)} $$

$$ Q(s,a) = \frac{1}{N(s,a)} \sum_{i}^{N(s,a)} v_i\bigl(s \xrightarrow{a} s^\prime\bigr) $$

where $Q(s,a)$ is the value estimate of action a in state $s$, averaged over the number of simulations $N(s,a)$ transitioning from state $s$ to $s^\prime$. $N(s)$ is the total number of simulations for state $s$. The first term of the equation emphasizes exploiting high-value actions, while the second term focuses on exploring actions that haven’t been simulated much, balanced by the hyperparameter $c$.

By repeating simulations, actions with higher $N(s,a)$ become valuable moves.

In practice, the action $a$ to be taken in a given state $s$ is determined by the probability $\pi(a∣s)$, weighted by the Boltzmann distribution with temperature $T$.

$$ \pi(a \mid s) = \frac{N(s,a)^{1/T}} {\displaystyle \sum_{b} N(s,b)^{1/T}}. $$

Reinforcement Learning:

AlphaZero generates training data through self-play, updating ResNet parameters with this experience to create a more intelligent neural network. The process involves:

Initializing ResNet parameters.
Obtaining experience data through self-play.
Updating ResNet parameters with the experience data.
Repeating steps 2 and 3 multiple times.

Implementation of AlphaQuoridor

Game design (game.py)
Deep learning implementation with ResNet (dual_network.py)
Monte Carlo Tree Search implementation (pv_mcts.py)
Data collection through self-play (self_play.py)
Updating ResNet parameters (train_network.py)
Comparing and updating the best parameters (evaluate_network.py)
Evaluating the best player (evaluate_best_player.py)
Running the entire training cycle (train_cycle.py)
Implementing a game UI to play against the AI (human_play.py)

This project references the Japanese book “AlphaZero: Deep Learning, Reinforcement Learning, and Search” for detailed explanations.

AlphaZero: Deep Learning, Reinforcement Learning, and Search

Due to the short training time, the game was trained on a 3x3 board. While this isn’t very exciting gameplay-wise, the implementation allows for easy adjustment to larger board sizes, such as the actual 9x9 size, which would demonstrate the reinforcement learning capabilities better.

The actual gameplay screen when running human_play.py looks like this. The number of walls for both the player and the enemy (AI) is set to one, matching the game size. To place a wall, click "Place Wall" below, select the vertical or horizontal direction, and then click the desired location (red grid points).

Initial state

Mid-game

End-game scenarios for winning

End-game scenarios for losing

The AI wasn’t very strong this time due to the limited training time. However, with a longer training period, a stronger AI could be developed.

Summary

This article applied AlphaZero to the lesser-known Quoridor board game. AlphaZero can be adapted to any two-player zero-sum game with perfect information by modifying the game design. This could be an interesting project to see how strong an AI can be created for different games. If you enjoyed this article, please like it, and stay tuned for more articles.

References

[1] D. Silver et al., “Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm”

[2] D. Silver et al., “A General Reinforcement Learning Algorithm That Masters Chess, Shogi, and Go through Self-Play”

[3] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition”

Getting Started

Requirements

Python 3.x
Libraries specified in requirements.txt (e.g., TensorFlow or PyTorch, NumPy, etc.)

Installation

Clone the repository and install the dependencies:

git clone https://github.com/dorakingx/AlphaQuoridor.git
cd AlphaQuoridor
pip install -r requirements.txt

Usage

Training the AI

To run the complete training cycle, execute:

python train_cycle.py

This script will handle data collection through self-play, update the network parameters, and evaluate model improvements.

Playing Against the AI

To play a game against the trained AI, run:

python human_play.py

This launches the game UI, allowing you to challenge the AI directly.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

AlphaQuoridor

What is Quoridor?

Theory of AlphaZero

Deep Learning:

Search:

Reinforcement Learning:

Implementation of AlphaQuoridor

Initial state

Mid-game

End-game scenarios for winning

End-game scenarios for losing

Summary

References

Getting Started

Requirements

Installation

Usage

Training the AI

Playing Against the AI

About

Uh oh!

Releases

Packages

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
__pycache__		__pycache__
data		data
images		images
model		model
.DS_Store		.DS_Store
.gitignore		.gitignore
README.md		README.md
dual_network.py		dual_network.py
evaluate_best_player.py		evaluate_best_player.py
evaluate_network.py		evaluate_network.py
game.py		game.py
human_play.py		human_play.py
pv_mcts.py		pv_mcts.py
requirements.txt		requirements.txt
self_play.py		self_play.py
train_cycle.py		train_cycle.py
train_network.py		train_network.py

dorakingx/AlphaQuoridor

Folders and files

Latest commit

History

Repository files navigation

AlphaQuoridor

What is Quoridor?

Theory of AlphaZero

Deep Learning:

Search:

Reinforcement Learning:

Implementation of AlphaQuoridor

Initial state

Mid-game

End-game scenarios for winning

End-game scenarios for losing

Summary

References

Getting Started

Requirements

Installation

Usage

Training the AI

Playing Against the AI

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages