In Japan, the board game Quoridor is not well known. This article applies AlphaZero, a powerful deep reinforcement learning method, to Quoridor. Despite its relatively simple design, AlphaZero has reached superhuman strength in games such as Go, chess, and Shogi. The goal is to deepen understanding of both the theoretical and practical aspects of AlphaZero through this application.
According to Wikipedia, Quoridor is a French abstract board game where players aim to reach the opposite side of the board by moving their pieces and placing walls.
The setup and rules are as follows:
- Setup: In a two-player game, the players start at the centers of opposite edges of the board, and each has ten walls. In a four-player game, each player starts at the center of their own edge with five walls.
- Gameplay: On each turn, a player either moves their piece one square vertically or horizontally or places a wall on the board. A piece can jump over an adjacent opponent's piece, but no piece can pass through a wall. A wall may never be placed so that it completely cuts off a player's path to their goal.
- Winning: The first player to reach any square on the opposite side of the board wins.
The simple rule that walls cannot completely block paths makes the game exciting until the end.
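The rule that walls can never cut off a path can be made concrete with a breadth-first search. Before a wall is accepted, a BFS checks that every pawn can still reach its goal row. This is a minimal sketch under assumptions: the board representation (a set of blocked edges between adjacent squares) and all function names are hypothetical and not taken from the project's game.py. Checks for walls overlapping or crossing existing walls are omitted.

```python
from collections import deque

# Hypothetical representation: squares are (row, col) on an n x n board.
# A placed wall is stored as the set of edges it blocks. Each edge is a
# frozenset of two orthogonally adjacent squares.

def neighbors(sq, n, blocked):
    """Yield orthogonally adjacent squares that are not separated by a wall."""
    r, c = sq
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < n and 0 <= nc < n and frozenset((sq, (nr, nc))) not in blocked:
            yield (nr, nc)

def can_reach_goal(start, goal_row, n, blocked):
    """Breadth-first search: can a pawn at `start` still reach `goal_row`?"""
    seen = {start}
    queue = deque([start])
    while queue:
        sq = queue.popleft()
        if sq[0] == goal_row:
            return True
        for nxt in neighbors(sq, n, blocked):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def wall_edges(row, col, horizontal):
    """Edges blocked by a two-square wall anchored at grid point (row, col).

    Grid point (row, col) is the corner shared by squares (row, col),
    (row, col + 1), (row + 1, col) and (row + 1, col + 1).
    """
    if horizontal:  # separates row from row + 1 across two columns
        return {frozenset(((row, col), (row + 1, col))),
                frozenset(((row, col + 1), (row + 1, col + 1)))}
    # vertical: separates col from col + 1 across two rows
    return {frozenset(((row, col), (row, col + 1))),
            frozenset(((row + 1, col), (row + 1, col + 1)))}

def is_legal_wall(row, col, horizontal, n, blocked, pawns):
    """Accept a wall only if every (position, goal_row) pawn keeps a path."""
    trial = blocked | wall_edges(row, col, horizontal)
    return all(can_reach_goal(pos, goal, n, trial) for pos, goal in pawns)
```

The check has to cover every pawn, not just the opponent's. A wall that traps the player placing it is illegal too.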
AlphaZero [1, 2] combines deep learning, search, and reinforcement learning. Let’s explore each component:
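AlphaZero uses deep learning to supply intuition, much as professional players intuitively sense promising moves. It employs ResNet [3], a convolutional neural network architecture originally developed for image recognition. A game board, like an image, is a 2D array of information, which makes this architecture a natural fit. The network can be represented as:
$$(p, v) = f_\theta(s)$$
where s is the game state, θ denotes the network parameters, p is the policy (a probability distribution over moves), and v is the value (the predicted game outcome from the current player's perspective, in the range [-1, 1]).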
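To treat the board "like an image", the position has to be turned into a stack of 2D planes. The network's raw outputs then have to be turned into a move distribution p and a value v. Below is a minimal sketch of both ends of the network. The four-plane layout and all function names are hypothetical and are not necessarily what dual_network.py uses. The softmax and tanh output activations follow the common AlphaZero design.

```python
import math

def encode_state(n, my_pos, enemy_pos, h_walls, v_walls):
    """Encode a Quoridor position as four n x n binary planes.

    Hypothetical layout, seen from the player to move:
      plane 0: current player's pawn
      plane 1: opponent's pawn
      plane 2: horizontal wall anchors (grid points fit inside n x n)
      plane 3: vertical wall anchors
    """
    planes = [[[0] * n for _ in range(n)] for _ in range(4)]
    planes[0][my_pos[0]][my_pos[1]] = 1
    planes[1][enemy_pos[0]][enemy_pos[1]] = 1
    for r, c in h_walls:
        planes[2][r][c] = 1
    for r, c in v_walls:
        planes[3][r][c] = 1
    return planes

def policy_value_heads(policy_logits, value_logit):
    """Turn raw outputs into p (softmax over moves) and v (tanh, in [-1, 1])."""
    m = max(policy_logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in policy_logits]
    total = sum(exps)
    p = [e / total for e in exps]
    v = math.tanh(value_logit)
    return p, v
```

Encoding from the perspective of the player to move lets one network play both sides.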
Games like Shogi, Go, Othello, and Quoridor are two-player zero-sum games with perfect information. In principle, the optimal strategy can be found with the minimax method. In practice, exhaustive search is infeasible because the number of possible move sequences grows exponentially with search depth. Instead, AlphaZero uses Monte Carlo Tree Search (MCTS) to look ahead selectively.
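For contrast, here is exhaustive minimax in its compact zero-sum form (negamax) on a deliberately tiny toy game. The toy game is a simple take-away game, used only as an illustration; it is not Quoridor, and the function names are not from the project.

```python
from functools import lru_cache

# Toy two-player zero-sum game (not Quoridor): players alternately remove
# 1, 2 or 3 stones, and whoever takes the last stone wins.
MOVES = (1, 2, 3)

@lru_cache(maxsize=None)
def negamax(stones):
    """Return +1 if the player to move wins with perfect play, else -1.

    Negamax is minimax for zero-sum games: a position's value for the player
    to move is the maximum over moves of the negated value for the opponent.
    """
    if stones == 0:
        return -1  # the opponent just took the last stone, so we have lost
    return max(-negamax(stones - m) for m in MOVES if m <= stones)

def best_move(stones):
    """Choose a move that leaves the opponent in a losing position if possible."""
    return max((m for m in MOVES if m <= stones), key=lambda m: -negamax(stones - m))
```

This works only because the toy game has a handful of positions. The number of Quoridor positions on a 9x9 board is far too large to enumerate. That is why AlphaZero replaces exhaustive search with network-guided MCTS.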
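MCTS decides which move to explore next based on an Upper Confidence Bound (UCB) value. AlphaZero uses the PUCT variant:
$$a = \arg\max_a \left( Q(s,a) + c_{\mathrm{puct}}\, P(s,a)\, \frac{\sqrt{\sum_b N(s,b)}}{1 + N(s,a)} \right)$$
where Q(s,a) is the mean value of action a observed in the simulations so far, and P(s,a) is the prior probability of a given by the policy output p. N(s,a) is the number of times a has been visited, and c_puct is a constant that balances exploitation against exploration.
By repeating simulations, actions with higher UCB values are selected more often, so their visit counts N(s,a) grow.
In practice, the action actually played is chosen from the visit counts N(s,a). Either the most visited action is played, or, during self-play, an action is sampled in proportion to the visit counts to encourage exploration.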
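The PUCT selection rule and the final move choice can be sketched as follows. This is a minimal illustration that assumes each child edge stores its prior P(s,a), visit count N(s,a) and total value W(s,a) from the parent player's point of view. The class and function names and the value C_PUCT = 1.0 are assumptions, not necessarily what pv_mcts.py uses.

```python
import math
import random

C_PUCT = 1.0  # exploration constant; value chosen for illustration

class Edge:
    """Statistics for one child edge (s, a) in the search tree."""
    def __init__(self, prior):
        self.p = prior  # P(s, a) from the policy head
        self.n = 0      # visit count N(s, a)
        self.w = 0.0    # total value W(s, a), from the parent player's view

    def q(self):
        """Mean value Q(s, a); 0 for an unvisited edge."""
        return self.w / self.n if self.n else 0.0

def select_child(children):
    """Return the index of the child maximizing Q + U (the PUCT rule)."""
    total = sum(c.n for c in children)

    def ucb(c):
        return c.q() + C_PUCT * c.p * math.sqrt(total) / (1 + c.n)

    return max(range(len(children)), key=lambda i: ucb(children[i]))

def choose_action(visit_counts, temperature):
    """Pick the move to play from the visit counts N(s, a).

    temperature == 0: play the most visited move (evaluation, human play).
    temperature > 0: sample in proportion to N^(1/temperature) (self-play).
    """
    if temperature == 0:
        return max(range(len(visit_counts)), key=lambda i: visit_counts[i])
    weights = [n ** (1.0 / temperature) for n in visit_counts]
    return random.choices(range(len(visit_counts)), weights=weights)[0]
```

When no child has been visited yet, the exploration term is zero for every child and the first child is picked. This follows the formula literally. Some implementations add a small constant inside the square root to break that tie using the priors.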
AlphaZero generates training data through self-play and uses this experience to update the ResNet parameters, producing a progressively stronger neural network. The process involves:
- Initializing ResNet parameters.
- Obtaining experience data through self-play.
- Updating ResNet parameters with the experience data.
- Repeating the self-play and update steps many times.
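The list above leaves implicit how each recorded position gets its training targets. In the AlphaZero scheme, every position is stored with its MCTS visit distribution as the policy target. Its value target is the final game result, seen from the player who was to move in that position. The sketch below shows that labeling together with the outer loop. The function names and the caller-supplied `self_play`/`train` callables are hypothetical stand-ins for self_play.py and train_network.py.

```python
def label_history(states, policies, result_for_first_player):
    """Attach the final game result to every recorded position.

    states, policies: positions and MCTS visit distributions in move order,
    with players alternating (the first player moves at even indices).
    result_for_first_player: +1 win, -1 loss, 0 draw for the first player.
    The value target z is the result from the perspective of the player
    to move in each position.
    """
    samples = []
    for i, (s, pi) in enumerate(zip(states, policies)):
        z = result_for_first_player if i % 2 == 0 else -result_for_first_player
        samples.append((s, pi, z))
    return samples

def train_cycle(params, self_play, train, cycles):
    """Alternate self-play and parameter updates for a number of cycles.

    self_play(params) -> list of (state, policy, value) samples
    train(params, samples) -> updated params
    """
    for _ in range(cycles):
        samples = self_play(params)
        params = train(params, samples)
    return params
```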
The implementation consists of the following components:
- Game design (game.py)
- Deep learning implementation with ResNet (dual_network.py)
- Monte Carlo Tree Search implementation (pv_mcts.py)
- Data collection through self-play (self_play.py)
- Updating ResNet parameters (train_network.py)
- Comparing and updating the best parameters (evaluate_network.py)
- Evaluating the best player (evaluate_best_player.py)
- Running the entire training cycle (train_cycle.py)
- Implementing a game UI to play against the AI (human_play.py)
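The comparison step (evaluate_network.py) can be pictured as a simple gate. The latest network plays a few games against the current best and replaces it only if it scores well enough. This is a sketch only: the function names, the 1 / 0.5 / 0 scoring and the 0.5 threshold are assumptions for illustration.

```python
def run_evaluation_games(play_game, games):
    """Play `games` games, alternating which network moves first.

    play_game(latest_moves_first) must return the latest network's score
    for one game: 1 for a win, 0.5 for a draw, 0 for a loss.
    """
    # Alternate the first player, since moving first may be an advantage.
    return [play_game(i % 2 == 0) for i in range(games)]

def should_replace_best(scores, threshold=0.5):
    """Adopt the latest network only if its average score beats the threshold."""
    if not scores:
        raise ValueError("need at least one evaluation game")
    return sum(scores) / len(scores) > threshold
```

For reference, AlphaGo Zero used a 55% win-rate threshold for this kind of gate. AlphaZero itself dropped the gate and always generated self-play data with the latest network.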
This project is based on the Japanese book “AlphaZero: Deep Learning, Reinforcement Learning, and Search”, which provides detailed explanations.
Because of the limited training time, the AI was trained on a 3x3 board. This makes for rather unexciting gameplay. However, the implementation can easily be adjusted to larger boards, such as the standard 9x9 board, where the strength gained through reinforcement learning would be more apparent.
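Changing the board size is mostly a matter of configuration, but the cost of training grows quickly. The hypothetical configuration object below illustrates this; its field names and the pawn-move encoding are assumptions, not the project's actual constants. It shows how the number of possible wall placements, and with it the size of a fixed policy output, grows with the board size.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BoardConfig:
    """Hypothetical board settings; names are illustrative, not from game.py."""
    size: int = 3              # 3x3 as in this article; 9 for the standard game
    walls_per_player: int = 1  # 1 here; 10 in the standard two-player game
    pawn_move_types: int = 12  # assumed: 4 steps, 4 straight jumps, 4 diagonal jumps

    @property
    def wall_placements(self) -> int:
        # one horizontal and one vertical wall per interior grid point
        return 2 * (self.size - 1) ** 2

    @property
    def action_space_size(self) -> int:
        # length of a fixed-size policy output covering every possible action
        return self.pawn_move_types + self.wall_placements

small = BoardConfig()
standard = BoardConfig(size=9, walls_per_player=10)
print(small.action_space_size, standard.action_space_size)  # 20 140
```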
Running human_play.py opens the game screen. To suit the small board, the player and the enemy (AI) each have one wall. To place a wall, click "Place Wall" below the board, choose a vertical or horizontal orientation, and then click the desired location (one of the red grid points).
The AI wasn’t very strong this time due to the limited training time. However, with a longer training period, a stronger AI could be developed.
This article applied AlphaZero to Quoridor, a board game that is not well known in Japan. AlphaZero can be adapted to any two-player zero-sum game with perfect information by modifying the game design. It would be an interesting project to see how strong an AI can be built for other games. If you enjoyed this article, please like it, and stay tuned for more articles.
[3] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition”
To run the code, you need:
- Python 3.x
- The libraries listed in requirements.txt (e.g., TensorFlow or PyTorch, NumPy)
Clone the repository and install the dependencies:
git clone https://github.com/dorakingx/AlphaQuoridor.git
cd AlphaQuoridor
pip install -r requirements.txt
To run the complete training cycle, execute:
python train_cycle.py
This script handles data collection through self-play, updates the network parameters, and evaluates whether the new model improves on the previous best.
To play a game against the trained AI, run:
python human_play.py
This launches the game UI, allowing you to challenge the AI directly.