Extended final report: Download PDF

Introduction

Gomoku is an abstract strategy board game. Also called Gobang or Five in a Row, it is traditionally played with Go pieces (black and white stones) on a Go board with 19x19 or 15x15 intersections, and it is known in several countries under different names. Black plays first, and the players alternate in placing a stone of their color on an empty intersection. The winner is the first player to form an unbroken row of five stones horizontally, vertically, or diagonally.


Figure 1: Example of Gomoku

Our task is to build a Gomoku AI that can learn the rules of Gomoku by playing games against itself. It is similar in spirit to AlphaGo, a very popular and impressive application. This kind of machine learning algorithm has a wide range of application scenarios: no matter which game or problem we deal with, we let "computer 1", the trainee, play games against "computer 2", which holds all the relevant experience and rules of the game, and the trainee learns automatically and reaches a good result. That is one reason why the AlphaGo team and similar teams have developed so quickly recently.

Design

Before implementing any machine learning techniques to train a computer to play Gomoku, we need to create an interface that allows us to simulate a game of Gomoku. The game involves two colors of stones, a human player, an AI, the rules, and so on. We implemented the game class, AI class, player class, and Gomoku class ourselves in Python, without using any existing package.

In the game interface part, we made a very flexible and interactive design. First, users can choose to play against a previously trained AI or against another player. The first mode lets the AI play against itself beforehand to train the model, and the second mode adds an interactive option that lets users play Gomoku with friends. Users can then set their own Gomoku rules, including the size of the board (from 3x3 to 10x10) and the number of stones in a row needed to win (more than 2). This design makes the game more interesting and shows that different rules do not change the basic algorithm, since they share the same mechanism; a sketch of the rule check is given below.
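
The claim that different rules share the same mechanism is easiest to see in the win check itself: only the board size and the required run length change. The sketch below is illustrative only; the class and method names are placeholders, not the actual names used in our code.

```python
# Minimal sketch of a rule check parametrized by board size and win length.
# Class and method names here are illustrative placeholders.

class Board:
    def __init__(self, size=10, win_length=5):
        self.size = size                      # board is size x size
        self.win_length = win_length          # stones in a row needed to win
        self.grid = [[None] * size for _ in range(size)]

    def place(self, row, col, player):
        self.grid[row][col] = player

    def check_win(self, row, col, player):
        """Return True if the stone just placed at (row, col) completes a row."""
        directions = [(0, 1), (1, 0), (1, 1), (1, -1)]
        for dr, dc in directions:
            count = 1
            for sign in (1, -1):              # walk both ways along the line
                r, c = row + sign * dr, col + sign * dc
                while (0 <= r < self.size and 0 <= c < self.size
                       and self.grid[r][c] == player):
                    count += 1
                    r, c = r + sign * dr, c + sign * dc
            if count >= self.win_length:
                return True
        return False
```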

In the game process part, we built a "dealer" system as the original AI, which users can play against when they do not want to train a better, smarter AI. It has three kinds of strategies: attack, defend, and neutral. A grading system helps the AI decide which strategy to take at the next step; it judges the current situation, the grades of the AI and the player, and which strategy leads to a higher score.
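
A minimal sketch of how such a grade-based choice could look is shown below. The scoring is a simplification of our grading tables, and the function names and grid representation are illustrative assumptions.

```python
# Hedged sketch of a grade-based move choice: grade each empty cell for
# attack (extending our own line) and defence (blocking the opponent),
# then take the highest-graded cell. `grid` is a list of lists, None = empty.

def longest_run(grid, row, col, player):
    """Longest line through (row, col) if `player` were to play there."""
    size, best = len(grid), 1
    for dr, dc in [(0, 1), (1, 0), (1, 1), (1, -1)]:
        count = 1
        for sign in (1, -1):
            r, c = row + sign * dr, col + sign * dc
            while 0 <= r < size and 0 <= c < size and grid[r][c] == player:
                count += 1
                r, c = r + sign * dr, c + sign * dc
        best = max(best, count)
    return best

def dealer_move(grid, ai, opponent):
    """Grade every empty cell for attack and defence and pick the best one."""
    size = len(grid)
    best_move, best_grade = None, -1
    for row in range(size):
        for col in range(size):
            if grid[row][col] is not None:
                continue
            attack = longest_run(grid, row, col, ai)        # attack strategy
            defend = longest_run(grid, row, col, opponent)  # defend strategy
            grade = max(attack, defend)                     # neutral when tied
            if grade > best_grade:
                best_grade, best_move = grade, (row, col)
    return best_move
```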

In the AI part, we use heuristic and reinforcement learning techniques to train the model. After researching reinforcement learning techniques, we decided to use Q-learning to teach our simulated player how to play Gomoku, because it learns the state-action value function together with an exploration policy. It is also an off-policy method (an off-policy learner learns the value of the optimal policy independently of the agent's actions), so it can gather information from partially random moves. Compared to the traditional Q-learning method, we combine it with a heuristic method. This combination speeds up learning because it simplifies the gradient-descent-like update process. We use five intelligence coefficients as the alpha and gamma parameters of the heuristic algorithm across the three strategies (attack, defend, neutral). At the beginning of training we initialize the five coefficients to build the original model, then perturb them to create a new model. The new model then plays against the old one: if the new model wins, we adopt the new coefficients; otherwise we leave them unchanged for that round.
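
A condensed sketch of this training scheme is given below, assuming a standard Q-learning update and a simple perturb-and-compete loop over the five coefficients. The function names, the mutation step, and the match length are illustrative assumptions, not the exact values used in our code.

```python
import random

def q_update(q, state, action, reward, next_state, legal_actions, alpha, gamma):
    """Off-policy Q-learning update; alpha and gamma are taken from the
    intelligence coefficients of the active strategy (attack/defend/neutral)."""
    best_next = max((q.get((next_state, a), 0.0) for a in legal_actions),
                    default=0.0)
    q[(state, action)] = (1 - alpha) * q.get((state, action), 0.0) \
        + alpha * (reward + gamma * best_next)

def mutate(coeffs, step=0.05):
    """Perturb the five intelligence coefficients to propose a candidate model."""
    return [min(1.0, max(0.0, c + random.uniform(-step, step))) for c in coeffs]

def train(play_match, rounds, games_per_round=20):
    """play_match(new_coeffs, old_coeffs) -> 1 if the new model wins a game.
    The candidate coefficients are adopted only when the new model beats
    the old one over the round."""
    coeffs = [0.5] * 5                        # initial intelligence coefficients
    for _ in range(rounds):
        candidate = mutate(coeffs)
        wins = sum(play_match(candidate, coeffs) for _ in range(games_per_round))
        if wins > games_per_round // 2:
            coeffs = candidate                # keep the new version
    return coeffs
```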

Screenshots of game process:

A. Log in

When you log in, you can choose to play with the AI or with another player:

B. Choose Mode

After you log in, you can set up your game rules:

C. If you choose to train the AI model, it plays games with itself for a number of rounds set by the user and optimizes its parameters to learn the rules better.


D. After the training process, users can play Gomoku against the improved AI by entering the position of each stone.


Results and Analysis

To judge the performance of our algorithm, we had our interface play a large number of games and recorded the win percentage of each new model over the older model. The results show that our player was clearly learning the rules of the game and refining its strategy after each round of play; its win percentage rose rapidly within the first few hundred games, as shown in the figures. Instead of always playing against the original model, we had each new model play against the model from the previous round. This setup tells us much more about whether the model improves in each round: if every new model only played the original version, it would keep winning even if it stopped improving after a few rounds, whereas in our setup the win percentage increases only if the model keeps improving.
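
A minimal sketch of this evaluation loop follows, assuming a play_match(new_round, old_round) callable that returns 1 when the newer model wins a game; the CSV layout and file name here are placeholders for illustration.

```python
import csv

def evaluate(play_match, rounds, games_per_round=100,
             out_path="win_percentage.csv"):
    """After each training round, play the new model against the model from
    the previous round and record the new model's win percentage."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["round", "win_percentage"])
        for r in range(1, rounds + 1):
            wins = sum(play_match(r, r - 1) for _ in range(games_per_round))
            writer.writerow([r, 100.0 * wins / games_per_round])
```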

We then tried different initialization values for the five coefficients and found that this change does not affect the final model parameters; it still takes a few hundred rounds for the model to improve and finally stabilize.

The accuracy results are attached below. They show that the AI model improves quickly during roughly the first 3000 rounds, is much more stable afterwards, and reaches an accuracy of about 96.3% after 10000 rounds. The detailed results are stored in our "result_gomoku.csv" file.


Figure 2: Win percentage distribution with maximum 3000 rounds


Figure 3: Win percentage distribution with maximum 10000 rounds

Conclusion and Future Work

By implementing the combination of heuristic and reinforcement learning algorithms ourselves in Python, we were able to build a Gomoku AI player that quickly learns the rules of Gomoku, eventually learns more advanced strategies, and achieves a much higher win percentage than its original version. The final win percentage of roughly 100% against the original version indicates that our model is stable for this board size and these specific rules.

In the future, we intend to add a traditional gradient descent optimization method to this project and compare its convergence speed with that of our heuristic method. We also plan to implement an AI based on a game tree algorithm. We have already experimented with this and achieved some of our goals: a game tree algorithm can train a model that is even better than our heuristic-plus-reinforcement-learning model, but we have not yet been able to make it keep improving after each round. This is a harder and more challenging task, which we will tackle in the next step.

Contact

Chaofan Yu chaofanyu2015@gmail.com

Fei Zhao feizhao2015@u.northwestern.edu

Yuntuo Wang yuntuowang2015@u.northwestern.edu

Qian Wang qianwang2015@u.northwestern.edu