Introduction

In my last post, I used a minimax algorithm to create a tic-tac-toe agent that sees a few steps ahead and can’t be beaten. While this approach works for a simple game like tic-tac-toe, it wouldn’t scale well to more complex games. For this reason, I wanted to implement an agent that uses a neural network to find patterns in the game through machine learning. If I can make this work for tic-tac-toe, the general approach should be usable for other games.

I started following Michael Nielsen’s Neural Networks and Deep Learning online book, as it doesn’t use any libraries and goes over building neural networks and learning algorithms from scratch. However, the book’s example project is a digit classifier, where the learning algorithm, gradient descent, relies on large quantities of pre-labeled data. I couldn’t think of a way to fit that into my use case, and I had an intuition about using some sort of evolutionary approach, so I went with that.

I also wanted to keep a clear separation between the elements of the network in my code, so I didn’t use any vectors to express layers or any linear algebra operations. Essentially, I wanted the most basic neural network possible, so I could really get to know the fundamentals. I can always optimize it later.

My neural network

Going back to the tic-tac-toe repo from my previous post, I decided to add my agent there, as I already had everything set up for the game, plus I could eventually play the trained neural network agent against the minimax one.

The neuron

As the most fundamental element of the network, the neuron is responsible for taking a number of inputs and producing a single output. This is done by assigning a weight to each input and using a bias, which acts as a threshold for how much weighted input has to accumulate before the neuron fires.

There are different types of neurons; in this project I’ll be using sigmoid neurons. The code looks like this:

import math
import random


class SigmoidNeuron:
    bias: float
    weights: tuple[float, ...]

    def __init__(self, input_size: int) -> None:
        self.weights = tuple(0. for _ in range(input_size))
        self.bias = 0.

    def calculate_output(self, inputs: list[float]) -> float:
        if len(inputs) != len(self.weights):
            raise ValueError("Inputs of incorrect size")
        wx_sum = sum(i*j for i,j in zip(self.weights, inputs))
        return 1/(1+math.exp(-wx_sum - self.bias))
    
    def mutate(self, max_magnitude: float) -> None:
        if max_magnitude < 0.:
            raise ValueError("Magnitude should be a positive number.")
        self.bias = (random.random() - 0.5) * 2 * max_magnitude + self.bias
        self.weights = tuple(
            (random.random() - 0.5) * 2 * max_magnitude + w
            for w in self.weights
        )

    def copy(self) -> "SigmoidNeuron":
        new_neuron = SigmoidNeuron(len(self.weights))
        new_neuron.bias = self.bias
        new_neuron.weights = self.weights
        return new_neuron

Each method has the following purpose (a short usage sketch follows the list):

  • calculate_output receives a list of inputs (the outputs of the neurons in the previous layer, or the encoded game state for the first layer) and calculates the resulting output using the sigmoid function.
  • mutate applies a random mutation to the neuron’s parameters.
  • copy makes a new copy of the neuron. This will be used later, when the best-performing agents are copied and mutated for the next training epoch.
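
To make this concrete, here’s a minimal usage sketch (not from the repo, just an illustration of the methods above):

neuron = SigmoidNeuron(input_size=3)            # weights and bias start at zero
print(neuron.calculate_output([1., 0., 1.]))    # 0.5, since all parameters are zero

child = neuron.copy()            # independent copy of the parameters
child.mutate(max_magnitude=1.)   # shifts each weight and the bias by up to ±1
print(child.calculate_output([1., 0., 1.]))     # now (almost certainly) differs from 0.5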

Neuron tests

Testing every piece of the network thoroughly is very important. In an earlier version of this code, I had implemented the neuron’s weighted sum wrong: instead of $\sum_i{x_i w_i}$ I was actually calculating $\sum_i{(x_i + w_i)}$, and the agents could never get past making 2 legal moves in a game. I knew the correct formula, but it was something so basic that I assumed I could do it quickly and easily. Only after writing the test did I see the bug.

Anyway, a few handwritten scenarios should be enough to ensure the calculation function works as expected:

import pytest


@pytest.mark.parametrize(
	"input_x, weights, bias, expected_result",
	[
		(
			[1.],
			(1.,),
			1.,
			0.8807970779778823
		),
		(
			[1., 0., 0.],
			(1., 10., 10.),
			1.,
			0.8807970779778823
		),
		(
			[1., 2., -1.5],
			(1., 1.5, -2.),
			-10.,
			0.04742587317756678
		)
	]
)
def test_output_is_calculated_correctly(
	input_x: list[float],
	weights: tuple[float, ...],
	bias: float,
	expected_result: float
) -> None:
	neuron = SigmoidNeuron(len(input_x))
	neuron.bias = bias
	neuron.weights = weights
	result = neuron.calculate_output(input_x)
	assert result == expected_result

By testing that mutating a copied neuron does not affect the original, we can cover the rest of the neuron’s functionality:

def test_mutating_a_copied_neuron_does_not_mutate_original() -> None:
	original_neuron = SigmoidNeuron(10)
	original_neuron.bias = 5
	copied_neuron = original_neuron.copy()
	copied_neuron.mutate(max_magnitude=10.)
	assert id(original_neuron) != id(copied_neuron)
	assert original_neuron.bias != copied_neuron.bias
	assert original_neuron.weights != copied_neuron.weights

A neuron layer

To represent a single layer within the neural network, I create a simple neuron aggregator class.

class Layer:
    neurons: tuple[SigmoidNeuron, ...]

    def __init__(self, init_neuron_count: int, input_size: int) -> None:
        self._input_size = input_size
        self.neurons = tuple(
            SigmoidNeuron(input_size)
            for _ in range(init_neuron_count)
        )

    def calculate_neuron_outputs(self, inputs: list[float]) -> list[float]:
        return [n.calculate_output(inputs) for n in self.neurons]
    
    def mutate_layer(self, max_magnitude: float) -> None:
        for neuron in self.neurons:
            neuron.mutate(max_magnitude)

    def copy(self) -> "Layer":
        new_layer = Layer(len(self.neurons), self._input_size)
        new_layer.neurons = tuple(n.copy() for n in self.neurons)
        return new_layer

The tests are very similar to the individual neuron ones, just on a larger scale:

def test_copied_layer_has_new_neuron_instances() -> None:
	neuron_count = 10
	neuron_layer = Layer(neuron_count, 10)
	copied_layer = neuron_layer.copy()
	assert id(neuron_layer) != id(copied_layer)
	for i in range(neuron_count):
		assert id(neuron_layer.neurons[i]) != id(copied_layer.neurons[i])

def test_copied_layer_does_not_mutate_original() -> None:
	neuron_count = 10
	neuron_layer = Layer(neuron_count, 10)
	copied_layer = neuron_layer.copy()
	copied_layer.mutate_layer(10.)
	for i in range(neuron_count):
		original_neuron = neuron_layer.neurons[i] 
		copied_neuron = copied_layer.neurons[i]
		assert original_neuron.weights != copied_neuron.weights
		assert original_neuron.bias != copied_neuron.bias

def test_layer_calculates_output_correctly() -> None:
	neuron_1 = SigmoidNeuron(3)
	neuron_1.weights = (1., 1.5, -2.)
	neuron_1.bias = -10.
	neuron_2 = SigmoidNeuron(3)
	neuron_2.weights = (2., 2., 3.)
	neuron_2.bias = .12

	neuron_layer = Layer(0, 3)
	neuron_layer.neurons = (neuron_1, neuron_2)

	input_x = [1., 2., -1.5]
	expected_result = [
		0.04742587317756678,
		0.8347951298093854
	]

	result = neuron_layer.calculate_neuron_outputs(input_x)
	assert result == expected_result

The neural network

The neural network puts all the layers and neurons into a single unit:

class NeuralNetwork:
    layers: tuple[Layer, ...]

    def __init__(self, layer_sizes: list[int], input_size: int) -> None:
        self._input_size = input_size
        if len(layer_sizes) < 1:
            raise ValueError("The neural network can't have less than a single layer.")
        layers: list[Layer] = []
        for i, layer_size in enumerate(layer_sizes):
            layer_input_size = self._input_size if i == 0 else layer_sizes[i - 1]
            layers.append(Layer(layer_size, layer_input_size))
        self.layers = tuple(layers)

    def mutate_network(self, max_magnitude: float) -> None:
        for layer in self.layers:
            layer.mutate_layer(max_magnitude)
    
    def calculate_output(self, input_layer: list[float]) -> list[float]:
        if len(input_layer) != self._input_size:
            raise ValueError("Bad input size")
        output = self.layers[0].calculate_neuron_outputs(input_layer)
        for layer in self.layers[1:]:
            output = layer.calculate_neuron_outputs(output)
        return output
    
    def copy(self) -> "NeuralNetwork":
        nn = NeuralNetwork([0], self._input_size)
        nn.layers = tuple(l.copy() for l in self.layers)
        return nn
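
As a quick illustration of the constructor (my own example, not from the repo): layer_sizes lists how many neurons each layer has, and the first layer’s neurons each receive input_size inputs.

# A network with a 3-neuron hidden layer and a 2-neuron output layer,
# fed with 4 input values.
nn = NeuralNetwork(layer_sizes=[3, 2], input_size=4)

output = nn.calculate_output([0.5, 1., 0., 0.25])
print(len(output))  # 2, one value per neuron in the last layer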

Testing the neural network

To ensure that, with given weights and biases, the network as a whole produces the right output, the neurons can be set up to perform a logic gate function. In this example I use XOR, as it is not linearly separable and therefore requires at least 2 layers.

Recall that each neuron applies the following activation function: $$ \frac{1}{1+\exp(-\sum_j w_j x_j - b)} $$

The neurons are connected as in the graph below:

graph LR

  %% Inputs
  A[x1] --> n1((n1))
  A[x1] --> n2((n2))
  B[x2] --> n1((n1))
  B[x2] --> n2((n2))

  %% Hidden layer
  n1((n1)) --> O((n3))
  n2((n2)) --> O((n3))

  %% Labels
  subgraph Inputs
    A
    B
  end

  subgraph Hidden Layer
    n1
    n2
  end

  subgraph Output Layer
    O
  end

We can then configure the weights and biases of each neuron so that the network computes XOR:

graph LR

  %% Inputs
  A[x1] -- 8 --> n1((n1<br>b=-4))
  A[x1] -- 7.5 --> n2((n2<br>b=-10))
  B[x2] -- 8 --> n1
  B[x2] -- 7.5 --> n2

  %% Hidden layer
  n1 -- 8 --> O((n3<br>b=-4))
  n2 -- -8 --> O

  %% Labels
  subgraph Inputs
    A
    B
  end

  subgraph Hidden Layer
    n1
    n2
  end

  subgraph Output Layer
    O
  end
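
As a sanity check (my own arithmetic, not part of the post’s tests), writing $\sigma$ for the sigmoid above, we can trace the input $(x_1, x_2) = (1, 0)$ through this configuration by hand:

$$ n_1 = \sigma(8 \cdot 1 + 8 \cdot 0 - 4) = \sigma(4) \approx 0.98 $$
$$ n_2 = \sigma(7.5 \cdot 1 + 7.5 \cdot 0 - 10) = \sigma(-2.5) \approx 0.08 $$
$$ n_3 = \sigma(8 \cdot 0.98 - 8 \cdot 0.08 - 4) \approx \sigma(3.2) \approx 0.96 $$

which rounds to 1, as XOR of 1 and 0 should; the other three inputs work out similarly.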

In code it looks like this:

def test_xor_calculation() -> None:
	neural_network = NeuralNetwork(
		layer_sizes=[2,1],
		input_size=2
	)

	neuron_1_or = neural_network.layers[0].neurons[0]
	neuron_1_or.weights = (8., 8.)
	neuron_1_or.bias = -4.

	neuron_2_and = neural_network.layers[0].neurons[1]
	neuron_2_and.weights = (7.5, 7.5)
	neuron_2_and.bias = -10.

	neuron_3_exclusive_n1 = neural_network.layers[1].neurons[0]
	neuron_3_exclusive_n1.weights = (8., -8.)
	neuron_3_exclusive_n1.bias = -4.

	inputs = [
		[0., 0.], [0., 1.], [1., 0.], [1., 1.]
	]
	expected_results = [0., 1., 1., 0.]
	results = [neural_network.calculate_output(i)[0] for i in inputs]
	for expected, real in zip(expected_results, results):
		assert abs(expected - real) < .1

Note that the assertions use a tolerance rather than exact equality, since a sigmoid output never reaches exactly 0 or 1. There are also some other tests in the repo, but they’re repetitive and wouldn’t add much to this post.

Adapting the network for a tic-tac-toe agent

Now that the neural network is ready and working, we need a new agent that can use it to pick the best move in a position. I’ll use the same Agent base class I used in the last post for the minimax agent. There will also be 2 new key methods: one for encoding the board into the neural network’s input and one for decoding its output into a move.

class NeuralNetworkAgent(Agent):
    nn: NeuralNetwork

    def __init__(
	    self,
	    name: str = "unnamed",
	    neural_network: NeuralNetwork | None = None
	) -> None:
        self.name = name
        if neural_network is None:
            self.nn = NeuralNetwork(layer_sizes=[15, 9], input_size=27)
        else:
            self.nn = neural_network

    @staticmethod
    def encode_board_to_input(tttb: TicTacToeBoard) -> list[float]:
        charmap = {
            "_": (1.,0.,0.),
            "x": (0.,1.,0.),
            "o": (0.,0.,1.),
        }
        board = tttb.get_board_copy()
        result: list[float] = []
        for row in board:
            for cell in row:
                result.extend(charmap[cell])
        return result

    @staticmethod
    def decode_nn_output(nn_output: list[float]) -> tuple[int, int]:
        max_val = 0.
        mv_index = 0
        for i, v in enumerate(nn_output):
            if v > max_val:
                max_val = v
                mv_index = i
        row = mv_index // 3
        col = mv_index % 3
        return row, col

    def get_best_move(self, tttb: TicTacToeBoard) -> tuple[int, int]:
        nn_input = self.encode_board_to_input(tttb)
        nn_output = self.nn.calculate_output(nn_input)
        return self.decode_nn_output(nn_output)

    def mutate(self, max_magnitude: float) -> None:
        self.nn.mutate_network(max_magnitude)

Neural network modeling

When initializing the agent, the following lines define the neural network that will be created and used by the agent:

self.nn = NeuralNetwork(layer_sizes=[15, 9], input_size=27)

The input size is 27 because each square in tic-tac-toe has 3 possible states: “empty”, “o” or “x”. Each square is one-hot encoded into 3 inputs, so the 9 squares produce 27 inputs in total; for example, an “x” square becomes (0., 1., 0.).

There’s a hidden layer of size 15. I have no idea what it does.

The last layer is the output layer, of size 9, with each output representing a square on the tic-tac-toe board. When picking a square to play, the one with the highest score is selected.
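
As a quick illustration of the decoding (the output vector here is made up): if the largest value is at index 4, the agent plays the centre square.

scores = [0.1, 0.2, 0.1,
          0.3, 0.9, 0.2,
          0.1, 0.4, 0.1]
print(NeuralNetworkAgent.decode_nn_output(scores))  # (1, 1): row 1, column 1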

Training the network

This is a topic I’ll cover in my next post. As a teaser, so far I’ve made a simple evolution script where agents play against each other, league style, with the winners progressing and getting mutated for the next epoch (a rough sketch of the idea follows the list below):

  • I was able to create a 100-agent league, where 9900 games are played each epoch.
    • The reward function is based on the moves made. If a player wins, they keep their full score; in a draw, both players keep their score.
  • After 10 epochs, there was a game where 2 agents managed to chain 7 valid moves.
  • Not every game was a DQ because of an invalid move; there were actually 52 valid games!
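
To make the idea a bit more concrete, here’s a minimal sketch of what a single epoch could look like. This is not the actual training script from the repo; play_game, the scoring details and the selection ratio are assumptions for illustration only:

def run_epoch(agents: list[NeuralNetworkAgent]) -> list[NeuralNetworkAgent]:
    scores = {id(agent): 0. for agent in agents}

    # Every agent plays every other agent (100 agents -> 9900 games).
    # play_game is assumed to run one tic-tac-toe game and return the
    # score earned by each player.
    for player_1 in agents:
        for player_2 in agents:
            if player_1 is player_2:
                continue
            score_1, score_2 = play_game(player_1, player_2)
            scores[id(player_1)] += score_1
            scores[id(player_2)] += score_2

    # Keep the best half and refill the league with mutated copies of them.
    ranked = sorted(agents, key=lambda agent: scores[id(agent)], reverse=True)
    survivors = ranked[:len(agents) // 2]
    children = []
    for parent in survivors:
        child = NeuralNetworkAgent(neural_network=parent.nn.copy())
        child.mutate(max_magnitude=0.5)
        children.append(child)
    return survivors + children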