Reinforcement Learning

Introduction

In my previous post I built a neural network from scratch, with the idea of using it for a tic-tac-toe agent. In this post I’ll go over the training process. My general intuition is to have each agent play every other agent twice, once as “o” and once as “x”. A reward function decides what score an agent gets for each game. The agents with the top scores progress to the next epoch, where they are cloned and mutated to refill the population slots left by the agents that were dropped. ...
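The loop described above can be sketched roughly as follows. This is a minimal illustration and not the code from this project: the agent here is a hypothetical stand-in (a list of nine cell preferences instead of the neural network), and the reward values (1 for a win, 0.5 for a draw, 0 for a loss), the Gaussian mutation, and the names `play_game`, `reward`, and `run_epoch` are all assumptions made for the example.

```python
import random
from itertools import permutations

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return "x" or "o" if a line is complete, otherwise None."""
    for a, b, c in WIN_LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def choose_move(agent, board):
    # Hypothetical stand-in for the network: take the empty cell
    # with the highest preference weight.
    empty = [i for i in range(9) if board[i] is None]
    return max(empty, key=lambda i: agent[i])

def play_game(x_agent, o_agent):
    """Play one game; return the winning mark, or None for a draw."""
    board = [None] * 9
    players = [("x", x_agent), ("o", o_agent)]
    for turn in range(9):
        mark, agent = players[turn % 2]
        board[choose_move(agent, board)] = mark
        if winner(board) is not None:
            return mark
    return None

def reward(result, mark):
    # Assumed reward function: 1 for a win, 0.5 for a draw, 0 for a loss.
    if result is None:
        return 0.5
    return 1.0 if result == mark else 0.0

def run_epoch(population, keep, rng, sigma=0.1):
    """Round-robin tournament, then keep the top agents and refill with mutated clones."""
    scores = [0.0] * len(population)
    # Ordered pairs (i, j): every pair of agents meets twice, swapping marks.
    for i, j in permutations(range(len(population)), 2):
        result = play_game(population[i], population[j])
        scores[i] += reward(result, "x")
        scores[j] += reward(result, "o")
    ranked = sorted(range(len(population)), key=lambda i: scores[i], reverse=True)
    survivors = [population[i] for i in ranked[:keep]]
    children = []
    while len(survivors) + len(children) < len(population):
        parent = rng.choice(survivors)
        # Clone the parent and add Gaussian noise to each weight (assumed mutation).
        children.append([w + rng.gauss(0, sigma) for w in parent])
    return survivors + children, scores

rng = random.Random(0)
population = [[rng.random() for _ in range(9)] for _ in range(8)]
for _ in range(5):
    population, scores = run_epoch(population, keep=4, rng=rng)
```

Iterating over ordered pairs is what gives every agent one game as “x” and one as “o” against each opponent, so with 8 agents each epoch plays 56 games. Keeping the survivors unchanged alongside their mutated clones is one possible choice; it means the best agents found so far are never lost to a bad mutation.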