An AI is being trained to play a complex board game. The AI learns by making moves in the game, and after each game (or series of moves), it receives a signal indicating whether it won (a positive reward) or lost (a negative reward). The AI's goal is to learn a strategy that maximizes its chances of winning over time. Which machine learning approach is primarily being used here?