书名：Python Reinforcement Learning
作者名：Sudharsan Ravichandiran Sean Saito Rajalingappaa Shanmugamani Yang Wenzhuo
本章字数：144字
更新时间：2024-12-21 01:46:36

Markov Decision Process

MDP is an extension of the Markov chain. It provides a mathematical framework for modeling decision-making situations. Almost all Reinforcement Learning problems can be modeled as MDP.

MDP is represented by five important elements:

A set of states the agent can actually be in.
A set of actions that can be performed by an agent, for moving from one state to another.
A transition probability (), which is the probability of moving from one state to another state by performing some action .
A reward probability (), which is the probability of a reward acquired by the agent for moving from one state to another state by performing some action .
A discount factor (), which controls the importance of immediate and future rewards. We will discuss this in detail in the upcoming sections.