Agent-environment interface

Reinforcement learning can be seen as a special case of the interaction problem: the problem of achieving a goal through interaction. The entity that must reach the goal is called the agent. The entity with which the agent must interact is called the environment, which corresponds to everything external to the agent.

So far, we have focused on the term agent, but what does it represent? An agent is a software entity that performs services on behalf of another program, usually automatically and invisibly. Such pieces of software are also called smart agents.

What follows is a list of the most important features of an agent:

  • It can choose an action from either a continuous or a discrete set of actions on the environment.
  • The action depends on the situation. The situation is summarized in the system state.
  • The agent continuously monitors the environment (its input) and continuously updates its state.
  • The choice of the action is not trivial and requires a certain degree of intelligence.
  • The agent has a memory of past states and actions (a smart memory).

The agent has a goal-directed behavior, but acts in an uncertain environment that is not known a priori or only partially known. An agent learns by interacting with the environment. Planning can be developed while learning about the environment through the measurements made by the agent itself. This strategy is close to trial-and-error theory.

Trial and error is a fundamental method of problem solving. It is characterized by repeated, varied attempts that are continued until success, or until the agent stops trying.
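As a minimal sketch, trial and error can be expressed as a loop that repeats varied attempts until one succeeds or the agent gives up. The task, action names, and attempt limit below are hypothetical, chosen only for illustration:

```python
import random

def trial_and_error(succeeds, actions, max_attempts=100, seed=0):
    """Repeat varied attempts until success, or until the agent stops trying."""
    rng = random.Random(seed)
    for attempt in range(1, max_attempts + 1):
        action = rng.choice(actions)   # a varied attempt
        if succeeds(action):           # success ends the search
            return action, attempt
    return None, max_attempts          # the agent gives up

# Hypothetical task: discover which action is the one labelled "open".
action, attempts = trial_and_error(lambda a: a == "open",
                                   ["push", "pull", "open", "wait"])
```

Here the agent has no model of the task; it simply keeps trying until the environment signals success, which is exactly the trial-and-error strategy described above.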

The agent-environment interaction is continuous: the agent chooses an action to be taken, and in response, the environment changes state, presenting a new situation to be faced.

In the particular case of reinforcement learning, the environment provides the agent with a reward. It is essential that the source of the reward is the environment to avoid the formation, within the agent, of a personal reinforcement mechanism that would compromise learning.

The value of the reward is proportional to the influence that the action has in reaching the objective, so it is positive or high in the case of a correct action, or negative or low for an incorrect action.
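The interaction loop and the reward signal can be sketched with a hypothetical toy environment in which the agent must walk to the right end of a short line. The states, actions, and reward values below are illustrative assumptions, not taken from the text:

```python
class LineWorld:
    """Toy environment: the agent occupies positions 0..5 and must reach 5."""
    def __init__(self):
        self.state = 0

    def step(self, action):
        # action is +1 (move right) or -1 (move left)
        self.state = max(0, min(5, self.state + action))
        done = self.state == 5
        # High reward for reaching the goal, a small penalty otherwise.
        reward = 1.0 if done else -0.1
        return self.state, reward, done

env = LineWorld()
state, total_reward, done = env.state, 0.0, False
while not done:
    action = +1                        # a trivial policy: always move right
    state, reward, done = env.step(action)
    total_reward += reward
```

Each pass through the loop is one interaction: the agent chooses an action, the environment changes state and returns a reward, and the new state becomes the situation the agent faces next.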

The following list gives some real-life examples in which an agent interacts with an environment to solve a problem:

  • A chess player, for each move, has information on the configurations of pieces that can be created, and on the possible countermoves of the opponent.
  • A giraffe calf learns to stand up and run within a few hours of birth.
  • A truly autonomous robot learns to move around a room in order to get out of it; the Roomba robot vacuum is an example.
  • The parameters of a refinery (oil pressure, flow, and so on) are set in real time, so as to obtain the maximum yield or maximum quality. For example, if particularly dense oil arrives, then the flow rate to the plant is modified to allow an adequate refining.

All of the examples we have examined share the following characteristics:

  • Interaction with the environment
  • A specific goal that the agent wants to achieve
  • Uncertainty or partial knowledge of the environment

From the analysis of these examples, it is possible to make the following observations:

  • The agent learns from its own experience.
  • Actions change the state (the situation), and so they change the choices available in the future (delayed reward).
  • The effect of an action cannot be completely predicted.
  • The agent has a global assessment of its behavior.
  • It must exploit this information to improve its choices. Choices improve with experience.
  • Problems can have a finite or infinite time horizon.

Essentially, the agent receives sensations from the environment through its sensors. Depending on these sensations, the agent decides what actions to take in the environment. Based on the immediate result of its actions, the agent can be rewarded.

If you want to use an automatic learning method, you need to give a formal description of the environment. It is not important to know exactly how the environment is made; what matters is to make general assumptions about the properties that the environment has. In reinforcement learning, it is usually assumed that the environment can be described by a Markov decision process (MDP).
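As a sketch of such a formal description, a small MDP can be written down as a table of transition probabilities and rewards. The two states, two actions, and all numbers below are hypothetical, chosen only to show the shape of the description:

```python
# P[state][action] -> list of (probability, next_state, reward) triples.
P = {
    "s0": {"stay": [(1.0, "s0", 0.0)],
           "go":   [(0.8, "s1", 1.0),   # "go" usually succeeds...
                    (0.2, "s0", 0.0)]}, # ...but sometimes fails
    "s1": {"stay": [(1.0, "s1", 0.0)],
           "go":   [(1.0, "s0", 0.0)]},
}

def expected_reward(state, action):
    """Expected immediate reward of taking `action` in `state`."""
    return sum(p * r for p, _, r in P[state][action])
```

This captures the general assumptions mentioned above: for each state and action, the environment's behavior is fully described by a probability distribution over next states and rewards, without saying anything about how the environment is built internally.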