The agent function Jupyter Notebook simulates a single agent moving on an 8x8 chessboard with a changing temperature field:
The agent can survive temperatures between -4 and +4; any temperature outside this range kills the agent immediately. In each step, the agent can perform one of the following five actions (implemented as method calls on an object of the class Surroundings):
The changing temperature field is caused by moving heat sources with an average velocity of about half a square per time step, which in theory gives the agent the opportunity to escape extreme temperatures in most cases. For decision making within the agent function, however, the agent only has access to the following percepts:
In each time step, these percepts are automatically collected by the agent method sensor_input() and stored in the following properties of the agent object:
The method live() simulates the life of the agent and returns the number of time steps after which the agent died.
Decision making is done in the agent function, i.e., the method agent_function of the class Agent. The current implementation in the notebook makes some use of the sensory percepts, but presumably not in an optimal way:
    if abs(self._percept_x0_higher[1]) < abs(self._local_temperature):
        self._environment.action_increment_x0()
    elif abs(self._percept_x0_lower[1]) < abs(self._local_temperature):
        self._environment.action_decrement_x0()
    elif abs(self._percept_x1_higher[1]) < abs(self._local_temperature):
        self._environment.action_increment_x1()
    elif abs(self._percept_x1_lower[1]) < abs(self._local_temperature):
        self._environment.action_decrement_x1()
    else:
        self._environment.action_wait()
Above, self._percept_x0_higher[1] and the other similar quantities contain the temperature two squares away (one square away would be self._percept_x0_higher[0], etc.).
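One possible direction, shown here only as a minimal sketch and not as the intended solution, is to compare all five options at once instead of taking the first direction that looks better. The function choose_action below is hypothetical. It assumes that a move shifts the agent by exactly one square, so that index 0 of a percept is the temperature at the destination, and that all percepts are plain numbers (behaviour at the edge of the board is not handled). It ranks each option by the absolute temperature one square away and uses the value two squares away only as a tie-breaker.

```python
def choose_action(local, x0_higher, x0_lower, x1_higher, x1_lower):
    """Return the name of the action that leads toward the mildest temperature.

    local      -- temperature at the agent's current square
    x0_higher, x0_lower, x1_higher, x1_lower
               -- percepts per direction, indexed as in the notebook:
                  [0] is one square away, [1] is two squares away
    """
    def score(near, far):
        # Tuples compare element by element: the nearer square decides,
        # the farther square only breaks ties.
        return (abs(near), abs(far))

    # "action_wait" comes first so that it wins if all scores are equal.
    candidates = {
        "action_wait": score(local, local),
        "action_increment_x0": score(x0_higher[0], x0_higher[1]),
        "action_decrement_x0": score(x0_lower[0], x0_lower[1]),
        "action_increment_x1": score(x1_higher[0], x1_higher[1]),
        "action_decrement_x1": score(x1_lower[0], x1_lower[1]),
    }
    return min(candidates, key=candidates.get)


# Example: the square in the +x0 direction is the mildest one.
best = choose_action(3.0, (1.0, 0.5), (4.0, 5.0), (2.0, 2.0), (3.5, 3.0))
print(best)  # action_increment_x0
```

Inside agent_function, the returned name could be dispatched with getattr(self._environment, name)(), since the returned strings match the action method names used above.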
The task is to improve this agent function such that, on average, the agent will survive this scenario for a longer time.
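Because the heat sources move, a single run says little about whether a change really helps. A simple way to compare agent functions "on average" is to repeat the simulation many times and look at the mean lifetime and its spread. The sketch below assumes a hypothetical factory make_agent that builds a fresh agent (with fresh surroundings) on each call; how the notebook actually constructs these objects is not shown here, so only the live() method described above is used.

```python
from statistics import mean, stdev


def evaluate(make_agent, runs=100):
    """Run `runs` independent simulations and summarise the lifetimes.

    make_agent -- hypothetical zero-argument callable returning a new
                  agent object that provides the live() method
    Returns (mean lifetime, sample standard deviation).
    """
    lifetimes = [make_agent().live() for _ in range(runs)]
    # stdev needs at least two data points.
    spread = stdev(lifetimes) if len(lifetimes) > 1 else 0.0
    return mean(lifetimes), spread
```

Running this once for the original agent function and once for a modified one, with the same number of runs, gives two means that can be compared directly. The spread indicates how many runs are needed before a difference is meaningful.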
You may (but need not) extend the data stored by the agent, e.g., to remember previous percepts, which is not done at present, or to evaluate the percepts in an intelligent way. The scenario itself (Surroundings, temperature tolerance, etc.) may not be changed, and collection of the percepts need not be changed since this is already done by the sensor_input() method. The agent and the surroundings may not interact in any other way than by means of the five percepts and the five possible actions mentioned above.
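As an illustration of what remembering previous percepts could look like, the following sketch keeps a short history of one percept and extrapolates it linearly by one time step. The class PerceptMemory and its methods are hypothetical and not part of the notebook. Note that a reading also changes when the agent itself moves, so the extrapolation is most meaningful across steps in which the agent waited, or after correcting for the agent's own movement.

```python
from collections import deque


class PerceptMemory:
    """Hypothetical helper that stores the latest readings of one percept."""

    def __init__(self, maxlen=3):
        # Older readings are dropped automatically once maxlen is reached.
        self._history = deque(maxlen=maxlen)

    def record(self, value):
        """Store the reading of the current time step."""
        self._history.append(value)

    def predict_next(self):
        """Extrapolate linearly from the last two readings.

        With only one reading available, that reading is returned unchanged.
        """
        if not self._history:
            raise ValueError("no readings recorded yet")
        if len(self._history) == 1:
            return self._history[-1]
        return 2 * self._history[-1] - self._history[-2]
```

An agent could hold one such object per percept, call record() after each sensor_input(), and base its decision on predict_next() instead of the raw reading, for example to leave a square whose temperature is heading toward the limits of -4 or +4.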
Submission deadline: 4th December 2021; discussion planned for 17th December 2021. Group work by up to four people is welcome.