Quick decisions on complex problems can be made using extensive background planning

Andriy Drozdyuk
2 min read · Oct 5, 2021


A decision on a complex problem does not necessarily have to take long; it can be lightning fast.
This is because coming up with a policy (a slow process) and using the policy (a fast process) can be separated by background planning, which uses simulated experience to learn a policy in the background.
When it is time for a system to act, all it has to do is use its policy to select an appropriate action.
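To make this concrete, here is a minimal sketch in the spirit of tabular Dyna-Q. The action set, learning rate, and model format are my own illustrative assumptions, not something specified above: a slow background loop improves the action-values from simulated experience, while acting is just a fast table lookup.

```python
# Minimal sketch (Dyna-Q flavoured): planning happens slowly in the background,
# acting is a fast lookup of pre-computed action-values.
# The action set, learning rate, and model format are illustrative assumptions.
import random
from collections import defaultdict

ACTIONS = [0, 1, 2, 3]
ALPHA, GAMMA = 0.1, 0.95

Q = defaultdict(float)   # action-values, improved in the background
model = {}               # learned model of the world: (s, a) -> (reward, next_state)

def background_planning(n_updates=10_000):
    """Slow process: improve the policy from simulated experience."""
    if not model:
        return
    transitions = list(model.items())
    for _ in range(n_updates):
        (s, a), (r, s_next) = random.choice(transitions)
        best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

def act(s):
    """Fast process: when it is time to act, just read the greedy action off the table."""
    return max(ACTIONS, key=lambda a: Q[(s, a)])
```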

Simple greedy action selection is a special case of heuristic search: the greedy action is chosen using pre-computed action-values or state-values.
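A small sketch of that special case, assuming the action-values Q, state-values V, and a one-step model have already been computed elsewhere: picking the greedy action from Q involves no search at all, while using V amounts to a one-step lookahead, with V serving as the heuristic evaluation of each resulting state.

```python
# Sketch: greedy action selection as a (very shallow) heuristic search.
# Q, V, and the one-step model are assumed to have been computed beforehand.

GAMMA = 0.95

def greedy_from_action_values(s, Q, actions):
    """No lookahead at all: the pre-computed action-values already contain the answer."""
    return max(actions, key=lambda a: Q[(s, a)])

def greedy_from_state_values(s, V, model, actions):
    """One-step lookahead: expand each action once, then use V as the
    heuristic evaluation of the state it leads to."""
    def backed_up_value(a):
        reward, s_next = model(s, a)
        return reward + GAMMA * V[s_next]
    return max(actions, key=backed_up_value)
```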

A consequence of this line of reasoning is that there must be a component that chooses how much decision-time planning to perform. Considering that some situations are deadly to an organism, leaving little time to plan, it must be able to forgo planning and use a policy it built up previously to make a decision.

Urgency of a state

The urgency of a state is the amount of decision-time planning that a system is allowed to perform in it.

We can define the urgency ∈ [0, 1] of a state S_t as the amount of decision-time planning a system is allowed to do in that state, with 1 being most urgent, representing no decision-time planning, and 0 representing an unlimited amount of planning.

We can denote it by U(S_t) and use a neural network to approximate this value.
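As a rough sketch of what that could look like, assuming a sigmoid output for the [0, 1] range and a simple mapping from urgency to a time budget (both are my assumptions; the post leaves these choices open, see the questions below):

```python
# Sketch: U(S_t) as a small network with a sigmoid output, plus one possible way of
# turning urgency into a decision-time planning budget. The architecture and the
# budget mapping are assumptions, not something the post specifies.
import torch
import torch.nn as nn

class UrgencyNet(nn.Module):
    def __init__(self, state_dim, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),  # squash into [0, 1]
        )

    def forward(self, state):
        return self.net(state).squeeze(-1)       # U(S_t) in [0, 1]

def planning_budget(urgency, max_seconds=1.0):
    # 1 = most urgent -> no decision-time planning; 0 -> the full budget.
    return (1.0 - urgency) * max_seconds

urgency_net = UrgencyNet(state_dim=4)
state = torch.randn(4)
budget = planning_budget(urgency_net(state).item())  # seconds the planner may run for
```

With something like this, a state whose estimated urgency is near 1 would make the agent fall back on its pre-computed policy immediately, while a low-urgency state would leave it the full budget for decision-time planning.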

It is not clear:

  • What does “urgency” mean concretely? Perhaps the number of seconds a planner is allowed to work for.
  • How do we backpropagate through it?

References:

  1. Sutton, R. S., & Barto, A. G., Reinforcement Learning: An Introduction, 2nd ed.
