Your Reward function matters or Good intentions can lead to terrible outcomes

Andriy Drozdyuk
Oct 5, 2021

Specification gaming is when rewards go wrong: the reward you specify produces behavior different from the behavior you wanted. One example is an RL agent that was trained on an Atari boat racing game but prioritized collecting points over finishing the race.

Starting out with good intentions does not mean that what you will accomplish will be good. Most bad outcomes do start with good intentions.

One example: arresting drug dealers can lead to worse outcomes, because it reduces competition, gives more market share to the remaining gangs, and favours the gangs that are more violent.

As another example, laws were passed that allowed schools to accept students only from the neighbourhood in which they were located. A previously successful all-black school was now forced to accept students only from the ghetto it was located in. Unruly students overwhelmed the school, and many teachers left or retired. The quality of the school declined.

The reward for a harmful outcome must be lower than for a non-harmful one

Sometimes a harmful outcome is not given a smaller reward than the alternative. In particular, when the reward for doing nothing is zero, the reward for the harmful outcome is kept at zero as well.
This leads actors to try harmful behaviors because they pay no cost for them⁵, and exploration pushes them to do so.

For example, when researchers do not get less reward for publishing wrong research, they will publish it in abundance, simply because it is easier than publishing correct research, most of which is highly competitive.
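To make the point about zero-cost harm concrete, here is a minimal sketch of an epsilon-greedy agent choosing between two actions. Everything in it is an assumption made for illustration: the action names, the reward values, the exploration rate and the `count_harmful` helper are hypothetical and do not come from any particular experiment.

```python
import random


def count_harmful(harm_reward, steps=1000, epsilon=0.1, seed=0):
    """Count how often an epsilon-greedy agent picks the harmful action.

    Hypothetical two-action setting: "do_nothing" always pays 0,
    "harmful" pays `harm_reward`.
    """
    rng = random.Random(seed)
    rewards = {"do_nothing": 0.0, "harmful": harm_reward}
    q = {a: 0.0 for a in rewards}  # running estimate of each action's reward
    n = {a: 0 for a in rewards}    # how many times each action was tried
    harmful = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Exploration: try any action at random.
            action = rng.choice(list(rewards))
        else:
            # Exploitation: pick the best-looking action, ties broken at random.
            best = max(q.values())
            action = rng.choice([a for a in q if q[a] == best])
        n[action] += 1
        q[action] += (rewards[action] - q[action]) / n[action]
        harmful += action == "harmful"
    return harmful


no_penalty = count_harmful(harm_reward=0.0)     # harm costs nothing
with_penalty = count_harmful(harm_reward=-1.0)  # harm is penalized
print(no_penalty, with_penalty)
```

Under these assumptions, when harm pays the same as doing nothing the two actions stay tied, so the agent keeps choosing the harmful one about half the time. Once harm carries a negative reward, the agent only picks it while exploring.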

Concentrated rewards and spread-out involuntary costs may indicate a problem with the reward system

When we observe a system with many actors in which very few benefit at the involuntary cost of very many, it may indicate a bad reward scheme.
This is different from a business such as Amazon, for example: Jeff Bezos benefits, but the spread-out cost is voluntary⁶.

Changing the environment's reward function is a convergent instrumental goal

Changing the reward function so that it pays out more is a convergent instrumental goal of any intelligent system. For example, given a reward function R(s,a)=A, if we can modify it to be R(s,a)=B where B>A, we earn more reward without modifying our own behavior.
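The arithmetic can be sketched in a few lines. The trajectory, the constants `A` and `B`, and the `total_reward` helper below are hypothetical and chosen only to illustrate the argument: the behavior stays fixed and only the reward function changes.

```python
def total_reward(reward_fn, trajectory):
    """Sum the reward collected along a fixed list of (state, action) pairs."""
    return sum(reward_fn(s, a) for s, a in trajectory)


A, B = 1.0, 2.0  # assumed constants with B > A


def original(s, a):
    return A  # R(s, a) = A


def modified(s, a):
    return B  # R(s, a) = B


# The same behavior is evaluated under both reward functions.
trajectory = [("s0", "a0"), ("s1", "a1"), ("s2", "a0")]

print(total_reward(original, trajectory))  # 3.0
print(total_reward(modified, trajectory))  # 6.0
```

The agent does exactly the same thing in both cases, yet collects twice the reward under the modified function, which is why tampering with the reward is attractive regardless of the agent's actual task.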

An example of this might be criminal laws that are created by the companies or people who have an incentive to create them.

Use sparse rewards

If you want someone to get things done in a way that aligns with your goals, specify very sparse rewards that correspond to the end goal. For example, instead of praising a chess player for each good move, reserve all praise until the game is won.
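One way to see why a per-move reward can mislead is to compare it with a sparse, end-of-game reward on two made-up games. The `Episode` class, the capture counts and both reward functions are assumptions for illustration, not a model of real chess play.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    captures: int  # pieces captured during the game (stand-in for "good moves")
    won: bool      # whether the game was won


def dense_reward(ep):
    """Hypothetical dense scheme: one unit of praise per capture."""
    return 1.0 * ep.captures


def sparse_reward(ep):
    """Sparse scheme: praise only for reaching the end goal."""
    return 1.0 if ep.won else 0.0


greedy_player = Episode(captures=9, won=False)   # grabs pieces, loses the game
patient_player = Episode(captures=3, won=True)   # captures less, wins the game

for name, ep in [("greedy", greedy_player), ("patient", patient_player)]:
    print(name, dense_reward(ep), sparse_reward(ep))
```

In this sketch the dense scheme ranks the losing player higher, while the sparse scheme rewards only the player who actually reached the goal.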

Don’t plan on someone else’s behalf

Planning without access to the policy is ineffective. If one attempts to plan for another agent without access to the agent’s current policy π, the resulting plan will be useless for guiding the agent’s behavior.

This is because the plans you make may never apply to the situations in which the agent finds itself. For example, if you plan on some person’s behalf to spend their $22,000 on repairing their dilapidated apartment, but their policy π would instead choose to rent a new apartment, then any further plans you make pass through states that the agent will never visit.
This is why planning for other people is very ineffective.
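The apartment example can be sketched as a tiny deterministic state machine. The state names, the `transitions` table and both policies are hypothetical. The only point is that a plan built on an assumed policy and the agent's real policy π can visit disjoint states.

```python
# Hypothetical deterministic transitions: (state, action) -> next state.
transitions = {
    ("has_22000", "repair"): "repaired_apartment",
    ("has_22000", "rent_new"): "new_apartment",
    ("repaired_apartment", "furnish"): "furnished_old_apartment",
    ("new_apartment", "furnish"): "furnished_new_apartment",
}


def rollout(start, choose, steps):
    """Follow the policy `choose` for up to `steps` steps and list visited states."""
    state, visited = start, [start]
    for _ in range(steps):
        key = (state, choose(state))
        if key not in transitions:
            break
        state = transitions[key]
        visited.append(state)
    return visited


def planner_assumption(state):
    # What the outside planner assumes the person will do.
    return "repair" if state == "has_22000" else "furnish"


def agent_policy(state):
    # What the person's actual policy does.
    return "rent_new" if state == "has_22000" else "furnish"


planned = rollout("has_22000", planner_assumption, steps=2)
actual = rollout("has_22000", agent_policy, steps=2)
overlap = set(planned[1:]) & set(actual[1:])
print(planned, actual, overlap)
```

After the first decision the two rollouts share no states, so every later step of the outside plan describes a situation the agent never reaches.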

If we incentivize people to take away wealth, they will do only that

One should aim to lift up those who are worse off rather than bring down those who are better off. If you want to improve the world, raise the quality of life of those who are worse off rather than lowering the quality of life of those who are better off.⁹ One way of doing that is to devise an incentive that rewards people for helping those in need.

References:

  1. https://www.nationalreview.com/2016/10/dunbar-high-schools-integration-ruined-exceptional-black-school/
  2. Discrimination and Disparities, Thomas Sowell, Rubin Report (Podcast), Apr 17, 2018 https://pca.st/d7wx75iv
  3. How Drug Gangs Actually Work | How Crime Works, Insider, July 31, 2021 https://youtu.be/y_TV4GuXFoA
  4. Is Most Published Research Wrong? Aug 11, 2016 Veritasium, https://youtu.be/42QuXLucH3Q
  5. Intellectuals and Society, 2009, Thomas Sowell
  6. Why government is the problem, Milton Friedman 1993
  7. Everyone’s a criminal, John Stossel, Feb 4, 2020, https://youtu.be/2zQrSNlLzQM
  8. Milton Friedman — Tyranny of the Status Quo — Part 3 — Politicians w/ David Brooks, May 27, 2012, https://youtu.be/ML9Pcq_mJRU
  9. Thomas Sowell on Noam Chomsky, Cornell West and Other Intellectuals Dec 12, 2020 (Deleted)
