Your Reward function matters or Good intentions can lead to terrible outcomes: Specification gaming is when rewards go wrong, that is, when the reward we specify results in behavior different from the one we desired. As in… (Oct 5, 2021)
Superstition in Reinforcement Learning systems: We bias a learner by restricting it to choosing a predictor from a given hypothesis class H, chosen before the learner has seen the data… (Oct 5, 2021)
Quick decisions can be made to complex problems using extensive background planning: A decision on a complex problem does not necessarily have to take long; it can be lightning fast. This is because coming up with a policy… (Oct 5, 2021)
Data Augmentation with the help of a model during Planning: This is an idea I had about using a model for data augmentation (it can be applied in model-based RL). (Oct 5, 2021)
Principle of Maximum Likelihood or how to find Neural network weights: What we’re really after is the distribution of weights W. Now, even a single w will do, so trivially the distribution can be P(w) = 1 for a… (Oct 5, 2021)
How to refactor large, HUGE, complex codebases: This is a note to any software developer who is stuck refactoring a 100k+ LOC C or Java project, where changing anything at all will break… (Apr 3, 2021)
Why, unfortunately, Medium Partner Program will not succeed: In this article we attempt to examine, from a systems theory point of view, whether an innovative revenue model in use by Medium.com… (Mar 16, 2018)
Why Medium’s “clap” feature is a good thing: Claps replace recommendations on Medium. What initially seems to be a bad idea turns out to be good, but only if you actually know these… (Aug 21, 2017)
Proposal to pronounce “www” as “triple-u”: Currently we pronounce the “www” in most URLs on the internet as “double-u double-u double-u”. This is unnecessarily long and complicated. My… (Aug 19, 2017)