“Intrinsic rewards are the self-supervised pretraining stage for reinforcement learning agents.”
One of the most critical challenges in training AI systems today is finding scalable supervision signals. As models grow in complexity and their applications extend into increasingly nuanced domains, providing supervision to lead them towards performing well on a task becomes more difficult.