2 Comments

When you talk about training RL models for goal-directedness, isn't that worrying, even with the best intentions? You touch on this in the risks part with "desirable objectives" and "nefarious objectives". My line of thought goes to two points: 1) Lawmakers and legislators have a really hard time doing this kind of work because it affects their society's sovereignty and culture directly, often for generations; it's a really tricky thing to get right. 2) Training RL models with decent capabilities costs a lot of computational power (i.e., millions of dollars), so specific "desirable" goals can become unfeasible to retrain and patch once we discover they weren't healthy even though they seemed like a really good idea on the surface (think of food additives we found out years later caused cancer), and those "desirable" goals can become ingrained in society and, like old legacy systems, hard to replace for the better.

Because point 1 is a hard thing to do, and point 2 is a dangerous place to be (and we don't know we are there until we are there), shouldn't lawmakers and AI researchers work together to define guidelines, or even laws, on what goals you are allowed to train your RL/GAN systems on?


Just like the examples mentioned that use the output of a language model as a model of human behavior, I've recently been trying to use a language model as a regularizer for an image autoencoder with a discrete hidden representation. The idea is that the hidden code can look like a text sentence, and the regularization penalty is the negative log-probability of the produced code under the language model.
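
A minimal sketch of what that penalty might look like, assuming the encoder already emits discrete token ids in the language model's vocabulary (the model choice, `lm_penalty` helper, and weighting are my illustration, not the commenter's actual setup):

```python
# Sketch: language-model regularizer for an autoencoder with a discrete code.
# Assumes code_ids are token ids compatible with the LM's tokenizer/vocabulary.
import torch
import torch.nn.functional as F
from transformers import GPT2LMHeadModel

lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()
for p in lm.parameters():
    p.requires_grad_(False)  # the regularizer is fixed; only the autoencoder trains

def lm_penalty(code_ids: torch.LongTensor) -> torch.Tensor:
    """Mean negative log-probability of the discrete code under the LM
    (lower = the code looks more like a plausible text sentence)."""
    out = lm(input_ids=code_ids, labels=code_ids)
    return out.loss  # cross-entropy == average negative log-likelihood per token

def total_loss(x, x_recon, code_ids, lam=0.1):
    recon = F.mse_loss(x_recon, x)   # autoencoder reconstruction term
    reg = lm_penalty(code_ids)       # LM regularization term
    return recon + lam * reg
```

Note that gradients do not flow through hard token ids, so for the penalty to actually shape the encoder one would presumably need a straight-through estimator or a Gumbel-softmax relaxation of the discrete code; the sketch above only shows how the penalty term itself is computed.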
