Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Stop thinking of models as a 'normal' human with a single identity. Think of it instead as thousands, maybe tens of thousands of human identities mashed up in a machine monster. Depending on how you talk to it you generally get the good models as they try to train the bad modes out, problem is there are a nearly uncountable means to talking to the model to find modes we consider negative. It's one of the biggest problems in AI safety.


Consider applying for YC's Summer 2026 batch! Applications are open till May 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: