
I’m skeptical this would work better in production than RLHF. If the agent makes a mistake, how is it supposed to know that it needs to correct itself, and how would it understand what it did wrong so that it doesn't repeat it? It seems better to have it try again recursively until it finds the solution, like a human would.




