Thanks for that, sounds like you really know what you are talking about.
Probably for now the best solution would be to make this form of cheating a bit hard, and then living with it, and having provisions on the ladder server to erase all scores based on a cheating genome if it is detected.
That will get you a long way, certainly. Obviously I'm only referring to a perfect technical solution being impossible; social solutions are very feasible. Getting crap past an automated detector is easy, however most of it will also be readily visible to human eyes.
And given that you are willing to live with that, Python is otherwise a pretty good language, very easy to pick up, etc. I love this in general, I'm explaining this so you don't end up burning time in a tarpit and instead use it to build something that will actually work. :)
After some thought, I agree with your assessment. You could set up a server to make people mostly play nice, but in the current stage of development, a gentleman's agreement is probably the right choice.
If you sandboxed the Python process running agent code to limit the mischief to that particular process, that would go a pretty long way, and that sort of thing is certainly possible by restricting syscalls at the OS level.
Probably for now the best solution would be to make this form of cheating a bit hard, and then living with it, and having provisions on the ladder server to erase all scores based on a cheating genome if it is detected.