ok but how are you sure that the AI is correctly turning the spec into tests. if it makes a mistake there and then builds the code in accordance with the mistaken test you only get the Illusion of a correct implementation
You use the specs to generate the tests, and you review the changes.
If they have built a thing they can't maintain that isn't bully for them and we should feel sorry that their best efforts aren't working well enough - it's proof that that thing (or at least the way they built that thing) isn't feasible.
Edit: Personally I think betting on war is immoral and should be condemned by all sane people, but saying that everything needs 100% safety and 100% no harm is very naive.
Prior to this, it was already known to produce false feedback and confessions. The US military has a strange way of repeating history to see if it'll turn out differently "this time." It sadly never does.
I don't think that follows at all, I'm currently working through Blue Prince and the way in which that game gives you information later on which completely recontextualizes things you have encountered earlier makes it so that a screenshot of something could definitely rob you of experiencing that moment, which is a big part of the joy of that game.