We've been wrestling with a specific problem: when you use AI agents to build real software, how do you make sure the output is actually verified? Not just syntactically correct, but traced back to requirements and independently tested.
Most AI coding workflows are just "write this, now test this." The agent that writes the code also writes the tests for the code it just wrote. That's not testing; that's confirmation bias in a loop.
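To make the problem concrete, here is a rough sketch of what a structural (rather than prompt-level) separation could look like. Everything in it is hypothetical and made up for illustration: the role names "implementer" and "tester", and the helpers `context_for` and `check_independence`, are not part of any Skills format or agent framework. The idea is only that the test-writing role never sees the implementation, and that a gate rejects tests written by the implementing agent or tests that don't trace to a requirement.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    req_id: str
    text: str


@dataclass(frozen=True)
class AuthoredTest:
    name: str
    req_id: str       # requirement this test claims to trace back to
    author_role: str  # role of the agent that wrote the test


def context_for(role: str, requirements: list[Requirement], implementation: str) -> dict:
    """Build the context a given (hypothetical) agent role is allowed to see."""
    ctx = {"requirements": [f"{r.req_id}: {r.text}" for r in requirements]}
    if role == "implementer":
        ctx["implementation"] = implementation
    # Any other role, including "tester", gets requirements only, so its
    # tests cannot be derived from the code under test.
    return ctx


def check_independence(
    tests: list[AuthoredTest],
    requirements: list[Requirement],
    implementer_role: str = "implementer",
) -> list[str]:
    """Return a list of violations; an empty list means the gate passes."""
    known = {r.req_id for r in requirements}
    problems = []
    for t in tests:
        if t.author_role == implementer_role:
            problems.append(f"{t.name}: written by the implementing agent")
        if t.req_id not in known:
            problems.append(f"{t.name}: does not trace to a known requirement")
    covered = {t.req_id for t in tests}
    for r in requirements:
        if r.req_id not in covered:
            problems.append(f"{r.req_id}: no test covers this requirement")
    return problems
```

In a sketch like this, a pipeline would call `context_for("tester", ...)` when prompting the test-writing agent and refuse to merge unless `check_independence(...)` returns an empty list. It only checks provenance and traceability; it says nothing about whether the tests themselves are any good.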
Curious whether others have tried to enforce test independence structurally in agentic workflows, and whether people are actually using the Skills format (vs. system prompts or tool definitions) or think it's the right abstraction for this.