We can run our system on the challenges in Aider's polyglot benchmark and compare our results against Aider's for the same models to see whether we improve on them: https://github.com/Aider-AI/polyglot-benchmark
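
One possible way to make the comparison concrete is a per-model pass-rate delta, computed only over the exercises both systems attempted. The sketch below is an assumption about how we might tabulate results, not something the benchmark repository provides: the `Comparison` class, the `pass_rate` and `compare` functions, and the result layout (model name mapped to exercise name mapped to pass/fail) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    """Pass rates for one model on the exercises both systems attempted."""
    model: str
    ours: float
    aider: float
    shared_exercises: int

    @property
    def delta(self) -> float:
        # Positive means our system passed a larger fraction than Aider.
        return self.ours - self.aider


def pass_rate(results: dict[str, bool], exercises: set[str]) -> float:
    """Fraction of the given exercises marked as passed in `results`."""
    if not exercises:
        raise ValueError("no exercises to score")
    return sum(results[name] for name in exercises) / len(exercises)


def compare(
    ours: dict[str, dict[str, bool]],
    aider: dict[str, dict[str, bool]],
) -> list[Comparison]:
    """Compare per-model pass rates.

    Both arguments use a hypothetical layout: model name -> exercise name -> passed.
    Models or exercises present on only one side are skipped so that the
    two pass rates are computed over the same set of exercises.
    """
    comparisons = []
    for model in sorted(ours.keys() & aider.keys()):
        shared = ours[model].keys() & aider[model].keys()
        if not shared:
            continue
        comparisons.append(
            Comparison(
                model=model,
                ours=pass_rate(ours[model], shared),
                aider=pass_rate(aider[model], shared),
                shared_exercises=len(shared),
            )
        )
    return comparisons
```

Restricting the score to shared exercises keeps the comparison fair if one run is partial. How the pass/fail results are collected from each system is left open here.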