SWE-bench

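A coding benchmark that tests an AI model's ability to resolve real-world software issues drawn from open source repositories. Each task gives the model a repository and an issue description; the model must produce a patch, which is judged by running the repository's tests. Used to compare the practical coding ability of different AI models.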
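
As a rough illustration of how a score on a benchmark of this kind can be computed, the sketch below counts a task as resolved only when every test that the fix is meant to repair now passes and every previously passing test still passes. The `TaskResult` record, its field names, and both helper functions are hypothetical names chosen for this example, not part of any official tooling.

```python
from dataclasses import dataclass, field


@dataclass
class TaskResult:
    """Hypothetical record of test outcomes for one task after applying a model's patch."""
    task_id: str
    # Tests that failed before the fix and should pass after it (test name -> passed?).
    fail_to_pass: dict[str, bool] = field(default_factory=dict)
    # Tests that passed before the fix and must keep passing (test name -> passed?).
    pass_to_pass: dict[str, bool] = field(default_factory=dict)


def is_resolved(result: TaskResult) -> bool:
    """A task counts as resolved only if the bug is fixed and nothing else broke."""
    # Assumption: a task with no fail-to-pass tests cannot demonstrate a fix.
    if not result.fail_to_pass:
        return False
    return all(result.fail_to_pass.values()) and all(result.pass_to_pass.values())


def resolved_rate(results: list[TaskResult]) -> float:
    """Fraction of tasks resolved; 0.0 for an empty list."""
    if not results:
        return 0.0
    return sum(is_resolved(r) for r in results) / len(results)
```

Comparing two models then comes down to running both on the same task set and comparing their resolved fractions.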