This paper flew under the radar - Large Language Monkeys. In the paper, they scored 56% on SWE-bench Lite using DeepSeek-Coder-V2 and 300 samples, which is much higher than the previous SOTA (43%).
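For anyone curious what "300 samples" means in practice: the core idea is repeated sampling - draw many independent attempts per problem and count it solved if any attempt passes verification (unit tests for SWE-bench). Here's a minimal sketch of that loop, not the authors' code; `generate_patch` and `passes_tests` are hypothetical stand-ins for a model call and a test harness.

```python
from typing import Callable, Optional


def solve_with_repeated_sampling(
    problem: str,
    generate_patch: Callable[[str], str],      # hypothetical: one sampled model attempt
    passes_tests: Callable[[str, str], bool],  # hypothetical: run the repo's unit tests
    num_samples: int = 300,
) -> Optional[str]:
    """Draw independent samples; the problem counts as solved if any attempt passes."""
    for _ in range(num_samples):
        candidate = generate_patch(problem)
        if passes_tests(problem, candidate):
            return candidate  # first verified success
    return None  # unsolved within the sample budget
```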
by /u/jd_3d in /r/LocalLLaMA
Upvotes: 27