
LLM Coding Leaderboard: May 15th (11 Models Tested)

May 15, 2026

I've compiled all my recent tests with 11 different LLMs into one summary table. The results are based on 3 different Laravel projects that I published on my YouTube channel over the last few weeks.

Each prompt was run 5 times on the same project, for a maximum of 15 points (3 projects × 5 runs).

If all evaluation tests passed, the LLM got 1 point for the task. If at least one test failed, the LLM got 0 points for that task.
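The scoring rule above can be sketched in a few lines of Python. This is only an illustration of the all-or-nothing rule, not the actual evaluation harness: the function names, the project keys, and the sample pass/fail data are all hypothetical, and treating a run with no tests as 0 points is my own assumption.

```python
# Minimal sketch of the all-or-nothing scoring rule.
# All names and sample data are hypothetical, for illustration only.

PROJECTS = 3          # number of Laravel projects
RUNS_PER_PROJECT = 5  # each prompt is run 5 times per project
MAX_POINTS = PROJECTS * RUNS_PER_PROJECT  # 15


def score_run(test_results: list[bool]) -> int:
    """Return 1 if every evaluation test passed, otherwise 0.

    Assumption: a run with no test results counts as 0 points.
    """
    return 1 if test_results and all(test_results) else 0


def score_model(runs_by_project: dict[str, list[list[bool]]]) -> int:
    """Sum the points over every run of every project."""
    return sum(
        score_run(run)
        for runs in runs_by_project.values()
        for run in runs
    )


# Hypothetical results for one model: each inner list is one run,
# each bool is one evaluation test (True = passed).
sample = {
    "project_a": [[True, True]] * 5,                      # 5 points
    "project_b": [[True, False], [True, True], [True, True],
                  [False, False], [True, True]],          # 3 points
    "project_c": [[True]] * 4 + [[False]],                # 4 points
}

total = score_model(sample)
print(f"{total}/{MAX_POINTS}")  # prints "12/15"
```

Note that a single failing test zeroes the whole run, so a run that passes 9 of 10 tests scores the same as one that passes none.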

So this is the summary table.

Premium members see the full table with these LLMs tested, in alphabetical order:

Deepseek-V4-Pro / GLM-5.1 / Kimi K2.6 / MiMo 2.5 Pro / Minimax M2.7 / Qwen 3.6 Plus / Sonnet 4.6

I also tested Grok 4.3, but it performed so badly that I decided NOT to include it in this leaderboard.

I will keep testing models: I'll come up with new evaluation tasks and update the leaderboard when new LLMs are released.


Povilas Korop
