LLM Coding Leaderboard

Last update: July 9th, 2026

Results are based on 5 different PHP/Laravel/React projects from my YouTube channel, tested over the last few weeks.

#	Model	Total points (max 25)	Avg time per prompt	Avg cost per prompt	Tested with	Filament (max 5)	API (max 5)	Fluent (max 5)	React (max 5)	CSV Import (max 5)
1	Opus 4.8 (Medium)	24.5	02:00	$0.74	Claude Code	5	5	5	5	4.5
2	GPT-5.5 (Medium)	24.5	04:00	$1.22	Codex CLI	5	5	5	5	4.5
3	GPT-5.4 (Medium)	21.5	03:56	$0.47	Codex CLI	5	5	3	5	3.5
4	Grok 4.5	21.2	01:53	$0.28	OpenCode	5	5	3	5	3.2
5	GPT-5.4-Mini (Medium)	20.7	03:58	$0.18	Codex CLI	5	3	5	5	2.7
6	Kimi K2.7 Code	20	05:36	$0.37	OpenCode	3	4	5	5	3
7	Composer 2.5	19.2	01:42	$0.17	Cursor	3	5	5	4	2.2
8	Gemini-3.5-Flash (High)	19	04:33	$1.13	OpenCode	5	5	3	5	1
9	Gemini-3.1-Pro	18.9	02:01	$0.35	OpenCode	3	5	4	5	1.9
10	Minimax M3	18.5	08:11	$0.31	OpenCode	3	4	3	5	3.5
11	GLM-5.2	17.7	05:28	$0.27	OpenCode	5	5	1	4	2.7
12	Sonnet 5 (Medium)	16.5	02:01	$0.72	Claude Code	0	4	3	5	4.5
13	Sonnet 4.6 (Medium)	16.4	02:41	$0.49	Claude Code	3	4	3	5	1.4
14	MiMo 2.5 Pro	15.4	03:53	$0.16	OpenCode	3	2	4	4	2.4
15	Kimi K2.6	15.1	04:22	$0.18	OpenCode	4	3	3	4	1.1
16	Deepseek-V4-Flash (High)	13.9	01:54	$0.01	OpenCode	1	4	2	4	2.9
17	Qwen 3.7 Max	11.7	06:38	$0.40	OpenCode	0	4	2	3	2.7
18	Deepseek-V4-Pro (High)	11.2	03:35	$0.09	OpenCode	0	4	3	2	2.2
19	Tencent Hy3 (High)	10.4	01:54	$0.00	OpenCode	0	3	2	4	1.4
20	Qwen 3.7 Plus	8.3	03:41	$0.06	OpenCode	2	4	1	0	1.3

Support Leaderboard Updates

Running dozens of prompts on multiple projects with different LLMs is costly.
If you want to support my mission and help keep the leaderboard updated with new models and variants, subscribe to the Premium membership of AI Coding Daily.

Explanations

Costs are calculated from API pricing.
GPT/Opus were tested only on Medium effort: that was enough to take the top spots, so High effort wasn't needed.

Updates

July 9th: added Grok 4.5
July 8th: added Tencent Hy3
July 1st: added Sonnet 5
June 24th: added GPT-5.4-Mini and Gemini-3.5-Flash
June 23rd: added 5th benchmark project, removed Opus 4.7 and GLM-5.1, added GPT-5.4
June 17th: added GLM-5.2
June 13th: added Kimi K2.7 Code
June 5th: added Deepseek-V4-Flash and removed Minimax M2.7
June 4th: added Qwen 3.7 Plus
June 2nd: added Qwen 3.7 Max and "Tested with" column
June 2nd: added Minimax M3
May 30th: added Avg Cost for all LLMs (based on API pricing)
May 29th: added Claude Opus 4.8
May 24th: added 4th benchmark project - for React/TypeScript
May 20th: added Composer 2.5

Explanation / Methodology

Each prompt was run 5 times on the same project, for a maximum of 5 points per project. Across the 5 projects, that makes a maximum of 25 points in total.

If all evaluation tests passed, the LLM got 1 point for that run. If at least one test failed, the LLM got 0 points for that run.
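The scoring rule can be sketched in a few lines of Python. This is only an illustration of the all-or-nothing rule as described here, not the actual evaluation harness. The function names and the sample data are hypothetical, and treating a run with no test results as 0 points is an assumption. The table also contains fractional values (such as 4.5), which this sketch does not model.

```python
from typing import Dict, List

RUNS_PER_PROJECT = 5  # each prompt is run 5 times per project


def score_run(test_results: List[bool]) -> int:
    """Return 1 if every evaluation test passed, otherwise 0.

    Assumption: a run with no test results counts as 0 points.
    """
    return 1 if test_results and all(test_results) else 0


def score_project(runs: List[List[bool]]) -> int:
    """Sum the points over all runs of one project (max 5)."""
    return sum(score_run(run) for run in runs)


def total_points(projects: Dict[str, List[List[bool]]]) -> int:
    """Sum the project scores (max 25 with 5 projects)."""
    return sum(score_project(runs) for runs in projects.values())


# Hypothetical results: each inner list holds the test outcomes of one run.
all_pass = [True, True, True]
one_fail = [True, False, True]

sample = {
    "Filament": [all_pass] * 5,               # 5 points
    "API": [all_pass] * 4 + [one_fail],       # 4 points
    "Fluent": [all_pass] * 3 + [one_fail] * 2,  # 3 points
}

for name, runs in sample.items():
    print(name, score_project(runs))
print("Total:", total_points(sample))
```

A single failing test zeroes out the whole run, so a model that is "almost right" on every run can still score 0 for a project.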

The table at the top of this page is the summary of those results.

I will keep testing models: I will come up with new evaluation tasks and update the leaderboard when new LLMs are released.
