AI Model Benchmarks | PandaiTech

AI Model Benchmarks

Compare model performance across LMArena human preference rankings and GDPval occupational benchmarks

Tier Legend:
S: Best (90%+)
A: Excellent (75%+)
B: Good (50%+)
C: Average (25%+)
D: Weak (below 25%)
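
The legend reads as a threshold lookup. A minimal Python sketch, assuming each percentage is an inclusive lower bound and that tier D covers everything below 25%; the name `tier_for` is hypothetical and not taken from the site, and the page does not say what the percentage measures:

```python
# Hypothetical helper: maps a percentage score to the legend's tier letter.
# Assumption: thresholds are inclusive lower bounds; D is anything under 25%.
TIER_THRESHOLDS = [(90.0, "S"), (75.0, "A"), (50.0, "B"), (25.0, "C")]


def tier_for(score_percent: float) -> str:
    """Return the tier letter for a score given in percent."""
    for threshold, tier in TIER_THRESHOLDS:
        if score_percent >= threshold:
            return tier
    return "D"
```

Keeping the thresholds in one ordered list means a change to the legend only needs a change to the data, not to the function.
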

Claude Opus 4.5 (Claude)
Finance & Business, Healthcare, Instruction Following, Expert Tasks, Project & Operations Management, Sales & Customer Service, Creative Writing, Engineering & Manufacturing, Mathematics & Logic, Media & Communications, Real Estate, Coding & Programming, Legal & Compliance, Social Services
#1 at LMArena: Expert, Longer Query
#1 at GDPval: Compliance Officers, Financial Sales, Healthcare Managers, Industrial Engineers, Investigators, Office Supervisors, Operations Managers, Order Clerks, Producers, Rental Clerks, Sales Supervisors, Software Developers

Gemini 3 Pro (Gemini)
Mathematics & Logic, Instruction Following, Expert Tasks, Engineering & Manufacturing, Creative Writing, Coding & Programming, Social Services, Finance & Business, Sales & Customer Service, Real Estate, Project & Operations Management, Healthcare, Legal & Compliance
#1 at LMArena: Creative Writing, Hard Prompts, Overall
#1 at GDPval: Financial Advisors, Lawyers, Medical Admin, Wholesale Sales

GPT-5.2 (ChatGPT)
Legal & Compliance, Sales & Customer Service, Social Services, Finance & Business, Real Estate, Engineering & Manufacturing, Media & Communications, Healthcare, Project & Operations Management, Coding & Programming, Mathematics & Logic
#1 at GDPval: AV Technicians, Accountants, Admin Managers, Concierges, Customer Service, Editors, Financial Analysts, Financial Managers, IT Managers, Inventory Clerks, Investigators, Journalists, Lawyers, Mechanical Engineers, Nurse Practitioners, Nurses, Office Supervisors, Overall, Police Supervisors, Project Managers, Property Managers, Purchasing Agents, Real Estate Agents, Real Estate Brokers, Recreation Workers, Retail Supervisors, Sales Managers, Sales Supervisors, Social Workers, Technical Sales, Video Editors, Wholesale Sales

Claude Sonnet 4.5 (Claude)
Media & Communications, Project & Operations Management, Real Estate, Creative Writing, Expert Tasks, Coding & Programming, Instruction Following, Social Services, Engineering & Manufacturing, Sales & Customer Service
#1 at GDPval: Pharmacists, Production Supervisors, Wholesale Sales

Gemini 3 Flash (Gemini)
Mathematics & Logic, Creative Writing, Instruction Following, Expert Tasks, Coding & Programming

Claude Opus 4.5 Thinking (Claude)
Instruction Following, Expert Tasks, Mathematics & Logic, Coding & Programming, Creative Writing
#1 at LMArena: Coding, Instruction Following

Claude Sonnet 4.5 Thinking (Claude)
Instruction Following, Expert Tasks, Coding & Programming, Mathematics & Logic, Creative Writing

Claude Opus 4.1 (Claude)
Legal & Compliance, Finance & Business, Coding & Programming, Real Estate

GPT-5.2 High (ChatGPT)
Expert Tasks, Mathematics & Logic, Coding & Programming
#1 at LMArena: Math

GPT-5 High (ChatGPT)
Media & Communications, Healthcare, Sales & Customer Service

Gemini 3 Flash Thinking (Gemini)
Mathematics & Logic, Instruction Following, Creative Writing

Claude Opus 4.1 Thinking (Claude)
Instruction Following, Coding & Programming, Creative Writing

Grok 4.1 Thinking (Grok)
Mathematics & Logic, Coding & Programming, Expert Tasks

LMArena Rankings

Human preference rankings from lmarena.ai

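Model                     Overall  Expert  Hard Prompts  Coding  Math  Creative  Instruction  Long Query
(name missing)            14       -       12            17      -     21        14           16
(name missing)            19       13      18            18      37    19        18           19
(name missing)            8        8       11            14      5     11        9            11
GPT-5.2 High              18       5       16            12      1     43        15           22
(name missing)            20       17      23            26      14    48        33           50
(name missing)            15       41      36            41      44    13        16           21
(name missing)            17       47      20            34      57    17        21           28
Gemini 3 Pro              1        3       1             3       2     1         3            3
(name missing)            2        7       5             8       3     2         6            6
(name missing)            7        12      7             10      4     8         11           8
(name missing)            9        14      15            24      11    4         10           10
Claude Opus 4.5           5        1       3             4       6     3         2            1
(name missing)            12       6       10            7       20    7         7            7
(name missing)            16       16      13            9       17    12        8            9
Claude Opus 4.5 Thinking  4        2       2             1       7     5         1            2
(name missing)            10       4       6             2       8     9         4            4
(name missing)            11       10      8             5       12    6         5            5
(name missing)            3        9       4             6       9     10        12           15
(name missing)            6        19      9             13      19    15        17           13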
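
The "#1 At" labels on the model cards appear to correspond to a rank of 1 in a column of this table. A minimal Python sketch of that lookup, assuming each rank row is whitespace-separated with "-" marking a missing rank and no model name in front; `parse_row` and `top_ranked` are hypothetical names, and the three sample rows are copied from the table:

```python
from typing import Optional

COLUMNS = ["Overall", "Expert", "Hard Prompts", "Coding", "Math",
           "Creative", "Instruction", "Long Query"]


def parse_row(line: str) -> list[Optional[int]]:
    """Turn one whitespace-separated rank row into ints; '-' becomes None."""
    return [None if cell == "-" else int(cell) for cell in line.split()]


def top_ranked(rows: list[list[Optional[int]]]) -> dict[str, int]:
    """Map each column name to the index of the row that holds rank 1."""
    leaders = {}
    for row_index, row in enumerate(rows):
        for name, rank in zip(COLUMNS, row):
            if rank == 1:
                leaders[name] = row_index
    return leaders


# Three rank rows taken from the table (model names omitted).
rows = [parse_row(line) for line in [
    "14 - 12 17 - 21 14 16",
    "18 5 16 12 1 43 15 22",
    "1 3 1 3 2 1 3 3",
]]
leaders = top_ranked(rows)
print(leaders)
```

Columns with no rank 1 among the supplied rows are simply absent from the result, so a partial table yields a partial mapping rather than an error.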

Data sources: LMArena | OpenAI GDPval

Last updated: December 2024