Compare model performance across LMArena and GDPval occupational benchmarks
Human preference rankings from lmarena.ai
| Model | Overall | Expert | Hard Prompts | Coding | Math | Creative | Instruction | Long Query |
|---|---|---|---|---|---|---|---|---|
| | 14 | - | 12 | 17 | - | 21 | 14 | 16 |
| | 19 | 13 | 18 | 18 | 37 | 19 | 18 | 19 |
| | 8 | 8 | 11 | 14 | 5 | 11 | 9 | 11 |
| | 18 | 5 | 16 | 12 | 1 | 43 | 15 | 22 |
| | 20 | 17 | 23 | 26 | 14 | 48 | 33 | 50 |
| | 15 | 41 | 36 | 41 | 44 | 13 | 16 | 21 |
| | 17 | 47 | 20 | 34 | 57 | 17 | 21 | 28 |
| | 1 | 3 | 1 | 3 | 2 | 1 | 3 | 3 |
| | 2 | 7 | 5 | 8 | 3 | 2 | 6 | 6 |
| | 7 | 12 | 7 | 10 | 4 | 8 | 11 | 8 |
| | 9 | 14 | 15 | 24 | 11 | 4 | 10 | 10 |
| | 5 | 1 | 3 | 4 | 6 | 3 | 2 | 1 |
| | 12 | 6 | 10 | 7 | 20 | 7 | 7 | 7 |
| | 16 | 16 | 13 | 9 | 17 | 12 | 8 | 9 |
| | 4 | 2 | 2 | 1 | 7 | 5 | 1 | 2 |
| | 10 | 4 | 6 | 2 | 8 | 9 | 4 | 4 |
| | 11 | 10 | 8 | 5 | 12 | 6 | 5 | 5 |
| | 3 | 9 | 4 | 6 | 9 | 10 | 12 | 15 |
| | 6 | 19 | 9 | 13 | 19 | 15 | 17 | 13 |

Data sources: LMArena | OpenAI GDPval
Last updated: December 2024