With the increasing playing strength of chess engines the draw rate of matches also increases. Between top engines this often goes up to 80-90%, see for instance the TCEC super final of 2021.
Does this mean progress becomes harder and harder? Certainly. Does it mean that, through the law of diminishing returns, progress will sooner or later become hard to measure as the draw rate between engines reaches 95% or worse? In the 90's some said that all games would be drawn once top engines reached a search depth of 14 plies. We now know that is not true. In fact the end is nowhere in sight, as we will demonstrate by playing time-odds matches with the current 2 top engines, Stockfish 14 and Komodo Dragon 2.5.
About 2 years ago we already demonstrated with time-odds matches how strong Stockfish 11 is and how large a time-odds factor was needed before its nearest competitors could beat it. We now do it the other way around: we play SF14 vs SF14 matches with time-odds of factor 2, 4, 8 and 16 and measure the elo progress as an indication of how much room there is for improvement and how much the draw rate will drop.
Balanced openings
Time odds | Stockfish 14 | Dragon 2.5 |
Equal time | 50.0% | 50.0% |
Factor 2 | 55.4% | 57.4% |
Factor 4 | 59.9% | 64.1% |
Factor 8 | 62.5% | 70.7% |
Factor 16 | 66.8% | 74.4% |
PGN
ORDO calculation
Time odds | Stockfish 14 | Dragon 2.5 |
Equal time | 3565 | 3552 |
Factor 2 | 3602 | 3637 |
Factor 4 | 3635 | 3714 |
Factor 8 | 3654 | 3789 |
Factor 16 | 3687 | 3833 |
____________________________________________________________________________________
Technical
1. The four matches for Komodo and for Stockfish are each played with a base time control of 40/40 (one second per move on average).
Match-1 : SF14 (40/40) vs SF14 (40/80) - one-vs-two-seconds
Match-2 : SF14 (40/40) vs SF14 (40/160) - one-vs-four-seconds
Match-3 : SF14 (40/40) vs SF14 (40/320) - one-vs-eight-seconds
Match-4 : SF14 (40/40) vs SF14 (40/640) - one-vs-sixteen-seconds
Likewise for Komodo.
2. To know how strong Stockfish 14 and Komodo Dragon 2.5 are at 40/40 we play 2 time-odds matches at GRL time control (40/120) against 3 of their nearest competitors.
. Stockfish 14 at 40/40 rates : 3565
. Komodo Dragon 2.5 at 40/40 rates : 3552
So we now roughly know at which elo Komodo and Stockfish play at one second average, and from that we can calculate an imaginary elo (imaginary because the results are based on self-play) for both engines with ORDO, as listed above.
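As a rough cross-check on such ratings, a match score can be converted to an elo difference with the standard logistic formula and added to the anchor rating. The sketch below is not ORDO itself (ORDO fits all games jointly), so its output need not coincide with the figures listed above. The function names `elo_diff` and `imaginary_elo` are made up for illustration. The anchor of 3565 is the Stockfish 14 rating at 40/40 given above.

```python
import math

def elo_diff(score: float) -> float:
    """Elo difference implied by a score fraction (0 < score < 1), logistic model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))

def imaginary_elo(anchor: float, score: float) -> float:
    """Rating of the side that scored `score` against an opponent rated `anchor`."""
    return anchor + elo_diff(score)

# Stockfish 14 with factor 2 time odds scored 55.4% against itself at 40/40.
print(round(imaginary_elo(3565, 0.554)))
```

Because the formula only looks at one match at a time, it is best read as a sanity check on the order of magnitude, not as a replacement for the ORDO calculation.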
____________________________________________________________________________________
Draw rate overview
Draw rate balanced openings
Conclusions
1. Despite the high draw rates of the two current strongest chess engines there is still plenty of room for improvement, as engines playing at ~3550 elo can still lose significantly.
2. Komodo Dragon scales a lot better than Stockfish.
___________________________________________________________________________________________________
The issue of diminishing returns
Ideally we should also play more matches to measure the effects of the diminishing returns phenomenon and so we also play:
. Factor 2 vs Factor 4, 8 and 16
. Factor 4 vs Factor 8 and 16
. Factor 8 vs Factor 16
And calculate the diminishing returns.
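The extra pairings listed above are simply every combination of two different time factors. A minimal sketch, assuming the 40/40 base control described earlier (the names `pairings` and `BASE_SECONDS` are illustrative, not taken from the test setup):

```python
from itertools import combinations

FACTORS = [2, 4, 8, 16]   # time-odds factors relative to the base control
BASE_SECONDS = 40         # base control 40/40, one second per move on average

def pairings(factors):
    """All matches between two different time factors, shorter time first."""
    return list(combinations(sorted(factors), 2))

for shorter, longer in pairings(FACTORS):
    print(f"40/{BASE_SECONDS * shorter} vs 40/{BASE_SECONDS * longer}")
```

With four factors this yields six matches per engine, the same six rows that appear in the result tables.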
Balanced openings
Time odds | Stockfish 14 | Dragon 2.5 |
2 secs vs 4 secs | 55.1% | 56.2% |
2 secs vs 8 secs | 56.6% | 63.3% |
2 secs vs 16 secs | 61.4% | 66.5% |
4 secs vs 8 secs | 52.2% | 55.1% |
4 secs vs 16 secs | 54.6% | 61.8% |
8 secs vs 16 secs | 53.4% | 53.4% |
Draw rate
Time odds | Stockfish 14 | Dragon 2.5 |
2 secs vs 4 secs | 83.4% | 80.0% |
2 secs vs 8 secs | 83.2% | 70.6% |
2 secs vs 16 secs | 74.8% | 62.0% |
4 secs vs 8 secs | 88.4% | 81.0% |
4 secs vs 16 secs | 86.8% | 73.2% |
8 secs vs 16 secs | 89.2% | 91.9% |
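A score and a draw rate together determine how the decisive games were split, since score = wins + draws / 2. The sketch below assumes the percentages in the score table are from the point of view of the side with more time. The helper name `win_draw_loss` is hypothetical.

```python
def win_draw_loss(score: float, draw_rate: float) -> tuple[float, float, float]:
    """Split a score fraction and a draw rate into (win, draw, loss) fractions.

    score = wins + draws / 2, so wins = score - draws / 2.
    """
    wins = score - draw_rate / 2.0
    losses = 1.0 - draw_rate - wins
    if wins < 0 or losses < 0:
        raise ValueError("score and draw rate are inconsistent")
    return wins, draw_rate, losses

# Stockfish 14, 2 secs vs 4 secs: 55.1% score with 83.4% draws.
w, d, l = win_draw_loss(0.551, 0.834)
print(f"wins {w:.1%}, draws {d:.1%}, losses {l:.1%}")
```

For that row the side with more time wins about 13.4% of the games and still loses about 3.2%, which shows that even at these draw rates the decisive games are not all one-way.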
After the 10,000 games each engine played we can take stock, calculate an imaginary elo (imaginary because the results are based on self-play) and view the diminishing returns for each engine.
Diminishing Returns for Stockfish 14
ORDO calculation
Engine | Rating | Gain | Games |
Sixteen seconds | 3679 | +27 | 1000 |
Eight seconds | 3652 | +15 | 1750 |
Four seconds | 3637 | +35 | 2250 |
Two seconds | 3602 | +37 | 2250 |
One second | 3565 | | 2750 |
Diminishing Returns for Komodo Dragon 2.5
ORDO calculation
Engine | Rating | Gain | Games |
Sixteen seconds | 3732 | +34 | 1000 |
Eight seconds | 3698 | +44 | 1750 |
Four seconds | 3654 | +47 | 2250 |
Two seconds | 3607 | +55 | 2250 |
One second | 3552 | | 2750 |
As we can see (most clearly from the Komodo results) the elo gain tends to shrink with each doubling of time.
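The Gain column in the two tables is the difference between consecutive ratings, which makes the trend easy to check. A small sketch using the ratings from those tables (the function names are illustrative):

```python
# ORDO ratings from the two tables above, ordered from one to sixteen seconds.
RATINGS = {
    "Stockfish 14": [3565, 3602, 3637, 3652, 3679],
    "Komodo Dragon 2.5": [3552, 3607, 3654, 3698, 3732],
}

def gains_per_doubling(ratings):
    """Elo gained at each doubling of thinking time."""
    return [later - earlier for earlier, later in zip(ratings, ratings[1:])]

def strictly_diminishing(gains):
    """True if every doubling gains less than the one before it."""
    return all(later < earlier for earlier, later in zip(gains, gains[1:]))

for name, ratings in RATINGS.items():
    gains = gains_per_doubling(ratings)
    print(name, gains, "diminishing at every step:", strictly_diminishing(gains))
```

Komodo's gains fall at every step, while Stockfish's last doubling gains more than the one before it, which is why the Komodo results show the effect most clearly.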
These 20,000 games were played single core (which took more than a week); with multiple threads the elo gains are expected to shrink further still.
____________________________________________________________________________________________________
What about lower rated engines?
We test 9 other engines in the range of 2600 - 3600 elo (time odds factor 2 only) and compare the elo gain and draw rate.
Time control : 1 second vs 2 seconds
Engine | GRL elo | Result | Elo Gain | Draw rate |
Stockfish 14 | ~3700 | 55.4% | 38 | 84.1% |
Dragon 2.5 | ~3650 | 57.4% | 52 | 75.8% |
Dragon 2.0 | ~3600 | 59.8% | 68 | 73.4% |
Stockfish 11 | ~3500 | 62.6% | 88 | 65.0% |
Koivisto 6.0 | ~3400 | 61.4% | 80 | 68.1% |
Clover 2.4 | ~3200 | 66.0% | 112 | 58.1% |
Counter 3.8 | ~3000 | 62.8% | 89 | 63.0% |
Wasp 1.02 | ~2900 | 67.1% | 120 | 45.8% |
ProDeo 3.1 | ~2800 | 68.1% | 127 | 45.0% |
Fruit 2.1 | ~2700 | 67.3% | 121 | 39.2% |
Zevra 2.4 | ~2600 | 64.5% | 101 | 47.5% |
It's surprising to see that even a 3500 elo rated engine like Stockfish 11 produces such a high elo gain at this time-odds level. Time to double the time control, see the table below.
Time control : 2 seconds vs 4 seconds
Engine | GRL elo | Result | Elo Gain | Draw rate |
Stockfish 14 | ~3700 | 55.1% | 35 | 83.4% |
Dragon 2.5 | ~3650 | 56.2% | 43 | 80.0% |
Dragon 2.0 | ~3600 | 57.4% | 52 | 74.8% |
Stockfish 11 | ~3500 | 61.6% | 81 | 69.6% |
Koivisto 6.0 | ~3400 | 58.2% | 57 | 75.6% |
Clover 2.4 | ~3200 | 64.0% | 98 | 56.8% |
Counter 3.8 | ~3000 | 64.8% | 103 | 56.8% |
Wasp 1.02 | ~2900 | 65.2% | 106 | 46.4% |
ProDeo 3.1 | ~2800 | 63.2% | 92 | 52.8% |
Fruit 2.1 | ~2700 | 66.4% | 114 | 38.4% |
Zevra 2.4 | ~2600 | 63.6% | 95 | 44.0% |
Still high elo gains. We do the same for:
. 4 secs vs 8 secs (faster than CCRL 40/2 and CEGT 40/4)
. 8 secs vs 16 secs (close to CCRL 40/15 and CEGT 40/20)
And present the results in a different (final) format.
_____________________________________________________________________________________________________
Presenting the final results
Engine | GRL elo | 1 vs 2 | 2 vs 4 | 4 vs 8 | 8 vs 16 |
Stockfish 14 | ~3700 | +38 | +35 | +15 | +24 |
Dragon 2.5 | ~3650 | +52 | +43 | +35 | +24 |
Dragon 2.0 | ~3600 | +68 | +52 | +45 | +29 |
Stockfish 11 | ~3500 | +88 | +81 | +74 | +54 |
Koivisto 6.0 | ~3400 | +80 | +57 | +53 | +47 |
Clover 2.4 | ~3200 | +112 | +98 | +87 | +64 |
Counter 3.8 | ~3000 | +89 | +103 | +66 | +82 |
Wasp 1.02 | ~2900 | +120 | +106 | +89 | +71 |
ProDeo 3.1 | ~2800 | +127 | +92 | +112 | +88 |
Fruit 2.1 | ~2700 | +121 | +114 | +124 | +112 |
Zevra 2.4 | ~2600 | +101 | +95 | +61 | +85 |
Diminishing ELO returns time odds factor 2
Draw rates time odds factor 2
Engine | GRL elo | 1 vs 2 | 2 vs 4 | 4 vs 8 | 8 vs 16 |
Stockfish 14 | ~3700 | 84.1% | 83.4% | 88.4% | 89.2% |
Dragon 2.5 | ~3650 | 75.8% | 80.0% | 81.0% | 91.9% |
Dragon 2.0 | ~3600 | 73.4% | 74.8% | 77.6% | 82.0% |
Stockfish 11 | ~3500 | 65.0% | 69.6% | 68.4% | 75.6% |
Koivisto 6.0 | ~3400 | 68.1% | 75.6% | 75.2% | 78.4% |
Clover 2.4 | ~3200 | 58.1% | 56.8% | 64.0% | 68.6% |
Counter 3.8 | ~3000 | 63.0% | 56.8% | 64.4% | 66.0% |
Wasp 1.02 | ~2900 | 45.8% | 46.4% | 52.0% | 53.2% |
ProDeo 3.1 | ~2800 | 45.0% | 52.8% | 47.2% | 58.0% |
Fruit 2.1 | ~2700 | 39.2% | 38.4% | 36.4% | 44.0% |
Zevra 2.4 | ~2600 | 47.5% | 44.0% | 51.2% | 53.2% |
Observations - which are not the same as conclusions :-)
1. The sharp fall in elo gain (green vs red) (8 vs 16 seconds) seems to indicate that for top engines, on the road to further progress, NNUE evaluation becomes more and more important, perhaps even more important than search improvements, although of course they always go hand in hand.
2. There is a clear pattern (with a few exceptions) that after each doubling of the time control the elo gain lowers while the draw rate increases.
3. Stockfish 11 is interesting: it is an HCE (hand-crafted evaluation) engine, while the engines marked orange are NNUE, and it seems to profit more from the doubling of the time control.
4. The lower rated engines profit the most; for them search seems to be the dominant factor.
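The elo half of observation 2 can be checked directly against the final results table by counting how often the gain falls from one doubling to the next. A minimal sketch with the values copied from that table (the names `ELO_GAINS` and `falling_steps` are illustrative):

```python
# Elo gains per engine for 1 vs 2, 2 vs 4, 4 vs 8 and 8 vs 16 seconds.
ELO_GAINS = {
    "Stockfish 14": [38, 35, 15, 24],
    "Dragon 2.5": [52, 43, 35, 24],
    "Dragon 2.0": [68, 52, 45, 29],
    "Stockfish 11": [88, 81, 74, 54],
    "Koivisto 6.0": [80, 57, 53, 47],
    "Clover 2.4": [112, 98, 87, 64],
    "Counter 3.8": [89, 103, 66, 82],
    "Wasp 1.02": [120, 106, 89, 71],
    "ProDeo 3.1": [127, 92, 112, 88],
    "Fruit 2.1": [121, 114, 124, 112],
    "Zevra 2.4": [101, 95, 61, 85],
}

def falling_steps(gains):
    """Number of doublings at which the gain is lower than at the previous one."""
    return sum(later < earlier for earlier, later in zip(gains, gains[1:]))

total_steps = sum(len(g) - 1 for g in ELO_GAINS.values())
total_falling = sum(falling_steps(g) for g in ELO_GAINS.values())
monotonic = [name for name, g in ELO_GAINS.items() if falling_steps(g) == len(g) - 1]

print(f"{total_falling} of {total_steps} steps show a lower gain")
print("gain falls at every step for:", ", ".join(monotonic))
```

On these numbers the gain falls in 27 of the 33 steps, and for 6 of the 11 engines it falls at every step, which matches the "clear pattern with a few exceptions" wording.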
________________________________________________________________________________________
One step further
A comparison of the GRL (single core) vs the GRL (20 cores) and the draw rates
Draw Rate Comparison
Engine | one core | 20 cores |
Stockfish 14 | 39% | 61% |
Komodo-Dragon 2.5 | 48% | 58% |
Komodo-Dragon 2.0 | 43% | 65% |
Ethereal 13.25 | 47% | 63% |
Koivisto 6.16 | 49% | 58% |
SlowChess 2.7 | 51% | 61% |
RubiChess 2.2 | 47% | 61% |
Average Search Depth Comparison
Engine | one core | 20 cores |
Stockfish 14 | 28.59 | 37.12 |
Komodo-Dragon 2.5 | 27.72 | 35.70 |
Komodo-Dragon 2.0 | 25.86 | 32.08 |
Ethereal 13.25 | 25.56 | 32.10 |
Koivisto 6.16 | 27.12 | 30.53 |
SlowChess 2.7 | 21.37 | 24.94 |
RubiChess 2.2 | 29.50 | 35.03 |
Still low draw rates with 20 cores.
Maybe unbalanced but playable positions (like the gambit positions) are the future, at least for the entertainment part.
This study is the work of playing 35,750 games that took about 12 days in total using 20 cores.
Last update - October 22, 2021