Time odds Matches | Home of the Dutch Rebel

As chess engines grow stronger, the draw rate of their matches increases as well. Between top engines it often reaches 80-90%; see for instance the TCEC super final of 2021.

Does this mean progress becomes harder and harder? Certainly. Does it also mean that, sooner or later, progress will become hard to measure because of the law of diminishing returns, with the draw rate between engines reaching 95% or worse? In the 1990s some claimed that all games would be drawn once top engines reached a search depth of 14 plies. We now know that is not true. In fact the end is nowhere in sight, as we will demonstrate by playing time-odds matches with the current two top engines, Stockfish 14 and Komodo Dragon 2.5.

About 2 years ago we used time-odds matches to demonstrate how strong Stockfish 11 is and how large a time factor its nearest competitors needed before they could beat it. This time we do it the other way around: we play SF14 vs SF14 (and Dragon 2.5 vs Dragon 2.5) matches with time odds of factor 2, 4, 8 and 16, and measure the elo progress as an indication of how much room there is for improvement and how much the draw rate will drop.

Balanced openings (score of the engine with the extra time)

Time odds	Stockfish 14	Dragon 2.5
Equal time	50.0%	50.0%
Factor 2	55.4%	57.4%
Factor 4	59.9%	64.1%
Factor 8	62.5%	70.7%
Factor 16	66.8%	74.4%

PGN

ORDO calculation

Time odds	Stockfish 14	Dragon 2.5
Equal time	3565	3552
Factor 2	3602	3637
Factor 4	3635	3714
Factor 8	3654	3789
Factor 16	3687	3833

Match Details

____________________________________________________________________________________

Technical

1. All four matches, for both Komodo and Stockfish, use 40/40 as the base time control (40 moves in 40 seconds, i.e. one second per move on average).

Match-1 : SF14 (40/40) vs SF14 (40/80) - one-vs-two-seconds
Match-2 : SF14 (40/40) vs SF14 (40/160) - one-vs-four-seconds
Match-3 : SF14 (40/40) vs SF14 (40/320) - one-vs-eight-seconds
Match-4 : SF14 (40/40) vs SF14 (40/640) - one-vs-sixteen-seconds

Likewise for Komodo.
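The match schedule above follows a simple rule: the base control stays at 40/40 and the opponent's seconds are multiplied by the time-odds factor. A minimal sketch that generates it, assuming the article's own "moves/seconds" notation; the helper names are hypothetical and not tied to any particular GUI or tournament manager.

```python
# Hypothetical helper: build the time-odds schedule listed above.
# Notation follows the article: "moves/seconds", e.g. 40/40 = 40 moves in 40 s.

BASE_MOVES = 40
BASE_SECONDS = 40
FACTORS = (2, 4, 8, 16)


def time_control(factor: int) -> str:
    """Time control for the engine that gets `factor` times more time."""
    return f"{BASE_MOVES}/{BASE_SECONDS * factor}"


def time_odds_schedule(engine: str) -> list[str]:
    """One line per match: base control vs multiplied control."""
    base = time_control(1)
    lines = []
    for i, factor in enumerate(FACTORS, start=1):
        avg = BASE_SECONDS * factor / BASE_MOVES  # average seconds per move
        lines.append(
            f"Match-{i} : {engine} ({base}) vs {engine} ({time_control(factor)})"
            f" - 1 s vs {avg:g} s per move"
        )
    return lines


if __name__ == "__main__":
    for engine in ("SF14", "Dragon 2.5"):
        for line in time_odds_schedule(engine):
            print(line)
```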

2. To find out how strong Stockfish 14 and Komodo Dragon 2.5 are at 40/40, we play 2 time-odds matches, one per engine, in which the engine at 40/40 faces 3 of its nearest competitors playing at the GRL time control (40/120).

. Stockfish 14 at 40/40 rates : 3565

. Komodo Dragon 2.5 at 40/40 rates : 3552

So we now roughly know at which elo Komodo and Stockfish play with one second per move on average, and with ORDO we can calculate an imaginary elo (imaginary because the results are based on self-play) for both engines, as listed above.
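The idea of anchoring self-play results to a measured 40/40 rating can be illustrated with the standard logistic Elo model, in which a score p corresponds to a rating difference of 400 * log10(p / (1 - p)). This is a simplified two-player stand-in, not ORDO itself (ORDO fits all games jointly), and the function names are hypothetical. Applied to the Stockfish 14 scores it lands within about one point of the ORDO ratings listed above.

```python
import math


def elo_diff(score: float) -> float:
    """Rating difference implied by a match score (0 < score < 1), logistic model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


def anchored_rating(anchor: float, score: float) -> float:
    """Rating of the time-odds side, given the rating of its 40/40 opponent."""
    return anchor + elo_diff(score)


# Stockfish 14 at 40/40 and the scores of the side with more time (balanced openings).
SF14_ANCHOR = 3565
SF14_SCORES = {2: 0.554, 4: 0.599, 8: 0.625, 16: 0.668}

if __name__ == "__main__":
    for factor, score in SF14_SCORES.items():
        print(f"factor {factor:2d}: {anchored_rating(SF14_ANCHOR, score):.0f}")
```

This prints 3603, 3635, 3654 and 3686, against 3602, 3635, 3654 and 3687 from ORDO; the small differences come from rounding of the published percentages.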

____________________________________________________________________________________

Draw rate overview

Draw rate balanced openings

Time odds	Stockfish 14	Dragon 2.5
Equal time	83.9%	79.7%
Factor 2	84.1%	75.8%
Factor 4	77.4%	67.4%
Factor 8	72.2%	55.8%
Factor 16	65.6%	48.0%
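Score and draw rate together fix the full win/draw/loss split, because score = wins + draws / 2 and wins + draws + losses = 1. A small sketch with a hypothetical function name; since the percentages in the tables are rounded, the derived figures are approximate.

```python
def win_draw_loss(score: float, draw_rate: float) -> tuple[float, float, float]:
    """Split a match into win/draw/loss fractions for the side with more time.

    Uses score = wins + draws / 2 and wins + draws + losses = 1.
    """
    wins = score - draw_rate / 2
    losses = 1.0 - score - draw_rate / 2
    if wins < -1e-9 or losses < -1e-9:
        raise ValueError("score and draw rate are inconsistent")
    return wins, draw_rate, losses


if __name__ == "__main__":
    # Factor 16, balanced openings: score and draw rate from the tables above.
    for name, score, draws in (("Stockfish 14", 0.668, 0.656),
                               ("Dragon 2.5", 0.744, 0.480)):
        w, d, l = win_draw_loss(score, draws)
        print(f"{name}: wins {w:.1%}, draws {d:.1%}, losses {l:.1%}")
```

For factor 16 this gives roughly 34.0% wins, 65.6% draws and 0.4% losses for Stockfish 14, and 50.4% / 48.0% / 1.6% for Dragon 2.5, so the side with one second almost never wins.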

Conclusions

1. Despite the high draw rates of the two strongest current chess engines, there is still plenty of room for improvement: engines playing at ~3550 elo can still lose significantly.

2. Komodo Dragon scales a lot better than Stockfish.

___________________________________________________________________________________________________

The issue of diminishing returns

To properly measure the effects of the diminishing returns phenomenon we should ideally play more matches, so we also play:

. Factor 2 vs Factor 4, 8 and 16

. Factor 4 vs Factor 8 and 16

. Factor 8 vs Factor 16

We then calculate the diminishing returns.
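Together with the first four matches (1 second vs 2, 4, 8 and 16), this amounts to a round robin between all five time controls: every pair meets once, 10 pairings in total. A sketch that enumerates them; the names are hypothetical and the seconds are the averages per move used above.

```python
from itertools import combinations

# Average seconds per move for the five time controls (40/40 ... 40/640).
SECONDS = (1, 2, 4, 8, 16)


def pairings(seconds: tuple[int, ...] = SECONDS) -> list[tuple[int, int]]:
    """All matches needed so that every time control meets every other once."""
    return list(combinations(seconds, 2))


if __name__ == "__main__":
    for short, long in pairings():
        print(f"{short} s vs {long} s (time odds factor {long // short})")
```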

Balanced openings

Time odds	Stockfish 14	Dragon 2.5
2 secs vs 4 secs	55.1%	56.2%
2 secs vs 8 secs	56.6%	63.3%
2 secs vs 16 secs	61.4%	66.5%
4 secs vs 8 secs	52.2%	55.1%
4 secs vs 16 secs	54.6%	61.8%
8 secs vs 16 secs	53.4%	53.4%

Draw rate

Time odds	Stockfish 14	Dragon 2.5
2 secs vs 4 secs	83.4%	80.0%
2 secs vs 8 secs	83.2%	70.6%
2 secs vs 16 secs	74.8%	62.0%
4 secs vs 8 secs	88.4%	81.0%
4 secs vs 16 secs	86.8%	73.2%
8 secs vs 16 secs	89.2%	91.9%

After 10,000 games per engine we can take stock: calculate an imaginary elo (imaginary because the results are based on self-play) and view the diminishing returns for each engine.

Diminishing Returns for Stockfish 14

ORDO calculation

Time control	Rating	Gain	Games
Sixteen seconds	3679	+27	1000
Eight seconds	3652	+15	1750
Four seconds	3637	+35	2250
Two seconds	3602	+37	2250
One second	3565		2750

Diminishing Returns for Komodo Dragon 2.5

ORDO calculation

Time control	Rating	Gain	Games
Sixteen seconds	3732	+34	1000
Eight seconds	3698	+44	1750
Four seconds	3654	+47	2250
Two seconds	3607	+55	2250
One second	3552		2750

As we can see (most clearly from the Komodo results), the elo gain shrinks with each doubling of time.
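The Gain column is simply the difference between consecutive ratings. A minimal sketch that recomputes it from the ORDO ratings in the two tables above (ratings copied from the tables, ordered from one to sixteen seconds; names hypothetical).

```python
# ORDO ratings per time control, from the tables above (1, 2, 4, 8, 16 seconds).
RATINGS = {
    "Stockfish 14": [3565, 3602, 3637, 3652, 3679],
    "Komodo Dragon 2.5": [3552, 3607, 3654, 3698, 3732],
}


def gains_per_doubling(ratings: list[int]) -> list[int]:
    """Elo gained by each doubling of time: rating[i + 1] - rating[i]."""
    return [b - a for a, b in zip(ratings, ratings[1:])]


if __name__ == "__main__":
    for engine, ratings in RATINGS.items():
        gains = gains_per_doubling(ratings)
        print(engine, gains, "total", sum(gains))
```

This yields +37, +35, +15, +27 for Stockfish 14 and +55, +47, +44, +34 for Dragon 2.5, matching the Gain column read from bottom to top.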

These 20,000 games were played on a single core (which took more than a week); with multiple threads the elo gains are expected to drop even further.

PGN

20,000 games

____________________________________________________________________________________________________

What about lower rated engines?

We test 9 other engines in the 2600-3600 elo range (time odds factor 2 only) and compare their elo gain and draw rate.

Time control : 1 second vs 2 seconds

Engine	GRL elo	Result	Elo Gain	Draw rate
Stockfish 14	~3700	55.4%	38	84.1%
Dragon 2.5	~3650	57.4%	52	75.8%
Dragon 2.0	~3600	59.8%	68	73.4%
Stockfish 11	~3500	62.6%	88	65.0%
Koivisto 6.0	~3400	61.4%	80	68.1%
Clover 2.4	~3200	66.0%	112	58.1%
Counter 3.8	~3000	62.8%	89	63.0%
Wasp 1.02	~2900	67.1%	120	45.8%
ProDeo 3.1	~2800	68.1%	127	45.0%
Fruit 2.1	~2700	67.3%	121	39.2%
Zevra 2.4	~2600	64.5%	101	47.5%

It is surprising to see that even a ~3500 elo engine like Stockfish 11 still gains so much elo at this time-odds level. Time to double the time control, see the table below.

Time control : 2 seconds vs 4 seconds

Engine	GRL elo	Result	Elo Gain	Draw rate
Stockfish 14	~3700	55.1%	35	83.4%
Dragon 2.5	~3650	56.2%	43	80.0%
Dragon 2.0	~3600	57.4%	52	74.8%
Stockfish 11	~3500	61.6%	81	69.6%
Koivisto 6.0	~3400	58.2%	57	75.6%
Clover 2.4	~3200	64.0%	98	56.8%
Counter 3.8	~3000	64.8%	103	56.8%
Wasp 1.02	~2900	65.2%	106	46.4%
ProDeo 3.1	~2800	63.2%	92	52.8%
Fruit 2.1	~2700	66.4%	114	38.4%
Zevra 2.4	~2600	63.6%	95	44.0%

The elo gains are still high. We do the same for:

. 4 secs vs 8 secs (faster than CCRL 40/2 and CEGT 40/4)

. 8 secs vs 16 secs (close to CCRL 40/15 and CEGT 40/20)

We then present the results in a different (final) format.

_____________________________________________________________________________________________________

Presenting the final results

Engine	GRL elo	1 vs 2	2 vs 4	4 vs 8	8 vs 16
Stockfish 14	~3700	+38	+35	+15	+24
Dragon 2.5	~3650	+52	+43	+35	+24
Dragon 2.0	~3600	+68	+52	+45	+29
Stockfish 11	~3500	+88	+81	+74	+54
Koivisto 6.0	~3400	+80	+57	+53	+47
Clover 2.4	~3200	+112	+98	+87	+64
Counter 3.8	~3000	+89	+103	+66	+82
Wasp 1.02	~2900	+120	+106	+89	+71
ProDeo 3.1	~2800	+127	+92	+112	+88
Fruit 2.1	~2700	+121	+114	+124	+112
Zevra 2.4	~2600	+101	+95	+61	+85

Diminishing ELO returns time odds factor 2

Draw rates time odds factor 2

Engine	GRL elo	1 vs 2	2 vs 4	4 vs 8	8 vs 16
Stockfish 14	~3700	84.1%	83.4%	88.4%	89.2%
Dragon 2.5	~3650	75.8%	80.0%	81.0%	91.9%
Dragon 2.0	~3600	73.4%	74.8%	77.6%	82.0%
Stockfish 11	~3500	65.0%	69.6%	68.4%	75.6%
Koivisto 6.0	~3400	68.1%	75.6%	75.2%	78.4%
Clover 2.4	~3200	58.1%	56.8%	64.0%	68.6%
Counter 3.8	~3000	63.0%	56.8%	64.4%	66.0%
Wasp 1.02	~2900	45.8%	46.4%	52.0%	53.2%
ProDeo 3.1	~2800	45.0%	52.8%	47.2%	58.0%
Fruit 2.1	~2700	39.2%	38.4%	36.4%	44.0%
Zevra 2.4	~2600	47.5%	44.0%	51.2%	53.2%

Observations - which is not the same as conclusions :-)

1. The sharp fall in elo gain for the top engines at 8 vs 16 seconds seems to indicate that, on the road to further progress, NNUE evaluation becomes more and more important for top engines, perhaps even more important than search improvements, although of course the two always go hand in hand.

2. There is a clear pattern (with a few exceptions): after each doubling of the time control the elo gain drops while the draw rate rises.

3. Stockfish 11 is interesting: it is an HCE (hand-crafted evaluation) engine, unlike the NNUE engines above it in the table, and it seems to profit more from each doubling of the time control.

4. The lower rated engines profit the most from extra time; for them, search seems to be the dominant factor.
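Observation 2 can be checked directly against the two final tables. A sketch with the data copied from those tables and hypothetical names: a strict check, where the elo gain must fall after every single doubling, flags five engines as exceptions, while a looser endpoint check (8 vs 16 compared with 1 vs 2) holds for all eleven engines for both elo gain and draw rate.

```python
# Elo gain per doubling (1 vs 2, 2 vs 4, 4 vs 8, 8 vs 16 seconds), final results table.
ELO_GAIN = {
    "Stockfish 14": [38, 35, 15, 24],
    "Dragon 2.5": [52, 43, 35, 24],
    "Dragon 2.0": [68, 52, 45, 29],
    "Stockfish 11": [88, 81, 74, 54],
    "Koivisto 6.0": [80, 57, 53, 47],
    "Clover 2.4": [112, 98, 87, 64],
    "Counter 3.8": [89, 103, 66, 82],
    "Wasp 1.02": [120, 106, 89, 71],
    "ProDeo 3.1": [127, 92, 112, 88],
    "Fruit 2.1": [121, 114, 124, 112],
    "Zevra 2.4": [101, 95, 61, 85],
}

# Draw rates (%) for the same four matches, draw rate table.
DRAW_RATE = {
    "Stockfish 14": [84.1, 83.4, 88.4, 89.2],
    "Dragon 2.5": [75.8, 80.0, 81.0, 91.9],
    "Dragon 2.0": [73.4, 74.8, 77.6, 82.0],
    "Stockfish 11": [65.0, 69.6, 68.4, 75.6],
    "Koivisto 6.0": [68.1, 75.6, 75.2, 78.4],
    "Clover 2.4": [58.1, 56.8, 64.0, 68.6],
    "Counter 3.8": [63.0, 56.8, 64.4, 66.0],
    "Wasp 1.02": [45.8, 46.4, 52.0, 53.2],
    "ProDeo 3.1": [45.0, 52.8, 47.2, 58.0],
    "Fruit 2.1": [39.2, 38.4, 36.4, 44.0],
    "Zevra 2.4": [47.5, 44.0, 51.2, 53.2],
}


def strictly_decreasing(values: list[float]) -> bool:
    """True if every value is lower than the one before it."""
    return all(b < a for a, b in zip(values, values[1:]))


def gain_exceptions() -> list[str]:
    """Engines whose elo gain does not fall after every single doubling."""
    return [name for name, gains in ELO_GAIN.items() if not strictly_decreasing(gains)]


def endpoint_trend_holds() -> list[str]:
    """Engines with a lower gain and a higher draw rate at 8 vs 16 than at 1 vs 2."""
    return [
        name
        for name in ELO_GAIN
        if ELO_GAIN[name][-1] < ELO_GAIN[name][0]
        and DRAW_RATE[name][-1] > DRAW_RATE[name][0]
    ]


if __name__ == "__main__":
    print("Strict exceptions:", gain_exceptions())
    print("Endpoint trend holds for", len(endpoint_trend_holds()), "of", len(ELO_GAIN), "engines")
```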

________________________________________________________________________________________

One step further

A comparison between the GRL (single core) and the GRL (20 cores): draw rates and average search depth.

Draw Rate Comparison

Engine	one core	20 cores
Stockfish 14	39%	61%
Komodo-Dragon 2.5	48%	58%
Komodo-Dragon 2.0	43%	65%
Ethereal 13.25	47%	63%
Koivisto 6.16	49%	58%
SlowChess 2.7	51%	61%
RubiChess 2.2	47%	61%

Average Search Depth Comparison

Engine	one core	20 cores
Stockfish 14	28.59	37.12
Komodo-Dragon 2.5	27.72	35.70
Komodo-Dragon 2.0	25.86	32.08
Ethereal 13.25	25.56	32.10
Koivisto 6.16	27.12	30.53
SlowChess 2.7	21.37	24.94
RubiChess 2.2	29.50	35.03

Draw rates remain relatively low with 20 cores.

Perhaps unbalanced but playable positions (like the gambit positions) are the future, at least for entertainment.

This study is the result of 35,750 games, which took about 12 days in total using 20 cores.

Last update - October 22, 2021