simex

Introduction


SIMEX (SIMilarity EXperiments) is the successor to Don Dailey's famous SIM03, which became extremely important in the chaotic 2010-2015 period when the computer chess community was flooded with Rybka 3 clones and derivatives. SIM03 could detect not only Ippolit, RobboLito and friends as Rybka 3 derivatives, but also Fruit 2.1 derivatives.


Nowadays a large number of strong engines are available on GitHub, and a starting programmer has a rich choice of engines to start from. So far so good. What's not good is the deliberate lack of transparency: making a few changes and releasing the result as if it were an original work, while it is a clone and often an abuse of the GPL. SIM03 was developed to unmask this lack of transparency, and the system has proven itself accurate.


SIMEX works the same way but is more user-friendly and has more features. The main advantage is that you are no longer limited to the built-in 8238 positions of SIM03 but can create your own positions using EPD. For demonstration purposes, 7 EPD sets are provided in the download. SIMEX uses MEA from Ferdinand Mosca as a base.


First, a comparison between SIM03 and SIMEX to check its reliability. We tested both SIM03 and SIMEX with the 8238 SIM03 positions at 5 time controls: 100ms, 250ms, 500ms, 1000ms and finally 2500ms.


Results SIM03 : 100ms | 250ms | 500ms | 1000ms | 2500ms

Results SIMEX : 100ms | 250ms | 500ms | 1000ms | 2500ms


From the comparison we can see that the SIMEX numbers are somewhat higher than those of SIM03, which can be explained by the two different approaches. What is really interesting, and was hardly discussed in the 2010-2015 period, is that the similarity increases as the time control increases, and that a program like Ethereal 11.25 slightly crosses the 60% line at 2.5 seconds with 3 other engines.
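As a rough illustration of what such a percentage expresses: assuming the similarity of two engines is the share of test positions on which both choose the same move (SIMEX's exact formula and its tuning parameters are not reproduced here), it could be sketched in Python as follows. The function name and the sample moves are invented for the example.

```python
def similarity_percent(moves_a, moves_b):
    """Percentage of positions on which two engines picked the same move."""
    if len(moves_a) != len(moves_b):
        raise ValueError("both engines must be tested on the same positions")
    if not moves_a:
        raise ValueError("no positions given")
    matches = sum(1 for a, b in zip(moves_a, moves_b) if a == b)
    return 100.0 * matches / len(moves_a)


# Hypothetical best moves of two engines on the same five positions.
engine_a = ["e2e4", "g1f3", "d2d4", "c2c4", "e1g1"]
engine_b = ["e2e4", "g1f3", "d2d4", "b1c3", "f1e1"]

# Three of the five moves agree.
print(similarity_percent(engine_a, engine_b))
```

With thousands of positions instead of five, a percentage computed this way becomes stable enough to compare engine pairs.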


__________________________________________________________________________________________________


Operation


Quick guide: to get a first impression after installation, double-click the example.bat batch file. It runs a small EPD of only 100 positions with Stockfish 9 and Stockfish 10, and after about 30 seconds you will see the similarity result. The 6 other batch files are for real.

run_simex.bat - simex.epd contains the original 8238 SIM03 positions discussed above.

run_sts.bat - sts.epd contains the 1500 positions of the famous STS test. As one can see from the link, the similarity is alarming, but in reality this set is a bad choice because the positions are too easy. To test engines for similarity use random positions, more on that below. It also shows (and this is extremely important) that each created set of positions will have its own orange and red markers.

run_opening.bat - opening.epd is made from a PGN opening test-set of 8 moves, in total 8527 positions.

run_midgame.bat - midgame.epd is extracted from an EPD collection of Dann Corbit, 10,000 positions in total.

run_endgame.bat - endgame.epd is extracted from an EPD collection of Dann Corbit, 10,000 positions in total.

run_match.bat - match.epd is created from a random cutechess eng-eng match, 10,000 positions in total.

We tested in total 19 engines at 100ms

Simex 2.0

58Mb

Your old SIM03 files can be used with SIMEX. Copy them (for instance "similarity.data") into the simex\data folder, go to the command line and type:


simex2 data\similarity.data >report.txt


This creates the web page and also stores the result in report.txt, see the example "a déjà vu all over".

_________________________________________________________________________________________________



Doing it yourself

The parameters in the batch files, using run_simex.bat as an example:

set MT=100

If you want to run simex.epd at 200ms or 1000ms set the MT (movetime) value accordingly. Save the batch file and run it.

set HASH=64
set THREADS=1

Self-explanatory.

set PROTOCOL=uci

UCI engines only for the moment. Winboard engines require an extra programming effort.

set EPD=epd\simex.epd
set DATABASE=data\simex.data

Choose which EPD set to use and the database in which the results should be stored.

set EXE=engines\Andscacs_0.93.exe
set NAME=Andscacs_0.93

Define the engine to run. NAME is the name that will be used in the overviews.
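For those who want to script or check their runs, the parameter block can be read back into a program. A minimal Python sketch, assuming the batch files contain plain "set NAME=VALUE" lines as in the excerpt above; the function name is invented for the illustration and is not part of SIMEX.

```python
def read_batch_settings(text):
    """Collect `set NAME=VALUE` lines from batch-file text into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Only plain assignments are handled; other batch commands are skipped.
        if line.lower().startswith("set ") and "=" in line:
            name, value = line[4:].split("=", 1)
            settings[name.strip()] = value.strip()
    return settings


# The parameters quoted above (raw string, so backslashes stay literal).
example = r"""
set MT=100
set HASH=64
set THREADS=1
set PROTOCOL=uci
set EPD=epd\simex.epd
set DATABASE=data\simex.data
set EXE=engines\Andscacs_0.93.exe
set NAME=Andscacs_0.93
"""

settings = read_batch_settings(example)
print(settings["MT"], settings["EPD"], settings["NAME"])
```

Note that all values come back as strings, so MT would still need an int() conversion before any arithmetic.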

Folder usage

engines - all executables

data - all data files with the mandatory *.data extension.

epd - all suitable MEA EPDs.

html - all created web pages

log - all created log files

epd_out - from each run an EPD is created with the bm (best move) ce (score) and acd (depth) tags.
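The files in epd_out can be post-processed with a few lines of code. The sketch below assumes the usual EPD layout (four position fields followed by "opcode operand;" operations) and uses a made-up record; the real output of a run may carry additional tags. It is a simplification: quoted operands that themselves contain a semicolon are not handled.

```python
def parse_epd(line):
    """Split one EPD record into its position part and a dict of operations."""
    fields = line.strip().split(None, 4)
    if len(fields) < 4:
        raise ValueError("an EPD record needs four position fields")
    position = " ".join(fields[:4])
    operations = {}
    if len(fields) == 5:
        for op in fields[4].split(";"):
            op = op.strip()
            if not op:
                continue
            opcode, _, operand = op.partition(" ")
            operations[opcode] = operand.strip()
    return position, operations


# Hypothetical record: start position with invented bm, ce and acd values.
record = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - bm e4; ce 25; acd 12;"
position, ops = parse_epd(record)
print(ops["bm"], ops["ce"], ops["acd"])
```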

_________________________________________________________________________________________________



Creating datasets yourself


MEA wasn't created for simex but for other purposes such as OKE or creating opening books and for that reason requires a special EPD tag. SOMU 1.5a will do that job for you, see the [F9] and [F10] options marked with "new" on that page.


In a nutshell:

[F9] - from a PGN create a suitable simex EPD.

[F10] - converts an EPD that contains the "bm" tag for use in SIMEX.


_________________________________________________________________________________________________


Differences between SIM03 and SIMEX


1. SIM03 sends the whole game history to the engine while SIMEX uses EPD. This might cause differences.


2. The time control is fundamentally different. SIM03 is in control: it sends a stop command to the engine when time is up. MEA leaves it to the engine and to how its programmer has implemented the fixed move time. Unfortunately not every engine implements this accurately.


An extreme example is Rybka1. With SIM03 it uses 17 minutes to finish the 8238 positions at 100ms (already 3½ minutes too much!), but with SIMEX it takes a notable 1 hour and 2 minutes to finish. One can check the end of the log file to verify that the time an engine has used is sane. For Rybka1 we got:


Time allocation : BAD!! spending more time
ActualTime > ExpectedTime + MarginTime
ExpectedTime : 823.8s
ActualTime : 3669.8s


However, Rybka1 is a big exception. The engines we tested stay within reasonable margins, but reason (2) explains why the SIMEX similarity percentages are somewhat higher than with SIM03.
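The sanity check in the log boils down to simple arithmetic: the expected time is the number of positions times the move time, and a run is flagged when the actual time exceeds that plus a margin. A small Python sketch of the comparison; the 60-second margin is an assumption made for the example, since the margin MEA really uses is not stated here.

```python
def check_time_allocation(positions, movetime_ms, actual_s, margin_s):
    """Return (expected seconds, True if the run stayed within the margin)."""
    expected_s = positions * movetime_ms / 1000.0
    within_margin = actual_s <= expected_s + margin_s
    return expected_s, within_margin


# Rybka1 figures from the log above; margin_s=60.0 is a hypothetical value.
expected, ok = check_time_allocation(8238, 100, 3669.8, margin_s=60.0)
print(f"ExpectedTime : {expected:.1f}s")
print("Time allocation :", "OK" if ok else "BAD!! spending more time")
```

For the Rybka1 run the expected time works out to the 823.8s shown in the log, so with 3669.8s actually used the run is flagged for any reasonable margin.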


_________________________________________________________________________________________________


Other features


1. SIMEX parameters to manipulate the data for better results; see the README file.


2. Add comments to HTML reports. Store them in legend.txt (see the example).


3. Make a dendrogram from *.data files for *.png visualization (see the example). While creating an HTML report, SIMEX also creates a CSV file called dendrogram.csv, which opens in Excel and can be used by the dendrogram tool of Ferdinand Mosca. Just double-click dendrogram.bat if you want such a picture.

Syntax: dendrogram --input dendrogram.csv --output sim.png


4. Chris Whittington created EPD sets of 10,000 positions each. Each set contains a specific piece distribution. In total there are 100 sets, representing the most common board positions in use. See the list. Example with SIMEX.
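To give an idea of what a dendrogram (feature 3) expresses, here is a small Python sketch of hierarchical clustering on pairwise similarity percentages. It uses single linkage, which is not necessarily the method of Ferdinand Mosca's tool, and it does not read dendrogram.csv, whose layout is not described here. The engine names and percentages are invented for the example.

```python
def single_linkage(names, similarity):
    """Repeatedly merge the two clusters that contain the most similar pair.

    `similarity` maps frozenset({name1, name2}) to a percentage.
    Returns the merges in order as (cluster_1, cluster_2, percentage).
    """
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: the closest pair decides the cluster distance.
                link = max(similarity[frozenset((a, b))]
                           for a in clusters[i] for b in clusters[j])
                if best is None or link > best[0]:
                    best = (link, i, j)
        link, i, j = best
        merges.append((clusters[i], clusters[j], link))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical engines and similarity percentages, for illustration only.
names = ["Engine A", "Engine B", "Engine C", "Engine D"]
similarity = {
    frozenset(("Engine A", "Engine B")): 68.0,
    frozenset(("Engine A", "Engine C")): 45.0,
    frozenset(("Engine A", "Engine D")): 44.0,
    frozenset(("Engine B", "Engine C")): 46.0,
    frozenset(("Engine B", "Engine D")): 43.0,
    frozenset(("Engine C", "Engine D")): 52.0,
}

merges = single_linkage(names, similarity)
for left, right, link in merges:
    print(left, "+", right, "at", link)
```

The order of the merges is what the picture shows: pairs that join early, at a high percentage, sit close together in the tree.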


_________________________________________________________________________________________________

100 EPD sets

12.8 Mb

CREDITS


Ferdinand Mosca for MEA and Dendrogram

Chris Whittington for 100 extra EPDs

Tord Romstad, Marco Costalba, Joona Kiiski for Stockfish

Daniel José Queraltó for Andscacs

Mohammed Li for Asmfish

Thomas Zipproth for Brainfish

Giancarlo Delli Colli for Equinox

Andrew Grant for Ethereal

Sam Hamilton and Edsel Apostol for Hannibal

Jeffrey An and Michael An for Laser

Andreas Matthies for Rubichess

Dennis Sceviour for Schooner