Introduction
SIMEX (SIMilarity EXperiments) is the successor to Don Dailey's famous SIM03, which became extremely important in the chaotic 2010-2015 period when the computer chess community was flooded with Rybka 3 clones and derivatives. SIM03 not only detected Ippolit, RobboLito and friends as Rybka 3 derivatives, it also detected Fruit 2.1 derivatives.
Nowadays a large number of strong engines are available on GitHub, and a beginning programmer has a rich choice of engines to start from. So far so good. What is not good is the deliberate lack of transparency: making a few changes and releasing the result as if it were an original work, while it is a clone and often a violation of the GPL. SIM03 was developed to unmask this lack of transparency, and the system has proven itself accurate.
SIMEX works the same way but is more user-friendly and has more features. Its main advantage is that you are no longer limited to the built-in 8238 positions of SIM03: you can create your own positions using EPD. The download includes 7 EPD sets for demonstration purposes. SIMEX uses MEA by Ferdinand Mosca as its base.
First, a comparison between SIM03 and SIMEX to check its reliability. We tested both SIM03 and SIMEX with the 8238 SIM03 positions at 5 time controls: 100ms, 250ms, 500ms, 1000ms and finally 2500ms.
Results SIM03 : 100ms | 250ms | 500ms | 1000ms | 2500ms
Results SIMEX : 100ms | 250ms | 500ms | 1000ms | 2500ms
From the comparison we can see that the SIMEX numbers are somewhat higher than those of SIM03, which can be explained by the two different approaches. What is really interesting, and was hardly discussed in the 2010-2015 period, is that similarity increases as the time control increases, and that a program like Ethereal 11.25 slightly crosses the 60% line at 2.5 seconds with 3 other engines.
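To make the percentages concrete, here is a minimal Python sketch of a move-match similarity score. It assumes that similarity means the percentage of shared positions on which two engines choose the same move, which is how SIM03-style tests are commonly described. The function name and the toy data are hypothetical and are not taken from SIMEX itself.

```python
def similarity(moves_a, moves_b):
    """Percentage of shared positions on which both engines chose the same move.

    moves_a and moves_b map a position id to the move the engine selected.
    This is an illustrative assumption, not the actual SIMEX implementation.
    """
    shared = moves_a.keys() & moves_b.keys()
    if not shared:
        raise ValueError("the two engines have no positions in common")
    same = sum(1 for pos in shared if moves_a[pos] == moves_b[pos])
    return 100.0 * same / len(shared)


# Hypothetical toy data: four positions, three identical choices.
engine_a = {"p1": "e2e4", "p2": "g1f3", "p3": "d2d4", "p4": "c2c4"}
engine_b = {"p1": "e2e4", "p2": "g1f3", "p3": "c2c4", "p4": "c2c4"}

print(similarity(engine_a, engine_b))  # 75.0
```

Under this definition, longer think times can push two strong engines toward the same move more often, which would fit the rising percentages observed above.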
__________________________________________________________________________________________________
Operation
Quick guide: after installation, double-click the example.bat batch file to get a first impression. It runs a small EPD of only 100 positions with Stockfish 9 and Stockfish 10, and after about 30 seconds you will see the similarity result. The 6 other batch files are for real testing.
Batch File | Description | URL |
run_simex.bat | simex.epd contains the original 8238 SIM03 positions as already discussed above. | |
run_sts.bat | sts.epd contains the 1500 positions of the famous STS test. As one can see from the link the similarity is alarming but in reality it's a bad choice as the positions are too easy. To test engines for similarity use random positions, more on that below. It also shows (and this is extremely important) that each created set of positions will have its own orange and red markers. | |
run_opening.bat | opening.epd is made from a PGN opening test-set of 8 moves, in total 8527 positions. | |
run_midgame.bat | midgame.epd is extracted from an EPD collection of Dann Corbit, 10,000 positions in total. | |
run_endgame.bat | endgame.epd is extracted from an EPD collection of Dann Corbit, 10,000 positions in total. | |
run_match.bat | match.epd is created from a random cutechess engine-engine match, 10,000 positions in total. | |
We tested 19 engines in total at 100ms.
Simex 2.0
58Mb
Your old SIM03 files can be used with SIMEX. Copy them (for instance "similarity.data") into the simex\data folder, go to the command line and type:
simex2 data\similarity.data >report.txt
It will create the web page and also store the result in report.txt; see the example "a déjà vu all over".
_________________________________________________________________________________________________
Doing it yourself
The parameters in the batch files, using run_simex.bat as an example:
set MT=100 | If you want to run simex.epd at 200ms or 1000ms, set the MT (movetime) value accordingly. Save the batch file and run it. |
set HASH=64 | Self-explanatory. |
set PROTOCOL=uci | UCI engines only for the moment. Winboard engines require an extra programming effort. |
set EPD=epd\simex.epd | Decide which EPD set to use and the database in which the results should be stored. |
set EXE=engines\Andscacs_0.93.exe | Define the engine to run. NAME is the name that will be used in the overviews. |
Folder usage
engines - all executables |
data - all data files with the mandatory *.data extension. |
epd - all suitable MEA EPD files. |
html - all created web pages |
log - all created log files |
epd_out - from each run an EPD is created with the bm (best move) ce (score) and acd (depth) tags. |
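The files in epd_out can be post-processed with a few lines of code. The sketch below splits one EPD record into its four position fields and its opcodes (bm, ce and acd are standard EPD opcodes). The sample line is a hypothetical illustration, not actual SIMEX output, and the parser is deliberately simple: it assumes operands do not contain semicolons.

```python
def parse_epd(line):
    """Split an EPD record into (position, opcodes).

    position is the four EPD fields (board, side to move, castling,
    en passant); opcodes maps each opcode to its operand string.
    Simplification: operands containing ';' are not supported.
    """
    parts = line.strip().split(None, 4)
    if len(parts) < 4:
        raise ValueError("an EPD record needs four position fields")
    position = " ".join(parts[:4])
    rest = parts[4] if len(parts) > 4 else ""
    opcodes = {}
    for operation in rest.split(";"):
        operation = operation.strip()
        if not operation:
            continue
        opcode, _, operand = operation.partition(" ")
        opcodes[opcode] = operand.strip()
    return position, opcodes


# Hypothetical sample record with best move, score and depth.
sample = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - bm e4; ce 25; acd 12;"
position, opcodes = parse_epd(sample)
print(position)
print(opcodes["bm"], opcodes["ce"], opcodes["acd"])
```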
_________________________________________________________________________________________________
Creating datasets yourself
MEA wasn't created for SIMEX but for other purposes, such as OKE or creating opening books, and for that reason it requires a special EPD tag. SOMU 1.5a will do that job for you; see the [F9] and [F10] options marked with "new" on that page.
In a nutshell:
[F9] - creates a suitable SIMEX EPD from a PGN.
[F10] - converts an EPD that contains the "bm" tag for use in SIMEX.
_________________________________________________________________________________________________
Differences between SIM03 and SIMEX
1. SIM03 sends the whole game history to the engine while SIMEX uses EPD. This might cause differences.
2. The time control is fundamentally different. SIM03 is in control: it sends a stop command to the engine when time is up. MEA leaves it to the engine and to how its programmer has implemented the fixed move time. Unfortunately, not every engine implements this accurately.
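The two differences can be illustrated with standard UCI commands. The exact commands that SIM03 and MEA send are not documented here, so the two helper functions below are hypothetical sketches: one builds a position from EPD fields and lets the engine manage a fixed move time, the other replays a move history and relies on the caller to send "stop" when the time is up.

```python
def commands_epd_style(epd_position, movetime_ms):
    """EPD-style search: the engine itself must honour the move time.

    EPD has no move counters, so "0 1" is appended (an assumption)
    to form a complete FEN string.
    """
    fen = f"{epd_position} 0 1"
    return [f"position fen {fen}", f"go movetime {movetime_ms}"]


def commands_history_style(moves):
    """History-style search: the tester stays in control of the clock.

    The caller is expected to wait for the allotted time and then
    send "stop" to the engine.
    """
    position = "position startpos"
    if moves:
        position += " moves " + " ".join(moves)
    return [position, "go infinite"]


start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq -"
print(commands_epd_style(start, 100))
print(commands_history_style(["e2e4", "e7e5"]))
```

With the first style, an engine that implements "go movetime" loosely can overrun the budget without the tester intervening.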
An extreme example is Rybka1. With SIM03 it needs 17 minutes to finish the 8238 positions at 100ms (already 3½ minutes too much!), but with SIMEX it takes a remarkable 1 hour and 2 minutes. One can check the end of the log file to verify that the time an engine has used is sane. For Rybka1 we got:
Time allocation : BAD!! spending more time
ActualTime > ExpectedTime + MarginTime
ExpectedTime : 823.8s
ActualTime : 3669.8s
However, Rybka1 is a big exception; the engines we tested stay within reasonable margins. Still, reason (2) explains why the SIMEX similarity percentages are somewhat higher than with SIM03.
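The expected time in the log excerpt follows directly from the number of positions and the move time: 8238 positions at 100ms give 823.8 seconds. The sketch below reproduces that arithmetic and the ActualTime > ExpectedTime + MarginTime comparison. The margin MEA actually uses is not stated here, so the 60 seconds below is a hypothetical value chosen only for illustration.

```python
def check_time(positions, movetime_ms, actual_s, margin_s):
    """Return (expected_seconds, within_budget) for one test run.

    margin_s is a caller-supplied tolerance; the real margin used
    by MEA is unknown here and is treated as an assumption.
    """
    expected_s = positions * movetime_ms / 1000.0
    within_budget = actual_s <= expected_s + margin_s
    return expected_s, within_budget


# Rybka1 numbers from the log above, with a hypothetical 60 s margin.
expected, ok = check_time(positions=8238, movetime_ms=100,
                          actual_s=3669.8, margin_s=60.0)
print(expected)                    # 823.8
print(ok)                          # False
print(round(3669.8 / expected, 2)) # overrun factor
```

The same function can be used to estimate how long a run will take before starting it, for example at 1000ms per position.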
_________________________________________________________________________________________________
Other features
1. SIMEX parameters to manipulate the data for better results; see the README file.
2. Add comments to HTML reports. Store them in legend.txt, example.
3. Make a dendrogram from *.data files for *.png visualization, example. While creating an HTML report, SIMEX also creates an Excel-readable file called dendrogram.csv, which can be used by the dendrogram tool of Ferdinand Mosca. Just double-click dendrogram.bat if you want such a picture.
Syntax: dendrogram --input dendrogram.csv --output sim.png
4. Chris Whittington created 100 EPD sets of 10,000 positions each. Each set contains a specific piece distribution; together they represent the most common board positions in use. See the list. Example with SIMEX.
_________________________________________________________________________________________________
100 EPD sets
12.8 Mb
CREDITS
Ferdinand Mosca for MEA and Dendrogram
Chris Whittington for 100 extra EPD sets
Tord Romstad, Marco Costalba, Joona Kiiski for Stockfish
Daniel José Queraltó for Andscacs
Mohammed Li for Asmfish
Thomas Zipproth for Brainfish
Giancarlo Delli Colli for Equinox
Andrew Grant for Ethereal
Sam Hamilton and Edsel Apostol for Hannibal
Jeffrey An and Michael An for Laser
Andreas Matthies for Rubichess
Dennis Sceviour for Schooner