My suspicion about [Right to Move = 100] proved true, see how badly it scales from [40/15] to [40/60]. A 50% increase in ply should bring at least 10-15 elo even with only 2000 games, yet it gave a regression. So yesterday I halted the parameter testing and will try lower values later, which BTW already gave good results at [40/15]: 25=50.8% | 50=51.4% | 75=51.4% respectively, so there is still good hope. But then (as predicted) there goes my two weeks of planning.
What to do (test) next? My curiosity about the simplicity of LMR (edition 3) got the upper hand and I decided to try LMR (edition 4), reducing up to 4 plies. This time running on physical cores only and at [40/60] immediately; lower time controls at such low depths hardly make sense.
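For reference, a minimal sketch of what a depth-dependent reduction of this kind can look like. The function name, the move-count thresholds and the growth rate are illustrative assumptions, not ProDeo's actual scheme; the only point is the cap of 3 vs 4 plies distinguishing "edition 3" from "edition 4":

```python
# Hypothetical late move reduction (LMR) sketch; thresholds and growth
# rate are illustrative guesses, not ProDeo's actual code.

def lmr_reduction(move_number, max_reduction):
    """Reduce later moves more, capped at max_reduction plies."""
    if move_number < 4:                   # never reduce the first few moves
        return 0
    # one extra ply of reduction for every 4 moves further down the list
    return min((move_number - 4) // 4 + 1, max_reduction)
```

With this shape, "edition 3" (`max_reduction=3`) and "edition 4" (`max_reduction=4`) only differ for moves very late in the move list, which is exactly where the extra risk sits.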
And then I made a mistake which I only noticed later: I had forgotten to restore the Right to Move parameter back to zero, so the 2 test runs were running with the bad parameter value of 100 instead of 0. But surprisingly, so far the results are still good (54%), it's searching more than a full ply deeper, and I decided to give it a chance and let it run for the moment, although it doesn't feel good.
I stopped both LMR (edition 4) matches. Reducing 4 plies is literally one bridge too far, for now; 3 is already a big step forward. Instead I will now focus on finding the optimal value of the [Right to Move] parameter, there is something to gain there. See the changed test schedule above.
[Right to Move = 50] testing finished. A remarkably positive result. It shows me again that you cannot be careful enough messing around with unclear and hard-to-define evaluation ingredients such as the value of a tempo. Obviously in the past I used the wrong values: from 100 back to 0, even 125, then in version 1.86 back to 0, to arrive at 100 again in 1.87. Now that I have reasonable hardware I no longer have to guess wildly, so it seems, for the moment.
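To make concrete what such a tempo term amounts to, here is a toy evaluation with a side-to-move bonus. The names, the score units and the surrounding structure are assumptions for illustration, not ProDeo internals; only the tested value of 50 comes from the text:

```python
# Toy illustration of a "right to move" (tempo) bonus in a static
# evaluation; names and units are assumptions, not ProDeo's code.

RIGHT_TO_MOVE = 50  # the parameter value that tested well

def evaluate(white_score, black_score, white_to_move):
    """Static evaluation from the side to move's perspective."""
    score = white_score - black_score
    if not white_to_move:
        score = -score
    # whoever is on move gets the tempo bonus
    return score + RIGHT_TO_MOVE
```

In a dead-equal position both sides thus see +50 when it is their turn, which is why getting this value wrong quietly distorts every comparison the search makes.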
Next, combining LMR (edition 3) with [Right to Move = 50] and see what happens. In progress now.
Results of test round-3 (see test schedule above) look fine except for 40/60 with the highest scaling, which is worrying. OTOH it's only 2000 games with an error bar of 13 elo points, so there is still hope. For that reason we now include the 2 (more or less proven) positional improvements (king safety and the double isolated pawn change) and start round-4. The 40/60 [2000 games] run should really give a good jump, else this effort for a new version is on the brink of failing.
There was a short power failure which caused one of my PCs to reboot. So unfortunately only 1918 of the 2000 games were played and I leave it that way; restarting a match with cutechess-cli is problematic in the way I use the program (without the concurrency option). But..... I am very happy with the result, 55.0% (35 elo), see test schedule above. The 3 other runs are looking good as well. I will label this version as BETA-1, even though one match in this round is still running.
It's another reminder that sometimes the result of 2000 games can be very misleading, see round-3. It's what I noticed 2 years ago when I faced (and underwent) another attack on my programmer genes to improve that old beast and museum piece of the 80's and 90's. It goes like this: you play 2000 40/60 games using 4 cores, which takes 30 hours to complete, and you get a (say) 51.5% score (+10 elo); you play the same match again and you can get 49.5%, thus a regression! This happened to me several times and is exactly what the error bar (margin) that comes with 2000 games predicts. So, now and then these things happen: unbalanced randomness finding the edges (+ or -) of the error bar.
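That margin is easy to reproduce. A sketch, where the ~30% draw ratio is my assumption (the error bar shrinks as the draw ratio rises): the 95% error bar of a 2000-game match around a 50% score works out to roughly ±13 elo, so a true +10 elo change can indeed measure as a regression:

```python
import math

def elo_error_bar(games, score=0.5, draw_ratio=0.3, z=1.96):
    """Approximate 95% error margin (in elo) of a match score."""
    wins = score - draw_ratio / 2            # score = wins + draws/2
    losses = 1.0 - wins - draw_ratio
    # per-game variance of the score (win = 1, draw = 0.5, loss = 0)
    var = (wins * (1 - score) ** 2
           + draw_ratio * (0.5 - score) ** 2
           + losses * (0 - score) ** 2)
    margin = z * math.sqrt(var / games)      # margin on the score fraction
    to_elo = lambda p: -400 * math.log10(1 / p - 1)
    return (to_elo(score + margin) - to_elo(score - margin)) / 2
```

`elo_error_bar(2000)` comes out close to ±13; since the margin only shrinks with the square root of the game count, cutting it in half costs four times the games.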
Anyway, a 35 elo improvement in just 5-6 weeks is not bad at all; in the 80's and 90's that sometimes took a full year. I want 50 elo for a version worthy of the name REBEL 13, so 15 elo to go. Next round is testing LMP, usually good for a 10% speed-up, already tested at [40/15] with 12,000 games scoring 50.8%. We will see how it scales.
LMP testing finished and it scales badly, all the way from 50.8% -> 50.6% -> 50.2% to even 49.5%. It would be risky to count it as an improvement even though the overall score is somewhat positive. I have made scaling a dominant point for this version because I noticed from statistics made from rating lists that ProDeo doesn't scale well, meaning that its performance drops the longer the time control. Whatever the reason for that (and I don't think any programmer can fully grasp what the reasons for this phenomenon are) I think it makes sense to try to improve by only accepting changes that scale well. It's an experiment. And a time-consuming one.
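For context, LMP (late move pruning) skips quiet moves late in the move list at shallow depths altogether, which is where the speed-up comes from. A minimal sketch; the depth limit and the move-count threshold are illustrative guesses, not ProDeo's actual formula:

```python
# Illustrative late move pruning (LMP) sketch; the depth limit and the
# move-count threshold are demonstration guesses, not ProDeo's code.

def lmp_skip(depth, move_number, is_quiet, in_check):
    """Return True if this late, quiet move may be skipped entirely."""
    if not is_quiet or in_check or depth > 4:
        return False                      # only prune quiet moves near the leaves
    return move_number > 3 + depth * depth
```

The trade-off is nodes saved versus late-ordered tactics missed; one plausible (but speculative) reading of the bad scaling is that at longer time controls the saved nodes matter less while the missed tactics cost the same.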
It's best for now to put LMP in the freezer and have a look at the code later; it should yield some 5-10 elo.
Not much left on the menu to test that could possibly bring the desired 15 elo for a version release, so I must go back to the drawing board hunting for new candidate improvements. In the meantime I am now testing the recapture extension, limiting its maximum from 2 to 1; heck, I might even try to do without them entirely.
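A sketch of what limiting the recapture extension amounts to; the names, the condition and the cap variable are illustrative assumptions, with the experiment simply lowering the cap from 2 to 1 (or 0):

```python
# Hypothetical recapture-extension sketch; names and structure are
# illustrative, not ProDeo's code. The experiment varies the cap 2 -> 1 -> 0.

MAX_RECAPTURE_EXT = 2   # maximum recapture extensions per line

def recapture_extension(is_capture, same_square_as_last_capture, ext_used):
    """One extra ply for an immediate recapture, up to the per-line cap."""
    if is_capture and same_square_as_last_capture and ext_used < MAX_RECAPTURE_EXT:
        return 1
    return 0
```

The cap exists because each extension lengthens the line it fires in; without one, a long capture sequence could extend itself ply after ply.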
Fewer recapture extensions isn't an improvement either, and doing no recapture extensions at all is a big regression, so I am stuck for the moment. I will take a moment of reflection and either find some new changes or release the thing as ProDeo 1.9 and enjoy life again.
Consulted my notes from the past with suggested (small) improvements (ideas) and picked a number of them to try. Most of them were hardly measurable with the hardware of the past and were stamped as unclear, thus not used. The list of changes below will only be tested at [40/15] with 12,000 bullet games. If there is a sign of improvement it will be included in the scaling testing later.