Bulgaria - Summer of Chess
#11
Thanks for the information on the ban – I wonder what will happen at the end of the four months if the committee finds it has insufficient information to justly ban him. If titled players en masse are refusing to play him, sounds like finding him guilty will be the easier option; I think they could do without this influence on the verdict!

Ivanov’s results are surprising, but I’m not sure what your analysis adds to them because if any player is going to somehow score five very sharp tactical victories against GM players, I doubt it could be done without hitting that sort of match (90% plus) with the top three engine moves. I think we are still left with our impressions and explanations. My impression was that the way Ivanov won these games suggests he should hardly ever lose to players below 2000. Hence I think he was ‘probably’ cheating, but I’m not certain.

You mention courts, which I think is the right way to think. But courts have a process and a standard of evidence, both of which are lacking here. No standard has been set. There is no direct evidence whatsoever. No accuser has taken responsibility for making the accusation. The accusation itself hasn’t even been specified. Was he using the tournament transmission as originally thought, or transmitting his own picture of the board (as Lilov now says he believes)? Is it that the top engine move was transmitted to him, or the top three moves? I ask because people are trying to decide guilt on the basis of presentations like Lilov’s (a slight digression admittedly, as the ‘Kerr presentation’ is better) in which the ‘top three’ is typically being used for ‘illustration’, on the basis that we don’t know how much time the computer had, so a ‘top three’ move might have later popped up as the best. Once or twice when I stopped to look at what was actually on the screen I found quite a bit of leeway was being applied in saying things like ‘this is the computer move’ when it wasn’t the top, or it wasn't in the first 3 at that point - possibly cutting corners trying to keep the thing short.

OK I know this is not your analysis - and thanks for making the effort to actually count what is being discussed. In your analysis you found a ‘top three’ match of 91.2% which rose to 96.4% if you excluded games 2 and 8. As I say, on its own I don’t think this adds that much to the spectacle of five duffed-up GMs. I appreciate that you then dug a bit deeper, looking at the moves not in the top 3, of which you say:

“I found that the moves he made were often in fact the top choice for Houdini when viewed at a different ply from my original analysis, in all cases, within just a few ply.”

But did you also re-check your stated percentages when calculated at these different ply levels? If you just checked the moves that were not in the top three to see if they were close to being included, without also checking the ones that were originally included (to see if they remained in the top three on the altered basis), you would be biasing your numbers - you’re bound to make the stats look better that way.
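To make the one-sided re-check point concrete, here is a rough sketch rather than anyone's actual method. It assumes a hypothetical player whose move lands in the engine's top three with a fixed probability at every depth, and (unrealistically) treats each depth as an independent draw. The names `simulate_recheck`, `p_top3` and `n_extra_plies` are made up for the illustration.

```python
import random

def simulate_recheck(n_moves=200, p_top3=0.8, n_extra_plies=4, seed=0):
    """Compare a fixed-ply match rate with a one-sided re-check.

    Assumption (not from the thread): at each depth the played move is in
    the engine's top three with probability p_top3, independently.
    """
    rng = random.Random(seed)
    fixed = 0      # matches counted at the original depth only
    one_sided = 0  # misses get a second chance at other depths; hits are never re-tested
    for _ in range(n_moves):
        hits = [rng.random() < p_top3 for _ in range(1 + n_extra_plies)]
        at_base_depth = hits[0]
        fixed += at_base_depth
        one_sided += at_base_depth or any(hits[1:])
    return fixed / n_moves, one_sided / n_moves

fixed_rate, one_sided_rate = simulate_recheck()
print(f"fixed ply: {fixed_rate:.1%}, one-sided re-check: {one_sided_rate:.1%}")
```

Whatever the inputs, the one-sided figure can only go up relative to the fixed-ply figure, because a miss can be converted into a hit but a hit is never converted into a miss. That holds even though this simulated player has no engine help at all.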

I’d just like to comment on your other subjective explanations. Carlsen was a misleading example that might mislead people. Strike that from the record, your honour! =) . Regarding game 2, excluded for time pressure issues – what time pressure issues? I thought that was a suggestion of Valeri Lilov’s in his video, but he seemed to be speculating without any information on the matter. In any case, if you are ‘improving’ the data by removing 36% of it, I’d say it needs a lot more justification. You also explain that in that game GM Jovanovic ‘was on to him’ and so played quietly - but then why aren’t all of the other GMs ‘on to him’ and playing safely, rather than losing sharp tactical games?

Lilov said the game 2 endgame blunder on move 115 was probably a glitch, so there are two competing theories. This is one subjective explanation I agree with. It does look like a glitch! I doubt even a 2200 player would have missed that Nf4 won the d-pawn. He then exchanged into the obviously lost pawn ending, resigning almost immediately – quite consistent with someone blindly following the engine (choosing e.g. a meaningless minus four over minus five) and then realizing he was lost. Trouble is, if you are playing the evidence-counting game you still have to count objectively, even if it weakens your case.

It would help for someone to estimate a figure for how often a 2600 player matches Houdini in the game timescales, especially in tactical games. Even one example might shed some light.

What standard to apply? ‘Balance of probabilities’ means people could be banned on fifty-fifty hunches – not appropriate. I also think it’s important that any drastic action only follows a due process. Trouble is, this could be onerous! But as I say, I think the committee had less drastic options, I doubt he could keep cheating for long under such scrutiny, especially if they had varied the transmission.

Cheers
#12
I can see you're never going to be convinced until he has been caught red-handed, Walter.
The match-up rate for the 7 rounds (excluding rounds 8 and 2) was 95.8% of Houdini's 1st choice moves.

Round 1 game against FM Mario Schachinger - 20 moves were analysed, 17 of which were Houdini's 1st choice move at 25 ply. Move 21 was Houdini's 2nd choice at that depth, just 0.01 of a difference to the top move, which overtook the move played at 23 ply. Move 31 had the same evaluation as the 'best' (in the sense of highest engine evaluation) move. Move 22 was Houdini's 3rd choice move at 25 ply, but becomes the highest evaluation at 27 ply.
I think it would be easy for GM Jovanovic to see there was something going on.

Round 3 against GM Bojan Kurajica had 26 analysed moves, all bar 1 were Houdini's 1st choice at 25 ply, that one move was move 24 which was Houdini's 2nd choice at that depth, just 0.02 difference to the top move. It was actually the top move at 24 ply though...

A similar pattern occurs for the rest of the tournament too, just the odd move here and there not being the 1st choice move.
Nobody doubts that it's possible for a player even of my mediocre strength to have a high match up rate in a single game from time to time, sometimes the natural looking move is actually the best move. But to do it with this accuracy over the course of a whole tournament is, well, impossible without the help of a computer.

I've been involved in cheat detection on chess.com for a year or two now, and have studied the games of many players suspected of cheating. Ivanov's performance at Zadar is one of the most blatant I have seen.

There have been quite a few other cases where similar accusations have been made in OTB tournaments, but I haven't involved myself with those. These things are quite time consuming; I'd be happier to let someone else check out those others.

I do believe there is a simple solution though: a delay of, say, 10 or 15 minutes on any webcast would be enough to prevent an accomplice from accessing the game 'live', while not taking away the enjoyment from the user watching on their computer. I personally enjoy watching the games on here from the weekend tournaments around Scotland, and even had the privilege of playing on a live board myself once; being 10 minutes behind the actual play would make no difference.

To use another legal analogy, we have numerous victims, and we have one perpetrator with his victims' blood on his hands. We just haven't found the weapon yet.
#13
;| Hi again, Graham.

“I can see you're never going to be convinced until he has been caught red-handed, Walter.”

I don’t see how you can be more than 80% sure of that, Graham ;P

“The match-up rate for the 7 rounds (excluding rounds 8 and 2) was 95.8% of Houdini's 1st choice moves.”

Wasn’t that by adding other levels of ply? My point/query - I could be wrong, but I don’t think you answered it - was that if you vary the ply to bring in some (e.g.) fourth choices that weren’t counted, you might also lose some of the first three that were counted. Going by your post the number of undisputed 1st matches was 75.5%, rising to 84.8% if you remove rounds 2 and 8. If I’ve understood you correctly, you only got to 95.8% by moving the goalposts – I mean, varying the ply :-)

Actually I find 75% of first matches far more impressive than 90 plus percent of top three matches. Why mess around with the top three moves at all? It may be muddying the picture.

You say the cheating is blatant, but I suggest it’s the grounds for suspicion that are obvious; it's not so obvious that standards of proof have been met. In fact it's clear they haven't; it's more like his accusers are trying to dispense with the standards. Blatant or not, if it were a court case it wouldn't get past the jannie.

I agree with your ‘simple solution’ of a delay in the broadcast, or not broadcasting Ivanov’s game as a temporary measure. The ban looks like making things worse. Rather than cut him off from the game (which also cuts off any new evidence) they should just have cut him off from the power supply :-)

“To use another legal analogy, we have numerous victims, and we have one perpetrator with his victims' blood on his hands. We just haven't found the weapon yet.”

I think the legal analogy is the murder without a body – sorry to nitpick, but you’ve not proved there are any ‘victims’ yet! There is ‘blood’ but chess is like that … By the way would you have thrown away the key on Sally Clark? Why that murder was so blatant…

Graham, thanks for the exchange and your analysis. I’ll leave it there while you find new ways to describe my naivety!

Cheers
Walter
#14
Never before has anyone scored as closely to a Chess engine in a whole tournament who wasn't subsequently found to be cheating. Ivanov's match to Houdini prior to the tournament (and indeed throughout his playing career), was nothing close to what it was at this tournament - so there is a spike in the engine move match % graph.

Statistically speaking, you cannot match an engine that closely without the use of an engine. The odds of doing so are so incredibly small (considerably more so - even - than DNA evidence being wrong), that it's completely impractical to think it possible. In court, the only evidence that is required to place someone at a scene of a crime is the detection of their DNA there. If that evidence is declared to certainly exist, then the presence of the individual at the crime scene is considered a fact. This is similar, except with an even more compelling set of statistics and an even more conclusive tie to an actual "crime" (i.e. the presence of DNA doesn't mean the person committed the crime, it just means they were at the crime scene, but in this case the presence of the evidence is in itself the evidence of the "crime" being committed). In other words, the move percentages of the computer being found in Ivanov's game is like finding the computer's DNA at the scene of the crime (the board), and its very presence is conclusive evidence that it was involved in the "crime" (his moves).

So I disagree entirely that this wouldn't stand up in court. When faced with the statistics, it's more clearly proven beyond the remotest reasonable doubt than any other crime in history.
#15
OK Walter, let's agree to disagree.
I must explain my moving of the goalposts first though...
I only arrived at 25 ply because that's where I felt a decent modern laptop could operate under the time control. He (or rather his accomplice) could have set it to a different ply, with a much stronger machine. Or they could have set it to a particular time per move. Until he does a Lance Armstrong, who knows exactly how he did it.

BTW, who is Sally Clark?
#16
Sally Clark was convicted of the murder of her two sons on the basis of some extremely poor statistics. A doctor claimed that the chance of her two sons dying of SIDS was in excess of 73 million to 1. A very simplistic application of the available statistics seemed convincing to the jury. At her successful appeal, after 3 years in jail, it emerged that the evidence of the death by natural causes of one of her sons had been withheld.
#17
Thanks Graham. It wasn’t the choice of 25 ply I was questioning; it was afterwards varying the ply level to include some fourth or fifth choices that I thought was questionable. Sorry, Sally Clark was brought up on the other thread on this topic.

Hi Andrew. My point was, that as it stands it wouldn’t get to court, because a court would have certain requirements of the accusations, requirements that haven’t been met. I’ve already mentioned some of them. Anyway, before you can talk about ‘incredibly small’ odds you have to make sure the calculation that produces those odds is valid. At the moment the calculation is not even on view for Ivanov (or a lawyer or statisticians perhaps) to question.

I think I know what you mean though – the engine match ‘seems’ so strong that there ‘shouldn’t’ be any problem putting together a valid case, and even if the case has to be weakened slightly to remain valid, it shouldn’t matter? That would be putting the cart before the horse though. The case (and any calculation) has to be stated before any judgement could be made!

Let’s have a go anyway. Going by what has been said, it seems to me that the accusation against Ivanov, if formalized, might look something like this:

“The probability of a player rated X, playing normally, attaining a match of Y% against the top move of Houdini 3 engines in Zadar games 1-9, is Z%. Z is very small. Therefore he was not playing normally.”

It’s actually a bit of a step from the last statement to “therefore he was cheating”, but let’s ignore that for now.
To complete the accusation, several choices have to be made:

Is it really the top move, or the top 3?
Which rating to count, the pre-tournament or post tournament one, and should his improved performance not be allowed for?
Is it Houdini 3 or Rybka? Or perhaps Houdini 2, as claimed by the protesting players at Zadar?
Do you need to exclude certain games - which ones, and why? This also brings in questions relating to the method of cheating.
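As a rough illustration of what the Z in that formal statement would look like once the choices are pinned down, here is a minimal sketch. It assumes each analysed move matches the engine independently with the same probability `p`, which is a simplification and not something anyone in the thread has established. The values of `n`, `k` and `p` below are placeholders, not measured figures.

```python
from math import comb

def binomial_tail(n, k, p):
    """P(at least k matches in n moves), assuming each move matches
    independently with probability p (a simplifying assumption)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Placeholder inputs: 200 analysed moves, 182 of them (91%) matching.
n, k = 200, 182
for p in (0.5, 0.6, 0.7, 0.8):
    print(f"assumed honest match rate {p:.0%}: Z = {binomial_tail(n, k, p):.3e}")
```

The point of running it over several values of `p` is that Z swings by many orders of magnitude depending on the assumed honest match rate, which is exactly why that rate (and the top-one versus top-three choice) has to be stated before the odds mean anything.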

You might think well, you just look at the data, and pick the best choices for the accusation - but there is a statistical snag with this; every choice of parameter that you estimate from the data from which the result is to be obtained weakens the statistical strength of any positive result. I don’t know enough to quantify this, but if there are several choices made, I think it could mean dividing the final odds by several numbers relating to those choices.

There is another factor that reduces the statistical strength of the conclusion; the fact that the suspicion arose from the same data as is being used to prove the suspicion increases the chance of a false result. I’m not saying it’s a complete no-no, but I’m not sure how you account for it. An expert like Andy Muir might be able to advise on these questions. Had there been a stated prior suspicion, or some other objective reason to examine the data, it would be stronger.

I’m not just raising procedural objections in an open and shut case. Here is an extreme example to show you how these big astronomical odds can fall to earth if there is a problem with the data.

People, e.g. Lilov, are a bit blasé, talking about top matches or top 3 matches interchangeably, even allowing a fourth or fifth match to be counted to make a point. So - let’s say someone allowed the first four choices over Ivanov’s 200 (say) Zadar games, and then someone else did a calculation based on them, but (wrongly) counting as if it were only the top three. What difference would that make to the odds? It might surprise you that in the probability calculation this could cause an error factor of (almost) a million million million million. This number comes from 3/4 to the power 200. As I say, it’s an extreme example and I’m not saying anyone has done a calculation in this way, but it’s an example of what can happen if you hand-wave the numbers too much.
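For anyone who wants to check the arithmetic, here is a quick sketch. It only reproduces the "3/4 to the power 200" figure, taking the 200 as the number of independently counted items; the function name `overcount_factor` is made up for the illustration.

```python
from math import log10

def overcount_factor(n_counted, allowed=4, assumed=3):
    """Ratio (allowed/assumed)**n_counted: the compounded error if matches
    were collected against `allowed` engine choices but the odds were
    computed as if only `assumed` choices had been allowed."""
    return (allowed / assumed) ** n_counted

factor = overcount_factor(200)
print(f"error factor ~ {factor:.2e} (about 10^{log10(factor):.1f})")
```

The factor comes out a little under 10^25, so if anything the "million million million million" (10^24) above is on the low side.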

So while I 'think' he did cheat, I won’t be convinced the case is in any way ‘proven’ until I see a definite, precise accusation made by a reasonably competent body (who takes responsibility for making it) and it’s scrutinized by the wider community. I don’t mind if the competent body are a little bit biased (difficult to avoid), as long as any judgements made by them within the data are visible for all to see.

As I said before, a comparison with other players would help. In your opening gambit you seem to suggest these are around – if you could post any links I’d be interested, thanks.

Cheers
#18
Numbers are dangerous things.
Statistics even more so.

Just suppose that my DNA was on file.
A legal expert states in court that the chances of both me and a murderer having the same DNA is one in a million.
Am I guilty beyond reasonable doubt??

If one in a million means that for both parties there is a 1 in 1,000 chance of having that DNA type, then there would be 6,000 suspects spread across Scotland and another 50,000 plus in England.
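The suspect count is just population times match probability. A minimal sketch follows; the population figures are rough round numbers that I am assuming (they are not given in the post), and `expected_matches` is a made-up helper name.

```python
def expected_matches(population, one_in):
    """Expected number of people sharing a DNA type that occurs in
    1 person out of every `one_in`."""
    return population / one_in

# Assumed approximate populations, for illustration only.
populations = {"Scotland": 5_300_000, "England": 53_000_000}

for country, people in populations.items():
    print(f"{country}: about {expected_matches(people, 1_000):,.0f} people with that DNA type")
```

With these assumed populations the counts come out at about 5,300 and 53,000, the same order as the figures quoted above.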

I plead (still hypothetically) not guilty. The court should demand extra evidence before convicting on this type of evidence.

For move analysis. Perhaps Ivanov owns and plays against Houdini. It would be valid for Ivanov to know when sitting down for a tournament game that Houdini rates 1 e4 above 1 d4 and that against the Sicilian the best moves are 2 Nf3 and 3 d4 and 4 Nxd4. That would be 10% of a game with perfect matches. If he is a 2000 plus player he will probably memorise longer lines than that.

Future cheats (inevitably this will happen) would be well advised to play the English.

No winner of the UK lottery ever gets investigated even though the odds of selecting the correct 6 numbers are 1 in 14,000,000. I have heard many stories over the years of winners who used their relatives' birthdays to choose their lottery numbers. That explanation would simply not stand up in court.

Back to statistics.
In writing the above I am 95% certain that I am correct.
#19
Phil Thomas Wrote:Just suppose that my DNA was on file.
A legal expert states in court that the chances of both me and a murderer having the same DNA is one in a million.
Am I guilty beyond reasonable doubt??

If one in a million means that for both parties there is a 1 in 1,000 chance of having that DNA type, then there would be 6,000 suspects spread across Scotland and another 50,000 plus in England.

Except that the chances of DNA evidence being incorrect are widely regarded as being somewhere in the range of 1 in 3 billion to 1 in 20 billion. That's at least 3000 times more unlikely than your 1 in a million, and means that by even the most conservative estimate there is statistically only one other person on the planet with a close DNA match to you (identical twins excluded). But that would be with a standard DNA test which looks at hundreds of base pairs. If you were to look at the whole genotype of a person (which can be done these days), then you'd find an exact match is pretty much impossible.

Phil Thomas Wrote:I plead (still hypothetically) not guilty. The court should demand extra evidence before convicting on this type of evidence.

A court wouldn't necessarily convict you using this type of evidence. It would use the evidence to say for certain that you were at the scene of the crime. Essentially, a court considers DNA evidence to be conclusive proof that a person was at the scene of a crime. It's as much a fact as seeing them there on CCTV footage.

Phil Thomas Wrote:For move analysis. Perhaps Ivanov owns and plays against Houdini. It would be valid for Ivanov to know when sitting down for a tournament game that Houdini rates 1 e4 above 1 d4 and that against the Sicilian the best moves are 2 Nf3 and 3 d4 and 4 Nxd4. That would be 10% of a game with perfect matches. If he is a 2000 plus player he will probably memorise longer lines than that.

Analysis of Ivanov's games started in the middle game to avoid opening theory from skewing the results.

Phil Thomas Wrote:No winner of the UK lottery ever gets investigated even though the odds of selecting the correct 6 numbers are 1 in 14,000,000. I have heard many stories over the years of winners who used their relatives' birthdays to choose their lottery numbers. That explanation would simply not stand up in court.

But lots of people play the lottery, making it very likely that someone will win it. To make it possible that someone was not at a crime scene but someone else with a very close match to their DNA was, everyone on the planet other than the accused would have needed to - statistically - have been at the crime scene. And even then, there is a very reasonable chance that a DNA match to the accused would not be found. The two aren't comparable. :)
#20
Valuable and interesting reply Andrew,

Good to know that opening moves were discounted from the analysis of Ivanov's chess games.

Would dispute one thing though in your analysis.

If there are enough profiles stored in a DNA database then odds of millions to one against a match of one of them with crime scene DNA are not surprising, and if that is the only evidence then it's not strong enough on its own - I am sure it won't be long before databases with millions of DNA profiles exist.

One thing I'm not sure about perhaps you can help?
In the lottery analogy if there are 14,000,000 possible combinations and 140,000,000 lucky dips were sold then typically one would expect around 10 winners.
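A quick check of that lottery arithmetic, as a sketch. It assumes a 6-from-49 draw (which is where the roughly 14,000,000 figure comes from) and that every lucky dip is an independent, uniformly random ticket; the 140,000,000 ticket count is the hypothetical one from the question.

```python
from math import comb, exp

combinations = comb(49, 6)     # number of distinct 6-from-49 tickets
tickets_sold = 140_000_000     # hypothetical figure from the question

expected_winners = tickets_sold / combinations
# With independent random tickets the winner count is roughly Poisson,
# so the chance that nobody wins is about exp(-expected_winners).
p_no_winner = exp(-expected_winners)

print(f"{combinations:,} combinations")
print(f"expected winners: {expected_winners:.2f}")
print(f"chance of no winner at all: {p_no_winner:.1e}")
```

So about 10 winners on average, as stated, with only a very small chance of a draw where nobody wins.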

For DNA tests are all possible combinations equally well populated? This does seem to be a hidden assumption in the stats.

