Just how strong is ChessUp bot level 12 "Stockfish elo: 1500" in actuality? My investigation

I thought it’d be interesting to determine how strong bot level 12 (“Stockfish elo: 1500”), the first ChessUp 2 bot that is based on Stockfish, really is. I started out by testing it against the chess.com bots and discovered that the ratings for the chess.com bots are much higher than they should be.

ChessUp 2 level 12 managed to mow through the chess.com bots up to the ones rated in the 2500 range. It had some difficulty with the Kosteniuk (“2561”) bot but managed to win a 6-game match against it. After it drew the first game against the Naroditsky (“2650”) bot, I decided to expedite things and moved it up to the Judit Polgar bot (“2735”). The Judit bot made short work of the ChessUp 2 level 12 bot, easily winning the three games they played.

After that I thought it’d be interesting to use Fritz 20’s rated game feature to see what it thinks the ChessUp 2 level 12 bot’s rating is. The rated game feature has you play against a weakened Fritz 20, which can make its moves within 1 second each, and lets you choose its strength between 1040 and 2440. The 2440 setting brutally outplayed CU2 L12, so from then on I paired it against lower-rated opponents.

In 20 games the highest rating CU2 L12 achieved was 1967. With a score of 10.5 points out of 20 against opponents with an average rating of 1945, CU2 L12 ended up with a rating of 1924. As I’m nowhere near that strength I cannot verify it, but it sounds like a reasonable estimate, and if that estimate is correct it’d mean the bot is over 400 elo stronger than the “1500” label claims.
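As a rough cross-check of those numbers, here is a minimal sketch of the standard logistic Elo performance-rating formula applied to the score above. This is not necessarily how Fritz 20 arrives at its figure (its method isn't documented in this thread), and the function name is just an illustrative choice.

```python
import math

def performance_rating(avg_opponent_rating, points, games):
    """Estimate a performance rating from a match score.

    Uses the standard logistic Elo model (an assumption here; Fritz 20
    may compute its rating differently, e.g. game by game).
    """
    score = points / games
    if not 0 < score < 1:
        raise ValueError("performance rating is undefined for 0% or 100% scores")
    # Invert the Elo expected-score formula to get the rating difference.
    rating_diff = 400 * math.log10(score / (1 - score))
    return avg_opponent_rating + rating_diff

# CU2 L12: 10.5 points from 20 games against an average opponent rating of 1945
estimate = performance_rating(1945, 10.5, 20)
print(round(estimate))
```

With a score only slightly above 50%, the estimate lands a little above the average opponent rating, which is in the same neighbourhood as the 1924 that Fritz reported.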


Cool - would be curious what level 11 is like when compared to Fritz levels.

Then with that info we can make the jump smaller between 11 and 12.

That’ll be on my to-do list, but I think it might be a little trickier because of the endgame bug I discovered in bots level 11 and below: they don’t know how to mate with queen vs lone king, which could let the opponent escape with a draw. I discuss the bug in this thread: Major endgame flaw with the hardwired ChessUp bots (levels 1 to 11)?

Completely unscientific and anecdotal, but I have always found Chess.com bots much easier than an equivalent CU2 offline bot. I’m roughly in the 1200 range. I can beat chess.com bots up to around 1500, but on CU2 1100 is a good match.

I definitely feel that Chess.com bots are overrated. They make blunders and mistakes no human would ever make at those levels. CU2 feels more accurate.

I used the same Fritz 20 rated game feature to test ChessUp level 13 (Stockfish elo: 1600) and ended up with a rating of 2053 after 20 games against an average opponent rating of 2113. While this is very close to the +100 elo you’d expect from going between ChessUp level 12 and 13, they both seem to be much stronger than the “Stockfish elo: 1500 and 1600” descriptions.

However, is it possible that the Fritz 20 rated game feature is overestimating the rating?

I think it is more likely that the estimate from the Fritz 20 rating tool is closer to the truth. Here’s one of the games in which ChessUp level 13 managed to get 3 brilliancies according to the chess.com Game Review at max settings, and an 84% accuracy score vs black’s 78.8%:

[Event "Rapid 60min"]
[Site "?"]
[Date "2025.12.06"]
[Round "?"]
[White "ChessUp level 13"]
[Black "2200"]
[Result "1-0"]
[ECO "D45"]
[WhiteElo "1750"]
[BlackElo "2200"]
[WhiteFideId "-1"]
[BlackFideId "-1"]
[PlyCount "147"]
[GameId "2252103756472338"]
[TimeControl "3600"]

  1. c4 {0} e6 {0} 2. Nc3 {9} d5 {0} 3. e3 {7} Nd7 {0} 4. Nf3 {9} c6 {0} 5. a3 {10} Ngf6 {0} 6. d4 {9} Be7 {0} 7. b3 {7} b6 {0} 8. Bb2 {7} dxc4 {0} 9. bxc4 {15} Bb7 {0} 10. e4 {7} e5 {0} 11. Be2 {7} exd4 {0} 12. Nxd4 {13} O-O {0} 13. Nf5 {10} a6 {0} 14. Qd3 {8} Re8 {0} 15. f4 {8} g6 {0} 16. Nh6+ {8} Kf8 {0} 17. Rd1 {9} Kg7 {0} 18. Qh3 {9} b5 {0} 19. e5 {7} Qb6 {0} 20. Nxf7 {12} Bc8 {0} 21. f5 {7} Kxf7 {0} 22. Rxd7 {12} Bxd7 {0} 23. exf6 {11} Bxf6 {0} 24. Qxh7+ {12} Kf8 {0} 25. fxg6 {10} Re7 {0} 26. g7+ {8} Bxg7 {0} 27. Rf1+ {9} Rf7 {0} 28. Rxf7+ {11} Kxf7 {0} 29. Qxg7+ {14} Kxg7 {0} 30. Na4+ {9} Qd4 {0} 31. Bxd4+ {16} Kg6 {0} 32. Nc5 {12} Be8 {0} 33. Bd3+ {12} Kh6 {0} 34. Kd2 {8} Bg6 {0} 35. Bf1 {10} Rd8 {0} 36. Ke3 {9} Bf7 {0} 37. g3 {8} Bxc4 {0} 38. Bxc4 {12} bxc4 {0} 39. a4 {8} Kh5 {0} 40. h3 {10} Re8+ {0} 41. Ne4 {9} Rf8 {0} 42. Bc3 {8} Rf1 {0} 43. Nf6+ {9} Kg5 {0} 44. a5 {7} Rxf6 {0} 45. Bxf6+ {14} Kxf6 {0} 46. Kd4 {8} Kf5 {0} 47. h4 {7} Kg6 {0} 48. g4 {7} Kf6 {0} 49. Kxc4 {10} Ke5 {0} 50. h5 {9} Ke6 {0} 51. g5 {9} Kf5 {0} 52. h6 {10} Kg6 {0} 53. Kc5 {9} Kxg5 {0} 54. h7 {11} Kf4 {0} 55. h8=Q {10} Ke3 {0} 56. Qc8 {11} Kd3 {0} 57. Qa8 {11} Kc2 {0} 58. Qxa6 {13} Kb2 {0} 59. Qf1 {10} Kb3 {0} 60. Qh1 {8} Kb2 {0} 61. Qf1 {8} Kc2 {0} 62. a6 {8} Kd2 {0} 63. Qh1 {9} Ke3 {0} 64. Qh6+ {12} Ke2 {0} 65. Qf6 {10} Kd1 {0} 66. Qe6 {9} Kc2 {0} 67. Qxc6 {11} Kc3 {0} 68. a7 {11} Kd3 {0} 69. a8=Q {8} Ke3 {0} 70. Qh1 {9} Kd3 {0} 71. Qe1 {8} Kc2 {0} 72. Qf2+ {8} Kb3 {0} 73. Qh2 {8} Kc3 {0} 74. Qf3# {10} 1-0

I must say, though, that a few times during my testing ChessUp level 13 played strangely badly in the opening, like in this game where it looks like it was tilted:

[Event "Rapid 60min"]
[Site "?"]
[Date "2025.12.07"]
[Round "?"]
[White "ChessUp level 13"]
[Black "2027"]
[Result "0-1"]
[ECO "D20"]
[WhiteElo "2033"]
[BlackElo "2027"]
[WhiteFideId "-1"]
[BlackFideId "-1"]
[PlyCount "32"]
[GameId "2252431377506322"]
[TimeControl "3600"]

  1. d4 {0} d5 {0} 2. c4 {10} dxc4 {0} 3. e4 {16} e5 {0} 4. Nf3 {10} exd4 {0} 5. Bxc4 {12} Bb4+ {0} 6. Nc3 {14} dxc3 {0} 7. Qa4+ {14} Nc6 {0} 8. Bxf7+ {14} Kxf7 {0} 9. O-O {13} Qe7 {0} 10. Bg5 {12} Nf6 {0} 11. Rae1 {11} cxb2 {0} 12. e5 {9} Bxe1 {0} 13. Rxe1 {11} h6 {0} 14. a3 {8} hxg5 {0} 15. Nxg5+ {13} Kg6 {0} 16. e6 {14} Kxg5 {0 Leto resigns} 0-1

But more often than not I feel that ChessUp level 13 played solid chess. According to chess.com’s Game Review it scored an average accuracy of 85.56% as white and 82.02% as black.

Next up for me is ChessUp level 11, although I’m unsure how well that will go because it currently has no endgame knowledge and can allow draws even when it’s up a queen against a lone king. I think my best option in this case is to put it against an opponent that is rated 200 elo stronger, but finding that opponent might be tricky, and I’m not sure it’s worth the effort since they plan on giving it some endgame knowledge.

After 4 games in my test of the ChessUp level 11 (“1400”) bot, I have decided to terminate the test following this disappointing game, where level 11 was completely winning and gifted White a draw by repetition:

[Event "Rapid 60min"]
[Site "?"]
[Date "2025.12.07"]
[Round "?"]
[White "1403"]
[Black "ChessUp Level 11"]
[Result "1/2-1/2"]
[ECO "C00"]
[WhiteElo "1403"]
[BlackElo "1533"]
[WhiteFideId "-1"]
[BlackFideId "-1"]
[PlyCount "73"]
[GameId "2252477916024850"]
[TimeControl "3600"]

  1. e4 {0} e6 {31} 2. Nf3 {0} d5 {8} 3. Bb5+ {0} c6 {8} 4. Bf1 {0} dxe4 {12} 5. Ng1 {0} e5 {9} 6. Qh5 {0} Qd4 {12} 7. Nc3 {0} Nf6 {17} 8. Qg5 {0} Be6 {15} 9. Qg3 {0} h6 {17} 10. d3 {0} exd3 {19} 11. Bxd3 {0} Nh5 {110} 12. Qe3 {0} Be7 {12} 13. Rb1 {0} g6 {12} 14. h3 {0} Bd7 {15} 15. Ne4 {0} Qd5 {12} 16. b4 {0} Qxa2 {17} 17. c3 {0} Nf4 {15} 18. Qf3 {0} g5 {10} 19. Ne2 {0} Nxd3+ {22} 20. Qxd3 {0} Be6 {13} 21. Be3 {0} Bc4 {17} 22. Qd1 {0} Bxe2 {17} 23. Nd6+ {0} Bxd6 {11} 24. Qxd6 {0} Qxb1+ {13} 25. Kxe2 {0} Qc2+ {10} 26. Ke1 {0} Qxc3+ {15} 27. Kf1 {0} Qa1+ {9} 28. Ke2 {0} Qb2+ {10} 29. Bd2 {0} Qd4 {26} 30. Qc7 {0} Qc4+ {9} 31. Ke3 {0} Qb3+ {8} 32. Ke2 {0} Qc4+ {9} 33. Kd1 {0} Qb3+ {8} 34. Kc1 {0} Qc4+ {9} 35. Kd1 {0} Qb3+ {8} 36. Kc1 {0} Qc4+ {9} 37. Kd1 {0} 1/2-1/2

When the hardwired bots (levels 1 to 11) get updated I’ll start a new test with level 11.

I’ve been testing the ChessUp 2 level 13 bot on Lichess against the community bots, and after 23 games it has a rapid rating of 2258. My observation is that it is very close in strength to the SimpleEval bot, which currently has a 2234 rapid rating and a 2167 blitz rating.

The question is how the Lichess rating compares to the estimated rating given by the Fritz rating tool, which was 2053. I wish I had done the Lichess testing at a blitz time control, as the ChessUp 2 bots already play fast no matter what the time control is. But even if we assume the ChessUp 2 level 13 bot is only around 2160 in blitz (I’m assuming that in blitz it’d still be around the same strength as the SimpleEval bot), that’s still a difference of about 100 elo between the two estimates.
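To put a gap of about 100 elo in perspective, here is a minimal sketch of the standard Elo expected-score formula applied to the two estimates. It treats the Lichess and Fritz numbers as if they were on the same scale, which is an assumption that may well not hold, and the function name is just for illustration.

```python
def expected_score(rating_a, rating_b):
    """Expected score of player A against player B under the standard Elo model."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

# Assumed blitz-equivalent Lichess estimate (2160) vs the Fritz 20 estimate (2053).
# Comparing them directly assumes both rating pools share one scale.
gap_score = expected_score(2160, 2053)
print(f"{gap_score:.2f}")
```

Under that model the higher estimate would be expected to score roughly 65% against the lower one, so the two estimates disagree by a noticeable but not enormous margin.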

But now there are two estimates putting the ChessUp 2 level 13 bot at over 2000 elo, which is significantly higher than the bot’s description of “1600” elo. Perhaps the description should be updated or, preferably, the bot should be weakened without sacrificing its endgame knowledge.