erinptah: Vintage screensaver (computing)

Have you heard about the “feed this document to an LLM, it’ll generate a podcast episode discussing it” gimmick?

Somebody fed it a document that’s just the phrase “poopoo peepee” over and over. And then took the discussion of that document, and fed it back into the LLM to discuss that. And then did it again. That link is to a 10-episode playlist. (…They’re only between 6 and 15 minutes long, so it’s not as much content as you’d think.)

It’s as stupid as it sounds — which makes it the perfect demonstration of “how LLMs will take anything, even the dumbest nonsense, and generate a response that has the vibe of something meaningful.” The recursion reveals even more levels of this. I swear every single video has included some version of “it reminds me of the absurdity of Dadaism”…and none of them acknowledge “the video we’re analyzing said it reminded them of Dadaism”…and they’re always repeating the same 2 basic facts about Dadaism.

(Bonus: the software always pronounces it “Day-day-ism” or “Daddy-ism”.) (Bonus 2: At least one of the videos inserts a fake ad break.)

There’s some hallucination, too. The TTS voices start referring to things like “funny sound effects” and “dramatic zooms”, which do not exist at all in the video they’re “analyzing.” One video says “Do we even really know what the original document said anymore?” (Yes. Yes, we do.) There are at least three variations of “it makes me think of apophenia, have you ever heard of that?” / “No, what does it mean?” when they’re supposedly “discussing” a video where apophenia was brought up and defined.

This is almost as good as the Chatbot Chess Championships 2026. (Which had a lot fewer random nonsense moves than the last one…but in one of the videos GothamChess mentions that he’s been “reminding the bots where their pieces are.” Boo.)

The rest of this is just a roundup of bots faceplanting in non-chess fields:

“I decided to do an experiment/torture myself with the default image model you get when you open Gemini. The prompt: ‘create a grid showing the flags of european countries in alphabetical order. there should be labels below them stating their name. the one for Liechtenstein should have a note below it saying “doubly-landlocked”.’”

“What really happened was that someone who is fully equipped to know better was surprised when her AI agent — a class of software that does not work reliably and cannot work reliably — messed up. […] This is not a misfortune befalling some random person — this is the director of AI alignment at Meta.”

“Amazon is absolutely clear who’s to blame for all this — this 13-hour outage caused by their own bot turning something off and on again is officially user error!”

“Press Start Gaming is almost certainly a tool for making money off of ads and sponsored posts, and posts like the Phantasy Star Fukkokuban misinformation exist mostly to give the site more juice of looking like a real website. If someone goes out and buys a copy of Fukkokuban expecting a new and improved Phantasy Star with better graphics and new sidequests, what do they care? The article wasn’t really meant to provide information.”

“It finally became clear to me and the COYOTE team that we’d been bamboozled. Someone had fabricated an identity and put our call for pitches into a large language model like ChatGPT, just to make a fake story that we’d feasibly pay for. When I started asking too many questions, that someone evaporated.

Even entire passwords repeat: In the above 50 attempts, there are actually only 30 unique passwords. The most common password was G7$kL9#mQ2&xP4!w, which repeated 18 times, giving this specific password a 36% probability in our test set.”
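(For the curious, that 36% is just an empirical frequency: 18 occurrences out of 50 samples. A quick sketch of the tally in Python, with made-up stand-in data since the article’s full password list isn’t reproduced here:)

```python
from collections import Counter

# Hypothetical stand-in for the article's 50 LLM-generated "random"
# passwords: one string repeated 18 times, three strings repeated
# twice, 26 appearing once -- 50 samples, 30 unique values.
passwords = (
    ["G7$kL9#mQ2&xP4!w"] * 18
    + [f"repeat-{i}" for i in range(3) for _ in range(2)]
    + [f"once-{i}" for i in range(26)]
)

counts = Counter(passwords)
top, n = counts.most_common(1)[0]
print(len(counts), "unique passwords out of", len(passwords))
print(f"{top!r} appeared {n} times = {n / len(passwords):.0%}")
# -> 30 unique passwords out of 50
# -> 'G7$kL9#mQ2&xP4!w' appeared 18 times = 36%
```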

The Onion: “HmmAI is at the bleeding edge of artificial intelligence, responding to prompts about recipe ideas, ancient history, or even advanced nuclear physics with the word ‘huh’ in just a fraction of a second.”

erinptah: (daily show)

The promised recs for “videos about the reality of LLMs attempting to play chess” from the GothamChess channel.

The host plays the games out on-screen for you, with explanations and commentary. These ones aren’t for serious chatbot-testing purposes, they’re for entertainment — so when the bots make up illegal moves, he usually just runs with them. Sometimes with narration like “and here ChatGPT summons an extra rook from another dimension” or “You might think this is just a pawn, but Grok knows it’s secretly a horse pawn!”

Once in a while, he’ll tell the bot its move is illegal. Some of them go into “yes, of course, you’re right, my mistake” sycophancy mode. Others just get weirder.

The bots teleport pieces through each other. Manifest already-taken pieces back from the Shadow Realm. Spawn more pieces than they had to start with. Move pieces in directions they don’t go. And just because they’re making up moves doesn’t mean they’re making up good moves! Sometimes they take their own pieces. Sometimes they put themselves in check!

Sometimes they also generate their opponent’s moves. In the game transcripts they were trained on, one player’s move is always followed by the other player’s reply, and the bots don’t actually have a meaningful sense of “stop auto-generating text at the end of your own move.”
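(A toy sketch of that failure mode, in illustrative Python. This isn’t any real chess bot or LLM API; the “model” is just a hard-coded canned game standing in for next-token predictions. The point is that a completion loop only stops at a length limit, so nothing halts it at the end of “its” half of the exchange.)

```python
# Toy illustration: autoregressive completion has no concept of
# "whose move it is" -- it just keeps emitting likely next tokens.
CANNED_GAME = ["1. e4", "e5", "2. Nf3", "Nc6", "3. Bb5", "a6"]

def complete(prompt_tokens, max_new_tokens=4):
    """Greedy completion: keep appending the next token from the
    canned game until the length limit (or the game) runs out."""
    out = list(prompt_tokens)
    limit = len(prompt_tokens) + max_new_tokens
    while len(out) < limit and len(out) < len(CANNED_GAME):
        out.append(CANNED_GAME[len(out)])  # "predict" the next token
    return out

# We only wanted White's next move, but the loop happily generates
# Black's replies too -- there is no move-boundary stop condition.
prompt = ["1. e4", "e5"]
print(complete(prompt))
# -> ['1. e4', 'e5', '2. Nf3', 'Nc6', '3. Bb5', 'a6']
```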

I was curious whether the LLMs’ idea of moves included “making up whole new categories of pieces” or “moving to squares that aren’t on the 8×8 chess grid.” I haven’t seen either of those so far.

One thing I didn’t anticipate is, sometimes a bot tells the other player their move is illegal. Even when it’s not! Saying “there’s a piece in your way” (when there isn’t), or “the king can’t move to E7” (not for any rules-based reason, the bot was just gatekeeping E7).

The newer bots also give general paragraphs on “here’s the explanation for my move,” which are absolutely just LLM Word Salad(TM) made of chess words. As a person who knows Basic Chess Rules but doesn’t actively play the game, sometimes I need GothamChess’s breakdown to see why they’re nonsense. Other times it’s just the bot saying “I have put you in check!” when the other player is blatantly not in check.

The whole thing was very informative, and also really entertaining. (…And it doesn’t involve the chatbots doing anything consequential, so it’s a nice break from all the stories about LLMs putting someone’s life in danger.) Give it a look.


erinptah: (daily show)

So I was reading a post that was supposed to be about testing different LLMs at chess…and the author kept saying things like “I asked it for the next move, and if the first 20 responses were all illegal, I chose a legal move at random.”

My dude (gender-neutral), this means the model cannot play chess.

Just imagine applying this logic to any other kind of tech. “If I run the vacuum cleaner over the same cat hair 20 times and it still doesn’t get sucked up, I pick up the cat hair by hand and keep going. And look, I end up with a clean carpet! This proves how well the vacuum works!”

I mentioned all this on Mastodon/Bluesky, and added that what I really wanted to see was a breakdown of the kind of illegal moves LLMs try to make. Someone replied with a rec for GothamChess on Youtube. (I’ve watched a bunch of his LLM game videos now, they’re exactly what I was looking for, more on those later.)

The thing is, though: I was out at the time and couldn’t stop to watch videos, so I just googled the guy on my phone and left a tab open as a reminder to check him out once I got home.

…And one of the top search results was a Reddit post with the summary, quote, “American Internatiol Master Levy Rozman, AKA “GothamChess” has just been charged with one count of first-degree murder.”

[Screenshot of Google results, one with this hallucinated summary: American Internatiol Master Levy Rozman, AKA "GothamChess" has just been charged with one count of first-degree murder.]

I was, uh, pretty alarmed by this. I clicked through, hoping to find out more about what happened.

The actual Reddit conversation is all about what makes GothamChess’s Youtube channel engaging. No charges. Not even allegations. No mention of murder at all!

So…what gives? Is Reddit putting AI-hallucinated summaries in the metadata of its own posts, or is Google using AI-hallucinated summaries to replace what the site gives it? Which executive signed off on this?

Edit: The line is apparently the title of a completely different (and joking!) Reddit post. Thanks to Gwen for spotting it! It’s not linked in the post above, or in any of the comments — apparently Reddit was showing it as a “Related Post” for Gwen, and for me, it isn’t even doing that.

So it’s not completely hallucinated text…it’s just pulled from a completely inappropriate part of the page. And then put at almost the top of the Google search results. Without linking back to the context that would show it’s a joke. Either it’s a normal algorithm, but for some reason it was programmed to pull summary text from random parts of the page…or it’s still an LLM, having the “you should eat several small rocks per day” problem.

Wish I knew which it was. And I’m still curious which of the companies is falling on their face, here.

Page generated Mar. 13th, 2026 10:33 pm
Powered by Dreamwidth Studios
