Will AI be able to generate correct images of a chess game in 2024?

203

Ṁ28k

Jan 1

29% chance

It turns out that DALL-E is very bad at doing so.

Any general-purpose image-generation AI is allowed (DALL-E 3, Midjourney, etc.). Prompt engineering is allowed. To qualify, the AI and prompt must have a success rate of at least 5 in 20 images when tested.

To be considered a success, an image must contain:

An 8x8 checkered board, with all squares colored correctly.
All chess pieces in their correct starting positions. The chess pieces must be clearly identifiable as their correct type (e.g. A rook must clearly look like a rook)
No extra chess pieces

Images must be generated from a prompt only.
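
The criteria above can be sketched as a check over a hand-labeled image. This is only an illustration, not the market's actual resolution procedure: the grid encoding, the function names, and the reading of "colored correctly" as the standard convention (a light square in each player's right-hand corner) are all assumptions. It also assumes a human has already transcribed the image into the grid, viewed from White's side.

```python
from typing import List

# Assumed encoding: uppercase = White, lowercase = Black, "." = empty square.
# Rows run from rank 8 (top) down to rank 1 (bottom), with White at the bottom.
START_POSITION = [
    "rnbqkbnr",
    "pppppppp",
    "........",
    "........",
    "........",
    "........",
    "PPPPPPPP",
    "RNBQKBNR",
]


def expected_square_color(row: int, col: int) -> str:
    """Return 'light' or 'dark', with row 0 = rank 8 and col 0 = file a."""
    # a8 (row 0, col 0) is light, so an even (row + col) means light.
    # This makes h1 (row 7, col 7) light and a1 (row 7, col 0) dark.
    return "light" if (row + col) % 2 == 0 else "dark"


def is_success(pieces: List[str], colors: List[List[str]]) -> bool:
    """Check one labeled image against the three listed criteria."""
    # Criterion 1: an 8x8 board with every square colored correctly.
    if len(colors) != 8 or any(len(rank) != 8 for rank in colors):
        return False
    for row in range(8):
        for col in range(8):
            if colors[row][col] != expected_square_color(row, col):
                return False
    # Criteria 2 and 3: an exact match with the starting position means
    # every piece is in place and there are no extra pieces.
    return list(pieces) == START_POSITION


def qualifies(results: List[bool], required: int = 5, batch: int = 20) -> bool:
    """Apply the stated threshold: at least 5 successes in 20 images."""
    return len(results) == batch and sum(results) >= required
```

Whether a piece "clearly looks like" its type remains a human judgment; the sketch only covers what happens after that labeling is done.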

This question is managed and resolved by Manifold.

#AI

#AI Image Generation

#DALLE3

#AI Image Generation Testing

27 Comments

192 Holders

405 Trades

bought Ṁ350 YES

bought Ṁ10 YES

@Hazel what model is that and how did you prompt it?

bought Ṁ350 NO

@ProjectVictory lumalabs. I used an iterative version of their new model (re-prompted dozens of times until the output was perfect).

@ProjectVictory It would be trivial to create an API that did this automatically, in essence making a much-improved model.

Still, this was cherry picked. The king/queen is still the hardest part.

@Hazel Did you use a fixed series of prompts? If not, how would you make an API that does this automatically?
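
One way to picture the "re-prompt until the output is perfect" idea is a plain retry loop. This is a rough sketch under stated assumptions, not the lumalabs API or anything confirmed in this thread: `generate` and `is_valid` are hypothetical callables supplied by the caller, and writing an `is_valid` that can judge a chess image automatically is the hard, unsolved part.

```python
from typing import Callable, Optional, TypeVar

Image = TypeVar("Image")


def generate_until_valid(
    generate: Callable[[str], Image],   # hypothetical: prompt -> image
    is_valid: Callable[[Image], bool],  # hypothetical: image -> passes the check?
    prompt: str,
    max_attempts: int = 50,
) -> Optional[Image]:
    """Re-run the same prompt until an output passes, or give up."""
    for _ in range(max_attempts):
        image = generate(prompt)
        if is_valid(image):
            return image
    # No attempt passed within the budget.
    return None
```

A fixed prompt is used here for simplicity; varying the prompt between attempts would need a feedback step that this sketch does not include.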

Recraft V3 does not seem to be better than Flux.

bought Ṁ50 YES

@TobiasWegener
I think we are getting pretty close with Flux

Problems:
There seems to be a rug, and both sides are white.

The figures seem quite good now.

@TobiasWegener Also the closer queen is a bishop and a bishop is a smaller bishop

@ProjectVictory Yeah, you are right, and there's the strange line in front of the queen too. A lot of small mistakes. Interesting how hard it is to see many of them.

I think there's a difference here between generating the starting position as an image and generating a random mid-play board but ensuring that it could have come from a real start position. The latter is a much more difficult task

I agree, but the title here is slightly misleading; the description clearly calls for a starting position.

I've been trying to cue the model into producing a diagram, since that's presumably easier, but it's not quite getting there. I think the problem is very similar to producing text, if you think of chess pieces as symbols and chess boards as phrases.

bought Ṁ30 YES

https://x.com/LukeASalamone/status/1820299359185211901?t=moIpHPN-SCu7Zw_WSom3CQ&s=19

bought Ṁ109 YES

The Flux model is much closer than DALL-E, but not quite there yet (maybe with the right prompt though, I just used "chess board starting position, view from above")

"chess board starting position"

this one's quite good, though of course the knights are very broken

It's quite wrong. The board is rotated incorrectly. The bottom right corner should be white.

@AndreiVlasenko close but white queen not on her color

Does GPT-4o count? If so, this is going to resolve to YES.

@Cosmic1 Do we know if GPT-4 is using a new image generator? afaict it's the same interface to DALL-E 3 as before. There is no new image endpoint available via the OpenAI API.

@Ernie GPT-4o is an image model, but it does not currently output images. It will in the future.

bought Ṁ50 NO

@Cosmic1 Wrong

@diadematus It’s literally not wrong. “We’re announcing GPT-4o, our new flagship model that can reason across audio, vision, and text in real time.”

https://openai.com/index/hello-gpt-4o/

“…generates any combination of text, audio, and image outputs.”

@Hazel how does GPT-4o being able to "reason across ...vision" resolve "generate correct images of a chess game" as YES?

2 traders bought Ṁ265 YES

@Cosmic1 In ChatGPT you have to be careful that GPT-4o doesn't use the Python code interpreter. With that it can easily generate a perfect image, but that is not what the question asks for.
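
To show why code-rendered output is considered too easy, here is a small sketch that draws the starting position deterministically. It is an illustration only: it prints a text diagram with Unicode chess symbols rather than an image, and the function and constant names are made up for this example. A code interpreter would presumably do the same thing with a drawing library.

```python
# Unicode chess symbols keyed by the usual piece letters (uppercase = White).
SYMBOLS = {
    "K": "♔", "Q": "♕", "R": "♖", "B": "♗", "N": "♘", "P": "♙",
    "k": "♚", "q": "♛", "r": "♜", "b": "♝", "n": "♞", "p": "♟",
}
BACK_RANK = "RNBQKBNR"


def starting_position_diagram() -> str:
    """Return the starting position as text, rank 8 at the top."""
    rows = [
        BACK_RANK.lower(),   # Black's back rank (rank 8)
        "p" * 8,             # Black pawns (rank 7)
        *(["." * 8] * 4),    # four empty ranks (6 down to 3)
        "P" * 8,             # White pawns (rank 2)
        BACK_RANK,           # White's back rank (rank 1)
    ]
    lines = []
    for rank, row in zip(range(8, 0, -1), rows):
        # Empty squares are drawn as a middle dot.
        cells = " ".join(SYMBOLS.get(ch, "·") for ch in row)
        lines.append(f"{rank} {cells}")
    lines.append("  a b c d e f g h")
    return "\n".join(lines)


print(starting_position_diagram())
```

Because every square is placed by explicit rules, there is nothing for the model to get wrong, which is the distinction being drawn between generating an image and generating code that draws one.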

@Hazel Right, but once GPT-4o image gen comes out… it’s gonna be quickly solved

@Cosmic1 Yes, GPT-4o counts, with or without DALL-E. As long as it is a general-purpose generative model that makes images from text only, it counts. Images generated from code don't count, as they're not generated by the AI.
