It turns out that DALL-E is very bad at this.
Any general-purpose image-generation AI is allowed (DALL-E 3, Midjourney, etc.). Prompt engineering is allowed. To qualify, the AI and prompt must achieve a success rate of at least 5 out of 20 images when tested.
To be considered a success, an image must contain:
- An 8x8 checkered board, with all squares colored correctly.
- All chess pieces in their correct starting positions. The chess pieces must be clearly identifiable as their correct type (e.g., a rook must clearly look like a rook).
- No extra chess pieces.
Images must be generated from a prompt only.
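The success criteria can be made concrete with a small checker. This is a minimal sketch, assuming a human judge has already transcribed each generated image into an 8x8 grid of piece letters (uppercase for White, lowercase for Black, `.` for empty, rank 8 first) and square colors, plus a count of any stray pieces drawn off the board. The names `image_succeeds`, `qualifies`, and the grid format are hypothetical and not part of the market rules. Deciding whether a piece "clearly looks like" its type still has to happen by eye before this step.

```python
from typing import List

# Hypothetical transcription format: row 0 is rank 8, column 0 is file a.
STARTING_RANKS = [
    "rnbqkbnr",  # rank 8 (Black)
    "pppppppp",
    "........",
    "........",
    "........",
    "........",
    "PPPPPPPP",
    "RNBQKBNR",  # rank 1 (White): queen on d1, king on e1
]


def expected_square_color(row: int, col: int) -> str:
    # a1 (row 7, col 0) is dark and colors alternate, so an even row + col is light.
    return "light" if (row + col) % 2 == 0 else "dark"


def image_succeeds(pieces: List[str], colors: List[List[str]], stray_pieces: int = 0) -> bool:
    """Check one transcribed image against the three criteria."""
    if len(pieces) != 8 or any(len(rank) != 8 for rank in pieces):
        return False  # not an 8x8 board
    if len(colors) != 8 or any(len(rank) != 8 for rank in colors):
        return False
    colors_ok = all(
        colors[r][c] == expected_square_color(r, c) for r in range(8) for c in range(8)
    )
    # An exact match with the starting position also rules out extra pieces on the board;
    # stray_pieces covers anything drawn beside it.
    return colors_ok and pieces == STARTING_RANKS and stray_pieces == 0


def qualifies(results: List[bool], needed: int = 5, trials: int = 20) -> bool:
    """The AI/prompt pair qualifies with at least `needed` successes in `trials` images."""
    return len(results) == trials and sum(results) >= needed
```

For example, an image that swaps the king and queen on rank 1 (`"RNBKQBNR"`) fails even with a perfectly colored board.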
@ProjectVictory Luma Labs. I used an iterative version of their new model (re-prompting dozens of times until the output was perfect).
@ProjectVictory It would be trivial to create an API that did this automatically, in essence making a much-improved model.
Still, this was cherry-picked. The king and queen are still the hardest part.
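The iterative approach described here amounts to a generate-and-check loop. The sketch below is hypothetical: `generate`, `is_correct`, and `refine` are placeholders supplied by the caller, not Luma Labs API calls, and in practice the hard part is `is_correct`, which would need an automated way to judge the board.

```python
from typing import Callable, Optional, Tuple

Image = bytes  # stand-in type for whatever a (hypothetical) generator returns


def generate_until_correct(
    prompt: str,
    generate: Callable[[str], Image],            # hypothetical: prompt -> image
    is_correct: Callable[[Image], bool],         # hypothetical: automated judge
    refine: Optional[Callable[[str, Image], str]] = None,  # hypothetical: next prompt
    max_attempts: int = 50,
) -> Tuple[Optional[Image], int]:
    """Re-prompt until the judge accepts an image or attempts run out.

    Returns the accepted image (or None) and the number of attempts used.
    """
    current = prompt
    for attempt in range(1, max_attempts + 1):
        image = generate(current)
        if is_correct(image):
            return image, attempt
        if refine is not None:
            # Build the next prompt from the previous prompt and the failed image.
            current = refine(current, image)
    return None, max_attempts
```

With `refine` omitted, the same prompt is simply resampled each time (best-of-n sampling); passing `refine` yields a changing series of prompts.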
@Hazel Did you use a fixed series of prompts? If not, how would you make an API that does this automatically?
@TobiasWegener I think we are getting pretty close with Flux.
Problems:
There seems to be a rug, and both sides are white.
The figures seem quite good now.
@ProjectVictory Yeah, you are right, and there's also the strange line in front of the queen; lots of small mistakes. Interesting how hard it is to spot many of them.
I've been trying to cue the model into producing a diagram, since that's presumably easier, but it's not quite getting there. I think the problem is very similar to producing text, if you think of chess pieces as symbols and chess boards as phrases.
@Cosmic1 Do we know if GPT-4 is using a new image generator? AFAICT it's the same interface to DALL-E 3 as before. There is no new image endpoint available via the OpenAI API.
@diadematus It’s literally not wrong. “We’re announcing GPT-4o, our new flagship model that can reason across audio, vision, and text in real time.”
https://openai.com/index/hello-gpt-4o/
“…generates any combination of text, audio, and image outputs.”
@Hazel How does GPT-4o being able to "reason across ...vision" resolve "generate correct images of a chess game" as YES?
@Cosmic1 In ChatGPT you have to be careful that GPT-4o doesn't use the Python code interpreter. With it, the model can easily generate a perfect image, but that is not what the question asks for.
@Cosmic1 Yes, GPT-4o counts, with or without DALL-E. As long as it is a general-purpose generative model that makes images from text only, it counts. Images generated from code don't count, as they're not generated by the AI.