Will the ARC-AGI grand prize be claimed by end of 2025?
49% chance

https://arcprize.org/competition
>=85% performance on Chollet's abstraction and reasoning corpus, private set.
(If Chollet et al. change the requirements for the grand prize in 2025, this question will not change. The bar will remain >=85% performance)

2024 version https://manifold.markets/JacobPfau/will-the-arcagi-grand-prize-be-clai


@MalachiteEagle Wow, I hope they made it clear at least in fine print that they might switch to a harder evaluation set; otherwise this feels really unfair to the people who have put a lot of work into solutions.

bought Ṁ50 YES

https://arxiv.org/abs/2411.07279

TTT significantly improves performance on ARC tasks, achieving up to 6× improvement in accuracy compared to base fine-tuned models; applying TTT to an 8B-parameter language model, we achieve 53% accuracy on the ARC’s public validation set, improving the state-of-the-art by nearly 25% for public and purely neural approaches. By ensembling our method with recent program generation approaches, we get SoTA public validation accuracy of 61.9%, matching the average human score.

bought Ṁ50 YES at 54%

Test-time training (TTT) enables parametric models to adapt during inference through dynamic parameter updates, an approach that remains relatively unexplored in the era of large language models. This technique is a form of transductive learning, where a model leverages the structure of the test data to improve its predictions.
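
To make the quoted idea concrete, here is a toy sketch of test-time training. It is not the paper's method (which applies TTT to an 8B-parameter language model); it assumes a tiny linear model, and the names `LinearModel`, `adapt_at_test_time`, and the example task are all hypothetical. The point is only the shape of the procedure: start from the base parameters, take a few gradient steps on the demonstration pairs of one task, predict, and leave the base model untouched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearModel:
    """Toy stand-in for a pretrained model: y = w * x + b."""
    w: float
    b: float

    def predict(self, x: float) -> float:
        return self.w * x + self.b


def mse(model, examples):
    """Mean squared error of `model` on a list of (x, y) pairs."""
    return sum((model.predict(x) - y) ** 2 for x, y in examples) / len(examples)


def adapt_at_test_time(base, demos, steps=500, lr=0.05):
    """Return a temporary copy of `base` fitted to one task's demonstrations.

    The base model is never modified; the adapted copy is meant to be
    used for this task only and then discarded.
    """
    w, b = base.w, base.b
    n = len(demos)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in demos) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in demos) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return LinearModel(w, b)


# Hypothetical task whose demonstrations follow y = 2x + 1.
base = LinearModel(w=1.0, b=0.0)
demos = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]

adapted = adapt_at_test_time(base, demos)
prediction = adapted.predict(3.0)  # prediction for the held-out test input
```

The adapted copy fits the demonstrations far better than the base model, while `base` itself keeps its original parameters for the next task.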

I made a version of this market which allows for closed source LLMs: https://manifold.markets/RyanGreenblatt/by-when-will-85-be-reached-on-the-p

This is your chance to win free mana betting against SG, which is a guaranteed winning strategy exploited by top traders such as jackson

Now James is in on the action. Thanks!

Note that this prize doesn't allow closed-source models to be used for the actual task.

Of course, distillation is possible etc.

opened a Ṁ50,000 YES at 75% order

Did you mean “by 2025” or “in 2025”? Meaning, how would it resolve if the prize were claimed in 2024?

You're right, thanks. Correcting.

@mckiev i might take you up on the offer, but what's your reasoning for 85% accuracy? we're at 30% right now

Vibes based for the most part ;)

The tasks seem easy, hundreds of billions are being invested in AI right now, and there is hype and status around beating this benchmark
