AI resolves at least X% on SWE-bench assistance, by 2025? | Manifold

25

Ṁ5341

Dec 31

X=5: 99%
X=10: 98.2%
X=20: 98%
X=40: 97%
X=80: 3%

SWE-bench is a benchmark developed to evaluate whether language models can resolve real-world GitHub issues. Each SWE-bench instance represents one GitHub issue, and the leaderboard reports the percentage of instances each model resolved. The leaderboard is divided into two main sections: Unassisted and Assisted.

Assisted: In this category, models have the advantage of the "oracle" retrieval setting where the correct files to edit are directly given to them.

This question is only about the Assisted category of this benchmark.

http://www.swebench.com/#
Current SOTA is 4.8%.

The prediction market will resolve based on the SWE-bench leaderboard standings as of 31 December 2024.

Multiple answers can be correct.
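
The resolution rule above ("at least X%", with multiple answers allowed to be correct) can be read as a simple threshold check. This is a minimal sketch, assuming each answer resolves YES exactly when the top Assisted score on the resolution date is at least X. The function name `resolve_answers` and the example score are hypothetical and are not taken from Manifold or from the leaderboard.

```python
# Thresholds listed as answer options in this market (X values, in percent).
THRESHOLDS = [5, 10, 20, 40, 80]


def resolve_answers(best_score_pct, thresholds=THRESHOLDS):
    """Map each threshold X to True (YES) if the best score is at least X.

    Assumption: "at least X%" means best_score_pct >= X, so a score that
    clears one threshold also clears every lower threshold.
    """
    return {x: best_score_pct >= x for x in thresholds}


# Hypothetical score for illustration only, not an actual leaderboard value.
example = resolve_answers(45.0)
for x, outcome in example.items():
    print(f"X={x}: {'YES' if outcome else 'NO'}")
```

Under this reading, every option at or below the best score resolves YES together, which is why more than one answer can be correct.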

This question is managed and resolved by Manifold.

#Technical AI Timelines

#Programming Automation

bought Ṁ500 YES

@AntonOsika SOTA is now 55.0%

(didn't mean to repost)

@Bayesian Isn't that for verified? Does assisted even exist anymore on leaderboards?

Do all of the options below the threshold resolve to true?

bought Ṁ50 YES

reposted

bump

Related questions

Will an AI achieve >85% performance on the FrontierMath benchmark before 2028?

53% chance (-3% over 1d)

AI resolves at least X% on SWE-bench without any assistance, by 2028?

Will an autonomous agent resolve 90% of tasks on SWE-bench by 2025?

What will be the best score on the SWE-Bench (unassisted) benchmark before 2025?

80% on SWE-Bench Verified by Jan 1 2025

AI resolves at least X% on SWE-bench WITH assistance, by 2028?

What will be the best performance on SWE-bench Verified by December 31st 2025?

Will an AI SWE model score higher than 50% on SWE-bench in 2024?

Will >50% of the tasks in the WebArena benchmark be solved by EOY 2024?

Will an autonomous agent resolve 90% of tasks on SWE-bench by 2026?
