Will QACI turn out to be a viable alignment plan?
20% chance
predictedYES

Note that QACI is not intended to be a full alignment plan, merely a design for a formal goal that produces nice things when maximized.

For a full alignment plan, an AI that takes QACI as input and maximizes it is also required.
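(A minimal sketch of that division of labor, purely illustrative; the types and names below are hypothetical and not from the QACI write-ups:)

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class WorldState:
    """Hypothetical stand-in for a formal description of a world."""
    description: str

# Per the comments above, QACI itself only supplies this piece: a formal
# goal, i.e. a scoring function over world-states that is intended to
# produce nice things when maximized. (Hypothetical signature.)
FormalGoal = Callable[[WorldState], float]

def full_alignment_plan(
    goal: FormalGoal,
    maximizer: Callable[[FormalGoal], WorldState],
) -> WorldState:
    """The separately required second component: an AI that takes the
    formal goal as input and maximizes it. QACI does not provide this."""
    return maximizer(goal)
```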

The prior on anything being a viable alignment plan is quite low, and I suspect QACI in particular runs into the problem of being impossible to carry out in full while lacking good approximations.

Does viable mean it succeeds in creating aligned AI, or some intermediate goal?

predictedYES

@KatjaGrace I would currently bet YES at 50% on "succeeds at creating aligned AI sufficient to produce utopia with no further design work". The only other candidate I'd do that with is the one I sketched in my market about what a promising alignment plan looks like. QACI is not quite ready to use, though; it's possible an invalidating counterexample will be found that breaks the whole thing, but right now it seems to nail several of the hard alignment components while also getting soft alignment close to right.

predictedYES

@L More theoretical work is needed to actually flesh it out into concrete steps, but having been a deep learning nut for a long time, this is the first time a MIRI-style plan has looked exciting to me. (It took me quite a while to be convinced it didn't require actually simulating the universe from the beginning, though.)

My main issue with it is the risk of invoking the teleporter problem. I think we can fix that without departing from QACI. A well-designed QACI implementation shouldn't, in my opinion, actually need a strong pivotal act; weak/local pivotal acts should suffice.

@KatjaGrace Viable means that it will succeed in creating aligned AI, or that it will be judged to have a meaningful chance of doing so in counterfactuals where it is attempted.

predictedYES

@NoUsernameSelected anything you can critique?

predictedNO

@L Not especially, it just seemed like 77% was very high for one specific, highly speculative/still-in-development alignment plan. I do think something QACI-like or a downstream plan is one of the better alignment hopes we currently have.

Bet against me and explain why pls

predictedYES

@L I'm very convinced that QACI is a viable plan (well, minus the pivotal act part, but I think it's such a good plan that the resulting AI would not dare attempt a pivotal act).
