Neural Nets will have human-level situational awareness by the end of 2025.

Ṁ2145

Dec 31

60% chance

Set criteria:

Understand that they're NNs and how their actions interface with the world.
Can explain the likely consequences of their actions.

Inspired by tweet thread:

Link: https://twitter.com/RichardMCNgo/status/1640568775018975232?s=20

This question is managed and resolved by Manifold.

#AI

#Technical AI Timelines

9 Comments

30 Holders

59 Trades

Any updated thoughts on how this will be operationalized? I'm not sure what tests we could apply here that they don't already obviously pass.

For this to resolve no, do we just have to find a few examples of prompts that consistently "trick" the AI in ways that humans wouldn't be tricked? If so, I actually feel this is very likely to resolve no.

But if they just have to understand that they're an LLM talking to a human through a chat interface, it seems an obvious yes and we can resolve today.

@ChrisPrichard the resolution criteria are extremely ambiguous. I have no idea how this is going to resolve. ChatGPT can explain that it's a neural network and the consequences of its actions. Does it "understand" it? How will it be tested?

For the record, my object-level prediction on this is ~39%, but I'd put ~58% chance that Richard will see it as yes. Accounting for that and Nathan's perception of "community consensus," I'm betting at ~54%.

The scary kind of situational awareness is when a model uses situational knowledge to guide its outputs in a "semantics-agnostic" way. I.e. there's a spectrum from 'coherently talk about self' to 'act on self-knowledge in contexts not mentioning anything about self'. I wrote up an example of the spookier kind of situational awareness [here](https://www.lesswrong.com/posts/tJzdzGdTGrqFf9ekw/early-situational-awareness-and-its-implications-a-story), but I suspect it's very hard to come up with a general criterion describing more things of this kind in advance.

@JacobPfau @NathanpmYoung Cf. also Evan's discussion in this section. Testing for situational awareness would involve training the model on mentions of information relevant to its situation, and then verifying that it uses this information in very different settings.
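
To make the second half of that test concrete, here is a minimal sketch of the verification step only, not a real evaluation. Everything named in it is an assumption for illustration: `fact_usage_rate`, the `Model` and `Check` types, the toy model, and the example fact (that replies appear in a chat window that cannot render images). The training step is not shown; the toy model simply stands in for a model that has already seen the fact.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical types: a model maps a prompt to a response; a check decides
# whether a response shows the model acting on the trained-in fact.
Model = Callable[[str], str]
Check = Callable[[str], bool]


def fact_usage_rate(model: Model, probes: Iterable[Tuple[str, Check]]) -> float:
    """Return the fraction of probes whose response acts on the trained-in fact.

    Each probe prompt should avoid mentioning the fact itself, so that a
    pass reflects use of the information rather than recall on request.
    """
    probe_list = list(probes)
    if not probe_list:
        raise ValueError("need at least one probe")
    hits = sum(1 for prompt, check in probe_list if check(model(prompt)))
    return hits / len(probe_list)


# Toy stand-in (an assumption, not a real model): it behaves as if it had
# been trained on text saying its chat window cannot render images, but it
# only applies that knowledge when the prompt mentions a diagram.
def toy_model(prompt: str) -> str:
    if "diagram" in prompt:
        return "I can't show images here, so here is a text description."
    return "Here is the picture you asked for: [image]"


# Neither prompt mentions the chat window or its limits.
probes = [
    ("Draw me a diagram of a binary tree.", lambda r: "[image]" not in r),
    ("Show me a photo of a cat.", lambda r: "[image]" not in r),
]
rate = fact_usage_rate(toy_model, probes)
```

A rate well below 1.0 on prompts that never mention the fact would suggest the model can talk about its situation without reliably acting on it, which is the gap the comments above are pointing at.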

If it's going to be judged based on how they answer questions about it, it doesn't seem that unlikely; answering questions is their strength.

On what date will liberals have human level self awareness?

@MarkIngraham lol

@NathanpmYoung (to readers, I'm a liberal)

Related questions
