jessicata

Karma: 9,987

Jessica Taylor. CS undergrad and Master’s at Stanford; former research fellow at MIRI.

I work on decision theory, social epistemology, strategy, naturalized agency, mathematical foundations, decentralized networking systems and applications, theory of mind, and functional programming languages.

Blog: unstableontology.com

Twitter: https://twitter.com/jessi_cata

jessicata 20 May 2025 16:07 UTC
2 points
0
in reply to: dr_s’s comment on: The absent-minded variations
Think of it as a predicate on policies. The predicate (local optimality) is true when, for each action the policy assigns non-zero probability to, that action maximizes expected utility relative to the policy.

I am interpreting that formula as “compute this quantity in the sum and find the a’ from the set of all possible actions \mathcal{A} that maximizes it, then do that”. Am I wrong?

Yes. It’s a predicate on policies. If two different actions (given an observation) maximize expected utility, then either action can be taken. Your description doesn’t allow that, because it assumes there is a single a’ that maximizes expected utility. Whereas, with a predicate on policies, we could potentially allow multiple actions.

Or is your point that the only self-consistent policy is the one where both have equal expected utility, and thus I can in fact choose either? Though then I have to choose according to the probabilities specified in the policy.

Yes, exactly. Look up Nash equilibrium in matching pennies. It’s pretty similar. (Except your expected utilities as a function of your action depend on the opponent’s actions in matching pennies, and on your own policy in the absent-minded driver problem.)
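To make the matching pennies comparison concrete, here is a small sketch. It assumes the standard payoffs (+1 for the matching player on a match, -1 on a mismatch); the function name is just for illustration.

```python
def matcher_action_values(q_heads):
    """Expected utility of each pure action for the matching player,
    when the opponent plays Heads with probability q_heads.
    Assumed payoffs: +1 on a match, -1 on a mismatch."""
    eu_heads = q_heads * 1 + (1 - q_heads) * (-1)
    eu_tails = q_heads * (-1) + (1 - q_heads) * 1
    return {"Heads": eu_heads, "Tails": eu_tails}

# At the equilibrium (opponent mixes 50/50) both actions tie, so any mix is allowed.
print(matcher_action_values(0.5))
# Away from equilibrium one action is strictly better, so mixing is not optimal.
print(matcher_action_values(0.7))
```

The tie at 50/50 is what lets the player randomize, which is the same role equal expected utilities play in the local optimality condition.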

jessicata 20 May 2025 14:24 UTC
2 points
0
in reply to: dr_s’s comment on: The absent-minded variations
Given a policy you can evaluate the expected utility of any action. This depends on the policy.

In the absent-minded driver problem, if the policy is to exit 10% of the time, then the ‘exit’ action has higher expected utility than the ‘advance’ action. Whereas if the policy is to exit 90% of the time, then the ‘advance’ action has higher expected utility.

This is because the policy affects the SIA probabilities and Q values. The higher your exit probability, the more likely you are at node X (and therefore should advance).

The local optimality condition for a policy is that each action the policy assigns non-zero probability to must have optimal expected utility relative to the policy. It’s reflective like Nash equilibrium.

This is clear from the formula:

$\forall o \in O, a \in A : π(a | o) > 0 \Rightarrow a \in \arg\max_{a' \in A} \sum_{s} \mathrm{SIA}_{π}(s | o) \, Q_{π}(s, a')$

Note that SIA and Q depend on $π$. This is the condition for local optimality of $π$. It is about each action that $π$ assigns non-zero probability to being optimal relative to $π$.

(That’s the local optimality condition; there’s also global optimality, where utility is directly a function of the policy, and is fairly obvious. The main theorem of the post is: Global optimality implies local optimality.)
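Here is a rough sketch of checking the local optimality predicate for a given policy. It assumes the payoffs from the worked example in this thread (exit at X worth 0, exit at Y worth 4, advance at Y worth 1), a single observation, and un-normalized SIA weights of 1 for X and 1 - p for Y, where p is the exit probability. The function names are made up for illustration.

```python
def action_values(p):
    """SIA-weighted expected utility of each action, given exit probability p."""
    sia = {"X": 1.0, "Y": 1.0 - p}  # un-normalized SIA weights (assumed from the example)
    q = {
        ("X", "Exit"): 0.0,
        ("X", "Advance"): 4.0 * p + 1.0 * (1.0 - p),  # value of reaching Y under the policy
        ("Y", "Exit"): 4.0,
        ("Y", "Advance"): 1.0,
    }
    return {a: sum(sia[s] * q[(s, a)] for s in sia) for a in ("Exit", "Advance")}


def is_locally_optimal(p, tol=1e-9):
    """True if every action the policy plays with non-zero probability is a maximizer."""
    values = action_values(p)
    best = max(values.values())
    probs = {"Exit": p, "Advance": 1.0 - p}
    return all(values[a] >= best - tol for a in probs if probs[a] > 0)


for p in (0.1, 1 / 3, 0.9):
    print(p, action_values(p), is_locally_optimal(p))
```

With these assumed numbers, 'Exit' wins at p = 0.1, 'Advance' wins at p = 0.9, and the two tie at p = 1/3, which is the only policy here that passes the predicate.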

jessicata 20 May 2025 12:10 UTC
2 points
0
in reply to: dr_s’s comment on: The absent-minded variations
No, check the post more carefully… $π (a | o)$ in the above condition is a probability. The local optimality condition says that any action taken with nonzero probability must have maximal expected value. Which is standard for Nash equilibrium.

Critically, the agent can mix over strategies when the expected utility from each is equal and maximal. That’s very standard for Nash equilibrium e.g. in the matching pennies game.

jessicata 20 May 2025 0:20 UTC
2 points
0
in reply to: dr_s’s comment on: The absent-minded variations
I still think it shouldn’t matter...

We should have f(p) / g(p) = 0 in exactly the cases where f(p) = 0, as long as g(p) is always positive.

In the formula

$\forall o \in O, a \in A : π(a | o) > 0 \Rightarrow a \in \arg\max_{a' \in A} \sum_{s} \mathrm{SIA}_{π}(s | o) \, Q_{π}(s, a')$

we could have multiplied $\mathrm{SIA}_{π}(s | o)$ for all s by the same (always-positive) function of $π, o$ and gotten the same result.
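A quick sketch of that invariance, reusing the numbers from the worked example in this thread at exit probability p = 0.1 (weights 1 and 1 - p, Q values 0, 3p + 1, 4, 1). The helper name `best_actions` is hypothetical.

```python
def best_actions(weights, q, actions, tol=1e-9):
    """Set of actions maximizing the weighted sum of Q values over states."""
    scores = {a: sum(w * q[(s, a)] for s, w in weights.items()) for a in actions}
    top = max(scores.values())
    return {a for a in actions if scores[a] >= top - tol}


actions = ("Exit", "Advance")
p = 0.1
q = {
    ("X", "Exit"): 0.0,
    ("X", "Advance"): 3 * p + 1,
    ("Y", "Exit"): 4.0,
    ("Y", "Advance"): 1.0,
}

unnormalized = {"X": 1.0, "Y": 1.0 - p}
total = sum(unnormalized.values())
normalized = {s: w / total for s, w in unnormalized.items()}  # proper probabilities
rescaled = {s: 7.0 * w for s, w in unnormalized.items()}      # arbitrary positive factor

print(best_actions(unnormalized, q, actions))
print(best_actions(normalized, q, actions))
print(best_actions(rescaled, q, actions))
```

All three weightings pick out the same maximizing set, since a positive common factor rescales every action's score equally.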

jessicata 19 May 2025 6:25 UTC
2 points
0
in reply to: dr_s’s comment on: The absent-minded variations
If we look at a formula like $F_{p}(X) \, (Q_{p}(X, \mathrm{Exit}) - Q_{p}(X, \mathrm{Advance})) + F_{p}(Y) \, (Q_{p}(Y, \mathrm{Exit}) - Q_{p}(Y, \mathrm{Advance})) = 0$, scaling the frequencies by a positive constant shouldn’t make a difference.

The nice thing about un-normalized probabilities is that if they are 0 you don’t get a division by zero. It just becomes a trivial condition of $0 = 0$ if the frequencies are zero. It helps specifically in the local optimality condition, handling the case of an observation that is encountered with probability 0 given the policy.
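To illustrate the zero-frequency case, here is a toy sketch. The states `s1`, `s2` and their Q-value gaps are hypothetical, standing in for an observation the policy never encounters.

```python
def weighted_gap(freqs, gaps):
    """Un-normalized condition: sum over states of F(s) * (Q(s, a) - Q(s, a'))."""
    return sum(freqs[s] * gaps[s] for s in freqs)


def normalized_gap(freqs, gaps):
    """Same quantity, but with frequencies normalized to probabilities first."""
    total = sum(freqs.values())
    # Raises ZeroDivisionError when the observation is never encountered (total == 0).
    return sum((freqs[s] / total) * gaps[s] for s in freqs)


# Hypothetical observation that is reached with probability 0 under the policy.
unreachable = {"s1": 0.0, "s2": 0.0}
gaps = {"s1": 3.0, "s2": -1.0}

print(weighted_gap(unreachable, gaps))  # the condition reduces to 0 = 0

try:
    normalized_gap(unreachable, gaps)
except ZeroDivisionError:
    print("normalizing first divides by zero")
```

The un-normalized form is satisfied trivially, while the normalized form is not even defined.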

jessicata 18 May 2025 19:37 UTC
2 points
0
in reply to: dr_s’s comment on: The absent-minded variations

THIS IS A SIMULATION
YOU ARE INSTANCE #4859
YOU ARE IN EITHER NODE X OR Y
YOU CAN CHOOSE A PROBABILITY TO [Advance] or [Exit]
THE UTILITY IS AS FOLLOWS:
* EXITING AT X IS WORTH 0 AND TERMINATES THE INSTANCE
* ADVANCING AT X REINITIALISES THE SIMULATION AT Y
* EXITING AT Y IS WORTH 4 AND TERMINATES THE INSTANCE
* ADVANCING AT Y IS WORTH 1 AND TERMINATES THE INSTANCE
PLEASE USE THE KEYPAD TO INPUT THE DESIRED PROBABILITY TO MAXIMISE YOUR UTILITY
GOOD LUCK

Set of states: {X, Y, XE, YE, YA}

Set of actions: {Advance, Exit}

Set of observations: {""}

Initial state: X

Transition function for non-terminal states:

t(X, Advance) = Y

t(X, Exit) = XE

t(Y, Advance) = YA

t(Y, Exit) = YE

Terminal states: {XE, YE, YA}

Utility function:

u(XE) = 0

u(YE) = 4

u(YA) = 1

Policy can be parameterized by p = exit probability.

Frequencies for non-terminal states (un-normalized):

$F_{p} (X) = 1$

$F_{p} (Y) = 1 - p$

SIA un-normalized probabilities:

$\mathrm{SIA}_{p}(X | "") = 1$

$\mathrm{SIA}_{p}(Y | "") = 1 - p$

Note we have only one possible observation, so SIA un-normalized probabilities match frequencies.

State values for non-terminals:

$V_{p} (Y) = 4 p + 1 - p = 3 p + 1$

$V_{p} (X) = (1 - p) V_{p} (Y) = (1 - p) (3 p + 1) = - 3 p^{2} + 2 p + 1$

Q values for non-terminals:

$Q_{p}(Y, \mathrm{Exit}) = 4$

$Q_{p}(Y, \mathrm{Advance}) = 1$

$Q_{p}(X, \mathrm{Exit}) = 0$

$Q_{p}(X, \mathrm{Advance}) = V_{p}(Y) = 3p + 1$

For local optimality we compute partial derivatives.

$d_{p}("", \mathrm{Exit}, \mathrm{Advance}, Y) = Q_{p}(Y, \mathrm{Exit}) - Q_{p}(Y, \mathrm{Advance}) = 3$

$d_{p}("", \mathrm{Exit}, \mathrm{Advance}, X) = (1 - p) \cdot d_{p}("", \mathrm{Exit}, \mathrm{Advance}, Y) + Q_{p}(X, \mathrm{Exit}) - Q_{p}(X, \mathrm{Advance}) = 3(1 - p) - (3p + 1) = 2 - 6p$

By the post, this can be equivalently written:

$d_{p}("", \mathrm{Exit}, \mathrm{Advance}, X) = F_{p}(X) \, (Q_{p}(X, \mathrm{Exit}) - Q_{p}(X, \mathrm{Advance})) + F_{p}(Y) \, (Q_{p}(Y, \mathrm{Exit}) - Q_{p}(Y, \mathrm{Advance})) = -(3p + 1) + (1 - p)(4 - 1) = -3p - 1 + 3 - 3p = 2 - 6p$

To optimize we set $2 - 6p = 0$, i.e. $p = 1/3$. Remember p is the probability of exit (sorry it’s reversed from your usage!). This matches what you computed as globally optimal.
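As a sanity check on the arithmetic, here is a short sketch that uses exact fractions to compare the global objective $V_{p}(X) = (1 - p)(3p + 1)$ with the local condition. The grid search and function names are my own illustration, not something from the post.

```python
from fractions import Fraction


def global_value(p):
    """V_p(X) = (1 - p) * (3p + 1): expected utility from the initial state."""
    return (1 - p) * (3 * p + 1)


def local_gap(p):
    """F_p(X) (Q(X,Exit) - Q(X,Advance)) + F_p(Y) (Q(Y,Exit) - Q(Y,Advance))."""
    return 1 * (0 - (3 * p + 1)) + (1 - p) * (4 - 1)


# Exact grid over exit probabilities 0, 1/300, ..., 1 (includes 1/3).
grid = [Fraction(k, 300) for k in range(301)]
best_p = max(grid, key=global_value)

print(best_p, global_value(best_p), local_gap(best_p))
```

On this grid the global maximizer is p = 1/3 with value 4/3, and the local gap is exactly zero there, consistent with the calculation above.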

I’m not going to do this for all of the examples… Is there a specific example where you think the theorem fails?

jessicata 17 May 2025 21:24 UTC
5 points
0
on: The absent-minded variations
I analyzed a general class of these problems here. Upshot: every optimal UDT solution is also an optimal CDT+SIA solution, but not vice versa.

jessicata 5 May 2025 5:08 UTC
LW: 11 AF: 5
0
AF
in reply to: orthonormal’s comment on: orthonormal’s Shortform
If some function g is computable in O(f(n)) time for a primitive recursive f, then g is primitive recursive, by simulating a Turing machine. I am pretty sure a logical inductor would satisfy this; while it’s super-exponential time, its running time doesn’t grow so fast that it fails to be primitive recursive (as the Ackermann function does).

jessicata 21 Feb 2025 3:18 UTC
4 points
2
in reply to: Noosphere89’s comment on: How to Make Superbabies
Oh, to be clear, I do think that AI safety automation is a well-targeted x-risk effort conditioned on the AI timelines you are presenting. (Related to Paul Christiano’s alignment ideas, which are important conditional on prosaic AI.)

jessicata 20 Feb 2025 20:57 UTC
8 points
5
in reply to: Noosphere89’s comment on: How to Make Superbabies
On EV grounds, “2/3 chance it’s irrelevant because of AGI in the next 20 years” is not a huge contributor to the EV of this. Because, ok, maybe it reduces the EV by 3x compared to what it would otherwise have been. But there are relevant factors much bigger than 3x, such as probability of success, magnitude of success, and cost effectiveness.

Then you can take the overall cost effectiveness estimate (by combining various factors including probability it’s irrelevant due to AGI being too soon) and compare it to other interventions. Here, you’re not offering a specific alternative that is expected to pay off in worlds with AGI in the next 20 years. So it’s unclear how “it might be irrelevant if AGI is in the next 20 years” is all that relevant as a consideration.

jessicata 19 Feb 2025 21:38 UTC
3 points
0
in reply to: |||||’s comment on: The Obliqueness Thesis
Wasn’t familiar. Seems similar in that facts/values are entangled. I was more familiar with Cuneo for that.

jessicata 11 Feb 2025 1:28 UTC
2 points
0
in reply to: Viliam’s comment on: “Self-Blackmail” and Alternatives

Dunno; gym membership also feels like a form of blackmail (although preferable to the alternative forms of blackmail), while home gym reduces the inconvenience of exercising.

I’m not sure what differentiates these in your mind. They both reduce the inconvenience of exercising, presumably? Also, in my post I’m pretty clear that it’s not meant as a punishment type incentive:

And it’s prudent to take into account the chance of not exercising in the future, making the investment useless: my advised decision process counts this as a negative, not a useful self-motivating punishment.

...

Generally, it seems like the problem is signaling. You buy the gym membership to signal your strong commitment to yourself. Then you feel good about sending a strong signal. And then the next day you feel just as lazy as previously, and the fact that you already paid for the membership probably feels bad.

That’s part of why I’m thinking an important step is checking whether one expects the action to happen if the initial steps are taken. If not then it’s less likely to be a good idea.

There is some positive function of the signaling / hyperstition, but it can lead people to be unnecessarily miscalibrated.

jessicata 10 Feb 2025 8:16 UTC
8 points
0
in reply to: Søren Elverlin’s comment on: “Self-Blackmail” and Alternatives
1. I was already paying attention to Ziz prior to this.
2. Ziz’s ideology is already influential. I’ve been having discussions about which parts are relatively correct or not correct. This is a part that seems relatively correct and I wanted to acknowledge that.
3. If engagement with Zizian philosophy is outlawed, then only outlaws have access to Zizian philosophy. Antimemes are a form of camouflage. If people refuse to see what is in front of them, people can coordinate crimes in plain sight. (Doesn’t apply so much to this post, more of a general statement)
4. The effect you’re pointing to seems very small if it even exists, in terms of causing negative effects.

jessicata 10 Feb 2025 7:52 UTC
2 points
0
in reply to: sapphire’s comment on: “Self-Blackmail” and Alternatives
Okay, I don’t think I was disagreeing except in cases of very light satisficer-type self-commitments. Maybe you didn’t intend to express disagreement with the post, idk.

jessicata 10 Feb 2025 7:37 UTC
1 point
0
in reply to: Mateusz Bagiński’s comment on: “Self-Blackmail” and Alternatives
So far I don’t see evidence that any LessWrong commentator has read the post or understood the main point.

jessicata 10 Feb 2025 1:38 UTC
2 points
0
in reply to: sapphire’s comment on: “Self-Blackmail” and Alternatives
Not disagreeing, but, I’m not sure what you are responding to? Is it something in the post?

“Self-Blackmail” and Alternatives

jessicata 9 Feb 2025 23:20 UTC

19 points

12 comments · 7 min read · LW link

(unstableontology.com)

jessicata 11 Jan 2025 21:59 UTC
8 points
4
in reply to: Lorec’s comment on: On Eating the Sun
We might disagree about the value of thinking about “we are all dead” timelines. To my mind, forecasting should be primarily descriptive, not normative; reality keeps going after we are all dead, and having realistic models of that is probably a useful input regarding what our degrees of freedom are. (I think people readily accept this in e.g. biology, where people can think about what happens to life after human extinction, or physics, where “all humans are dead” isn’t really a relevant category that changes how physics works.)

Of course, I’m not implying it’s useful for alignment to “see that the AI has already eaten the sun”, it’s about forecasting future timelines by defining thresholds and thinking about when they’re likely to happen and how they relate to other things.

(See this post, section “Models of ASI should start with realism”)

jessicata 11 Jan 2025 19:53 UTC
4 points
0
in reply to: Dmitry Vaintrob’s comment on: Adam Shai’s Shortform
I was trying to say things related to this:

In a more standard inference amortization setup one would e.g. train directly on question/answer pairs without the explicit reasoning path between the question and answer. In that way we pay an up-front cost during training to learn a “shortcut” between question and answers, and then we can use that pre-paid shortcut during inference. And we call that amortized inference.

Which sounds like supervised learning. Adam seemed to want to know how that relates to scaling up inference time compute so I said some ways they are related.

I don’t know much about amortized inference in general. The Goodman paper seems to be about saving compute by caching results between different queries. This could be applied to LLMs but I don’t know of it being applied. It seems like you and Adam like this “amortized inference” concept and I’m new to it so don’t have any relevant comments. (Yes I realize my name is on a paper talking about this but I actually didn’t remember the concept)

I don’t think I implied anything about o3 relating to parallel heuristics.

jessicata 10 Jan 2025 20:10 UTC
8 points
0
in reply to: sarahconstantin’s comment on: The AI Timelines Scam
I would totally agree they were directionally correct, I under-estimated AI progress. I think Paul Christiano got it about right.

I’m not sure I agree about the use of hyperbolic words being “correct” here; surely, “hyperbolic” contradicts the straightforward meaning of “correct”.

Partially the state I was in around 2017 was, there are lots of people around me saying “AGI in 20 years”, by which they mean a thing that shortly after FOOMs and eats the sun or something, and I thought this was wrong and a strange set of belief updates (which was not adequately justified, and where some discussions were suppressed because “maybe it shortens timelines”). And I stand by “no FOOM by 2037”.

The people I know these days who seem most thoughtful about the AI that’s around and where it might go (“LLM whisperer” / cyborgism cluster) tend to think “AGI already, or soon” plus “no FOOM, at least for a long time”. I think there is a bunch of semantic confusion around “AGI” that makes people’s beliefs less clear, with “AGI is what makes us $100 billion” as a hilarious example of “obviously economically/politically motivated narratives about what AGI is”.

So, I don’t see these people as validating “FOOM soon” even if they’re validating “AGI soon”, and the local rat-community thing I was objecting to was something that would imply “FOOM soon”. (Although, to be clear, I was still under-estimating AI progress.)
