The Singularity Institute Blog

  • warning: date(): It is not safe to rely on the system's timezone settings. You are *required* to use the date.timezone setting or the date_default_timezone_set() function. In case you used any of those methods and you are still getting this warning, you most likely misspelled the timezone identifier. We selected the timezone 'UTC' for now, but please set date.timezone to select your timezone. in /home1/kuehlebo/public_html/transhumanisme/modules/aggregator/ on line 259.
  • warning: date(): It is not safe to rely on the system's timezone settings. You are *required* to use the date.timezone setting or the date_default_timezone_set() function. In case you used any of those methods and you are still getting this warning, you most likely misspelled the timezone identifier. We selected the timezone 'UTC' for now, but please set date.timezone to select your timezone. in /home1/kuehlebo/public_html/transhumanisme/modules/aggregator/ on line 259.
  • warning: date(): It is not safe to rely on the system's timezone settings. You are *required* to use the date.timezone setting or the date_default_timezone_set() function. In case you used any of those methods and you are still getting this warning, you most likely misspelled the timezone identifier. We selected the timezone 'UTC' for now, but please set date.timezone to select your timezone. in /home1/kuehlebo/public_html/transhumanisme/modules/aggregator/ on line 259.
  • warning: date(): It is not safe to rely on the system's timezone settings. You are *required* to use the date.timezone setting or the date_default_timezone_set() function. In case you used any of those methods and you are still getting this warning, you most likely misspelled the timezone identifier. We selected the timezone 'UTC' for now, but please set date.timezone to select your timezone. in /home1/kuehlebo/public_html/transhumanisme/modules/aggregator/ on line 259.
  • warning: date(): It is not safe to rely on the system's timezone settings. You are *required* to use the date.timezone setting or the date_default_timezone_set() function. In case you used any of those methods and you are still getting this warning, you most likely misspelled the timezone identifier. We selected the timezone 'UTC' for now, but please set date.timezone to select your timezone. in /home1/kuehlebo/public_html/transhumanisme/modules/aggregator/ on line 259.
  • warning: date(): It is not safe to rely on the system's timezone settings. You are *required* to use the date.timezone setting or the date_default_timezone_set() function. In case you used any of those methods and you are still getting this warning, you most likely misspelled the timezone identifier. We selected the timezone 'UTC' for now, but please set date.timezone to select your timezone. in /home1/kuehlebo/public_html/transhumanisme/modules/aggregator/ on line 259.
  • warning: date(): It is not safe to rely on the system's timezone settings. You are *required* to use the date.timezone setting or the date_default_timezone_set() function. In case you used any of those methods and you are still getting this warning, you most likely misspelled the timezone identifier. We selected the timezone 'UTC' for now, but please set date.timezone to select your timezone. in /home1/kuehlebo/public_html/transhumanisme/modules/aggregator/ on line 259.
  • warning: date(): It is not safe to rely on the system's timezone settings. You are *required* to use the date.timezone setting or the date_default_timezone_set() function. In case you used any of those methods and you are still getting this warning, you most likely misspelled the timezone identifier. We selected the timezone 'UTC' for now, but please set date.timezone to select your timezone. in /home1/kuehlebo/public_html/transhumanisme/modules/aggregator/ on line 259.
  • warning: date(): It is not safe to rely on the system's timezone settings. You are *required* to use the date.timezone setting or the date_default_timezone_set() function. In case you used any of those methods and you are still getting this warning, you most likely misspelled the timezone identifier. We selected the timezone 'UTC' for now, but please set date.timezone to select your timezone. in /home1/kuehlebo/public_html/transhumanisme/modules/aggregator/ on line 259.
  • warning: date(): It is not safe to rely on the system's timezone settings. You are *required* to use the date.timezone setting or the date_default_timezone_set() function. In case you used any of those methods and you are still getting this warning, you most likely misspelled the timezone identifier. We selected the timezone 'UTC' for now, but please set date.timezone to select your timezone. in /home1/kuehlebo/public_html/transhumanisme/modules/aggregator/ on line 259.
  • warning: date(): It is not safe to rely on the system's timezone settings. You are *required* to use the date.timezone setting or the date_default_timezone_set() function. In case you used any of those methods and you are still getting this warning, you most likely misspelled the timezone identifier. We selected the timezone 'UTC' for now, but please set date.timezone to select your timezone. in /home1/kuehlebo/public_html/transhumanisme/modules/aggregator/ on line 259.
  • warning: date(): It is not safe to rely on the system's timezone settings. You are *required* to use the date.timezone setting or the date_default_timezone_set() function. In case you used any of those methods and you are still getting this warning, you most likely misspelled the timezone identifier. We selected the timezone 'UTC' for now, but please set date.timezone to select your timezone. in /home1/kuehlebo/public_html/transhumanisme/modules/aggregator/ on line 259.
Syndicate content
Updated: 45 weeks 5 days ago

Decisions are for making bad outcomes inconsistent

Sat, 04/08/2017 - 00:02

Nate Soares’ recent decision theory paper with Ben Levinstein, “Cheating Death in Damascus,” prompted some valuable questions and comments from an acquaintance (anonymized here). I’ve put together edited excerpts from the commenter’s email below, with Nate’s responses.

The discussion concerns functional decision theory (FDT), a newly proposed alternative to causal decision theory (CDT) and evidential decision theory (EDT). Where EDT says “choose the most auspicious action” and CDT says “choose the action that has the best effects,” FDT says “choose the output of one’s decision algorithm that has the best effects across all instances of that algorithm.”

FDT usually behaves similarly to CDT. In a one-shot prisoner’s dilemma between two agents who know they are following FDT, however, FDT parts ways with CDT and prescribes cooperation, on the grounds that each agent runs the same decision-making procedure, and that therefore each agent is effectively choosing for both agents at once.1

Below, Nate provides some of his own perspective on why FDT generally achieves higher utility than CDT and EDT. Some of the stances he sketches out here are stronger than the assumptions needed to justify FDT, but should shed some light on why researchers at MIRI think FDT can help resolve a number of longstanding puzzles in the foundations of rational action.


Anonymous: This is great stuff! I’m behind on reading loads of papers and books for my research, but this came across my path and hooked me, which speaks highly of how interesting is the content and the sense that this paper is making progress.

My general take is that you are right that these kinds of problems need to be specified in more detail. However, my guess is that once you do so, game theorists would get the right answer. Perhaps that’s what FDT is: it’s an approach to clarifying ambiguous games that leads to a formalism where people like Pearl and myself can use our standard approaches to get the right answer.

I know there’s a lot of inertia in the “decision theory” language, so probably it doesn’t make sense to change. But if there were no such sunk costs, I would recommend a different framing. It’s not that people’s decision theories are wrong; it’s that they are unable to correctly formalize problems in which there are high-performance predictors. You show how to do that, using the idea of intervening on (i.e., choosing between putative outputs of) the algorithm, rather than intervening on actions. Everything else follows from a sufficiently precise and non-contradictory statement of the decision problem.

Probably the easiest move this line of work could make to ease this knee-jerk response of mine in defense of mainstream Bayesian game theory is to just be clear that CDT is not meant to capture mainstream Bayesian game theory. Rather, it is a model of one response to a class of problems not normally considered and for which existing approaches are ambiguous.

Nate Soares: I don’t take this view myself. My view is more like: When you add accurate predictors to the Rube Goldberg machine that is the universe — which can in fact be done — the future of that universe can be determined by the behavior of the algorithm being predicted. The algorithm that we put in the “thing-being-predicted” slot can do significantly better if its reasoning on the subject of which actions to output respects the universe’s downstream causal structure (which is something CDT and FDT do, but which EDT neglects), and it can do better again if its reasoning also respects the world’s global logical structure (which is done by FDT alone).

We don’t know exactly how to respect this wider class of dependencies in general yet, but we do know how to do it in many simple cases. While it agrees with modern decision theory and game theory in many simple situations, its prescriptions do seem to differ in non-trivial applications.

The main case where we can easily see that FDT is not just a better tool for formalizing game theorists’ traditional intuitions is in prisoner’s dilemmas. Game theory is pretty adamant about the fact that it’s rational to defect in a one-shot PD, whereas two FDT agents facing off in a one-shot PD will cooperate.

In particular, classical game theory employs a “common knowledge of shared rationality” assumption which, when you look closely at it, cashes out more or less as “common knowledge that all parties are using CDT and this axiom.” Game theory where common knowledge of shared rationality is defined to mean “common knowledge that all parties are using FDT and this axiom” gives substantially different results, such as cooperation in one-shot PDs.

A causal graph of Death in Damascus for CDT agents.2

Anonymous: When I’ve read MIRI work on CDT in the past, it seemed to me to describe what standard game theorists mean by rationality. But at least in cases like Murder Lesion, I don’t think it’s fair to say that standard game theorists would prescribe CDT. It might be better to say that standard game theory doesn’t consider these kinds of settings, and there are multiple ways of responding to them, CDT being one.

But I also suspect that many of these perfect prediction problems are internally inconsistent, and so it’s irrelevant what CDT would prescribe, since the problem cannot arise. That is, it’s not reasonable to say game theorists would recommend such-and-such in a certain problem, when the problem postulates that the actor always has incorrect expectations; “all agents have correct expectations” is a core property of most game-theoretic problems.

The Death in Damascus problem for CDT agents is a good example of this. In this problem, either Death will not find the CDT agent with certainty, or the CDT agent will never have correct beliefs about her own actions, or she will be unable to best respond to her own beliefs.

So the problem statement (“Death finds the agent with certainty”) rules out typical assumptions of a rational actor: that it has rational expectations (including about its own behavior), and that it can choose the preferred action in response to its beliefs. The agent can only have correct beliefs if she believes that she has such-and-such belief about which city she’ll end up in, but doesn’t select the action that is the best response to that belief.

Nate: I contest that last claim. The trouble is in the phrase “best response”, where you’re using CDT’s notion of what counts as the best response. According to FDT’s notion of “best response”, the best response to your beliefs in the Death in Damascus problem is to stay in Damascus, if we’re assuming it costs nonzero utility to make the trek to Aleppo.

In order to define what the best response to a problem is, we normally invoke a notion of counterfactuals — what are your available responses, and what do you think follows from them? But the question of how to set up those counterfactuals is the very point under contention.

So I’ll grant that if you define “best response” in terms of CDT’s counterfactuals, then Death in Damascus rules out the typical assumptions of a rational actor. If you use FDT’s counterfactuals (i.e., counterfactuals that respect the full range of subjunctive dependencies), however, then you get to keep all the usual assumptions of rational actors. We can say that FDT has the pre-theoretic advantage over CDT that it allows agents to exhibit sensible-seeming properties like these in a wider array of problems.

Anonymous: The presentation of the Death in Damascus problem for CDT feels weird to me. CDT might also just turn up an error, since one of its assumptions is violated by the problem. Or it might cycle through beliefs forever… The expected utility calculation here seems to give some credence to the possibility of dodging death, which is assumed to be impossible, so it doesn’t seem to me to correctly reason in a CDT way about where death will be.

For some reason I want to defend the CDT agent, and say that it’s not fair to say they wouldn’t realize that their strategy produces a contradiction (given the assumptions of rational belief and agency) in this problem.

Nate: There are a few different things to note here. First is that my inclination is always to evaluate CDT as an algorithm: if you built a machine that follows the CDT equation to the very letter, what would it do?

The answer here, as you’ve rightly noted above, is that the CDT equation isn’t necessarily defined when the input is a problem like Death in Damascus, and I agree that simple definitions of CDT yield algorithms that would either enter an infinite loop or crash. The third alternative is that the agent notices the difficulty and engages in some sort of reflective-equilibrium-finding procedure; variants of CDT with this sort of patch were invented more or less independently by Joyce and Arntzenius to do exactly that. In the paper, we discuss the variants that run an equilibrium-finding procedure and show that the equilibrium is still unsatisfactory; but we probably should have been more explicit about the fact that vanilla CDT either crashes or loops.

Second, I acknowledge that there’s still a strong intuition that an agent should in some sense be able to reflect on their own instability, look at the problem statement, and say, “Aha, I see what’s going on here; Death will find me no matter what I choose; I’d better find some other way to make the decision.” However, this sort of response is explicitly ruled out by the CDT equation: CDT says you must evaluate your actions as if they were subjunctively independent of everything that doesn’t causally depend on them.

In other words, you’re correct that CDT agents know intellectually that they cannot escape Death, but the CDT equation requires agents to imagine that they can, and to act on this basis.

And, to be clear, it is not a strike against an algorithm for it to prescribe making actions by reasoning about impossible scenarios — any deterministic algorithm attempting to reason about what it “should do” must imagine some impossibilities, because a deterministic algorithm has to reason about the consequences of doing lots of different things, but is in fact only going to do one thing.

The question at hand is which impossibilities are the right ones to imagine, and the claim is that in scenarios with accurate predictors, CDT prescribes imagining the wrong impossibilities, including impossibilities where it escapes Death.

Our human intuitions say that we should reflect on the problem statement and eventually realize that escaping Death is in some sense “too impossible to consider”. But this directly contradicts the advice of CDT. Following this intuition requires us to make our beliefs obey a logical-but-not-causal constraint in the problem statement (“Death is a perfect predictor”), which FDT agents can do but CDT agents can’t. On close examination, the “shouldn’t CDT realize this is wrong?” intuition turns out to be an argument for FDT in another guise. (Indeed, pursuing this intuition is part of how FDT’s predecessors were discovered!)

Third, I’ll note it’s an important virtue in general for decision theories to be able to reason correctly in the face of apparent inconsistency. Consider the following simple example:

An agent has a choice between taking $1 or taking $100. There is an extraordinarily tiny but nonzero probability that a cosmic ray will spontaneously strike the agent’s brain in such a way that they will be caused to do the opposite of whichever action they would normally do. If they learn that they have been struck by a cosmic ray, then they will also need to visit the emergency room to ensure there’s no lasting brain damage, at a cost of $1000. Furthermore, the agent knows that they take the $100 if and only if they are hit by the cosmic ray.

When faced with this problem, EDT agents reason: “If I take the $100, then I must have been hit by the cosmic ray, which means that I lose $900 on net. I therefore prefer the $1.” They then take the $1 (except in cases where they have been hit by the cosmic ray).

Since this is just what the problem statement says — “the agent knows that they take the $100 if and only if they are hit by the cosmic ray” — the problem is perfectly consistent, as is EDT’s response to the problem. EDT only cares about correlation, not dependency; so EDT agents are perfectly happy to buy into self-fulfilling prophecies, even when it means turning their backs on large sums of money.

What happens when we try to pull this trick on a CDT agent? She says, “Like hell I only take the $100 if I’m hit by the cosmic ray!” and grabs the $100 — thus revealing your problem statement to be inconsistent if the agent runs CDT as opposed to EDT.

The claim that “the agent knows that they take the $100 if and only if they are hit by the cosmic ray” contradicts the definition of CDT, which requires that agents of CDT refuse to leave free money on the table. As you may verify, FDT also renders the problem statement inconsistent, for similar reasons. The definition of EDT, on the other hand, is fully consistent with the problem as stated.

This means that if you try to put EDT into the above situation — controlling its behavior by telling it specific facts about itself — you will succeed; whereas if you try to put CDT into the above situation, you will fail, and the supposed facts will be revealed as lies. Whether or not the above problem statement is consistent depends on the algorithm that the agent runs, and the design of the algorithm controls the degree to which you can put that algorithm in bad situations.

We can think of this as a case of FDT and CDT succeeding in making a low-utility universe impossible, where EDT fails to make a low-utility universe impossible. The whole point of implementing a decision theory on a piece of hardware and running it is to make bad futures-of-our-universe impossible (or at least very unlikely). It’s a feature of a decision theory, and not a bug, for there to be some problems where one tries to describe a low-utility state of affairs and the decision theory says, “I’m sorry, but if you run me in that problem, your problem will be revealed as inconsistent”.3

This doesn’t contradict anything you’ve said; I say it only to highlight how little we can conclude from noticing that an agent is reasoning about an inconsistent state of affairs. Reasoning about impossibilities is the mechanism by which decision theories produce actions that force the outcome to be desirable, so we can’t conclude that an agent has been placed in an unfair situation from the fact that the agent is forced to reason about an impossibility.

A causal graph of the XOR blackmail problem for CDT agents.4

Anonymous: Something still seems fishy to me about decision problems that assume perfect predictors. If I’m being predicted with 100% accuracy in the XOR blackmail problem, then this means that I can induce a contradiction. If I follow FDT and CDT’s recommendation of never paying, then I only receive a letter when I have termites. But if I pay, then I must be in the world where I don’t have termites, as otherwise there is a contradiction.

So it seems that I am able to intervene on the world in a way that changes the state of termites for me now, given that I’ve received a letter. That is, the best strategy when starting is to never pay, but the best strategy given that I will receive a letter is to pay. The weirdness arises because I’m able to intervene on the algorithm, but we are conditioning on a fact of the world that depends on my algorithm.

Not sure if this confusion makes sense to you. My gut says that these kinds of problems are often self-contradicting, at least when we assert 100% predictive performance. I would prefer to work it out from the ex ante situation, with specified probabilities of getting termites, and see if it is the case that changing one’s strategy (at the algorithm level) is possible without changing the probability of termites to maintain consistency of the prediction claim.

Nate: First, I’ll note that the problem goes through fine if the prediction is only correct 99% of the time. If the difference between “cost of termites” and “cost of paying” is sufficient high, then the problem can probably go through even if the predictor is only correct 51% of the time.

That said, the point of this example is to draw attention to some of the issues you’re raising here, and I think that these issues are just easier to think about when we assume 100% predictive accuracy.

The claim I dispute is this one: “That is, the best strategy when starting is to never pay, but the best strategy given that I will receive a letter is to pay.” I claim that the best strategy given that you receive the letter is to not pay, because whether you pay has no effect on whether or not you have termites. Whenever you pay, no matter what you’ve learned, you’re basically just burning $1000.

That said, you’re completely right that these decision problems have some inconsistent branches, though I claim that this is true of any decision problem. In a deterministic universe with deterministic agents, all “possible actions” the agent “could take” save one are not going to be taken, and thus all “possibilities” save one are in fact inconsistent given a sufficiently full formal specification.

I also completely endorse the claim that this set-up allows the predicted agent to induce a contradiction. Indeed, I claim that all decision-making power comes from the ability to induce contradictions: the whole reason to write an algorithm that loops over actions, constructs models of outcomes that would follow from those actions, and outputs the action corresponding to the highest-ranked outcome is so that it is contradictory for the algorithm to output a suboptimal action.

This is what computer programs are all about. You write the code in such a fashion that the only non-contradictory way for the electricity to flow through the transistors is in the way that makes your computer do your tax returns, or whatever.

In the case of the XOR blackmail problem, there are four “possible” worlds: LT (letter + termites), NT (noletter + termites), LN (letter + notermites), and NN (noletter + notermites).

The predictor, by dint of their accuracy, has put the universe into a state where the only consistent possibilities are either (LT, NN) or (LN, NT). You get to choose which of those pairs is consistent and which is contradictory. Clearly, you don’t have control over the probability of termites vs. notermites, so you’re only controlling whether you get the letter. Thus, the question is whether you’re willing to pay $1000 to make sure that the letter shows up only in the worlds where you don’t have termites.

Even when you’re holding the letter in your hands, I claim that you should not say “if I pay I will have no termites”, because that is false — your action can’t affect whether you have termites. You should instead say:

I see two possibilities here. If my algorithm outputs pay, then in the XX% of worlds where I have termites I get no letter and lose $1M, and in the (100-XX)% of worlds where I do not have termites I lose $1k. If instead my algorithm outputs refuse, then in the XX% of worlds where I have termites I get this letter but only lose $1M, and in the other worlds I lose nothing. The latter mixture is preferable, so I do not pay.

You’ll notice that the agent in this line of reasoning is not updating on the fact that they’re holding the letter. They’re not saying, “Given that I know that I received the letter and that the universe is consistent…”

One way to think about this is to imagine the agent as not yet being sure whether or not they’re in a contradictory universe. They act like this might be a world in which they don’t have termites, and they received the letter; and in those worlds, by refusing to pay, they make the world they inhabit inconsistent — and thereby make this very scenario never-have-existed.

And this is correct reasoning! For when the predictor makes their prediction, they’ll visualize a scenario where the agent has no termites and receives the letter, in order to figure out what the agent would do. When the predictor observes that the agent would make that universe contradictory (by refusing to pay), they are bound (by their own commitments, and by their accuracy as a predictor) to send the letter only when you have termites.5

You’ll never find yourself in a contradictory situation in the real world, but when an accurate predictor is trying to figure out what you’ll do, they don’t yet know which situations are contradictory. They’ll therefore imagine you in situations that may or may not turn out to be contradictory (like “letter + notermites”). Whether or not you would force the contradiction in those cases determines how the predictor will behave towards you in fact.

The real world is never contradictory, but predictions about you can certainly place you in contradictory hypotheticals. In cases where you want to force a certain hypothetical world to imply a contradiction, you have to be the sort of person who would force the contradiction if given the opportunity.

Or as I like to say — forcing the contradiction never works, but it always would’ve worked, which is sufficient.

Anonymous: The FDT algorithm is best ex ante. But if what you care about is your utility in your own life flowing after you, and not that of other instantiations, then upon hearing this news about FDT you should do whatever is best for you given that information and your beliefs, as per CDT.

A causal graph of Newcomb’s problem for FDT agents.6

Nate: If you have the ability to commit yourself to future behaviors (and actually stick to that), it’s clearly in your interest to commit now to behaving like FDT on all decision problems that begin in your future. I, for instance, have made this commitment myself. I’ve also made stronger commitments about decision problems that began in my past, but all CDT agents should agree in principle on problems that begin in the future.7

I do believe that real-world people like you and me can actually follow FDT’s prescriptions, even in cases where those prescriptions are quite counter-intuitive.

Consider a variant of Newcomb’s problem where both boxes are transparent, so that you can already see whether box B is full before choosing whether to two-box. In this case, EDT joins CDT in two-boxing, because one-boxing can no longer serve to give the agent good news about its fortunes. But FDT agents still one-box, for the same reason they one-box in Newcomb’s original problem and cooperate in the prisoner’s dilemma: they imagine their algorithm controlling all instances of their decision procedure, including the past copy in the mind of their predictor.

Now, let’s suppose that you’re standing in front of two full boxes in the transparent Newcomb problem. You might say to yourself, “I wish I could have committed beforehand, but now that the choice is before me, the tug of the extra $1000 is just too strong”, and then decide that you were not actually capable of making binding precommitments. This is fine; the normatively correct decision theory might not be something that all human beings have the willpower to follow in real life, just as the correct moral theory could turn out to be something that some people lack the will to follow.8

That said, I believe that I’m quite capable of just acting like I committed to act. I don’t feel a need to go through any particular mental ritual in order to feel comfortable one-boxing. I can just decide to one-box and let the matter rest there.

I want to be the kind of agent that sees two full boxes, so that I can walk away rich. I care more about doing what works, and about achieving practical real-world goals, than I care about the intuitiveness of my local decisions. And in this decision problem, FDT agents are the only agents that walk away rich.

One way of making sense of this kind of reasoning is that evolution graced me with a “just do what you promised to do” module. The same style of reasoning that allows me to actually follow through and one-box in Newcomb’s problem is the one that allows me to cooperate in prisoner’s dilemmas against myself — including dilemmas like “should I stick to my New Year’s resolution?”9 I claim that it was only misguided CDT philosophers that argued (wrongly) that “rational” agents aren’t allowed to use that evolution-given “just follow through with your promises” module.

Anonymous: A final point: I don’t know about counterlogicals, but a theory of functional similarity would seem to depend on the details of the algorithms.

E.g., we could have a model where their output is stochastic, but some parameters of that process are the same (such as expected value), and the action is stochastically drawn from some distribution with those parameter values. We could have a version of that, but where the parameter values depend on private information picked up since the algorithms split, in which case each agent would have to model the distribution of private info the other might have.

That seems pretty general; does that work? Is there a class of functional similarity that can not be expressed using that formulation?

Nate: As long as the underlying distribution can be an arbitrary Turing machine, I think that’s sufficiently general.

There are actually a few non-obvious technical hurdles here; namely, if agent A is basing their beliefs off of their model of agent B, who is basing their beliefs off of a model of agent A, then you can get some strange loops.

Consider for example the matching pennies problem: agent A and agent B will each place a penny on a table; agent A wants either HH or TT, and agent B wants either HT or TH. It’s non-trivial to ensure that both agents develop stable accurate beliefs in games like this (as opposed to, e.g., diving into infinite loops).

The technical solution to this is reflective oracle machines, a class of probabilistic Turing machines with access to an oracle that can probabilistically answer questions about any other machine in the class (with access to the same oracle).

The paper “Reflective Oracles: A Foundation for Classical Game Theory” shows how to do this and shows that the relevant fixed points always exist. (And furthermore, in cases that can be represented in classical game theory, the fixed points always correspond to the mixed-strategy Nash equilibria.)

This more or less lets us start from a place of saying “how do agents with probabilistic information about each other’s source code come to stable beliefs about each other?” and gets us to the “common knowledge of rationality” axiom from game theory.10 One can also see it as a justification for that axiom, or as a generalization of that axiom that works even in cases where the lines between agent and environment get blurry, or as a hint at what we should do in cases where one agent has significantly more computational resources than the other, etc.

But, yes, when we study these kinds of problems concretely at MIRI, we tend to use models where each agent models the other as a probabilistic Turing machine, which seems roughly in line with what you’re suggesting here.


  1. CDT prescribes defection in this dilemma, on the grounds that one’s action cannot cause the other agent to cooperate. FDT outperforms CDT in Newcomblike dilemmas like these, while also outperforming EDT in other dilemmas, such as the smoking lesion problem and XOR blackmail.
  2. The agent’s predisposition determines whether they will flee to Aleppo or stay in Damascus, and also determines Death’s prediction about their decision. This allows Death to inescapably pursue the agent, making flight pointless; but CDT agents can’t incorporate this fact into their decision-making.
  3. There are some fairly natural ways to cash out Murder Lesion where CDT accepts the problem and FDT forces a contradiction, but we decided not to delve into that interpretation in the paper.

    Tangentially, I’ll note that one of the most common defenses of CDT similarly turns on the idea that certain dilemmas are “unfair” to CDT. Compare, for example, David Lewis’ “Why Ain’cha Rich?

    It’s obviously possible to define decision problems that are “unfair” in the sense that they just reward or punish agents for having a certain decision theory. We can imagine a dilemma where a predictor simply guesses whether you’re implementing FDT, and gives you $1,000,000 if so. Since we can construct symmetric dilemmas that instead reward CDT agents, EDT agents, etc., these dilemmas aren’t very interesting, and can’t help us choose between theories.

    Dilemmas like Newcomb’s problem and Death in Damascus, however, don’t evaluate agents based on their decision theories. They evaluate agents based on their actions, and the task of the decision theory is to determine which action is best. If it’s unfair to criticize CDT for making the wrong choice in problems like this, then it’s hard to see on what grounds we can criticize any agent for making a wrong choice in any problem, since one can always claim that one is merely at the mercy of one’s decision theory.

  4. Our paper describes the XOR blackmail problem like so:

    An agent has been alerted to a rumor that her house has a terrible termite infestation, which would cost her $1,000,000 in damages. She doesn’t know whether this rumor is true. A greedy and accurate predictor with a strong reputation for honesty has learned whether or not it’s true, and drafts a letter:

    “I know whether or not you have termites, and I have sent you this letter iff exactly one of the following is true: (i) the rumor is false, and you are going to pay me $1,000 upon receiving this letter; or (ii) the rumor is true, and you will not pay me upon receiving this letter.”

    The predictor then predicts what the agent would do upon receiving the letter, and sends the agent the letter iff exactly one of (i) or (ii) is true. Thus, the claim made by the letter is true. Assume the agent receives the letter. Should she pay up?

    In this scenario, EDT pays the blackmailer, while CDT and FDT refuse to pay. See the “Cheating Death in Damascus” paper for more details.

  5. Ben Levinstein notes that this can be compared to backward induction in game theory with common knowledge of rationality. You suppose you’re at some final decision node which you only would have gotten to (as it turns out) if the players weren’t actually rational to begin with.
  6. FDT agents intervene on their decision function, “FDT(P,G)”. The CDT version replaces this node with “Predisposition” and instead intervenes on “Act”.
  7. Specifically, the CDT-endorsed response here is: “Well, I’ll commit to acting like an FDT agent on future problems, but in one-shot prisoner’s dilemmas that began in my past, I’ll still defect against copies of myself”.

    The problem with this response is that it can cost you arbitrary amounts of utility, provided a clever blackmailer wishes to take advantage. Consider the retrocausal blackmail dilemma in “Toward Idealized Decision Theory“:

    There is a wealthy intelligent system and an honest AI researcher with access to the agent’s original source code. The researcher may deploy a virus that will cause $150 million each in damages to both the AI system and the researcher, and which may only be deactivated if the agent pays the researcher $100 million. The researcher is risk-averse and only deploys the virus upon becoming confident that the agent will pay up. The agent knows the situation and has an opportunity to self-modify after the researcher acquires its original source code but before the researcher decides whether or not to deploy the virus. (The researcher knows this, and has to factor this into their prediction.)

    CDT pays the retrocausal blackmailer, even if it has the opportunity to precommit to do otherwise. FDT (which in any case has no need for precommitment mechanisms) refuses to pay. I cite the intuitive undesirability of this outcome to argue that one should follow FDT in full generality, as opposed to following CDT’s prescription that one should only behave in FDT-like ways in future dilemmas.

    The argument above must be made from a pre-theoretic vantage point, because CDT is internally consistent. There is no argument one could give to a true CDT agent that would cause it to want to use anything other than CDT in decision problems that began in its past.

    If examples like retrocausal blackmail have force (over and above the force of other arguments for FDT), it is because humans aren’t genuine CDT agents. We may come to endorse CDT based on its theoretical and practical virtues, but the case for CDT is defeasible if we discover sufficiently serious flaws in CDT, where “flaws” are evaluated relative to more elementary intuitions about which actions are good or bad. FDT’s advantages over CDT and EDT — properties like its greater theoretical simplicity and generality, and its achievement of greater utility in standard dilemmas — carry intuitive weight from a position of uncertainty about which decision theory is correct.

  8. In principle, it could even turn out that following the prescriptions of the correct decision theory in full generality is humanly impossible. There’s no law of logic saying that the normatively correct decision-making behaviors have to be compatible with arbitrary brain designs (including human brain design). I wouldn’t bet on this, but in such a case learning the correct theory would still have practical import, since we could still build AI systems to follow the normatively correct theory.
  9. A New Year’s resolution that requires me to repeatedly follow through on a promise that I care about in the long run, but would prefer to ignore in the moment, can be modeled as a one-shot twin prisoner’s dilemma. In this case, the dilemma is temporally extended, and my “twins” are my own future selves, who I know reason more or less the same way I do.

    It’s conceivable that I could go off my diet today (“defect”) and have my future selves pick up the slack for me and stick to the diet (“cooperate”), but in practice if I’m the kind of agent who isn’t willing today to sacrifice short-term comfort for long-term well-being, then I presumably won’t be that kind of agent tomorrow either, or the day after.

    Seeing that this is so, and lacking a way to force themselves or their future selves to follow through, CDT agents despair of promise-keeping and abandon their resolutions. FDT agents, seeing the same set of facts, do just the opposite: they resolve to cooperate today, knowing that their future selves will reason symmetrically and do the same.

  10. The paper above shows how to use reflective oracles with CDT as opposed to FDT, because (a) one battle at a time and (b) we don’t yet have a generic algorithm for computing logical counterfactuals, but we do have a generic algorithm for doing CDT-type reasoning.

The post Decisions are for making bad outcomes inconsistent appeared first on Machine Intelligence Research Institute.

April 2017 Newsletter

Thu, 04/06/2017 - 18:59

Our newest publication, “Cheating Death in Damascus,” makes the case for functional decision theory, our general framework for thinking about rational choice and counterfactual reasoning.

In other news, our research team is expanding! Sam Eisenstat and Marcello Herreshoff, both previously at Google, join MIRI this month.

Research updates

General updates

News and links

The post April 2017 Newsletter appeared first on Machine Intelligence Research Institute.

Two new researchers join MIRI

Sat, 04/01/2017 - 03:46

MIRI’s research team is growing! I’m happy to announce that we’ve hired two new research fellows to contribute to our work on AI alignment: Sam Eisenstat and Marcello Herreshoff, both from Google.


Sam Eisenstat studied pure mathematics at the University of Waterloo, where he carried out research in mathematical logic. His previous work was on the automatic construction of deep learning models at Google.

Sam’s research focus is on questions relating to the foundations of reasoning and agency, and he is especially interested in exploring analogies between current theories of logical uncertainty and Bayesian reasoning. He has also done work on decision theory and counterfactuals. His past work with MIRI includes “Asymptotic Decision Theory,” “A Limit-Computable, Self-Reflective Distribution,” and “A Counterexample to an Informal Conjecture on Proof Length and Logical Counterfactuals.”


Marcello Herreshoff studied at Stanford, receiving a B.S. in Mathematics with Honors and getting two honorable mentions in the Putnam Competition, the world’s most highly regarded university-level math competition. Marcello then spent five years as a software engineer at Google, gaining a background in machine learning.

Marcello is one of MIRI’s earliest research collaborators, and attended our very first research workshop alongside Eliezer Yudkowsky, Paul Christiano, and Mihály Bárász. Marcello has worked with us in the past to help produce results such as “Program Equilibrium in the Prisoner’s Dilemma via Löb’s Theorem,” “Definability of Truth in Probabilistic Logic,” and “Tiling Agents for Self-Modifying AI.” His research interests include logical uncertainty and the design of reflective agents.


Sam and Marcello will be starting with us in the first two weeks of April. This marks the beginning of our first wave of new research fellowships since 2015, though we more recently added Ryan Carey to the team on an assistant research fellowship (in mid-2016).

We have additional plans to expand our research team in the coming months, and will soon be hiring for a more diverse set of technical roles at MIRI — details forthcoming!

The post Two new researchers join MIRI appeared first on Machine Intelligence Research Institute.

2016 in review

Wed, 03/29/2017 - 02:27

It’s time again for my annual review of MIRI’s activities.1 In this post I’ll provide a summary of what we did in 2016, see how our activities compare to our previously stated goals and predictions, and reflect on how our strategy this past year fits into our mission as an organization. We’ll be following this post up in April with a strategic update for 2017.

After doubling the size of the research team in 2015,2 we slowed our growth in 2016 and focused on integrating the new additions into our team, making research progress, and writing up a backlog of existing results.

2016 was a big year for us on the research front, with our new researchers making some of the most notable contributions. Our biggest news was Scott Garrabrant’s logical inductors framework, which represents by a significant margin our largest progress to date on the problem of logical uncertainty. We additionally released “Alignment for Advanced Machine Learning Systems” (AAMLS), a new technical agenda spearheaded by Jessica Taylor.

We also spent this last year engaging more heavily with the wider AI community, e.g., through the month-long Colloquium Series on Robust and Beneficial Artificial Intelligence we co-ran with the Future of Humanity Institute, and through talks and participation in panels at many events through the year.


2016 Research Progress

We saw significant progress this year in our agent foundations agenda, including Scott Garrabrant’s logical inductor formalism (which represents possibly our most significant technical result to date) and related developments in Vingean reflection. At the same time, we saw relatively little progress in error tolerance and value specification, which we had planned to put more focus on in 2016. Below, I’ll note the highlights from each of our research areas:

Logical Uncertainty and Naturalized Induction
  • 2015 progress: sizable. (Predicted: modest.)
  • 2016 progress: sizable. (Predicted: sizable.)

We saw a large body of results related to logical induction. Logical induction developed out of earlier work led by Scott Garrabrant in late 2015 (written up in April 2016) that served to divide the problem of logical uncertainty into two subproblems. Scott demonstrated that both problems could be solved at once using an algorithm that satisfies a highly general “logical induction criterion.”

This criterion provides a simple way of understanding idealized reasoning under resource limitations. In Andrew Critch’s words, logical induction is “a financial solution to the computer science problem of metamathematics”: a procedure that assigns reasonable probabilities to arbitrary (empirical, logical, mathematical, self-referential, etc.) sentences in a way that outpaces deduction, explained by analogy to inexploitable stock markets.

Our other main 2016 work in this domain is an independent line of research spearheaded by MIRI research associate Vadim Kosoy, “Optimal Polynomial-Time Estimators: A Bayesian Notion of Approximation Algorithm.” Vadim approaches the problem of logical uncertainty from a more complexity-theoretic angle of attack than logical induction does, providing a formalism for defining optimal feasible approximations of computationally infeasible objects that retain a number of relevant properties of those objects.

Decision Theory
  • 2015 progress: modest. (Predicted: modest.)
  • 2016 progress: modest. (Predicted: modest.)

We continue to see a steady stream of interesting results related to the problem of defining logical counterfactuals. In 2016, we began applying the logical inductor framework to decision-theoretic problems, working with the idea of universal inductors. Andrew Critch also developed a game-theoretic method for resolving policy disagreements that outperforms standard compromise approaches and also allows for negotiators to disagree on factual questions.

We have a backlog of many results to write up in this space. Our newest, “Cheating Death in Damascus,” summarizes the case for functional decision theory, a theory that systematically outperforms the conventional academic views (causal and evidential decision theory) in decision theory and game theory. This is the basic framework we use for studying logical counterfactuals and related open problems, and is a good introductory paper for understanding our other work in this space.

For an overview of our more recent work on this topic, see Tsvi Benson-Tilsen’s decision theory index on the research forum.

Vingean Reflection
  • 2015 progress: modest. (Predicted: modest.)
  • 2016 progress: modest-to-strong. (Predicted: limited.)

Our main results in reflective reasoning last year concerned self-trust in logical inductors. After seeing no major advances in Vingean reflection for many years—the last big step forward was perhaps Benya Fallenstein’s model polymorphism proposal in late 2012—we had planned to de-prioritize work on this problem in 2016, on the assumption that other tools were needed before we could make much more headway. However, in 2016 logical induction turned out to be surprisingly useful for solving a number of outstanding tiling problems.

As described in “Logical Induction,” logical inductors provide a simple demonstration of self-referential reasoning that is highly general and accurate, is free of paradox, and assigns reasonable credence to the reasoner’s own beliefs. This provides some evidence that the problem of logical uncertainty itself is relatively central to a number of puzzles concerning the theoretical foundations of intelligence.

Error Tolerance
  • 2015 progress: limited. (Predicted: modest.)
  • 2016 progress: limited. (Predicted: modest.)

2016 saw the release of our “Alignment for Advanced ML Systems” research agenda, with a focus on error tolerance and value specification. Less progress occurred in these areas than expected, partly because investigations here are still very preliminary. We also spent less time on research in mid-to-late 2016 overall than we had planned, in part because we spent a lot of time writing up our new results and research proposals.

Nate noted in our October AMA that he considers this time investment in drafting write-ups one of our main 2016 errors, and we plan to spend less time on paper-writing in 2017.

Our 2016 work on error tolerance included “Two Problems with Causal-Counterfactual Utility Indifference” and some time we spent discussing and critiquing Dylan Hadfield-Menell’s proposal of corrigibility via CIRL. We plan to share our thoughts on the latter line of research more widely later this year.

Value Specification
  • 2015 progress: limited. (Predicted: limited.)
  • 2016 progress: weak-to-modest. (Predicted: modest.)

Although we planned to put more focus on value specification last year, we ended up making less progress than expected. Examples of our work in this area include Jessica Taylor and Ryan Carey’s posts on online learning, and Jessica’s analysis of how errors might propagate within a system of humans consulting one another.


We’re extremely pleased with our progress on the agent foundations agenda over the last year, and we’re hoping to see more progress cascading from the new set of tools we’ve developed. At the same time, it remains to be seen how tractable the new set of problems we’re tackling in the AAMLS agenda are.


2016 Research Support Activities

In September, we brought on Ryan Carey to support Jessica’s work on the AAMLS agenda as an assistant research fellow.3 Our assistant research fellowship program seems to be working out well; Ryan has been a lot of help to us in working with Jessica to write up results (e.g., “Bias-Detecting Online Learners”), along with setting up TensorFlow tools for a project with Patrick LaVictoire.

We’ll likely be expanding the program this year and bringing on additional assistant research fellows, in addition to a slate of new research fellows.

Focusing on other activities that relate relatively directly to our technical research program, including collaborating and syncing up with researchers in industry and academia, in 2016 we:

On the whole, our research team growth in 2016 was somewhat slower than expected. We’re still accepting applicants for our type theorist position (and for other research roles at MIRI, via our Get Involved page), but we expect to leave that role unfilled for at least the next 6 months while we focus on onboarding additional core researchers.4


2016 General Activities

Also in 2016, we:


2016 Fundraising

2016 was a strong year in MIRI’s fundraising efforts. We raised a total of $2,285,200, a 44% increase on the $1,584,109 raised in 2015. This increase was largely driven by:

  • A general grant of $500,000 from the Open Philanthropy Project.5
  • A donation of $300,000 from Blake Borgeson.
  • Contributions of $93,548 from Raising for Effective Giving.6
  • A research grant of $83,309 from the Future of Life Institute.7
  • Our community’s strong turnout during our Fall Fundraiser—at $595,947, our second-largest fundraiser to date.
  • A gratifying show of support from supporters at the end of the year, despite our not running a Winter Fundraiser.

Assuming we can sustain this funding level going forward, this represents a preliminary fulfillment of our primary fundraising goal from January 2016:

Our next big push will be to close the gap between our new budget and our annual revenue. In order to sustain our current growth plans — which are aimed at expanding to a team of approximately ten full-time researchers — we’ll need to begin consistently taking in close to $2M per year by mid-2017.

As the graph below indicates, 2016 continued a positive trend of growth in our fundraising efforts.

Drawing conclusions from these year-by-year comparisons can be a little tricky. MIRI underwent significant organizational changes over this time span, particularly in 2013. We also switched to accrual-based accounting in 2014, which also complicates comparisons with previous years.

However, it is possible to highlight certain aspects of our progress in 2016:

  • The Fall Fundraiser: For the first time, we held a single fundraiser in 2016 instead of our “traditional” summer and winter fundraisers—from mid-September to October 31. While we didn’t hit our initial target of $750k, we hoped that our funders were waiting to give later in the year and would make up the shortfall at the end of year. We were pleased that they came through in large numbers at the end of 2016, some possibly motivated by public posts by members of the community.8 All told, we received more contributions in December 2016 (~$430,000) than in the same month in either of the previous two years, when we actively ran Winter Fundraisers, an interesting data point for us. The following charts throw additional light on our supporters’ response to the fall fundraiser:

    Note that if we remove the Open Philanthropy Project’s grant from the Pre-Fall data, the ratios across the 4 time segments all look pretty similar. Overall, this data is suggestive that, rather than a group of new funders coming in at the last moment, a segment of our existing funders chose to wait until the end of the year to donate.
  • In 2016 the remarkable support we received from returning funders was particularly noteworthy, with 89% retention (in terms of dollars) from 2015 funders. To put this in a broader context, the average gift retention rate across a representative segment of the US philanthropic space over the last 5 years has been 46%.
  • The number of unique funders to MIRI rose 16% in 2016—from 491 to 571—continuing a general increasing trend. 2014 is anomalously high on this graph due to the community’s active participation in our memorable SVGives campaign.9
  • International support continues to make up about 20% of contributions. Unlike in the US, where increases were driven mainly by new institutional support (the Open Philanthropy Project), international support growth was driven by individuals across Europe (notably Scandinavia and the UK), Australia, and Canada.
  • Use of employer matching programs increased by 17% year-on-year, with contributions of over $180,000 received through corporate matching programs in 2016, our highest to date. There are early signs of this growth continuing through 2017.
  • An analysis of contributions made from small, mid-sized, large, and very large funder segments shows contributions from all four segments increased proportionally from 2015:

Due to the fact that we raised more than $2 million in 2016, we are now required by California law to prepare an annual financial statement audited by an independent certified public accountant (CPA). That report, like our financial reports of past years, will be made available by the end of September, on our transparency and financials page.


Going Forward

As of July 2016, we had the following outstanding goals from mid-2015:

  1. Accelerated growth: “expand to a roughly ten-person core research team.” (source)
  2. Type theory in type theory project: “hire one or two type theorists to work on developing relevant tools full-time.” (source)
  3. Independent review: “We’re also looking into options for directly soliciting public feedback from independent researchers regarding our research agenda and early results.” (source)

We currently have seven research fellows and assistant fellows, and are planning to hire several more in the very near future. We expect to hit our ten-fellow goal in the next 3–4 months, and to continue to grow the research team later this year. As noted above, we’re delaying moving forward on a type theorist hire.

The Open Philanthropy Project is currently reviewing our research agenda as part of their process of evaluating us for future grants. They released an initial big-picture organizational review of MIRI in September, accompanied by reviews of several recent MIRI papers (which Nate responded to here). These reviews were generally quite critical of our work, with Open Phil expressing a number of reservations about our agent foundations agenda and our technical progress to date. We are optimistic, however, that we will be able to better make our case to Open Phil in discussions going forward, and generally converge more in our views of what open problems deserve the most attention.

In our August 2016 strategic update, Nate outlined our other organizational priorities and plans:

  1. Technical research: continue work on our agent foundations agenda while kicking off work on AAMLS.
  2. AGI alignment overviews: “Eliezer Yudkowsky and I will be splitting our time between working on these problems and doing expository writing. Eliezer is writing about alignment theory, while I’ll be writing about MIRI strategy and forecasting questions.”
  3. Academic outreach events: “To help promote our approach and grow the field, we intend to host more workshops aimed at diverse academic audiences. We’ll be hosting a machine learning workshop in the near future, and might run more events like CSRBAI going forward.”
  4. Paper-writing: “We also have a backlog of past technical results to write up, which we expect to be valuable for engaging more researchers in computer science, economics, mathematical logic, decision theory, and other areas.”

All of these are still priorities for us, though we now consider 5 somewhat more important (and 6 and 7 less important). We’ve since run three ML workshops, and have made more headway on our AAMLS research agenda. We now have a large amount of content prepared for our AGI alignment overviews, and are beginning a (likely rather long) editing process. We’ve also released “Logical Induction” and have a number of other papers in the pipeline.

We’ll be providing more details on how our priorities have changed since August in a strategic update post next month. As in past years, object-level technical research on the AI alignment problem will continue to be our top priority, although we’ll be undergoing a medium-sized shift in our research priorities and outreach plans.10


  1. See our previous reviews: 2015, 2014, 2013.
  2. From 2015 in review: “Patrick LaVictoire joined in March, Jessica Taylor in August, Andrew Critch in September, and Scott Garrabrant in December. With Nate transitioning to a non-research role, overall we grew from a three-person research team (Eliezer, Benya, and Nate) to a six-person team.”
  3. As I noted in our AMA: “At MIRI, research fellow is a full-time permanent position. A decent analogy in academia might be that research fellows are to assistant research fellows as full-time faculty are to post-docs. Assistant research fellowships are intended to be a more junior position with a fixed 1–2 year term.”
  4. In the interim, our research intern Jack Gallagher has continued to make useful contributions in this domain.
  5. Note that numbers in this section might not exactly match previously published estimates, since small corrections are often made to contributions data. Note also that these numbers do not include in-kind donations.
  6. This figure only counts direct contributions through REG to MIRI. REG/EAF’s support for MIRI is closer to $150,000 when accounting for contributions made through EAF, many made on REG’s advice.
  7. We were also awarded a $75,000 grant from the Center for Long-Term Cybersecurity to pursue a corrigibility project with Stuart Russell and a new UC Berkeley postdoc, but we weren’t able to fill the intended postdoc position in the relevant timeframe and the project was canceled. Stuart Russell subsequently received a large grant from the Open Philanthropy Project to launch a new academic research institute for studying corrigibility and other AI safety issues, the Center for Human-Compatible AI.
  8. We received timely donor recommendations from investment analyst Ben Hoskin, Future of Humanity Institute researcher Owen Cotton-Barratt, and Daniel Dewey and Nick Beckstead of the Open Philanthropy Project (echoed by 80,000 Hours).
  9. Our 45% retention of unique funders from 2015 is very much in line with funder retention across the US philanthropic space, which combined with the previous point, suggests returning MIRI funders were significantly more supportive than most.
  10. My thanks to Rob Bensinger, Colm Ó Riain, and Matthew Graves for their substantial contributions to this post.

The post 2016 in review appeared first on Machine Intelligence Research Institute.

New paper: “Cheating Death in Damascus”

Sun, 03/19/2017 - 05:30

MIRI Executive Director Nate Soares and Rutgers/UIUC decision theorist Ben Levinstein have a new paper out introducing functional decision theory (FDT), MIRI’s proposal for a general-purpose decision theory.

The paper, titled “Cheating Death in Damascus,” considers a wide range of decision problems. In every case, Soares and Levinstein show that FDT outperforms all earlier theories in utility gained. The abstract reads:

Evidential and Causal Decision Theory are the leading contenders as theories of rational action, but both face fatal counterexamples. We present some new counterexamples, including one in which the optimal action is causally dominated. We also present a novel decision theory, Functional Decision Theory (FDT), which simultaneously solves both sets of counterexamples.

Instead of considering which physical action of theirs would give rise to the best outcomes, FDT agents consider which output of their decision function would give rise to the best outcome. This theory relies on a notion of subjunctive dependence, where multiple implementations of the same mathematical function are considered (even counterfactually) to have identical results for logical rather than causal reasons. Taking these subjunctive dependencies into account allows FDT agents to outperform CDT and EDT agents in, e.g., the presence of accurate predictors. While not necessary for considering classic decision theory problems, we note that a full specification of FDT will require a non-trivial theory of logical counterfactuals and algorithmic similarity.

“Death in Damascus” is a standard decision-theoretic dilemma. In it, a trustworthy predictor (Death) promises to find you and bring your demise tomorrow, whether you stay in Damascus or flee to Aleppo. Fleeing to Aleppo is costly and provides no benefit, since Death, having predicted your future location, will then simply come for you in Aleppo instead of Damascus.

In spite of this, causal decision theory often recommends fleeing to Aleppo — for much the same reason it recommends defecting in the one-shot twin prisoner’s dilemma and two-boxing in Newcomb’s problem. CDT agents reason that Death has already made its prediction, and that switching cities therefore can’t cause Death to learn your new location. Even though the CDT agent recognizes that Death is inescapable, the CDT agent’s decision rule forbids taking this fact into account in reaching decisions. As a consequence, the CDT agent will happily give up arbitrary amounts of utility in a pointless flight from Death.

Causal decision theory fails in Death in Damascus, Newcomb’s problem, and the twin prisoner’s dilemma — and also in the “random coin,” “Death on Olympus,” “asteroids,” and “murder lesion” dilemmas described in the paper — because its counterfactuals only track its actions’ causal impact on the world, and not the rest of the world’s causal (and logical, etc.) structure.

While evidential decision theory succeeds in these dilemmas, it fails in a new decision problem, “XOR blackmail.”1 FDT consistently outperforms both of these theories, providing an elegant account of normative action for the full gamut of known decision problems.

The underlying idea of FDT is that an agent’s decision procedure can be thought of as a mathematical function. The function takes the state of the world described in the decision problem as an input, and outputs an action.

In the Death in Damascus problem, the FDT agent recognizes that their action cannot cause Death’s prediction to change. However, Death and the FDT agent are in a sense computing the same function: their actions are correlated, in much the same way that if the FDT agent were answering a math problem, Death could predict the FDT agent’s answer by computing the same mathematical function.

This simple notion of “what variables depend on my action?” avoids the spurious dependencies that EDT falls prey to. Treating decision procedures as multiply realizable functions does not require us to conflate correlation with causation. At the same time, FDT tracks real-world dependencies that CDT ignores, allowing it to respond effectively in a much more diverse set of decision problems than CDT.

The main wrinkle in this decision theory is that FDT’s notion of dependence requires some account of “counterlogical” or “counterpossible” reasoning.

The prescription of FDT is that agents treat their decision procedure as a deterministic function, consider various outputs this function could have, and select the output associated with the highest-expected-utility outcome. What does it mean, however, to say that there are different outputs a deterministic function “could have”? Though one may be uncertain about the output of a certain function, there is in reality only one possible output of a function on a given input. Trying to reason about “how the world would look” on different assumptions about a function’s output on some input is like trying to reason about “how the world would look” on different assumptions about which is the largest integer in the set {1, 2, 3}.

In garden-variety counterfactual reasoning, one simply imagines a different (internally consistent) world, exhibiting different physical facts but the same logical laws. For counterpossible reasoning of the sort needed to say “if I stay in Damascus, Death will find me here” as well as “if I go to Aleppo, Death will find me there” — even though only one of these events is logically possible, under a full specification of one’s decision procedure and circumstances — one would need to imagine worlds where different logical truths hold. Mathematicians presumably do this in some heuristic fashion, since they must weigh the evidence for or against different conjectures; but it isn’t clear how to formalize this kind of reasoning in a practical way.2

Functional decision theory is a successor to timeless decision theory (first discussed in 2009), a theory by MIRI senior researcher Eliezer Yudkowsky that made the mistake of conditioning on observations. FDT is a generalization of Wei Dai’s updateless decision theory.3

We’ll be presenting “Cheating Death in Damascus” at the Formal Epistemology Workshop, an interdisciplinary conference showcasing results in epistemology, philosophy of science, decision theory, foundations of statistics, and other fields.4

Update April 7Decisions are for making bad outcomes inconsistent.


" style="margin-top:20px;"> Sign up to get updates on new MIRI technical results

Get notified every time a new technical paper is published.

jQuery(document).ready(function($) { $('#mc-embedded-subscribe-form3').validate({ rules: { EMAIL: { required: true, email: true }, FNAME: { required: true }, LNAME: { required: true } }, errorClass: 'text-error', errorPlacement: function(error, element) { error.appendTo(element.closest('.control-group')); }, highlight: function(element) { $(element).closest('.control-group').removeClass('success').addClass('error'); }, success: function(element) { $(element).closest('.control-group').removeClass('error').addClass('success'); }, submitHandler: function(form) { form.submit(); $('#NewPublicationsFormTab a[href="#NewPublicationsMessage"]').tab('show'); $('#NewPublicationsMessage').addClass('alert-success').html('×Almost finished... We need to confirm your email address. To complete the subscription process, please click the link in the email we just sent you.'); _gaq.push(['_trackEvent', 'other engagement', 'submit form', 'newsletter']); //form.remove(); } }) });


  1. Just as the variants on Death in Damascus in Soares and Levinstein’s paper help clarify CDT’s particular point of failure, XOR blackmail drills down more exactly on EDT’s failure point than past decision problems have. In particular, EDT cannot be modified to avoid XOR blackmail in the ways it can be modified to smoke in the smoking lesion problem.
  2. Logical induction is an example of a method for assigning reasonable probabilities to mathematical conjectures; but it isn’t clear from this how to define a decision theory that can calculate expected utilities for inconsistent scenarios. Thus the problem of reasoning under logical uncertainty is distinct from the problem of defining counterlogical reasoning.
  3. The name “UDT” has come to be used to pick out a multitude of different ideas, including “UDT 1.0” (Dai’s original proposal), “UDT 1.1”, and various proof-based approaches to decision theory (which make useful toy models, but not decision theories that anyone advocates adhering to).

    FDT captures a lot (but not all) of the common ground between these ideas, and is intended to serve as a more general umbrella category that makes fewer philosophical commitments than UDT and which is easier to explain and communicate. Researchers at MIRI do tend to hold additional philosophical commitments that are inferentially further from the decision theory mainstream (which concern updatelessness and logical prior probability), for which certain variants of UDT are perhaps our best concrete theories, but no particular model of decision theory is yet entirely satisfactory.

  4. Thanks to Matthew Graves and Nate Soares for helping draft and edit this post.

The post New paper: “Cheating Death in Damascus” appeared first on Machine Intelligence Research Institute.

March 2017 Newsletter

Thu, 03/16/2017 - 04:59

Research updates

General updates

  • Why AI Safety?: A quick summary (originally posted during our fundraiser) of the case for working on AI risk, including notes on distinctive features of our approach and our goals for the field.
  • Nate Soares attended “Envisioning and Addressing Adverse AI Outcomes,” an event pitting red-team attackers against defenders in a variety of AI risk scenarios.
  • We also attended an AI safety strategy retreat run by the Center for Applied Rationality.

News and links

The post March 2017 Newsletter appeared first on Machine Intelligence Research Institute.