2
$\begingroup$

I have come across a curious paradox concerning the Prisoner's Dilemma.

Suppose four things:

  • prisoners A and B are rather stupid people and decide to use an artificial intelligence program to determine, through a logical proof, whether it is better to cooperate or to defect.
  • the AI program is deterministic
  • both prisoners use the same program
  • each program is aware that the other prisoner's decision is determined by the same program

The program can produce two different demonstrations:

  1. Whatever the other program has decided, I am better off choosing to defect (the usual argument associated with the Prisoner's Dilemma). So the best choice is to defect. Q.E.D.
  2. As a deterministic machine, the other program will end up with the same choice as mine. Since mutual cooperation is better than mutual defection, the best choice is to cooperate. Q.E.D. (The sketch just below compares the two arguments on the same payoff matrix.)
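
For concreteness, here is a minimal sketch of how the two demonstrations weigh the same payoff matrix differently. The payoff values $T=5$, $R=3$, $P=1$, $S=0$ are my assumption of the usual textbook numbers, not part of the question; any payoffs with $T > R > P > S$ give the same picture.

```python
# Minimal sketch, assuming the standard textbook payoffs T=5, R=3, P=1, S=0
# (these exact numbers are an assumption; any payoffs with T > R > P > S work).
PAYOFF = {  # (my move, their move) -> my payoff
    ("defect", "cooperate"): 5,     # temptation T
    ("cooperate", "cooperate"): 3,  # reward R
    ("defect", "defect"): 1,        # punishment P
    ("cooperate", "defect"): 0,     # sucker's payoff S
}

# Demonstration 1: fix the opponent's move and compare my two replies.
for their_move in ("cooperate", "defect"):
    best = max(("cooperate", "defect"), key=lambda mine: PAYOFF[(mine, their_move)])
    print(f"if they {their_move}: my best reply is {best}")   # always "defect"

# Demonstration 2: with the same deterministic program on both sides, only the
# diagonal outcomes (x, x) are reachable, and cooperation wins on that diagonal.
for move in ("cooperate", "defect"):
    print(f"both {move}: my payoff is {PAYOFF[(move, move)]}")  # 3 vs. 1
```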

[EDIT: I need to add some precision about the nature of the artificial intelligence program I have in mind. Some people supposed it was something similar to current generative AI programs, typically using neural networks. Actually, as Caleb Stanford has reformulated it, it is any formal program (a Turing machine) that is able to produce correct logical proofs in a formal logical language.]

Which reasoning is right, and why is the other wrong?

Proof 2 seems to outclass Proof 1. But then Proof 1 must be wrong, even though it is the same logical proof as the one used to show that defecting is better in the classical Prisoner's Dilemma. In that proof, the hypothesis that the other prisoner will reason somewhat differently from us is never mentioned. Is that hypothesis missing from the Prisoner's Dilemma?

$\endgroup$
7
  • $\begingroup$ Well, it depends on how the program works, and on whether the goal is to optimize the present game or something else (like "optimize the long-term outcome if the game is iterated"). $\endgroup$
    – lulu
    Commented Jul 11 at 13:48
  • $\begingroup$ If we're talking about the kind of AI that we know how to build, the question is fed into some LLM or neural network and an answer comes out, but nobody really knows why the answer is what it is. It's possible that if we add some data to the training set, we get the opposite answer. $\endgroup$
    – David K
    Commented Jul 11 at 13:48
  • $\begingroup$ This is an excellent question! :) You might consider in the future asking on Philosophy StackExchange or Computer Science StackExchange. $\endgroup$ Commented Jul 11 at 15:55
  • 3
    $\begingroup$ I think this would be more appropriate on philosophy stack exchange. $\endgroup$ Commented Jul 11 at 17:17
  • 5
    $\begingroup$ Neither of your proposed logical flows depend on the fact that it's an AI behind the logic. $\endgroup$
    – svavil
    Commented Jul 11 at 22:20

4 Answers

6
$\begingroup$

The problem is in the premises! In essence, you are assuming here that there is a deterministic AI program (i.e., a Turing machine) $M$ such that:

  1. $M$ always returns "cooperate" or "defect" (always halts)

  2. $M$ is able to reason about its own output, i.e. "if $M$ outputs $x$ then ..."

  3. $M$ returns $x$ if there exists a mathematical proof that $x$ is the best response to the output of $M$

As your question points out, under assumption 3, there exists a proof that "defect" is the best response to $x$, for any $x$; therefore "defect" is the best response. But on the other hand, under assumption 2, there exists a proof that reasons as follows: if $M$ returns cooperate then the opponent returns cooperate, and if $M$ returns defect then the opponent returns defect; therefore cooperate is a better response than defect. I realize this is not entirely formal, but it can be made so, and it is the core idea of the question you have in mind.

So what gives? In fact, the above shows that assumptions (1)-(3) are inconsistent. But we can see this more directly: under assumptions (1) and (2) taken together, we get a contradiction as in the halting problem: $M$ can run $M$, check the output $x$, and return the opposite of whatever $M$ outputs. This is a contradiction, so such a machine cannot exist.
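
To make the self-reference concrete, here is a toy rendering of that diagonal argument (a sketch only, not a formalization; the function name `M` just stands for the machine in the assumptions):

```python
# Sketch: suppose, for contradiction, that M always halts (assumption 1) and
# can simulate its own output (assumption 2). Nothing then stops M from being
# written as the "opposite of itself":

def M():
    x = M()  # "run M, check the output x" -- self-simulation
    # ...and return the opposite of whatever M outputs
    return "cooperate" if x == "defect" else "defect"

# In practice the call M() never returns: the self-simulation recurses forever
# (Python raises RecursionError), which is exactly a failure of assumption 1.
# So no machine can satisfy both assumptions at once.
```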

Assumptions (1) and (3) are also quite suspect taken together. While you would need to specify which proof system you are working in for (3), regardless of the choice it is likely that there is no way to determine, on all inputs, whether a proof exists while at the same time halting on all inputs (that is, most proof systems are not decidable, per Gödel and Turing).

$\endgroup$
4
  • 1
    $\begingroup$ Your formal reformulation makes things clearer. I need to check some points with you to be sure I fully understand. Due to the undecidability of most proof systems, $M$ (if premise 1 is met) should have the possibility to make a choice with or without a proof. Suppose $M$ gives its answer together with an additional piece of information ("and I proved it is the best strategy", set to true or false). Premise 2 does not suppose that $x$ can be known within the computation of $x$ itself (that would lead to infinite recursion); the reasoning of Proof 2 just needs to know that $x$ exists. So is there still an inconsistency? $\endgroup$
    – Arnaud
    Commented Jul 11 at 20:23
  • $\begingroup$ You are right that premise 2 can be weakened. And, "due to undecidability of most proof systems" -- yes, that weakens premise 3. For example, the machine could just search for proofs up to some finite length, like proofs less than 1,000,000 characters. But I think it is still inconsistent. I would have to think about exactly where the inconsistency arises. $\endgroup$ Commented Jul 11 at 21:37
  • 2
    $\begingroup$ @CalebStanford If one machine checks only to a certain depth, then the other needs to check to the same depth plus a few more steps, to both know the result and act on it. So they either are not equivalent, because one is allowed to run slightly longer, or they cannot act on the result of the other's calculations. $\endgroup$
    – mlk
    Commented Jul 12 at 7:56
  • 1
    $\begingroup$ I think the formalism of this answer is very helpful. I would add (and have added, in my answer) that there is also an assumption by OP that the prisoners are even able to ask the question they want answered. The reason there are so many alternative approaches to AI such as neural networks and LLMs is that it turns out to be incredibly difficult to put many perfectly ordinary questions into a form that can be reasoned about using only mathematical logic. $\endgroup$
    – David K
    Commented Jul 12 at 19:07
4
$\begingroup$

Your "two different reasoning" are a minuscule fraction of the possible ways the AI might make its decision. Here are some other ways:

  1. After $17$ iterations, node $4321$ has weight $0.950293159$, which is within the tolerance to trigger node $92384$, therefore the answer is "cooperate".

  2. The literature in the LLM training set tends toward "defect" $85.03\%$ of the time, so the answer is "defect".

But it seems that what you mean by an "AI" in the question has nothing to do with any actual AI that has ever been built; it is rather a mechanized, deterministic substitute for the "perfect logician" that some puzzles refer to.

This presupposes that the answers are "yes" to all of the following questions:

  • Is it possible for a "perfect logician" to be mechanized and deterministic?

  • Is the question one that a deterministic perfect logician can answer?

  • Is it possible for both prisoners to receive truly identical copies of the same mechanized, deterministic perfect logician in the exact same initial state? [Note: this is probably the easiest of these questions to answer "yes" to, assuming the answer to the previous question is, "Yes, and it can be done with software that is just like the deterministic software programs we know today except that it has different lines of code."]

  • Is it possible for two "rather stupid people" to be able to pose this question to the AI, including all the information and preferences the AI needs to know about in order to make a decision, in such a way that the AI will actually be answering the desired question, not some completely different question that the users asked due to their incompetence?

  • If the two "rather stupid people" actually are able to use the AI competently, how can they be sure that they both feed the exact same question (up to differences that make no difference) into the AI?

  • How does prisoner B know that prisoner A is using the same AI (required in order for prisoner B to tell the AI that "the other decision is determined by the same program")?

A facile response to the question might be that there's no reason to believe that the circumstances posed in the question could ever be realized.


Assuming the answers to the first three questions were all "yes", the problem of how the prisoners ask the question might be addressed by giving each of them a document that they should type into the AI's interface verbatim in order to ask the question -- or better still, a device like a USB key that they simply insert in order to ask the question.

But a simpler solution is to deliver each of them an "AI" that simply prints out "cooperate" no matter what the input is.

To put it another way, in order to solve the various "how do they know how to ask the right question" problems, we need some helpful extra circumstances, such as coordination between the prisoners before they are isolated from each other or the intervention of a friendly third party.


I don't believe the original prisoner's dilemma requires any "hypothesis that the other prisoner will reason slightly differently than us". It merely requires the lack of a hypothesis that there is some unexplained chain of events that forces the other prisoner to make the same decision I make, no matter what that decision is (or at least do so in the case where I decide "cooperate"). The reasoning of the AI in this question is that chain of events (unexplained, because the question doesn't present a mechanical proof).

$\endgroup$
1
  • $\begingroup$ @YvesDaoust I think the whole point of the Prisoner's Dilemma is that there is no explicit assumption about whether the behaviors are identical or different. A question is whether we can prove that they would be identical if the prisoners both acted in perfect self-interest. This question appears to be an attempt to dodge the actual dilemma via a rather arcane procedure relying on many implausible hypotheses. $\endgroup$
    – David K
    Commented Jul 12 at 18:47
2
$\begingroup$

If they use the same algorithm, then effectively we don't have two people, but one. So one assumption, namely that neither prisoner knows what the other is choosing, might not hold.

The big question now is whether that program is aware of the fact that the other program is an instance of itself, and whether it is smart enough to use that fact.

If both apply, then there are only two possible outcomes: both defect, or both cooperate. Here, both will cooperate, because it is the better option for each of them.

If one of the two doesn't apply, we're back at the situation from the article you've linked. So here, both will defect.

$\endgroup$
1
  • $\begingroup$ "programs are aware that the other decision is determined by the same program" $\endgroup$ Commented Jul 12 at 16:52
1
$\begingroup$

I don't really see an issue. We can't decide ahead of time which one the AI will go for, but that doesn't make it a paradox. It could go for either one, and justify it perfectly well. It would be far more paradoxical if going for one always made the other option the better choice.

Consider the well-known liar's paradox: "This statement is false". If the statement is true, then it is false. And if it is false, then it is true. On the other hand, "this statement is true" is not a paradox. It could be true, it could be false, we don't know, but neither option leads to any problem.

Your setting looks more like the latter than the former. We can't tell which way it goes, but no matter which way it goes, we won't have to tie ourselves into an impossible logical knot.

Have the two prisoners use the AI to play a single round of rock-paper-scissors instead, where the winner goes free and the loser is executed (and they are both executed in case of a tie), and you'll see that the machines will catch on fire and explode before they can figure out the answer.
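
A small sketch of why, under the same assumptions as in the question (one shared deterministic program on both sides), only ties are reachable in that game, so there is no move whose optimality the machine could ever prove:

```python
# Sketch: a shared deterministic program outputs one move; both copies play it.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def outcome(mine, theirs):
    if mine == theirs:
        return "tie (both executed)"
    return "I win" if BEATS[mine] == theirs else "I lose"

# Whatever single move the common program settles on, the other copy plays the
# same move, so every reachable outcome is a tie -- no move is provably "best",
# and a proof-searching machine would search forever.
for move in BEATS:
    print(move, "->", outcome(move, move))
```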

$\endgroup$
1
  • $\begingroup$ You look at the program as any kind of logical operations (and, or, not, ...) that comes up with a result: either cooperate or defect. I intended it as a deterministic agent capable of constructing logical proofs. Proof A => defect will provide a better outcome than cooperate. Proof B => cooperate will provide a better outcome than defect. This is an apparent paradox. $\endgroup$
    – Arnaud
    Commented Jul 11 at 14:55
