Eternal Sunshine Of The Spotless Mind: 「讀書心得」About Two and N Prisoners' Dilemma Games, Dominant Strategy and Nash Equilibrium

2010年12月12日星期日

「讀書心得」About Two and N Prisoners' Dilemma Games, Dominant Strategy and Nash Equilibrium

《兩人囚犯困境、優勢策略、納許均衡之相關淺見，並自行假設兩人暨多人囚犯困境之數學雛型》

1.〈Two Prisoners' Dilemma Games〉

Suppose that：警方抓到兩囚犯（甲與乙），分別訊問

甲、乙可採用之策略的集合各為 X,Y = {否認，承認}。

甲乙皆否認，各關一年。(-1,-1)

甲乙皆承認，各關六年。(-6,-6)

甲否認，乙承認，甲關九年，乙釋放。(-9,0)

甲承認，乙否認，甲釋放，乙關九年。(0,-9)

對甲，{max(x)M(x1,yj)∪M(x2,yj), j=1,2} = {0,-6}。由此可見，甲最適反應為「承認」((0,-9),(-6,-6))。

對乙，{max(y)M(xi,y1)∪M(xi,y2), i=1,2} = {0,-6}。由此可見，乙最適反應亦為「承認」((0,-9),(-6,-6))。

此亦為 Dominant Strategy (優勢策略)，其定義為：無論對手如何因應，所作決定皆為自身最好的選擇 (關六年總比關九年好。=.=)

關於 Prisoner's Dilemma Games 的應變策略，白話一點就是說：視對方可能出的所有策略，找出對自己最有利的策略並加以因應。

對甲來說，乙有兩種策略。

一種是乙「否認」。
那時自己也有兩種策略，一是否認 (-1,-1)，另一種是承認 (0,-9)。"-1" 與 "0"，有利的當然是 "o" 囉，所以選擇「承認」。

一種是乙「承認」。
同樣地，甲也有兩種策略，一是否認 (-9,0)，另一種是承認 (-6,-6)。"-9" 與 "-6"，有利的當然是 "-6" (關六年，總比關九年好) 囉，所以當然選「承認」。

對乙來說，甲有兩種策略。

一種是甲「否認」。
那時乙有兩種策略，一為否認 (-1,-1)，另一種是承認 (-9,0)。"0" 當然比 "-1" 好，所以當然選「承認」囉。

一種是甲「承認」。
那時乙有兩種策略，一為否認 (0,-9)，另一種是承認 (-6,-6)。"-6" 當然比 "-9" 好，所以一定要選「承認」囉。

其實「理想且理性」的方法是：兩人都選「否認」，這樣只要關一年就出來了，不用都被關六年。但古書《增廣賢文》有云：「莫信直中直，須防仁不仁。」如果一昧信任他人，自己樂觀地選擇「否認」，而他人選「承認」，那自己就要被關九年，而對方卻是無罪釋放了 (人家如今成了罪犯、壞人嗎？=.=)

所以遇到這種進退失據的情形時，「最實際且理想」之最佳策略就是：從最壞的打算裡，找出對自己最有利的策略 (先取 min，再取 Max，也就是「最大的極小策略均衡」(min-Max Equilibrium))

此外，Dominant Strategy (優勢策略) 一定是 Nash Equilibrium (納許均衡)，但反之亦未必成立 (Nash Equilibrium 不一定都是 Dominant Strategy)

範例：甲、乙兩家公司考慮同時推出新產品

甲、乙可採用之策略的集合各為 X,Y = {推出新產品，不推出}。

對甲，{max(x)M(x1,yj)∪M(x2,yj), j=1,2} = {10,15}。由此可見，甲最適反應為「推出」((10,2),(15,0))。

對乙，{max(y)M(xi,y1)∪M(xi,y2), i=1,2} = {2,5}。由此可見，乙沒有優勢策略 ((10,2),(12,5))。

當甲推出新產品時，乙的最適反應為「推出新產品」((10,2),(15,0))。

但甲不推出時，乙最適反應卻為「不推出」((3,3),(12,5))。

兩者矛盾，所以乙並無優勢策略解。

那麼，乙雖沒有 Dominant Strategy，但要如何求甲乙的 Nash Equilibrium 呢？

很簡單啊。請看對手的策略：甲有優勢策略，「一定會推出」新產品，所以乙可輕易從甲反應裡，找出對自己最有利的策略並加以因應，這樣不就好了嗎。超容易解決的啦。=.= ((10,2),(15,0))

所以本範例之 Nash Equilibrium 為「甲乙皆推出新產品」，其解為（10,2)。

此外，兩人的 Nash Equilibrium，亦可能發生雙重解 (Dominant Strategy 有唯一解，但 Dominant Strategy 經常不存在；一定會有 Nash Equilibrium，但 Nash Equilibrium 不一定只有一個)

範例：男女朋友約會，討論該去看球賽抑或聽音樂會

對男，{max(x)M(x1,yj)∪M(x2,yj), j=1,2} = {2,1}。由此可見，甲沒有優勢策略 ((2,1),(1,2))。

對女，{max(y)M(xi,y1)∪M(xi,y2), i=1,2} = {1,2}。由此可見，乙沒有優勢策略 ((2,1),(1,2))。

但有雙重 Nash Equilibrium，就是 (2,1) 和 (1,2)。

還有「志願者困境」：n 個參與者，n(-IN\{1}(n is every positive integer) 每人都面臨要麼犧牲自己的部分利益，以成就他人；要麼選擇搭便車取巧方式，安心坐享其成。此多人困境，讓我想到 N Prisoners' Dilemma Games（多位囚犯之困境，稍微複雜一些的基礎賽局論。）

2.〈N Prisoners' Dilemma Games〉

範例情境：某社區深夜傳出槍聲，社區里所有居民都聽到了，只要有一人打 110，警察就會趕到。但如果沒有志願者，所有人都將面臨不定程度的恐懼；而有一個人決定做志願者，其他人都會因為沒作為而獲益。

For every positive integer "n" (n∈IN\{1}), there is a corresponding number "an" and so a sequence {an} can be defined as a function whose domain is the set of positive integers.

(Graph of f = {(x,f(x))∣x∈domain of f})

1. No one calls 110, the finite series, nth partial sum "Sn", "Sn" is convergent and Sn = a1+a2+...+an = 0 = ak where "k" is any element of "n", n∈IN.

2. If there exists a number "ai", "ai" is anyone of n inhabitants and he/she is a volunteer whose ai=-1, and the finite series, the nth partial sum "S(n-1)", "S(n-1)" is divergent where every number of which is "ah", ah=1, h∈IN\{i}.

將上頭的數學語言 (1.& 2.) 翻譯為白話文：

1. 無人打 110，每個人皆無損亦無得。

2. 有一人打 110，其損失 1，其他所有人得 1。

使用〈直接証法〉或〈數學歸納法〉皆可證出：

一、Proof by Direct Method (〈直接証法〉)：

For every number ar, r∈IN,{max(ar)M(ar,as),s∈IN\{r}} = {0,1}。由此可輕易判斷出，"ar" (every inhabitant) 最適反應為：「不打電話。」(只要自己不用打電話因而損失 1，就好了啦。)

We have proved that if the hypothesis is true, then the conclusion is true; so the proof is complete.

二、Proof by Induction (〈數學歸納法〉)：

r=1, {max(a1)M(a1,as),s∈IN\{1}} = {0,1}。由此可輕易判斷出，"a1" 最適反應為：「不打電話。」

r=t≧1, {max(at)M(at,as),s∈IN\{t}} = {0,1}。由此可輕易判斷出，"at" 最適反應為：「不打電話。」

r=t+1, {max(a(t+1))M(a(t+1),as),s∈IN\{(t+1)}} = {0,1}。由此可輕易判斷出，"a(t+1)" 最適反應為：「不打電話。」

Therefore, the inequality is true for all n by induction, n∈IN. This shows, by mathematical induction, that "an" won't call 110 for every positive positive integer "n".

By Proof by Direct Method or Proof by Induction, We can get the same dominant strategy that the finite series, the nth partial sum "Sn", "Sn" is convergent and Sn = a1+a2+...+an = 0 = ak where "k" is any element of "n", n∈IN. (翻譯為中文：「無論使用〈直接証法〉或〈數學歸納法〉，所得之最適結論皆為『無人打電話』，由此得証。」)

結論：人都是自私的，不願意躬先士卒，做那種對自己有害但對大眾有利的義舉。大多都想讓別人送死先犧牲 (「送死」、「犧牲」這兩個詞，好像用的太過火了)，而自己坐享其成。但每個人都打如意主意，最終結果就是一起擺爛，大家賺不成。

但另一種情形下，結果就迥然不同了：

For every positive integer "n" (n∈IN\{1}), there is a corresponding number "an" and so a sequence {an} can be defined as a function whose domain is the set of positive integers.

1. No one calls 110, the finite series, the nth partial sum "Sn", "Sn" is divergent where every number of which "an", an=-k, where "-k" sufferings wrung all inhabitants.

2. If there exists a number "ai", "ai" is anyone of n inhabitants and he/she is a volunteer whose ai=-1, and the finite series, the nth partial sum "S(n-1)", "S(n-1)" is divergent. However, every number of "S(n-1) is "ah", ah=c where c makes all inhabitants with exultation.

結論：如果那個 -k 值 (每位居民所蒙受的損失值，我們假設同為 "-k"。我知道這不可能，但人家懶得再假設了啦，好麻煩。就當作是「理想」狀態。=.=) 大到讓居民認為「即使沒人打電話，我也要趕緊通報，因為賠不起這些損失阿。」(No one calls 110, the finite series, the nth partial sum "Sn", "Sn" is divergent where every number of which "an", an=-k, where "-k" sufferings wrung all inhabitants.) 那這個困境就解決了。

而且結果只是一人「稍受」損失，得到的收益卻是極大的 (指的是與沒打 110，全體居民所受損失相較的話。) (and the finite series, the nth partial sum "S(n-1)", "S(n-1)" is divergent. However, every number of "S(n-1) is "ah", ah=c where c makes all inhabitants with exultation.)

所以了，不是所有的困境都是大家一起擺爛，皆無作為。舉個貼切的例子好了，如果社區裡發生火災，所有居民會選擇坐視不管，寧可讓大火蔓延燒到自家也不打電話，就是死眉瞪眼地等著別人叫消防車？絕不可能吧。=.=

Ref.：
1. The Def. Of Two Prisoners' Dilemma Games

2. CALCULUS: Concepts AND Contexts, James Stewart, McMaster University

3. FOUNDATIONS OF HIGHER MATHEMATICS, THIRD EDITION, PETER FLETCHER & C. WAYNE PATTY, VPI & State University