
Quantum Cognition   3

Figure 1: Two dimensional vector space. Green orthogonal axes represent WB answers, red orthogonal axes represent BW answers, blue line represents initial state.

Wang & Busemeyer (2013) present the actual N-dimensional model with multi-dimensional subspaces.

The first step that we need to make is the choice of a basis to represent events for each question. The choice of the first basis is arbitrary because it simply determines the coefficients assigned to each coordinate. We will start with the simplest, standard basis, by assuming that the answers to the WB question are represented by two orthonormal basis vectors (see green lines in Figure 1):

$$V_n = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad V_y = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$

The ray spanned by Vy is the subspace representing the answer "yes" and the ray spanned by Vn is the subspace representing the answer "no" to the WB question. Each of these subspaces corresponds to a projector:

$$P_{WB}(y) = V_y V_y^{\dagger} = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}, \quad P_{WB}(n) = V_n V_n^{\dagger} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix},$$

each of which projects vectors in the vector space onto the corresponding subspace (the symbol † represents the Hermitian transpose).

The participant's beliefs about the WB question are determined by their background knowledge of racism. These beliefs are represented by a unit length vector in this vector space. Suppose the coordinates for the belief state regarding racial issues, with respect to the WB basis, are defined by the following 2 × 1 column matrix (see blue line in Figure 1)

$$S_R = \begin{pmatrix} \sqrt{.4} \\ \sqrt{.6} \end{pmatrix}.$$

In this case, the person is “superposed” between the two possible answers: The square of the first coordinate (.4) gives the probability of answering no, and the square of the second coordinate (.6) gives the probability of answering yes. Even though the answers to be reported are mutually exclusive (the person can only pick one), both answers have non-zero probabilities of being selected.

The rule for computing the probability of an event is simple: project the belief state onto the subspace for the event, and take the squared length. Using this rule, the probabilities for each answer are (see the arrow associated with $\sqrt{.6}$ in Figure 1)

$$p(WB = y) = \left\| P_{WB}(y) \cdot S_R \right\|^2 = \left\| \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \cdot \begin{pmatrix} \sqrt{.4} \\ \sqrt{.6} \end{pmatrix} \right\|^2 = \left\| \begin{pmatrix} 0 \\ \sqrt{.6} \end{pmatrix} \right\|^2 = .6,$$

$$p(WB = n) = \left\| P_{WB}(n) \cdot S_R \right\|^2 = .4.$$

If the answer to the first question turns out to be “yes,” then the belief state is “collapsed” to this subspace, and a new state is formed by normalizing the projection on the answer yes, which becomes

Downloaded from https://www.cambridge.org/core. Auckland University of Technology, on 28 Dec 2019 at 13:01:39, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/sjp.2019.51


4   J. R. Busemeyer and Z. J. Wang

 

$$S_y = \frac{P_{WB}(y)\, S_R}{\sqrt{p(WB = y)}} = \begin{pmatrix} 0 \\ \sqrt{.6} \end{pmatrix} \Big/ \sqrt{.6} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$$

with respect to the WB basis. Now, after answering “yes,” if asked again, the person is certain to say “yes” to the WB question.
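The two-dimensional calculation above can be reproduced numerically. The following sketch (our illustration in numpy, not code from the paper; the variable names are ours) builds the WB projectors, computes the answer probabilities, and collapses the state after a "yes" answer:

```python
import numpy as np

# Orthonormal WB basis: V_n = (1, 0)^T for "no", V_y = (0, 1)^T for "yes"
V_n = np.array([1.0, 0.0])
V_y = np.array([0.0, 1.0])

# Projectors P = V V^T (real coordinates, so the Hermitian transpose is the transpose)
P_WB_n = np.outer(V_n, V_n)
P_WB_y = np.outer(V_y, V_y)

# Initial belief state S_R = (sqrt(.4), sqrt(.6))^T
S_R = np.array([np.sqrt(0.4), np.sqrt(0.6)])

# Probability of an answer = squared length of the projected state
p_yes = np.linalg.norm(P_WB_y @ S_R) ** 2   # ≈ .6
p_no  = np.linalg.norm(P_WB_n @ S_R) ** 2   # ≈ .4

# Collapse after answering "yes": project and renormalize
S_y = (P_WB_y @ S_R) / np.sqrt(p_yes)       # ≈ (0, 1)^T
print(p_yes, p_no, S_y)
```

The same two-line pattern (project, then take the squared norm) is all that the quantum probability rule requires in any dimension.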

What about the BW question? To represent these answers, we need to rotate from the original basis {Vn, Vy} used to represent the WB question to a new basis {Un, Uy} that provides this BW perspective. Suppose that the basis vectors used to represent the BW answers are obtained from the WB basis vectors by the unitary matrix

$$U_{BW} = \begin{pmatrix} .8090 & -.5878 \\ .5878 & .8090 \end{pmatrix}.$$

The first column of $U_{BW}$, i.e., $U_n = \begin{pmatrix} .8090 \\ .5878 \end{pmatrix}$, represents the basis vector for "no"; the second column, i.e., $U_y = \begin{pmatrix} -.5878 \\ .8090 \end{pmatrix}$, represents the basis vector for "yes" for the BW question (see red lines in Figure 1).

Then the projectors for the answers to the BW question equal

$$P_{BW}(y) = U_{BW}\, P_{WB}(y)\, U_{BW}^{\dagger} = \begin{pmatrix} .8090 & -.5878 \\ .5878 & .8090 \end{pmatrix} \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} .8090 & .5878 \\ -.5878 & .8090 \end{pmatrix} = \begin{pmatrix} .3455 & -.4755 \\ -.4755 & .6545 \end{pmatrix},$$

$$P_{BW}(n) = U_{BW}\, P_{WB}(n)\, U_{BW}^{\dagger} = \begin{pmatrix} .6545 & .4755 \\ .4755 & .3455 \end{pmatrix}.$$

If the person first said “yes” to the WB question, then the conditional probabilities of each answer to the BW question equal

$$p(BW = y \mid WB = y) = \left\| P_{BW}(y) \cdot S_y \right\|^2 = .6545,$$

$$p(BW = n \mid WB = y) = \left\| P_{BW}(n) \cdot S_y \right\|^2 = .3455.$$

Note that if the person answers "yes" to the WB question (so that the state has collapsed to Sy, and the person is certain to say "yes" if asked again about the WB question), then the person must be uncertain about the BW question, because the state Sy has nonzero projections on both of the BW events. This illustrates how the quantum uncertainty principle arises: being certain about one event (the answer to WB is "yes") forces uncertainty about a different, incompatible event (the answer to BW). A person cannot be certain about both incompatible measurements at the same time.

Finally the sequential probability of answering “yes” to the WB question and then “no” to the BW question equals (see the arrow associated with sqrt(.207) in Figure 1)


$$p(WB = y, BW = n) = p(WB = y) \cdot p(BW = n \mid WB = y) = \left\| P_{WB}(y)\, S_R \right\|^2 \left\| P_{BW}(n)\, S_y \right\|^2 = \left\| P_{BW}(n)\, P_{WB}(y)\, S_R \right\|^2 = .2073.$$

The opposite order produces

$$p(BW = n, WB = y) = p(BW = n) \cdot p(WB = y \mid BW = n) = \left\| P_{WB}(y)\, P_{BW}(n)\, S_R \right\|^2 = .3231.$$

This produces an order effect.
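The order effect can be verified with a few lines of numpy (a sketch we add for illustration, not the authors' code):

```python
import numpy as np

# WB projectors and initial state from the worked example
P_WB_y = np.diag([0.0, 1.0])
P_WB_n = np.diag([1.0, 0.0])
S_R = np.array([np.sqrt(0.4), np.sqrt(0.6)])

# Unitary rotation from the WB basis to the BW basis
U = np.array([[0.8090, -0.5878],
              [0.5878,  0.8090]])

# BW "no" projector obtained by rotating the WB "no" projector
P_BW_n = U @ P_WB_n @ U.T

# Sequential probabilities in the two question orders
p_WB_then_BW = np.linalg.norm(P_BW_n @ P_WB_y @ S_R) ** 2  # ≈ .2073
p_BW_then_WB = np.linalg.norm(P_WB_y @ P_BW_n @ S_R) ** 2  # ≈ .3231
print(p_WB_then_BW, p_BW_then_WB)
```

The two probabilities differ because the projectors do not commute; reversing the matrix product reverses the question order.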

Now we turn to the general model. We assume that events are represented as subspaces of a finite dimensional Hilbert space H. A finite dimensional Hilbert space is a vector space defined over the complex field and endowed with an inner product. The dimension of the vector space can be arbitrary, say N-dimensional. The state representing the beliefs of a person is a vector $|S\rangle \in H$. A projector for an event such as the answer "yes" to question A is a linear operator $P_A(y)$ in the Hilbert space that satisfies $P_A(y) = P_A(y)^{\dagger} = P_A(y) \cdot P_A(y)$. The projector for the complement, i.e., the answer to question A is "no", is $P_A(n) = I - P_A(y)$, where I is the identity operator; note that $P_A(y) \cdot P_A(n) = 0$. If question A is asked before question B, then we denote the probability of observing the answer "yes" to question A (e.g., the WB question) and then the answer "no" to question B (e.g., the BW question) as p(A = y, B = n). The opposite order is denoted p(B = n, A = y). Then the general model for question order states simply that

$$p(A = y, B = n) = \left\| P_B(n) \cdot P_A(y) \cdot |S\rangle \right\|^2,$$

$$p(B = n, A = y) = \left\| P_A(y) \cdot P_B(n) \cdot |S\rangle \right\|^2.$$

If we condition on the AB order, then the 2 × 2 joint frequencies for the A, B pair of questions can be described as a classical joint probability distribution; likewise, if we condition on the BA order, then the 2 × 2 joint frequencies for the A, B pair of questions can also be described as a classical joint probability distribution. This produces two classical joint distributions that can perfectly describe the empirical results, but they are two separate and unrelated distributions that simply reproduce those results. The advantage of the quantum probability model comes from providing a mathematical system that relates the two different joint distributions and makes a priori predictions about this relationship. Wang & Busemeyer (2013) proved the following theorem, which makes an a priori prediction for any dimension N and for any projectors representing questions A, B. The quantum probability model must predict a very special pattern of order effects that we call the QQ equality (Wang & Busemeyer, 2013):

$$Q = \big( p(A = y, B = n) + p(A = n, B = y) \big) - \big( p(B = y, A = n) + p(B = n, A = y) \big) = 0.$$
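Because the QQ equality holds for any state and any pair of projectors, it can be checked numerically on random inputs. The sketch below is our illustration in numpy (the helper `random_projector` is ours, not from the paper); it draws random subspaces of a 5-dimensional real space and verifies that Q vanishes:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5  # arbitrary dimension

def random_projector(k):
    # Projector onto a random k-dimensional subspace of R^N
    Q, _ = np.linalg.qr(rng.standard_normal((N, k)))
    return Q @ Q.T

def seq_prob(P1, P2, S):
    # p(first question = event P1, then second question = event P2)
    return np.linalg.norm(P2 @ P1 @ S) ** 2

P_Ay = random_projector(2); P_An = np.eye(N) - P_Ay
P_By = random_projector(3); P_Bn = np.eye(N) - P_By

S = rng.standard_normal(N); S /= np.linalg.norm(S)

q = (seq_prob(P_Ay, P_Bn, S) + seq_prob(P_An, P_By, S)) \
  - (seq_prob(P_By, P_An, S) + seq_prob(P_Bn, P_Ay, S))
print(q)  # ≈ 0 for any state and any projectors
```

Rerunning with any seed, dimension, or subspace sizes gives the same null result, which is what makes the QQ equality an a priori prediction rather than a fitted one.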


This theoretical prediction was established first and only later tested empirically; we found it to be supported across a wide range of 70 national field experiments that examined question-order effects (Wang et al., 2014).

After we published our results, two other non-quantum, post hoc explanations were put forward to account for the QQ equality (Kellen et al., 2018; Costello & Watts, 2018). However, these accounts were proposed after the empirical finding and were designed specifically for this particular application. The advantage of the quantum probability model is that it is more general and makes additional new predictions. The same general quantum model for order effects has been successfully applied to new situations, including (a) multi-valued (more than 2) rating scales (Wang & Busemeyer, 2016a) and (b) the effects that the ordering of evidence has on inference (Yearsely & Trueblood, 2017). The non-quantum post hoc accounts of the QQ equality (Kellen et al., 2018; Costello & Watts, 2018) do not extend to these new situations.

Conjunction and disjunction probability judgment errors

Tversky & Kahneman (1983) reported what are called "conjunction errors," which might be considered the strongest evidence that human reasoning under uncertainty does not obey the Kolmogorov axioms. A conjunction error occurs when a person judges the probability of a conjunction of two events to be greater than the probability of one of the single events. One of the most famous examples is based on the "Linda" scenario (but there are many more examples and replications of this finding): Linda is initially described so as to appear to be a very strong, liberal, and intellectual woman. Then participants are asked to judge the likelihood of various statements about Linda, including the statement that "Linda is a bank teller" (B) and that "Linda is a feminist and a bank teller" (F and B). Participants typically judge the (F and B) event as more likely than the B event. Moreover, they also commit a "disjunction error": they judge the likelihood of (F or B) to be less than the likelihood of F alone (e.g., Morier & Borgida, 1984).

Below we begin with a "toy" quantum model to account for these probability judgment errors. To make the model for this situation as simple as possible, we again use a 2-dimensional vector space and one-dimensional subspaces (rays). The actual full model for N-dimensional spaces and multi-dimensional subspaces is described in Busemeyer et al. (2011).

We assume that the two questions about bank teller and feminist are incompatible. The intuition is that a person rarely experiences these together, and so they


had very few opportunities to learn a compatible representation of features for the simultaneous occurrence of the two events. Instead, they need to view feminism relative to one set of attitude features, and then view bank teller relative to a different set of employment features.

The first step that we need to make is the choice of a basis to represent events for each question. The choice of the first basis is arbitrary, and so we start with the standard basis. We will start by assuming that the (yes, no) answers to the feminism question are represented by two orthonormal basis vectors:

$$V_y = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad V_n = \begin{pmatrix} 0 \\ 1 \end{pmatrix},$$

which produce the projectors

$$P_F(n) = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}, \quad P_F(y) = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}.$$

The participant’s beliefs are initially determined by the Linda story. Given the story, it is plausible to assume that the coordinates for the belief state, with respect to the feminism basis, will have higher magnitudes assigned to “yes.” For example, the initial beliefs can be represented by the following 2 × 1 column matrix

$$S_L = \begin{pmatrix} .9877 \\ -.1564 \end{pmatrix}.$$

Using the quantum rule for computing the probability of an event, the probabilities for each answer are

$$p(F = y) = \left\| P_F(y) \cdot S_L \right\|^2 = .9755,$$

$$p(F = n) = \left\| P_F(n) \cdot S_L \right\|^2 = .0245.$$

To represent the bank teller answer, we need to rotate to a new basis that provides this view. Suppose that the basis vectors used to represent the bank teller answers are obtained from the feminism basis vectors by the rotation matrix

$$U_{BF} = \begin{pmatrix} .3090 & -.9511 \\ .9511 & .3090 \end{pmatrix}.$$

The first column of U represents the basis vector for "yes" and the second column represents the basis vector for "no" for the bank teller question. Then the projectors for the answers to the bank teller question equal

$$P_B(y) = U_{BF}\, P_F(y)\, U_{BF}^{\dagger}, \quad P_B(n) = U_{BF}\, P_F(n)\, U_{BF}^{\dagger}.$$

Starting from the Linda story, the probabilities for the answers to the bank teller question are

$$p(B = y) = \left\| P_B(y) \cdot S_L \right\|^2 = .0245,$$

$$p(B = n) = \left\| P_B(n) \cdot S_L \right\|^2 = .9755.$$


Finally the probability of answering “yes” to feminism and then “yes” to bank teller equals

$$p(F = y, B = y) = p(F = y) \cdot p(B = y \mid F = y) = \left\| P_B(y)\, P_F(y)\, S_L \right\|^2 = .0932.$$

This last result reproduces the conjunction error: p(B = y) = .0245 is less than p(F = y, B = y) = .0932.

The probability of the disjunction “feminist or bank teller” is computed from

$$p(F = y \text{ or } B = y) = 1 - p(B = n, F = n) = 1 - \left\| P_F(n)\, P_B(n)\, S_L \right\|^2 = .9069.$$

This last result reproduces the disjunction error: p(F = y) = .9755 is greater than p(F = y or B = y) = .9069. Thus the same rotation of basis reproduces both the conjunction and disjunction errors.
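Both errors fall out of the same two matrices. The sketch below (our numpy illustration, not the authors' code) reproduces the conjunction and disjunction probabilities from the toy Linda model:

```python
import numpy as np

# Feminism basis: "yes" = (1, 0)^T, "no" = (0, 1)^T
P_F_y = np.diag([1.0, 0.0])
P_F_n = np.diag([0.0, 1.0])

# Belief state after hearing the Linda story
S_L = np.array([0.9877, -0.1564])

# Rotation to the bank teller basis; first column = "yes"
U = np.array([[0.3090, -0.9511],
              [0.9511,  0.3090]])
P_B_y = U @ P_F_y @ U.T
P_B_n = U @ P_F_n @ U.T

p_B = np.linalg.norm(P_B_y @ S_L) ** 2                  # ≈ .0245
p_FB = np.linalg.norm(P_B_y @ P_F_y @ S_L) ** 2         # ≈ .0932  (> p_B: conjunction error)
p_F_or_B = 1 - np.linalg.norm(P_F_n @ P_B_n @ S_L) ** 2 # ≈ .9069  (< p(F=y) = .9755: disjunction error)
print(p_B, p_FB, p_F_or_B)
```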

Now we turn to the general model. Once again, we assume that events are represented as subspaces of a finite dimensional Hilbert space H. The dimension of the vector space can be arbitrary, say N-dimensional. We define $|S\rangle \in H$ as the vector representing beliefs after hearing an experimental cover story (e.g., a story about Linda). Suppose A, B are two events and A is more likely than B. Define $P_A(y)$ as a projector operating in H representing the answer "yes" to question A (e.g., feminist) and define $P_B(y)$ as a projector operating in H representing the answer "yes" to question B (e.g., bank teller). According to quantum probability rules, the sequential probability of answering "yes" to question A and then "yes" to question B is

$$p(A = y, B = y) = \left\| P_B(y)\, P_A(y)\, |S\rangle \right\|^2.$$

The probability of answering “yes” to question B by itself is

$$\begin{aligned} p(B = y) &= \left\| P_B(y)\, |S\rangle \right\|^2 \\ &= \left\| P_B(y)\, I\, |S\rangle \right\|^2 \\ &= \left\| P_B(y) \big( P_A(y) + P_A(n) \big) |S\rangle \right\|^2 \\ &= \left\| P_B(y)\, P_A(y)\, |S\rangle + P_B(y)\, P_A(n)\, |S\rangle \right\|^2 \\ &= \left\| P_B(y)\, P_A(y)\, |S\rangle \right\|^2 + \left\| P_B(y)\, P_A(n)\, |S\rangle \right\|^2 + \mathrm{Int}, \end{aligned}$$

where Int is called the interference term, which contains the remaining cross-product terms produced by squaring the length of a sum of two parts. This interference term can be positive, negative, or zero. According to this model, the conjunction fallacy occurs whenever

$$\mathrm{Int} < - \left\| P_B(y)\, P_A(n)\, |S\rangle \right\|^2.$$
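Using the toy Linda model from the previous section, the interference term can be extracted as the residual of this decomposition (a numpy sketch we add for illustration; the variable names are ours):

```python
import numpy as np

P_A_y = np.diag([1.0, 0.0])           # feminist = yes
P_A_n = np.eye(2) - P_A_y             # feminist = no (complement projector)
U = np.array([[0.3090, -0.9511],
              [0.9511,  0.3090]])
P_B_y = U @ P_A_y @ U.T               # bank teller = yes
S = np.array([0.9877, -0.1564])       # Linda belief state

p_B    = np.linalg.norm(P_B_y @ S) ** 2
term_y = np.linalg.norm(P_B_y @ P_A_y @ S) ** 2
term_n = np.linalg.norm(P_B_y @ P_A_n @ S) ** 2
Int    = p_B - term_y - term_n        # cross-product (interference) residual

# Conjunction fallacy condition: Int < -||P_B(y) P_A(n) |S>||^2
print(Int, Int < -term_n)
```

In this toy model the residual is strongly negative, so the conjunction fallacy condition is satisfied.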

Although this account of the conjunction fallacy is very general (it does not depend on any specific dimension N, and it does not depend on any specific unitary matrix), it is post hoc. However, the quantum model makes many additional a priori predictions. In particular, if p(A = y, B = y) > p(B = y), then this model must predict that p(B = y|A = y) > p(B = y). There is supporting evidence for this prediction (see Busemeyer et al., 2011). Also, this model cannot predict that p(A = y, B = y) > p(A = y) and p(A = y, B = y) > p(B = y) both occur. Although there is some debate about this issue, double conjunction errors are rare (see Busemeyer et al., 2011). In addition, this model predicts that p(B = y) > p(B = y, A = y); that is, the model predicts that conjunction fallacies depend on the order of evaluating the questions. Some researchers (Costello et al., 2017) do not find empirical evidence for this prediction, whereas others (Yearsely & Trueblood, 2017) report empirical evidence for a predicted correlation between order effects and conjunction errors.

Interference of categorization on decision

Another interesting application of quantum cognition concerns some puzzling findings obtained from a categorization-decision task (Busemeyer et al., 2009). In these experiments, participants are shown faces. On some trials they categorize the faces as "good guys" or "bad guys," and then decide to "attack" or "withdraw" (this is called the categorization-decision condition); on other trials they only decide to "attack" or "withdraw" without making any categorization (this is called the decision-alone condition). Participants are usually rewarded for "attacking" the "bad guys" and for "withdrawing" from the "good guys." These experiments allow a test to see if the total probability of an action obtained from the categorization-decision condition

$$p_T(A) = p(G)\, p(A \mid G) + p(B)\, p(A \mid B)$$

equals the probability to attack, p(A), under the decision-alone condition. The difference Int = p(A) − pT(A) is called an interference effect, which indicates an effect of measurement of the category on the final action decision. Several experiments reported positive interference effects (Busemeyer et al., 2009; Wang & Busemeyer, 2016b). Even more interestingly, Busemeyer et al. (2009) found the largest interference effect in an experiment in which the probability to "attack" after categorizing the face as "bad" was lower than the probability to "attack" when no categorization was made at all!

Below we show how a "toy" quantum model easily accounts for these interference effects. To make the model for this situation as simple as possible, we again use a 2-dimensional vector space and one-dimensional subspaces (rays). The actual model used in Wang & Busemeyer (2016b) was a 4-dimensional model with 2-dimensional subspaces.


In this application, the incompatibility between the categorization and decision events arises from a dynamic process that first views the face from an evidence basis to select a category, and then rotates to an evaluation basis to choose an action. The first step that we need to make is the choice of an evidence basis to represent events for categorization. We will start by assuming that the (good, bad) answers to the categorization question are represented by two orthonormal basis vectors:

$$V_G = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \quad V_B = \begin{pmatrix} 1 \\ 0 \end{pmatrix},$$

which produce the projectors

$$P_C(g) = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}, \quad P_C(b) = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}.$$

The participant's beliefs about the categories depend on the face. Suppose the face looks like a "bad guy," so that the coordinates for the belief state, with respect to the evidence basis, are defined by the following 2 × 1 column matrix

$$S_F = \begin{pmatrix} .9491 \\ -.3150 \end{pmatrix}.$$

Using the quantum rule for computing the probability of an event, the probabilities for each answer are

$$p(C = g) = \left\| P_C(g) \cdot S_F \right\|^2 = .10, \quad p(C = b) = \left\| P_C(b) \cdot S_F \right\|^2 = .90.$$

If the face is categorized as "good," then the state collapses to $S_g = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$ in the evidence basis; and if the face is categorized as "bad," the state collapses to $S_b = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ in the evidence basis.

To represent the decision, we need to rotate to a new basis that evaluates the payoffs for actions. Suppose that the evaluation basis is obtained from the evidence basis vectors by the rotation matrix

$$U_{DC} = \begin{pmatrix} .7765 & .6301 \\ -.6301 & .7765 \end{pmatrix}.$$

The first column of U represents the basis vector for “attack” and the second column represents the basis vector for “withdraw” for the evaluation basis. Then the projectors for the answers to the action decision equal

$$P_D(a) = U_{DC}\, P_C(b)\, U_{DC}^{\dagger}, \quad P_D(w) = U_{DC}\, P_C(g)\, U_{DC}^{\dagger}.$$

Starting from the face, the probabilities for the decisions (when it is made alone) are

$$p(D = a) = \left\| P_D(a) \cdot S_F \right\|^2 = .8751,$$

$$p(D = w) = \left\| P_D(w) \cdot S_F \right\|^2 = .1249.$$

The probabilities of the decision to "attack" conditioned on each category response equal


$$p(D = a \mid C = g) = \left\| P_D(a) \cdot S_g \right\|^2 = .3971,$$

$$p(D = a \mid C = b) = \left\| P_D(a) \cdot S_b \right\|^2 = .6029.$$

Note that the probability to "attack" in the decision-alone condition equals .8751, which exceeds both of the above conditional probabilities. Therefore, this toy model produces both a positive interference effect and a higher "attack" rate for the decision-alone condition as compared to the decision after categorizing the face as "bad."
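The toy categorization-decision model can be reproduced numerically; the sketch below (ours, in numpy, not the authors' code) also computes the total probability pT(A) and the resulting positive interference effect:

```python
import numpy as np

# Evidence basis: "bad" = (1, 0)^T, "good" = (0, 1)^T
P_C_b = np.diag([1.0, 0.0])
P_C_g = np.diag([0.0, 1.0])
S_F = np.array([0.9491, -0.3150])     # belief state after seeing the face

# Rotation from the evidence basis to the evaluation (action) basis
U = np.array([[ 0.7765, 0.6301],
              [-0.6301, 0.7765]])
P_D_a = U @ P_C_b @ U.T               # "attack" projector

p_attack_alone = np.linalg.norm(P_D_a @ S_F) ** 2   # ≈ .8751
S_g = np.array([0.0, 1.0])            # collapsed state after "good"
S_b = np.array([1.0, 0.0])            # collapsed state after "bad"
p_a_given_g = np.linalg.norm(P_D_a @ S_g) ** 2      # ≈ .3971
p_a_given_b = np.linalg.norm(P_D_a @ S_b) ** 2      # ≈ .6029

# Total probability predicted by the categorization-decision condition
p_g = np.linalg.norm(P_C_g @ S_F) ** 2              # ≈ .10
p_b = np.linalg.norm(P_C_b @ S_F) ** 2              # ≈ .90
p_T = p_g * p_a_given_g + p_b * p_a_given_b
Int = p_attack_alone - p_T            # positive interference effect
print(p_attack_alone, p_T, Int)
```

The decision-alone "attack" probability exceeds the law-of-total-probability value pT, so the interference term Int is positive, matching the experimental pattern.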

Now we turn to the general model again. As before, we assume that events are represented as subspaces of a finite dimensional Hilbert space H. The dimension of the vector space can be arbitrary, say N-dimensional. We define $|S\rangle \in H$ as the vector representing beliefs about the category after seeing the face stimulus. Define $P_C(b)$ as a projector operating in H representing the answer "bad guy" to the categorization question, and define $P_C(g)$ as a projector operating in H representing the answer "good guy" to the categorization question. According to quantum probability rules, the probability of deciding to "attack" in the decision-alone condition equals

$$p(D = a) = \left\| P_D(a)\, |S\rangle \right\|^2 = \left\| P_D(a)\, P_C(g)\, |S\rangle \right\|^2 + \left\| P_D(a)\, P_C(b)\, |S\rangle \right\|^2 + \mathrm{Int},$$

and again the interference term, Int, can be positive so that

$$\left\| P_D(a)\, |S\rangle \right\|^2 > \left\| P_D(a)\, P_C(g)\, |S\rangle \right\|^2 + \left\| P_D(a)\, P_C(b)\, |S\rangle \right\|^2 = p(C = g)\, p(D = a \mid C = g) + p(C = b)\, p(D = a \mid C = b).$$

Wang & Busemeyer (2016a) go further by quantitatively testing and comparing quantum versus Markov models with respect to their abilities to make new predictions for the categorization-decision task. They used a generalization criterion method: they estimated the parameters from both the quantum model and a classical Markov model using data obtained from a first set of payoff conditions. For this first set of conditions, the same number of parameters were estimated from the data for each model. Then they used the parameters estimated from the first set of payoff conditions to make generalization predictions for two new payoff conditions. The results supported the quantum model, which made more accurate generalization predictions than the Markov model.

Summary

In this article, we first provided psychological reasons for exploring the applications of quantum probability theory to human judgment and decision making behavior. Second, we presented three very different
