8.2 Quantum gradient descent algorithm

The Hessian matrix can be expressed as

\[
H = \frac{1}{2}\sum_{\alpha=1}^{K}\sum_{j=1}^{p}\left[\,\sum_{\substack{i=1\\ i\neq j}}^{p}\;\prod_{\substack{k=1\\ k\neq i}}^{p}\theta^T A^{(\alpha)}_k\theta\,\Big((A^{(\alpha)}_i)^T + A^{(\alpha)}_i\Big)\,\theta\theta^T\Big((A^{(\alpha)}_j)^T + A^{(\alpha)}_j\Big) \;+\; \prod_{\substack{i=1\\ i\neq j}}^{p}\theta^T A^{(\alpha)}_i\theta\,\Big((A^{(\alpha)}_j)^T + A^{(\alpha)}_j\Big)\right], \tag{8.7}
\]

and will likewise be translated into a quantum operator $H$.

Proof. Similar to above.
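As a minimal numerical illustration, assuming small dense matrices $A^{(\alpha)}_i$ stored as numpy arrays (all names here are illustrative), Equation (8.7) can be evaluated directly:

\begin{verbatim}
import numpy as np

def hessian(theta, A):
    """Evaluate Eq. (8.7) for A[alpha][i], a K x p collection of d x d arrays."""
    d = theta.shape[0]
    K, p = len(A), len(A[0])
    H = np.zeros((d, d))
    for alpha in range(K):
        for j in range(p):
            sym_j = A[alpha][j].T + A[alpha][j]
            # first group: sum over i != j, product over k != i of theta^T A_k theta
            for i in range(p):
                if i == j:
                    continue
                coeff = np.prod([theta @ A[alpha][k] @ theta
                                 for k in range(p) if k != i])
                sym_i = A[alpha][i].T + A[alpha][i]
                H += coeff * sym_i @ np.outer(theta, theta) @ sym_j
            # second group: product over i != j times the symmetrised A_j
            coeff = np.prod([theta @ A[alpha][i] @ theta
                             for i in range(p) if i != j])
            H += coeff * sym_j
    return 0.5 * H
\end{verbatim}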

real amplitudes. Measure the state in the basis

\[
|\mathrm{yes}\rangle = \frac{1}{\sqrt{2}}\big(|0\rangle + i|1\rangle\big), \qquad |\mathrm{no}\rangle = \frac{1}{\sqrt{2}}\big(i|0\rangle + |1\rangle\big).
\]

Measuring ‘yes’ results in a state

\[
|\psi_{\theta^{(t+1)}}\rangle = \frac{1}{\sqrt{2\,p_D\,p_{\mathrm{yes}}}}\left(\cos\gamma\,|\psi_{\theta^{(t)}}\rangle - \frac{1}{C_D}\sin\gamma\, D|\psi_{\theta^{(t)}}\rangle\right),
\]

where $p_{\mathrm{yes}}$ is the probability of success, given by

\[
p_{\mathrm{yes}} = \frac{1}{2 p_D}\left[\cos^2\gamma + \frac{1}{C_D^2}\sin^2\gamma\,\langle\psi_\theta|D^\dagger D|\psi_\theta\rangle - 2\cos\gamma\sin\gamma\,\langle\psi_\theta|D|\psi_\theta\rangle\right].
\]

Proof. The state $|\psi\rangle$ in the yes--no basis reads

\begin{align*}
&\big(|\mathrm{yes}\rangle\langle\mathrm{yes}| + |\mathrm{no}\rangle\langle\mathrm{no}|\big)\,\frac{1}{\sqrt{p_D}}\left(\cos\gamma\,|0\rangle|\psi_{\theta^{(t)}}\rangle + \frac{i}{C_D}\sin\gamma\,|1\rangle D|\psi_{\theta^{(t)}}\rangle\right)\\
&= \frac{1}{\sqrt{p_D}}\left(\cos\gamma\,\langle\mathrm{yes}|0\rangle\,|\mathrm{yes}\rangle|\psi_{\theta^{(t)}}\rangle + \frac{i}{C_D}\sin\gamma\,\langle\mathrm{yes}|1\rangle\,|\mathrm{yes}\rangle D|\psi_{\theta^{(t)}}\rangle + \cos\gamma\,\langle\mathrm{no}|0\rangle\,|\mathrm{no}\rangle|\psi_{\theta^{(t)}}\rangle + \frac{i}{C_D}\sin\gamma\,\langle\mathrm{no}|1\rangle\,|\mathrm{no}\rangle D|\psi_{\theta^{(t)}}\rangle\right)\\
&= \frac{1}{\sqrt{2 p_D}}\underbrace{\left(\cos\gamma\,|\psi_{\theta^{(t)}}\rangle - \frac{1}{C_D}\sin\gamma\, D|\psi_{\theta^{(t)}}\rangle\right)}_{|\psi_1\rangle}|\mathrm{yes}\rangle + \frac{1}{\sqrt{2 p_D}}\underbrace{\left(-i\cos\gamma\,|\psi_{\theta^{(t)}}\rangle - \frac{i}{C_D}\sin\gamma\, D|\psi_{\theta^{(t)}}\rangle\right)}_{|\psi_2\rangle}|\mathrm{no}\rangle.
\end{align*}

The probability $\langle\psi_1|\psi_1\rangle$ of measuring $|\mathrm{yes}\rangle$ is given by

\[
p_{\mathrm{yes}} = \frac{1}{2 p_D}\left(\cos^2\gamma + \frac{1}{C_D^2}\sin^2\gamma\,\langle\psi_\theta|D^\dagger D|\psi_\theta\rangle - \cos\gamma\sin\gamma\,\big(\langle\psi_\theta|D^\dagger|\psi_\theta\rangle + \langle\psi_\theta|D|\psi_\theta\rangle\big)\right),
\]

which for Hermitian operators can be summarised as above. The probability $\langle\psi_2|\psi_2\rangle$ of measuring $|\mathrm{no}\rangle$ is given by

\[
p_{\mathrm{no}} = \frac{1}{2 p_D}\left(\cos^2\gamma + \frac{1}{C_D^2}\sin^2\gamma\,\langle\psi_\theta|D^\dagger D|\psi_\theta\rangle + \cos\gamma\sin\gamma\,\big(\langle\psi_\theta|D^\dagger|\psi_\theta\rangle + \langle\psi_\theta|D|\psi_\theta\rangle\big)\right),
\]

which with the definition of $p_D$ adds up to $1$.
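A quick numerical sanity check of this yes--no measurement, assuming arbitrary test choices for $D$, $|\psi_\theta\rangle$, $\gamma$ and $C_D$ and taking $p_D$ to be the squared norm of the conditioned state, confirms that the outcome probabilities sum to one:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
d, C_D = 4, 0.5

# arbitrary test data: a normalised real state and a Hermitian operator D
psi = rng.normal(size=d); psi /= np.linalg.norm(psi)
D = rng.normal(size=(d, d)); D = 0.5 * (D + D.T)

gamma = 0.3                                 # any angle works for this check

# conditioned state: cos(g)|0>|psi> + (i/C_D) sin(g)|1> D|psi>, normalised by sqrt(p_D)
branch0 = np.cos(gamma) * psi
branch1 = (1j / C_D) * np.sin(gamma) * (D @ psi)
state = np.concatenate([branch0, branch1])
p_D = np.vdot(state, state).real            # squared norm of the conditioned state
state /= np.sqrt(p_D)

# measure the ancilla in the yes/no basis
yes = np.array([1, 1j]) / np.sqrt(2)
no  = np.array([1j, 1]) / np.sqrt(2)
psi_yes = yes.conj()[0] * state[:d] + yes.conj()[1] * state[d:]
psi_no  = no.conj()[0]  * state[:d] + no.conj()[1]  * state[d:]

p_yes = np.vdot(psi_yes, psi_yes).real
p_no  = np.vdot(psi_no, psi_no).real
print(p_yes, p_no, p_yes + p_no)            # the probabilities sum to one
\end{verbatim}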

Choosing the external parameter $\gamma$ so that

\[
\cos\gamma = \frac{1}{\sqrt{1+\eta^2/C_D^2}}, \qquad \sin\gamma = \frac{\eta/C_D}{\sqrt{1+\eta^2/C_D^2}},
\]

results in

\[
|\psi_{\theta^{(t+1)}}\rangle = \frac{1}{C^{(t+1)}}\left(|\psi_{\theta^{(t)}}\rangle - \eta\,|\nabla o(\theta^{(t)})\rangle\right),
\]

where the normalisation term is given by

\[
C^{(t+1)} = 1 - 2\eta\,\langle\psi_{\theta^{(t)}}|D|\psi_{\theta^{(t)}}\rangle + \eta^2\,\langle\psi_{\theta^{(t)}}|D^2|\psi_{\theta^{(t)}}\rangle.
\]

This encodes the classical vector

\[
\theta^{(t+1)} = \frac{1}{C^{(t+1)}}\left(\theta^{(t)} - \eta\,\nabla o(\theta^{(t)})\right),
\]

which is exactly a gradient descent update, normalised to unit length. With this choice of the free parameter $\gamma$, and noting that $\langle\psi_{\theta^{(t)}}|D|\psi_{\theta^{(t)}}\rangle$ is the inner product of the current state with the gradient and $\langle\psi_{\theta^{(t)}}|D^2|\psi_{\theta^{(t)}}\rangle$ the squared length of the gradient, the acceptance probability can also be written as

\[
p_{\mathrm{yes}} = \frac{1}{2} - \frac{2\eta\,(\theta^{(t)})^T\nabla o(\theta^{(t)})}{1 + \eta^2\,\nabla o(\theta^{(t)})^T\nabla o(\theta^{(t)})}.
\]

A series expansion of this expression reveals

\[
p_{\mathrm{yes}} = \frac{1}{2} - 2\eta\,(\theta^{(t)})^T\nabla o(\theta^{(t)}) + O(\eta^2),
\]

which is sufficiently large for small enough $\eta$.
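The classical content of the update can be sketched in a few lines, assuming an arbitrary quadratic toy objective (all names illustrative): each step renormalises $\theta - \eta\nabla o(\theta)$ to unit length.

\begin{verbatim}
import numpy as np

def normalised_gd_step(theta, grad, eta):
    """One update theta -> (theta - eta*grad) / C, with C the norm of the new vector."""
    step = theta - eta * grad
    return step / np.linalg.norm(step)

# toy usage with an arbitrary quadratic objective o(theta) = 0.5 * theta^T M theta
rng = np.random.default_rng(1)
M = rng.normal(size=(4, 4)); M = 0.5 * (M + M.T)
theta = rng.normal(size=4); theta /= np.linalg.norm(theta)
for _ in range(5):
    theta = normalised_gd_step(theta, M @ theta, eta=0.1)   # gradient of the toy objective is M @ theta
print(theta)
\end{verbatim}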

Note that the cost of one step of the quantum gradient descent method depends on the time needed to implement and apply the operator $D$, and on the preparation of copies of the initial state. If no specific guess for the initial state is given, it can be chosen conveniently so that its preparation time is at worst linear in the number of qubits, i.e. $O(\log N)$. Each update step furthermore has a probability of $1 - p_{\mathrm{yes}}$ to fail.

8.2.2 How to calculate the gradient in amplitude encoding

The following shows how to implement the operation $D|\psi_\theta\rangle = |\psi_\nabla\rangle$ used for the quantum gradient descent step above. The index $t$ indicating the current step is omitted for readability, and $\nabla$ is a shorthand for $\nabla o(\theta^{(t)})$ (the gradient of the objective function is always evaluated at the current point).

As explained before, to implement $D|\psi_\theta\rangle$ one can evolve the quantum system with a Hamiltonian corresponding to $D$, $e^{iD\Delta t}|\psi_\theta\rangle$, and use the techniques from [90] to write $D$'s eigenvalues into the amplitudes. The problem is that we cannot guarantee $D$ to be sparse in matrix representation, and again the solution is to resort to the density matrix exponentiation technique, however with a novel variation. Instead of simulating a swap operator, an $s$-sparse operator $M_D$ which is specifically constructed for this purpose will be used to exponentiate $D$:

\[
M_D = \frac{1}{2}\sum_{j}\sum_{\alpha}\left(\bigotimes_{i\neq j} A^{(\alpha)}_i\right)\otimes\Big((A^{(\alpha)}_j)^T + A^{(\alpha)}_j\Big).
\]

This operator is similar to Equation (8.4), but sums over different permutations of the $A^{(\alpha)}_j$. In the $j$'th term, the operator acting on the last Hilbert space is given by the matrix $A^{(\alpha)}_j$.

Example 8.2.1: The operator $M_D$

For $p = 2$, we get

\[
M_D = \sum_\alpha \Big(A^{(\alpha)}_1 \otimes A^{(\alpha)}_2 + A^{(\alpha)}_2 \otimes A^{(\alpha)}_1\Big).
\]

In each term of the sum, another $A^{(\alpha)}_i$, $i = 1, 2$, acts on the second Hilbert space.
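A minimal sketch, assuming dense numpy arrays and illustrative names, builds $M_D$ from the definition above and checks that for $p = 2$ and symmetric matrices it reduces to the form of Example 8.2.1:

\begin{verbatim}
import numpy as np
from functools import reduce

def build_MD(A):
    """M_D = 1/2 sum_{j,alpha} (tensor_{i != j} A_i^(alpha)) (x) ((A_j^(alpha))^T + A_j^(alpha))."""
    K, p = len(A), len(A[0])
    return 0.5 * sum(
        reduce(np.kron, [A[a][i] for i in range(p) if i != j] + [A[a][j].T + A[a][j]])
        for a in range(K) for j in range(p)
    )

# for p = 2, K = 1 and symmetric A's this reproduces Example 8.2.1
A1 = np.array([[0., 1.], [1., 0.]])
A2 = np.array([[1., 0.], [0., -1.]])
print(np.allclose(build_MD([[A1, A2]]), np.kron(A2, A1) + np.kron(A1, A2)))   # True
\end{verbatim}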

Applying this operator to $p$ copies of the quantum system $\rho$ and taking the trace over all but the last system approximates the exponentiation of $D$ with error $O(\Delta t^2)$,

\[
\mathrm{tr}_{p-1}\{e^{-iM_D\Delta t}\,\rho^{\otimes p}\,e^{iM_D\Delta t}\} = e^{-iD\Delta t}\rho\, e^{iD\Delta t} + O(\Delta t^2).
\]

Proof. The operator $D$ from Equation (8.6) which we seek to implement contains the factors $\theta^T A^{(\alpha)}_i\theta$. If $\theta$ is interpreted as a quantum state vector and $A^{(\alpha)}_i$ as an observable, this corresponds to the expectation value $\langle\psi_\theta|A^{(\alpha)}_i|\psi_\theta\rangle = \mathrm{tr}\{A^{(\alpha)}_i\rho\}$, where $\rho = |\psi_\theta\rangle\langle\psi_\theta|$ is the corresponding density matrix.

The full operator

\[
D = \frac{1}{2}\sum_{j}\sum_{\alpha}\prod_{i\neq j}\theta^T A^{(\alpha)}_i\theta\,\Big((A^{(\alpha)}_j)^T + A^{(\alpha)}_j\Big)
\]

can therefore be reproduced by a quantum operator that is formally equal to

\[
D = \mathrm{tr}_{1\ldots p-1}\{\rho^{(p-1)} M_D\},
\]

where $\rho^{(p-1)} = \rho\otimes\cdots\otimes\rho\otimes\mathbb{I}$ is the joint quantum state of $p-1$ copies of $\rho$ and an identity. To show this, consider the relation

\begin{align*}
D &= \frac{1}{2}\sum_{\alpha,j}\left(\prod_{i\neq j}\langle\psi_\theta|A^{(\alpha)}_i|\psi_\theta\rangle\right)\Big((A^{(\alpha)}_j)^T + A^{(\alpha)}_j\Big)\\
&= \frac{1}{2}\sum_{\alpha,j}\left(\prod_{i\neq j}\mathrm{tr}\{\rho A^{(\alpha)}_i\}\right)\Big((A^{(\alpha)}_j)^T + A^{(\alpha)}_j\Big)\\
&= \frac{1}{2}\sum_{\alpha,j}\mathrm{tr}\Big\{\underbrace{(\rho\otimes\cdots\otimes\rho)}_{p-1\ \text{times}}\bigotimes_{i\neq j}A^{(\alpha)}_i\Big\}\,\Big((A^{(\alpha)}_j)^T + A^{(\alpha)}_j\Big)\\
&= \frac{1}{2}\sum_{\alpha,j}\mathrm{tr}_{p-1}\Big\{(\underbrace{\rho\otimes\cdots\otimes\rho}_{p-1\ \text{times}}\otimes\,\mathbb{I})\Big(\bigotimes_{i\neq j}A^{(\alpha)}_i\Big)\otimes\Big((A^{(\alpha)}_j)^T + A^{(\alpha)}_j\Big)\Big\}\\
&= \frac{1}{2}\,\mathrm{tr}_{p-1}\Big\{(\underbrace{\rho\otimes\cdots\otimes\rho}_{p-1\ \text{times}}\otimes\,\mathbb{I})\sum_{\alpha,j}\Big(\bigotimes_{i\neq j}A^{(\alpha)}_i\Big)\otimes\Big((A^{(\alpha)}_j)^T + A^{(\alpha)}_j\Big)\Big\}\\
&= \mathrm{tr}_{p-1}\Big\{(\underbrace{\rho\otimes\cdots\otimes\rho}_{p-1\ \text{times}}\otimes\,\mathbb{I})\,M_D\Big\}.
\end{align*}

With this formal equality one can show that exponentiating $M_D$, applied to $p$ copies of the state $\rho$ (joint with one identity), approximates the procedure of exponentiating $D$ and applying it to one copy of $\rho$:

\begin{align*}
\mathrm{tr}_{p-1}\{e^{-iM_D\Delta t}\rho^{\otimes p}e^{iM_D\Delta t}\} &= \mathrm{tr}_{p-1}\{\rho^{\otimes p} + i\Delta t\,[\rho^{\otimes p}, M_D]\} + O(\Delta t^2)\\
&= \rho + i\Delta t\,\mathrm{tr}_{p-1}\{[\rho^{\otimes p}, M_D]\} + O(\Delta t^2)\\
&= \rho + i\Delta t\,\rho\,\mathrm{tr}_{p-1}\{(\rho^{\otimes p-1}\otimes\mathbb{I})M_D\} - i\Delta t\,\mathrm{tr}_{p-1}\{M_D(\rho^{\otimes p-1}\otimes\mathbb{I})\}\,\rho + O(\Delta t^2)\\
&= \rho + i\Delta t\,\rho\,\mathrm{tr}_{p-1}\{(\rho^{\otimes p-1}\otimes\mathbb{I})M_D\} - i\Delta t\,\mathrm{tr}_{p-1}\{(\rho^{\otimes p-1}\otimes\mathbb{I})M_D\}\,\rho + O(\Delta t^2)\\
&= \rho + i\Delta t\,\rho D - i\Delta t\, D\rho + O(\Delta t^2)\\
&\approx e^{-iD\Delta t}\rho\, e^{iD\Delta t}.
\end{align*}
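Both steps of this argument can be verified numerically for small random symmetric matrices; the following sketch, assuming dense arrays and scipy's matrix exponential, checks the partial-trace identity for $D$ and that the deviation from $e^{-iD\Delta t}\rho\,e^{iD\Delta t}$ shrinks roughly as $\Delta t^2$:

\begin{verbatim}
import numpy as np
from functools import reduce
from scipy.linalg import expm

rng = np.random.default_rng(2)
d, p, K = 2, 3, 2

def random_symmetric(d):
    B = rng.normal(size=(d, d))
    return B + B.T

# random symmetric A's (so that M_D and D are Hermitian), and rho = |psi_theta><psi_theta|
A = [[random_symmetric(d) for _ in range(p)] for _ in range(K)]
theta = rng.normal(size=d); theta /= np.linalg.norm(theta)
rho = np.outer(theta, theta)

MD = 0.5 * sum(
    reduce(np.kron, [A[a][i] for i in range(p) if i != j] + [A[a][j].T + A[a][j]])
    for a in range(K) for j in range(p)
)
D = 0.5 * sum(
    np.prod([theta @ A[a][i] @ theta for i in range(p) if i != j]) * (A[a][j].T + A[a][j])
    for a in range(K) for j in range(p)
)

def trace_out_first(M, d, p):
    """Partial trace over the first p-1 of p d-dimensional subsystems."""
    T = M.reshape([d] * (2 * p))
    for _ in range(p - 1):
        T = np.trace(T, axis1=0, axis2=T.ndim // 2)
    return T

# D = tr_{1..p-1}{ (rho^(p-1) (x) I) M_D }
lhs = trace_out_first(reduce(np.kron, [rho] * (p - 1) + [np.eye(d)]) @ MD, d, p)
print(np.allclose(lhs, D))                            # True

# tr_{p-1}{ e^{-i M_D dt} rho^{(x)p} e^{i M_D dt} } vs e^{-i D dt} rho e^{i D dt}
for dt in (0.1, 0.05):
    U = expm(-1j * MD * dt)
    approx = trace_out_first(U @ reduce(np.kron, [rho] * p) @ U.conj().T, d, p)
    exact = expm(-1j * D * dt) @ rho @ expm(1j * D * dt)
    print(dt, np.max(np.abs(approx - exact)))         # error shrinks roughly as dt**2
\end{verbatim}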

Note that in general $M_D$ might be a non-Hermitian operator, but as mentioned before we can consider the Hermitian extended operator

\[
\tilde{M}_D := \begin{pmatrix} 0 & M_D \\ M_D^\dagger & 0 \end{pmatrix}. \tag{8.9}
\]

For the following discussion it is therefore assumed without loss of generality that $M_D$ is Hermitian.
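The extension in Equation (8.9) is the standard block embedding; as a minimal sketch (illustrative names only):

\begin{verbatim}
import numpy as np

def hermitian_extension(M):
    """Embed a possibly non-Hermitian M into the Hermitian block matrix [[0, M], [M^dagger, 0]]."""
    zero = np.zeros_like(M)
    return np.block([[zero, M], [M.conj().T, zero]])

M = np.array([[1.0, 2.0], [0.0, 1.0]])     # not Hermitian
E = hermitian_extension(M)
print(np.allclose(E, E.conj().T))          # True
\end{verbatim}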

As explained in the last chapter and shown in Appendix A.2, we have to extend this procedure in order to prepare the state for quantum phase estimation, for which another $O(\epsilon^{-3})$ copies of each of the $p$ Hilbert spaces are needed. Afterwards, quantum phase estimation and a postselective amplitude update complete the matrix multiplication as described repeatedly in earlier chapters.

In order to use this technique within the framework of the update step from the last section, the procedure has to be executed controlled by the ancilla in Equation (8.8). I will briefly sketch the particulars of integrating density matrix exponentiation with the update step.

Let $\{|\chi_l\rangle\}$ be an eigenbasis of $D$ with corresponding eigenvalues $\{\lambda_l\}$, so that the current state can be written as $|\psi_\theta\rangle = \sum_l \beta_l|\chi_l\rangle$ with $\beta_l = \langle\chi_l|\psi_\theta\rangle$. At the beginning of each update we have copies of the state

\[
(\cos\gamma\,|0\rangle + i\sin\gamma\,|1\rangle)\,|\psi_\theta\rangle. \tag{8.10}
\]

After the density matrix exponentiation and quantum phase estimation conditioned on the $|1\rangle$ branch of the ancilla, we obtain

\[
|\psi\rangle = \cos\gamma\,|0\rangle|\psi_\theta\rangle|0\rangle + i\sin\gamma\,|1\rangle\sum_l \beta_l|\chi_l\rangle|\lambda_l\rangle. \tag{8.11}
\]

Now perform the postselective amplitude update and uncompute the eigenvalue register in the $|1\rangle$-branch to arrive at the state

\[
\frac{1}{\sqrt{p_D}}\left(\cos\gamma\,|0\rangle|\psi_\theta\rangle + i\sin\gamma\,|1\rangle\sum_l C_D\lambda_l\beta_l|\chi_l\rangle\right). \tag{8.12}
\]

We chose a constant $C_D = O(1/\kappa_D)$, where $\kappa_D$ is the condition number of $D$. This result is equivalent to the desired state

\[
\frac{1}{\sqrt{p_D}}\Big(\cos\gamma\,|0\rangle|\psi_\theta\rangle + i\,C_D\sin\gamma\,|1\rangle D|\psi_\theta\rangle\Big). \tag{8.13}
\]

The success probability of the measurement is given by

\[
p_D = \cos^2\gamma + C_D^2\sin^2\gamma\,\langle\psi_\theta|D^2|\psi_\theta\rangle. \tag{8.14}
\]
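The steps from Equation (8.10) to Equation (8.14) can be mimicked classically, assuming an arbitrary small Hermitian stand-in for $D$ and illustrative values for $C_D$ and $\gamma$: the $|1\rangle$-branch amplitudes $C_D\lambda_l\beta_l$ reproduce $C_D D|\psi_\theta\rangle$, and the postselection probability matches Equation (8.14):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(4)
d, C_D, gamma = 4, 0.3, 0.2

psi = rng.normal(size=d); psi /= np.linalg.norm(psi)
Dm = rng.normal(size=(d, d)); Dm = 0.5 * (Dm + Dm.T)       # Hermitian stand-in for D

# eigendecomposition: psi = sum_l beta_l |chi_l>
lam, chi = np.linalg.eigh(Dm)
beta = chi.T @ psi

# |1>-branch amplitudes after the conditional rotation and uncomputation, Eq. (8.12)
branch1 = chi @ (C_D * lam * beta)
print(np.allclose(branch1, C_D * Dm @ psi))                # equals C_D * D|psi>, cf. Eq. (8.13)

# success probability of the postselection, Eq. (8.14)
unnorm = np.concatenate([np.cos(gamma) * psi, 1j * np.sin(gamma) * branch1])
p_D = np.vdot(unnorm, unnorm).real
print(np.isclose(p_D, np.cos(gamma)**2
                 + C_D**2 * np.sin(gamma)**2 * (psi @ Dm @ Dm @ psi)))   # True
\end{verbatim}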

8.2.3 Resources needed for quantum gradient descent

The number of operations in each iteration of quantum gradient descent is determined by several subroutines:

• the multiplication with $D$ needs of the order of $O(1/p_D)$ repetitions to be successful,

• the update step needs of the order of $O(1/p_{\mathrm{yes}})$ repetitions to be successful,

• density matrix exponentiation requires a number of operations that is polynomial in $p$, $s$ and $\log N$.

While the logarithmic dependency on the dimension in the number of operations is the main advantage of the quantum method, its main caveat lies in the number of copies of the current state required to produce a successful copy for the next iteration. Since some operations in the algorithm are only successful with a certain probability, one requires a large number of copies to make sure that at least some quantum systems have performed the computation successfully. On top of this, the accuracy with which density matrix exponentiation is performed also grows with the number of copies used. For example, if in every iteration half of the copies are consumed, the number of systems that need to be prepared in the initial state grows exponentially with the number of iterations $T$. This point seems intrinsic to quantum iterative methods and requires further investigation.
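As a back-of-the-envelope illustration of this exponential copy consumption, assuming an example survival fraction of one half per iteration:

\begin{verbatim}
# If only a fraction `survive` of the copies makes it through one iteration,
# T iterations require roughly (1/survive)**T copies of the initial state.
survive = 0.5                  # illustrative value: half of the copies are consumed per iteration
for T in (5, 10, 20):
    print(T, round((1 / survive) ** T))
\end{verbatim}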

The number of copies that are on average ‘consumed’ in one iteration of the method can be estimated as follows. Following the more refined error analysis for density matrix exponentiation with erroneous inputs presented in Appendix B.3, we require of the order of

\[
O\!\left(\frac{1}{p_D\, p_{\mathrm{yes}}}\;\frac{1+p^2\epsilon_t^2}{\epsilon_{t+1}}\;\frac{1}{\epsilon_{t+1}^2}\right)
\]

copies of the input (accurate up to error $\epsilon_t$) to the $t$'th iteration to gain one copy with error at most $\epsilon_{t+1}$.