8.2 Quantum gradient descent algorithm

The Hessian matrix can be expressed as

\[
H = \frac{1}{2}\sum_{\alpha=1}^{K}\sum_{j=1}^{p}\left[\,\sum_{\substack{i=1\\ i\neq j}}^{p}\;\prod_{\substack{k=1\\ k\neq i}}^{p}\theta^T A^{(\alpha)}_k\theta\,\Big((A^{(\alpha)}_i)^T + A^{(\alpha)}_i\Big)\,\theta\theta^T\Big((A^{(\alpha)}_j)^T + A^{(\alpha)}_j\Big) \;+\; \prod_{\substack{i=1\\ i\neq j}}^{p}\theta^T A^{(\alpha)}_i\theta\,\Big((A^{(\alpha)}_j)^T + A^{(\alpha)}_j\Big)\right], \tag{8.7}
\]

and will likewise be translated into a quantum operator $H$.

Proof. Similar to above.
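As a minimal numerical illustration, assuming small dense matrices $A^{(\alpha)}_i$ stored as numpy arrays (all names here are illustrative), Equation (8.7) can be evaluated directly:

\begin{verbatim}
import numpy as np

def hessian(theta, A):
    """Evaluate Eq. (8.7) for A[alpha][i], a K x p collection of d x d arrays."""
    d = theta.shape[0]
    K, p = len(A), len(A[0])
    H = np.zeros((d, d))
    for alpha in range(K):
        for j in range(p):
            sym_j = A[alpha][j].T + A[alpha][j]
            # first group: sum over i != j, product over k != i of theta^T A_k theta
            for i in range(p):
                if i == j:
                    continue
                coeff = np.prod([theta @ A[alpha][k] @ theta
                                 for k in range(p) if k != i])
                sym_i = A[alpha][i].T + A[alpha][i]
                H += coeff * sym_i @ np.outer(theta, theta) @ sym_j
            # second group: product over i != j times the symmetrised A_j
            coeff = np.prod([theta @ A[alpha][i] @ theta
                             for i in range(p) if i != j])
            H += coeff * sym_j
    return 0.5 * H
\end{verbatim}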

real amplitudes. Measure the state in the basis

\[
|\mathrm{yes}\rangle = \frac{1}{\sqrt{2}}\big(|0\rangle + i|1\rangle\big), \qquad |\mathrm{no}\rangle = \frac{1}{\sqrt{2}}\big(i|0\rangle + |1\rangle\big).
\]

Measuring ‘yes’ results in a state

\[
|\psi_{\theta^{(t+1)}}\rangle = \frac{1}{\sqrt{2\,p_D\,p_{\mathrm{yes}}}}\left(\cos\gamma\,|\psi_{\theta^{(t)}}\rangle - \frac{1}{C_D}\sin\gamma\, D|\psi_{\theta^{(t)}}\rangle\right),
\]

where $p_{\mathrm{yes}}$ is the probability of success, given by

\[
p_{\mathrm{yes}} = \frac{1}{2 p_D}\left[\cos^2\gamma + \frac{1}{C_D^2}\sin^2\gamma\,\langle\psi_\theta|D^\dagger D|\psi_\theta\rangle - 2\cos\gamma\sin\gamma\,\langle\psi_\theta|D|\psi_\theta\rangle\right].
\]

Proof. The state $|\psi\rangle$ in the yes--no basis reads

\begin{align*}
&\big(|\mathrm{yes}\rangle\langle\mathrm{yes}| + |\mathrm{no}\rangle\langle\mathrm{no}|\big)\,\frac{1}{\sqrt{p_D}}\left(\cos\gamma\,|0\rangle|\psi_{\theta^{(t)}}\rangle + \frac{i}{C_D}\sin\gamma\,|1\rangle D|\psi_{\theta^{(t)}}\rangle\right)\\
&= \frac{1}{\sqrt{p_D}}\left(\cos\gamma\,\langle\mathrm{yes}|0\rangle\,|\mathrm{yes}\rangle|\psi_{\theta^{(t)}}\rangle + \frac{i}{C_D}\sin\gamma\,\langle\mathrm{yes}|1\rangle\,|\mathrm{yes}\rangle D|\psi_{\theta^{(t)}}\rangle + \cos\gamma\,\langle\mathrm{no}|0\rangle\,|\mathrm{no}\rangle|\psi_{\theta^{(t)}}\rangle + \frac{i}{C_D}\sin\gamma\,\langle\mathrm{no}|1\rangle\,|\mathrm{no}\rangle D|\psi_{\theta^{(t)}}\rangle\right)\\
&= \frac{1}{\sqrt{2 p_D}}\underbrace{\left(\cos\gamma\,|\psi_{\theta^{(t)}}\rangle - \frac{1}{C_D}\sin\gamma\, D|\psi_{\theta^{(t)}}\rangle\right)}_{|\psi_1\rangle}|\mathrm{yes}\rangle + \frac{1}{\sqrt{2 p_D}}\underbrace{\left(-i\cos\gamma\,|\psi_{\theta^{(t)}}\rangle - \frac{i}{C_D}\sin\gamma\, D|\psi_{\theta^{(t)}}\rangle\right)}_{|\psi_2\rangle}|\mathrm{no}\rangle.
\end{align*}

The probability $\langle\psi_1|\psi_1\rangle$ of measuring $|\mathrm{yes}\rangle$ is given by

\[
p_{\mathrm{yes}} = \frac{1}{2 p_D}\left(\cos^2\gamma + \frac{1}{C_D^2}\sin^2\gamma\,\langle\psi_\theta|D^\dagger D|\psi_\theta\rangle - \cos\gamma\sin\gamma\,\big(\langle\psi_\theta|D^\dagger|\psi_\theta\rangle + \langle\psi_\theta|D|\psi_\theta\rangle\big)\right),
\]

which for Hermitian operators can be summarised as above. The probability $\langle\psi_2|\psi_2\rangle$ of measuring $|\mathrm{no}\rangle$ is given by

\[
p_{\mathrm{no}} = \frac{1}{2 p_D}\left(\cos^2\gamma + \frac{1}{C_D^2}\sin^2\gamma\,\langle\psi_\theta|D^\dagger D|\psi_\theta\rangle + \cos\gamma\sin\gamma\,\big(\langle\psi_\theta|D^\dagger|\psi_\theta\rangle + \langle\psi_\theta|D|\psi_\theta\rangle\big)\right),
\]

which with the definition of $p_D$ adds up to $1$.
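A quick numerical sanity check of this yes--no measurement, assuming arbitrary test choices for $D$, $|\psi_\theta\rangle$, $\gamma$ and $C_D$ and taking $p_D$ to be the squared norm of the conditioned state, confirms that the outcome probabilities sum to one:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
d, C_D = 4, 0.5

# arbitrary test data: a normalised real state and a Hermitian operator D
psi = rng.normal(size=d); psi /= np.linalg.norm(psi)
D = rng.normal(size=(d, d)); D = 0.5 * (D + D.T)

gamma = 0.3                                 # any angle works for this check

# conditioned state: cos(g)|0>|psi> + (i/C_D) sin(g)|1> D|psi>, normalised by sqrt(p_D)
branch0 = np.cos(gamma) * psi
branch1 = (1j / C_D) * np.sin(gamma) * (D @ psi)
state = np.concatenate([branch0, branch1])
p_D = np.vdot(state, state).real            # squared norm of the conditioned state
state /= np.sqrt(p_D)

# measure the ancilla in the yes/no basis
yes = np.array([1, 1j]) / np.sqrt(2)
no  = np.array([1j, 1]) / np.sqrt(2)
psi_yes = yes.conj()[0] * state[:d] + yes.conj()[1] * state[d:]
psi_no  = no.conj()[0]  * state[:d] + no.conj()[1]  * state[d:]

p_yes = np.vdot(psi_yes, psi_yes).real
p_no  = np.vdot(psi_no, psi_no).real
print(p_yes, p_no, p_yes + p_no)            # the probabilities sum to one
\end{verbatim}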

Choosing the external parameter $\gamma$ so that

\[
\cos\gamma = \frac{1}{\sqrt{1+\eta^2/C_D^2}}, \qquad \sin\gamma = \frac{\eta/C_D}{\sqrt{1+\eta^2/C_D^2}},
\]

results in

\[
|\psi_{\theta^{(t+1)}}\rangle = \frac{1}{C^{(t+1)}}\left(|\psi_{\theta^{(t)}}\rangle - \eta\,|\nabla o(\theta^{(t)})\rangle\right),
\]

where the normalisation term is given by

\[
C^{(t+1)} = 1 - 2\eta\,\langle\psi_{\theta^{(t)}}|D|\psi_{\theta^{(t)}}\rangle + \eta^2\,\langle\psi_{\theta^{(t)}}|D^2|\psi_{\theta^{(t)}}\rangle.
\]

This encodes the classical vector

\[
\theta^{(t+1)} = \frac{1}{C^{(t+1)}}\left(\theta^{(t)} - \eta\,\nabla o(\theta^{(t)})\right),
\]

which is exactly a gradient descent update, normalised to unit length. With this choice of the free parameter $\gamma$, and noting that $\langle\psi_{\theta^{(t)}}|D|\psi_{\theta^{(t)}}\rangle$ is the inner product of the current state with the gradient and $\langle\psi_{\theta^{(t)}}|D^2|\psi_{\theta^{(t)}}\rangle$ the squared length of the gradient, the acceptance probability can also be written as

\[
p_{\mathrm{yes}} = \frac{1}{2} - \frac{2\eta\,(\theta^{(t)})^T\nabla o(\theta^{(t)})}{1 + \eta^2\,\nabla o(\theta^{(t)})^T\nabla o(\theta^{(t)})}.
\]

A series expansion of this expression reveals

\[
p_{\mathrm{yes}} = \frac{1}{2} - 2\eta\,(\theta^{(t)})^T\nabla o(\theta^{(t)}) + O(\eta^2),
\]

which is sufficiently large for small enough $\eta$.
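The classical content of the update can be sketched in a few lines, assuming an arbitrary quadratic toy objective (all names illustrative): each step renormalises $\theta - \eta\nabla o(\theta)$ to unit length.

\begin{verbatim}
import numpy as np

def normalised_gd_step(theta, grad, eta):
    """One update theta -> (theta - eta*grad) / C, with C the norm of the new vector."""
    step = theta - eta * grad
    return step / np.linalg.norm(step)

# toy usage with an arbitrary quadratic objective o(theta) = 0.5 * theta^T M theta
rng = np.random.default_rng(1)
M = rng.normal(size=(4, 4)); M = 0.5 * (M + M.T)
theta = rng.normal(size=4); theta /= np.linalg.norm(theta)
for _ in range(5):
    theta = normalised_gd_step(theta, M @ theta, eta=0.1)   # gradient of the toy objective is M @ theta
print(theta)
\end{verbatim}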

Note that the cost of one step of the quantum gradient descent method depends on the time needed to implement and apply the operator $D$, and on the preparation of copies of the initial state. If no specific guess for the initial state is given, it can be chosen conveniently so that its preparation time is at worst linear in the number of qubits, i.e. $O(\log N)$. Each update step furthermore has a probability of $1 - p_{\mathrm{yes}}$ to fail.

8.2.2 How to calculate the gradient in amplitude encoding

The following shows how to implement the operation $D|\psi_\theta\rangle = |\psi_\nabla\rangle$ used for the quantum gradient descent step above. The index $t$ indicating the current step is omitted for readability, and $\nabla$ is a shorthand for $\nabla o(\theta^{(t)})$ (the gradient of the objective function is always evaluated at the current point).

As explained before, to implement $D|\psi_\theta\rangle$ one can evolve the quantum system with a Hamiltonian corresponding to $D$, $e^{iD\Delta t}|\psi_\theta\rangle$, and use the techniques from [90] to write $D$'s eigenvalues into the amplitudes. The problem is that we cannot guarantee $D$ to be sparse in matrix representation, and again the solution is to resort to the density matrix exponentiation technique, however with a novel variation. Instead of simulating a swap operator, an $s$-sparse operator $M_D$ which is specifically constructed for this purpose will be used to exponentiate $D$:

\[
M_D = \frac{1}{2}\sum_{j}\sum_{\alpha}\left(\bigotimes_{i\neq j} A^{(\alpha)}_i\right)\otimes\Big((A^{(\alpha)}_j)^T + A^{(\alpha)}_j\Big).
\]

This operator is similar to Equation (8.4), but sums over different permutations of the $A^{(\alpha)}_j$. In the $j$'th term, the operator acting on the last Hilbert space is given by the matrix $A^{(\alpha)}_j$.

Example 8.2.1: The operator $M_D$

For $p = 2$, we get

\[
M_D = \sum_\alpha \Big(A^{(\alpha)}_1 \otimes A^{(\alpha)}_2 + A^{(\alpha)}_2 \otimes A^{(\alpha)}_1\Big).
\]

In each term of the sum, another $A^{(\alpha)}_i$, $i = 1, 2$, acts on the second Hilbert space.
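A minimal sketch, assuming dense numpy arrays and illustrative names, builds $M_D$ from the definition above and checks that for $p = 2$ and symmetric matrices it reduces to the form of Example 8.2.1:

\begin{verbatim}
import numpy as np
from functools import reduce

def build_MD(A):
    """M_D = 1/2 sum_{j,alpha} (tensor_{i != j} A_i^(alpha)) (x) ((A_j^(alpha))^T + A_j^(alpha))."""
    K, p = len(A), len(A[0])
    return 0.5 * sum(
        reduce(np.kron, [A[a][i] for i in range(p) if i != j] + [A[a][j].T + A[a][j]])
        for a in range(K) for j in range(p)
    )

# for p = 2, K = 1 and symmetric A's this reproduces Example 8.2.1
A1 = np.array([[0., 1.], [1., 0.]])
A2 = np.array([[1., 0.], [0., -1.]])
print(np.allclose(build_MD([[A1, A2]]), np.kron(A2, A1) + np.kron(A1, A2)))   # True
\end{verbatim}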

Applying this operator to $p$ copies of the quantum system $\rho$ and taking the trace over all but the last system approximates the exponentiation of $D$ with error $O(\Delta t^2)$,

\[
\mathrm{tr}_{p-1}\{e^{-iM_D\Delta t}\,\rho^{\otimes p}\,e^{iM_D\Delta t}\} = e^{-iD\Delta t}\rho\, e^{iD\Delta t} + O(\Delta t^2).
\]

Proof. The operator $D$ from Equation (8.6) which we seek to implement contains the factors $\theta^T A^{(\alpha)}_i\theta$. If $\theta$ is interpreted as a quantum state vector and $A^{(\alpha)}_i$ as an observable, this corresponds to the expectation value $\langle\psi_\theta|A^{(\alpha)}_i|\psi_\theta\rangle = \mathrm{tr}\{A^{(\alpha)}_i\rho\}$, where $\rho = |\psi_\theta\rangle\langle\psi_\theta|$ is the corresponding density matrix.

The full operator

\[
D = \frac{1}{2}\sum_{j}\sum_{\alpha}\prod_{i\neq j}\theta^T A^{(\alpha)}_i\theta\,\Big((A^{(\alpha)}_j)^T + A^{(\alpha)}_j\Big)
\]

can therefore be reproduced by a quantum operator that is formally equal to

\[
D = \mathrm{tr}_{1\ldots p-1}\{\rho^{(p-1)} M_D\},
\]

where $\rho^{(p-1)} = \rho\otimes\cdots\otimes\rho\otimes\mathbb{I}$ is the joint quantum state of $p-1$ copies of $\rho$ and an identity. To show this, consider the relation

\begin{align*}
D &= \frac{1}{2}\sum_{\alpha,j}\left(\prod_{i\neq j}\langle\psi_\theta|A^{(\alpha)}_i|\psi_\theta\rangle\right)\Big((A^{(\alpha)}_j)^T + A^{(\alpha)}_j\Big)\\
&= \frac{1}{2}\sum_{\alpha,j}\left(\prod_{i\neq j}\mathrm{tr}\{\rho A^{(\alpha)}_i\}\right)\Big((A^{(\alpha)}_j)^T + A^{(\alpha)}_j\Big)\\
&= \frac{1}{2}\sum_{\alpha,j}\mathrm{tr}\Big\{\underbrace{(\rho\otimes\cdots\otimes\rho)}_{p-1\ \text{times}}\bigotimes_{i\neq j}A^{(\alpha)}_i\Big\}\,\Big((A^{(\alpha)}_j)^T + A^{(\alpha)}_j\Big)\\
&= \frac{1}{2}\sum_{\alpha,j}\mathrm{tr}_{p-1}\Big\{(\underbrace{\rho\otimes\cdots\otimes\rho}_{p-1\ \text{times}}\otimes\,\mathbb{I})\Big(\bigotimes_{i\neq j}A^{(\alpha)}_i\Big)\otimes\Big((A^{(\alpha)}_j)^T + A^{(\alpha)}_j\Big)\Big\}\\
&= \frac{1}{2}\,\mathrm{tr}_{p-1}\Big\{(\underbrace{\rho\otimes\cdots\otimes\rho}_{p-1\ \text{times}}\otimes\,\mathbb{I})\sum_{\alpha,j}\Big(\bigotimes_{i\neq j}A^{(\alpha)}_i\Big)\otimes\Big((A^{(\alpha)}_j)^T + A^{(\alpha)}_j\Big)\Big\}\\
&= \mathrm{tr}_{p-1}\Big\{(\underbrace{\rho\otimes\cdots\otimes\rho}_{p-1\ \text{times}}\otimes\,\mathbb{I})\,M_D\Big\}.
\end{align*}

With this formal equality one can show that exponentiating $M_D$, applied to $p$ copies of the state $\rho$ (joint with one identity), approximates the procedure of exponentiating $D$ and applying it to one copy of $\rho$:

\begin{align*}
\mathrm{tr}_{p-1}\{e^{-iM_D\Delta t}\rho^{\otimes p}e^{iM_D\Delta t}\} &= \mathrm{tr}_{p-1}\{\rho^{\otimes p} + i\Delta t\,[\rho^{\otimes p}, M_D]\} + O(\Delta t^2)\\
&= \rho + i\Delta t\,\mathrm{tr}_{p-1}\{[\rho^{\otimes p}, M_D]\} + O(\Delta t^2)\\
&= \rho + i\Delta t\,\rho\,\mathrm{tr}_{p-1}\{(\rho^{\otimes p-1}\otimes\mathbb{I})M_D\} - i\Delta t\,\mathrm{tr}_{p-1}\{M_D(\rho^{\otimes p-1}\otimes\mathbb{I})\}\,\rho + O(\Delta t^2)\\
&= \rho + i\Delta t\,\rho\,\mathrm{tr}_{p-1}\{(\rho^{\otimes p-1}\otimes\mathbb{I})M_D\} - i\Delta t\,\mathrm{tr}_{p-1}\{(\rho^{\otimes p-1}\otimes\mathbb{I})M_D\}\,\rho + O(\Delta t^2)\\
&= \rho + i\Delta t\,\rho D - i\Delta t\, D\rho + O(\Delta t^2)\\
&\approx e^{-iD\Delta t}\rho\, e^{iD\Delta t}.
\end{align*}
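Both steps of this argument can be verified numerically for small random symmetric matrices; the following sketch, assuming dense arrays and scipy's matrix exponential, checks the partial-trace identity for $D$ and that the deviation from $e^{-iD\Delta t}\rho\,e^{iD\Delta t}$ shrinks roughly as $\Delta t^2$:

\begin{verbatim}
import numpy as np
from functools import reduce
from scipy.linalg import expm

rng = np.random.default_rng(2)
d, p, K = 2, 3, 2

def random_symmetric(d):
    B = rng.normal(size=(d, d))
    return B + B.T

# random symmetric A's (so that M_D and D are Hermitian), and rho = |psi_theta><psi_theta|
A = [[random_symmetric(d) for _ in range(p)] for _ in range(K)]
theta = rng.normal(size=d); theta /= np.linalg.norm(theta)
rho = np.outer(theta, theta)

MD = 0.5 * sum(
    reduce(np.kron, [A[a][i] for i in range(p) if i != j] + [A[a][j].T + A[a][j]])
    for a in range(K) for j in range(p)
)
D = 0.5 * sum(
    np.prod([theta @ A[a][i] @ theta for i in range(p) if i != j]) * (A[a][j].T + A[a][j])
    for a in range(K) for j in range(p)
)

def trace_out_first(M, d, p):
    """Partial trace over the first p-1 of p d-dimensional subsystems."""
    T = M.reshape([d] * (2 * p))
    for _ in range(p - 1):
        T = np.trace(T, axis1=0, axis2=T.ndim // 2)
    return T

# D = tr_{1..p-1}{ (rho^(p-1) (x) I) M_D }
lhs = trace_out_first(reduce(np.kron, [rho] * (p - 1) + [np.eye(d)]) @ MD, d, p)
print(np.allclose(lhs, D))                            # True

# tr_{p-1}{ e^{-i M_D dt} rho^{(x)p} e^{i M_D dt} } vs e^{-i D dt} rho e^{i D dt}
for dt in (0.1, 0.05):
    U = expm(-1j * MD * dt)
    approx = trace_out_first(U @ reduce(np.kron, [rho] * p) @ U.conj().T, d, p)
    exact = expm(-1j * D * dt) @ rho @ expm(1j * D * dt)
    print(dt, np.max(np.abs(approx - exact)))         # error shrinks roughly as dt**2
\end{verbatim}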

Note that in general $M_D$ might be a non-Hermitian operator, but as mentioned before we can consider the Hermitian extended operator

\[
\tilde{M}_D := \begin{pmatrix} 0 & M_D \\ M_D^\dagger & 0 \end{pmatrix}. \tag{8.9}
\]

For the following discussion it is therefore assumed without loss of generality that $M_D$ is Hermitian.
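The extension in Equation (8.9) is the standard block embedding; as a minimal sketch (illustrative names only):

\begin{verbatim}
import numpy as np

def hermitian_extension(M):
    """Embed a possibly non-Hermitian M into the Hermitian block matrix [[0, M], [M^dagger, 0]]."""
    zero = np.zeros_like(M)
    return np.block([[zero, M], [M.conj().T, zero]])

M = np.array([[1.0, 2.0], [0.0, 1.0]])     # not Hermitian
E = hermitian_extension(M)
print(np.allclose(E, E.conj().T))          # True
\end{verbatim}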

As explained in the last chapter and shown in Appendix A.2, we have to extend this procedure in order to prepare the state for quantum phase estimation, for which another $O(\epsilon^{-3})$ copies of each of the $p$ Hilbert spaces are needed. Afterwards, quantum phase estimation and a postselective amplitude update complete the matrix multiplication as described repeatedly in earlier chapters.

In order to use this technique within the framework of the update step from the last section, the procedure has to be executed controlled by the ancilla in Equation (8.8). I will briefly sketch the particulars of integrating density matrix exponentiation with the update step.

Let $\{|\chi_l\rangle\}$ be an eigenbasis of $D$ with corresponding eigenvalues $\{\lambda_l\}$, so that the current state can be written as $|\psi_\theta\rangle = \sum_l \beta_l|\chi_l\rangle$ with $\beta_l = \langle\chi_l|\psi_\theta\rangle$. At the beginning of each update we have copies of the state

\[
(\cos\gamma\,|0\rangle + i\sin\gamma\,|1\rangle)\,|\psi_\theta\rangle. \tag{8.10}
\]

After the density matrix exponentiation and quantum phase estimation conditioned on the $|1\rangle$ branch of the ancilla, we obtain

\[
|\psi\rangle = \cos\gamma\,|0\rangle|\psi_\theta\rangle|0\rangle + i\sin\gamma\,|1\rangle\sum_l \beta_l|\chi_l\rangle|\lambda_l\rangle. \tag{8.11}
\]

Now perform the postselective amplitude update and uncompute the eigenvalue register in the $|1\rangle$-branch to arrive at the state

\[
\frac{1}{\sqrt{p_D}}\left(\cos\gamma\,|0\rangle|\psi_\theta\rangle + i\sin\gamma\,|1\rangle\sum_l C_D\lambda_l\beta_l|\chi_l\rangle\right). \tag{8.12}
\]

We chose a constant $C_D = O(1/\kappa_D)$, where $\kappa_D$ is the condition number of $D$. This result is equivalent to the desired state

\[
\frac{1}{\sqrt{p_D}}\Big(\cos\gamma\,|0\rangle|\psi_\theta\rangle + i\,C_D\sin\gamma\,|1\rangle D|\psi_\theta\rangle\Big). \tag{8.13}
\]

The success probability of the measurement is given by

\[
p_D = \cos^2\gamma + C_D^2\sin^2\gamma\,\langle\psi_\theta|D^2|\psi_\theta\rangle. \tag{8.14}
\]
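The steps from Equation (8.10) to Equation (8.14) can be mimicked classically, assuming an arbitrary small Hermitian stand-in for $D$ and illustrative values for $C_D$ and $\gamma$: the $|1\rangle$-branch amplitudes $C_D\lambda_l\beta_l$ reproduce $C_D D|\psi_\theta\rangle$, and the postselection probability matches Equation (8.14):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(4)
d, C_D, gamma = 4, 0.3, 0.2

psi = rng.normal(size=d); psi /= np.linalg.norm(psi)
Dm = rng.normal(size=(d, d)); Dm = 0.5 * (Dm + Dm.T)       # Hermitian stand-in for D

# eigendecomposition: psi = sum_l beta_l |chi_l>
lam, chi = np.linalg.eigh(Dm)
beta = chi.T @ psi

# |1>-branch amplitudes after the conditional rotation and uncomputation, Eq. (8.12)
branch1 = chi @ (C_D * lam * beta)
print(np.allclose(branch1, C_D * Dm @ psi))                # equals C_D * D|psi>, cf. Eq. (8.13)

# success probability of the postselection, Eq. (8.14)
unnorm = np.concatenate([np.cos(gamma) * psi, 1j * np.sin(gamma) * branch1])
p_D = np.vdot(unnorm, unnorm).real
print(np.isclose(p_D, np.cos(gamma)**2
                 + C_D**2 * np.sin(gamma)**2 * (psi @ Dm @ Dm @ psi)))   # True
\end{verbatim}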

8.2.3 Resources needed for quantum gradient descent

The number of operations in each iteration of quantum gradient descent is determined by several subroutines:

• the multiplication with $D$ needs of the order of $O(1/p_D)$ repetitions to be successful,

• the update step needs of the order of $O(1/p_{\mathrm{yes}})$ repetitions to be successful,

• density matrix exponentiation requires a number of operations that is polynomial in $p$, $s$ and $\log N$.

While the logarithmic dependency on the dimension in the number of operations is the main advantage of the quantum method, its main caveat lies in the number of copies of the current state required to produce a successful copy for the next iteration. Since some operations in the algorithm are only successful with a certain probability, one requires a large number of copies to make sure that at least some quantum systems have performed the computation successfully. On top of this, the accuracy with which density matrix exponentiation is performed also grows with the number of copies used. For example, if in every iteration half of the copies are consumed, the number of systems that need to be prepared in the initial state grows exponentially with the number of iterations $T$. This point seems intrinsic to quantum iterative methods and requires further investigation.
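As a back-of-the-envelope illustration of this exponential copy consumption, assuming an example survival fraction of one half per iteration:

\begin{verbatim}
# If only a fraction `survive` of the copies makes it through one iteration,
# T iterations require roughly (1/survive)**T copies of the initial state.
survive = 0.5                  # illustrative value: half of the copies are consumed per iteration
for T in (5, 10, 20):
    print(T, round((1 / survive) ** T))
\end{verbatim}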

The number of copies that are on average ‘consumed’ in one iteration of the method can be estimated as follows. Following the more refined error analysis for density matrix exponentiation with erroneous inputs presented in Appendix B.3, we require of the order of

\[
O\!\left(\frac{1}{p_D\, p_{\mathrm{yes}}}\;\frac{1+p^2\epsilon_t^2}{\epsilon_{t+1}}\;\frac{1}{\epsilon_{t+1}^2}\right)
\]

copies of the input (accurate up to error $\epsilon_t$) to the $t$'th iteration to gain one copy with error at most $\epsilon_{t+1}$.