7.2 The quantum linear regression algorithm

The quantum linear regression algorithm is supposed to end up in a quantum state from which the value for $\tilde{y}$ given in Equation 7.4 can be extracted by measurement. In order to harvest a potential exponential speedup, the information will be processed in amplitude encoding. I follow the approach of assuming that all classical information (i.e., the input data matrix $X$, the output data vector $y$ as well as the new input $\tilde{x}$) is encoded into the amplitudes of three quantum states $|\psi_X\rangle$, $|\psi_y\rangle$ and $|\psi_{\tilde{x}}\rangle$. Visit Section 4.2 for a discussion of the costs of state preparation that have to be added to the runtime considerations. The number of qubits needed to represent the $M$ training input vectors with $N$ entries each, the $M$ training outputs and the $N$ entries of the new input is $2\lceil\log N\rceil + 2\lceil\log M\rceil$. The central advantage of the quantum algorithm arises only if it remains polynomial in the number of qubits (or “super-efficient”), thereby taking time logarithmic in the size of the training set and the dimension of the inputs.

7.2.1 Summary

In order to ‘compute’ $\tilde{y}$ we need to extract and invert the singular values of $X$, whose squares are at the same time the eigenvalues of the Hermitian matrix $X^\dagger X$. As explained in Section 3.4.5, this can be done if we simulate a Hamiltonian corresponding to $X^\dagger X$, write the eigenvalues into basis encoding via phase estimation and perform a postselective amplitude update to invert the eigenvalues. For the Hamiltonian simulation to be efficient in general, $X^\dagger X$ has to be sparse, which poses a severe restriction on the data, and it could be argued that for sparse data, classical methods might exist to find the inverse in logarithmic time as well. This is why I will follow a different strategy and apply the density matrix exponentiation Routine 4.3.2. As a reminder, this technique shows how to efficiently simulate a Hamiltonian corresponding to an arbitrary density matrix, or to ‘exponentiate a density matrix’.
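To fix ideas, the quantity the algorithm is meant to estimate can be computed classically in two equivalent ways. Below is a minimal numpy sketch (variable names are illustrative and not part of the quantum routine) evaluating Equation 7.4 once via the pseudoinverse and once via the singular-value sum that the quantum steps below reproduce:

```python
# Minimal classical sketch (not part of the quantum routine): the prediction of
# Equation 7.4, once via the pseudoinverse and once via the singular-value sum.
import numpy as np

rng = np.random.default_rng(0)
M, N = 6, 4
X = rng.normal(size=(M, N))        # training inputs x^(m)
y = rng.normal(size=M)             # training outputs y^(m)
x_new = rng.normal(size=N)         # new input

# direct least-squares prediction: x_new^T X^+ y
y_pred_pinv = x_new @ np.linalg.pinv(X) @ y

# equivalent form: sum_r sigma_r^(-1) (v^r . x_new)(u^r . y)
U, S, Vt = np.linalg.svd(X, full_matrices=False)
y_pred_svd = sum((Vt[r] @ x_new) * (U[:, r] @ y) / S[r] for r in range(len(S)))

print(y_pred_pinv, y_pred_svd)     # agree up to numerical precision
```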

The inputs to the algorithm are the three quantum states

$$|\psi_X\rangle = \sum_{j=0}^{N-1}\sum_{m=0}^{M-1} x_j^{(m)}\,|j\rangle|m\rangle, \qquad (7.5)$$

$$|\psi_y\rangle = \sum_{\mu=0}^{M-1} y^{(\mu)}\,|\mu\rangle, \qquad (7.6)$$

$$|\psi_{\tilde{x}}\rangle = \sum_{\gamma=0}^{N-1} \tilde{x}_\gamma\,|\gamma\rangle, \qquad (7.7)$$

where the amplitudes are real and normalised as $\sum_{m,j}(x_j^{(m)})^2 = \sum_{\mu}(y^{(\mu)})^2 = \sum_{\gamma}(\tilde{x}_\gamma)^2 = 1$. The first state represents the data matrix of training inputs and we require a number of copies of it that grows with the required accuracy of the result.
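As a concrete illustration of Equations 7.5–7.7, the following numpy sketch (illustrative array names, assuming dimensions that are powers of two so no padding is needed) builds the three normalised amplitude vectors and the qubit count quoted above:

```python
# Minimal sketch of the amplitude encoding assumed above; names are illustrative.
import numpy as np

rng = np.random.default_rng(1)
M, N = 4, 4
X = rng.normal(size=(M, N))
y = rng.normal(size=M)
x_new = rng.normal(size=N)

psi_X = (X.T / np.linalg.norm(X)).flatten()   # amplitude of |j>|m> is x_j^(m)
psi_y = y / np.linalg.norm(y)
psi_xnew = x_new / np.linalg.norm(x_new)

# qubit count quoted above: 2*ceil(log2 N) + 2*ceil(log2 M)
n_qubits = 2 * int(np.ceil(np.log2(N))) + 2 * int(np.ceil(np.log2(M)))
print(np.sum(psi_X**2), np.sum(psi_y**2), np.sum(psi_xnew**2), n_qubits)
```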

Given $|\psi_X\rangle$ it turns out to be trivial to get a density matrix $\rho_{X^\dagger X}$ that is entrywise similar to $X^\dagger X$ and which can be ‘applied’ to $|\psi_X\rangle$ in order to extract the eigenvalues of $X^\dagger X$. These eigenvalues can then be used to invert the singular values with known techniques, and in a last step I will use $|\psi_y\rangle$ and $|\psi_{\tilde{x}}\rangle$ to arrive at a quantum state that carries the desired output (7.4) as an off-diagonal element of a qubit. The steps and corresponding techniques are summarised below and will be explained in further detail in the remainder of this section.

Step 1: Exponentiate $\rho_{X^\dagger X}$ and apply it to the data state using the techniques of quantum principal component analysis (Routine 4.3.2)

Step 2: Extract the eigenvalues via quantum phase estimation (Routine 3.4.2)

Step 3: Invert the eigenvalues through a postselective amplitude update (Routine 3.4.3)

Step 4: Predict the new output with an interference circuit (Routine 3.4.4)

In order to make the next steps visible, one needs to reformulate $|\psi_X\rangle$ in the eigenbases of $X^\dagger X$ and $XX^\dagger$, which consist of the singular vectors we are looking for:

$$|\psi_X\rangle = \sum_{j=0}^{N-1}\sum_{m=0}^{M-1} x_j^{(m)}\,|j\rangle|m\rangle = \sum_{j=0}^{N-1}\sum_{m=0}^{M-1}\sum_{r=1}^{R} \sigma_r\, u_m^r v_j^r\,|j\rangle|m\rangle = \sum_{r=1}^{R} \sigma_r\,|\psi_{v^r}\rangle|\psi_{u^r}\rangle,$$

where $|\psi_{v^r}\rangle = \sum_j v_j^r\,|j\rangle$ and $|\psi_{u^r}\rangle = \sum_m u_m^r\,|m\rangle$. This reformulation is known in quantum information as the Schmidt decomposition.
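This decomposition can be checked numerically; in the minimal sketch below (illustrative names), the columns of U play the role of the vectors $v^r$ (indexed by $j$) and the rows of Vt the role of $u^r$ (indexed by $m$):

```python
# Sketch: the Schmidt decomposition of |psi_X> over the |j> and |m> registers is
# the SVD of the normalised data matrix, so the Schmidt coefficients are the
# singular values sigma_r.
import numpy as np

rng = np.random.default_rng(2)
M, N = 4, 8
X = rng.normal(size=(M, N))
A = X.T / np.linalg.norm(X)                       # A[j, m] = x_j^(m), normalised

U, S, Vt = np.linalg.svd(A, full_matrices=False)
psi_X = A.flatten()
psi_schmidt = sum(S[r] * np.kron(U[:, r], Vt[r]) for r in range(len(S)))
print(np.allclose(psi_X, psi_schmidt), np.isclose(np.sum(S**2), 1.0))
```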

Now consider the corresponding density matrix $\rho_X = |\psi_X\rangle\langle\psi_X|$. Excluding the $|m\rangle$ register from the description is mathematically implemented by a trace operation over that register, leading to

$$\rho_{X^\dagger X} = \sum_{j,j'=1}^{N}\sum_{m=1}^{M} x_j^{(m)} x_{j'}^{(m)}\,|j\rangle\langle j'|,$$

which is entrywise equivalent to $X^\dagger X$. One could alternatively trace out the $j$ register and use $XX^\dagger$ with the same result, as the two matrices share their nonzero eigenvalues; the choice depends on whether $N > M$ or not.
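The partial trace can be verified on small examples; a minimal numpy sketch (illustrative names):

```python
# Sketch: tracing out the |m> register of rho_X = |psi_X><psi_X| leaves the
# normalised matrix X^dag X entrywise, as claimed above.
import numpy as np

rng = np.random.default_rng(3)
M, N = 4, 3
X = rng.normal(size=(M, N))
A = X.T / np.linalg.norm(X)                       # A[j, m] = x_j^(m)

psi_X = A.flatten()                               # amplitudes of |j>|m>
rho_X = np.outer(psi_X, psi_X)

# partial trace over the m register
rho_red = np.trace(rho_X.reshape(N, M, N, M), axis1=1, axis2=3)
print(np.allclose(rho_red, A @ A.T))              # entrywise X^dag X / ||X||^2
```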

7.2.2 Density matrix exponentiation and eigenvalue extraction (Step 1)

Density matrix exponentiation is based on the evolution of a swap operator applied to two copies of $\rho_{X^\dagger X}$ (i.e., two quantum systems in this state), and tracing out the second copy results in an effective simulation of a Hamiltonian that corresponds to $\rho_{X^\dagger X}$ up to an error of order $\Delta t^2$ (which is chosen to be small by simulating the swap operator for a short time only):

$$\mathrm{tr}_2\left\{ e^{-iS\Delta t}\,(\rho_{X^\dagger X}\otimes\rho_{X^\dagger X})\,e^{iS\Delta t}\right\} = e^{-iH_\rho\Delta t}\,\rho_{X^\dagger X}\,e^{iH_\rho\Delta t} + O(\Delta t^2), \qquad (7.8)$$

with $H_\rho = \rho_{X^\dagger X}$. Applying $H_{X^\dagger X}$ to $|\psi_X\rangle$ evaluates to

$$\sum_{r=1}^{R} \sigma_r\, e^{-i\lambda_r\Delta t}\,|\psi_{v^r}\rangle|\psi_{u^r}\rangle.$$
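The short-time swap trick behind Equation 7.8 can be checked numerically on small matrices. The sketch below (illustrative names, using scipy's matrix exponential, and written in the general form where the evolved state is not itself a copy of the Hamiltonian's density matrix) confirms that the error is of order $\Delta t^2$:

```python
# Toy numerical check of density matrix exponentiation: a state sigma is evolved
# under the 'Hamiltonian' rho by swapping with a copy of rho for a short time dt
# and then discarding that copy.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(4)
d, dt = 4, 1e-2

R = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = R @ R.conj().T
rho /= np.trace(rho)                       # stands in for rho_{X^dag X}

phi = rng.normal(size=d) + 1j * rng.normal(size=d)
phi /= np.linalg.norm(phi)
sigma = np.outer(phi, phi.conj())          # state to which the Hamiltonian is applied

# swap operator S on C^d (x) C^d
S = np.zeros((d * d, d * d))
for a in range(d):
    for b in range(d):
        S[a * d + b, b * d + a] = 1.0

joint = expm(-1j * S * dt) @ np.kron(sigma, rho) @ expm(1j * S * dt)
reduced = np.trace(joint.reshape(d, d, d, d), axis1=1, axis2=3)   # trace out the copy
target = expm(-1j * rho * dt) @ sigma @ expm(1j * rho * dt)
print(np.max(np.abs(reduced - target)))    # of order dt^2
```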

In order to apply the quantum phase estimation technique, we need to be able to apply powers of the Hamiltonian entangled with an index register,

$$\sum_{k=1}^{K}\sum_{r=1}^{R} \sigma_r\,|k\rangle\, e^{ik\lambda_r\Delta t}\,|\psi_{v^r}\rangle|\psi_{u^r}\rangle,$$

and Routine 4.3.3 shows how to use several copies of $\rho_{X^\dagger X}$ joined with an index register,

$$\sum_{k} |k\rangle\langle k| \otimes \rho_{X^\dagger X}\otimes\rho^{(1)}_{X^\dagger X}\otimes\ldots\otimes\rho^{(K)}_{X^\dagger X},$$

and simulate a product of 2-qubit swap operators which swap the first state $\rho_{X^\dagger X}$ with the $g$th copy $\rho^{(g)}_{X^\dagger X}$, where the product runs up to copy $k$ for each term in the superposition:

$$\frac{1}{2^q}\sum_{k=1}^{K} |k\rangle\langle k| \otimes \prod_{g=1}^{k} e^{iS_g\Delta t}.$$

The density matrix exponentiation routine requires a large number of copies of the input state $|\psi_X\rangle$ for the amended version that prepares the state for phase estimation in the next step. More precisely, if $\epsilon$ is the error we allow for the final state (i.e. $\|\rho_{\mathrm{desired}}-\rho_{\mathrm{final}}\| < \epsilon$), the number of copies is of the order $O(\epsilon^{-3})$. Only if $X^\dagger X$ is dominated by a few large eigenvalues is the routine logarithmic in the dimensions of the dataset (i.e. the number of amplitudes): it takes time $t = O(1/\delta)$ to simulate $e^{iHt}$ for a Hamiltonian $H$ up to error $\delta$ [90], and with the trick from [140] it takes time $t^2$ to do the same for $e^{i\rho t}$. This means that if we want to resolve relatively uniform eigenvalues of the order of $1/N$, the simulation time grows quadratically with $N$ and the logarithmic dependency is lost. The method is therefore only “super-efficient” if the density matrix has a low-rank approximation, which is true if there is a large amount of redundancy in the dataset.
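The low-rank condition can be made tangible with a small classical experiment; a sketch (illustrative setup and parameters) comparing a highly redundant dataset with a generic random one:

```python
# Sketch: eigenvalue spectra of the normalised X^dag X for a redundant dataset
# versus a generic random one. Only in the first case is the spectrum dominated
# by a few large eigenvalues, as required for the routine to stay "super-efficient".
import numpy as np

rng = np.random.default_rng(5)
M, N = 200, 64

prototypes = rng.normal(size=(2, N))                  # two underlying patterns
X_redundant = prototypes[rng.integers(0, 2, size=M)] + 0.05 * rng.normal(size=(M, N))
X_generic = rng.normal(size=(M, N))

for label, X in [("redundant", X_redundant), ("generic", X_generic)]:
    rho = X.T @ X / np.linalg.norm(X) ** 2            # normalised X^dag X
    eigs = np.sort(np.linalg.eigvalsh(rho))[::-1]
    print(label, np.round(eigs[:4], 3))               # leading eigenvalues
```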

One important point to consider here is that for low-rank approximations of the data matrix, classical algorithms can also be optimised to perform considerably better at matrix inversion. A thorough comparison of the nature of the speedup would therefore be an important future investigation.

7.2.3 Eigenvalue extraction (Step 2)

The quantum phase estimation algorithm applied to the result of the previous step leads to a state

$$\sum_{r=1}^{R} \sigma_r\,|\psi_{v^r}\rangle|\psi_{u^r}\rangle|\lambda_r\rangle,$$

in which the eigenvalues $\lambda_r = (\sigma_r)^2$ of $\rho_{X^\dagger X}$ are approximately encoded in the $\tau$ qubits of an extra third register that was initially in the ground state.
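For intuition, a small sketch of the basis encoding in the $\tau$-qubit eigenvalue register (the precision is set by $\tau$; the eigenvalues below are made up for illustration):

```python
# Sketch of what the eigenvalue register holds: with tau qubits, phase estimation
# writes (approximately) the best tau-bit binary fraction of each lambda_r.
import numpy as np

tau = 6
lambdas = [0.70, 0.25, 0.05]                    # example spectrum, sums to 1

for lam in lambdas:
    k = int(round(lam * 2**tau))                # integer stored in the register
    print(f"lambda_r = {lam:.3f} -> |{k:0{tau}b}>, read-out {k / 2**tau:.4f}")
```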

7.2.4 Eigenvalue inversion (Step 3)

A postselective amplitude update then yields

$$\sum_{r=1}^{R} \sigma_r\,|\psi_{v^r}\rangle|\psi_{u^r}\rangle|\lambda_r\rangle\left(\sqrt{1-\left(\frac{c}{\lambda_r}\right)^2}\,|0\rangle + \frac{c}{\lambda_r}\,|1\rangle\right).$$

The constant $c$ is chosen so that the inverse eigenvalues are not larger than 1, which is given if it is smaller than the smallest nonzero eigenvalue $\lambda_{\min}$ of $X^\dagger X$, or equivalently, the smallest nonzero squared singular value $(\sigma_{\min})^2$ of $X$. We perform a conditional measurement on the ancilla qubit, only continuing the algorithm (‘accepting’) if the ancilla is in state $|1\rangle$ (else the entire procedure has to be repeated). Considering that $(\sigma_r)^2 = \lambda_r$ and denoting the probability of acceptance by

$$p(1) = \sum_r \left(\frac{c}{\lambda_r}\right)^2,$$

the result of this step is

$$\frac{1}{\sqrt{p(1)}}\sum_{r=1}^{R} \frac{c}{\sigma_r}\,|\psi_{v^r}\rangle|\psi_{u^r}\rangle|\lambda_r\rangle.$$

Uncomputing and discarding the eigenvalue register gives

$$\frac{1}{\sqrt{p(1)}}\sum_{r=1}^{R} \frac{c}{\sigma_r}\,|\psi_{v^r}\rangle|\psi_{u^r}\rangle,$$

which already contains all necessary elements for the final solution.
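The amplitude bookkeeping of this inversion step can be traced through classically; a minimal sketch (illustrative names):

```python
# Sketch of the amplitude arithmetic in step 3: each branch carries sigma_r before
# the update; multiplying by c/lambda_r (with lambda_r = sigma_r^2) and postselecting
# leaves the inverted singular values c/sigma_r of the pseudoinverse.
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(8, 5))
A = X / np.linalg.norm(X)                     # normalised data matrix

sigmas = np.linalg.svd(A, compute_uv=False)
lambdas = sigmas**2
c = 0.9 * lambdas.min()                       # c below the smallest eigenvalue

amps_before = sigmas                          # amplitudes sigma_r
amps_after = amps_before * (c / lambdas)      # unnormalised amplitudes after postselection
print(np.allclose(amps_after, c / sigmas))    # True: branch r now carries c / sigma_r
```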

The singular value inversion procedure determines the runtime's dependency on the condition number of $X$, $\kappa = \sigma_{\max}(\sigma_{\min})^{-1}$. The probability to measure the ancilla in the excited state can be bounded as

$$p(1) = \sum_r \left(\frac{c}{\lambda_r}\right)^2 \geq R\left(\frac{\lambda_{\min}}{\lambda_{\max}}\right)^2 = R\,\kappa^{-4},$$

which means one needs on average fewer than $\kappa^4$ tries to accept the conditional measurement. Amplitude amplification as in [90, 126] reduces this to a factor of $O(\kappa^2)$ in the runtime, which can become significant for matrices that are close to being singular.

7.2.5 Prediction (Step 4)

In order to calculate the new output $\tilde{y}$, the inner products between $|\psi_{v^r}\rangle$ and $|\psi_{\tilde{x}}\rangle$ as well as between $|\psi_{u^r}\rangle$ and $|\psi_y\rangle$ have to be evaluated. As outlined in Section 3.4.6, the usual strategy would be a swap routine which reveals the absolute square of an inner product. The problem with the swap routine is that it does not reveal the sign of the inner product, which might be important for the task at hand, especially if the data is preprocessed to have zero mean. A simple trick can help, which is to ‘add an amplitude’ fixed to the value $\sqrt{0.5}$ to all quantum states (rescaling the original amplitudes by $\sqrt{0.5}$ so that the states remain normalised), which will in most applications not even require adding a qubit when the dimension of the Hilbert space $2^n$ is larger than the dimension of the data and we ‘fill up’ with zero amplitudes. The inner product of two such quantum states will always have a term of $0.5$ stemming from the product of the two additional amplitudes, and the sum of all other terms has to lie in $[-0.5, 0.5]$. Therefore, the additional amplitude shifts the result of the inner product from the interval $[-1,1]$ to the interval $[0,1]$ and reveals negative outcomes.
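A short check of this shift (a sketch, with the rescaling by $\sqrt{0.5}$ made explicit so the padded vectors stay normalised):

```python
# Sketch of the sign trick: pad both unit vectors with an extra amplitude sqrt(0.5)
# and rescale the original amplitudes by sqrt(0.5). The inner product then lies in
# [0, 1] and the sign of <a|b> can be read off.
import numpy as np

rng = np.random.default_rng(7)
a = rng.normal(size=8); a /= np.linalg.norm(a)
b = rng.normal(size=8); b /= np.linalg.norm(b)

def pad(v):
    return np.concatenate([np.sqrt(0.5) * v, [np.sqrt(0.5)]])

shifted = pad(a) @ pad(b)                  # = 0.5 * <a|b> + 0.5, always in [0, 1]
recovered = 2 * shifted - 1                # undo the shift to recover <a|b> with sign
print(np.isclose(recovered, a @ b))
```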

A slightly different strategy can be followed if we can implement the entire routine above, including state preparation, controlled by an additional qubit, to get

$$\frac{1}{\sqrt{p(1)}}\sum_{r=1}^{R} \frac{c}{\sigma_r}\,|\psi_{v^r}\rangle|\psi_{u^r}\rangle|0\rangle + |\psi_y\rangle|\psi_{\tilde{x}}\rangle|1\rangle.$$

If we trace out all registers except the ancilla, the off-diagonal elements $\rho_{12}, \rho_{21}$ of the ancilla's density matrix read

$$\frac{c}{2\sqrt{p(1)}}\sum_r \frac{1}{\sigma_r}\sum_j v_j^r\,\tilde{x}_j\sum_m u_m^r\, y^{(m)},$$

and contain the desired result (7.4) up to the known normalisation factor $\frac{c}{2\sqrt{p(1)}}$ as well as the rescaling factor $|X|^{-1}|\tilde{x}|^{-1}|y|^{-1}$ stemming from the initial normalisation of the data that allowed us to encode it into quantum states. Whichever way is chosen, the prediction step is linear in the number of qubits used.

Figure 7.1: Distribution of condition numbers for matrices of different dimensions with entries drawn uniformly at random from $[-1,1]$ (left panel: square matrices of size 2x2, 5x5, 10x10 and 20x20; right panel: rectangular matrices of size 2x3, 4x7, 10x15 and 19x22). For square matrices the mean of the distribution increases with the dimension, while for rectangular matrices the increase is much slower and, curiously enough, in all simulations performed was sublinear in the smallest dimension. The growth of the condition number with the size of the dataset has to be taken into account if the “super-efficient” speedup is to be maintained. Note that the distributions remain the same if sparse or symmetric matrices are generated.
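The distributions in Figure 7.1 can be reproduced classically; a minimal sketch of the assumed sampling setup:

```python
# Sketch of the sampling experiment behind Figure 7.1: condition numbers of
# matrices with i.i.d. entries drawn uniformly from [-1, 1].
import numpy as np

rng = np.random.default_rng(8)
samples = 1000

for shape in [(2, 2), (5, 5), (10, 10), (2, 3), (4, 7), (10, 15)]:
    conds = [np.linalg.cond(rng.uniform(-1, 1, size=shape)) for _ in range(samples)]
    print(shape, f"median condition number: {np.median(conds):.1f}")
```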

In summary, the upper bound for the runtime of the quantum linear regression algorithm can be roughly estimated as $O(\log N\, \kappa^2 \epsilon^{-3})$ with the error $\epsilon$, condition number $\kappa$ and input dimension $N$. For the swap test, a factor $\log M$ for the swap operator has to be considered. Remember that this does not include the costs of quantum state preparation in case the algorithm processes classical information. The quantum algorithm is “super-efficient” if the condition number scales logarithmically in the dimensions of the dataset, which, as Figure 7.1 suggests, is not always the case.

Compared to the previous result by Wiebe, Braun and Lloyd, this version boasts an improvement of a factor of $\kappa^4$, whereas the dependence on the accuracy $\epsilon$ is worse by a factor of $\epsilon^{-2}$. The algorithm can be applied efficiently to nonsparse matrices $X^\dagger X$, provided they admit a low-rank approximation.