8.3 Extension to Newton’s method
The resource requirements of one iteration can be summarised as follows:

• the multiplication with $D$ needs of the order of $O(1/p_D)$ repetitions to be successful,
• the update step needs of the order of $O(1/p_{\mathrm{yes}})$ repetitions to be successful,
• density matrix exponentiation requires a number of operations that is polynomial in $p$, $s$ and $\log N$.

While the logarithmic dependency on the dimension in the number of operations is the main advantage of the quantum method, its main caveat lies in the number of copies of the current state required to produce a successful copy for the next iteration. Since some operations in the algorithm are only successful with a certain probability, one requires a large number of copies to make sure that at least some quantum systems have performed the computation successfully. On top of this, the accuracy with which density matrix exponentiation is performed also grows with the number of copies used. For example, if in every iteration half of the copies are consumed, the number of systems that need to be prepared in the initial state grows exponentially with the number of iterations $T$. This point seems intrinsic to quantum iterative methods and requires further investigation.
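As a rough illustration of this scaling, the following back-of-the-envelope sketch (not part of the original analysis; the fifty-percent consumption rate is the hypothetical figure used in the text) counts how many copies have to be prepared initially:

```python
import math

def initial_copies(T, survival_fraction=0.5, copies_needed_at_end=1):
    """Number of copies to prepare before the first iteration so that, if only
    `survival_fraction` of the copies survives each of the T iterations,
    at least `copies_needed_at_end` copies remain at the end."""
    return math.ceil(copies_needed_at_end / survival_fraction**T)

for T in [1, 2, 5, 10, 20]:
    # grows as (1/survival_fraction)**T, i.e. exponentially in the number of iterations
    print(T, initial_copies(T))
```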
The number of copies that are on average ‘consumed’ in one iteration of the method can be estimated as follows. Following a more refined error analysis for density matrix exponentiation with erroneous inputs (presented in Appendix B.3), we require of the order of
$$O\!\left(\frac{1}{p_D\, p_{\mathrm{yes}}}\left(1+\frac{p^2\,\epsilon_t^2}{\epsilon_{t+1}}\right)\frac{1}{\epsilon_{t+1}^2}\right)$$
copies of the input to the $t$-th iteration (each accurate up to error $\epsilon_t$) to gain one copy with error at most $\epsilon_{t+1}$.
With density matrix exponentiation, we can use multiple copies of the current state $\rho = |\psi_\theta\rangle\langle\psi_\theta|$ to perform
$$\mathrm{tr}_{1\ldots p-1}\left\{ e^{-i M_{H_A}\Delta t}\,(\rho\otimes\cdots\otimes\rho)\otimes\sigma\; e^{i M_{H_A}\Delta t}\right\} \approx e^{-i H_A\Delta t}\,\sigma\, e^{i H_A\Delta t}. \qquad (8.17)$$
The Lie product formula adds an error of the order $O(\Delta t^2)$ that has to be considered in the analysis.
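For intuition, here is a minimal numerical sketch of the basic density matrix exponentiation primitive, using the two-qubit SWAP operator $S$ in place of the modified operator $M_{H_A}$ and a single-qubit example; the helper function and parameter values are illustrative assumptions only (NumPy/SciPy):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)

def random_density_matrix(d):
    """Random d x d density matrix, for testing only."""
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = A @ A.conj().T
    return rho / np.trace(rho)

d = 2
rho = random_density_matrix(d)      # plays the role of the current state
sigma = random_density_matrix(d)    # state to be evolved under rho

# SWAP operator on two qubits
S = np.zeros((d * d, d * d), dtype=complex)
for i in range(d):
    for j in range(d):
        S[i * d + j, j * d + i] = 1.0

dt = 0.01
U = expm(-1j * S * dt)
joint = U @ np.kron(rho, sigma) @ U.conj().T

# partial trace over the first subsystem: tr_1{ U (rho x sigma) U^dagger }
sigma_out = joint.reshape(d, d, d, d).trace(axis1=0, axis2=2)

# exact small-time evolution of sigma with rho as the Hamiltonian
target = expm(-1j * rho * dt) @ sigma @ expm(1j * rho * dt)

print(np.linalg.norm(sigma_out - target))  # deviation of order dt**2
```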
The application of the inverse Hessian for Newton’s method runs polynomially in the sparsity $s_0$ and only increases the number of required copies by a factor of order
$$O\!\left(\frac{1}{p_H}\left(1+p\,\epsilon_{t+1}^2\,\epsilon_t^2\right)\frac{1}{\epsilon_{t+1}^3}\right),$$
where $p_H$ is the equivalent to $p_D$ but for the multiplication with the inverse Hessian.
In summary, the runtime reflects the exponential speed-up in the size of the inputs (here the parameter space) known from other quantum machine learning algorithms, provided the $A_{\alpha_i}$ are sparse and accessible via an oracle for simulation. On the downside, the share of copies consumed in every step restricts us to performing only very few iterations. This is unfortunate for general training procedures, which require many iterations when searching for the optimal parameters, but may prove useful in cases where the initial guess is close to the desired optimum. Examples are situations in which the parameters have been pre-trained by other methods, as is typical in deep neural networks, where restricted Boltzmann machines or autoencoders prepare the parameter vector. Another suitable situation arises when the number of parameters is large enough to justify the spatial resources needed for the quantum algorithm. Fast convergence in the proximity of optima is guaranteed for Newton’s method.
This study has only looked at a very specific type of objective function, namely homogeneous polynomials of even order with unit sphere constraints. As mentioned earlier, for applications in machine learning other objective functions, as well as the effect of normalisation constraints as a possible regularisation mechanism, need to be investigated further.
Quantum nearest neighbour algorithm
The first part of Section 9.1 (introducing the quantum nearest neighbour algorithm in basis encoding) has been published in Ref. [225]: Schuld, Sinayskiy, Petruccione (2014) Quantum Computing for Pattern Classification, Lecture Notes in Computer Science Vol. 8862, Springer Verlag, pp. 208-220. I was responsible for the idea, development, analysis and write-up of the content of the publication. The second part (the quantum nearest neighbour algorithm in amplitude encoding) has been further developed together with Mark Fingerhut, and the manuscript “Computing distances with quantum interference: An implementation of a weighed nearest neighbour quantum machine learning algorithm” is about to be submitted.
The $k$-nearest neighbour algorithm has been introduced in Section 2.3.4.2 of Chapter 2 as one of the simplest and yet surprisingly successful classifiers. It can be formulated as the following rule:
Predict for a new input the class that most of its $k$ closest training vectors are assigned to.
Quantum algorithms that apply Grover search to $k$-nearest neighbours have been discussed in the literature review of Section 6.2. It has also been mentioned that some versions of the method weigh the neighbours by their distance to the new input, so that closer neighbours have more influence on the prediction than those further away. With this adaptation one takes into account the entire dataset of $M$ inputs, which effectively means that $k = M$. The weighing rule then defines how fast the ‘influence’ decreases with distance and can therefore be understood as a distance measure or kernel $\kappa(|\tilde{x} - x^m|)$. An illustration is presented in Figure 9.1.
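For reference, here is a minimal classical sketch of this distance-weighted all-nearest-neighbour rule; the Gaussian kernel and the toy data are illustrative assumptions and not the specific weighting developed later in this chapter.

```python
import numpy as np

def weighted_nn_predict(x_new, X_train, y_train, kernel_width=1.0):
    """Distance-weighted nearest neighbour prediction with k = M:
    every training input votes, weighted by a kernel of its distance
    to the new input (here a Gaussian kernel, chosen for illustration)."""
    distances = np.linalg.norm(X_train - x_new, axis=1)
    weights = np.exp(-distances**2 / (2 * kernel_width**2))
    # sum the weights per class and predict the class with the largest total
    scores = {c: weights[y_train == c].sum() for c in np.unique(y_train)}
    return max(scores, key=scores.get)

# toy example with 2-dimensional inputs and binary labels
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.1], [0.9, 1.0]])
y_train = np.array([0, 0, 1, 1])
print(weighted_nn_predict(np.array([0.2, 0.1]), X_train, y_train))  # -> 0
```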
In this chapter I will consider a binary pattern classification problem and present two versions of a novel quantum nearest neighbour algorithm, one which represents the dataset in basis encoding and the other representing the training inputs in amplitude encoding as well as the target outputs in basis encoding (i.e., by a single qubit). In both cases the idea is to construct a quantum state in which the probability of measuring one of $M$ basis states corresponds to the weight assigned to the corresponding input, which in turn is a function of its distance to the new input. In other words, the amplitude distribution takes the role of the kernel function. A measurement of the output qubit then reveals the probability of the class being represented by the training inputs in close proximity to the new input.
Figure 9.1: Illustration of the all-nearest-neighbour method, where the neighbours are weighted by the Euclidean distance to the new input. The symbols show the 2-dimensional inputs, each carrying a class attribute ‘circle’ or ‘rectangle’. The new input is located at $\tilde{x} = (\tilde{x}_1, \tilde{x}_2)$ and, as in the $k$-nearest neighbour illustration, it will be classified as a circle.
Although not necessarily introducing a speed-up compared to the classical method, the quantum machine learning algorithms are a fruitful demonstration of how to represent a distance measure in the amplitudes, and they are pre-designed for a comparably simple implementation on current-day hardware.
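To make the intended measurement statistics concrete, the following classical sketch simulates the idea: the amplitudes are set proportional to the square roots of (hypothetical Gaussian) kernel weights, and the probability of obtaining a given class from the output qubit is the normalised sum of the weights of the training inputs carrying that class. The actual state preparation circuits are developed later in the chapter.

```python
import numpy as np

def class_probabilities(x_new, X_train, y_train, kernel_width=1.0):
    """Classical simulation of the measurement statistics: the amplitude of
    basis state |m> is proportional to sqrt(kappa(|x_new - x^m|)), so the
    probability of measuring the output qubit in class state |c> is the
    normalised sum of kernel weights of the training inputs with label c."""
    distances = np.linalg.norm(X_train - x_new, axis=1)
    weights = np.exp(-distances**2 / (2 * kernel_width**2))   # illustrative kernel
    amplitudes = np.sqrt(weights / weights.sum())              # normalised amplitudes
    return {c: np.sum(amplitudes[y_train == c]**2) for c in np.unique(y_train)}

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.1], [0.9, 1.0]])
y_train = np.array([0, 0, 1, 1])
print(class_probabilities(np.array([0.2, 0.1]), X_train, y_train))
# e.g. {0: 0.67..., 1: 0.32...} -- the output qubit yields class 0 with higher probability
```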