5.1 Time complexity
The concept of time complexity has already been used extensively in the previous sections, where it was assumed that the reader is familiar with the basic ideas. It shall be explained in more detail here. The runtime of an algorithm on a computer is the time it takes to execute the algorithm, in other words, the number of elementary operations multiplied by their respective execution times. For conventional computers, the number of elementary operations could in theory still be established by choosing the fastest implementation of every subroutine and counting the logic gates. However, given the fast technological advancement in the IT industry, it becomes obvious that a device-dependent runtime is not a suitable theoretical tool to measure the general speed of an algorithm. This is why computational complexity theory looks at the asymptotic complexity, or the rate of growth of the runtime with the size of the input. (Asymptotic refers to the fact that one is interested in laws for sufficiently large inputs only.) If the resources needed to execute an algorithm, or the number of elementary operations, grow polynomially with the size of the input, the algorithm is tractable and the problem in theory efficiently solvable. Exponential growth makes an algorithm intractable and the problem hard.
¹ Thanks to a talk by Vedran Dunjko at the Quantum Machine Learning Workshop in South Africa in July 2016.
An illustrative example of an intractable problem is guessing the number combination for the security code of a safe (which, without further structure, requires a brute-force search): while a code of two digits only requires 100 attempts in the worst case (and half as many in the average case), a code of 10 digits requires ten billion guesses in the worst case, and a code of 30 digits has more possible configurations than our estimate of the number of stars in the universe. No advancement in computer technology could possibly crack this code.
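To make the counting concrete, the following short Python sketch (not part of the original text; all names are illustrative) tabulates the worst-case and average-case number of guesses for an n-digit code, showing the exponential growth with n:

```python
# Illustrative sketch of the brute-force scaling argument: an n-digit code
# has 10**n possible combinations, so the worst case requires 10**n guesses
# and the average case roughly half as many.

def worst_case_guesses(n_digits: int) -> int:
    """Worst-case number of attempts for an n-digit code."""
    return 10 ** n_digits

def average_case_guesses(n_digits: int) -> float:
    """On average about half of all combinations must be tried."""
    return 10 ** n_digits / 2

for n in (2, 10, 30):
    print(f"{n:2d} digits: worst case {worst_case_guesses(n):.1e}, "
          f"average case {average_case_guesses(n):.1e} guesses")
```

For n = 30 the worst case is 10³⁰ guesses, a number that no conceivable improvement in hardware speed can compensate.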
When thinking about quantum computers, estimating the actual runtime of an algorithm is even more problematic. Not only do we not yet have a unique implementation of qubits that gives us a set of elementary operations to work with (as well as their time scales), but even if we decided on a set of elementary gates, it is nontrivial to decompose quantum algorithms into this set.² In many cases we can claim that such a sequence of elementary gates exists, but it is by no means clear how to construct it.
Due to the lack of fault-tolerant universal quantum computing, the vast majority of authors in quantum information processing are therefore interested in the asymptotic complexity of their routines, and in how they compare to classical algorithms. The field of quantum complexity theory has been developed as an extension of classical complexity theory [141, 142] and is based on the question of whether quantum computers are in principle able to solve computational problems faster in terms of time complexity. (In the following, complexity will always refer to the asymptotic runtime complexity unless stated otherwise.) A runtime advantage in this context is called a quantum enhancement, or simply a quantum speed-up. The term quantum supremacy has also gained popularity among researchers to refer to applications that no classical computer could possibly compute.
A collaboration of researchers at the forefront of benchmarking quantum computers came up with a useful typology for the term ‘quantum speed-up’ [143].
1. A provable quantum speed-up requires a proof that there can be no classical algorithm that performs as well as or better than the quantum algorithm. A provable speed-up has been demonstrated for Grover’s algorithm, which scales quadratically better than any classical algorithm [64], given an oracle that marks the desired state [144].
2. A strong quantum speed-up compares the quantum algorithm with the best classical algorithm, whether it is known or not. The most famous example of a strong speed-up is Shor’s quantum algorithm, which factorises numbers in a time growing polynomially (instead of exponentially) with the number of digits of the number to be factorised, and which, due to its far-reaching consequences for cryptography systems, gave rise to the first major investments in quantum computing.
3. If we relax the ‘best’ classical algorithm (which is often not known) to the ‘best available’ classical algorithm, we get the definition of a common quantum speed-up.
4. The fourth category of a potential quantum speed-up relaxes the conditions further by comparing two specific algorithms and relating the speed-up to this instance only.
5. Lastly, and useful when benchmarking quantum annealers, is the limited quantum speed-up, which compares “corresponding” algorithms such as quantum and classical annealing.
² Thanks to Matthias Troyer for his talk at the Quantum Research Group in October 2016 for this insight.
Although the holy grail of quantum computing (and consequently of quantum machine learning) remains to find a provable exponential speed-up, the wider definition of a quantum advantage in terms of asymptotic complexity opens many more avenues for realistic investigations. Two common pitfalls have to be avoided. Firstly, quantum algorithms often have to be compared with classical sampling, which is likewise nondeterministic and has close relations to quantum computing (see Section 3.4.4). Secondly, some argue that complexity can be hidden in spatial resources, for example the resources needed to implement a quantum random access memory, and that quantum computers have to be compared to cluster computing, which will surely be their main competitor in the years to come [145, 143].
I want to introduce some particulars of how the asymptotic complexity is formulated for quantum machine learning algorithms. The ‘size’ of the input in the context of machine learning usually refers to the number of data points M as well as the number of features N. When dealing with sparse inputs that can be represented more compactly, the latter is sometimes replaced by the maximum number of nonzero elements in a training input. The complexity analysis commonly considers other figures of merit as well: The error ε of a value z is the precision to which the corresponding value z₀ calculated by the algorithm is correct, ε = ||z − z₀|| (with a suitable norm). When dealing with matrices, the condition number κ, which is the ratio of the largest and the smallest eigenvalue or singular value, is sometimes of interest. Many quantum machine learning algorithms have a chance of failure, for example because of a conditional measurement. In this case the average number of attempts required to execute the algorithm successfully needs to be taken into account as well. A success probability of p_s generally leads to a factor of 1/p_s in the runtime (for example, if the algorithm only succeeds in 1% of the cases, one has to repeat it on average 100 times).
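The effect of the success probability on the runtime can be illustrated with a small simulation (a hypothetical sketch, not from the original text): a repeat-until-success loop with success probability p_s needs on average 1/p_s attempts.

```python
# Hypothetical sketch: estimate the average number of repetitions of a
# probabilistic routine with success probability p_s by simulation.
import random

def attempts_until_success(p_s: float, rng: random.Random) -> int:
    """Run a repeat-until-success loop and count the attempts."""
    attempts = 1
    while rng.random() >= p_s:
        attempts += 1
    return attempts

rng = random.Random(0)
p_s = 0.01  # e.g. a conditional measurement that succeeds in 1% of runs
mean_attempts = sum(attempts_until_success(p_s, rng) for _ in range(10_000)) / 10_000
print(mean_attempts)  # close to 1/p_s = 100
```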
To express the asymptotic complexity, the runtime’s dependency on these variables is expressed in the ‘big-O’ notation (see Figure 5.1):
O(g(n)) means that the running time has an upper bound of λg(n) for some λ ∈ R and sufficiently large inputs n.
Ω(g(n)) means that the running time has a lower bound of λg(n) for some λ ∈ R and sufficiently large inputs n.
Θ(g(n)) means that the running time has a lower bound of λ₁g(n) and an upper bound of λ₂g(n) for some λ₁, λ₂ ∈ R and sufficiently large inputs n.
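As a small worked example of these definitions (added here for illustration, not from the original text), consider a runtime f(n) = 3n² + 5n:

```latex
% Worked example: f(n) = 3n^2 + 5n lies in \Theta(n^2).
% Upper bound: for n \geq 1, 3n^2 + 5n \leq 3n^2 + 5n^2 = 8n^2, so f \in O(n^2) with \lambda = 8.
% Lower bound: 3n^2 + 5n \geq 3n^2 for all n \geq 0, so f \in \Omega(n^2) with \lambda = 3.
\[
  3n^2 \;\leq\; 3n^2 + 5n \;\leq\; 8n^2 \quad \text{for all } n \geq 1,
  \qquad \text{hence } f(n) \in \Theta(n^2) \text{ with } \lambda_1 = 3,\ \lambda_2 = 8 .
\]
```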
Having introduced time complexity, the question remains which speed-ups can be expected specifically in quantum-enhanced machine learning. Of course, machine learning is based on common computational routines such as search or matrix inversion, and quantum machine learning derives its advantages from tools developed by the quantum information processing community. It is therefore no surprise that the speed-ups claimed in quantum-enhanced machine learning come directly from this tool box. Roughly speaking (and with more details in the next chapter), three different types of speed-ups are claimed:
[Figure 5.1 here: three panels illustrating f(x) ∈ O(g(x)), f(x) ∈ Ω(g(x)), and f(x) ∈ Θ(g(x)).]
Figure 5.1: Illustration of the big-O notation. If a function f(x) (in this context f is the runtime and x the input) is ‘in’ O(g(x)), there exists a λ ∈ R such that |f(x)| ≤ λg(x) for large enough x. The Ω symbol stands for the inequality |f(x)| ≥ λg(x), while the Θ symbol signifies that there are two λ₁ < λ₂ ∈ R such that f(x) lies between the functions λ₁g(x) and λ₂g(x).
1. Provable quadratic quantum speed-ups derive from variations of Grover’s algorithm applied to search problems, as outlined in more detail in Section 6.2 of the following chapter. Learning can always be understood as a search in a space of hypotheses [146]. Examples of such speed-ups are Wiebe et al. [147], who search for discrimination hyperplanes for a perceptron in their representation as points on a sphere, and Low et al. [148], who prepare quantum states that capture the probability distribution of Bayesian nets.
2. Exponential speed-ups are naturally more tricky, even if we only look at the categories of either strong or common speed-ups. Within the discipline, they are usually only claimed for quantum machine learning algorithms that execute linear algebra computations in amplitude encoding (see Section 6.1). But these come with serious conditions, for example regarding the sparsity or redundancy of the dataset, which may allow for very fast classical algorithms as well. Furthermore, the problem of state preparation excludes many states from super-efficient preparation.
3. More specific comparisons occur when quantum annealing is used to solve optimisation problems derived from machine learning tasks [149], or to prepare a quantum system in a certain distribution [150]. In summary, it is not yet clear whether any provable speed-ups will occur, but benchmarking against specific classical algorithms shows that there are problems for which quantum annealing can be of advantage (see Sections 6.3 and 6.4). While offering only limited theoretical prediction, these approaches have the advantage of being testable in the lab and serve as vehicles to further our understanding of present-day implementations.
The next chapter will investigate these points in more detail, and they will also be the focus of the original contributions in Part III.