5.2 Sample complexity
Figure 5.1: Illustration of the big-O notation. If a function f(x) (in this context f is the runtime and x the input) is ‘in’ O(g(x)), there exists a λ ∈ R such that |f(x)| ≤ λg(x) for large enough x. The Ω symbol stands for the inequality |f(x)| ≥ λg(x), while the Θ symbol signifies that there are two λ1 < λ2 ∈ R such that f(x) lies between the functions λ1g(x) and λ2g(x).
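As a concrete check of these definitions, the following sketch verifies the bounds numerically for f(x) = 3x² + 5x and g(x) = x² (an illustrative choice, not taken from the text):

```python
# Numerical illustration of big-O: f(x) = 3x^2 + 5x is in O(x^2),
# since |f(x)| <= lam * g(x) for lam = 4 once x >= 5.
def f(x):
    return 3 * x**2 + 5 * x

def g(x):
    return x**2

lam = 4
for x in [5, 10, 100, 1000]:
    assert abs(f(x)) <= lam * g(x)

# Theta: f lies between lam1*g and lam2*g for lam1 = 3, lam2 = 4 (x >= 5).
lam1, lam2 = 3, 4
for x in [5, 10, 100, 1000]:
    assert lam1 * g(x) <= f(x) <= lam2 * g(x)
```

Since both inequalities only have to hold "for large enough x", the choice of λ may fail for small inputs (here x < 5) without affecting the asymptotic statement.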
1. Provable quadratic quantum speed-ups derive from variations of Grover’s algorithm applied to search problems, as outlined in more detail in Section 6.2 of the following chapter. Learning can always be understood as a search in a space of hypotheses [146]. Examples of such speed-ups are Wiebe et al. [147], who search for discrimination hyperplanes for a perceptron in their representation as points on a sphere, and Low et al. [148], who prepare quantum states that capture the probability distribution of Bayesian nets.
2. Exponential speed-ups are naturally more tricky, even if we only look at the categories of either strong or common speed-ups. Within the discipline, they are usually only claimed by quantum machine learning algorithms that execute linear algebra computations in amplitude encoding (see Section 6.1). But these come with serious conditions, for example about the sparsity or redundancy of the dataset, which may allow for very fast classical algorithms as well. The question of state preparation furthermore excludes many states from super-efficient preparation.
3. More specific comparisons occur when quantum annealing is used to solve optimisation problems derived from machine learning tasks [149], or to prepare a quantum system in a certain distribution [150]. In summary, it is not clear yet whether any provable speed-ups will occur, but benchmarking against specific classical algorithms shows that there are problems where quantum annealing can be of advantage (see Sections 6.3 and 6.4). While offering only limited theoretical prediction, these approaches have the advantage of being testable in the lab and serve as vehicles to further our understanding of present-day implementations.
The next chapter will investigate this summary in more detail, which will also be the focus of the original contributions in Part III.
Figure 5.2: Different types of oracles to determine the sample complexity of a learning algorithm. A membership oracle takes queries for a certain input x and returns the value f(x), while an example oracle is activated and draws samples of x from a certain (usually unknown) distribution p(x), again returning the value f(x). The quantum version of a membership oracle is a function evaluation on a register |x⟩ ⊗ |0⟩ → |x⟩ ⊗ |f(x)⟩, while a quantum example oracle has been defined as the qsample ∑_x √p(x) |x, f(x)⟩ of the distribution p(x).
For example, when every data point corresponds to the effort of performing a single experiment, one cannot allow for arbitrarily many data points to be generated.
Considerations about sample complexity are usually based on binary functions f : {0,1}^N → {0,1}. The sample complexity of a machine learning algorithm refers to the number of samples that are required to learn a concept from a given concept class. A concept is the rule f that divides the input space into subsets of the two class labels 0 and 1; in other words, it is the law that we want to recover with a model.
There are two important settings in which sample complexity is analysed.
1. In exact learning from membership queries [151], one learns the function f by querying a membership oracle with inputs x, receiving the answer whether f(x) evaluates to 1 or not.
2. The framework of Probably Approximately Correct (PAC) learning was introduced by Valiant [152] and asks how many examples from the original concept are needed in the worst case to train a model so that, with probability at least 1−δ for 0 ≤ δ ≤ 1, the error (i.e., the probability of assigning the wrong label) is small. The examples are drawn from an arbitrary distribution via an example oracle. This framework is closely linked to the VC-dimension d of a model (see Chapter 2).
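The two classical oracle types can be sketched in a few lines of code; the parity concept, the hidden string A and the uniform sampling distribution below are illustrative choices, not part of the frameworks themselves:

```python
import random

# Toy concept: a parity function f(x) = x . A mod 2 for a hidden string A.
# Illustrative sketch of the two classical oracle types of Figure 5.2.
A = [1, 0, 1]  # the hidden concept (unknown to the learner)

def f(x):
    return sum(xi * ai for xi, ai in zip(x, A)) % 2

def membership_oracle(x):
    # the learner chooses the query point x itself
    return f(x)

def example_oracle():
    # the oracle draws x from a (here uniform) distribution p(x)
    x = [random.randint(0, 1) for _ in range(len(A))]
    return x, f(x)

print(membership_oracle([1, 1, 1]))  # -> 0
x, y = example_oracle()              # a labelled example (x, f(x))
```

The crucial difference is who controls x: in exact learning the learner queries adaptively, while in the PAC setting it must cope with whatever distribution the example oracle uses.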
To translate these two settings into a quantum framework (see Figure 5.2), a quantum membership oracle as well as a quantum example oracle are introduced. They are in a sense parallelised versions of the classical sample generators, and with quantum interference of amplitudes this parallelism can be used to extract more information. Rather surprisingly, it turns out that the classical and quantum sample complexity are polynomially equivalent, or as stated by Servedio and Gortler [104]:
[F]or any learning problem, if there is a quantum learning algorithm which uses poly- nomially many [samples] then there must also exist a classical learning algorithm which uses polynomially many [samples].
Note that this only concerns the sample complexity; the same authors find an instance of a problem that is efficiently learnable by a quantum algorithm in terms of time complexity, while the best classical algorithm is intractable. I want to briefly explain this finding in more detail for the two different learning frameworks and their translation to quantum computing.
5.2.1 Exact learning from membership queries
Sample complexity in relation to queries is closely related to the concept of ‘quantum query complexity’, which is an important figure of merit in quantum computing in the oracular setting (for example to determine the runtime of Grover’s algorithm). A quantum oracle can be described as a unitary operation
U : |x⟩ ⊗ |s⟩ → |x⟩ ⊗ |s ⊕ f(x)⟩, where x ∈ {0,1}^n is encoded in the computational basis.³
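A minimal sketch of this oracle unitary as a permutation matrix on n + 1 qubits, assuming n = 2 input bits and f = AND as a toy concept:

```python
import numpy as np

# The quantum membership oracle U|x>|s> = |x>|s XOR f(x)> as a matrix.
# Illustrative choices: n = 2 input qubits, f(x) = AND of the two bits.
n = 2

def f(bits):
    return bits[0] & bits[1]

dim = 2 ** (n + 1)
U = np.zeros((dim, dim))
for idx in range(dim):
    bits = [(idx >> (n - k)) & 1 for k in range(n)]  # input register x
    s = idx & 1                                      # answer qubit s
    out = (idx & ~1) | (s ^ f(bits))                 # XOR f(x) into s
    U[out, idx] = 1.0

# U is a permutation matrix and hence unitary (reversible):
assert np.allclose(U @ U.T, np.eye(dim))
```

Only the basis states with x = 11 have their answer qubit flipped (indices 6 and 7 are swapped); all other basis states are left unchanged, which makes the reversibility of the oracle explicit.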
Two famous quantum algorithms that demonstrate how for specific types of problems only a single quantum query can be sufficient are the Deutsch-Jozsa and the Bernstein-Vazirani algorithm.
They are both based on the principle of applying the quantum oracle to a register in uniform superposition, thereby querying all possible inputs in parallel. Writing the outcome into the phase and interfering the amplitudes then reveals global information about the concept, for example whether it was a balanced (half of all inputs map to 1) or constant (all inputs map to the same value) function. Note that this does not mean that the function itself is learnt (i.e., which inputs of the balanced function map to 0 or 1 respectively); such examples are therefore not sufficient to prove theorems on general quantum learnability.
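The interference step can be sketched with plain state vectors; the phase-oracle formulation and the n = 2 example below are illustrative simplifications of the full Deutsch-Jozsa circuit:

```python
import numpy as np

# Sketch of the Deutsch-Jozsa idea for n = 2: a single oracle call on the
# uniform superposition distinguishes constant from balanced functions f.
n = 2
N = 2 ** n

def dj_is_constant(f):
    # uniform superposition over all inputs (Hadamards on |0...0>)
    amps = np.full(N, 1 / np.sqrt(N))
    # phase oracle: |x> -> (-1)^f(x) |x>
    amps = amps * np.array([(-1) ** f(x) for x in range(N)])
    # after the final Hadamards, the amplitude of |0...0> is the uniform
    # average of the phases; its magnitude is 1 iff f is constant.
    amp0 = amps.sum() / np.sqrt(N)
    return np.isclose(abs(amp0), 1.0)

constant = lambda x: 1
balanced = lambda x: x & 1        # half of all inputs map to 1
print(dj_is_constant(constant))   # -> True
print(dj_is_constant(balanced))   # -> False
```

For a balanced function the phases cancel exactly, so the |0...0⟩ outcome becomes impossible; this is the interference that extracts one global bit of information while revealing nothing about individual function values.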
Servedio and Gortler [104] analysed the query complexity in the quantum membership oracle setting and found that if any class C of boolean functions f : {0,1}^n → {0,1} is learnable from Q quantum membership queries, it is then learnable by O(nQ³) classical membership queries. This result shows that classical and quantum learnability are the same in this framework, up to at most a polynomial overhead. Shortly after, Hunziker et al. [154] introduced a larger framework which they call “impatient learning” and proposed the following two conjectures on the actual number of samples required in the (asymptotic) quantum setting:
Conjecture 1: For any family of concept classes C = {Ci} with |C| → ∞, there exists a quantum learning algorithm with membership oracle query complexity O(√|C|).
Conjecture 2: For any family of concept classes C = {Ci} with |C| → ∞, there exists a quantum learning algorithm with membership oracle query complexity O(log|C|/√γ) [where γ ≤ 1/3 is a measure of how easy it is to distinguish between concepts, and a small γ indicates a harder class to learn].
The classical upper bound for exact learning from membership queries is given by O(log|C|/γ).
While the first conjecture was proven by Ambainis et al. [155], the second was proven shortly after by Atici and Servedio [156], up to a log log|C| factor.
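To get a feeling for the gap between the two bounds, one can plug in illustrative numbers (the concept class size and the value of γ below are arbitrary choices, and all hidden constants are set to 1):

```python
import math

# Illustrative comparison: classical O(log|C| / gamma) versus the
# conjectured quantum O(log|C| / sqrt(gamma)) membership queries.
size_C = 2 ** 20   # assumed concept class size
gamma = 0.01       # assumed distinguishability measure

classical = math.log2(size_C) / gamma
quantum = math.log2(size_C) / math.sqrt(gamma)
print(round(classical), round(quantum))  # -> 2000 200
```

The improvement is quadratic in 1/γ, consistent with the Grover-type origin of the speed-up.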
³Note that counting ‘parallel’ function evaluations by an oracle in the same way as evaluations of a classically computable function is not an uncontested comparison [153].
With these results, it becomes apparent that no exponential speed-up can be expected from quantum sample complexity in the setting defined above, while quadratic speed-ups are known to be achievable. This generalises the optimality of Grover search and is a fundamental limit to quantum computation.
5.2.2 PAC learning from examples
It is a well-established fact from classical learning theory that an (ε, δ)-PAC learning algorithm for a nontrivial concept class C of VC-dimension d (see Section 2.1.2) requires at least Ω((1/ε) log(1/δ) + d/ε) examples, but can learn with at most O((1/ε) log(1/δ) + (d/ε) log(1/ε)) examples [157, 158].
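Plugging illustrative numbers into the classical PAC bounds of the form Ω((1/ε) log(1/δ) + d/ε) and O((1/ε) log(1/δ) + (d/ε) log(1/ε)), with all hidden constants set to 1 (a simplification), gives a feeling for their size:

```python
import math

# Illustrative evaluation of the classical PAC sample bounds, with all
# asymptotic constants set to 1; eps, delta and d are arbitrary choices.
eps, delta, d = 0.05, 0.01, 10

lower = (1 / eps) * math.log(1 / delta) + d / eps
upper = (1 / eps) * math.log(1 / delta) + (d / eps) * math.log(1 / eps)
print(round(lower), round(upper))
```

The gap between the two is only the log(1/ε) factor on the VC-dimension term, which is why the bounds are considered nearly tight.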
A first contribution to quantum PAC learning was made by Bshouty and Jackson in 1998 [159], who defined the important notion of a quantum example oracle. They show that a certain class of functions (so-called “polynomial-size disjunctive normal form expressions”) which is actively investigated in the PAC setting is efficiently learnable by a quantum computer from ‘quantum example oracles’ which draw examples from a uniform distribution. (Note that the PAC model requires learnability under any distribution, so this result is much more specific.) The quantum example oracle is equivalent to what has been introduced earlier as a qsample,
∑_x √p(x) |x, f(x)⟩,      (5.1)
where the probabilities of measuring the basis states in the computational basis reflect the distribution p(x) from which the examples are drawn.⁴
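A qsample of this form can be sketched as a plain state vector; the distribution, the parity labelling and the basis ordering below are illustrative assumptions:

```python
import numpy as np

# Sketch of the qsample of Eq. (5.1) for n = 2 input bits. Assumptions:
# a toy distribution p(x), a parity label f(x), and basis ordering
# |x, f(x)> with x as the leading bits and the label as the last qubit.
n = 2
p = np.array([0.1, 0.2, 0.3, 0.4])   # p(x) for x = 0, 1, 2, 3

def f(x):
    return bin(x).count("1") % 2     # parity of the bit string x

state = np.zeros(2 ** (n + 1))
for x in range(2 ** n):
    state[(x << 1) | f(x)] = np.sqrt(p[x])

# the state is normalised, and measuring the x register reproduces p(x)
assert np.isclose(np.linalg.norm(state), 1.0)
probs = state ** 2
```

Measuring such a state in the computational basis is exactly one call to a classical example oracle; the quantum advantage can only come from processing the superposition before measuring.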
Servedio and Gortler [104] use this definition to show the equivalence of classical and quantum PAC learning in the given framework by proving that if any class C of boolean functions f : {0,1}^n → {0,1} is learnable from Q evaluations of the quantum example oracle, it is then learnable by O(nQ) classical examples. Improvements in a later paper by Atici and Servedio [156] prove a lower bound on quantum learning that is close to the classical setting, i.e. that any (ε, δ)-PAC learning algorithm for a concept class of VC-dimension d must make at least Ω((1/ε) log(1/δ) + d + √d/ε) calls to the quantum example oracle from Equation (5.1). This was again improved by Arunachalam et al. [160] to finally show that in the PAC setting, quantum and classical learners require the same number of examples.
5.2.3 Introducing noise
Even though the evidence suggests that classical and quantum sample complexity are similar up to at most polynomial factors, an interesting observation derives from the introduction of noise into the model. Noise refers to corrupted query results or examples for which the value f(x) is flipped with probability µ. For the quantum example oracle, noise is introduced by replacing it with a mixture of the original qsample and a corrupted qsample, weighted by µ.

⁴Their algorithm interferes the amplitudes of the qsample via a quantum Fourier transform, thereby effectively changing the distribution from which to sample, which leaves some question whether the comparison to an external example generator is fair. However, they show that while the quantum example oracle can be simulated by a membership query oracle, this is not true vice versa, i.e. the ‘quantum example oracle’ cannot be used to perform arbitrary membership queries efficiently (which is clear from the Grover limit, since one would have to transform an arbitrary superposition into a certain measurement of the x query). It seems therefore that the quantum example oracle ranges somewhere between a query and an example oracle.
In the PAC setting with a uniform distribution investigated by Bshouty and Jackson, the quantum algorithm can still learn the function by consuming more examples, while noise is believed to render the classical problem unlearnable [159]. A similar observation is made by Cross, Smith and Smolin [161], who consider the problem of learning n-bit parity functions by membership queries and examples, a task that is easy both classically and quantumly, in the sample as well as the time complexity sense. (Parity functions evaluate to 1 if and only if the number of 1s in the input string is odd.) However, they find that in the presence of noise, the classical case becomes intractable while the number of quantum samples required only grows logarithmically. To ensure a fair comparison, the classical oracle is modelled as a dephased quantum channel of the membership oracle and the quantum example oracle respectively, and as a toy model they consider a slight adaptation of the Bernstein-Vazirani algorithm.
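The noiseless core of the Bernstein-Vazirani idea underlying this parity-learning example can be sketched as follows; the hidden string and n = 3 are illustrative choices, and the noise model is omitted:

```python
import numpy as np

# Sketch of the (noiseless) Bernstein-Vazirani algorithm: a single
# phase-oracle call on the uniform superposition reveals the hidden
# parity string a. Illustrative choices: n = 3 and a = 101.
n = 3
N = 2 ** n
a = 0b101                                # hidden parity string

def f(x):
    return bin(x & a).count("1") % 2     # parity f(x) = a . x mod 2

# uniform superposition, then the phase oracle |x> -> (-1)^f(x) |x>
amps = np.array([(-1) ** f(x) for x in range(N)]) / np.sqrt(N)

# the final layer of Hadamards concentrates all amplitude on |a>
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
Hn = H
for _ in range(n - 1):
    Hn = np.kron(Hn, H)
out = Hn @ amps
print(int(np.argmax(np.abs(out))))       # -> 5, i.e. a = 101
```

Since one oracle call recovers all n bits of a, the query cost is constant in the noiseless case; the result of Cross, Smith and Smolin concerns how gracefully this degrades once the oracle is dephased.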
These observations are evidence that a fourth category of potential quantum advantages, the robustness against noise, could be a fruitful avenue for further research.