Amplitude encoding - SWAP test - Quantum machine learning for supervised pattern recognition.


4.2 Amplitude encoding

Some quantum machine learning algorithms encode the (normalised) dataset into the amplitudes of a quantum state

$$|\psi_D\rangle = \sum_{m=1}^{M} \sum_{i=1}^{N} x_i^m \, |i\rangle |m\rangle = \sum_{m=1}^{M} |\psi_{x^m}\rangle |m\rangle. \qquad (4.1)$$

This quantum state has $NM$ amplitudes and resembles a classical vector containing all training inputs in one large concatenation. The training outputs can be added in an extra qubit $|y^m\rangle$ entangled with the $|m\rangle$ register.
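As a purely classical illustration of this concatenation, the following minimal sketch builds the amplitude vector for a small data matrix. The function name `amplitude_encode_dataset` and the index ordering (the $|m\rangle$ register as the most significant one) are assumptions made for this example only; the sketch does not address how the state would be prepared on a quantum device.

```python
import numpy as np

def amplitude_encode_dataset(X):
    """Return the amplitude vector of |psi_D> for a data matrix X of shape (M, N).

    Assumed ordering: the entry with index m * N + i holds x_i^m.
    """
    X = np.asarray(X, dtype=float)
    norm = np.linalg.norm(X)              # Frobenius norm of the whole dataset
    if norm == 0:
        raise ValueError("cannot normalise an all-zero dataset")
    return (X / norm).reshape(-1)         # concatenate the normalised inputs

# Toy dataset: M = 2 training inputs with N = 4 features each.
X = np.array([[3.0, 0.0, 0.0, 0.0],
              [0.0, 4.0, 0.0, 0.0]])
psi = amplitude_encode_dataset(X)

# N * M = 8 amplitudes fit into log2(8) = 3 qubits.
n_qubits = int(np.ceil(np.log2(psi.size)))
print(psi, n_qubits)
```

The number of qubits grows only logarithmically with the number of amplitudes, which is the source of the potential advantage discussed next.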

The main advantage is that it allows the construction of algorithms whose runtime depends only logarithmically on the input dimension and/or dataset size. For example, the quantum Fourier transform uses a number of gates polynomial in the number of qubits $n$ to manipulate all $2^n$ amplitudes so that they yield the desired result. This is truly astonishing considering that the base-2 logarithm of one billion is merely thirty! Such promises sound strange to machine learning practitioners, because simply loading the $NM$ features from the memory hardware takes time that is of course linear in $NM$. And indeed, the promise of an exponential speedup only holds if state preparation can also be done in time polynomial in the number of qubits $n$ [121], which I will refer to here as super-efficient. This is in fact possible in some cases, as the example of preparing a uniform superposition with $n$ Hadamard gates clearly demonstrates. However, there are subspaces in a Hilbert space that cannot be reached efficiently from a given initial state [122], which the Grover limit illustrates beautifully. It is therefore an important and nontrivial open question which classes of relevant states for machine learning can be prepared efficiently. Similar caution is necessary for the readout of all amplitudes, which is often again linear in $M$ and $N$. If the result of the computation is encoded in one amplitude $a_i$ only, the number of measurements needed to retrieve it is of the order of $O(1/|a_i|^2)$.
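The readout cost can be made concrete with a small classical sketch. It assumes that each measurement is an independent trial that returns basis state $|i\rangle$ with probability $|a_i|^2$, so the number of measurements until the first occurrence is geometrically distributed. The helper name `expected_shots` is hypothetical.

```python
import numpy as np

def expected_shots(amplitude):
    """Mean number of measurements until a basis state with this amplitude is observed once."""
    return 1.0 / abs(amplitude) ** 2

n = 10
N = 2 ** n
# Amplitudes of the uniform superposition that n Hadamard gates create from |0...0>.
uniform = np.full(N, 1 / np.sqrt(N))

print(expected_shots(uniform[0]))        # equals 2**n up to rounding

# Monte Carlo check: waiting times for an event of probability |a_i|^2.
rng = np.random.default_rng(0)
waiting_times = rng.geometric(abs(uniform[0]) ** 2, size=20000)
print(waiting_times.mean())              # fluctuates around 2**n
```

For a uniform superposition every amplitude has $|a_i|^2 = 2^{-n}$, so observing one particular outcome already takes a number of measurements exponential in $n$.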

4.2.1 State preparation

One, if not the major, challenge of amplitude encoding is the question of how to prepare an arbitrary quantum state $|\psi\rangle = \frac{1}{\sqrt{2^n}} \sum_{i=0}^{2^n-1} a_i |i\rangle$, $\sum_i |a_i|^2 = 1$, for example to encode a data set.

Of particular interest are preparation routines whose runtime is polynomial in the number of qubits. I will focus the discussion on real amplitudes. Note that when we separate a complex amplitude $\alpha_i = e^{i\varphi_i} a_i$ into a phase factor $e^{i\varphi_i}$ and a nonnegative real number $a_i$, then if we know how to prepare $\sum_i a_i |i\rangle$, the final state can be constructed by applying small phase rotation gates, approximately taking $\sum_i a_i |i\rangle \to \sum_i a_i e^{i\varphi_i} |i\rangle$ (see also [123, 124]).

Figure 4.3: Illustration of two super-efficient state preparation routines. Left: Grover and Rudolph’s state preparation algorithm for efficiently integrable probability distributions [125]. After step $t = 2$ the domain is divided into four regions $i_2 = 00, 01, 10, 11$ of size $\Delta x = 1/2^2$ with amplitudes $\sqrt{p^{(2)}_{00}}, \sqrt{p^{(2)}_{01}}, \sqrt{p^{(2)}_{10}}, \sqrt{p^{(2)}_{11}}$. In the third step, the $i_2$'th region (here demonstrated for $i_2 = 2$) is split into two. The parameters $\alpha_{2_3}$ and $\beta_{2_3}$ are the probabilities to find a random variable in the left or right part of the region. This procedure successively prepares a finer and finer discretisation of the probability distribution. Right: In the $k$'th step of the routine of Soklakov and Schack [124], an oracle is applied to mark the states whose probabilities $p_i$ are larger than a certain threshold $\theta_k$ (here in red). These states are amplified with the Grover iterator, resulting in a state that looks qualitatively like the red bars only. In the next step, the threshold is lowered to include the states with a slightly smaller probability in amplitude amplification, until the desired distribution of the $p_i$ is prepared.

In the gate model of quantum computing, a number of proposals have been brought forward to prepare specific classes of states ‘super-efficiently’. For example, Grover and Rudolph suggest a scheme that is linear in the number of qubits for the case that we know an efficiently integrable one-dimensional probability distribution $p(x)$ of which the state is a discrete (i.e. coarse-grained) representation [125]. Iteratively, all $n$ qubits are rotated such that after the last step the desired qsample $\sum_i \sqrt{p_i}\,|i\rangle$ is prepared, where the index $i$ marks the interval $[i\Delta x, (i+1)\Delta x]$ with $\Delta x = \frac{1}{2^n}$. In step $t$ the $t$'th qubit $|0_t\rangle$ is rotated,

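$$\sum_{i_t=0}^{2^t-1} \sqrt{p^{(t)}_{i_t}}\; |i_t\rangle |0_t\rangle \;\to\; \sum_{i_t=0}^{2^t-1} \left( \sqrt{\alpha_{i_t}}\, |i_t\rangle |0_t\rangle + \sqrt{\beta_{i_t}}\, |i_t\rangle |1_t\rangle \right),$$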

such that $\alpha_{i_t}$ [$\beta_{i_t}$] is the probability for a random variable to lie in the left [right] half of the $i_t$'th interval of a coarse graining into $2^t$ regions of size $\Delta x = \frac{1}{2^t}$ (see Figure 4.3 left). With each step, this process prepares an increasingly fine discretisation of the probability distribution $p(x)$. Essentially the same idea was proposed at a similar time by Kaye and Mosca [123], who do not refer to probability distributions but demand in general that the conditional probability $p(q_k = 1 | q_1 \ldots q_{k-1}) = \frac{\alpha_{q_1, \ldots, q_{k-1}, 1}}{\alpha_{q_1, \ldots, q_{k-1}}}$, which is the chance that, given the state $q_1 \ldots q_{k-1}$ of the previous qubits, the $k$'th qubit is in state 1 (or the random variable falls in the right region), is easy to compute. An example of a state for which calculating the conditional probabilities is easy is a superposition of basis states with a certain number of nonzero qubits. However, for most states $p(q_k = 1 | q_1 \ldots q_{k-1})$ entails a number of probabilities that is exponential in $k$.
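The refinement procedure can be followed classically for a small example. The sketch below is an illustration under stated assumptions, not Grover and Rudolph's circuit: it takes an already discretised distribution `p` over $2^n$ bins as input, and in each step it rotates a new qubit by an angle derived from the fraction of a region's probability mass that lies in its left half. The function name `grover_rudolph_amplitudes` is hypothetical.

```python
import numpy as np

def grover_rudolph_amplitudes(p):
    """Classically follow the refinement steps for a distribution p over 2**n bins.

    After step t the vector holds the square roots of the probability masses
    of the 2**t regions; each region is then split into a left and right half.
    """
    p = np.asarray(p, dtype=float)
    n = p.size.bit_length() - 1
    if 2 ** n != p.size or not np.isclose(p.sum(), 1.0):
        raise ValueError("p must be a normalised distribution over 2**n bins")
    amps = np.array([1.0])                                   # t = 0: one region with mass 1
    for t in range(n):
        coarse = p.reshape(2 ** t, -1).sum(axis=1)           # mass of each region at step t
        left = p.reshape(2 ** (t + 1), -1).sum(axis=1)[0::2]  # mass of each left half
        # Fraction of a region's mass in its left half (set to 1 for empty regions).
        frac_left = np.divide(left, coarse, out=np.ones_like(coarse), where=coarse > 0)
        frac_left = np.clip(frac_left, 0.0, 1.0)             # guard against rounding
        theta = np.arccos(np.sqrt(frac_left))                # rotation angle of the new qubit
        refined = np.empty(2 ** (t + 1))
        refined[0::2] = amps * np.cos(theta)                 # new qubit in |0>: left half
        refined[1::2] = amps * np.sin(theta)                 # new qubit in |1>: right half
        amps = refined
    return amps

p = np.array([0.1, 0.2, 0.3, 0.4])
amps = grover_rudolph_amplitudes(p)
print(amps ** 2)   # reproduces p up to rounding
```

Note that the classical loop sums over all bins and is therefore exponential in $n$; the quantum routine is efficient only when the left and right masses of each region can be computed efficiently, which is the integrability condition mentioned above.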

Soklakov and Schack [124] propose an efficient scheme to approximately prepare quantum states whose amplitudes $a_i = \sqrt{p_i}$ represent a discrete probability distribution $p_i$, $i = 1, \ldots, 2^n = N$, and all probabilities $p_i$ are of the order of $1/(\eta N)$ for $0 < \eta < 1$. The algorithm then runs in time polynomial in the number of qubits and in the inverse parameters $\eta^{-1}$, $\epsilon^{-1}$, $\nu^{-1}$, and produces a result that with probability greater than $1 - \nu$ has an error smaller than $\epsilon$. To sketch the basic idea (Figure 4.3 right), a series of oracles is defined that mark the states $i$ for which $p_i$ is larger than a certain threshold, and with each oracle this threshold is lowered by a constant amount (controlling the maximum error of the final state). The states marked by the $k$'th oracle include the states marked by all earlier oracles, and their amplitudes are increased with Grover iterations. The probability distribution is successively shaped this way. Note that to know the optimal number of Grover iterations, quantum counting has to be applied to estimate the number of marked states in the current run.

Under the condition that the states to prepare are sufficiently uniform, a more generic approach is to refer to a quantum random access memory, or technological implementations that allow information to be accessed ‘in superposition’ by querying an index register [90, 126, 127, 128]. More precisely, a quantum random access memory implements the query $\frac{1}{\sqrt{N}} \sum_i |i\rangle |0\rangle \to \frac{1}{\sqrt{N}} \sum_i |i\rangle |x_i\rangle$, retrieving the basis encoded quantum states $|x_i\rangle$. A postselective amplitude update can then prepare the state $\sum_i x_i |i\rangle$ (see also [129]). However, the price to pay is that the entire routine has to be repeated if the conditional measurement fails, and the probability of success $p_{\mathrm{success}} = \sum_i |x_i|^2 / N$ depends on the state to encode. While a uniform distribution of the $x_i$ ensures that $p_{\mathrm{success}} = \sum_i \frac{1}{2N} = \frac{1}{2}$, a sparse vector puts a lot of weight into the ‘failure branch’ of the ancilla to be measured, and in the extreme case of only one nonzero entry we get $p_{\mathrm{success}} = \frac{1}{N}$, which is exponentially small in $n$. In other words, we would have to repeat the measurement $O(N)$ times on average to prepare the correct state. Luckily, very sparse states can be prepared by other methods. (For example, for a one-sparse vector one can simply flip the qubit register from the ground state into a basis state representing the index $i$.) Zhao et al. [130] therefore propose that in the case of $s$-sparse vectors, one does not apply the quantum random access memory to a uniform superposition, but to a superposition only of the basis states representing nonzero indices,

$$\frac{1}{\sqrt{s}} \sum_{i \,|\, x_i \neq 0} |i\rangle.$$
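The dependence of the success probability on the data can be checked numerically. The following minimal sketch evaluates $p_{\mathrm{success}} = \sum_i |x_i|^2 / N$ for a dense and a one-sparse vector. It assumes entries with $|x_i| \leq 1$, and the last step assumes that restricting the index superposition to the $s$ nonzero entries replaces $N$ by $s$ in the same formula; both the helper name `success_probability` and that final step are illustrative assumptions rather than details taken from [130].

```python
import numpy as np

def success_probability(x):
    """p_success = sum_i |x_i|^2 / N for a vector with entries |x_i| <= 1."""
    x = np.asarray(x, dtype=float)
    if np.any(np.abs(x) > 1):
        raise ValueError("entries must satisfy |x_i| <= 1")
    return np.sum(np.abs(x) ** 2) / x.size

n = 10
N = 2 ** n

dense = np.full(N, 1 / np.sqrt(2))     # |x_i|^2 = 1/2 for every index
one_sparse = np.zeros(N)
one_sparse[3] = 1.0                    # a single nonzero entry

p_dense = success_probability(dense)
p_sparse = success_probability(one_sparse)
print(p_dense)                         # close to 1/2
print(p_sparse, 1 / p_sparse)          # 1/N, and about N expected repetitions

# Assumed variant: superpose only the indices of nonzero entries.
support = np.flatnonzero(one_sparse)
p_restricted = success_probability(one_sparse[support])
print(p_restricted)
```

In this toy case the restricted superposition removes the failure branch entirely, whereas the uniform superposition needs on the order of $N = 2^n$ repetitions.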

There are many other possible ways to prepare quantum states beyond quantum circuits. An interesting perspective is offered by Aharonov et al. [114]. They present the framework of “adiabatic state generation” as a natural (and polynomially equivalent) language to analyse state preparation in the gate model. The idea is to perform an adiabatic evolution of a quantum system initially in the ground state of a generic Hamiltonian to the ground state of a final Hamiltonian, which is the final state we wish to prepare. This translates the question of which states can be easily prepared into the question of which ground states of Hamiltonians are in reach of adiabatic schemes, i.e. are separated from the first excited state by a sufficiently large spectral gap. They show that given a sequence of simulable (i.e. sparse) Hamiltonians with certain properties (namely nonnegligible spectral gaps to ensure the transition time is polynomial, and strong enough overlap between the ground states of two consecutive Hamiltonians), there is an efficient quantum algorithm to reach the desired ground state. They furthermore provide techniques to efficiently generate states connected to certain classes of Markov chains.

A similar idea is to use the unique stationary state of a dissipative process in an open quantum system as a generic method of state preparation. Of course, as in adiabatic state preparation, the time to reach this state is crucial. For Markovian evolutions described by a Gorini-Kossakowski-Sudarshan-Lindblad master equation it can be shown that for any $|\psi_{\mathrm{in}}\rangle$ there is a master equation with a relaxation time $T_{\mathrm{relax}} \propto 1/(\gamma_l)_{\min}$, where $(\gamma_l)_{\min}$ is the smallest of the dissipation rates or couplings in the master equation [131]. Another challenge is to engineer an environment that obeys the desired dynamic equations.

In summary, quantum state preparation for amplitude encoded states relevant to machine learning algorithms is a topic that still requires a lot of attention, and claims of exponential speedups should therefore be viewed with a pinch of skepticism.

4.2.2 Readout

In amplitude encoding, the amplitudes themselves contain the result of a computation. One way to access all amplitudes of an $n$-qubit quantum state is to execute a full quantum state tomography to retrieve the $2^n \times 2^n$ entries of the corresponding density matrix. Again, even a single amplitude cannot be ‘read out’ super-efficiently in general, since it determines the probability of a measurement result, which can be exponentially small in the number of qubits. Many proposals for approximate tomography have been made, some of which even consider methods of machine learning [112, 132].

But to remain efficient regarding the number of qubits, one has to apply clever tricks to encode the desired outcome in statistical information that can be extracted by only a few measurements.

In quantum machine learning, this is often done by a swap test and its variations, which reveal the inner product of two quantum states.
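To illustrate how a few measurement statistics can carry the result, the following minimal sketch simulates the standard swap test classically. In that circuit a Hadamard gate, a controlled swap of the two registers and a second Hadamard gate are applied to an ancilla, which is then found in $|0\rangle$ with probability $\frac{1}{2} + \frac{1}{2}|\langle a|b\rangle|^2$, so the test yields the squared modulus of the inner product. The function names and the choice of example states are assumptions made for this illustration.

```python
import numpy as np

def swap_test_p0(a, b):
    """Probability of finding the ancilla in |0> at the end of a swap test on |a>, |b>."""
    a = np.asarray(a, dtype=complex)
    b = np.asarray(b, dtype=complex)
    ab = np.kron(a, b)
    ba = np.kron(b, a)                    # the two registers exchanged by the controlled swap
    branch0 = 0.5 * (ab + ba)             # ancilla-|0> component after the second Hadamard
    return min(1.0, float(np.vdot(branch0, branch0).real))

def estimate_overlap(a, b, shots, rng):
    """Estimate |<a|b>|^2 from the fraction of ancilla outcomes equal to 0."""
    zeros = rng.binomial(shots, swap_test_p0(a, b))
    return max(0.0, 2 * zeros / shots - 1)

a = np.array([1.0, 0.0])
b = np.array([1.0, 1.0]) / np.sqrt(2)
rng = np.random.default_rng(1)

print(swap_test_p0(a, b))                 # 1/2 + 1/2 * |<a|b>|^2
estimate = estimate_overlap(a, b, shots=20000, rng=rng)
print(estimate)                           # fluctuates around |<a|b>|^2 = 0.5
```

The statistical error of the estimate shrinks with the square root of the number of repetitions and does not depend on the dimension of the two states, which is why the swap test remains efficient in the number of qubits.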

In the document Quantum machine learning for supervised pattern recognition. (Pages 96-99)