A.3 Quantum algorithm
less than or equal to $\hat{K}_{\mathrm{NTK}}(1-\epsilon_0)$ in magnitude. There exists sufficiently large data dimension $d$ of order $O(\log n)$ such that the inner product $(\tilde{k}_{\mathrm{NTK}})_*^T y$ approximates $(k_{\mathrm{NTK}})_*^T y$ up to $O(1/\mathrm{poly}\, n)$ error.
Proof. Per Assumption A.1.3, there exists a constant $\epsilon$ such that for all $i$ in the $\epsilon$-neighborhood $N = \{i : 1 - x_i \cdot x_* \le \epsilon\}$, we have $y_i = y_*$ with high probability. Moreover, since the data is approximately uniformly distributed within the $\epsilon$-neighborhood, $|N|$ can be approximated to be proportional to the area of a spherical cap on a $d$-dimensional sphere.
We now consider a similar argument for $\epsilon_0$. Note that if $\epsilon_0$ is sufficiently small that no examples in the training set satisfy $x_* \cdot x_i \ge 1 - \epsilon_0$, then no truncation occurs and the theorem is trivially satisfied. Truncation occurs beyond magnitude $\hat{K}_{\mathrm{NTK}}(1-\epsilon_0)$. Since $0 < \epsilon_0 < \epsilon < 1$, the error $\xi$ introduced by truncation is given by the excess magnitude within the $\epsilon_0$-neighborhood $N_0$, i.e. $\xi \sim \sum_{i \in N_0} \left[ \frac{K_{\mathrm{NTK}}(x_i, x_*)}{\hat{K}_{\mathrm{NTK}}(1-\epsilon_0)} - 1 \right]$. Applying the matrix element bounds of Theorem A.1.8 and Corollary A.1.5, we have
$$\sum_{i \in N_0} \frac{K_{\mathrm{NTK}}(x_i, x_*)}{\hat{K}_{\mathrm{NTK}}(1-\epsilon_0)} \le |N_0| \cdot \frac{\hat{K}_{\mathrm{NTK}}(1-\delta)}{\hat{K}_{\mathrm{NTK}}(1-\epsilon_0)} \le O(1)\, |N_0| \cdot \frac{1/n^2}{(1-\epsilon_0)(\delta/n)^{O(1)}}. \tag{A.27}$$
Since $\delta = \Omega(1/\mathrm{poly}\, n)$, the error is upper-bounded by a term scaling like $|N_0| \cdot n^{O(1)}$. As argued for $N$, the size $|N_0|$ is well-approximated by the area of a spherical cap. Writing such an area in terms of regularized beta functions, we find fractional error
$$\frac{|N_0|}{|N|} \cdot n^{O(1)} \approx \frac{I_{1-(1-\epsilon_0)^2}\!\left((d-1)/2,\, 1/2\right)}{I_{1-(1-\epsilon)^2}\!\left((d-1)/2,\, 1/2\right)} \cdot n^{O(1)} \le \left(\frac{\epsilon_0}{2}\right)^{(d-1)/2} \cdot n^{O(1)} \tag{A.28}$$
$$\le 2^{-\frac{d-1}{2}\log_2(2/\epsilon_0)} \cdot n^{O(1)}. \tag{A.29}$$
Hence, for sufficiently large data dimension $d$ of size $O(\log n)$, the error introduced by clipping $(k_{\mathrm{NTK}})_*$ will be suppressed by small $\epsilon_0$.
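As a quick numerical sanity check of the suppression in Eqs. (A.28)–(A.29), the spherical-cap ratio can be evaluated with the regularized incomplete beta function. This is a minimal sketch assuming NumPy/SciPy; the values of $\epsilon_0$, $\epsilon$, and the prefactor relating $d$ to $\log n$ are illustrative choices, not taken from the text.

```python
# Sketch: evaluate the spherical-cap area ratio |N_0|/|N| of Eq. (A.28)
# via the regularized incomplete beta function I_x(a, b) = betainc(a, b, x),
# and compare with the (eps0/2)^((d-1)/2) envelope (which holds up to the
# n^{O(1)} slack carried through the text).
import numpy as np
from scipy.special import betainc

def cap_ratio(eps0, eps, d):
    """Ratio of spherical-cap areas for the eps0- and eps-neighborhoods."""
    a = (d - 1) / 2
    return betainc(a, 0.5, 1 - (1 - eps0) ** 2) / betainc(a, 0.5, 1 - (1 - eps) ** 2)

eps0, eps = 0.01, 0.3                      # illustrative: 0 < eps0 < eps < 1
for n in [10**2, 10**4, 10**6]:
    d = 8 * int(np.ceil(np.log(n)))        # d = O(log n); prefactor illustrative
    print(f"n={n:>8} d={d:>4}  ratio={cap_ratio(eps0, eps, d):.3e}"
          f"  envelope={(eps0 / 2) ** ((d - 1) / 2):.3e}")
```

Once $d$ grows logarithmically in $n$, the ratio decays superpolynomially, matching the claim that the clipping error is suppressed.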
Quantum random access memory
A key feature of attractive applications in quantum machine learning is achieving polylogarithmic dependence on training set size. However, the initial encoding of a training set trivially requires linear time, since each data example must be recorded once. To ensure that this linear overhead only occurs once, quantum random access memory (QRAM) can be used to prepare a classical data structure a single time and then efficiently read out data with quantum circuits in logarithmic time. We use the binary tree QRAM subroutine proposed by Kerenidis and Prakash [9] and applied commonly in quantum machine learning [10, 11]. The QRAM consists of a classical data structure that encodes a data matrix $S \in \mathbb{R}^{n \times d}$ with efficient quantum access.
Definition A.3.1 (Quantum access). Let $|S_i\rangle = \frac{1}{\|S_i\|} \sum_{j=0}^{d-1} S_{ij} |j\rangle$ denote the amplitude encoding of the $i$th row of data $S \in \mathbb{R}^{n \times d}$. Quantum access provides the mappings
• $|i\rangle|0\rangle \mapsto |i\rangle|S_i\rangle$
• $|0\rangle \mapsto \frac{1}{\|S\|_F} \sum_i \|S_i\|\, |i\rangle$
in time $T$ for $i \in [n]$.
The QRAM by Kerenidis and Prakash [9] provides quantum access in time $T$ that is polylogarithmic in both $n$ and $d$.
Theorem A.3.1 (QRAM). For $S \in \mathbb{R}^{n \times d}$, there exists a data structure that stores $S$ such that the time to insert, update or delete entry $S_{ij}$ is $O(\log^2(n))$. Moreover, a quantum algorithm with access to the data structure provides quantum access in time $O(\mathrm{polylog}(nd))$.
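To make the classical side of Theorem A.3.1 concrete, the following is a minimal sketch (standard-library Python only) of the binary-tree layout of Kerenidis and Prakash: leaves store squared entries $S_{ij}^2$ with signs kept separately, internal nodes store sums of their children, and an outer tree plays the same role over the row norms $\|S_i\|^2$. An update to one entry touches only the $O(\log d) + O(\log n)$ nodes on two root-to-leaf paths, consistent with the polylogarithmic update cost; the quantum access itself is not modeled here.

```python
# Classical sketch of the Kerenidis-Prakash binary-tree QRAM layout.
import math

class BinarySumTree:
    def __init__(self, size):
        self.size = 1 << math.ceil(math.log2(max(size, 1)))
        self.node = [0.0] * (2 * self.size)  # node[1] is the root

    def update(self, j, squared_value):
        """Set leaf j and refresh the O(log size) ancestors above it."""
        idx = self.size + j
        self.node[idx] = squared_value
        idx //= 2
        while idx >= 1:
            self.node[idx] = self.node[2 * idx] + self.node[2 * idx + 1]
            idx //= 2

    def root(self):
        return self.node[1]

class QRAMSketch:
    """One tree per row over squared entries, plus one tree over row norms."""
    def __init__(self, n, d):
        self.rows = [BinarySumTree(d) for _ in range(n)]
        self.norms = BinarySumTree(n)
        self.sign = {}

    def update_entry(self, i, j, value):
        # O(log d) work in the row tree + O(log n) in the norm tree.
        self.sign[(i, j)] = 1.0 if value >= 0 else -1.0
        self.rows[i].update(j, value * value)
        self.norms.update(i, self.rows[i].root())

qram = QRAMSketch(n=4, d=4)
qram.update_entry(0, 2, -0.8)
qram.update_entry(1, 0, 0.6)
print(qram.norms.root())  # ||S||_F^2 = 0.64 + 0.36 = 1.0
```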
Because the mapping $|i\rangle|0\rangle \mapsto |i\rangle|S_i\rangle$ is efficient, we can prepare a uniform superposition $\frac{1}{\sqrt{n}}\sum_{i=0}^{n-1}|i\rangle|0\rangle \mapsto \frac{1}{\sqrt{n}}\sum_{i=0}^{n-1}|i\rangle|S_i\rangle$ of the entire dataset. While preparing an arbitrary superposition is difficult, a uniform superposition is achieved with a constant-depth quantum circuit by applying Hadamard gates to all qubits. Hence, after a single $O(n)$ operation to prepare the data structure in QRAM, the dataset can be efficiently accessed by a quantum computer.
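As a small statevector illustration (NumPy assumed; three qubits chosen arbitrarily), one layer of Hadamards produces the uniform superposition over the index register in a single round of gates:

```python
# Check that H^{⊗q} |0...0> is the uniform superposition used to query QRAM.
import numpy as np

q = 3                                   # index register of q = log2(n) qubits
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
layer = H
for _ in range(q - 1):
    layer = np.kron(layer, H)           # H^{⊗q}, one gate per qubit in depth 1
state = np.zeros(2 ** q); state[0] = 1.0
print(layer @ state)                    # all amplitudes equal 1/sqrt(8)
```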
In our application of QRAM, we need to prepare the states $|x\rangle = \frac{1}{\sqrt{n}}\sum_{i=0}^{n-1}|x_i\rangle$ and $|y\rangle = \frac{1}{\sqrt{n}}\sum_{i=0}^{n-1} y_i|i\rangle$. For the state $|x\rangle$, Assumption A.1.4 ensures that $\|x_i\| = 1$, allowing $|x\rangle$ to be directly prepared. For the labels $y_i$, the classification problem ensures a known normalization factor $\sqrt{n}$.

Preparation of kernel states
To evaluate the neural network's prediction under the approximation $k_*^T y$, the NTK must be evaluated between a test data point $x_*$ and the entire training set $\{x_i\}$. In particular, we prepare the quantum state $|k_*\rangle = \sum_{i=0}^{n-1} |i\rangle|k_i\rangle$, where $k_i$ corresponds to an encoding of the kernel elements $K_{\mathrm{NTK}}(x_*, x_i)$ up to error $\xi$.
Since the NTK is only a function of the inner product $\rho_i = x_* \cdot x_i$, we can use previous work on inner product estimation [10] to construct the kernel elements. By preparing this inner product in a quantum register, the NTK, which is efficient to compute classically on a single pair of data points since $\delta = \Omega(1/\mathrm{poly}\, n)$, can be efficiently evaluated between the test data point and the entire training dataset. However, we first need the well-known subroutines of amplitude estimation [12] and median evaluation [13], as well as a basic translation from bitstring representations to amplitudes.
Lemma A.3.2 (Amplitude estimation). Consider a quantum algorithm $\mathcal{A} : |0\rangle \mapsto \sqrt{p}\,|v, 1\rangle + \sqrt{1-p}\,|g, 0\rangle$ for some garbage state $|g\rangle$. For any positive integer $P$, amplitude estimation outputs $\tilde{p} \in [0, 1]$ such that
$$|\tilde{p} - p| \le 2\pi \frac{\sqrt{p(1-p)}}{P} + \left(\frac{\pi}{P}\right)^2 \tag{A.30}$$
with probability at least $8/\pi^2$ using $P$ iterations of the algorithm $\mathcal{A}$. If $p = 0$, then $\tilde{p} = 0$ with certainty, and similarly for $p = 1$.
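The grid structure behind Eq. (A.30) can be illustrated classically. In the sketch below (NumPy assumed), we model the favorable phase-estimation outcome, which occurs with probability at least $8/\pi^2$: the output is the grid value $\sin^2(\pi m/P)$ nearest to $\theta = \arcsin\sqrt{p}$, and the resulting error always sits inside the bound.

```python
# Classical model of the amplitude-estimation grid of Lemma A.3.2.
import numpy as np

def ae_grid_estimate(p, P):
    theta = np.arcsin(np.sqrt(p))
    m = np.round(theta * P / np.pi)       # nearest phase-estimation grid point
    return np.sin(np.pi * m / P) ** 2

rng = np.random.default_rng(0)
P = 128
for p in rng.uniform(0, 1, 5):
    p_tilde = ae_grid_estimate(p, P)
    bound = 2 * np.pi * np.sqrt(p * (1 - p)) / P + (np.pi / P) ** 2
    assert abs(p_tilde - p) <= bound      # Eq. (A.30) holds for this outcome
    print(f"p={p:.4f}  p~={p_tilde:.4f}  |err|={abs(p_tilde - p):.2e}  bound={bound:.2e}")
```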
Lemma A.3.3 (Median evaluation). Consider a unitary $U : |0^{\otimes m}\rangle \mapsto \sqrt{\alpha}\,|v, 1\rangle + \sqrt{1-\alpha}\,|g, 0\rangle$ for some $1/2 < \alpha \le 1$ in time $T$. Then there exists a quantum algorithm that, for any $\Delta > 0$ and for any $1/2 < \alpha_0 \le \alpha$, produces a state $|\psi\rangle$ such that $\| |\psi\rangle - |0^{\otimes mL}\rangle|x\rangle \| \le \sqrt{2\Delta}$ for some integer $L$, in time
$$2T \left\lceil \frac{\log(1/\Delta)}{2(|\alpha_0| - 1/2)^2} \right\rceil. \tag{A.31}$$
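Classically, the repetition count in Eq. (A.31) is the familiar Chernoff-Hoeffding majority-vote bound: if each run succeeds with probability $\alpha > 1/2$, a median (majority) over $L = \lceil \log(1/\Delta)/(2(\alpha_0 - 1/2)^2) \rceil$ runs fails with probability at most $\Delta$. A minimal simulation (NumPy assumed; the parameter values are illustrative):

```python
# Majority-vote repetition count behind Lemma A.3.3.
import math
import numpy as np

def repetitions(delta, alpha0):
    return math.ceil(math.log(1 / delta) / (2 * (alpha0 - 0.5) ** 2))

alpha, delta = 0.81, 1e-3        # 8/pi^2 ~ 0.81, as produced by Lemma A.3.2
L = repetitions(delta, alpha0=0.8)
rng = np.random.default_rng(1)
trials = 20000
runs = rng.random((trials, L)) < alpha    # True = an individual run succeeded
majority_ok = runs.sum(axis=1) > L / 2
print(f"L = {L}, empirical failure rate = {1 - majority_ok.mean():.2e} <= {delta}")
```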
Lemma A.3.4 (Amplitude encoding). Given the state $\frac{1}{\sqrt{n}}\sum_{i=0}^{n-1}|i\rangle|k_i\rangle$ with $0 \le k_i \le 1$, the state $\frac{1}{\sqrt{P}}\sum_{i=0}^{n-1} k_i |i\rangle$ may be prepared in time $O(n/P)$ with $P = \sum_{i=0}^{n-1} k_i^2$.

Proof. We consider a single element $|k_i\rangle$ in the superposition $\frac{1}{\sqrt{n}}\sum_{i=0}^{n-1}|i\rangle|k_i\rangle$. Adding an ancilla to perform the map $|k_i\rangle|0\rangle \mapsto |k_i\rangle|\arccos k_i\rangle$, each bit of the binary expansion $|k_i\rangle|\arccos k_i\rangle = |k_i\rangle|b_1\rangle \cdots |b_m\rangle$ can be used as a rotation angle. Specifically, insert the ancilla $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ and apply $m$ controlled rotations $\exp(i b_j \sigma_z / 2^j)$ to obtain the state $|k_i\rangle|\arccos k_i\rangle\left(|k_i|\, |0\rangle + \sqrt{1 - k_i^2}\, |1\rangle\right)$. By including an additional rotation controlled on the sign of $k_i$, the state $k_i|0\rangle + \sqrt{1 - k_i^2}\,|1\rangle$ can be prepared. Applying the above in superposition, we have the state $\frac{1}{\sqrt{n}}\sum_{i=0}^{n-1}|i\rangle\left(k_i|0\rangle + \sqrt{1 - k_i^2}\,|1\rangle\right)$. Letting $P = \sum_{i=0}^{n-1} k_i^2$, post-selection of the ancilla on $|0\rangle$ succeeds with probability $P/n$ and gives $\frac{1}{\sqrt{P}}\sum_{i=0}^{n-1} k_i|i\rangle$ in time $O(n/P)$.
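The rotation-and-post-selection step of Lemma A.3.4 is easy to emulate on a statevector. The sketch below (NumPy assumed; the random $k_i$ are illustrative and include signs, as in the sign-controlled rotation of the proof) confirms the success probability $P/n$ and the resulting state $\frac{1}{\sqrt{P}}\sum_i k_i|i\rangle$:

```python
# Statevector sketch of the amplitude-encoding lemma.
import numpy as np

rng = np.random.default_rng(2)
n = 8
k = rng.uniform(-1, 1, n)                  # kernel values, |k_i| <= 1

# State on (index ⊗ ancilla): start in (1/sqrt(n)) sum_i |i>|0>.
psi = np.zeros((n, 2))
psi[:, 0] = 1 / np.sqrt(n)

# Controlled rotation |i>|0> -> |i>(k_i|0> + sqrt(1 - k_i^2)|1>).
psi[:, 1] = psi[:, 0] * np.sqrt(1 - k ** 2)
psi[:, 0] = psi[:, 0] * k

# Post-select the ancilla on |0>.
success_prob = np.sum(psi[:, 0] ** 2)      # equals P/n
post = psi[:, 0] / np.sqrt(success_prob)

P = np.sum(k ** 2)
assert np.allclose(success_prob, P / n)
assert np.allclose(post, k / np.sqrt(P))   # the target state of the lemma
print(f"success probability P/n = {success_prob:.3f}")
```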
Lemmas A.3.2 through A.3.4 provide the basic quantum computing background required. We may now prepare a quantum state corresponding to a superposition of $K_{\mathrm{NTK}}(x_*, x_i)$ for all $i$ in the training set.
Theorem A.3.5 (Kernel estimation). Let $S \in \mathbb{R}^{n \times d}$ be the training dataset of unit-norm vectors $\{x_i\}$ stored in the QRAM described in Theorem A.3.1. Consider the neural tangent kernel described in Eq. A.3 with coefficient of nonlinearity $\mu$. For a test data vector $x_* \in \mathbb{R}^d$ in QRAM and constant $\epsilon_0$, there exists a quantum algorithm that maps
$$\frac{1}{\sqrt{n}} \sum_{i=0}^{n-1} |i\rangle|0\rangle \mapsto \frac{1}{\sqrt{P}} \sum_{i=0}^{n-1} k_i |i\rangle. \tag{A.32}$$
Here, $k_i = \hat{K}_{\mathrm{NTK}}(\rho_i)/\hat{K}_{\mathrm{NTK}}(1-\epsilon_0)$ is restricted to $-1 \le k_i \le 1$, i.e. clipping all $|\hat{K}_{\mathrm{NTK}}(\rho_i)| > \hat{K}_{\mathrm{NTK}}(1-\epsilon_0)$. The state is prepared with error $|\rho_i - x_* \cdot x_i| \le \xi$ with probability $1 - 2\Delta$ in time $\tilde{O}(\mathrm{polylog}(nd) \log(1/\Delta)/\xi)$.
Proof. Since the NTK is only a function of the inner product $x_* \cdot x_i$, we can compute the kernel elements after estimating the inner product between the test data and training data, following a similar approach to Kerenidis et al. [10]. Consider the initial state $|i\rangle \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)|0\rangle$. Using the QRAM as an oracle controlled on the second register, we can in $O(\mathrm{polylog}(nd))$ time map $|i\rangle|0\rangle|0\rangle \mapsto |i\rangle|0\rangle|x_i\rangle$ and similarly $|i\rangle|1\rangle|0\rangle \mapsto |i\rangle|1\rangle|x_*\rangle$. (If $|x_*\rangle$ is not in QRAM, this operation only takes $O(d)$ time.) Applying a Hadamard gate on the second register, the state becomes
$$\frac{1}{2}|i\rangle\left(|0\rangle(|x_i\rangle + |x_*\rangle) + |1\rangle(|x_i\rangle - |x_*\rangle)\right). \tag{A.33}$$
Measuring the second qubit in the computational basis, the probability of obtaining the $|1\rangle$ state is $p_i = \frac{1}{2}(1 - \langle x_i | x_* \rangle)$ since the vectors are real-valued. Writing the state $|1\rangle(|x_i\rangle - |x_*\rangle)$ as $|v_i, 1\rangle$, we have the mapping
$$\mathcal{A} : |i\rangle|0\rangle \mapsto |i\rangle\left(\sqrt{p_i}\,|v_i, 1\rangle + \sqrt{1 - p_i}\,|g_i, 0\rangle\right), \tag{A.34}$$
where $|g_i\rangle$ is a garbage state. The runtime of $\mathcal{A}$ is $\tilde{O}(\mathrm{polylog}(nd))$.

Applying amplitude estimation with $\mathcal{A}$, we obtain a unitary $U$ that performs
$$U : |i\rangle|0\rangle \mapsto |i\rangle\left(\sqrt{\alpha}\,|\tilde{p}_i, g, 1\rangle + \sqrt{1-\alpha}\,|g', 0\rangle\right) \tag{A.35}$$
for garbage registers $g, g'$. By Lemma A.3.2, we have $|\tilde{p}_i - p_i| \le \xi$ and $8/\pi^2 \le \alpha \le 1$ after $O(1/\xi)$ iterations. At this point, we have runtime $\tilde{O}(\mathrm{polylog}(nd)/\xi)$.

Applying median evaluation, we finally obtain a state $|\psi_i\rangle$ such that $\| |\psi_i\rangle - |0\rangle^{\otimes L} |\tilde{p}_i, g\rangle \| \le \sqrt{2\Delta}$ in runtime $\tilde{O}(\mathrm{polylog}(nd) \log(1/\Delta)/\xi)$. Performing this entire procedure on the initial superposition $\sum_{i=0}^{n-1} |i\rangle \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)|0\rangle$, we obtain the final state $\sum_{i=0}^{n-1} |\psi_i\rangle$. Since $\left|\tilde{p}_i - \frac{1 - x_* \cdot x_i}{2}\right| \le \xi$, we can recover the inner product $x_* \cdot x_i$ as a quantum state. In general, there exists a unitary $V : \sum_x |x, 0\rangle \mapsto \sum_x |x, f(x)\rangle$ for any classical function $f$ with the same time complexity as $f$. Hence, we can choose $f$ to recover $x_* \cdot x_i \approx 1 - 2\tilde{p}_i$ up to $O(\xi)$ error with probability $1 - 2\Delta$. Because $\delta = \Omega(1/\mathrm{poly}\, n)$, evaluating the NTK between two data points takes time $O(\mathrm{polylog}(n)/\mu)$ given their inner product. Again evaluating the classical function, we obtain the state $\frac{1}{\sqrt{n}}\sum_{i=0}^{n-1}|i\rangle|k_i\rangle$, where $k_i$ has $O(\xi)$ error, in time $\tilde{O}(\mathrm{polylog}(nd) \log(1/\Delta)/\xi\mu)$.
Finally, we need to prepare the state $|k_*\rangle = \frac{1}{\sqrt{P}} \sum_{i=0}^{n-1} k_i |i\rangle$ for $P = \sum_i k_i^2$, where by Corollary A.1.11 we have $P = \Omega(n/\mathrm{polylog}\, n)$. Applying Lemma A.3.4, preparing $|k_*\rangle$ requires $O(n/P) = O(\mathrm{polylog}\, n)$ time, which is efficient in the training set size.
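As an end-to-end classical emulation of the amplitudes output by Theorem A.3.5 (NumPy assumed): since the NTK of Eq. A.3 is not restated in this section, the `ntk` function below is a hypothetical stand-in that, like the NTK, depends only on the inner product; the clipping by $\hat{K}_{\mathrm{NTK}}(1-\epsilon_0)$ and the normalization by $\sqrt{P}$ follow the theorem.

```python
# Classical emulation of the clipped, normalized kernel state |k_*>.
import numpy as np

def ntk(rho):
    # Hypothetical stand-in for K_NTK(rho); a function of rho only.
    return (rho * (np.pi - np.arccos(np.clip(rho, -1, 1)))) / np.pi

rng = np.random.default_rng(3)
n, d, eps0 = 16, 6, 0.05
X = rng.normal(size=(n, d)); X /= np.linalg.norm(X, axis=1, keepdims=True)
x_star = rng.normal(size=d); x_star /= np.linalg.norm(x_star)

rho = X @ x_star                            # inner products rho_i = x_* . x_i
cap = ntk(1 - eps0)                         # clipping threshold K(1 - eps0)
k = np.clip(ntk(rho) / cap, -1.0, 1.0)      # k_i, restricted to [-1, 1]

P = np.sum(k ** 2)
k_state = k / np.sqrt(P)                    # amplitudes of |k_*>
print("||k_state|| =", np.linalg.norm(k_state))
```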
Efficient readout
To estimate the sign of $(k_{\mathrm{NTK}})_*^T y$ given our quantum states $|k_*\rangle$ and $|y\rangle$, pairwise measurements can be performed up to $O(1/n)$ error. To estimate the sign of $\langle k_* | y \rangle$, we encode the relative phase of the states and perform an inner product estimation procedure such that $m$ measurements of the state give standard deviation $1/\sqrt{m}$ [14].

Lemma A.3.6 (Inner product estimation). Given states $|k_*\rangle, |y\rangle \in \mathbb{R}^n$, estimating $\langle k_* | y \rangle$ with $m$ measurements has standard deviation at most $1/\sqrt{m}$.
Proof. Prepare the initial state $\frac{1}{\sqrt{2}}(|0\rangle|k_*\rangle + |1\rangle|y\rangle)$. Applying a Hadamard gate to the first qubit, we obtain the state $\frac{1}{2}(|0\rangle(|k_*\rangle + |y\rangle) + |1\rangle(|k_*\rangle - |y\rangle))$. Measuring the first qubit, the probability of obtaining $|0\rangle$ is $p = \frac{1}{2}(1 + \langle k_* | y \rangle)$. The binomial distribution over $m$ trials given this probability has variance $mp(1-p)$, and thus the variance of the estimate of $p$ is $p(1-p)/m$. Transforming to get the variance of the overlap estimate $\langle k_* | y \rangle$, we find variance $\frac{1 - \langle k_* | y \rangle^2}{m}$. Since $\langle k_* | y \rangle^2 \le 1$, the standard deviation is upper-bounded by $1/\sqrt{m}$.
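A direct sampling check of Lemma A.3.6 (NumPy assumed; random unit vectors stand in for $|k_*\rangle$ and $|y\rangle$): after the Hadamard, the first qubit reads $|0\rangle$ with probability $p = \frac{1}{2}(1 + \langle k_*|y\rangle)$, so $2\hat{p} - 1$ estimates the overlap to within roughly $1/\sqrt{m}$.

```python
# Sampling sketch of the overlap-estimation procedure.
import numpy as np

rng = np.random.default_rng(4)
n, m = 32, 10000
k = rng.normal(size=n); k /= np.linalg.norm(k)
y = rng.normal(size=n); y /= np.linalg.norm(y)

p = 0.5 * (1 + k @ y)                       # Pr[first qubit = |0>]
samples = rng.random(m) < p                 # m computational-basis shots
overlap_hat = 2 * samples.mean() - 1

print(f"true overlap  = {k @ y:+.4f}")
print(f"estimate      = {overlap_hat:+.4f}")
print(f"1/sqrt(m) cap = {1 / np.sqrt(m):.4f}")
```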
To ensure that the states have sufficient overlap for efficient readout, we provide the following result based on the data assumptions.
Theorem A.3.7 (Efficient readout). Given states $|k_*\rangle$ and $|y\rangle$, a test example $x_*$ can be classified up to $O(1/n)$ error with a number of measurements logarithmic in $n$.
Proof. Under the identity matrix approximation of $K_{\mathrm{NTK}}$, the prediction of $y_*$ is given by the sign of $k_*^T y$. As stated in Corollary A.2.7, this approximation is correct up to $O(1/n)$ error. By Theorem A.2.8, clipping the kernel evaluations $k_* \to \tilde{k}_*$ with a $1 - \epsilon_0$ threshold only contributes $O(1/n)$ error for sufficiently high-dimensional data and sufficiently small $\epsilon_0$. The inner product $\tilde{k}_*^T y$ is given by
$$\tilde{k}_*^T y = \langle k_* | y \rangle \cdot \sqrt{Pn}\, \frac{\hat{K}_{\mathrm{NTK}}(1 - \epsilon_0)}{(K_{\mathrm{NTK}})_{11}}. \tag{A.36}$$
To show that $\langle k_* | y \rangle$ can be efficiently estimated by Lemma A.3.6, we must show that the overlap is at least $\Omega(1/\mathrm{polylog}\, n)$. This is seen by breaking down the states into positive and negative kernels and labels: $|k_*^+\rangle, |k_*^-\rangle, |y^+\rangle, |y^-\rangle$. The presence of an $\epsilon$-neighborhood implies that at least one of $|\langle k_*^+ | y^+ \rangle|^2$ or $|\langle k_*^+ | y^- \rangle|^2$ is at least $\Omega(1/\mathrm{polylog}\, n)$. By Corollary A.1.11, the normalization factor $P^+$ corresponding to $|k^+\rangle$ is upper-bounded by $n$, since we are summing over $O(n)$ terms each of magnitude at most $1$. By our data assumptions, $y_i = y_*$ with high probability in $N = \{i : x_* \cdot x_i \ge 1 - \epsilon\}$. Since $\epsilon$ is a constant and examples are sampled i.i.d., $|N| = \Theta(n)$. By Lemma A.1.10 we have $k_i^+ \ge k_0$ for some $k_0 = \Omega(1/\mathrm{polylog}\, n)$. Hence, we have for one of $y_i^+$ or $y_i^-$ (whichever corresponds to the majority label of the neighborhood $N$) that
$$\left| \frac{1}{\sqrt{P^+}\sqrt{n}} \sum_{i \in N} k_i^+ y_i \right| = \frac{1}{\sqrt{P^+}\sqrt{n}} \sum_{i \in N} k_i^+ \ge \frac{O(|N|)\, k_0}{\sqrt{P^+}\sqrt{n}} \ge O(1)\, k_0 \sim \Omega(1/\mathrm{polylog}\, n). \tag{A.37}$$
Thus, we are guaranteed at least one efficiently measurable quantity between $|\langle k_*^+ | y^+ \rangle|^2$ and $|\langle k_*^+ | y^- \rangle|^2$. We can expand $\langle k_* | y \rangle$ in terms of the positive and negative components
$$\langle k_* | y \rangle = \frac{\sqrt{P^+}}{\sqrt{(P^+ + P^-)(|Y^+| + |Y^-|)}} \left( \sqrt{|Y^+|}\, |\langle k_*^+ | y^+ \rangle| - \sqrt{|Y^-|}\, |\langle k_*^+ | y^- \rangle| - \sqrt{\frac{|Y^+| P^-}{P^+}}\, |\langle k_*^- | y^+ \rangle| + \sqrt{\frac{|Y^-| P^-}{P^+}}\, |\langle k_*^- | y^- \rangle| \right), \tag{A.38}$$
where $Y^+ = \{i : y_i = +1\}$ and similarly for $Y^-$. Since at least one of the terms has $\Omega(1/\mathrm{polylog}\, n)$ size and the probability of the non-negligible terms fully cancelling is vanishingly small with $\Theta(n)$ labels assigned to both $+1$ and $-1$, the inner product $\langle k_* | y \rangle$ will be of at least $\Omega(1/\mathrm{polylog}\, n)$ size. Hence, we can apply Lemma A.3.6 to perform a polylogarithmic number of measurements and recover the sign of $k_*^T y$ up to $O(1/n)$ error by Eq. A.36.
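The decomposition in Eq. (A.38) is an exact identity, which can be confirmed numerically. In this sketch (NumPy assumed; synthetic $k_i$ and labels stand in for the kernel and label states), splitting into positive and negative components and recombining the four overlaps reproduces $\langle k_* | y \rangle$:

```python
# Numerical check of the positive/negative decomposition of Eq. (A.38).
import numpy as np

rng = np.random.default_rng(5)
n = 64
k = rng.uniform(-1, 1, n)                    # clipped kernel values k_i
y = rng.choice([-1.0, 1.0], n)               # labels +/-1

Pp, Pm = np.sum(k[k > 0] ** 2), np.sum(k[k < 0] ** 2)
Yp, Ym = np.sum(y > 0), np.sum(y < 0)

# Normalized positive/negative component states (zero-padded to length n).
kp = np.where(k > 0, k, 0) / np.sqrt(Pp)
km = np.where(k < 0, -k, 0) / np.sqrt(Pm)
yp = np.where(y > 0, 1.0, 0) / np.sqrt(Yp)
ym = np.where(y < 0, 1.0, 0) / np.sqrt(Ym)

lhs = (k / np.sqrt(Pp + Pm)) @ (y / np.sqrt(Yp + Ym))       # <k_*|y>
rhs = np.sqrt(Pp / ((Pp + Pm) * (Yp + Ym))) * (
    np.sqrt(Yp) * np.abs(kp @ yp) - np.sqrt(Ym) * np.abs(kp @ ym)
    - np.sqrt(Yp * Pm / Pp) * np.abs(km @ yp)
    + np.sqrt(Ym * Pm / Pp) * np.abs(km @ ym)
)
assert np.isclose(lhs, rhs)
print(lhs, rhs)   # agree up to floating-point error
```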