Noise-induced barren plateaus in variational quantum algorithms | Nature Communications
Results

General framework

In this work we analyze a general class of parameterized ansatzes U(θ) that can be expressed as a product of L unitaries sequentially applied by layers

$$U({\boldsymbol{\theta}})={U}_{L}({\boldsymbol{\theta}}_{L})\cdots {U}_{2}({\boldsymbol{\theta}}_{2})\,{U}_{1}({\boldsymbol{\theta}}_{1})\,.$$ (1)

Here \({\boldsymbol{\theta}}={\{{\boldsymbol{\theta}}_{l}\}}_{l=1}^{L}\) is a set of vectors of continuous parameters that are optimized to minimize a cost function C, which can be expressed as the expectation value of an operator O:

$$C={\rm{Tr}}\big[O\,U({\boldsymbol{\theta}})\,\rho\,{U}^{\dagger}({\boldsymbol{\theta}})\big]\,.$$ (2)

As shown in Fig. 2, ρ is an n-qubit input state. Without loss of generality we assume that each U_l(θ_l) is given by

$${U}_{l}({\boldsymbol{\theta}}_{l})=\prod_{m}{e}^{-i{\theta}_{lm}{H}_{lm}}\,{W}_{lm}\,,$$ (3)

where the H_lm are Hermitian operators, θ_l = {θ_lm} are continuous parameters, and the W_lm denote unparametrized gates. We expand H_lm and O in the Pauli basis as

$${H}_{lm}={\eta}_{lm}\cdot {\sigma}_{n}=\sum_{i}{\eta}_{lm}^{i}\,{\sigma}_{n}^{i}\,,\quad O=\omega \cdot {\sigma}_{n}=\sum_{i}{\omega}^{i}\,{\sigma}_{n}^{i}\,,$$ (4)

where \({\sigma}_{n}^{i}\in {\{{\mathbb{1}},X,Y,Z\}}^{\otimes n}\) are Pauli strings, and \({\eta}_{lm}\) and \(\omega\) are real-valued vectors that specify the terms present in the expansion. Defining \({N}_{lm}={\Vert {\eta}_{lm}\Vert}_{0}\) and \({N}_{O}={\Vert \omega \Vert}_{0}\) as the numbers of non-zero elements, i.e., the numbers of terms in the summations in Eq. (4), we say that H_lm and O admit an efficient Pauli decomposition if \({N}_{lm},{N}_{O}\in {\mathcal{O}}({\rm{poly}}(n))\), respectively.

Fig. 2: Setting for our analysis. An n-qubit input state ρ is sent through a variational ansatz U(θ) composed of L unitary layers U_l(θ_l) sequentially acting according to Eq. (1). Here, \({\mathcal{U}}_{l}\) denotes the quantum channel that implements the unitary U_l(θ_l). The parameters in the ansatz \({\boldsymbol{\theta}}={\{{\boldsymbol{\theta}}_{l}\}}_{l=1}^{L}\) are trained to minimize a cost function that is expressed as the expectation value of an operator O as in Eq. (2). We consider a noise model where local Pauli noise channels \({\mathcal{N}}_{j}\) act on each qubit j before and after each unitary.
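To make the notation of Eq. (4) concrete, the short NumPy sketch below (our own illustration, not code from the paper) decomposes a small two-qubit operator O into Pauli strings and counts the number of non-zero coefficients N_O; the helper names pauli_string and pauli_decompose are hypothetical.

```python
import itertools
import numpy as np

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = {"I": I2, "X": X, "Y": Y, "Z": Z}

def pauli_string(label):
    """Tensor product of single-qubit Paulis, e.g. 'ZZ' -> Z (x) Z."""
    mat = np.array([[1.0 + 0j]])
    for p in label:
        mat = np.kron(mat, PAULIS[p])
    return mat

def pauli_decompose(O, n, tol=1e-12):
    """Return {label: omega_i} with omega_i = Tr[sigma_i O] / 2^n (real for Hermitian O)."""
    coeffs = {}
    for chars in itertools.product("IXYZ", repeat=n):
        label = "".join(chars)
        w = np.trace(pauli_string(label) @ O).real / 2**n
        if abs(w) > tol:
            coeffs[label] = w
    return coeffs

# Example: O = Z (x) Z + 0.5 X (x) 1 on n = 2 qubits.
n = 2
O = pauli_string("ZZ") + 0.5 * pauli_string("XI")
omega = pauli_decompose(O, n)
N_O = len(omega)               # number of non-zero Pauli terms, N_O in Eq. (4)
print(omega, N_O)              # e.g. {'XI': 0.5, 'ZZ': 1.0} and N_O = 2
```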
We now briefly discuss how the QAOA, UCC, and Hardware Efficient ansatzes fit into this general framework; we refer the reader to the Methods for additional details. In the QAOA one sequentially alternates the action of two unitaries as

$$U({\boldsymbol{\gamma}},{\boldsymbol{\beta}})={e}^{-i{\beta}_{p}{H}_{M}}{e}^{-i{\gamma}_{p}{H}_{P}}\cdots {e}^{-i{\beta}_{1}{H}_{M}}{e}^{-i{\gamma}_{1}{H}_{P}}\,,$$ (5)

where H_P and H_M are the so-called problem and mixer Hamiltonians, respectively. We define N_P (N_M) as the number of terms in the Pauli decomposition of H_P (H_M). Hardware Efficient ansatzes, on the other hand, naturally fit into Eqs. (1)–(3), as they are usually composed of fixed gates (e.g., controlled-NOTs) and parametrized gates (e.g., single-qubit rotations). Finally, as detailed in the Methods, the UCC ansatz can be expressed as

$$U({\boldsymbol{\theta}})=\prod_{lm}{U}_{lm}({\theta}_{lm})=\prod_{lm}{e}^{\,i{\theta}_{lm}\sum_{i}{\mu}_{lm}^{i}{\sigma}_{n}^{i}}\,,$$ (6)

where \({\mu}_{lm}^{i}\in \{0,\pm 1\}\) and the θ_lm are the coupled cluster amplitudes. Moreover, we denote by \({\widehat{N}}_{lm}={\Vert {\mu}_{lm}\Vert}_{0}\) the number of non-zero elements in \({\sum}_{i}{\mu}_{lm}^{i}{\sigma}_{n}^{i}\).

As shown in Fig. 2, we consider a noise model where local Pauli noise channels \({\mathcal{N}}_{j}\) act on each qubit j before and after each unitary U_l(θ_l). The action of \({\mathcal{N}}_{j}\) on a local Pauli operator σ ∈ {X, Y, Z} can be expressed as

$${\mathcal{N}}_{j}(\sigma )={q}_{\sigma}\,\sigma\,,$$ (7)

where −1 < q_σ < 1. We characterize the overall noise strength by a single parameter q satisfying |q_σ| ⩽ q < 1 for every σ, and we write \({\mathcal{N}}\) for the n-qubit channel obtained by applying \({\mathcal{N}}_{j}\) to every qubit j. The noisy cost function is then given by

$$\widetilde{C}={\rm{Tr}}\left[O\left({\mathcal{N}}\circ {{\mathcal{U}}}_{L}\circ \cdots \circ {\mathcal{N}}\circ {{\mathcal{U}}}_{1}\circ {\mathcal{N}}\right)(\rho )\right]\,.$$ (8)

General analytical results

There are some VQAs, such as the VQE 5 for chemistry and other physical systems, where it is important to accurately characterize the value of the cost function itself. We provide an important result below in Lemma 1 that quantitatively bounds the cost function, and we envision that this bound will be especially useful in the context of VQE. On the other hand, there are other VQAs, such as those for optimization 13,14,15,16, compiling 22,23,24, and linear systems 17,18, where the key goal is to learn the optimal parameters, and the precise value of the cost function is either not important or can be computed classically after learning the parameters. In this case one is primarily concerned with trainability, and hence the gradient is a key quantity of interest. These applications motivate our main result in Theorem 1, which bounds the magnitude of the gradient. We remark that trainability is of course also important for VQE, and hence Theorem 1 is also of interest for this application.

With this motivation in mind, we now present our main results. We first present our bound on the cost function, since one can view this as a phenomenon that naturally accompanies our main theorem. Namely, in the following lemma, we show that the noisy cost function concentrates around the corresponding value for the maximally mixed state.

Lemma 1. (Concentration of the cost function). Consider an L-layered ansatz of the form in Eq. (1). Suppose that local Pauli noise of the form of Eq. (7) with noise strength q acts before and after each layer as in Fig. 2. Then, for a cost function \(\widetilde{C}\) of the form in Eq. (8), the following bound holds:

$$\left|\widetilde{C}-\frac{1}{{2}^{n}}{\rm{Tr}}[O]\right|\ \leqslant\ G(n)\ {\left\Vert \rho -\frac{{\mathbb{1}}}{{2}^{n}}\right\Vert}_{1}\,,$$ (9)

where

$$G(n)={N}_{O}\,{\Vert \omega \Vert}_{\infty}\ {q}^{2L+2}\,.$$ (10)

Here \({\Vert \cdot \Vert}_{\infty}\) is the infinity norm, \({\Vert \cdot \Vert}_{1}\) is the trace norm, \(\omega\) is defined in Eq. (4), and \({N}_{O}={\Vert \omega \Vert}_{0}\) is the number of non-zero elements in the Pauli decomposition of O. This lemma implies that the cost landscape exponentially concentrates on the value \({\rm{Tr}}[O]/{2}^{n}\) for large n whenever the number of layers L scales linearly with the number of qubits.
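The following toy simulation (our own sketch, not the paper's code) illustrates Lemma 1 numerically: it evaluates the noisy cost of Eq. (8) for random three-qubit layers with single-qubit depolarizing noise, a particular instance of the Pauli noise in Eq. (7) with q_X = q_Y = q_Z = q, and shows that |C̃ − Tr[O]/2^n| shrinks as the number of layers L grows.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
d = 2**n
I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def embed(op, j):
    """Single-qubit operator acting on qubit j of the n-qubit register."""
    out = np.array([[1.0 + 0j]])
    for k in range(n):
        out = np.kron(out, op if k == j else I2)
    return out

def local_pauli_noise(rho, q):
    """N_j(sigma) = q*sigma for sigma in {X, Y, Z}, applied to every qubit j.
    This is a single-qubit depolarizing channel of strength p = 1 - q per qubit."""
    p = 1 - q
    for j in range(n):
        kraus = [np.sqrt(1 - 3 * p / 4) * embed(I2, j)] + \
                [np.sqrt(p / 4) * embed(P, j) for P in (X, Y, Z)]
        rho = sum(K @ rho @ K.conj().T for K in kraus)
    return rho

def random_layer():
    """Haar-ish random unitary, a stand-in for a generic layer U_l(theta_l)."""
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    Q, _ = np.linalg.qr(A)
    return Q

O = embed(Z, 0) @ embed(Z, 1)                  # observable Z_0 Z_1, so Tr[O]/2^n = 0
rho0 = np.zeros((d, d), dtype=complex)
rho0[0, 0] = 1.0                               # pure input state |0...0><0...0|
q = 0.95

for L in [1, 4, 8, 16]:
    rho = local_pauli_noise(rho0, q)           # noise before the first layer
    for _ in range(L):
        U = random_layer()
        rho = local_pauli_noise(U @ rho @ U.conj().T, q)
    C_noisy = np.trace(O @ rho).real
    print(L, abs(C_noisy - np.trace(O).real / d))   # concentration toward Tr[O]/2^n
```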
While this lemma has important applications on its own, particularly for VQE, it also provides intuition for the NIBP phenomenon, which we now state. Let \({\partial}_{lm}\widetilde{C}=\partial \widetilde{C}/\partial {\theta}_{lm}\) denote the partial derivative of the noisy cost function with respect to the m-th parameter that appears in the l-th layer of the ansatz, as in Eq. (3). For our main technical result, we upper bound \(|{\partial}_{lm}\widetilde{C}|\) as a function of L and n.

Theorem 1. (Upper bound on the partial derivative). Consider an L-layered ansatz as defined in Eq. (1). Let θ_lm denote the trainable parameter corresponding to the Hamiltonian H_lm in the unitary U_l(θ_l) appearing in the ansatz. Suppose that local Pauli noise of the form in Eq. (7) with noise parameter q acts before and after each layer as in Fig. 2. Then the following bound holds for the partial derivative of the noisy cost function:

$$\left|{\partial}_{lm}\widetilde{C}\right|\,\leqslant\,F(n)\,,$$ (11)

where

$$F(n)=\sqrt{8\,\ln 2}\ {N}_{O}\,{\Vert {H}_{lm}\Vert}_{\infty}\,{\Vert \omega \Vert}_{\infty}\,{n}^{1/2}\,{q}^{L+1}\,,$$ (12)

and \(\omega\) is defined in Eq. (4), with number of non-zero elements N_O.

Let us now consider the asymptotic scaling of the function F(n) in Eq. (12). Under standard assumptions, namely that O in Eq. (4) admits an efficient Pauli decomposition and that H_lm has bounded eigenvalues, F(n) decays exponentially in n if L grows linearly in n.

Corollary 1. (Noise-induced barren plateaus). Let \({N}_{lm},{N}_{O}\in {\mathcal{O}}({\rm{poly}}(n))\) and let \({\eta}_{lm}^{i},{\omega}^{j}\in {\mathcal{O}}({\rm{poly}}(n))\) for all i, j. Then the upper bound F(n) in Eq. (12) vanishes exponentially in n as

$$F(n)\in {\mathcal{O}}({2}^{-\alpha n})\,,$$ (13)

for some positive constant α if we have

$$L\in {\Omega}(n)\,.$$ (14)

The asymptotic scaling in Eq. (13) is independent of l and m, i.e., the scaling is blind to the layer, or the parameter within the layer, for which the derivative is taken. This corollary implies that when Eq. (14) holds, i.e., L grows at least linearly in n, the partial derivative \(|{\partial}_{lm}\widetilde{C}|\) vanishes exponentially in n across the entire cost landscape. In other words, one observes a Noise-Induced Barren Plateau (NIBP). We note that Eq. (13) holds for all q < 1, i.e., even arbitrarily weak noise leads to this exponential scaling.

In addition, Corollary 1 implies that NIBPs are conceptually different from noise-free barren plateaus. First, NIBPs are independent of the parameter initialization strategy or the locality of the cost function. Second, NIBPs exhibit exponential decay of the gradient itself, not just of the variance of the gradient, which is the hallmark of noise-free barren plateaus. Noise-free barren plateaus allow the global minimum to sit inside a deep, narrow valley in the landscape 34, whereas NIBPs flatten the entire landscape.
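As a numerical companion to Theorem 1 (again our own sketch rather than the paper's code), the snippet below builds a small hardware-efficient-style ansatz, surrounds every layer with the same depolarizing instance of Eq. (7), and estimates a partial derivative of the noisy cost by finite differences; the gradient magnitude shrinks rapidly as the number of layers grows, consistent with the q^{L+1} factor in Eq. (12).

```python
import numpy as np

n = 3
d = 2**n
I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
CZ = np.diag([1, 1, 1, -1]).astype(complex)

def embed(op, j):
    """Single-qubit operator acting on qubit j of the n-qubit register."""
    out = np.array([[1.0 + 0j]])
    for k in range(n):
        out = np.kron(out, op if k == j else I2)
    return out

def entangler():
    """Fixed layer of CZ gates on neighbouring qubits (the unparametrized W of Eq. (3))."""
    U = np.eye(d, dtype=complex)
    for j in range(n - 1):
        full = np.array([[1.0 + 0j]])
        for k in range(n):
            if k == j:
                full = np.kron(full, CZ)   # covers qubits j and j+1
            elif k != j + 1:
                full = np.kron(full, I2)
        U = full @ U
    return U

def noise(rho, q):
    """Per-qubit depolarizing channel: every non-identity Pauli is scaled by q."""
    p = 1 - q
    for j in range(n):
        kraus = [np.sqrt(1 - 3 * p / 4) * embed(I2, j)] + \
                [np.sqrt(p / 4) * embed(P, j) for P in (X, Y, Z)]
        rho = sum(K @ rho @ K.conj().T for K in kraus)
    return rho

def noisy_cost(thetas, q, O):
    """C~ of Eq. (8) for an ansatz of RY rotations followed by a CZ entangler per layer."""
    rho = np.zeros((d, d), dtype=complex)
    rho[0, 0] = 1.0                        # |0...0><0...0|
    rho = noise(rho, q)                    # noise before the first layer
    W = entangler()
    for layer in thetas:                   # thetas has shape (L, n)
        U = np.eye(d, dtype=complex)
        for j, th in enumerate(layer):     # RY(th) = cos(th/2) I - i sin(th/2) Y
            U = embed(np.cos(th / 2) * I2 - 1j * np.sin(th / 2) * Y, j) @ U
        rho = noise(W @ U @ rho @ U.conj().T @ W.conj().T, q)
    return np.trace(O @ rho).real

rng = np.random.default_rng(1)
O = embed(Z, 0) @ embed(Z, 1)              # observable Z_0 Z_1
q, eps = 0.9, 1e-4
for L in [2, 4, 8, 12]:
    thetas = rng.uniform(0, 2 * np.pi, size=(L, n))
    plus, minus = thetas.copy(), thetas.copy()
    plus[0, 0] += eps
    minus[0, 0] -= eps                     # derivative w.r.t. the first parameter of layer 1
    grad = (noisy_cost(plus, q, O) - noisy_cost(minus, q, O)) / (2 * eps)
    print(L, abs(grad))                    # shrinks rapidly with L
```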
One of the strategies to avoid noise-free barren plateaus is to correlate parameters, i.e., to make a subset of the parameters equal to each other 37. We generalize Theorem 1 in the following remark to accommodate such a setting, consequently showing that such correlated or degenerate parameters do not help in avoiding NIBPs. In this setting, the result we obtain in Eq. (16) below is essentially identical to that in Eq. (12), except with an additional factor quantifying the amount of degeneracy.

Remark 1. (Degenerate parameters). Consider the ansatz defined in Eqs. (1) and (3). Suppose there is a subset G_st of the set {θ_lm} in this ansatz such that G_st consists of g parameters that are degenerate:

$${G}_{st}=\left\{{\theta}_{lm}\ \big|\ {\theta}_{lm}={\theta}_{st}\right\}\,.$$ (15)

Here, θ_st denotes the parameter in G_st for which \({N}_{lm}{\Vert {\eta}_{lm}\Vert}_{\infty}\) takes the largest value in the set. (θ_st can also be thought of as a reference parameter to which all other parameters are set equal in value.) Then the partial derivative of the noisy cost with respect to θ_st is bounded as

$$\left|{\partial}_{st}\widetilde{C}\right|\,\leqslant\,\sqrt{8\,\ln 2}\ g\,{N}_{O}\,{\Vert {H}_{lm}\Vert}_{\infty}\,{\Vert \omega \Vert}_{\infty}\,{n}^{1/2}\,{q}^{L+1}\,,$$ (16)

at all points in the cost landscape.

Remark 1 is especially important in the context of the QAOA and the UCC ansatz, as discussed below. We note that, in the general case, a unitary of the form of Eq. (3) cannot be implemented as a single gate on a physical device. In practice one needs to compile the unitary into a sequence of native gates. Moreover, Hamiltonians with non-commuting terms are usually approximated with techniques such as Trotterization. This compilation overhead potentially leads to a sequence of gates that grows with n. Remark 1 enables us to account for such scenarios, and we elaborate on its relevance to specific applications in the next subsection.

In reality, noise on quantum hardware can be non-local. For instance, in certain systems one can have cross-talk errors or coherent errors. We address such extensions to our noise model in the following remark.

Remark 2. (Extensions to the noise model). Consider a modification to each layer of noise \({\mathcal{N}}\) in Eq. (8) to include additional k-local Pauli noise and correlated coherent (unitary) noise across multiple qubits. Under such extensions to the noise model, we obtain the same scaling results as those obtained in Lemma 1 and Theorem 1. We discuss this in more detail in Supplementary Note 5.

Finally, we present an extension of our main result to the case of measurement noise. Consider a model of measurement noise where each local measurement independently has some bit-flip probability given by (1 − q_M)/2, which we assume to be symmetric with respect to the 0 and 1 outcomes. This leads to an additional reduction of our bounds on the cost function and its gradient that depends on the locality of the observable O.

Proposition 1. (Measurement noise). Consider expanding the observable O as a sum of Pauli strings, as in Eq. (4). Let w denote the minimum weight of these strings, where the weight is defined as the number of non-identity elements in a given string. In addition to the noise process considered in Fig. 2, suppose there is also measurement noise consisting of a tensor product of local bit-flip channels with bit-flip probability (1 − q_M)/2. Then we have

$$\left|\widetilde{C}-\frac{1}{{2}^{n}}{\rm{Tr}}[O]\right|\,\leqslant\,{q}_{M}^{w}\ G(n)\ {\left\Vert \rho -\frac{{\mathbb{1}}}{{2}^{n}}\right\Vert}_{1}$$ (17)

and

$$\left|{\partial}_{lm}\widetilde{C}\right|\,\leqslant\,{q}_{M}^{w}\,F(n)\,,$$ (18)

where G(n) and F(n) are defined in Lemma 1 and Theorem 1, respectively.

Proposition 1 goes beyond the noise model considered in Theorem 1. It shows that in the presence of measurement noise there is an additional contribution from the locality of the measurement operator. It is interesting to draw a parallel between Proposition 1 and noise-free barren plateaus, which have been shown to be cost-function dependent and, in particular, to depend on the locality of the observable O 34. The bounds in Proposition 1 similarly depend on the locality of O. For example, when w = n, i.e., for global observables, the factor \({q}_{M}^{w}\) hastens the exponential decay. On the other hand, when w = 1, i.e., for local observables, the scaling is unaltered by measurement noise. In this sense, a global observable exacerbates the NIBP issue by making the decay more rapid with n.
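The q_M^w factor of Proposition 1 can be checked directly in a few lines (our own sketch, with illustrative parameter values): applying an independent bit-flip channel with flip probability (1 − q_M)/2 to every qubit rescales the expectation value of a weight-w Z-string by exactly q_M^w.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def kron_all(mats):
    out = np.array([[1.0 + 0j]])
    for m in mats:
        out = np.kron(out, m)
    return out

def bitflip_all(rho, n, q_M):
    """Independent bit-flip channel with flip probability (1 - q_M)/2 on each qubit."""
    pflip = (1 - q_M) / 2
    for j in range(n):
        Xj = kron_all([X if k == j else I2 for k in range(n)])
        rho = (1 - pflip) * rho + pflip * Xj @ rho @ Xj
    return rho

n, w, q_M = 4, 3, 0.9
rng = np.random.default_rng(2)
psi = rng.normal(size=2**n) + 1j * rng.normal(size=2**n)
psi /= np.linalg.norm(psi)
rho = np.outer(psi, psi.conj())                       # a generic pure state

O = kron_all([Z if j < w else I2 for j in range(n)])  # weight-w observable Z Z Z 1
ideal = np.trace(O @ rho).real
noisy = np.trace(O @ bitflip_all(rho, n, q_M)).real
print(noisy, q_M**w * ideal)                          # the two numbers agree
```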
Application-specific analytical results

Here we investigate the implications of our results from the previous subsection for two applications: optimization and chemistry. In particular, we derive explicit conditions for NIBPs for these applications. These conditions are derived in the setting where Trotterization is used, but other compilation strategies incur similar asymptotic behavior. We begin with the QAOA for optimization and then discuss the UCC ansatz for chemistry. Finally, we make a remark about the Hamiltonian Variational Ansatz (HVA), and we remark that our results also apply to a generalized cost function that can employ training data.

Corollary 2. (Example: QAOA). Consider the QAOA with 2p trainable parameters, as defined in Eq. (5). Suppose that the implementation of the unitaries corresponding to the problem Hamiltonian H_P and the mixer Hamiltonian H_M requires k_P- and k_M-depth circuits, respectively. If local Pauli noise of the form in Eq. (7) with noise parameter q acts before and after each layer of native gates, then we have

$$\left|{\partial}_{{\beta}_{l}}\widetilde{C}\right|\ \leqslant\ \sqrt{8\,\ln 2}\ {g}_{l,M}\,{N}_{P}\,{\Vert {H}_{M}\Vert}_{\infty}\,{\Vert \omega \Vert}_{\infty}\,{n}^{1/2}\,{q}^{({k}_{P}+{k}_{M})p+1}\,,$$ (19)

$$\left|{\partial}_{{\gamma}_{l}}\widetilde{C}\right|\ \leqslant\ \sqrt{8\,\ln 2}\ {g}_{l,P}\,{N}_{P}\,{\Vert {H}_{P}\Vert}_{\infty}\,{\Vert \omega \Vert}_{\infty}\,{n}^{1/2}\,{q}^{({k}_{P}+{k}_{M})p+1}\,,$$ (20)

for any choice of parameters β_l, γ_l, and where O = H_P in Eq. (2). Here g_{l,M} and g_{l,P} are the respective numbers of native gates parameterized by β_l and γ_l according to the compilation.

Corollary 2 follows from Remark 1 and has interesting implications for the trainability of the QAOA. From Eqs. (19) and (20), NIBPs are guaranteed if p k_P scales linearly in n. This can manifest itself in a number of ways, which we explain below.

First, we look at the depth k_P required to implement one application of the problem unitary. Graph problems containing vertices of extensive degree, such as the Sherrington-Kirkpatrick model, inherently require Ω(n)-depth circuits to implement 55. On the other hand, generic problems mapped to hardware topologies also have the potential to incur Ω(n) depth or greater in compilation cost. For instance, implementing MaxCut and k-SAT using SWAP networks on circuits with 1-D connectivity requires depth Ω(n) and Ω(n^{k−1}), respectively 15,63. Such mappings, with the aforementioned compilation overhead for k ⩾ 2, are guaranteed to encounter NIBPs even for a fixed number of rounds p.
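To make the preceding point concrete, here is a rough arithmetic sketch (our own, with assumed constants such as k_P = n and k_M = 1) of how the q^{(k_P+k_M)p+1} factor in Corollary 2 behaves for a SWAP-network compilation of MaxCut on a line: even at fixed p, the exponent grows linearly with n.

```python
q = 0.99      # per-gate-layer local noise parameter (illustrative)
p = 4         # fixed number of QAOA rounds
k_M = 1       # assumed mixer depth: one round of single-qubit rotations

for n in [5, 10, 20, 40, 80]:
    k_P = n   # assumed SWAP-network depth of one problem unitary, Theta(n) on a line
    exponent = (k_P + k_M) * p + 1
    print(n, q**exponent)   # the noise factor in Eqs. (19)-(20) shrinks exponentially in n
```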
Second, it appears that p values that grow at least slightly with n may be needed for quantum advantage in certain optimization problems (see, for example, 64,65,66,67). In addition, there are problems employing the QAOA that explicitly require p scaling as poly(n) 21,68. Thus, without even considering the compilation overhead for the problem unitary, these QAOA problems may run into NIBPs, particularly when aiming for quantum advantage. Moreover, weak growth of p with n combined with compilation overhead could still result in an NIBP. Finally, we note that above we have assumed that the contribution of k_P dominates that of k_M. However, it is possible that for a choice of more exotic mixers 16, k_M also needs to be carefully considered to avoid NIBPs.

Corollary 3. (Example: UCC). Let H denote a molecular Hamiltonian of a system of M_e electrons. Consider the UCC ansatz as defined in Eq. (6). If local Pauli noise of the form in Eq. (7) with noise parameter q acts before and after every U_lm(θ_lm) in Eq. (6), then we have

$$\left|{\partial}_{{\theta}_{lm}}\widetilde{C}\right|\,\leqslant\,\sqrt{8\,\ln 2}\ {\widehat{N}}_{lm}\,{N}_{H}\,{\Vert \omega \Vert}_{\infty}\ {n}^{1/2}\,{q}^{L+1}\,,$$ (21)

for any coupled cluster amplitude θ_lm, and where O = H in Eq. (2).

Corollary 3 allows us to make general statements about the trainability of the UCC ansatz. We present the details for the standard UCC ansatz with single and double excitations from occupied to virtual orbitals 50,69 (see Methods for more details). Let M_o denote the total number of spin orbitals. Then at least n = M_o qubits are required to simulate such a system, and the number of variational parameters grows as Ω(n^2 M_e^2) 63,70. To implement the UCC ansatz on a quantum computer, the excitation operators are first mapped to Pauli operators using the Jordan-Wigner or Bravyi-Kitaev mappings 71,72. Then, using first-order Trotterization and employing SWAP networks 63, the UCC ansatz can be implemented in Ω(n^2 M_e) depth, assuming 1-D connectivity of qubits 63. Hence, for the UCC ansatz approximated by single- and double-excitation operators, the upper bound in Eq. (21) (asymptotically) vanishes exponentially in n. To target strongly correlated states for molecular Hamiltonians, one can employ a UCC ansatz that includes additional, generalized excitations 56,73. An Ω(n^3)-depth circuit is required to implement the first-order Trotterized form of this ansatz 63, so NIBPs become more prominent for generalized UCC ansatzes. Finally, we remark that a sparse version of the UCC ansatz can be implemented in Ω(n) depth 63; NIBPs would still occur for such ansatzes.
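A similarly rough resource count (our own sketch; the prefactors are arbitrary placeholders, only the asymptotic scalings quoted above are taken from the text) shows how quickly the q^{L+1} factor of Corollary 3 collapses for a Trotterized UCCSD circuit whose depth grows as n^2 M_e.

```python
q = 0.999     # illustrative per-layer noise parameter

# spin orbitals (qubits) and electrons; prefactors of the scalings are set to 1
for n, M_e in [(8, 4), (16, 8), (32, 16)]:
    num_params = n**2 * M_e**2        # Omega(n^2 M_e^2) variational parameters
    depth = n**2 * M_e                # Omega(n^2 M_e) compiled UCCSD depth (unit prefactor)
    print(n, num_params, depth, q**(depth + 1))   # the q^(L+1) factor of Eq. (21)
```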
Additionally, we can make the following remark about the Hamiltonian Variational Ansatz (HVA). As argued in refs. 56,74,75, the HVA has the potential to be an effective ansatz for quantum many-body problems.

Remark 3. (Example: HVA). The HVA can be thought of as a generalization of the QAOA to more than two non-commuting Hamiltonians. It is remarked in ref. 57 that for problems of interest the number of rounds p scales linearly in n. Thus, considering this growth of p and also the potential growth of the compiled unitaries with n, the HVA has the potential to encounter NIBPs, by the same arguments made above for the QAOA (e.g., Corollary 2).

Remark 4. (Quantum Machine Learning). Our results can be extended to generalized cost functions of the form \({C}_{{\rm{train}}}={\sum}_{i}{\rm{Tr}}[{O}_{i}U({\boldsymbol{\theta}}){\rho}_{i}{U}^{\dagger}({\boldsymbol{\theta}})]\), where {O_i} is a set of operators, each of the form of Eq. (4), and {ρ_i} is a set of states. This can encapsulate certain quantum machine learning settings 58,59,60,61,62 that employ training data {ρ_i}. As an example of an instance where NIBPs can occur, one study 62 proposes an ansatz model that requires at least linear circuit depth in n.

Numerical simulations of the QAOA

To illustrate the NIBP phenomenon beyond the conditions assumed in our analytical results, we numerically implement the QAOA to solve MaxCut combinatorial optimization problems. We employ a realistic noise model obtained from gate-set tomography on the IBM Ourense superconducting-qubit device. In the Methods we provide additional details on the noise model and the optimization method employed.

Let us first recall that a MaxCut problem is specified by a graph G = (V, E) with nodes V and edges E. The goal is to partition the nodes of G into two sets that maximize the number of edges connecting nodes between the sets. Here, the QAOA problem Hamiltonian is given by

$${H}_{P}=-\frac{1}{2}\sum_{ij\in E}{C}_{ij}\left({\mathbb{1}}-{Z}_{i}{Z}_{j}\right)\,,$$ (22)

where the Z_i are local Pauli operators on qubit (node) i, C_ij = 1 if the nodes are connected, and C_ij = 0 otherwise. We analyze performance in two settings. First, we fix the problem size at n = 5 nodes (qubits) and vary the number of rounds p (Fig. 3). Second, we fix the number of rounds of QAOA at p = 4 and vary the problem size by increasing the number of nodes (Fig. 4).
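For concreteness, the following sketch (our own code, not the numerics of the paper) builds the MaxCut Hamiltonian of Eq. (22) for a random Erdős-Rényi graph and checks that the concentration value Tr[H_P]/2^n of Lemma 1 equals minus half the number of edges; it also computes the exact ground-state energy used below to define the approximation ratio.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
n = 5
edge_prob = 0.5
edges = [(i, j) for i, j in itertools.combinations(range(n), 2)
         if rng.random() < edge_prob]            # Erdos-Renyi G(n, 1/2) graph

I2 = np.eye(2)
Z = np.array([[1.0, 0.0], [0.0, -1.0]])

def zz(i, j):
    """The Pauli string Z_i Z_j as a 2^n x 2^n matrix."""
    out = np.array([[1.0]])
    for k in range(n):
        out = np.kron(out, Z if k in (i, j) else I2)
    return out

H_P = np.zeros((2**n, 2**n))
for i, j in edges:                               # Eq. (22) with C_ij = 1 on the edges
    H_P += -0.5 * (np.eye(2**n) - zz(i, j))

print(np.trace(H_P) / 2**n, -0.5 * len(edges))   # concentration value of Lemma 1
E_ground = np.linalg.eigvalsh(H_P).min()         # exact ground-state energy
print(E_ground)                                  # equals minus the MaxCut value of the graph
```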
Fig. 3: QAOA heuristics in the presence of realistic hardware noise: increasing number of rounds for fixed problem size. a The approximation ratio averaged over 100 random graphs of 5 nodes is plotted versus the number of rounds p. The black, green, and red curves respectively correspond to noise-free training, noisy training with noise-free final cost evaluation, and noisy training with noisy final cost evaluation. The performance of noise-free training increases with p, similar to the results in ref. 15. The green curve shows that the training process itself is hindered by noise, with the performance decreasing steadily with p for p > 4. The dotted blue lines correspond to known lower and upper bounds on classical performance in polynomial time: respectively, the performance guarantee of the Goemans-Williamson algorithm 77 and the boundary of known NP-hardness 78,79. b The deviation of the cost from \({\rm{Tr}}[{H}_{P}]/{2}^{n}\) (averaged over graphs and parameter values) is plotted versus p. As p increases, this deviation decays approximately exponentially with p (linear on the log scale). c The absolute value of the largest partial derivative, averaged over graphs and parameter values, is plotted versus p. The partial derivatives decay approximately exponentially with p, showing evidence of Noise-Induced Barren Plateaus (NIBPs).

Fig. 4: QAOA heuristics in the presence of realistic hardware noise: increasing problem size for a fixed number of rounds. The approximation ratio averaged over 60 random graphs with an increasing number of nodes n and a fixed number of rounds p = 4 is plotted. The black, green, and red curves respectively correspond to noise-free training, noisy training with noise-free final cost evaluation, and noisy training with noisy final cost evaluation. a For a problem size of 8 nodes or larger, the noisily-trained approximation ratio falls below the performance guarantee of the classical Goemans-Williamson algorithm. b The depth of the circuit (red curve) scales linearly with the number of qubits, confirming that we are in a regime where we would expect to observe Noise-Induced Barren Plateaus.

In order to quantify performance for a given n and p, we randomly generate 100 graphs according to the Erdős-Rényi model 76, such that each graph G is chosen uniformly at random from the set of all graphs of n nodes. For each graph we run 10 instances of the parameter optimization, and we select the run that achieves the smallest energy. At each optimization step the cost is estimated with 1000 shots. Performance is quantified by the average approximation ratio when training the QAOA in the presence and absence of noise. The approximation ratio is defined as the lowest energy obtained via optimization divided by the exact ground-state energy of H_P.
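The performance metric just defined amounts to the following short helper (our own sketch; the function name and arguments are illustrative):

```python
import numpy as np

def approximation_ratio(optimized_energy, H_P):
    """Lowest cost found during training divided by the exact ground-state energy of H_P."""
    e_ground = np.linalg.eigvalsh(H_P).min()
    return optimized_energy / e_ground

# e.g. approximation_ratio(-3.2, H_P) with H_P built as in the MaxCut sketch above;
# since every term of Eq. (22) is non-positive, both energies are non-positive and
# the ratio lies in [0, 1] for any graph with at least one edge.
```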
In our first setting we observe in Fig. 3a that when training in the absence of noise, the approximation ratio increases with p. However, when training in the presence of noise the performance decreases for p > 2. This result is in accordance with Lemma 1, as the cost function value concentrates around \({\rm{Tr}}[{H}_{P}]/{2}^{n}\) as p increases. This concentration phenomenon can also be seen clearly in Fig. 3b, where in fact we see evidence of exponential decay of the cost value with p. In addition, we can see the effect of NIBPs, as Fig. 3a also depicts the value of the approximation ratio computed without noise by utilizing the parameters obtained via noisy training. Note that evaluating the cost in a noise-free setting has practical meaning, since the classicality of the Hamiltonian allows one to compute the cost on a (noise-free) classical computer after training the parameters. For p > 4 this approximation ratio decreases, meaning that as p becomes larger it becomes increasingly hard to find a minimizing direction to navigate through the cost function landscape. Moreover, the effect of NIBPs is evident in Fig. 3c, where we depict the average absolute value of the largest cost function partial derivative (i.e., \(\mathop{\max}\nolimits_{lm}|{\partial}_{lm}\widetilde{C}|\)). This plot shows an exponential decay of the partial derivative with p, in accordance with Theorem 1. Finally, in Fig. 3a we contextualize our results with previously known two-sided bounds on classical polynomial-time performance. The lower bound corresponds to the performance guarantee of the classical Goemans-Williamson algorithm 77, whilst the upper bound is at the value 16/17, which is the approximation ratio beyond which MaxCut is known to be NP-hard 78,79.

In our second setting we find complementary results. In Fig. 4a we observe that at a problem size of 8 qubits or larger, 4 rounds of QAOA trained on the noisy circuit falls short of the performance guarantee of the classical Goemans-Williamson algorithm. As we increase the number of qubits, the depth of the circuit increases linearly (Fig. 4b), thus confirming that we are in a regime of NIBPs.

Our numerical results show that training the QAOA in the presence of a realistic noise model significantly affects the performance. The concentration of the cost and the NIBP phenomenon are both clearly visible in our data. Even though we observe performance for n = 5 and p = 4 that is NP-hard to achieve classically, any possible advantage would be lost for larger problem sizes or circuit depths due to this poor scaling. Hence, noise appears to be a crucial factor to account for when attempting to understand the performance of the QAOA.

Implementation of the HVA on superconducting hardware

We further demonstrate the NIBP phenomenon in a realistic hardware setting by implementing the Hamiltonian Variational Ansatz (HVA) on the IBM Quantum ibmq_montreal 27-qubit superconducting device. At the time of writing, this device holds the record for the largest quantum volume measured on an IBM Quantum device, which was demonstrated on a line of 6 qubits 80. We implement the HVA for the Transverse Field Ising Model as considered in ref. 57, with a local measurement O = Z_0 Z_1 on the first two qubits of the Ising chain. We set the number of layers L of the ansatz to increase linearly with the number of qubits n according to the relationship L = n − 1. In order to minimize the SWAP gates used in transpilation (and the accompanying noise that they incur), we modify each layer of the HVA to include only entangling gates between locally connected qubits.

Figure 5 plots the partial derivative of the cost function with respect to the parameter in the final layer of the ansatz, averaged over 100 random parameter sets. We also plot averaged cost differences from the corresponding maximally mixed values, as well as the variance of both quantities. In the noise-free case both the partial derivative and the cost value differences decrease at a sub-exponential rate. Meanwhile, in the noisy case we observe that both the partial derivatives and the cost value differences vanish exponentially until their variances reach the same order of magnitude as the shot noise floor. (As the shot budget on the IBM Quantum device is limited, this leads to a background of shot noise, and we plot the order of magnitude of this with a dotted line.) This explicitly demonstrates that the problem of barren plateaus is one of resolvability. In principle, if one has access to exact cost values and gradients one may be able to navigate the cost landscape; however, the number of shots required to reach the necessary resolution increases exponentially with n.
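The resolvability argument can be made quantitative with a back-of-the-envelope estimate (our own sketch, with an illustrative noise parameter q and unit per-shot variance): if the gradient scale is of order q^{L+1} as in Theorem 1 and L grows linearly with n, the number of shots needed to resolve it above statistical noise grows exponentially with n.

```python
q = 0.95                               # illustrative local noise parameter
for n in [4, 8, 12, 16, 20]:
    L = n - 1                          # depth growing linearly with n, as in the HVA runs
    grad_scale = q**(L + 1)            # gradient scale suggested by Theorem 1 (constants dropped)
    shots_needed = 1.0 / grad_scale**2 # shots for a unit-variance estimator to resolve it
    print(n, grad_scale, int(shots_needed))
```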
Fig. 5: Implementation on the ibmq_montreal superconducting-qubit device. We consider the HVA with the number of layers growing linearly in the number of qubits, n. a The average magnitude of the partial derivative of the noisy and noise-free cost, with respect to the parameter in the final layer, is plotted versus n. The average is taken over 100 randomly selected parameter sets. As n increases, the noisy average partial derivative decreases approximately exponentially, until around n = 9. This shows evidence of Noise-Induced Barren Plateaus on real quantum hardware. b The deviation from exponential scaling can be understood by observing that it coincides with the point that the variance of the noisy partial derivatives reaches the same order of magnitude as the shot noise given by a finite sample budget of 8192 shots. Thus, from this point onward we expect fluctuations in the partial derivative to be dominated by shot noise, and gradients to be unresolvable. c The difference of the cost value from its corresponding maximally mixed value is plotted versus n. d The variance of this difference is plotted versus n. Both these quantities also show exponential decay until the variance of cost difference approaches the shot noise floor, which shows evidence of exponential cost concentration on this device.