
From Network Dynamics to Optimization Algorithms


Full Text

In Part I of this thesis, we study one of the most important families of network dynamics, namely epidemics, or spreading processes. In Part IV of this thesis, we characterize several properties, such as minimax optimality and implicit regularization, of stochastic gradient descent (SGD) and, more generally, of the family of stochastic mirror descent (SMD) algorithms.

LIST OF ILLUSTRATIONS

5.3 The profit under the normal (no-curtailment) condition and under the (optimal) strategic curtailment, as a function of the size of the aggregator, in the IEEE test case networks: a) IEEE 14-Bus Case, b) IEEE 30-Bus Case, and c) IEEE 57-Bus Case.
Note that each of the four histograms corresponds to an 11×10⁶-dimensional weight vector that perfectly interpolates the data.

LIST OF TABLES

INTRODUCTION

  • Major Challenges
  • Synopsis of Part I: Network Dynamics
  • Synopsis of Part II: Incentives and Markets
  • Synopsis of Part III: Distributed Computation
  • Synopsis of Part IV: Learning from Data

Distributed computing: The massive computational needs due to the scale of the system (and the fact that the data may be distributed across multiple entities) make it virtually impossible to perform the computations in a central unit. Moreover, none of the existing schemes is accompanied by a computationally tractable algorithm for general non-convex costs.

Network Dynamics

EPIDEMICS OVER COMPLEX NETWORKS: ANALYSIS OF EXACT AND APPROXIMATE MODELS

Introduction

A healthy node has a chance of becoming infected if it has infected neighbors in the network. Because the exact analysis of this Markov chain is too complicated, various 𝑛-dimensional linear and nonlinear approximations have been proposed in the literature.
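To make the gap between the exact and approximate descriptions concrete, the following Python sketch (our own illustration, not code from the thesis; the specific transition rule and parameter values are assumptions) simulates the exact SIS chain by Monte Carlo and, in parallel, iterates an 𝑛-dimensional nonlinear mean-field approximation on the same graph.

```python
# Hedged sketch: Monte Carlo simulation of an exact SIS chain vs. an
# n-dimensional nonlinear mean-field approximation (assumed model form).
import numpy as np

rng = np.random.default_rng(0)
n, p_edge = 200, 0.05
A = (rng.random((n, n)) < p_edge).astype(float)
A = np.triu(A, 1); A = A + A.T                 # symmetric adjacency, no self-loops

beta, delta, T = 0.05, 0.2, 50                 # per-step infection / recovery probabilities
x = np.ones(n, dtype=bool)                     # exact chain: start from all-infected
P = np.ones(n)                                 # mean-field marginals, same start

for t in range(T):
    # exact chain: a healthy node becomes infected w.p. 1 - (1-beta)^(# infected neighbors),
    # an infected node recovers w.p. delta
    p_inf = 1.0 - (1.0 - beta) ** (A @ x)
    new_inf = (~x) & (rng.random(n) < p_inf)
    recover = x & (rng.random(n) < delta)
    x = (x | new_inf) & ~recover

    # nonlinear mean-field: treat neighbors as independent with marginals P
    P = (1.0 - P) * (1.0 - np.prod(1.0 - beta * A * P, axis=1)) + (1.0 - delta) * P

print("final infected fraction: exact ~ %.3f, mean-field ~ %.3f" % (x.mean(), P.mean()))
```

The exact chain lives on 2^𝑛 states (here tracked by sampling a single trajectory), whereas the mean-field recursion tracks only the 𝑛 marginals, which is what makes the approximate models tractable.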

Models

  • Susceptible-Infected-Susceptible (SIS)
    • Exact Markov Chain Model
    • Nonlinear Model
    • Linear Model
  • Susceptible-Infected-Recovered-Susceptible (SIRS)
    • Exact Markov Chain Model
    • Nonlinear Model
    • Linear Model
  • Susceptible-Infected-Vaccinated (SIV)
    • Exact Markov Chain Model
    • Nonlinear Model
    • Linear Model

However, the transition matrix of the discrete-time Markov chain model can have non-zero entries anywhere (except in the row of the absorbing state). The origin (𝑃1(𝑡) = · · · = 𝑃𝑛(𝑡) = 0) is a trivial fixed point of the above model, which is consistent with the absorbing state of the Markov chain model.
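For reference, one common discrete-time form of these two approximations (our notation; the thesis's exact equations may differ in details) is:

```latex
% Nonlinear mean-field and linear SIS models (assumed standard forms).
% P_i(t): marginal probability that node i is infected; A: adjacency matrix;
% beta: infection rate; delta: recovery rate.
P_i(t+1) = \bigl(1 - P_i(t)\bigr)\Bigl(1 - \prod_{j=1}^{n}\bigl(1 - \beta A_{ij} P_j(t)\bigr)\Bigr) + (1-\delta)\,P_i(t),
\qquad
P(t+1) = \bigl((1-\delta) I + \beta A\bigr)\,P(t).
```

In both cases 𝑃(𝑡) = 0 maps to 𝑃(𝑡+1) = 0, and the linear map is stable exactly when 1 − 𝛿 + 𝛽𝜆max(𝐴) < 1, i.e., when 𝛽𝜆max(𝐴)/𝛿 < 1.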

Figure 2.1: State diagram of a single node in different models. Wavy arrows represent exogenous (network-based) transitions.

Results on the Nonlinear MFA Model

  • SIRS/SEIRS/Immune-Admitting-SIS
  • SIV/SEIV (Infection-Dominant)
  • SIV/SEIV (Vaccination-Dominant)

The disease-free fixed point of the nonlinear model for the infection-dominant SIV and the infection-dominant SEIV epidemics is . The disease-free fixed point of the nonlinear model for the vaccination-dominant SIV and the vaccination-dominant SEIV epidemics is .

Figure 2.3: Summary of known results for different models. The results are illustrated as a function of 𝛽𝜆max(𝐴)/𝛿.

Results on the Exact Markov Chain Model

  • Connection to the Linear Model
  • Connection to the Nonlinear Model

However, it turns out that it provides an upper bound on the probability that the chain is not in the fully healthy state (i.e., on the existence of infection) [4], if one initializes the nonlinear model from the all-infected state. Finally, we should note that the reason why the nonlinear map can converge to a unique non-origin fixed point when 𝛽𝜆max(𝐴)/𝛿 > 1, even though the original Markov chain model always converges to the fully healthy state, is that the nonlinear model only provides an upper bound on P(𝜉(𝑡) ≠ ¯0 | 𝜉(0) = 𝑋).

Figure 2.4: A typical example of the evolution of an SIS epidemic over an Erdős-Rényi graph with 𝑛 = 2000 nodes and 𝜆max(𝐴) = 16.

Heterogeneous Network Models

When the largest eigenvalue of 𝑀 is less than 1, the origin is the unique fixed point and is globally stable. When the largest eigenvalue of 𝑀 is greater than 1, the system also has a unique nontrivial fixed point, which is globally stable.
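As a quick numerical check of which side of this threshold a given heterogeneous network falls on, one can compute the largest eigenvalue of 𝑀 directly; the sketch below assumes the form 𝑀 = diag(1 − 𝛿𝑖) + diag(𝛽𝑖)𝐴 with hypothetical node-dependent rates, which may differ from the thesis's exact construction.

```python
# Hedged sketch: threshold check for a heterogeneous linear SIS-type model,
# with M = diag(1 - delta_i) + diag(beta_i) A assumed.
import numpy as np

rng = np.random.default_rng(3)
n = 100
A = (rng.random((n, n)) < 0.05).astype(float)
A = np.triu(A, 1); A = A + A.T

beta = rng.uniform(0.01, 0.05, size=n)    # hypothetical per-node infection rates
delta = rng.uniform(0.2, 0.5, size=n)     # hypothetical per-node recovery rates

M = np.diag(1.0 - delta) + np.diag(beta) @ A
lam = np.max(np.abs(np.linalg.eigvals(M)))
print("largest eigenvalue of M: %.3f ->" % lam,
      "origin globally stable" if lam < 1 else "nontrivial fixed point expected")
```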

Pairwise and Higher-Order Approximate Models

Similar to the previous examples, 𝜆max(𝑀) < 1 ensures that the mixing time of the Markov chain defined by the transition matrix 𝑆(𝑀) is 𝑂(log𝑛).

Summary and Conclusion

Finally, we should note that characterizing the exact epidemic threshold of the Markov chain model is still an open problem. The nonlinear model has the same Jacobian matrix as that of the previous section.

Figure 2.5: The evolution of (a) SIS/SIRS/SEIRS, (b) SIV/SEIV (infection-dominant), and (c) SIV/SEIV (vaccination-dominant) epidemics over an Erdős-Rényi graph with 𝑛 = 2000 nodes.

IMPROVED BOUNDS ON THE EPIDEMIC THRESHOLD OF THE EXACT MODELS

  • Introduction
  • The Markov Chain and Marginal Probabilities of Infection
    • An Alternative Bounding Technique
    • Connection to Mixing Time of the Markov Chain
    • A Lower Bound on the 𝑝𝑖𝑗’s
  • An Alternative Pairwise Probability (𝑞𝑖𝑗)
  • Conclusion and Future Work

If 𝜌(𝑀′) < 1, then the mixing time of the Markov chain with transition matrix 𝑆 is 𝑂(log 𝑛). If 𝜌(𝑀′′) < 1 and 1 − 𝛿 − 𝛽 ≥ 0, then the mixing time of the Markov chain with transition matrix 𝑆 is again 𝑂(log 𝑛).

Table 3.1: Performance of the proposed bounds 𝑀′ and 𝑀′′ in comparison with the previous bound 𝑀.

Incentives and Markets

OPTIMAL PRICING IN MARKETS WITH NON-CONVEX COSTS

Introduction

We formalize the properties desired in the literature in Section 4.2 and discuss the properties of the existing schemes in Section 4.5. For example, most of the existing schemes mentioned above are proposed for specific classes of non-convex cost functions and cannot handle more general non-convex costs.

Market Description and Pricing Objectives

  • Market Model
  • Pricing Objectives

Although in certain cases (e.g., IP pricing [159] for start-up plus linear costs) it is the total cost of the suppliers that gets minimized, ultimately the quantity that determines the cost of meeting the demand is the total payment made to the suppliers.

Proposed Scheme: Equilibrium-Constrained Pricing

  • Pricing Formulation
    • Linear+Uplift Pricing
  • An Efficient Approximation Algorithm

Constraint (4.1d) can be equivalently expressed as (4.2). The key difference between EC pricing and existing pricing methods for non-convex markets is that it directly minimizes the total payment and searches for the optimal allocations 𝑞∗ and the corresponding prices simultaneously. Therefore, minimizing the total payment also limits the total production cost, while the opposite is generally not true (minimizing the total production cost can result in very high payments, as can be seen, e.g., in the case studies of Figures 4.4a and 4.5a).
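To see concretely why minimizing the production cost and minimizing the payment can pull in different directions, consider the following toy market (the three hypothetical suppliers, their numbers, and the simple "uniform marginal price plus make-whole uplift" payment rule are all our own illustrative assumptions, not the EC formulation):

```python
# Toy illustration (assumed numbers and payment rule, not the thesis's scheme):
# with startup + linear costs, the cost-minimizing allocation can require much
# higher payments than a payment-minimizing allocation.
from itertools import product

# (startup cost, marginal cost, capacity) -- hypothetical suppliers
suppliers = [(0.0, 1.0, 9.0), (0.0, 20.0, 5.0), (50.0, 5.0, 10.0)]
demand = 10.0

def production_cost(q):
    return sum((s + a * x) if x > 0 else 0.0 for (s, a, _), x in zip(suppliers, q))

def payment(q):
    # uniform price = highest marginal cost among dispatched suppliers,
    # plus a per-supplier uplift so nobody is paid below its own cost
    dispatched = [(a, x, s) for (s, a, _), x in zip(suppliers, q) if x > 0]
    price = max(a for a, _, _ in dispatched)
    return sum(max(price * x, s + a * x) for a, x, s in dispatched)

grid = [float(i) for i in range(11)]
feasible = [q for q in product(grid, repeat=3)
            if abs(sum(q) - demand) < 1e-9
            and all(x <= cap for (_, _, cap), x in zip(suppliers, q))]

q_cost = min(feasible, key=production_cost)
q_pay = min(feasible, key=payment)
print("min-cost allocation   ", q_cost, "cost", production_cost(q_cost), "payment", payment(q_cost))
print("min-payment allocation", q_pay, "cost", production_cost(q_pay), "payment", payment(q_pay))
```

In this instance, the allocation with the smallest production cost ends up paying roughly twice as much as the payment-minimizing allocation, which is exactly the kind of gap that motivates optimizing over payments directly.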

Figure 4.1: An illustration of the set Λ for an example with 3 non-convex cost functions.

Equilibrium-Constrained Pricing for Networked Markets

  • Pricing Formulation

When the capacity constraints (4.14c) are relaxed (𝑓𝑒 = −∞, 𝑓𝑒 = ∞, ∀𝑒 ∈ 𝐸), the networked market problem reduces to the original single-market problem. As in the single-market case, we define an approximate solution to this problem.

Existing Pricing Schemes

  • Pricing in Convex Markets
  • Pricing in Non-Convex Markets

However, IP pricing assumes knowledge of the optimal solution to the unit commitment problem, and is therefore not intended as a practical approach for finding the optimal allocation. The scheme is based on formulating and solving the semi-Lagrangean relaxation (SLR) of the mixed-integer program, in which the market-clearing constraint is semi-relaxed using a standard Lagrange multiplier 𝜆.

Figure 4.3: An illustration of shadow pricing for the case of 3 convex cost functions.

Experimental Results

  • Case 1: Linear plus startup cost
  • Case 2: Quadratic plus startup cost
  • A Networked Market with Capacity Constraints

As can be seen in Figure 4.5a, EC1, EC2, EC3, and EC4 achieve the minimum possible total payment, which is equal to the total cost. However, it helps reduce the overall increase, as we can see in Figure 4.7b.
a) Total payments as a function of the power capacity.

Table 4.2: Summary of the production characteristics in the modified Scarf’s example.

Concluding Remarks

In the optimization problem (4.6), the order of the variables in the minimizations does not matter; further, for any fixed 𝑞1, … Intermediate Nodes: at each new level, there are at most half as many nodes (plus one) as in the previous level.

Figure 4.8: The transformation of an arbitrary-degree tree to a binary tree.
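One way to realize the transformation in Figure 4.8, consistent with the description above, is to repeatedly pair up a node's children under new intermediate nodes, so that each new level has at most half (plus one) as many nodes as the previous one; the sketch below is our own illustration (the Node type and naming are hypothetical), not the thesis's implementation.

```python
# Hedged sketch: convert an arbitrary-degree tree into a binary tree by
# introducing intermediate nodes that pair up children level by level.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)

def binarize(node: Node) -> Node:
    kids = [binarize(c) for c in node.children]
    level = 0
    while len(kids) > 2:
        nxt = []
        for i in range(0, len(kids) - 1, 2):          # pair children two at a time
            nxt.append(Node(f"{node.name}~{level}.{i // 2}", [kids[i], kids[i + 1]]))
        if len(kids) % 2 == 1:                        # odd child carried to the next level
            nxt.append(kids[-1])
        kids, level = nxt, level + 1
    return Node(node.name, kids)

root = Node("r", [Node(c) for c in "abcde"])
print(len(binarize(root).children))   # 2 -> every node now has at most two children
```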

MANAGING AGGREGATORS IN THE SMART GRID

Introduction

  • Summary of Contributions
  • Related Work
    • Quantifying Market Power in Electricity Markets
    • Cyber-Attacks in the Grid
    • Algorithms for Managing Distributed Energy Resources
    • Algorithms for Bilevel Programs

On the aggregator side, managing a geographically diverse fleet of distributed energy resources is a difficult algorithmic challenge. On the operator's side, the participation of aggregators in electricity markets poses unique challenges in terms of monitoring and limiting the potential of exercising market power.

System Model

  • Preliminaries

In contrast, our work presents a polynomial-time algorithm that provably maximizes the profit of the aggregator. The ex-post LMPs are announced as a function of the optimal Lagrange multipliers of this optimization.

The Market Behavior of the Aggregator

  • A Profit-Maximizing Aggregator

Since the LMPs are themselves the solution to an optimization problem, the aggregator's problem is a bilevel optimization problem. An important note about this problem is that we have assumed the aggregator has complete knowledge of the network topology (G) and of the state estimates (𝑝 and 𝑑).
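Schematically, and with notation of our own choosing (𝑐 the curtailment vector, 𝑐̄ the per-bus curtailment allowance, 𝑝 the aggregator's generation, 𝜆(𝑐) the LMP vector returned by the lower-level dispatch), the problem has the following bilevel form:

```latex
% Schematic bilevel form of the aggregator's problem (our notation).
\max_{c}\;\; \sum_{i \in \mathcal{N}_{\mathrm{agg}}} \lambda_i(c)\,\bigl(p_i - c_i\bigr)
\quad \text{s.t.} \quad 0 \le c_i \le \bar{c}_i,\qquad
\lambda(c) = \text{optimal dual variables of the economic dispatch with injections } p - c.
```

The difficulty is that 𝜆(𝑐) is defined only implicitly, through the optimality conditions of the lower-level market-clearing problem.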

The Impact of Strategic Curtailment

  • An Illustrative Example
  • Case Studies

Assume that the aggregator's total generation at each bus is 10 MW and that it is able to curtail 1% of it (0.1 MW). As the size of the aggregator (its number of buses) increases, not only does the profit increase (which is expected), but also …

Figure 5.2: The locational marginal prices for the 6-bus example before and after the curtailment.

Optimizing Curtailment Profit

  • An Exact Algorithm for Single-Node Aggregators in Arbitrary Networks
  • An Approximation Algorithm for Multi-Bus Aggregators in Radial Networks
  • Evaluation of the Approximation Algorithm

Even in the simplest case, when the aggregator has only a single node, i.e., its entire …

The other important question is what the impact of strategic curtailment is on the price at each bus of the network (not necessarily just the aggregator's buses). In particular, we show that an 𝜖-approximation of the optimal curtailment profit can be obtained using an algorithm whose running time is linear in the size of the network and polynomial in 1/𝜖.

Figure 5.5: The LMP at bus 𝑖 as a function of the curtailed generation at that bus. Shaded areas indicate the aggregator’s revenue under the normal condition and under curtailment.

Concluding Remarks

In this definition, the value of 𝜂𝑖 captures the ability of the generator/aggregator to exercise market power. Focusing on the terms involving 𝛼𝑖 for a given 𝑖, the objective of the above optimization problem is of the form (Δ𝑝 …

Figure 5.8: The difference from the optimal solution as a function of the running time of the algorithm, in the 9-bus network with 1% curtailment allowance.

Distributed Computation

DISTRIBUTED SOLUTION OF LARGE-SCALE SYSTEMS OF EQUATIONS

Introduction

In addition to the optimization-based methods, there are some distributed algorithms specially designed for solving systems of linear equations. We provide a full analysis of the convergence rate of APC (Section 6.3), as well as a detailed comparison with all other distributed methods mentioned above (Section 6.4).

The Setup

In our methodology, the taskmaster assigns a subset of the equations to each of the machines and invokes a distributed consensus-based algorithm to obtain the solution to the original problem in an iterative manner. At each iteration, each machine updates its solution by adding a scaled version of the projection of an error signal onto the null space of its system of equations, and the taskmaster averages over the solutions with momentum.
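The following numpy sketch (our own minimal illustration; the step size 𝛾 and mixing parameter 𝜂 are left untuned, and with 𝜂 = 1 the momentum term at the master is effectively disabled) mimics this setup on a small, deliberately well-conditioned synthetic system.

```python
# Hedged sketch of projection-based consensus for Ax = b split across m machines.
import numpy as np

rng = np.random.default_rng(0)
n, m = 60, 6
A = 2.0 * np.eye(n) + 0.5 * rng.standard_normal((n, n)) / np.sqrt(n)   # well-conditioned test system
x_true = rng.standard_normal(n)
b = A @ x_true

blocks = np.array_split(np.arange(n), m)          # each machine holds a block of equations
gamma, eta = 1.0, 1.0                             # untuned; eta = 1 disables the momentum

machines = []
for idx in blocks:
    Ai, bi = A[idx], b[idx]
    xi = np.linalg.pinv(Ai) @ bi                                     # satisfies A_i x = b_i
    Pi = np.eye(n) - Ai.T @ np.linalg.solve(Ai @ Ai.T, Ai)           # projector onto null(A_i)
    machines.append({"x": xi, "P": Pi})

xbar = np.mean([mc["x"] for mc in machines], axis=0)
for _ in range(1500):
    for mc in machines:
        # local step: move toward the master's average, but only within null(A_i),
        # so each machine stays consistent with its own equations
        mc["x"] = mc["x"] + gamma * mc["P"] @ (xbar - mc["x"])
    xbar = eta * np.mean([mc["x"] for mc in machines], axis=0) + (1.0 - eta) * xbar

print("relative error:", np.linalg.norm(xbar - x_true) / np.linalg.norm(x_true))
```

Tuning 𝛾 and 𝜂 (the accelerated variant) is what turns this plain averaging scheme into APC; the point of the sketch is only the structure of the local projection step and the master's averaging.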

Accelerated Projection-Based Consensus

  • The Algorithm

  • Convergence Analysis
  • Computation and Communication Complexity

We analyze the convergence of the proposed algorithm and prove that it has linear convergence (i.e., the error decays exponentially), with no additional assumptions imposed. The convergence rate of the algorithm is determined by the spectral radius (largest eigenvalue in magnitude) of the (𝑚+1)𝑛 × (𝑚+1)𝑛 block matrix in (6.10).

Comparison with Related Methods

  • Distributed Gradient Descent (DGD)

  • Distributed Nesterov’s Accelerated Gradient Descent (D-NAG)
  • Distributed Heavy-Ball Method (D-HBM)
  • Alternating Direction Method of Multipliers (ADMM)
  • Block Cimmino Method
  • Consensus Algorithm of Mou et al

We should also note that the computational complexity of ADMM is 𝑂(𝑝𝑛) per iteration (the inverse is calculated using the matrix inversion lemma), which is again the same as for the gradient-type methods and APC. It is not difficult to show that the optimal convergence rate of the block Cimmino method is …

Underdetermined System

Note that this is exactly the same as in the proof of Theorem 31, with 𝑋 replaced by 𝑌. This implies that the nullity of the matrix in (6.26) is 𝑛, and every steady-state solution must be a consensus solution, which completes the proof.

Table 6.2: A comparison between the condition numbers of 𝐴ᵀ𝐴 and 𝑋 for some examples.

Experimental Results

To further verify the performance of the proposed algorithm, we also run all the algorithms on multiple problems, and observe the actual decay of the error. We should also note that initialization does not seem to affect the convergence behavior of our algorithm.

Figure 6.2: The decay of the error for different distributed algorithms, on two real problems from Matrix Market [141] (QC324: Model of 𝐻₂⁺ …).

A Distributed Preconditioning to Improve Gradient-Based Methods

The noticeable similarity between the optimal convergence rate of APC (…

Again, to make the comparison fair, all methods are tuned with their optimal parameters. As can be seen, APC significantly outperforms the other methods, which is consistent with the order-of-magnitude differences in the convergence times reported in Table 6.3.

Conclusion

CODED COMPUTATION FOR DISTRIBUTED GRADIENT DESCENT

  • Introduction
    • Related Work
    • Statement of Contributions
  • Preliminaries
    • Problem Setup
    • Computational Trade-offs
  • Code Construction
    • Balanced Mask Matrices
    • Correctness of Algorithm 6
    • General construction
  • Column-Balanced Mask Matrices
    • Correctness of Algorithm 7
    • Building the Encoding Matrix from the Mask Matrix
    • Efficient Online Decoding
    • Analysis of Total Computation Time
    • Numerical Results
    • Conclusion

Here, the quantity 𝑇 consists of the time it takes for a machine to compute its part of the gradient. For this value of 𝛼, the number of machines required to successfully recover the gradient is given by …
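As a concrete, deliberately simple instance of coded gradient computation, the sketch below implements the fractional-repetition idea from the gradient-coding literature (not this chapter's balanced-mask construction): each worker sends a single summed message, and the full gradient is recoverable from any 𝑛 − 𝑠 workers.

```python
# Hedged sketch: fractional-repetition gradient coding tolerating s stragglers.
import numpy as np

rng = np.random.default_rng(1)
n_workers, s = 6, 2              # requires (s + 1) | n_workers
k, d = n_workers, 5              # data split into k partitions, gradients of dimension d
partial_grads = rng.standard_normal((k, d))        # stand-ins for per-partition gradients
full_grad = partial_grads.sum(axis=0)

# layout: s+1 groups of workers; within each group the k partitions are split disjointly
group_size = n_workers // (s + 1)
assignment = []
for g in range(s + 1):
    parts = np.array_split(np.arange(k), group_size)
    for w in range(group_size):
        assignment.append(list(parts[w]))          # each worker computes s+1 partitions

def worker_message(w):
    return partial_grads[assignment[w]].sum(axis=0)   # one coded (summed) message per worker

def decode(responding):
    # with at most s stragglers, at least one group is fully present;
    # its messages sum to the full gradient
    for g in range(s + 1):
        members = range(g * group_size, (g + 1) * group_size)
        if all(w in responding for w in members):
            return sum(worker_message(w) for w in members)
    raise RuntimeError("more than s stragglers")

responding = set(range(n_workers)) - {1, 4}           # any s = 2 workers may be missing
print(np.allclose(decode(responding), full_grad))     # True
```

The redundancy factor here is 𝑠 + 1 partitions per worker, which is the same kind of computation/straggler-tolerance trade-off discussed in this chapter.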

Figure 7.1: Schematic representation of the taskmaster and the 𝑛 workers.

Learning from Data

MINIMAX OPTIMALITY AND IMPLICIT REGULARIZATION OF STOCHASTIC GRADIENT/MIRROR DESCENT

Introduction

  • Our Contribution

Therefore, a general characterization of the behavior of stochastic descent algorithms for more general models would be of great interest. The theory also allows us to establish new results, such as the convergence (in a deterministic sense) of SMD in the over-parameterized linear case.

Preliminaries

Furthermore, we show that many properties recently proven in the learning/optimization literature, such as the implicit regularization of SMD in the over-parameterized linear case (when convergence occurs) [85], follow naturally from this theory. We also use the theory developed in this chapter to provide some speculative arguments for why SMD (and SGD) may have similar convergence and implicit-regularization properties in the so-called "highly over-parameterized" regime.
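For concreteness, the SMD recursion and the flavor of the implicit-regularization statement being referred to are (stated informally, in our notation, with 𝜓 the mirror potential and 𝐷𝜓 its Bregman divergence; precise conditions are in the chapter and in [85]):

```latex
% SMD update and informal implicit-regularization statement (our notation).
\nabla\psi(w_{t+1}) \;=\; \nabla\psi(w_t) \;-\; \eta\,\nabla L_{i_t}(w_t),
\qquad
w_\infty \;=\; \operatorname*{arg\,min}_{w \,:\, x_i^\top w = y_i \;\forall i} \; D_\psi(w, w_0),
```

so that, initialized at 𝑤0 = arg min 𝜓(𝑤), SMD converges (when it converges) to the interpolating solution with smallest 𝜓, and SGD, which corresponds to 𝜓(𝑤) = ½‖𝑤‖², to the one closest to the initialization in Euclidean norm.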

Warm-up: Revisiting SGD on Square Loss of Linear Models

  • Conservation of Uncertainty
  • Minimax Optimality of SGD

We assume that the loss 𝐿𝑖(·) depends only on the residual, i.e., the difference between the prediction and the true label. However, in this case the convergence is not surprising, since, effectively, after a while the weights are no longer updating, and the more interesting question is what the recursion converges to.
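With square loss on a linear model, 𝐿𝑖(𝑤) = ½(𝑦𝑖 − 𝑥𝑖ᵀ𝑤)², the SGD recursion discussed in this warm-up is driven purely by that residual:

```latex
% SGD on square loss of a linear model (standard form).
w_{t+1} \;=\; w_t \;+\; \eta\,\bigl(y_{i_t} - x_{i_t}^\top w_t\bigr)\, x_{i_t}.
```

Once the residuals on the training points vanish, the update is identically zero, which is the sense in which the weights are "no longer updating" and why the identity of the limit point becomes the interesting question.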

