In Part I of this thesis, we study one of the most important families of network dynamics, namely epidemics, or spreading processes. In Part IV of this thesis, we characterize several properties, such as minimax optimality and implicit regularization, of SGD and, more generally, of the family of stochastic mirror descent (SMD) algorithms.
LIST OF ILLUSTRATIONS
5.3 The profit under the normal (no-curtailment) condition and under the (optimal) strategic curtailment, as a function of the size of the aggregator, in the IEEE test case networks: a) IEEE 14-Bus Case, b) IEEE 30-Bus Case, and c) IEEE 57-Bus Case.
Each of the four histograms corresponds to an 11 × 10⁶-dimensional weight vector that perfectly interpolates the data.
LIST OF TABLES
INTRODUCTION
- Major Challenges
- Synopsis of Part I: Network Dynamics
- Synopsis of Part II: Incentives and Markets
- Synopsis of Part III: Distributed Computation
- Synopsis of Part IV: Learning from Data
Distributed computing: The massive computational needs due to the scale of the system (and the fact that the data can be distributed across multiple entities) make it virtually impossible to perform the computations in a central unit. Moreover, none of the existing schemes is accompanied by a computationally efficient algorithm for general non-convex costs.
Network Dynamics
EPIDEMICS OVER COMPLEX NETWORKS: ANALYSIS OF EXACT AND APPROXIMATE MODELS
Introduction
A healthy node has a chance of becoming infected if it has infected neighbors in the network. Because the analysis of this Markov chain is intractable in general, various 𝑛-dimensional linear and nonlinear approximations have been proposed in the literature.
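To make the discussion concrete, the following is a minimal sketch of one common 𝑛-dimensional nonlinear (mean-field-type) approximation of the SIS dynamics; the update rule, the infection and recovery parameters, and the random test graph are illustrative assumptions and may differ from the exact models defined below.

```python
import numpy as np

# Sketch of an n-dimensional nonlinear mean-field SIS update (illustrative;
# the models analyzed in this chapter may use a different parameterization).
# p[i] approximates the marginal probability that node i is infected, beta is
# the per-contact infection probability, delta the recovery probability, and
# A is the adjacency matrix of the contact network.

def sis_mean_field_step(p, A, beta, delta):
    # Probability that node i is NOT infected by any of its infected neighbors.
    no_infection = np.prod(1.0 - beta * A * p[None, :], axis=1)
    # Stay infected and not recover, or be healthy and become infected.
    return (1.0 - delta) * p + (1.0 - p) * (1.0 - no_infection)

rng = np.random.default_rng(1)
n = 100
A = (rng.random((n, n)) < 0.05).astype(float)
A = np.triu(A, 1)
A = A + A.T                                   # undirected graph, no self-loops
p = np.full(n, 1.0)                           # start from the all-infected state
for _ in range(200):
    p = sis_mean_field_step(p, A, beta=0.05, delta=0.4)
print("average marginal infection probability:", p.mean())
```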
Models
- Susceptible-Infected-Susceptible (SIS)
- Exact Markov Chain Model
- Nonlinear Model
- Linear Model
- Susceptible-Infected-Recovered-Susceptible (SIRS)
- Exact Markov Chain Model
- Nonlinear Model
- Linear Model
- Susceptible-Infected-Vaccinated (SIV)
- Exact Markov Chain Model
- Nonlinear Model
- Linear Model
However, the transition matrix of the discrete-time Markov chain model can have non-zero entries anywhere (except in the row of the absorbing state). The all-healthy state (𝑃1(𝑡) = · · · = 𝑃𝑛(𝑡) = 0) is a trivial fixed point of the above model, which is consistent with the absorbing state of the Markov chain model.
Results on the Nonlinear MFA Model
- SIRS/SEIRS/Immune-Admitting-SIS
- SIV/SEIV (Infection-Dominant)
- SIV/SEIV (Vaccination-Dominant)
The disease-free fixed point of the nonlinear model for the infection-dominant SIV and the infection-dominant SEIV epidemics is . The disease-free fixed point of the nonlinear model for the vaccination-dominant SIV and the vaccination-dominant SEIV epidemics is .
Results on the Exact Markov Chain Model
- Connection to the Linear Model
- Connection to the Nonlinear Model
However, it turns out that it provides an upper bound on the probability that the chain is not in the all-healthy state (i.e., the existence of infection) [4], if one initializes the nonlinear model from the all-infected state. Finally, we should note that the reason why it is possible for the nonlinear map to converge to a unique non-origin fixed point when 𝛽𝜆max(𝐴)/𝛿 > 1, even though the original Markov chain model always converges to the all-healthy state, is that this is only an upper bound on P(𝜉(𝑡) ≠ 0̄ | 𝜉(0) = 𝑋).
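As a quick numerical illustration of the spectral condition 𝛽𝜆max(𝐴)/𝛿 < 1 discussed above, one can compute 𝜆max(𝐴) directly from the adjacency matrix; the small example graph and the parameter values below are made up for illustration.

```python
import numpy as np

# Check the (illustrative) spectral condition beta * lambda_max(A) / delta < 1.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)    # a small example graph
beta, delta = 0.1, 0.4
lam_max = np.max(np.linalg.eigvalsh(A))      # A is symmetric, so eigvalsh applies
ratio = beta * lam_max / delta
print("beta * lam_max / delta =", round(ratio, 3))
print("below threshold: fast extinction" if ratio < 1
      else "above threshold: the bound is uninformative")
```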
Heterogeneous Network Models
If the largest eigenvalue of 𝑀 is less than 1, the origin is the unique fixed point and it is globally stable. If the largest eigenvalue of 𝑀 is greater than 1, the model also has a unique nontrivial fixed point, which is globally stable.
Pairwise and Higher-Order Approximate Models
Similar to the previous examples, 𝜆max(𝑀) < 1 ensures that the mixing time of the Markov chain defined by the transition matrix 𝑆(𝑀) is 𝑂(log𝑛).
Summary and Conclusion
Finally, we should note that characterizing the exact epidemic threshold of the Markov chain model is still an open problem. The nonlinear model has the same Jacobian matrix as that of the previous section.
IMPROVED BOUNDS ON THE EPIDEMIC THRESHOLD OF THE EXACT MODELS
- Introduction
- The Markov Chain and Marginal Probabilities of Infection
- An Alternative Bounding Technique
- Connection to Mixing Time of the Markov Chain
- A Lower Bound on the 𝑝𝑖𝑗's
- An Alternative Pairwise Probability (𝑞𝑖𝑗)
- Conclusion and Future Work
If 𝜌(𝑀′) < 1, then the mixing time of the Markov chain whose transition matrix 𝑆 is described by Eq. (…) is 𝑂(log 𝑛). If 𝜌(𝑀′′) < 1 and 1 − 𝛿 − 𝛽 ≥ 0, then the mixing time of the Markov chain whose transition matrix 𝑆 is described by Eq. (…) is 𝑂(log 𝑛).
Incentives and Markets
OPTIMAL PRICING IN MARKETS WITH NON-CONVEX COSTS
Introduction
We formalize the properties desired in the literature in Section 4.2 and discuss which of these properties the existing schemes satisfy in Section 4.5. For example, most of the existing schemes mentioned above are proposed for specific classes of non-convex cost functions and cannot handle more general non-convex costs.
Market Description and Pricing Objectives
- Market Model
- Pricing Objectives
Although in certain cases (e.g., IP pricing [159] for start-up plus linear costs) the total cost of the suppliers, ∑𝑖 𝑐𝑖(𝑞𝑖), coincides with the total payment, ultimately the quantity that determines the cost of meeting the demand is the total payment to the suppliers, not their total production cost.
Proposed Scheme: Equilibrium-Constrained Pricing
- Pricing Formulation
- Linear+Uplift Pricing
- An Efficient Approximation Algorithm
Constraint (4.1d) can be equivalently expressed as (4.2). The key difference between EC pricing and the existing pricing methods for non-convex markets is that it directly minimizes the total payment and seeks both the optimal allocations 𝑞∗ and the corresponding prices. Therefore, minimizing the total payment also limits the total production cost, while the opposite is generally not true (minimizing the total production cost can result in very high payments, as can be seen, e.g., in the case studies in Figures 4.4a and 4.5a).
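To illustrate what it means to minimize the total payment directly, the following is a toy brute-force sketch for a market with start-up plus linear costs; it enforces only simple make-whole (revenue-adequacy) uplifts rather than the full equilibrium constraints of the EC formulation, and the supplier parameters and demand are made-up numbers.

```python
import itertools

# Toy brute-force illustration (not the EC formulation of this chapter): each
# supplier has a start-up cost s_i, a marginal cost a_i, and a capacity qmax_i.
# We search over commitments, dispatch the committed suppliers in merit order,
# and choose a uniform price plus make-whole uplifts so as to minimize the
# total payment.

suppliers = [  # (s_i, a_i, qmax_i) -- made-up numbers
    (100.0, 10.0, 50.0),
    (10.0, 30.0, 50.0),
    (0.0, 45.0, 20.0),
]
demand = 60.0

def dispatch(committed):
    """Merit-order dispatch of the committed suppliers; None if infeasible."""
    q = [0.0] * len(suppliers)
    remaining = demand
    for i in sorted(committed, key=lambda i: suppliers[i][1]):
        q[i] = min(suppliers[i][2], remaining)
        remaining -= q[i]
    return q if remaining <= 1e-9 else None

best = None
for r in range(1, len(suppliers) + 1):
    for committed in itertools.combinations(range(len(suppliers)), r):
        q = dispatch(committed)
        if q is None:
            continue
        for lam in sorted({suppliers[i][1] for i in committed}):  # candidate prices
            # Make-whole uplift: cover any loss of a dispatched supplier.
            uplift = [max(0.0, s + a * q[i] - lam * q[i]) if q[i] > 0 else 0.0
                      for i, (s, a, _) in enumerate(suppliers)]
            payment = lam * demand + sum(uplift)
            if best is None or payment < best[0]:
                best = (payment, q, lam, uplift)

payment, q, lam, uplift = best
print("minimum total payment:", payment, "at uniform price", lam)
print("allocation:", q, "uplifts:", uplift)
```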
Equilibrium-Constrained Pricing for Networked Markets
- Pricing Formulation
When the capacity constraints (4.14c) are relaxed ( 𝑓𝑒 = −∞, 𝑓𝑒 = ∞, ∀𝑒 ∈ 𝐸), the networked market problem reduces to the original (single) market problem. As in the case of the single market, we define an approximate solution to this problem.
Existing Pricing Schemes
- Pricing in Convex Markets
- Pricing in Non-Convex Markets
However, IP pricing assumes knowledge of the optimal solution to the unit commitment problem and is therefore not intended as a practical approach for finding the optimal allocation. The scheme is based on formulating and solving the semi-Lagrangian relaxation (SLR) of the mixed-integer program, in which the market-clearing constraint is semi-relaxed with the usual Lagrange multiplier 𝜆.
Experimental Results
- Case 1: Linear plus startup cost
- Case 2: Quadratic plus startup cost
- A Networked Market with Capacity Constraints
As can be seen in Figure 4.5a, EC1, EC2, EC3, and EC4 achieve the minimum possible total payment, which is equal to the total cost. However, it helps reduce the overall increase in the payments, as we can see in Figure 4.7b.
a) Total payments as a function of power capacity.
Concluding Remarks
In the optimization problem (4.6), the order of the variables in the minimizations does not matter, and further, for any fixed 𝑞1, … . Intermediate nodes: at each new level, there are at most half as many nodes (plus one) as in the previous level.
MANAGING AGGREGATORS IN THE SMART GRID
Introduction
- Summary of Contributions
- Related Work
- Quantifying Market Power in Electricity Markets
- Cyber-Attacks in the Grid
- Algorithms for Managing Distributed Energy Resources
- Algorithms for Bilevel Programs
On the aggregator side, managing a geographically diverse fleet of distributed energy resources is a difficult algorithmic challenge. On the operator's side, the participation of aggregators in electricity markets poses unique challenges in terms of monitoring and limiting the potential of exercising market power.
System Model
- Preliminaries
In contrast, our work presents a polynomial-time algorithm that provably maximizes the profit of the aggregator. The ex-post LMPs are announced as a function of the optimal Lagrange multipliers of this optimization.
The Market Behavior of the Aggregator
- A Profit-Maximizing Aggregator
Since the LMPs are themselves the solution to an optimization problem, the aggregator's problem is a bilevel optimization problem. An important note about this problem is that we have assumed that the aggregator has complete knowledge of the network topology (G) and of the state estimates (𝑝 and 𝑑).
The Impact of Strategic Curtailment
- An Illustrative Example
- Case Studies
Assume that the aggregator's total generation at each bus is 10 MW and that it is able to curtail 1% of it (0.1 MW). As the size of the aggregator (its number of buses) increases, not only does the profit increase (which is expected), but also …
Optimizing Curtailment Profit
- An Exact Algorithm for Single-Node Aggregators in Arbitrary Networks
Even in the simplest case, when the aggregator has only a single node, i.e., its entire generation is located at a single bus, …
- An Approximation Algorithm for Multi-Bus Aggregators in Radial Networks
- Evaluation of the Approximation Algorithm
The other important question is what the impact of strategic curtailment is on the prices at the buses of the network (not necessarily just the aggregator's buses). In particular, we show that an 𝜖-approximation of the optimal curtailment profit can be obtained by an algorithm whose running time is linear in the size of the network and polynomial in 1/𝜖.
Concluding Remarks
In this definition, the value of 𝜂𝑖 captures the ability of the generator/aggregator to exercise market power. Focusing on the expressions involving 𝛼𝑖 for a given 𝑖, the objective of the above optimization problem is of the form (Δ𝑝 …
Distributed Computation
DISTRIBUTED SOLUTION OF LARGE-SCALE SYSTEMS OF EQUATIONS
Introduction
In addition to the optimization-based methods, there are some distributed algorithms specially designed for solving systems of linear equations. We provide a full analysis of the convergence rate of APC (Section 6.3), as well as a detailed comparison with all other distributed methods mentioned above (Section 6.4).
The Setup
In our methodology, the taskmaster assigns a subset of the equations to each of the machines and invokes a distributed consensus-based algorithm to obtain the solution to the original problem in an iterative manner. At each iteration, each machine updates its local solution by adding a scaled version of the projection of an error signal onto the null space of its subsystem of equations, and the taskmaster averages the local solutions with momentum.
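The following is an illustrative sketch of a projection-based consensus iteration of the kind described above, for a square system 𝐴𝑥 = 𝑏 whose rows are split across 𝑚 machines; the step size 𝛾, the momentum parameter 𝜂, the well-conditioned test matrix, and the exact form of the accelerated master update are assumptions for this sketch and need not match the APC updates and tuning given in this chapter.

```python
import numpy as np

# Each machine holds a block (A_i, b_i) and keeps a local solution x_i with
# A_i x_i = b_i at all times: it moves x_i toward the master's estimate only
# along the null space of A_i. The master averages the local solutions and
# adds a momentum term. (Illustrative sketch, not the chapter's exact APC.)

rng = np.random.default_rng(0)
n, m = 60, 4                                          # n unknowns, m machines
A = np.eye(n) + 0.02 * rng.standard_normal((n, n))    # well-conditioned test system
b = rng.standard_normal(n)
blocks = np.array_split(np.arange(n), m)

machines = []
for idx in blocks:
    Ai, bi = A[idx], b[idx]
    x_i = np.linalg.pinv(Ai) @ bi                     # some solution of A_i x = b_i
    P_i = np.eye(n) - np.linalg.pinv(Ai) @ Ai         # projector onto null(A_i)
    machines.append([x_i, P_i])

gamma, eta = 1.0, 0.8                                 # step size and momentum (illustrative)
xbar = np.mean([x for x, _ in machines], axis=0)
xbar_prev = xbar.copy()

for t in range(400):
    for mach in machines:                             # local updates (parallel in practice)
        x_i, P_i = mach
        mach[0] = x_i + gamma * P_i @ (xbar - x_i)
    avg = np.mean([x for x, _ in machines], axis=0)
    xbar, xbar_prev = avg + eta * (xbar - xbar_prev), xbar   # averaging with momentum

print("final residual ||Ax - b|| =", np.linalg.norm(A @ xbar - b))
```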
Accelerated Projection-Based Consensus
- The Algorithm
- Convergence Analysis
- Computation and Communication Complexity
We analyze the convergence of the proposed algorithm and prove that it has linear convergence (i.e., the error decays exponentially), with no additional assumption imposed. The convergence rate of the algorithm is determined by the spectral radius (largest eigenvalue in magnitude) of the (𝑚+1)𝑛 × (𝑚+1)𝑛 block matrix in (6.10).
Comparison with Related Methods
- Distributed Gradient Descent (DGD)
- Distributed Nesterov’s Accelerated Gradient Descent (D-NAG)
- Distributed Heavy-Ball Method (D-HBM)
- Alternating Direction Method of Multipliers (ADMM)
- Block Cimmino Method
- Consensus Algorithm of Mou et al
We should also note that the computational complexity of ADMM is 𝑂(𝑝𝑛) per iteration (the inverse is computed using the matrix inversion lemma), which is again the same as that of the gradient-type methods and APC. It is not difficult to show that the optimal convergence rate of the block Cimmino method is …
Underdetermined System
Note that this is exactly the same as in the proof of Theorem 31, with 𝑋 replaced by 𝑌. This implies that the nullity of the matrix in (6.26) is 𝑛, and every steady-state solution must be a consensus solution, which completes the proof.
Experimental Results
To further verify the performance of the proposed algorithm, we also run all the algorithms on multiple problems, and observe the actual decay of the error. We should also note that initialization does not seem to affect the convergence behavior of our algorithm.
A Distributed Preconditioning to Improve Gradient-Based Methods
The noticeable similarity between the optimal convergence rate of APC and that of the gradient-based methods suggests a simple distributed preconditioning that improves the convergence of the gradient-based methods.
Again, to make the comparison fair, all methods are tuned with their optimal parameters. As can be seen, APC far outperforms the other methods, which is consistent with the orders-of-magnitude differences in the convergence times reported in Table 6.3.
Conclusion
CODED COMPUTATION FOR DISTRIBUTED GRADIENT DESCENT
- Introduction
- Related Work
- Statement of Contributions
- Preliminaries
- Problem Setup
- Computational Trade-offs
- Code Construction
- Balanced Mask Matrices
- Correctness of Algorithm 6
- General construction
- Column-balanced Mask Matrix 𝑀
- Correctness of Algorithm 7
- Building the Encoding Matrix from the Mask Matrix
- Efficient Online Decoding
- Analysis of Total Computation Time
- Numerical Results
- Conclusion
Here, the quantity 𝑇 consists of the time it takes for a machine to compute its part of the gradient. For this value of 𝛼, the number of machines required to successfully recover the gradient is given by …
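As a concrete illustration of how coded computation lets the master recover the full gradient from a subset of machines, the following sketch uses the classic three-machine gradient-coding example from the literature (any one straggler tolerated); this is not the balanced mask-matrix construction of this chapter, and the encoding and decoding coefficients below are the standard ones for that toy example.

```python
import numpy as np

# The data is split into 3 parts with partial gradients g1, g2, g3; each machine
# sends one coded combination, and the full gradient g1 + g2 + g3 can be
# recovered from ANY 2 of the 3 machines.

d = 5
rng = np.random.default_rng(0)
g = rng.standard_normal((3, d))             # the three partial gradients

# Encoding matrix B: machine k sends B[k] @ g.
B = np.array([[0.5, 1.0,  0.0],
              [0.0, 1.0, -1.0],
              [0.5, 0.0,  1.0]])

# Decoding vectors a_S for each surviving pair S, chosen so that a_S @ B[S] = [1, 1, 1].
decoders = {(0, 1): np.array([2.0, -1.0]),
            (0, 2): np.array([1.0,  1.0]),
            (1, 2): np.array([1.0,  2.0])}

full = g.sum(axis=0)                        # the gradient we want to recover
for survivors, a in decoders.items():
    received = B[list(survivors)] @ g       # what the two surviving machines sent
    recovered = a @ received
    print(survivors, "max recovery error:", np.max(np.abs(recovered - full)))
```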
Learning from Data
MINIMAX OPTIMALITY AND IMPLICIT REGULARIZATION OF STOCHASTIC GRADIENT/MIRROR DESCENT
Introduction
- Our Contribution
Therefore, a general characterization of the behavior of stochastic descent algorithms for more general models would be of great interest. The theory also allows us to establish new results, such as the convergence (in a deterministic sense) of SMD in the over-parameterized linear case.
Preliminaries
Furthermore, we show that many properties recently proven in the learning/optimization literature, such as the implicit regularization of SMD in the overparameterized linear case (when convergence occurs) [85], follow naturally from this theory. We also use the theory developed in this chapter to provide some speculative arguments as to why SMD (and SGD) may have similar convergence and implicit-regularization properties in the so-called "highly overparameterized" nonlinear settings.
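For reference, the standard SMD update with a strictly convex, differentiable potential function 𝜓 and step size 𝜂 can be written as follows (this is the textbook form of the update; the chapter's precise notation may differ):

```latex
\nabla\psi(w_t) \;=\; \nabla\psi(w_{t-1}) \;-\; \eta\,\nabla L_{i_t}(w_{t-1}),
\qquad\text{equivalently}\qquad
w_t \;=\; \operatorname*{arg\,min}_{w}\; \eta\,\nabla L_{i_t}(w_{t-1})^{\!\top} w \;+\; D_{\psi}(w, w_{t-1}),
```

where 𝐷𝜓(·,·) denotes the Bregman divergence of 𝜓; for 𝜓(𝑤) = ½‖𝑤‖², the update reduces to ordinary SGD.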
Warm-up: Revisiting SGD on Square Loss of Linear Models
- Conservation of Uncertainty
- Minimax Optimality of SGD
We assume that the loss 𝐿𝑖(·) depends only on the residual, i.e., the difference between the prediction and the true label. However, in this case the convergence is not surprising, since, effectively, after a while the weights are no longer updated, and the more interesting question is what the recursion converges to.
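The following is a small sketch of this warm-up setting: single-sample SGD on the square loss of an over-parameterized linear model, initialized at zero, drives the residuals to zero and ends up essentially at the minimum-norm interpolating solution; the dimensions, step size, and iteration counts are arbitrary illustrative choices.

```python
import numpy as np

# SGD on the square loss of an overparameterized linear model. Starting from
# zero, the iterates stay in the row space of X, so the interpolating solution
# they approach is the minimum-norm one (compared against the pseudoinverse).

rng = np.random.default_rng(0)
n, d = 20, 200                              # far more parameters than data points
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

w = np.zeros(d)
eta = 1e-3
for _ in range(2000):                       # epochs of single-sample SGD
    for i in rng.permutation(n):
        residual = X[i] @ w - y[i]          # prediction minus true label
        w -= eta * residual * X[i]          # SGD step on the square loss

w_min_norm = np.linalg.pinv(X) @ y          # minimum-norm interpolating solution
print("training residual:", np.linalg.norm(X @ w - y))
print("distance to min-norm solution:", np.linalg.norm(w - w_min_norm))
```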