
SURVEY

Distributed Optimization Methods for Multi-Robot Systems
Part 2—A Survey

By Ola Shorinwa, Trevor Halsted, Javier Yu, and Mac Schwager

Although the field of distributed optimization is well developed, relevant literature focused on the application of distributed optimization to multi-robot problems is limited. This survey constitutes the second part of a two-part series on distributed optimization applied to multi-robot problems. In this article, we survey three main classes of distributed optimization algorithms—distributed first-order (DFO) methods, distributed sequential convex programming methods, and alternating direction method of multipliers (ADMM) methods—focusing on fully distributed methods that do not require coordination or computation by a central computer. We describe the fundamental structure of each category and note important variations around this structure, designed to address its associated drawbacks. Further, we provide practical implications of noteworthy assumptions made by distributed optimization algorithms, noting the classes of robotics problems suitable for these algorithms. Moreover, we identify important open research challenges in distributed optimization, specifically for robotics problems.

Digital Object Identifier 10.1109/MRA.2024.3352852
Date of publication 1 February 2024; date of current version 11 September 2024.

INTRODUCTION
In this article, we survey the literature in distributed optimization, specifically with an eye toward problems in multi-robot coordination. As we demonstrated in the first article in this two-part series [1], many multi-robot problems can be written as a sum of local objective functions, subject to an intersection of local constraints. Such problems can be solved with a powerful and growing arsenal of distributed optimization algorithms. Distributed optimization consists of multiple computation nodes working together to minimize a common objective function through local computation iterations and network-constrained communication steps, providing both computational and communication benefits by eliminating the need for data aggregation.

154 IEEE ROBOTICS & AUTOMATION MAGAZINE • SEPTEMBER 2024 • 1070-9932/24©2024IEEE
Distributed optimization is also robust against the failure of individual nodes, as it does not rely on a central computation station, and many distributed optimization algorithms have inherent privacy-preserving properties, keeping the local data, objective function, and constraint function private to each robot while still allowing all robots to benefit from one another. Distributed optimization has not yet been widely employed in robotics, and there exist many open opportunities for research in this space, which we highlight in this survey.

Although the field of distributed optimization is well established in many areas, such as computer networking and power systems, problems in robotics have a number of distinguishing features that are not often considered in the major application areas of distributed optimization. Notably, robots move, unlike their analogous counterparts in these other disciplines, which makes their networks time varying and prone to bandwidth limitations, packet drops, and delays. Robots often use optimization within a receding horizon or model predictive control (MPC) loop, so fast convergence to an optimal solution is essential in robotics. In addition, optimization problems in robotics are often constrained (e.g., with safety constraints, input constraints, or kinodynamic constraints in planning problems) and nonconvex [for example, simultaneous localization and mapping (SLAM) is a nonconvex optimization, as are trajectory planning and state estimation for any nonlinear robot model]. Many existing surveys on distributed optimization do not address these unique characteristics of robotics problems.

This survey constitutes the second part of a two-part series on distributed optimization for multi-robot systems. The first part consisted of a tutorial focused on the applicability of distributed optimization to multi-robot problems. In it, we demonstrated how a broad range of multi-robot problems can be cast in a form that is appropriate for distributed optimization, and we provided practical guidelines for implementing distributed optimization algorithms. In this survey, we highlight relevant distributed optimization algorithms and note the classes of robotics problems to which these algorithms can be applied. Noting the large body of work in distributed optimization, we categorize distributed optimization algorithms into three broad classes and identify the practical implications of these algorithms for robotics problems, including the challenges arising in the implementation of these algorithms on robotics platforms.

This survey is aimed at robotics researchers who are interested in research at the intersection of distributed optimization and multi-robot systems as well as robotics practitioners who want to harness the benefits of distributed optimization algorithms in solving practical robotics problems (see Figure 1). In this survey, we limit our discussion to optimization problems over real-valued decision variables. Although discrete optimization problems (i.e., integer programs or mixed-integer programs) arise in some robotics applications, these problems are beyond the scope of this survey. However, we note that distributed algorithms for integer and mixed-integer problems have been discussed in a number of different works [2], [3], [4]. Further, we limit our discussion to derivative-based methods, in contrast to derivative-free (zeroth-order) distributed optimization algorithms. We note that derivative-free optimization methods have been discussed extensively in [5], [6], [7], [8], [9], and [10].

FIGURE 1. A motivation for distributed optimization: consider an estimation scenario in which a robot seeks to localize a target when given sensor measurements. (a) The robot can compute an optimal solution when given only its observations. (b) By using distributed optimization techniques, each robot in a networked system of robots can compute the optimal solution when given all robots' observations, without actually sharing individual sensor models or measurements with one another.

In many robotics applications, such as field robotics, communication with a central computer (or the cloud) might be infeasible, even though each robot can communicate locally with other neighboring robots. Consequently, we focus particularly on distributed optimization algorithms that permit robots to use local robot-to-robot communication to compute an optimal solution, rather than algorithms that require coordination by a central computer. These methods yield a globally optimal solution for convex problems and, in general, a locally optimal solution for nonconvex problems, producing the same solution quality that would be obtained if a centralized method were applied. Although many distributed optimization algorithms are not inherently "online" (in the sense that these algorithms were not originally designed to be executed while the robot is actively gathering data or completing a task, providing information that changes its objective and constraint functions), we note that many of these algorithms can be applied in these online problems within the MPC framework, where a new optimization problem is solved periodically from streaming data.

In this survey, we provide a taxonomy of the different algorithms for performing distributed optimization, based on their defining mathematical characteristics. We identify three classes: DFO algorithms, distributed sequential convex programming, and distributed extensions to the ADMM:

1) DFO algorithms: The most common class of distributed optimization methods is based on the idea of averaging local gradients computed by each computational node to perform an approximate gradient descent update [11], and in this work, we refer to them as DFO algorithms. DFO algorithms can be further subdivided into distributed (sub)gradient descent, distributed gradient tracking, distributed stochastic gradient descent, and distributed dual averaging (DDA) algorithms, with each subcategory differing from the others based on the order of the update steps and the nature of the gradients used. In general, DFO algorithms use consensus methods to achieve a shared solution for the optimization problem. Many DFO algorithms allow for dynamic communication networks (including unidirectional and bidirectional networks) [12], [13] and limited computation resources [14], but they are often not well suited to constrained problems.
2) Distributed sequential convex programming: Sequential convex optimization is a common technique in centralized optimization that involves minimizing a sequence of convex approximations to the original (usually nonconvex) problem. Under certain conditions, the sequence of subproblems converges to a local optimum of the original problem. Newton's method and the Broyden–Fletcher–Goldfarb–Shanno (BFGS) method are common examples. The same concepts are used by a number of distributed optimization algorithms, and we refer to these algorithms as distributed sequential convex programming methods. Generally, these methods use consensus techniques to construct the convex approximations of the joint objective function. One example is the network Newton method [15], which uses consensus to approximate the inverse Hessian of the objective to construct a quadratic approximation of the joint problem. The NEXT family of algorithms [16] provides a flexible framework, which can utilize a variety of convex surrogate functions to approximate the joint problem and is specifically designed to optimize nonconvex objective functions. Although many distributed sequential convex programming methods are not suitable for problems with dynamic communication networks, a few distributed sequential convex programming algorithms are amenable to these problems [16].
3) ADMM: The last class of algorithms covered in this survey is based on the ADMM [17]. The ADMM works by minimizing the augmented Lagrangian of the optimization problem, using alternating updates to the primal and dual variables [18]. This method naturally accommodates constrained problems (with the assumption that we can convert inequality constraints to equality constraints by using slack variables). The original method is distributed but not in the sense we consider in this survey. Specifically, the original ADMM requires a central computation hub to collect all local primal computations from the nodes to perform a centralized dual update step. The ADMM was first modified to remove this requirement for a central node in [19], where it was used for distributed signal processing. The algorithm from [19] has since become known as the consensus ADMM (C-ADMM), although the original paper [19] did not use this terminology. A number of other distributed variants have been developed to address many unique characteristics, including unidirectional communication networks and limited communication bandwidth [20], [21], which are often present in robotics problems.

EXISTING SURVEYS
A number of other recent surveys on distributed optimization exist and provide useful background when working with the algorithms covered in this article. Some of these surveys cover applications of distributed optimization in distributed power systems [22], big data problems [23], and game theory [24], while others focus primarily on first-order methods for problems in multiagent control [25]. Other articles broadly address DFO optimization methods, including a discussion on the communication–computation tradeoffs [26], [27]. Another survey [28] covers exclusively nonconvex optimization in both batch and data streaming contexts but again analyzes only first-order methods. Finally, [29] covers a wide breadth of distributed optimization algorithms with a variety of assumptions, focusing exclusively on convex optimization problems. Building on the first article in this two-part series [1], which formulates multi-robot problems within the framework of distributed optimization, our survey differs from other existing surveys in that it specifically targets applications of distributed optimization to multi-robot problems: identifying suitable distributed optimization algorithms that address the practical issues arising in multi-robot problems and providing references demonstrating the application of distributed optimization to multi-robot problems. As a result, this survey highlights the practical implications of the assumptions made by many distributed optimization algorithms and provides a condensed taxonomic overview of useful methods for these applications. Other useful background material can be found for distributed computation in [30] and [31] and on multi-robot systems in [32] and [33].

CONTRIBUTIONS
This survey article has three primary objectives:
1) survey the literature across three different classes of distributed optimization algorithms, noting the defining mathematical characteristics of each category;
2) highlight noteworthy assumptions made by distributed optimization algorithms and provide existing applications of distributed optimization algorithms to multi-robot problems; and
3) propose open research problems in distributed optimization for robotics.

ORGANIZATION
In the "Notation and Preliminaries" section, we introduce mathematical notation and preliminaries, and in the "Problem Formulation" section, we present the general formulation for the distributed optimization problem and describe the general framework shared by distributed optimization algorithms.

The "DFO Algorithms," "Distributed Sequential Convex Programming," and "ADMM" sections survey the literature in each of the three categories and provide details for representative algorithms in each category. The "Distributed Optimization in Robotics and Related Applications" section provides existing applications of distributed optimization in the robotics literature. In the "Research Opportunities in Distributed Optimization for Multi-robot Systems" section, we discuss open research problems in applying distributed optimization to multi-robot systems and robotics in general, and we offer concluding remarks in the "Conclusion" section.

NOTATION AND PRELIMINARIES
In this section, we introduce the notation used in this article and provide the definitions of mathematical concepts relevant to the discussion of the distributed optimization algorithms. We denote the gradient of a function $f : \mathbb{R}^n \to \mathbb{R}$ as $\nabla f$ and its Hessian as $\nabla^2 f$. We denote the vector containing all ones as $\mathbf{1}_n$, where $n$ represents the number of elements in the vector. Next, we begin with the definition of stochastic matrices that arise in DFO optimization algorithms.

DEFINITION 1: NONNEGATIVE MATRIX
A matrix $W \in \mathbb{R}^{n \times n}$ is referred to as a nonnegative matrix if $w_{ij} \ge 0$ for all $i, j \in \{1, \ldots, n\}$.

DEFINITION 2: STOCHASTIC MATRIX
A nonnegative matrix $W \in \mathbb{R}^{n \times n}$ is referred to as a row-stochastic matrix if

  $W \mathbf{1}_n = \mathbf{1}_n$.  (1)

In other words, the sum of all elements in each row of the matrix equals one. We refer to $W$ as a column-stochastic matrix if

  $\mathbf{1}_n^{\top} W = \mathbf{1}_n^{\top}$.  (2)

Likewise, for a doubly stochastic matrix $W$,

  $W \mathbf{1}_n = \mathbf{1}_n$ and $\mathbf{1}_n^{\top} W = \mathbf{1}_n^{\top}$.  (3)

Now we provide the definition of some relevant properties of a sequence.

DEFINITION 3: SUMMABLE SEQUENCE
A sequence $\{\alpha^{(k)}\}_{k \ge 0}$, with $k \in \mathbb{N}$, is a summable sequence if $\alpha^{(k)} > 0$ for all $k$ and

  $\sum_{k=0}^{\infty} \alpha^{(k)} < \infty$.  (4)

DEFINITION 4: SQUARE-SUMMABLE SEQUENCE
A sequence $\{\alpha^{(k)}\}_{k \ge 0}$, with $k \in \mathbb{N}$, is a square-summable sequence if $\alpha^{(k)} > 0$ for all $k$ and

  $\sum_{k=0}^{\infty} \left(\alpha^{(k)}\right)^2 < \infty$.  (5)

We next discuss some relevant notions of the connectivity of a graph.

DEFINITION 5: CONNECTIVITY OF AN UNDIRECTED GRAPH
An undirected graph $\mathcal{G}$ is connected if a path exists between every pair of vertices $(i, j)$, where $i, j \in \mathcal{V}$. Note that such a path might traverse other vertices in $\mathcal{G}$.

DEFINITION 6: CONNECTIVITY OF A DIRECTED GRAPH
A directed graph $\mathcal{G}$ is strongly connected if a directed path exists between every pair of vertices $(i, j)$, where $i, j \in \mathcal{V}$. In addition, a directed graph $\mathcal{G}$ is weakly connected if the underlying undirected graph is connected. The underlying undirected graph $\mathcal{G}_u$ of a directed graph $\mathcal{G}$ refers to a graph with the same set of vertices as $\mathcal{G}$ and a set of edges obtained by considering each edge in $\mathcal{G}$ a bidirectional edge. Consequently, every strongly connected directed graph is weakly connected; however, the converse is not true.

In distributed optimization in multi-robot systems, robots perform communication and computation steps to minimize some global objective function. We focus on problems in which the robots' exchange of information must respect the topology of an underlying distributed communication graph, which could possibly change over time. This communication graph, denoted as $\mathcal{G}(t) = (\mathcal{V}(t), \mathcal{E}(t))$, consists of vertices $\mathcal{V}(t) = \{1, \ldots, N\}$ and edges $\mathcal{E}(t) \subseteq \mathcal{V}(t) \times \mathcal{V}(t)$ over which pairwise communication can occur. For undirected graphs, we denote the set of neighbors of robot $i$ as $\mathcal{N}_i(t)$. For directed graphs, we refer to the set of robots that can send information to robot $i$ as the set of in-neighbors of robot $i$, denoted by $\mathcal{N}_i^{+}(t)$. Likewise, for directed graphs, we refer to the set of robots that can receive information from robot $i$ as the out-neighbors of robot $i$, denoted by $\mathcal{N}_i^{-}(t)$.

DEFINITION 7: CONVERGENCE RATE
Provided that a sequence $\{x^{(k)}\}$ converges to $x^*$, if there exists a positive scalar $r \in \mathbb{R}$, with $r \ge 1$, and a constant $m \in \mathbb{R}$, with $m > 0$, such that

  $\lim_{k \to \infty} \dfrac{\lVert x^{(k+1)} - x^* \rVert}{\lVert x^{(k)} - x^* \rVert^{r}} = m$  (6)

then $r$ defines the order of convergence of the sequence $\{x^{(k)}\}$ to $x^*$. Moreover, the asymptotic error constant is given by $m$. If $r = 1$ and $m = 1$, then $\{x^{(k)}\}$ converges to $x^*$ sublinearly. However, if $r = 1$ and $m < 1$, then $\{x^{(k)}\}$ converges to $x^*$ linearly. Likewise, $\{x^{(k)}\}$ converges to $x^*$ quadratically if $r = 2$ and cubically if $r = 3$.

DEFINITION 8: SYNCHRONOUS ALGORITHM
An algorithm is synchronous if each robot (computational node) has to wait at a predetermined point for a specific message from other robots (computational nodes) before proceeding. In general, the end of an iteration of the algorithm represents the predetermined synchronization point.

Conversely, in an asynchronous algorithm, each robot completes each iteration at its own pace, without having to wait at a predetermined point. In other words, at any given time, the number of iterations of an asynchronous algorithm completed by each robot could differ from the number of iterations completed by other robots.

PROBLEM FORMULATION
We consider a general class of separable distributed optimization problems in which we express a joint objective function as the sum over local objective functions. From a multi-robot perspective, each robot knows only its own local function, but the robots collectively seek to find the optimum of the global function. In this general formulation, we also consider a set of joint constraints consisting of an intersection over local constraints. Each robot knows only its own local constraints and its local objective function. The resulting optimization problem is given by

  $\min_{x} \; \sum_{i \in \mathcal{V}} f_i(x)$
  subject to $g_i(x) = 0 \;\; \forall i \in \mathcal{V}$
       $h_i(x) \le 0 \;\; \forall i \in \mathcal{V}$  (7)

where $x \in \mathbb{R}^n$ denotes the optimization variable and $f_i : \mathbb{R}^n \to \mathbb{R}$, $g_i : \mathbb{R}^n \to \mathbb{R}$, and $h_i : \mathbb{R}^n \to \mathbb{R}$ denote the local objective function, equality constraint function, and inequality constraint function of robot $i$, respectively. The joint optimization problem (7) can be solved locally by each robot if all the robots share their objective and constraint functions with one another. Alternatively, the solution can be computed centrally if all the local functions are collated at a central station. However, robots typically possess limited computation and communication resources, which precludes each robot from sharing its local functions with other robots, particularly in problems with high-dimensional problem data, such as images, lidar, and other perception measurements.

Distributed optimization algorithms enable each robot to compute a solution of (7) locally without sharing its local objective, constraints, or data. These algorithms assign a copy of the optimization variable to each robot, enabling each robot to update its own copy locally and in parallel with the other robots. Moreover, distributed optimization algorithms enforce consensus among the robots for agreement on a common solution of the optimization problem. Consequently, these algorithms solve an equivalent reformulation of the optimization problem in (7), given by

  $\min_{\{x_i,\, \forall i \in \mathcal{V}\}} \; \sum_{i \in \mathcal{V}} f_i(x_i)$
  subject to $x_i = x_j \;\; \forall (i, j) \in \mathcal{E}$
       $g_i(x_i) = 0 \;\; \forall i \in \mathcal{V}$
       $h_i(x_i) \le 0 \;\; \forall i \in \mathcal{V}$  (8)

where $x_i \in \mathbb{R}^n$ denotes robot $i$'s local copy of the optimization variable. We note that the consensus constraints in (8) ensure agreement among all the robots, with the assumption that the communication graph is connected. Moreover, the consensus constraints are enforced between neighboring robots only, providing compatibility with a point-to-point communication network, where robots can communicate only with their one-hop neighbors. To simplify notation, we introduce the set $\mathcal{X}_i = \{x_i \mid g_i(x_i) = 0,\; h_i(x_i) \le 0\}$, representing the feasible set given the constraint functions $g_i$ and $h_i$. Consequently, we can express the problem in (8) succinctly, as follows:

  $\min_{\{x_i \in \mathcal{X}_i,\, \forall i \in \mathcal{V}\}} \; \sum_{i \in \mathcal{V}} f_i(x_i)$
  subject to $x_i = x_j \;\; \forall (i, j) \in \mathcal{E}$.  (9)

In the following sections, we discuss three broad classes of distributed optimization methods, namely, DFO methods, distributed sequential convex programming methods, and the ADMM. We note that DFO methods and distributed sequential convex programming methods implicitly enforce the consensus constraints in (9), while the ADMM enforces these constraints explicitly. While not all the methods that we survey explicitly address constraints of the form $g_i(x) = 0$, $h_i(x) \le 0$, we note in each section considerations to accommodate these additional terms. In some cases, it is also appropriate to incorporate the constraints as penalty terms in the cost function.

Before proceeding, we highlight the general framework that distributed optimization algorithms share. Distributed optimization algorithms are iterative algorithms in which each robot executes a number of operations over discrete iterations $k = 0, 1, \ldots$ until convergence, where each iteration consists of a communication and a computation step. During each communication round, each robot shares a set of its local variables with its neighbors, referred to as its "communicated" variables $Q_i^{(k)}$, which we distinguish from its "internal" variables $P_i^{(k)}$, which are not shared with its neighbors. In general, each algorithm requires initialization of the local variables of each robot in addition to algorithm-specific parameters, denoted by $R_i^{(k)}$. We note that some algorithms require all the robots to utilize a common step-size at initialization; however, these parameters can be initialized prior to deployment of the robots.

DFO ALGORITHMS
The optimization problem in (7) (in its unconstrained form) can be solved through gradient descent, where the optimization variable is updated using

  $x^{(k+1)} = x^{(k)} - \alpha^{(k)} \nabla f(x^{(k)})$  (10)

with $\nabla f(x^{(k)})$ denoting the gradient of the objective function at $x^{(k)}$, given by

  $\nabla f(x) = \sum_{i \in \mathcal{V}} \nabla f_i(x)$  (11)

given some scheduled step-size $\alpha^{(k)}$. Inherently, computation of $\nabla f(x^{(k)})$ requires knowledge of the local objective functions or gradients by all robots in the network, which is infeasible in many problems.
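The separable structure of (7) and the centralized update in (10) and (11) can be sketched numerically. The example below uses hypothetical quadratic local objectives $f_i(x) = \tfrac{1}{2}\lVert A_i x - b_i \rVert^2$; the data, dimensions, and step-size are illustrative assumptions, and the point is that the centralized update must sum the local gradients of every robot, which is exactly the aggregation step that is infeasible in the distributed setting:

```python
import numpy as np

# Toy instance of problem (7), unconstrained: each of N robots holds a
# private quadratic objective f_i(x) = 0.5 * ||A_i x - b_i||^2 (hypothetical data).
rng = np.random.default_rng(0)
N, n = 5, 3
A = [rng.standard_normal((4, n)) for _ in range(N)]
b = [rng.standard_normal(4) for _ in range(N)]

def grad_fi(i, x):
    """Local gradient of f_i at x: A_i^T (A_i x - b_i)."""
    return A[i].T @ (A[i] @ x - b[i])

# Centralized gradient descent, eqs. (10)-(11): each step needs the SUM of
# all local gradients, i.e., information aggregated across every robot.
x = np.zeros(n)
alpha = 0.01
for k in range(2000):
    x = x - alpha * sum(grad_fi(i, x) for i in range(N))

# Compare against the closed-form joint least-squares optimum.
A_all = np.vstack(A)
b_all = np.concatenate(b)
x_star = np.linalg.lstsq(A_all, b_all, rcond=None)[0]
print(np.allclose(x, x_star, atol=1e-4))
```

Each robot can evaluate its own `grad_fi` locally; only the sum in the update requires global knowledge, which is what the DFO methods of the next section replace with neighbor-to-neighbor communication.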

DFO algorithms extend the centralized gradient scheme to the distributed setting, where robots communicate with one-hop neighbors without knowledge of the local objective functions or gradients of all robots. In DFO methods, each robot updates its local variable by using a weighted combination of the local variables or gradients of its neighbors according to the weights specified by a stochastic weighting matrix $W$, allowing for the dispersion of information on the objective function or its gradient through the network. The stochastic matrix $W$ must be compatible with the underlying communication network, with a nonzero element $w_{ij}$ when robot $j$ can send information to robot $i$.

From the perspective of a single robot, the update equations in DFO methods represent a tradeoff between the optimality of the robot's individual solution based on its local objective function and agreement with its neighbors. Consensus enables the robot to incorporate global information about the objective function's shape into its update, thereby allowing it to approximate a gradient descent step on the global cost function rather than on its local cost function.

Many DFO algorithms use a doubly stochastic matrix, a row-stochastic matrix [34], or a column-stochastic matrix, depending on the model of the communication network considered, while other methods use a push-sum approach. In addition, many methods further require symmetry of the doubly stochastic weighting matrix, with $W = W^{\top}$. The weight matrix exerts a significant influence on the convergence rates of DFO algorithms, and thus, an appropriate choice of these weights is required for convergence of DFO methods.

The order of the update procedures for the local variables of each robot and the gradient used by each robot in performing its local update procedures differ among DFO algorithms, giving rise to four broad classes of DFO methods: distributed (sub)gradient descent and diffusion algorithms, gradient tracking algorithms, distributed stochastic gradient algorithms, and DDA. While distributed (sub)gradient descent algorithms require a decreasing step-size for convergence to an optimal solution, gradient tracking algorithms converge to an optimal solution without this condition. We discuss these distributed methods in the following sections.

DISTRIBUTED (SUB)GRADIENT DESCENT AND DIFFUSION ALGORITHMS
Tsitsiklis introduced a model for distributed gradient descent (DGD) in the 1980s in [35] and [11] (see also [30]). The works of Nedić and Ozdaglar in [14] revisit the problem, marking the beginning of interest in consensus-based frameworks for distributed optimization over the past decade. This basic model of DGD consists of an update term that involves consensus on the optimization variable as well as a step in the direction of the local gradient for each node:

  $x_i^{(k+1)} = \sum_{j \in \mathcal{N}_i \cup \{i\}} w_{ij}\, x_j^{(k)} - \alpha_i^{(k)} \nabla f_i(x_i^{(k)})$  (12)

where robot $i$ updates its variable using a weighted combination of its neighbors' variables determined by the weights $w_{ij}$, with $\alpha_i^{(k)}$ denoting its local step-size at iteration $k$.

For convergence to the optimal joint solution, these methods require the step-size to asymptotically decay to zero. As proved in [36], if $\alpha^{(k)}$ is chosen such that the sequence $\{\alpha^{(k)}\}$ is square summable but not summable, then the optimization variables of all robots converge to the optimal joint solution given the standard assumptions of a connected network, properly chosen weights, and bounded (sub)gradients. In contrast, the choice of a constant step-size for all time steps guarantees only the convergence of each robot's iterates to a neighborhood of the optimal joint solution. In practice, this means that a multi-robot system implementing DGD must coordinate on scheduling the decrease of the step-size. Nonetheless, DGD can generally tolerate some level of asynchrony or stochasticity. Algorithm 1 summarizes the update step for the DGD method in [14], with the step-size $\alpha^{(k)} = \alpha^{(0)}/\sqrt{k}$, with $\alpha^{(0)} > 0$.

ALGORITHM 1. DGD.
  Initialization: $k \leftarrow 0$, $x_i^{(0)} \in \mathbb{R}^n$
  Internal variables: $P_i^{(k)} = \varnothing$
  Communicated variables: $Q_i^{(k)} = x_i^{(k)}$
  Parameters: $R_i^{(k)} = (\alpha^{(k)}, w_i)$
  do in parallel $\forall i \in \mathcal{V}$
    Communicate $Q_i^{(k)}$ to all $j \in \mathcal{N}_i$
    Receive $Q_j^{(k)}$ from all $j \in \mathcal{N}_i$
    $\alpha^{(k)} = \alpha^{(0)}/\sqrt{k}$
    $x_i^{(k+1)} = \sum_{j \in \mathcal{N}_i \cup \{i\}} w_{ij}\, x_j^{(k)} - \alpha^{(k)} \nabla f_i(x_i^{(k)})$
    $k \leftarrow k + 1$
  while stopping criterion is not satisfied

We note that the update procedure given in (12) requires a doubly stochastic weighting matrix, which, in general, is incompatible with directed communication networks. Other DGD algorithms [37], [38], [39], [40] utilize the push-sum consensus protocol [41] in place of the consensus terms in (12), extending the application of DGD schemes to problems with directed communication networks.

In general, with a constant step-size, distributed (sub)gradient descent algorithms converge at a rate of $O(1/k)$ to a neighborhood of the optimal solution in convex problems [42]. With a decreasing step-size, some distributed (sub)gradient descent algorithms converge to an optimal solution at $O(\log k / k)$ by using an accelerated gradient scheme, such as the Nesterov gradient method [43].

DISTRIBUTED GRADIENT TRACKING ALGORITHMS
Although distributed (sub)gradient descent algorithms converge to an optimal joint solution, the requirement of a square-summable sequence $\{\alpha^{(k)}\}$, which results in a decaying step-size, reduces the convergence speed of these methods. Gradient tracking methods address this limitation by allowing each robot to utilize the changes in its local gradient between successive iterations as well as a local estimate of the average gradient across all robots in its update procedures, enabling the use of a constant step-size while retaining convergence to the optimal joint solution.

The EXTRA algorithm introduced by Shi et al. in [44] uses a fixed step-size while still achieving exact convergence.
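A minimal numerical sketch of the DGD update (12) in Algorithm 1 follows, on a hypothetical four-robot ring with quadratic local objectives. The Metropolis–Hastings weights used here are one common construction that yields a doubly stochastic $W$ compatible with an undirected graph; the graph, data, and $\alpha^{(0)}$ are illustrative assumptions, not choices from the surveyed papers:

```python
import numpy as np

# Toy consensus problem: min_x sum_i f_i(x), with private quadratics
# f_i(x) = 0.5 * ||A_i x - b_i||^2 held by each of N robots (hypothetical data).
rng = np.random.default_rng(1)
N, n = 4, 2
A = [rng.standard_normal((3, n)) for _ in range(N)]
b = [rng.standard_normal(3) for _ in range(N)]

# Undirected ring graph and Metropolis-Hastings weights: a doubly stochastic W
# compatible with the network (w_ij > 0 only if j is a neighbor of i or j = i).
neighbors = {i: [(i - 1) % N, (i + 1) % N] for i in range(N)}
W = np.zeros((N, N))
for i in range(N):
    for j in neighbors[i]:
        W[i, j] = 1.0 / (1 + max(len(neighbors[i]), len(neighbors[j])))
    W[i, i] = 1.0 - W[i].sum()

def grad_fi(i, x):
    """Local gradient A_i^T (A_i x - b_i), computable by robot i alone."""
    return A[i].T @ (A[i] @ x - b[i])

# DGD iterations: consensus on neighbors' copies plus a local gradient step,
# with the decaying step-size alpha_k = alpha_0 / sqrt(k) of Algorithm 1
# (starting at k = 1 to avoid dividing by zero).
X = np.zeros((N, n))            # row i is robot i's local copy x_i
alpha0 = 0.1
for k in range(1, 5001):
    alpha_k = alpha0 / np.sqrt(k)
    G = np.stack([grad_fi(i, X[i]) for i in range(N)])
    X = W @ X - alpha_k * G     # eq. (12) for all robots simultaneously

x_star = np.linalg.lstsq(np.vstack(A), np.concatenate(b), rcond=None)[0]
print(np.max(np.abs(X - x_star)))   # all local copies approach x*
```

Note that `W @ X` uses only one-hop information: row $i$ of the product mixes $x_i$ with its two ring neighbors, so each robot could compute its own row after a single communication round.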

EXTRA replaces the gradient term with the difference in the gradients of the previous two iterates. Because the contribution of this gradient difference term decays as the iterates converge to the optimal joint solution, EXTRA does not require the step-size to decay in order to converge to the exact optimal joint solution. EXTRA achieves linear convergence [42], and a variety of gradient tracking algorithms have since offered improvements on its linear rate [45] for convex problems with strongly convex objective functions.

The DIGing algorithm [46], whose update equations are shown in Algorithm 2, is one such method that extends the faster convergence properties of EXTRA to the domain of directed and time-varying graphs. DIGing requires communication of two variables, effectively doubling the communication cost per iteration when compared to DGD but greatly increasing the diversity of communication infrastructure that it can be deployed on.

Many other gradient tracking algorithms involve variations on the variables updated using consensus and the order of the update steps, such as NIDS [48] and Exact Diffusion [49], [50], [51], [52]. These algorithms, which generally require the use of doubly stochastic weighting matrices, have been extended to problems with row-stochastic or column-stochastic matrices [12], [13], [53], [54] and push-sum consensus [55] for distributed optimization in directed networks. To achieve faster convergence rates, many of these algorithms require each robot to communicate multiple local variables to its neighbors during each communication round. In addition, we note that some of these algorithms require all robots to use the same step-size, which can prove challenging in some situations. Several works offer a synthesis of various gradient tracking methods, noting the similarities among these methods. Under the canonical form proposed in [56] and [57], these algorithms and others differ only in the choice of several constant parameters. Jakovetić also provides a unified form for various gradient tracking algorithms in [58]. Some other works consider accelerated variants using Nesterov gradient descent [59], [60], [61]. Gradient tracking algorithms can be considered primal–dual methods with an appropriately defined augmented Lagrangian function [46], [62].

In general, gradient tracking algorithms address unconstrained distributed convex optimization problems, but these methods have been extended to nonconvex problems [63] and constrained problems using projected gradient descent [64], [65], [66]. Some other methods [67], [68], [69], [70] perform dual ascent on the dual problem of (7), where the robots compute their local primal variables from the related minimization problem by using their dual variables. These methods require doubly stochastic weighting matrices but allow for time-varying communication networks. DFO methods have been extended to the constrained setting [71], where each robot performs a subsequent proximal projection step to obtain solutions that satisfy the problem constraints.

In deep learning problems, the associated objective function often consists of a sum over a very large number of data points. Computing exact gradients for such problems can be prohibitively costly, so gradients are approximated by randomly sampling a subset of the data at each iteration and computing the gradient only over those data. Such methods, called stochastic gradient descent, dominate in deep learning. In [72], stochastic gradients are used in place of gradients in the DGD algorithm, and the resulting algorithm is shown to converge.

DDA
Dual averaging, first posed in [73] and extended in [74], takes a similar approach to distributed (sub)gradient descent methods in solving the optimization problem in (7), with the added benefit of providing a mechanism for handling problem constraints through a projection step, in a like manner to projected (sub)gradient descent methods. However, the original formulations of the dual averaging method require knowledge of all components of the objective function or its gradient, which is unavailable to all robots. The DDA method circumvents this limitation by modifying the update equations through a doubly stochastic weighting matrix to allow for updates of each robot's variable by using its local gradients and a weighted combination of the variables of its neighbors [75].

Similar to distributed (sub)gradient descent methods, DDA requires a sequence of decreasing step-sizes to converge to the optimal solution. Algorithm 3 provides the update equations

ALGORITHM 2. DIGing.

Initialization: k ← 0, x_i^(0) ∈ R^n, y_i^(0) = ∇f_i(x_i^(0))
Internal variables: P_i^(k) = ∅
Communicated variables: Q_i^(k) = (x_i^(k), y_i^(k))
Parameters: R_i^(k) = (α, w_i)
do in parallel ∀i ∈ V
    Communicate Q_i^(k) to all j ∈ N_i
    Receive Q_j^(k) from all j ∈ N_i
    x_i^(k+1) = Σ_{j ∈ N_i ∪ {i}} w_ij x_j^(k) − α y_i^(k)
    y_i^(k+1) = Σ_{j ∈ N_i ∪ {i}} w_ij y_j^(k) + ∇f_i(x_i^(k+1)) − ∇f_i(x_i^(k))
    k ← k + 1
while stopping criterion is not satisfied

ALGORITHM 3. DDA.

Initialization: k ← 0, x_i^(0) ∈ R^n, z_i^(0) = x_i^(0)
Internal variables: P_i^(k) = x_i^(k)
Communicated variables: Q_i^(k) = z_i^(k)
Parameters: R_i^(k) = (α^(k), w_i, ψ(·))
do in parallel ∀i ∈ V
    Communicate Q_i^(k) to all j ∈ N_i
    Receive Q_j^(k) from all j ∈ N_i
    z_i^(k+1) = Σ_{j ∈ N_i ∪ {i}} w_ij z_j^(k) + ∇f_i(x_i^(k))
    x_i^(k+1) = argmin_{x ∈ X_i} { x^⊤ z_i^(k+1) + (1/α^(k)) ψ(x) }
    k ← k + 1
while stopping criterion is not satisfied

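To contrast Algorithm 2 with DGD, the following minimal sketch (not from the article) runs the DIGing updates with a constant step-size on the same toy quadratic problem used above; the tracker y_i estimates the average gradient, so the iterates reach the exact joint optimum without a decaying step-size. The weights and step-size are illustrative choices.

```python
import numpy as np

# Toy DIGing run: f_i(x) = 0.5 * (x - c_i)^2 over a line graph 1-2-3
# with doubly stochastic Metropolis weights. Unlike DGD, a constant
# step-size alpha still yields exact convergence because y_i tracks
# the network-average gradient.
c = np.array([0.0, 2.0, 7.0])          # joint optimum = mean(c) = 3
W = np.array([[2/3, 1/3, 0.0],
              [1/3, 1/3, 1/3],
              [0.0, 1/3, 2/3]])
x = np.zeros(3)
y = x - c                              # y_i^(0) = grad f_i(x_i^(0))
alpha = 0.3                            # constant step-size
for _ in range(200):
    x_new = W @ x - alpha * y                 # consensus + tracked-gradient step
    y = W @ y + (x_new - c) - (x - c)         # gradient tracking update
    x = x_new
print(x)   # all entries converge to the exact optimum 3.0
```

Note the trade-off stated in the text: each iteration communicates both x_i and y_i, doubling the per-iteration communication cost relative to DGD in exchange for exact convergence with a fixed step-size.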
in the DDA algorithm along with the projection step, which involves a proximal function ψ(x), often defined as (1/2)‖x‖₂². After the projection step, the robot's variable satisfies the problem constraints described by the constraint set X. Some of the same extensions made to distributed (sub)gradient descent algorithms have been studied for DDA, including analysis of the algorithm under communication time delays [76] and replacement of the doubly stochastic weighting matrix with push-sum consensus [77].

DISTRIBUTED SEQUENTIAL CONVEX PROGRAMMING
Sequential convex programming is a class of optimization methods, typically for nonconvex problems, that proceed iteratively by approximating the nonconvex problem with a convex surrogate computed from the current values of the decision variables. This convex surrogate is optimized, and the resulting decision variables are used to compute the convex surrogate for the next iterate. Newton's method is a classic example of a sequential convex method, in which the convex surrogate is a quadratic approximation of the original objective function. Several methods have been proposed for distributed sequential convex programming, as we survey here. As with DFO methods, distributed sequential convex programming takes the perspective of using consensus to approximate the global objective function, with the addition of approximating not only the global gradient but also the global Hessian. The benefit of this approach is that convergence typically requires fewer iterations and is less dependent on carefully selecting a step-size. This comes at the expense of requiring the robots to communicate more information in order to approximate the second-order characteristics of the global objective function.

APPROXIMATE NEWTON METHODS
Newton's method and its variants are commonly used for solving convex optimization problems, and they provide significant improvements in the convergence rate when second-order function information is available [78]. To apply Newton's method to the distributed optimization problem in (7), the network Newton-K (NN-K) algorithm [15] takes a penalty-based approach that introduces consensus among the robots' variables as components of the objective function. The NN-K method reformulates the constrained form of the distributed problem in (7) as the following unconstrained optimization problem:

    min_{x_i ∈ R^n, ∀i ∈ V} Σ_{i ∈ V} ( α f_i(x_i) + (1/2) x_i^⊤ Σ_{j ∈ N_i ∪ {i}} w̄_ij x_j )    (13)

where W̄ = I − W and α is a weighting hyperparameter. However, the Newton descent step requires computing the inverse of the joint problem's Hessian, which cannot be directly computed in a distributed manner, as its inverse is dense. To allow for distributed computation of the Hessian inverse, the NN-K uses the first K terms of the Taylor series expansion (I − X)^{-1} = Σ_{j=0}^{∞} X^j to compute the approximate Hessian inverse, as introduced in [79]. Approximation of the Hessian inverse comes at an additional communication cost and requires an additional K communication rounds per update of the primal variable. Algorithm 4 summarizes the update procedures in the NN-K method, in which ε denotes the local step-size for the Newton step. Selection of the step-size parameter does not require any coordination among robots. As presented in Algorithm 4, the NN-K proceeds through two sets of update equations: an outer set of updates that initializes the Hessian approximation and computes the decision variable update and an inner Hessian approximation update; a communication round precedes the execution of either set of update equations. Increasing K, the number of intermediary communication rounds, improves the accuracy of the approximated Hessian inverse at the cost of increasing the communication cost per primal variable update.

ALGORITHM 4. NN-K.

Initialization: k ← 0, x_i^(0) ∈ R^n
Internal variables: P_i^(k) = (g_i^(k), D_i^(k))
Communicated variables: Q_i^(k) = (x_i^(k), d_i^(p))
Parameters: R_i^(k) = (α, ε, K, w̄_i)
do in parallel ∀i ∈ V
    D_i^(k+1) = α ∇²f_i(x_i^(k)) + 2 w̄_ii I
    Communicate x_i^(k) to all j ∈ N_i
    Receive x_j^(k) from all j ∈ N_i
    g_i^(k+1) = α ∇f_i(x_i^(k)) + Σ_{j ∈ N_i ∪ {i}} w̄_ij x_j^(k)
    d_i^(0) = −(D_i^(k+1))^{-1} g_i^(k+1)
    for p = 0 to K − 1 do
        Communicate d_i^(p) to all j ∈ N_i
        Receive d_j^(p) from all j ∈ N_i
        d_i^(p+1) = (D_i^(k+1))^{-1} [ w̄_ii d_i^(p) − g_i^(k+1) − Σ_{j ∈ N_i} w̄_ij d_j^(p) ]
    end for
    x_i^(k+1) = x_i^(k) + ε d_i^(K)
    k ← k + 1
while stopping criterion is not satisfied

A follow-up work optimizes a quadratic approximation of the augmented Lagrangian of the general distributed optimization problem (7) in which the primal variable update involves computing a P-approximate Hessian inverse to perform a Newton descent step, and the dual variable update uses gradient ascent [80]. The resulting exact second-order method (ESOM) algorithm provides a faster convergence rate than the NN-K at the cost of one additional round of communication for the dual ascent step. Notably, replacing the augmented Lagrangian in the ESOM formulation with its linear approximation results in the EXTRA update equations, showing the relationship between both approaches.

In some cases, computation of the Hessian is impossible because second-order information is not available or intractable due to the large dimensions of the problem. Quasi-Newton methods like the BFGS algorithm approximate the Hessian when it cannot be directly computed. The distributed BFGS (D-BFGS) algorithm [81] replaces the second-order information

in the primal update in ESOM with a BFGS approximation (i.e., it replaces D_i^(k) in a call to the Hessian approximation equations in Algorithm 4 with an approximation) and results in essentially a "doubly" approximate Hessian inverse. In [82], the D-BFGS method is extended so that the dual update also uses a distributed quasi-Newton update scheme rather than gradient ascent. The resulting primal–dual quasi-Newton method requires two consecutive iterative rounds of communication, doubling the communication overhead per primal variable update compared to its predecessors (NN-K, ESOM, and D-BFGS). However, the resulting algorithm is shown by the authors to still converge faster in terms of required communication. In addition, asynchronous variants of the approximate Newton methods have been developed [83].

CONVEX SURROGATE METHODS
While the approximate Newton methods in [80], [81], and [82] optimize a quadratic approximation of the augmented Lagrangian of (13), other distributed methods allow for more general and direct convex approximations of the distributed optimization problem. These convex approximations generally require the gradient of the joint objective function, which is inaccessible to any single robot. In the NEXT family of algorithms [16], dynamic consensus is used to allow each robot to approximate the global gradient, and that gradient is then used to compute a convex approximation of the joint objective function locally. A variety of surrogate functions U(·) are proposed, including linear, quadratic, and block convex functions, which allows for greater flexibility in tailoring the algorithm to individual applications. Using its surrogate of the joint objective function, each robot updates its local variables iteratively by solving its surrogate for the problem and then taking a weighted combination of the resulting solution with the solutions of its neighbors. To ensure convergence, NEXT algorithms require a series of decreasing step-sizes, resulting in generally slower convergence rates as well as additional hyperparameter tuning.

ALGORITHM 5. NEXT.

Initialization: k ← 0, x_i^(0) ∈ R^n, y_i^(0) = ∇f_i(x_i^(0)), π̃_i^(0) = N y_i^(0) − ∇f_i(x_i^(0))
Internal variables: P_i^(k) = (x_i^(k), x̃_i^(k), π̃_i^(k))
Communicated variables: Q_i^(k) = (z_i^(k), y_i^(k))
Parameters: R_i^(k) = (α^(k), w_i, U(·), X_i)
do in parallel ∀i ∈ V
    x̃_i^(k) = argmin_{x ∈ X_i} U(x; x_i^(k), π̃_i^(k))
    z_i^(k) = x_i^(k) + α^(k) (x̃_i^(k) − x_i^(k))
    Communicate Q_i^(k) to all j ∈ N_i
    Receive Q_j^(k) from all j ∈ N_i
    x_i^(k+1) = Σ_{j ∈ N_i ∪ {i}} w_ij z_j^(k)
    y_i^(k+1) = Σ_{j ∈ N_i ∪ {i}} w_ij y_j^(k) + [∇f_i(x_i^(k+1)) − ∇f_i(x_i^(k))]
    π̃_i^(k+1) = N · y_i^(k+1) − ∇f_i(x_i^(k+1))
    k ← k + 1
while stopping criterion is not satisfied

The SONATA [84] algorithm extends the surrogate function principles of NEXT and proposes a variety of nondoubly stochastic weighting schemes that can be used to perform gradient averaging similar to the push-sum protocols. The authors of SONATA also show that several configurations of the algorithm result in already proposed distributed optimization algorithms, including Aug-DGM [85], Push-DIG [46], and ADD-OPT [53].

ADMM
Considering the optimization problem in (9) with only agreement constraints, we have

    min_{x_i ∈ R^n, ∀i ∈ V} Σ_{i ∈ V} f_i(x_i)    (14)
    subject to x_i = x_j   ∀(i, j) ∈ E.    (15)

The method of multipliers solves this problem by alternating between minimizing the augmented Lagrangian of the optimization problem with respect to the primal variables x_1, …, x_N (the "primal update") and taking a gradient step to maximize the augmented Lagrangian with respect to the dual (the "dual update"). The augmented Lagrangian of (14) is given by

    L_a(x, q) = Σ_{i=1}^{N} f_i(x_i) + Σ_{i=1}^{N} Σ_{j ∈ N_i} ( q_{i,j}^⊤ (x_i − x_j) + (ρ/2) ‖x_i − x_j‖₂² )    (16)

where q_{i,j} represents a dual variable for the consensus constraints between robots i and j, q = [q_{i,j}^⊤, ∀(i, j) ∈ E]^⊤, and x = [x_1^⊤, x_2^⊤, …, x_N^⊤]^⊤. The parameter ρ > 0 represents a penalty term on the violations of the consensus constraints. The quadratic penalty term is what distinguishes the augmented Lagrangian, and it also distinguishes the method of multipliers from dual ascent. The main benefit of using the augmented Lagrangian is that the quadratic term essentially serves as a proximal operator and helps to ensure convergence.

In the ADMM, given the separability of the global objective function, the primal update is executed as successive minimizations over each primal variable (i.e., choose the minimizing x_1 with all other variables fixed, then choose the minimizing x_2, and so on). Most ADMM-based approaches do not satisfy our definition of distributed in that either the primal updates take place sequentially rather than in parallel or the dual update requires centralized computation [86], [87], [88]. However, the C-ADMM provides an ADMM-based optimization method that is fully distributed: the nodes alternate between updating their primal and dual variables and communicating with neighboring nodes [19], [89].

To achieve a distributed update of the primal and dual variables, the C-ADMM alters the agreement constraints among agents with an existing communication link by introducing auxiliary primal variables in (9) (instead of the constraint x_i = x_j, we have two constraints: x_i = z_ij and x_j = z_ij). Considering the optimization steps across the entire network, the C-ADMM proceeds by optimizing the original primal variables, then the auxiliary primal variables, and then the dual variables, as in the

original formulation of the ADMM. We can perform minimization with respect to the primal variables and gradient ascent with respect to the dual variables on an augmented Lagrangian that is fully distributed among the robots. Further, we note that although the ADMM is typically applied to equality-constrained problems, the method can be extended to inequality-constrained problems quite easily. In particular, we note that inequality-constrained problems can be expressed as equality-constrained problems using indicator functions. With this approach, corresponding update procedures for constrained optimization problems can be derived using the ADMM.

Algorithm 6 summarizes the update procedures for the local primal and dual variables of each agent in constrained optimization problems, where y_i represents the dual variable that enforces agreement between robot i and its neighbors. We have incorporated the solution of the auxiliary primal variable update into the update procedure for x_i^(k+1), noting that the auxiliary primal variable update can be performed implicitly [z*_ij = (1/2)(x_i + x_j)]. The parameter ρ that weights the quadratic terms in L_a is also the step-size in the gradient ascent of the dual variable. We note that the update procedure for x_i^(k+1) requires solving an optimization problem, which might be computationally intensive for certain objective functions. To simplify the update complexity, the optimization can be solved inexactly using a linear approximation of the objective function, such as [90], [91], and [92], or a quadratic approximation using the Hessian, such as decentralized quadratically approximated ADMM [93]. The convergence rate of ADMM methods depends on the value of the penalty parameter ρ. Several works discuss effective strategies for optimally selecting ρ [94]. In general, convergence of the C-ADMM and its variants is guaranteed only when the dual variables sum to zero, a condition that could be challenging to satisfy in problems with unreliable communication networks. Other distributed ADMM variants that do not require this condition have been developed [95], [96]. However, these methods incur a greater communication overhead to provide robustness in these problems. Gradient tracking algorithms are related to the C-ADMM when the minimization problem in the primal update procedure is solved using a single gradient descent update.

The C-ADMM, as presented in Algorithm 6, requires each robot to optimize over a local copy of the global decision variable x. However, many robotic problems have a fundamental structure that makes maintaining global knowledge at every individual robot unnecessary: each robot's data relate only to a subset of the global optimization variables, and each agent requires only a subset of the optimization variable for its role. For instance, in distributed SLAM, a memory-efficient solution would require a robot to optimize only over its local map and communicate with other robots only messages of shared interest. Other examples arise in distributed environmental monitoring by multiple robots [97]. The SOVA method [98] leverages the separability of the optimization variable to achieve orders-of-magnitude improvements in convergence rates, computation, and communication complexity over C-ADMM methods. The general approach of SOVA can also be found in partitioning-based methods, such as in [99], [100], and [101], which also accommodate asynchronous or lossy communication. Like SOVA, these methods exploit the partitioning of the state variables, in that robot i need not estimate the states that are not relevant to its local objective function.

In SOVA, each agent optimizes only over variables relevant to its data or role, enabling robotic applications in which agents have minimal access to computation and communication resources. SOVA introduces consistency constraints between each agent's local optimization variable and those of its neighbors, mapping the elements of the local optimization variables, given by

    Φ_ij x_i = Φ_ji x_j   ∀j ∈ N_i, ∀i ∈ V

where Φ_ij and Φ_ji map elements of x_i and x_j to a common space. The C-ADMM represents a special case of SOVA where Φ_ij is always the identity matrix. The update procedures for each agent reduce to the equations given in Algorithm 7.

ALGORITHM 6. C-ADMM.

Initialization: k ← 0, x_i^(0) ∈ R^n, y_i^(0) = 0
Internal variables: P_i^(k) = y_i^(k)
Communicated variables: Q_i^(k) = x_i^(k)
Parameters: R_i^(k) = ρ
do in parallel ∀i ∈ V
    x_i^(k+1) = argmin_{x_i ∈ X_i} { f_i(x_i) + x_i^⊤ y_i^(k) + ρ Σ_{j ∈ N_i} ‖ x_i − (1/2)(x_i^(k) + x_j^(k)) ‖₂² }
    Communicate Q_i^(k+1) to all j ∈ N_i
    Receive Q_j^(k+1) from all j ∈ N_i
    y_i^(k+1) = y_i^(k) + ρ Σ_{j ∈ N_i} ( x_i^(k+1) − x_j^(k+1) )
    k ← k + 1
while stopping criterion is not satisfied

ALGORITHM 7. SOVA.

Initialization: k ← 0, x_i^(0) ∈ R^{n_i}, y_i^(0) = 0
Internal variables: P_i^(k) = y_i^(k)
Communicated variables: Q_i^(k) = x_i^(k)
Parameters: R_i^(k) = ρ
do in parallel ∀i ∈ V
    x_i^(k+1) = argmin_{x_i ∈ X_i} { f_i(x_i) + x_i^⊤ y_i^(k) + ρ Σ_{j ∈ N_i} ‖ Φ_ij x_i − (1/2)(Φ_ij x_i^(k) + Φ_ji x_j^(k)) ‖₂² }
    Communicate Q_i^(k+1) to all j ∈ N_i
    Receive Q_j^(k+1) from all j ∈ N_i
    y_i^(k+1) = y_i^(k) + ρ Σ_{j ∈ N_i} Φ_ij^⊤ ( Φ_ij x_i^(k+1) − Φ_ji x_j^(k+1) )
    k ← k + 1
while stopping criterion is not satisfied
DISTRIBUTED OPTIMIZATION IN ROBOTICS
AND RELATED APPLICATIONS
In this section, we discuss some existing applications of distributed optimization to robotics problems. To simplify the presentation, we highlight a number of these applications in the following notable problems in robotics: synchronization, localization, mapping, and target tracking; online and deep learning problems; and task assignment, planning, and control. We refer the reader to the first article in this two-part series [1] for a case study on multidrone target tracking, which compares solutions using several different distributed optimization algorithms.

SYNCHRONIZATION, LOCALIZATION,
MAPPING, AND TRACKING
Distributed optimization algorithms have found notable applications in robot localization from relative measurements [102], [103], including in networks with asynchronous communication [104]. More generally, DFO algorithms have been applied to optimization problems on manifolds, including SE(3) localization [105], [106], [107], [108], synchronization problems [109], and formation control in SO(3) [110], [111]. In pose graph optimization, distributed optimization has been employed through majorization–minimization schemes, which minimize an upper bound of the objective function [112], using gradient descent on Riemannian manifolds [113], [114], and block coordinate descent [115]. Other pose graph optimization methods have utilized distributed sequential programming algorithms using a quadratic approximation model of the nonconvex objective function, with Gauss–Seidel updates to enable distributed local computations among the robots [116]. Further, the ADMM has been employed in bundle adjustment and pose graph optimization problems, which involve the recovery of the 3D positions and orientations of a map and camera [117], [118], [119]. However, many of these algorithms require a central node for the dual variable updates, making them semidistributed. Nonetheless, a few fully distributed ADMM-based algorithms exist for bundle adjustment and cooperative localization problems [120], [121]. Other applications of distributed optimization arise in target tracking [122], signal estimation [19], and parameter estimation in global navigation satellite systems [123].

ONLINE AND DEEP LEARNING PROBLEMS
Distributed optimization has also been applied in online dynamic problems. In these problems, each robot gains knowledge of its time-varying objective function in an online fashion after taking an action or decision. A number of DFO algorithms have been designed for these problems [124], [125], [126]. Similarly, DDA has been adapted for online scenarios with both static communication graphs [127], [128] and time-varying communication topology [129], [130]. The push-sum variant of dual averaging has also been used for distributed training of deep learning algorithms and has been shown to be useful in avoiding pitfalls of other synchronous distributed training frameworks, which face notable challenges in problems with communication deadlocks [131]. Many of these algorithms emphasize parallelization.

In addition, distributed sequential convex programming algorithms have been developed for a number of learning problems where data are distributed, including semisupervised support vector machines [132], neural network training [133], and clustering [134]. Moreover, the ADMM has been applied to online problems, such as estimation and surveillance problems involving wireless sensor networks [135], [136]. The ADMM has also been applied to distributed deep learning in robot networks in [137].

TASK ASSIGNMENT, PLANNING, AND CONTROL
Distributed optimization has been applied to task assignment problems posed as optimization problems. Some works [138] employ distributed optimization using a distributed simplex method [139] to obtain an optimal assignment of the robots to a desired target formation. Other works employ the C-ADMM for distributed task assignment [140], [141]. Further applications of distributed optimization arise in motion planning [142], trajectory tracking problems involving teams of robots using nonlinear MPC [143], and collaborative manipulation [144], [145], which employs fully distributed variants of the ADMM. One feature common to these problems is that the joint decision variables, which consist of control inputs or action variables concatenated over all the robots, can often be partitioned so that each robot needs to consider only its own actions, as in [98], [99], [100], and [101]. This can lead to significantly faster convergence compared with methods in which each agent has a complete copy of the joint decision variables, as discussed at the end of the "ADMM" section.

RESEARCH OPPORTUNITIES IN DISTRIBUTED
OPTIMIZATION FOR MULTI-ROBOT SYSTEMS
In this section, we highlight challenges in the application of existing distributed optimization algorithms to multi-robot problems, each of which represents a promising direction for future research.

NONCONVEX AND CONSTRAINED
ROBOTICS PROBLEMS
Distributed optimization methods have primarily focused on solving unconstrained convex optimization problems, which constitute a limited subset of robotics problems. Many robotics problems involve nonconvex objectives or constraints. For example, problems in multi-robot motion planning, SLAM, learning, distributed manipulation, and target tracking are often nonconvex and/or constrained.

Both DFO methods and C-ADMM methods can be modified for nonconvex and constrained problems; however, few examples of practical algorithms or rigorous analyses of performance for such modified algorithms exist in the literature. One way to implement the C-ADMM for nonconvex

problems is to solve each primal update step as a nonconvex optimization (e.g., through a quasi-Newton method or interior point method). Another option is to perform successive quadratic approximations in an outer loop and use the C-ADMM to solve each resulting quadratic problem in an inner loop. The tradeoff between these two options has not yet been explored in the literature, especially in the context of nonconvex problems in robotics.

BANDWIDTH-CONSTRAINED, LOSSY,
OR DYNAMIC COMMUNICATION
In many robotics problems, each robot exchanges information with its neighbors over a communication network with a limited communication bandwidth, which effectively limits the size of the message packets that can be transmitted among robots. Moreover, in practical situations, the communication links among robots sometimes fail, resulting in packet losses. However, many distributed optimization methods do not consider communication among agents as an expensive, unreliable resource, given that many of these methods were developed for problems with reliable communication infrastructure (e.g., multicore computing or computing in a hardwired cluster). Information quantization has been extensively employed in many disciplines to allow for the efficient exchange of information over bandwidth-constrained networks. Quantization involves encoding the data to be transmitted into a format that utilizes a lower number of bits, often resulting in lower precision. Transmission of the encoded data incurs a lower communication overhead, enabling each robot to communicate with its neighbors within the bandwidth constraints. A few distributed optimization algorithms have been designed for these problems, including quantized DFO algorithms. Some of these algorithms assume that all robots can communicate with a central node [146], [147], making them unsuitable for a variety of robotics problems, while others do not make this assumption [148], [149], [150], [151]. In addition, quantized distributed variants of the ADMM also exist [21], [152], [153].

Generally, quantization introduces error between each robot's solution and the optimal solution. However, in some of these algorithms, the quantization error decays during the execution of the algorithms under certain assumptions on the quantizer and the quantization interval [148], [149]. However, quantization in distributed optimization algorithms generally results in slower convergence rates, which poses a challenge in robotics problems where a solution is required rapidly, such as MPC problems, highlighting the need for the development of more effective algorithms. Further, only a few distributed optimization algorithms consider problems with lossy communication. While some methods allow for dynamic communication networks under the condition of bounded connectivity, in general, distributed ADMM algorithms are not amenable to problems with dynamic communication networks. This is an interesting avenue for future research.

LIMITED COMPUTATION RESOURCES
Another valuable direction for future research is in developing algorithms specifically for computationally limited robotic platforms, in which the timeliness of the solution is as important as the solution quality [157], [158]. In general, many distributed optimization methods involve computationally challenging procedures that require significant computational power, especially distributed methods for constrained problems [90], [91], [92]. These methods ignore the significance of computation time, assuming that agents have access to significant computational power. These assumptions often do not hold in robotics problems. Typically, robotics problems unfold over successive time periods, with an associated optimization phase at each step of the problem. Thus, agents must compute their solutions fast enough to proceed with computing a reasonable solution for the next problem, which requires efficient distributed optimization methods. Developing such algorithms specifically for multi-robot systems is an interesting topic for future work.

COORDINATION AND SYNCHRONIZATION
Many distributed optimization algorithms implicitly assume coordination in several aspects of implementation. First, while most algorithms accommodate an arbitrary initialization of each robot's solution (at least in convex problems), they often place stringent requirements on the initialization of the algorithms' parameters. For instance, DFO methods assume a common step-size across all robots and in some cases a scheduled decrease in that step-size [14], [44], [46]. Similarly, DFO algorithms and distributed sequential convex programming algorithms require the specification of a stochastic matrix, which must be compatible with the underlying communication network. However, generating doubly stochastic matrices for directed communication networks is nontrivial if each robot does not know the global network topology [159]. The ADMM and its distributed variants require the selection of a common penalty parameter ρ.

Second, some DFO, distributed sequential programming, and distributed ADMM algorithms require synchronous execution (see Definition 8). If robots have variable computation times and a synchronous distributed optimization algorithm is being used, one solution is to implement a distributed barrier scheme, where each robot waits until all its neighbors have computed and communicated their most recent update
munication networks [154], [155], [156]. before proceeding. However, barrier schemes can lead to sig-
In many practical situations, the communication network nificantly increased time to convergence, as some robots idle
among robots changes as robots move, giving rise to a time- while waiting for their neighbors. To address this issue, a
varying communication graph. While many DFO optimization number of asynchronous distributed optimization algorithms
algorithms [46] and some distributed sequential program- have been developed [47], [81], [83], [121], [160], which allow
ming algorithms [16], [84] tolerate dynamic communication each robot to perform its local updates asynchronously,
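To make the bandwidth discussion above concrete, the following minimal Python sketch shows a uniform quantizer that a robot could apply to an iterate or gradient vector before transmission. It is an illustration only, not drawn from any of the cited quantized algorithms; the value range `[lo, hi]` and the bit width are assumed to be agreed upon in advance by all robots.

```python
import numpy as np

def quantize(x, lo=-1.0, hi=1.0, bits=8):
    """Uniformly quantize vector x over [lo, hi] into small integer codes."""
    levels = 2**bits - 1
    x_clipped = np.clip(x, lo, hi)
    codes = np.round((x_clipped - lo) / (hi - lo) * levels).astype(np.uint16)
    return codes  # transmit these integers instead of 64-bit floats

def dequantize(codes, lo=-1.0, hi=1.0, bits=8):
    """Recover an approximation of x from the received integer codes."""
    levels = 2**bits - 1
    return lo + codes.astype(np.float64) / levels * (hi - lo)

x = np.array([0.3, -0.72, 0.051])
x_hat = dequantize(quantize(x))
# For values inside [lo, hi], the error is at most half a quantization step.
assert np.max(np.abs(x - x_hat)) <= (1.0 - (-1.0)) / (2 * (2**8 - 1)) + 1e-12
```

With 8-bit codes, each entry costs 8 bits on the wire instead of 64, at the price of a bounded reconstruction error; this is the error term that, in the algorithms cited above, either decays over iterations or slows convergence.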
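The stochastic-matrix requirement noted above can be met on undirected graphs without global knowledge of the topology: with Metropolis-Hastings weights, each robot needs only its own degree and its neighbors' degrees. The sketch below is an illustrative construction under that assumption; the three-robot line graph is a made-up example.

```python
import numpy as np

def metropolis_weights(neighbors):
    """Doubly stochastic weight matrix from local degree information only.

    neighbors: dict mapping robot index i to the set of its neighbors
    in an undirected communication graph. For each edge (i, j),
    w_ij = 1 / (1 + max(deg_i, deg_j)); the diagonal absorbs the remainder.
    """
    n = len(neighbors)
    W = np.zeros((n, n))
    for i, nbrs in neighbors.items():
        for j in nbrs:
            W[i, j] = 1.0 / (1.0 + max(len(neighbors[i]), len(neighbors[j])))
        W[i, i] = 1.0 - W[i].sum()  # row i now sums to one
    return W

# Example: a line graph 0 - 1 - 2.
W = metropolis_weights({0: {1}, 1: {0, 2}, 2: {1}})
# W is symmetric with unit row sums, hence doubly stochastic.
assert np.allclose(W.sum(axis=0), 1.0) and np.allclose(W.sum(axis=1), 1.0)
```

In consensus-based DFO updates, robot i then mixes iterates as x_i ← Σ_j W[i, j] x_j, which requires only the i-th row of W, i.e., purely local information.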

Authorized licensed use limited to: Stanford University. Downloaded on September 13,2024SEPTEMBER
at 19:03:42 2024 IEEEIEEE
UTC from ROBOTICS & AUTOMATION
Xplore. Restrictions MAGAZINE
apply. 165
eliminating the need for synchronization. These asynchronous variants are guaranteed to converge to an optimal solution, provided that an integer T ∈ ℤ exists such that each robot performs at least one iteration of the algorithm over T time steps.

HARDWARE IMPLEMENTATION
Finally, we believe there is a gap between the analysis in the distributed optimization literature and the applicability of these distributed optimization algorithms to hardware implementations [26], [27], [29]. The suitability of these algorithms to run efficiently and robustly on robots has still not been thoroughly demonstrated. We provide empirical results of a hardware implementation of C-ADMM over XBee radios in the first article in this series [1]. While this survey considers adapting existing distributed optimization algorithms for robotic implementations, it could also be useful to consider the codesign of general-purpose distributed optimization algorithms with practical hardware setups.

CONCLUSION
Despite the amenability of many robotics problems to distributed optimization, few applications of distributed optimization to multi-robot problems exist. In this work, we have categorized distributed optimization methods into three broad classes—distributed first-order methods, distributed sequential convex programming methods, and the ADMM—highlighting the distinct mathematical techniques employed by these algorithms. Further, we have identified a number of important open challenges in distributed optimization for robotics, which could be interesting areas for future research. In general, the opportunities for research in distributed optimization for multi-robot systems are plentiful. Distributed optimization provides an appealing unifying framework from which to synthesize solutions for a large variety of problems in multi-robot systems.

ACKNOWLEDGMENT
This project was funded in part by National Science Foundation (NSF) National Robotics Initiative Awards 1830402 and 1925030. Trevor Halsted was supported by a National Defense Science and Engineering Graduate Fellowship, and Javier Yu was supported by an NSF Graduate Research Fellowship. Ola Shorinwa, Trevor Halsted, and Javier Yu contributed equally to this work.

AUTHORS
Ola Shorinwa, Department of Mechanical Engineering, Stanford University, Stanford, CA 94305 USA. E-mail: [email protected].
Trevor Halsted, Department of Mechanical Engineering, Stanford University, Stanford, CA 94305 USA. E-mail: [email protected].
Javier Yu, Department of Aeronautics and Astronautics, Stanford University, Stanford, CA 94305 USA. E-mail: [email protected].
Mac Schwager, Department of Aeronautics and Astronautics, Stanford University, Stanford, CA 94305 USA. E-mail: [email protected].

REFERENCES
[1] O. Shorinwa, T. Halsted, J. Yu, and M. Schwager, "Distributed optimization methods for multi-robot systems: Part I—A tutorial," presented at the Amer. Control Conf. (ACC), 2023, pp. 1–8. [Online]. Available: https://msl.stanford.edu/papers/shorinwa_distributed_2023.pdf
[2] I. Prodan, F. Stoican, S. Olaru, C. Stoica, and S.-I. Niculescu, "Mixed-integer programming techniques in distributed MPC problems," in Distributed Model Predictive Control Made Easy, J. Maestre and R. Negenborn, Eds., Dordrecht, The Netherlands: Springer-Verlag, 2014, pp. 275–291.
[3] A. Murray, A. Engelmann, V. Hagenmeyer, and T. Faulwasser, "Hierarchical distributed mixed-integer optimization for reactive power dispatch," IFAC-PapersOnLine, vol. 51, no. 28, pp. 368–373, 2018, doi: 10.1016/j.ifacol.2018.11.730.
[4] A. Testa, A. Rucco, and G. Notarstefano, "Distributed mixed-integer linear programming via cut generation and constraint exchange," IEEE Trans. Autom. Control, vol. 65, no. 4, pp. 1456–1467, Apr. 2020, doi: 10.1109/TAC.2019.2920812.
[5] S. Liu, P.-Y. Chen, B. Kailkhura, G. Zhang, A. O. Hero III, and P. K. Varshney, "A primer on zeroth-order optimization in signal processing and machine learning: Principals, recent advances, and applications," IEEE Signal Process. Mag., vol. 37, no. 5, pp. 43–54, Sep. 2020, doi: 10.1109/MSP.2020.3003837.
[6] D. Hajinezhad, M. Hong, and A. Garcia, "Zeroth order nonconvex multi-agent optimization over networks," 2017, arXiv:1710.09997.
[7] D. Hajinezhad, M. Hong, and A. Garcia, "ZONE: Zeroth-order nonconvex multiagent optimization over networks," IEEE Trans. Autom. Control, vol. 64, no. 10, pp. 3995–4010, Oct. 2019, doi: 10.1109/TAC.2019.2896025.
[8] D. Hajinezhad and M. Hong, "Perturbed proximal primal–dual algorithm for nonconvex nonsmooth optimization," Math. Program., vol. 176, nos. 1–2, pp. 207–245, 2019, doi: 10.1007/s10107-019-01365-4.
[9] A. Beznosikov, E. Gorbunov, and A. Gasnikov, "Derivative-free method for composite optimization with applications to decentralized distributed optimization," IFAC-PapersOnLine, vol. 53, no. 2, pp. 4038–4043, 2020, doi: 10.1016/j.ifacol.2020.12.2272.
[10] Y. Tang, J. Zhang, and N. Li, "Distributed zero-order algorithms for nonconvex multiagent optimization," IEEE Trans. Control Netw. Syst., vol. 8, no. 1, pp. 269–281, Mar. 2021, doi: 10.1109/TCNS.2020.3024321.
[11] J. Tsitsiklis, D. Bertsekas, and M. Athans, "Distributed asynchronous deterministic and stochastic gradient optimization algorithms," IEEE Trans. Autom. Control, vol. 31, no. 9, pp. 803–812, Sep. 1986, doi: 10.1109/TAC.1986.1104412.
[12] F. Saadatniaki, R. Xin, and U. A. Khan, "Optimization over time-varying directed graphs with row and column-stochastic matrices," 2018, arXiv:1810.07393.
[13] R. Xin and U. A. Khan, "A linear algorithm for optimization over directed graphs with geometric convergence," IEEE Control Syst. Lett., vol. 2, no. 3, pp. 315–320, Jul. 2018, doi: 10.1109/LCSYS.2018.2834316.
[14] A. Nedić and A. Ozdaglar, "Distributed subgradient methods for multi-agent optimization," IEEE Trans. Autom. Control, vol. 54, no. 1, pp. 48–61, Jan. 2009, doi: 10.1109/TAC.2008.2009515.
[15] A. Mokhtari, Q. Ling, and A. Ribeiro, "Network Newton," in Proc. 48th Asilomar Conf. Signals, Syst. Comput., 2014, pp. 1621–1625, doi: 10.1109/ACSSC.2014.7094740.
[16] P. Di Lorenzo and G. Scutari, "NEXT: In-network nonconvex optimization," IEEE Trans. Signal Inf. Process. Netw., vol. 2, no. 2, pp. 120–136, Jun. 2016, doi: 10.1109/TSIPN.2016.2524588.
[17] R. T. Rockafellar, "Monotone operators and the proximal point algorithm," SIAM J. Control Optim., vol. 14, no. 5, pp. 877–898, 1976, doi: 10.1137/0314056.
[18] S. Boyd et al., "Distributed optimization and statistical learning via the alternating direction method of multipliers," Found. Trends Mach. Learn., vol. 3, no. 1, pp. 1–122, 2011, doi: 10.1561/2200000016.
[19] G. Mateos, J. A. Bazerque, and G. B. Giannakis, "Distributed sparse linear regression," IEEE Trans. Signal Process., vol. 58, no. 10, pp. 5262–5276, Oct. 2010, doi: 10.1109/TSP.2010.2055862.
[20] V. Khatana and M. V. Salapaka, "DC-DistADMM: ADMM algorithm for constrained distributed optimization over directed graphs," 2020, arXiv:2003.13742.
[21] S. Zhu, M. Hong, and B. Chen, "Quantized consensus ADMM for multi-agent distributed optimization," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), 2016, pp. 4134–4138, doi: 10.1109/ICASSP.2016.7472455.
[22] D. K. Molzahn et al., "A survey of distributed optimization and control algorithms for electric power systems," IEEE Trans. Smart Grid, vol. 8, no. 6, pp. 2941–2962, Nov. 2017, doi: 10.1109/TSG.2017.2720471.
[23] G. Scutari and Y. Sun, "Parallel and distributed successive convex approximation methods for big-data optimization," in Multi-Agent Optimization, F. Facchinei and J. S. Pang, Eds., Cham, Switzerland: Springer-Verlag, 2018, pp. 141–308.
[24] B. Yang and M. Johansson, "Distributed optimization and games: A tutorial overview," in Networked Control Systems, A. Bemporad, M. Heemels, and M. Johansson, Eds., London, U.K.: Springer-Verlag, 2010, pp. 109–148.
[25] A. Nedić and J. Liu, "Distributed optimization for control," Annu. Rev. Control, Robot., Auton. Syst., vol. 1, no. 1, pp. 77–103, 2018, doi: 10.1146/annurev-control-060117-105131.
[26] A. Nedić, A. Olshevsky, and M. G. Rabbat, "Network topology and communication-computation tradeoffs in decentralized optimization," Proc. IEEE, vol. 106, no. 5, pp. 953–976, May 2018, doi: 10.1109/JPROC.2018.2817461.
[27] A. Nedić, "Distributed optimization over networks," in Multi-Agent Optimization, F. Facchinei and J. S. Pang, Eds., Cham, Switzerland: Springer-Verlag, 2018, pp. 1–84.
[28] T.-H. Chang, M. Hong, H.-T. Wai, X. Zhang, and S. Lu, "Distributed learning in the nonconvex world: From batch data to streaming and beyond," IEEE Signal Process. Mag., vol. 37, no. 3, pp. 26–38, May 2020, doi: 10.1109/MSP.2020.2970170.
[29] T. Yang et al., "A survey of distributed optimization," Annu. Rev. Control, vol. 47, pp. 278–305, Jun. 2019, doi: 10.1016/j.arcontrol.2019.05.006.
[30] D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods, vol. 23. Englewood Cliffs, NJ, USA: Prentice-Hall, 1989.
[31] N. A. Lynch, Distributed Algorithms. New York, NY, USA: Elsevier, 1996.
[32] F. Bullo, J. Cortés, and S. Martínez, Distributed Control of Robotic Networks (Applied Mathematics Series). Princeton, NJ, USA: Princeton Univ. Press, 2009. [Online]. Available: http://coordinationbook.info
[33] M. Mesbahi and M. Egerstedt, Graph Theoretic Methods in Multiagent Networks, vol. 33. Princeton, NJ, USA: Princeton Univ. Press, 2010.
[34] V. S. Mai and E. H. Abed, "Distributed optimization over directed graphs with row stochasticity and constraint regularity," Automatica, vol. 102, pp. 94–104, Apr. 2019, doi: 10.1016/j.automatica.2018.07.020.
[35] J. N. Tsitsiklis, "Problems in decentralized decision making and computation," Lab for Information and Decision Systems, Massachusetts Inst. Technol., Cambridge, MA, USA, Tech. Rep., 1984. [Online]. Available: https://www.mit.edu/~jnt/Papers/PhD-84-jnt.pdf
[36] I. Lobel and A. Ozdaglar, "Distributed subgradient methods for convex optimization over random networks," IEEE Trans. Autom. Control, vol. 56, no. 6, pp. 1291–1306, Jun. 2011, doi: 10.1109/TAC.2010.2091295.
[37] A. Olshevsky and J. N. Tsitsiklis, "Convergence speed in distributed consensus and averaging," SIAM J. Control Optim., vol. 48, no. 1, pp. 33–55, 2009, doi: 10.1137/060678324.
[38] A. Olshevsky, I. C. Paschalidis, and A. Spiridonoff, "Robust asynchronous stochastic gradient-push: Asymptotically optimal and network-independent performance for strongly convex functions," 2018, arXiv:1811.03982.
[39] A. Nedić and A. Olshevsky, "Distributed optimization over time-varying directed graphs," IEEE Trans. Autom. Control, vol. 60, no. 3, pp. 601–615, Mar. 2015, doi: 10.1109/TAC.2014.2364096.
[40] F. Bénézit, V. Blondel, P. Thiran, J. Tsitsiklis, and M. Vetterli, "Weighted gossip: Distributed averaging using non-doubly stochastic matrices," in Proc. IEEE Int. Symp. Inf. Theory, 2010, pp. 1753–1757, doi: 10.1109/ISIT.2010.5513273.
[41] D. Kempe, A. Dobra, and J. Gehrke, "Gossip-based computation of aggregate information," in Proc. 44th Annu. IEEE Symp. Found. Comput. Sci., 2003, pp. 482–491, doi: 10.1109/SFCS.2003.1238221.
[42] K. Yuan, Q. Ling, and W. Yin, "On the convergence of decentralized gradient descent," SIAM J. Optim., vol. 26, no. 3, pp. 1835–1854, 2016, doi: 10.1137/130943170.
[43] D. Jakovetić, J. Xavier, and J. M. Moura, "Fast distributed gradient methods," IEEE Trans. Autom. Control, vol. 59, no. 5, pp. 1131–1146, May 2014, doi: 10.1109/TAC.2014.2298712.
[44] W. Shi, Q. Ling, G. Wu, and W. Yin, "EXTRA: An exact first-order algorithm for decentralized consensus optimization," SIAM J. Optim., vol. 25, no. 2, pp. 944–966, 2015, doi: 10.1137/14096668X.
[45] A. Daneshmand, G. Scutari, and V. Kungurtsev, "Second-order guarantees of distributed gradient algorithms," 2018, arXiv:1809.08694.
[46] A. Nedić, A. Olshevsky, and W. Shi, "Achieving geometric convergence for distributed optimization over time-varying graphs," SIAM J. Optim., vol. 27, no. 4, pp. 2597–2633, 2017, doi: 10.1137/16M1084316.
[47] S. Zheng et al., "Asynchronous stochastic gradient descent with delay compensation," in Proc. Int. Conf. Mach. Learn., PMLR, 2017, pp. 4120–4129.
[48] Z. Li, W. Shi, and M. Yan, "A decentralized proximal-gradient method with network independent step-sizes and separated convergence rates," IEEE Trans. Signal Process., vol. 67, no. 17, pp. 4494–4506, Sep. 2019, doi: 10.1109/TSP.2019.2926022.
[49] K. Yuan, B. Ying, X. Zhao, and A. H. Sayed, "Exact diffusion for distributed optimization and learning—Part I: Algorithm development," IEEE Trans. Signal Process., vol. 67, no. 3, pp. 708–723, Feb. 2019, doi: 10.1109/TSP.2018.2875898.
[50] K. Yuan, B. Ying, X. Zhao, and A. H. Sayed, "Exact diffusion for distributed optimization and learning—Part II: Convergence analysis," IEEE Trans. Signal Process., vol. 67, no. 3, pp. 724–739, Feb. 2019, doi: 10.1109/TSP.2018.2875883.
[51] G. Qu and N. Li, "Harnessing smoothness to accelerate distributed optimization," IEEE Trans. Control Netw. Syst., vol. 5, no. 3, pp. 1245–1260, Sep. 2018, doi: 10.1109/TCNS.2017.2698261.
[52] R. Xin and U. A. Khan, "Distributed heavy-ball: A generalization and acceleration of first-order methods with gradient tracking," IEEE Trans. Autom. Control, vol. 65, no. 6, pp. 2627–2633, Jun. 2019, doi: 10.1109/TAC.2019.2942513.
[53] C. Xi, R. Xin, and U. A. Khan, "ADD-OPT: Accelerated distributed directed optimization," IEEE Trans. Autom. Control, vol. 63, no. 5, pp. 1329–1339, May 2018, doi: 10.1109/TAC.2017.2737582.
[54] C. Xi, V. S. Mai, R. Xin, E. H. Abed, and U. A. Khan, "Linear convergence in optimization over directed graphs with row-stochastic matrices," IEEE Trans. Autom. Control, vol. 63, no. 10, pp. 3558–3565, Oct. 2018, doi: 10.1109/TAC.2018.2797164.
[55] J. Zeng and W. Yin, "ExtraPush for convex smooth decentralized optimization over directed networks," J. Comput. Math., vol. 35, no. 4, pp. 383–396, 2017, doi: 10.4208/jcm.1606-m2015-0452.
[56] A. Sundararajan, B. Van Scoy, and L. Lessard, "A canonical form for first-order distributed optimization algorithms," in Proc. Amer. Control Conf., Piscataway, NJ, USA: IEEE Press, 2019, pp. 4075–4080, doi: 10.23919/ACC.2019.8814838.
[57] A. Sundararajan, B. Van Scoy, and L. Lessard, "Analysis and design of first-order distributed optimization algorithms over time-varying graphs," IEEE Trans. Control Netw. Syst., vol. 7, no. 4, pp. 1597–1608, Dec. 2020, doi: 10.1109/TCNS.2020.2988009.
[58] D. Jakovetić, "A unification and generalization of exact distributed first-order methods," IEEE Trans. Signal Inf. Process. Netw., vol. 5, no. 1, pp. 31–46, Mar. 2019, doi: 10.1109/TSIPN.2018.2846183.
[59] G. Qu and N. Li, "Accelerated distributed Nesterov gradient descent," IEEE Trans. Autom. Control, vol. 65, no. 6, pp. 2566–2581, Jun. 2020, doi: 10.1109/TAC.2019.2937496.
[60] R. Xin, D. Jakovetić, and U. A. Khan, "Distributed Nesterov gradient methods over arbitrary graphs," IEEE Signal Process. Lett., vol. 26, no. 8, pp. 1247–1251, Aug. 2019, doi: 10.1109/LSP.2019.2925537.
[61] Q. Lü, X. Liao, H. Li, and T. Huang, "A Nesterov-like gradient tracking algorithm for distributed optimization over directed networks," IEEE Trans. Syst., Man, Cybern. Syst., vol. 51, no. 10, pp. 6258–6270, Oct. 2021, doi: 10.1109/TSMC.2019.2960770.
[62] F. Mansoori and E. Wei, "A general framework of exact primal-dual first-order algorithms for distributed optimization," in Proc. IEEE 58th Conf. Decis. Control (CDC), Piscataway, NJ, USA: IEEE Press, 2019, pp. 6386–6391, doi: 10.1109/CDC40024.2019.9029902.
[63] T. Tatarenko and B. Touri, "Non-convex distributed optimization," IEEE Trans. Autom. Control, vol. 62, no. 8, pp. 3744–3757, Aug. 2017, doi: 10.1109/TAC.2017.2648041.
[64] S. S. Ram, A. Nedić, and V. V. Veeravalli, "Distributed stochastic subgradient projection algorithms for convex optimization," J. Optim. Theory Appl., vol. 147, no. 3, pp. 516–545, 2010, doi: 10.1007/s10957-010-9737-7.
[65] P. Bianchi and J. Jakubowicz, "Convergence of a multi-agent projected stochastic gradient algorithm for non-convex optimization," IEEE Trans. Autom. Control, vol. 58, no. 2, pp. 391–405, Feb. 2013, doi: 10.1109/TAC.2012.2209984.
[66] B. Johansson, M. Rabi, and M. Johansson, "A randomized incremental subgradient method for distributed optimization in networked systems," SIAM J. Optim., vol. 20, no. 3, pp. 1157–1170, 2010, doi: 10.1137/08073038X.
[67] M. Maros and J. Jaldén, "PANDA: A dual linearly converging method for distributed optimization over time-varying undirected graphs," in Proc. IEEE Conf. Decis. Control (CDC), Piscataway, NJ, USA: IEEE Press, 2018, pp. 6520–6525, doi: 10.1109/CDC.2018.8619626.
[68] M. Maros and J. Jaldén, "A geometrically converging dual method for distributed optimization over time-varying graphs," IEEE Trans. Autom. Control, vol. 66, no. 6, pp. 2465–2479, Jun. 2021, doi: 10.1109/TAC.2020.3018743.
[69] M. Maros and J. Jaldén, "Eco-PANDA: A computationally economic, geometrically converging dual optimization method on time-varying undirected graphs," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Piscataway, NJ, USA: IEEE Press, 2019, pp. 5257–5261, doi: 10.1109/ICASSP.2019.8683797.
[70] K. Seaman, F. Bach, S. Bubeck, Y. T. Lee, and L. Massoulié, "Optimal algorithms for smooth and strongly convex distributed optimization in networks," in Proc. 34th Int. Conf. Mach. Learn., vol. 70, JMLR, 2017, pp. 3027–3036.
[71] G. Lan, S. Lee, and Y. Zhou, "Communication-efficient algorithms for decentralized and stochastic optimization," Math. Program., vol. 180, nos. 1–2, pp. 237–284, 2020, doi: 10.1007/s10107-018-1355-4.
[72] X. Lian, C. Zhang, H. Zhang, C.-J. Hsieh, W. Zhang, and J. Liu, "Can decentralized algorithms outperform centralized algorithms? A case study for decentralized parallel stochastic gradient descent," in Proc. 31st Int. Conf. Neural Inf. Process. Syst., 2017, pp. 5336–5346.
[73] Y. Nesterov, "Primal-dual subgradient methods for convex problems," Math. Program., vol. 120, no. 1, pp. 221–259, 2009, doi: 10.1007/s10107-007-0149-x.
[74] L. Xiao, "Dual averaging methods for regularized stochastic learning and online optimization," J. Mach. Learn. Res., vol. 11, pp. 2543–2596, Oct. 2010.
[75] J. C. Duchi, A. Agarwal, and M. J. Wainwright, "Dual averaging for distributed optimization: Convergence analysis and network scaling," IEEE Trans. Autom. Control, vol. 57, no. 3, pp. 592–606, Mar. 2012, doi: 10.1109/TAC.2011.2161027.
[76] K. I. Tsianos and M. G. Rabbat, "Distributed consensus and optimization under communication delays," in Proc. 49th Annu. Allerton Conf. Commun., Control, Comput. (Allerton), Piscataway, NJ, USA: IEEE Press, 2011, pp. 974–982, doi: 10.1109/Allerton.2011.6120272.
[77] K. I. Tsianos, S. Lawlor, and M. G. Rabbat, "Push-sum distributed dual averaging for convex optimization," in Proc. 51st IEEE Conf. Decis. Control (CDC), Piscataway, NJ, USA: IEEE Press, 2012, pp. 5453–5458, doi: 10.1109/CDC.2012.6426375.
[78] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, U.K.: Cambridge Univ. Press, 2004.
[79] M. Zargham, A. Ribeiro, A. Ozdaglar, and A. Jadbabaie, "Accelerated dual descent for network flow optimization," IEEE Trans. Autom. Control, vol. 59, no. 4, pp. 905–920, Apr. 2014, doi: 10.1109/TAC.2013.2293221.
[80] A. Mokhtari, W. Shi, Q. Ling, and A. Ribeiro, "A decentralized second-order method with exact linear convergence rate for consensus optimization," IEEE Trans. Signal Inf. Process. Netw., vol. 2, no. 4, pp. 507–522, Dec. 2016, doi: 10.1109/TSIPN.2016.2613678.
[81] M. Eisen, A. Mokhtari, and A. Ribeiro, "Decentralized quasi-Newton methods," IEEE Trans. Signal Process., vol. 65, no. 10, pp. 2613–2628, May 2017, doi: 10.1109/TSP.2017.2666776.
[82] M. Eisen, A. Mokhtari, and A. Ribeiro, "A primal-dual quasi-Newton method for exact consensus optimization," IEEE Trans. Signal Process., vol. 67, no. 23, pp. 5983–5997, Dec. 2019, doi: 10.1109/TSP.2019.2951216.
[83] F. Mansoori and E. Wei, "A fast distributed asynchronous Newton-based optimization algorithm," IEEE Trans. Autom. Control, vol. 65, no. 7, pp. 2769–2784, Jul. 2020, doi: 10.1109/TAC.2019.2933607.
[84] Y. Sun and G. Scutari, "Distributed nonconvex optimization for sparse representation," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Piscataway, NJ, USA: IEEE Press, 2017, pp. 4044–4048, doi: 10.1109/ICASSP.2017.7952916.
[85] J. Xu, S. Zhu, Y. C. Soh, and L. Xie, "Augmented distributed gradient methods for multi-agent optimization under uncoordinated constant stepsizes," in Proc. 54th IEEE Conf. Decis. Control (CDC), Piscataway, NJ, USA: IEEE Press, 2015, pp. 2055–2060, doi: 10.1109/CDC.2015.7402509.
[86] B. Houska, J. Frasch, and M. Diehl, "An augmented Lagrangian based algorithm for distributed nonconvex optimization," SIAM J. Optim., vol. 26, no. 2, pp. 1101–1127, 2016, doi: 10.1137/140975991.
[87] N. Chatzipanagiotis, D. Dentcheva, and M. M. Zavlanos, "An augmented Lagrangian method for distributed optimization," Math. Program., vol. 152, nos. 1–2, pp. 405–434, 2015, doi: 10.1007/s10107-014-0808-7.
[88] F. Iutzeler, P. Bianchi, P. Ciblat, and W. Hachem, "Asynchronous distributed optimization using a randomized alternating direction method of multipliers," in Proc. 52nd IEEE Conf. Decis. Control, Piscataway, NJ, USA: IEEE Press, 2013, pp. 3671–3676, doi: 10.1109/CDC.2013.6760448.
[89] H. Terelius, U. Topcu, and R. M. Murray, "Decentralized multi-agent optimization via dual decomposition," IFAC Proc. Volumes, vol. 44, no. 1, pp. 11245–11251, 2011, doi: 10.3182/20110828-6-IT-1002.01959.
[90] Q. Ling, W. Shi, G. Wu, and A. Ribeiro, "DLM: Decentralized linearized alternating direction method of multipliers," IEEE Trans. Signal Process., vol. 63, no. 15, pp. 4051–4064, Aug. 2015, doi: 10.1109/TSP.2015.2436358.
[91] T.-H. Chang, M. Hong, and X. Wang, "Multi-agent distributed optimization via inexact consensus ADMM," IEEE Trans. Signal Process., vol. 63, no. 2, pp. 482–497, Jan. 2015, doi: 10.1109/TSP.2014.2367458.
[92] F. Farina, A. Garulli, A. Giannitrapani, and G. Notarstefano, "A distributed asynchronous method of multipliers for constrained nonconvex optimization," Automatica, vol. 103, pp. 243–253, May 2019, doi: 10.1016/j.automatica.2019.02.003.
[93] A. Mokhtari, W. Shi, Q. Ling, and A. Ribeiro, "DQM: Decentralized quadratically approximated alternating direction method of multipliers," IEEE Trans. Signal Process., vol. 64, no. 19, pp. 5158–5173, Oct. 2016, doi: 10.1109/TSP.2016.2548989.
[94] A. Teixeira, E. Ghadimi, I. Shames, H. Sandberg, and M. Johansson, "The ADMM algorithm for distributed quadratic problems: Parameter selection and constraint preconditioning," IEEE Trans. Signal Process., vol. 64, no. 2, pp. 290–305, Jan. 2015, doi: 10.1109/TSP.2015.2480041.
[95] D. Meng, M. Fazel, and M. Mesbahi, "Proximal alternating direction method of multipliers for distributed optimization on weighted graphs," in Proc. 54th IEEE Conf. Decis. Control (CDC), Piscataway, NJ, USA: IEEE Press, 2015, pp. 1396–1401, doi: 10.1109/CDC.2015.7402406.
[96] A. Makhdoumi and A. Ozdaglar, "Convergence rate of distributed ADMM over networks," IEEE Trans. Autom. Control, vol. 62, no. 10, pp. 5082–5095, Oct. 2017, doi: 10.1109/TAC.2017.2677879.
[97] M. L. Elwin, R. A. Freeman, and K. M. Lynch, "Distributed environmental monitoring with finite element robots," IEEE Trans. Robot., vol. 36, no. 2, pp. 380–398, Apr. 2020, doi: 10.1109/TRO.2019.2936747.
[98] O. Shorinwa, T. Halsted, and M. Schwager, "Scalable distributed optimization with separable variables in multi-agent networks," in Proc. Amer. Control Conf. (ACC), Piscataway, NJ, USA: IEEE Press, 2020, pp. 3619–3626, doi: 10.23919/ACC45564.2020.9147590.
[99] T. Erseghe, "A distributed and scalable processing method based upon ADMM," IEEE Signal Process. Lett., vol. 19, no. 9, pp. 563–566, Sep. 2012, doi: 10.1109/LSP.2012.2207719.
[100] N. Bastianello, R. Carli, L. Schenato, and M. Todescato, "A partition-based implementation of the relaxed ADMM for distributed convex optimization over lossy networks," in Proc. IEEE Conf. Decis. Control (CDC), Piscataway, NJ, USA: IEEE Press, 2018, pp. 3379–3384, doi: 10.1109/CDC.2018.8619729.
[101] M. Todescato, N. Bof, G. Cavraro, R. Carli, and L. Schenato, "Partition-based multi-agent optimization in the presence of lossy and asynchronous communication," Automatica, vol. 111, Jan. 2020, Art. no. 108648, doi: 10.1016/j.automatica.2019.108648.
[102] V.-L. Dang, B.-S. Le, T.-T. Bui, H.-T. Huynh, and C.-K. Pham, "A decentralized localization scheme for swarm robotics based on coordinate geometry and distributed gradient descent," MATEC Web Conf., vol. 54, Feb. 1–3, 2016, Art. no. 02002, doi: 10.1051/matecconf/20165402002.
[103] N. A. Alwan and A. S. Mahmood, "Distributed gradient descent localization in wireless sensor networks," Arabian J. Sci. Eng., vol. 40, no. 3, pp. 893–899, 2015, doi: 10.1007/s13369-014-1552-2.
[104] M. Todescato, A. Carron, R. Carli, and L. Schenato, "Distributed localization from relative noisy measurements: A robust gradient based approach," in Proc. Eur. Control Conf. (ECC), Piscataway, NJ, USA: IEEE Press, 2015, pp. 1914–1919, doi: 10.1109/ECC.2015.7330818.
[105] R. Tron and R. Vidal, "Distributed image-based 3-D localization of camera sensor networks," in Proc. 48th IEEE Conf. Decis. Control (CDC) Held Jointly 2009 28th Chin. Control Conf., Piscataway, NJ, USA: IEEE Press, 2009, pp. 901–908, doi: 10.1109/CDC.2009.5400405.
[106] R. Tron and R. Vidal, "Distributed computer vision algorithms," IEEE Signal Process. Mag., vol. 28, no. 3, pp. 32–45, May 2011, doi: 10.1109/MSP.2011.940399.
[107] R. Tron, Distributed Optimization on Manifolds for Consensus Algorithms and Camera Network Localization. Baltimore, MD, USA: The Johns Hopkins Univ. Press, 2012.
[108] R. Tron and R. Vidal, "Distributed 3-D localization of camera sensor networks from 2-D image measurements," IEEE Trans. Autom. Control, vol. 59, no. 12, pp. 3325–3340, Dec. 2014, doi: 10.1109/TAC.2014.2351912.
[109] A. Sarlette and R. Sepulchre, "Consensus optimization on manifolds," SIAM J. Control Optim., vol. 48, no. 1, pp. 56–76, 2009, doi: 10.1137/060673400.
[110] K.-K. Oh and H.-S. Ahn, "Formation control and network localization via orientation alignment," IEEE Trans. Autom. Control, vol. 59, no. 2, pp. 540–545, Feb. 2014, doi: 10.1109/TAC.2013.2272972.
[111] K.-K. Oh and H.-S. Ahn, "Distributed formation control based on orientation alignment and position estimation," Int. J. Control Autom. Syst., vol. 16, no. 3, pp. 1112–1119, Jun. 2018, doi: 10.1007/s12555-017-0280-2.
[112] T. Fan and T. Murphey, "Majorization minimization methods for distributed pose graph optimization with convergence guarantees," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS), 2020, pp. 5058–5065, doi: 10.1109/IROS45743.2020.9341063.
[113] Y. Tian, A. Koppel, A. S. Bedi, and J. P. How, "Asynchronous and parallel distributed pose graph optimization," IEEE Robot. Autom. Lett., vol. 5, no. 4, pp. 5819–5826, Oct. 2020, doi: 10.1109/LRA.2020.3010216.
[114] J. Knuth and P. Barooah, "Collaborative localization with heterogeneous inter-robot measurements by Riemannian optimization," in Proc. IEEE Int. Conf. Robot. Autom., Piscataway, NJ, USA: IEEE Press, 2013, pp. 1534–1539, doi: 10.1109/ICRA.2013.6630774.
[115] Y. Tian, K. Khosoussi, and J. P. How, "Block-coordinate descent on the Riemannian staircase for certifiably correct distributed rotation and pose synchronization," 2019, arXiv:1911.03721.
[116] S. Choudhary, L. Carlone, C. Nieto, J. Rogers, H. I. Christensen, and F. Dellaert, “Distributed mapping with privacy and communication constraints: Lightweight algorithms and object-based models,” Int. J. Robot. Res., vol. 36, no. 12, pp. 1286–1311, 2017, doi: 10.1177/0278364917732640.
[117] R. Zhang, S. Zhu, T. Fang, and L. Quan, “Distributed very large scale bundle adjustment by global camera consensus,” in Proc. IEEE Int. Conf. Comput. Vis., 2017, pp. 29–38.
[118] A. Eriksson, J. Bastian, T.-J. Chin, and M. Isaksson, “A consensus-based framework for distributed bundle adjustment,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 1754–1762.
[119] S. Choudhary, L. Carlone, H. I. Christensen, and F. Dellaert, “Exactly sparse memory efficient SLAM using the multi-block alternating direction method of multipliers,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS), Piscataway, NJ, USA: IEEE Press, 2015, pp. 1349–1356, doi: 10.1109/IROS.2015.7353543.
[120] K. Natesan Ramamurthy, C.-C. Lin, A. Aravkin, S. Pankanti, and R. Viguier, “Distributed bundle adjustment,” in Proc. IEEE Int. Conf. Comput. Vis. Workshops, 2017, pp. 2146–2154.
[121] S. Kumar, R. Jain, and K. Rajawat, “Asynchronous optimization over heterogeneous networks via consensus ADMM,” IEEE Trans. Signal Inf. Process. Netw., vol. 3, no. 1, pp. 114–129, Mar. 2017, doi: 10.1109/TSIPN.2016.2593896.
[122] O. Shorinwa, J. Yu, T. Halsted, A. Koufos, and M. Schwager, “Distributed multi-target tracking for autonomous vehicle fleets,” in Proc. IEEE Int. Conf. Robot. Automat. (ICRA), Piscataway, NJ, USA: IEEE Press, 2020, pp. 3495–3501, doi: 10.1109/ICRA40945.2020.9197241.
[123] A. Khodabandeh and P. Teunissen, “Distributed least-squares estimation applied to GNSS networks,” Meas. Sci. Technol., vol. 30, no. 4, 2019, Art. no. 044005, doi: 10.1088/1361-6501/ab034e.
[124] K. Lu, G. Jing, and L. Wang, “Online distributed optimization with strongly pseudoconvex-sum cost functions,” IEEE Trans. Autom. Control, vol. 65, no. 1, pp. 426–433, Jan. 2020, doi: 10.1109/TAC.2019.2915745.
[125] S. Shahrampour and A. Jadbabaie, “Distributed online optimization in dynamic environments using mirror descent,” IEEE Trans. Autom. Control, vol. 63, no. 3, pp. 714–725, Mar. 2018, doi: 10.1109/TAC.2017.2743462.
[126] Y. Zhang, R. J. Ravier, M. M. Zavlanos, and V. Tarokh, “A distributed online convex optimization algorithm with improved dynamic regret,” in Proc. IEEE 58th Conf. Decis. Control (CDC), Piscataway, NJ, USA: IEEE Press, 2019, pp. 2449–2454, doi: 10.1109/CDC40024.2019.9029474.
[127] S. Hosseini, A. Chapman, and M. Mesbahi, “Online distributed optimization via dual averaging,” in Proc. 52nd IEEE Conf. Decis. Control, Piscataway, NJ, USA: IEEE Press, 2013, pp. 1484–1489, doi: 10.1109/CDC.2013.6760092.
[128] S. Shahrampour and A. Jadbabaie, “Exponentially fast parameter estimation in networks using distributed dual averaging,” in Proc. 52nd IEEE Conf. Decis. Control, Piscataway, NJ, USA: IEEE Press, 2013, pp. 6196–6201, doi: 10.1109/CDC.2013.6760868.
[129] S. Hosseini, A. Chapman, and M. Mesbahi, “Online distributed convex optimization on dynamic networks,” IEEE Trans. Autom. Control, vol. 61, no. 11, pp. 3545–3550, Nov. 2016, doi: 10.1109/TAC.2016.2525928.
[130] S. Lee, A. Nedić, and M. Raginsky, “Stochastic dual averaging for decentralized online optimization on time-varying communication graphs,” IEEE Trans. Autom. Control, vol. 62, no. 12, pp. 6407–6414, Dec. 2017, doi: 10.1109/TAC.2017.2650563.
[131] K. I. Tsianos, S. Lawlor, and M. G. Rabbat, “Consensus-based distributed optimization: Practical issues and applications in large-scale machine learning,” in Proc. 50th Annu. Allerton Conf. Commun., Control, Comput. (Allerton), Piscataway, NJ, USA: IEEE Press, 2012, pp. 1543–1550, doi: 10.1109/Allerton.2012.6483403.
[132] S. Scardapane, R. Fierimonte, P. Di Lorenzo, M. Panella, and A. Uncini, “Distributed semi-supervised support vector machines,” Neural Netw., vol. 80, pp. 43–52, Aug. 2016, doi: 10.1016/j.neunet.2016.04.007.
[133] S. Scardapane and P. D. Lorenzo, “A framework for parallel and distributed training of neural networks,” Neural Netw., vol. 91, pp. 42–54, Jul. 2017, doi: 10.1016/j.neunet.2017.04.004.
[134] R. Altilio, P. Di Lorenzo, and M. Panella, “Distributed data clustering over networks,” Pattern Recognit., vol. 93, pp. 603–620, Sep. 2019, doi: 10.1016/j.patcog.2019.04.021.
[135] Q. Ling and A. Ribeiro, “Decentralized dynamic optimization through the alternating direction method of multipliers,” IEEE Trans. Signal Process., vol. 62, no. 5, pp. 1185–1197, Mar. 2014, doi: 10.1109/TSP.2013.2295055.
[136] H. F. Xu, Q. Ling, and A. Ribeiro, “Online learning over a decentralized network through ADMM,” J. Oper. Res. Soc. China, vol. 3, no. 4, pp. 537–562, Dec. 2015, doi: 10.1007/s40305-015-0104-0.
[137] J. Yu, J. A. Vincent, and M. Schwager, “DiNNO: Distributed neural network optimization for multi-robot collaborative learning,” IEEE Robot. Autom. Lett., vol. 7, no. 2, pp. 1896–1903, Apr. 2022, doi: 10.1109/LRA.2022.3142402.
[138] E. Montijano and A. R. Mosteo, “Efficient multi-robot formations using distributed optimization,” in Proc. 53rd IEEE Conf. Decis. Control, Piscataway, NJ, USA: IEEE Press, 2014, pp. 6167–6172, doi: 10.1109/CDC.2014.7040355.
[139] M. Bürger, G. Notarstefano, F. Bullo, and F. Allgöwer, “A distributed simplex algorithm for degenerate linear programs and multi-agent assignments,” Automatica, vol. 48, no. 9, pp. 2298–2304, 2012, doi: 10.1016/j.automatica.2012.06.040.
[140] R. Haksar, O. Shorinwa, P. Washington, and M. Schwager, “Consensus-based ADMM for task assignment in multi-robot teams,” in Proc. Int. Symp. Robot. Res., 2019, pp. 35–51, doi: 10.1007/978-3-030-95459-8_3.
[141] O. Shorinwa, R. N. Haksar, P. Washington, and M. Schwager, “Distributed multi-robot task assignment via consensus ADMM,” IEEE Trans. Robot., vol. 39, no. 3, pp. 1781–1800, Jun. 2023, doi: 10.1109/TRO.2022.3228132.
[142] J. Bento, N. Derbinsky, J. Alonso-Mora, and J. S. Yedidia, “A message-passing algorithm for multi-agent trajectory planning,” in Proc. Adv. Neural Inf. Process. Syst., 2013, pp. 521–529.
[143] L. Ferranti, R. R. Negenborn, T. Keviczky, and J. Alonso-Mora, “Coordination of multiple vessels via distributed nonlinear model predictive control,” in Proc. Eur. Control Conf. (ECC), Piscataway, NJ, USA: IEEE Press, 2018, pp. 2523–2528, doi: 10.23919/ECC.2018.8550178.
[144] O. Shorinwa and M. Schwager, “Scalable collaborative manipulation with distributed trajectory planning,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS), vol. 1, Piscataway, NJ, USA: IEEE Press, 2020, pp. 9108–9115, doi: 10.1109/IROS45743.2020.9340957.
[145] O. Shorinwa and M. Schwager, “Distributed contact-implicit trajectory optimization for collaborative manipulation,” in Proc. Int. Symp. Multi-Robot Multi-Agent Syst. (MRS), Piscataway, NJ, USA: IEEE Press, 2021, pp. 56–65, doi: 10.1109/MRS50823.2021.9620665.
[146] F. Alimisis, P. Davies, and D. Alistarh, “Communication-efficient distributed optimization with quantized preconditioners,” 2021, arXiv:2102.07214.
[147] Y. Yu, J. Wu, and L. Huang, “Double quantization for communication-efficient distributed optimization,” in Proc. Adv. Neural Inf. Process. Syst., 2019, vol. 32, pp. 4438–4449.
[148] Y. Pu, M. N. Zeilinger, and C. N. Jones, “Quantization design for distributed optimization,” IEEE Trans. Autom. Control, vol. 62, no. 5, pp. 2107–2120, May 2016, doi: 10.1109/TAC.2016.2600597.
[149] A. Reisizadeh, A. Mokhtari, H. Hassani, and R. Pedarsani, “An exact quantized decentralized gradient descent algorithm,” IEEE Trans. Signal Process., vol. 67, no. 19, pp. 4934–4947, Oct. 2019, doi: 10.1109/TSP.2019.2932876.
[150] C.-S. Lee, N. Michelusi, and G. Scutari, “Finite rate quantized distributed optimization with geometric convergence,” in Proc. 52nd Asilomar Conf. Signals, Syst., Comput., Piscataway, NJ, USA: IEEE Press, 2018, pp. 1876–1880, doi: 10.1109/ACSSC.2018.8645345.
[151] H. Li, S. Liu, Y. C. Soh, and L. Xie, “Event-triggered communication and data rate constraint for distributed optimization of multiagent systems,” IEEE Trans. Syst., Man, Cybern. Syst., vol. 48, no. 11, pp. 1908–1919, Nov. 2017, doi: 10.1109/TSMC.2017.2694323.
[152] A. Elgabli, J. Park, A. S. Bedi, C. B. Issaid, M. Bennis, and V. Aggarwal, “Q-GADMM: Quantized group ADMM for communication efficient decentralized machine learning,” IEEE Trans. Commun., vol. 69, no. 1, pp. 164–181, Jan. 2021, doi: 10.1109/TCOMM.2020.3026398.
[153] S. Zhu and B. Chen, “Distributed average consensus with deterministic quantization: An ADMM approach,” in Proc. IEEE Global Conf. Signal Inf. Process. (GlobalSIP), Piscataway, NJ, USA: IEEE Press, 2015, pp. 692–696, doi: 10.1109/GlobalSIP.2015.7418285.
[154] N. Bastianello, M. Todescato, R. Carli, and L. Schenato, “Distributed optimization over lossy networks via relaxed Peaceman-Rachford splitting: A robust ADMM approach,” in Proc. Eur. Control Conf. (ECC), Piscataway, NJ, USA: IEEE Press, 2018, pp. 477–482, doi: 10.23919/ECC.2018.8550322.
[155] N. Bastianello, R. Carli, L. Schenato, and M. Todescato, “Asynchronous distributed optimization over lossy networks via relaxed ADMM: Stability and linear convergence,” IEEE Trans. Autom. Control, vol. 66, no. 6, pp. 2620–2635, Jun. 2021, doi: 10.1109/TAC.2020.3011358.
[156] N. Bof, R. Carli, G. Notarstefano, L. Schenato, and D. Varagnolo, “Multiagent Newton–Raphson optimization over lossy networks,” IEEE Trans. Autom. Control, vol. 64, no. 7, pp. 2983–2990, Jul. 2019, doi: 10.1109/TAC.2018.2874748.
[157] S. M. Trenkwalder, “Computational resources of miniature robots: Classification and implications,” IEEE Robot. Autom. Lett., vol. 4, no. 3, pp. 2722–2729, Jul. 2019, doi: 10.1109/LRA.2019.2917395.
[158] M. Lahijanian et al., “Resource-performance tradeoff analysis for mobile robots,” IEEE Robot. Autom. Lett., vol. 3, no. 3, pp. 1840–1847, Jul. 2018, doi: 10.1109/LRA.2018.2803814.
[159] B. Gharesifard and J. Cortés, “Distributed strategies for generating weight-balanced and doubly stochastic digraphs,” Eur. J. Control, vol. 18, no. 6, pp. 539–557, 2012, doi: 10.3166/EJC.18.539-557.
[160] X. Lian, W. Zhang, C. Zhang, and J. Liu, “Asynchronous decentralized parallel stochastic gradient descent,” in Proc. Int. Conf. Mach. Learn., PMLR, 2018, pp. 3043–3052.
