Algorithms and Models for the Web Graph
Lecture Notes in Computer Science 10836
Commenced Publication in 1973
Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison
Lancaster University, Lancaster, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Friedemann Mattern
ETH Zurich, Zurich, Switzerland
John C. Mitchell
Stanford University, Stanford, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
C. Pandu Rangan
Indian Institute of Technology Madras, Chennai, India
Bernhard Steffen
TU Dortmund University, Dortmund, Germany
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
Gerhard Weikum
Max Planck Institute for Informatics, Saarbrücken, Germany
More information about this series at http://www.springer.com/series/7407
Editors
Anthony Bonato
Department of Mathematics
Ryerson University
Toronto, ON
Canada

Andrei Raigorodskii
Department of Discrete Mathematics
Moscow Institute of Physics and Technology
Dolgoprudny
Russia

Paweł Prałat
Department of Mathematics
Ryerson University
Toronto, ON
Canada
This Springer imprint is published by the registered company Springer International Publishing AG
part of Springer Nature
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
The 15th Workshop on Algorithms and Models for the Web Graph (WAW 2018) took
place at the Moscow Institute of Physics and Technology, Russia, May 17–18, 2018.
This is an annual meeting, which is traditionally co-located with another, related,
conference. WAW 2018 was co-located with the Workshop on Graphs, Networks, and
Their Applications. The co-location of the two workshops provided opportunities for
researchers in two different but interrelated areas to interact and to exchange research
ideas. It was an effective venue for the dissemination of new results and for fostering
research collaboration.
The World Wide Web has become part of our everyday life, and information
retrieval and data mining on the Web are now of enormous practical interest. The
algorithms supporting these activities combine the view of the Web as a text repository
and as a graph, induced in various ways by links among pages, hosts and users. The
aim of the workshop was to further the understanding of graphs that arise from the Web
and various user activities on the Web, and stimulate the development of
high-performance algorithms and applications that exploit these graphs. The workshop
gathered together researchers working on graph-theoretic and algorithmic aspects of
related complex networks, including social networks, citation networks, biological
networks, molecular networks, and other networks arising from the Internet.
This volume contains the papers presented during the workshop. Each submission
was reviewed by Program Committee members. Papers were submitted and reviewed
using the EasyChair online system. The committee members accepted 11 papers.
General Chairs
Andrei Z. Broder Google Research, USA
Fan Chung Graham University of California San Diego, USA
Organizing Committee
Anthony Bonato Ryerson University, Canada
Paweł Prałat Ryerson University, Canada
Andrei Raigorodskii MIPT, Russia
Program Committee
Konstantin Avratchenkov Inria, France
Paolo Boldi University of Milan, Italy
Anthony Bonato Ryerson University, Canada
Milan Bradonjic Bell, USA
Fan Chung Graham UC San Diego, USA
Collin Cooper King’s College London, UK
Andrzej Dudek Western Michigan University, USA
Alan Frieze Carnegie Mellon University, USA
Aristides Gionis Aalto University, Finland
David Gleich Purdue University, USA
Jeannette Janssen Dalhousie University, Canada
Bogumil Kaminski Warsaw School of Economics, Poland
Ravi Kumar Google Research, USA
Silvio Lattanzi Google Research, USA
Marc Lelarge Inria, France
Stefano Leonardi Sapienza University of Rome, Italy
Nelly Litvak University of Twente, The Netherlands
Michael Mahoney UC Berkeley, USA
Oliver Mason NUI Maynooth, Ireland
Dieter Mitsche Université de Nice Sophia-Antipolis, France
Peter Morters University of Bath, UK
Tobias Mueller Utrecht University, The Netherlands
Liudmila Ostroumova Yandex, Russia
Pan Peng TU Dortmund, Germany
Xavier Perez-Gimenez University of Nebraska-Lincoln, USA
Pawel Pralat Ryerson University, Canada
Yana Volkovich AppNexus, USA
Stephen Young Pacific Northwest National Laboratory, USA
Sponsoring Institutions
Finding Induced Subgraphs in Scale-Free Inhomogeneous Random Graphs
E. Cardinaels et al.
1 Introduction
The induced subgraph isomorphism problem asks whether a large graph G
contains a connected graph H on k vertices as an induced subgraph. When k is
allowed to grow with the graph size n, this problem is NP-hard in general. For
example, finding a k-clique or an induced cycle of length k, both special cases
of H, is known to be NP-hard [13,20]. For fixed k, this problem can be solved in
polynomial time $O(n^k)$ by searching for H on all possible combinations of k
vertices. Several randomized and non-randomized algorithms exist to improve
upon this trivial way of finding H [14,25,27,29].
On real-world networks, many algorithms were observed to run much faster
than predicted by the worst-case running time of algorithms. This may be
ascribed to some of the properties that many real-world networks share [4],
such as the power-law degree distribution found in many networks [1,8,19,28].
One way of exploiting these power-law degree distributions is to design algo-
rithms that work well on random graphs with power-law degree distributions.
For example, finding the largest clique in a network is NP-complete for general
networks [20]. However, in random graph models such as the Erdős-Rényi ran-
dom graph and the inhomogeneous random graph, their specific structures can be
exploited to design fixed parameter tractable (FPT) algorithms that efficiently
find a clique of size k [10,12] or the largest independent set [15].
© Springer International Publishing AG, part of Springer Nature 2018
A. Bonato et al. (Eds.): WAW 2018, LNCS 10836, pp. 1–15, 2018.
https://doi.org/10.1007/978-3-319-92871-5_1
In this paper, we study algorithms that are designed to perform well for
the inhomogeneous random graph, a random graph model that can generate
graphs with a power-law degree distribution [2,3,5,6,24,26]. The inhomogeneous
random graph has a densely connected core containing many cliques, consisting
of vertices with degrees $\sqrt{n \log(n)}$ and larger. In this densely connected
core, the probability of an edge being present is close to one, so that it contains
many complete graphs [18]. This observation was exploited in [11] to efficiently
determine whether a clique of size k occurs as a subgraph in an inhomogeneous
random graph. When searching for induced subgraphs, however, some edges are
required not to be present. Therefore, searching for induced subgraphs in the
entire core is not efficient. We show that a connected subgraph H can be found
as an induced subgraph by scanning only vertices that are on the boundary of
the core: vertices with degrees proportional to $\sqrt{n}$.
We present an algorithm that first selects the set of vertices with degrees
proportional to $\sqrt{n}$, and then randomly searches for H as an induced
subgraph on a subset of k of those vertices. The first algorithm we present does
not depend on the specific structure of H. For general sparse graphs, the best
known algorithms to solve subgraph isomorphism on 3 or 4 vertices run in
$O(n^{1.41})$ or $O(n^{1.51})$ time with high probability [29]. For small values of
k, our algorithm solves subgraph isomorphism on k nodes in linear time with high
probability on inhomogeneous random graphs. However, the graph size needs to be
very large for our algorithm to perform well. We therefore present a second
algorithm that again selects the vertices with degrees proportional to
$\sqrt{n}$, and then searches for induced subgraph H in a more efficient way.
This algorithm has the same performance guarantee as our first algorithm, but
performs much better in simulations.
We test our algorithm on large inhomogeneous random graphs, where it
indeed efficiently finds induced subgraphs. We also test our algorithm on
real-world network data with power-law degrees. There our algorithm does not
perform well, probably because the densely connected core of some real-world
networks may not consist of the vertices of degrees at least proportional to
$\sqrt{n}$. We then show that a slight modification of our algorithm that looks
for induced subgraphs on vertices of degrees proportional to $n^{\gamma}$ for
some other value of γ performs better on real-world networks, where the value of
γ depends on the specific network.
Notation. We say that a sequence of events $(E_n)_{n \ge 1}$ happens with high
probability (w.h.p.) if $\lim_{n \to \infty} P(E_n) = 1$. Furthermore, we write
$f(n) = o(g(n))$ if $\lim_{n \to \infty} f(n)/g(n) = 0$, and $f(n) = O(g(n))$ if
$|f(n)|/g(n)$ is uniformly bounded, where $(g(n))_{n \ge 1}$ is nonnegative.
Similarly, if $\limsup_{n \to \infty} |f(n)|/g(n) > 0$, we say that
$f(n) = \Omega(g(n))$ for nonnegative $(g(n))_{n \ge 1}$. We write
$f(n) = \Theta(g(n))$ if $f(n) = O(g(n))$ as well as $f(n) = \Omega(g(n))$.
1.1 Model
As a random graph null model, we use the inhomogeneous random graph or
hidden variable model [2,3,5,6,24,26]. Every vertex is equipped with a weight.
We assume that the weights are i.i.d. samples from the power-law distribution
$$P(w_i > k) = C k^{1-\tau} \qquad (1.1)$$
for some constant C and for $\tau \in (2, 3)$. Two vertices with weights $w$ and
$w'$ are connected with probability
$$p(w, w') = \min\left(\frac{w w'}{\mu n}, 1\right), \qquad (1.2)$$
where μ denotes the mean value of the power-law distribution (1.1). Choosing
the connection probability in this way ensures that the expected degree of a
vertex with weight w is w.
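The model can be simulated directly from (1.1) and (1.2). The following is a minimal sketch, assuming inverse-transform sampling for the weights and a quadratic-time loop over vertex pairs; the function names, the default C = 1, and the adjacency-dictionary representation are illustrative choices and are not taken from the paper.

```python
import random


def sample_weights(n, tau, c=1.0, rng=random):
    """Draw n i.i.d. weights with P(w > x) = c * x**(1 - tau) for x >= x_min."""
    x_min = c ** (1.0 / (tau - 1.0))  # smallest weight, where the tail equals 1
    # Inverse-transform sampling: 1 - rng.random() is uniform on (0, 1].
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (tau - 1.0)) for _ in range(n)]


def power_law_mean(tau, c=1.0):
    """Mean of the weight distribution above; finite because tau > 2."""
    x_min = c ** (1.0 / (tau - 1.0))
    return x_min * (tau - 1.0) / (tau - 2.0)


def connection_probability(w1, w2, mu, n):
    """Edge probability (1.2): min(w1 * w2 / (mu * n), 1)."""
    return min(w1 * w2 / (mu * n), 1.0)


def sample_graph(n, tau, c=1.0, seed=None):
    """Return (weights, adjacency); adjacency maps each vertex to its neighbour set."""
    rng = random.Random(seed)
    weights = sample_weights(n, tau, c, rng)
    mu = power_law_mean(tau, c)
    adjacency = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            # Each pair is connected independently given the weights.
            if rng.random() < connection_probability(weights[u], weights[v], mu, n):
                adjacency[u].add(v)
                adjacency[v].add(u)
    return weights, adjacency
```

The pairwise loop costs O(n^2) time, so this sketch is only suitable for small illustrative graphs, not for the graph sizes used in the experiments.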
1.2 Algorithms
The following theorem gives a bound for the performance of Algorithm 1 for
small values of k.
Theorem 1. Choose $f_1 = f_1(n) \ge 1/\log(n)$ and $f_1 < f_2 < 1$, and let
$k < \log^{1/3}(n)$. Then, with high probability, Algorithm 1 detects induced
subgraph H on k vertices in an inhomogeneous random graph with n vertices and
weights distributed as in (1.1) in time $O(nk)$.
Thus, for small values of k, Algorithm 1 finds an instance of H in linear time.
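The pseudocode of Algorithm 1 is not reproduced in this text, so the following is only a sketch of the two stages the surrounding prose describes: a degree selection step that keeps vertices with degrees in $I_n = [\sqrt{f_1 \mu n}, \sqrt{f_2 \mu n}]$, followed by a search over disjoint groups of k selected vertices. The function names, the random shuffle used to form the groups, and the brute-force isomorphism check are assumptions made for illustration.

```python
import math
import random
from itertools import permutations


def degree_window(n, mu, f1, f2):
    """Interval I_n = [sqrt(f1*mu*n), sqrt(f2*mu*n)] used for degree selection."""
    return math.sqrt(f1 * mu * n), math.sqrt(f2 * mu * n)


def induces_copy(adjacency, vertices, pattern_edges):
    """Check whether `vertices` induce a graph isomorphic to the pattern H.

    pattern_edges is a set of frozensets over labels 0..k-1. All k! labelings
    are tried, which is only sensible for small k.
    """
    k = len(vertices)
    for labeling in permutations(vertices):
        if all(
            (labeling[b] in adjacency[labeling[a]]) == (frozenset((a, b)) in pattern_edges)
            for a in range(k) for b in range(a + 1, k)
        ):
            return True
    return False


def find_induced_subgraph(adjacency, pattern_edges, k, mu, f1, f2, rng=random):
    """Return k vertices inducing a copy of H, or None if no group matches."""
    n = len(adjacency)
    low, high = degree_window(n, mu, f1, f2)
    # Degree selection: keep only vertices whose degree lies in I_n.
    selected = [v for v, nbrs in adjacency.items() if low <= len(nbrs) <= high]
    rng.shuffle(selected)
    # Partial search: split the selected vertices into disjoint groups of k.
    for start in range(0, len(selected) - k + 1, k):
        group = selected[start:start + k]
        if induces_copy(adjacency, group, pattern_edges):
            return group
    return None
```

Each group is checked independently, so the number of groups examined is at most n/k; the brute-force check inside each group is a simplification and does not reflect the per-group cost assumed in the theorem.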
The following theorem shows that Algorithm 2 indeed has similar performance
guarantees to Algorithm 1.
Theorem 2. Choose $f_1 = f_1(n) \ge 1/\log(n)$ and $f_1 < f_2 < 1$. Choose
$s = \Omega(n^{\alpha})$ for some $0 < \alpha < 1$, such that $s \le n/k$. Then,
Algorithm 2 detects induced subgraph H on $k < \log^{1/3}(n)$ vertices on an
inhomogeneous random graph with n vertices and weights distributed as in (1.1)
in time $O(nk)$ with high probability.
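Algorithm 2 is likewise not reproduced in this text. The proof fragments later in this section indicate that it builds up to s candidate sets of k vertices by following random edges inside the degree-selected vertex set, so that each candidate already contains k − 1 edges. The sketch below illustrates that idea under explicit assumptions: the growth rule (a uniformly random edge leaving the current set), the helper names, and the `is_match` callback are hypothetical.

```python
import random


def grow_connected_set(adjacency, allowed, k, rng):
    """Grow a k-vertex set inside `allowed` by repeatedly following a random edge."""
    current = rng.choice(sorted(allowed))
    chosen = [current]
    while len(chosen) < k:
        # Edges leaving the current set that stay inside the allowed vertices.
        frontier = [(u, v) for u in chosen for v in adjacency[u]
                    if v in allowed and v not in chosen]
        if not frontier:
            return None  # dead end: the set cannot be extended
        _, new_vertex = rng.choice(frontier)
        chosen.append(new_vertex)
    return chosen


def random_edge_search(adjacency, allowed, k, s, is_match, rng=None):
    """Try at most s grown sets; return the first one accepted by is_match."""
    rng = rng or random.Random()
    allowed = set(allowed)
    if not allowed:
        return None
    for _ in range(s):
        candidate = grow_connected_set(adjacency, allowed, k, rng)
        if candidate is not None and is_match(candidate):
            return candidate
    return None
```

Because every candidate is connected by construction, fewer attempts are wasted on vertex sets that cannot contain a connected pattern, which is in line with the better simulation performance reported for Algorithm 2.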
The proofs of Theorems 1 and 2 rely on the fact that for small k, any
subgraph on k vertices is present in $G'$ with high probability. This means that
after the degree selection step of Algorithms 1 and 2, for small k, any motif
finding algorithm can be used to find motif H on the remaining graph $G'$, such
as the Grochow-Kellis algorithm [14], the MAVisto algorithm [27] or the MODA
algorithm [25]. In the proofs of Theorems 1 and 2, we show that $G'$ has
$\Theta(n^{(3-\tau)/2})$ vertices with high probability. Thus, the degree
selection step reduces the problem of finding a motif H on n vertices to finding
a motif on a graph with $\Theta(n^{(3-\tau)/2})$ vertices, significantly
reducing the running time of the algorithms.
We prove Theorem 1 using two lemmas. The first lemma relates the degrees of
the vertices to their weights. The connection probabilities in the inhomogeneous
random graph depend on the weights of the vertices. In Algorithm 1, we select
vertices based on their degrees instead of their unknown weights. The following
lemma shows that the weights of the vertices in $V'$ are close to their degrees.
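Lemma 1. Degrees and weights. Fix $\varepsilon > 0$, and define
$J_n = [(1-\varepsilon)\sqrt{f_1 \mu n}, (1+\varepsilon)\sqrt{f_2 \mu n}]$. Then,
for some $K > 0$,
$$P(\exists i \in V' : w_i \notin J_n) \le K n \exp\left(-\frac{\varepsilon^2 (1-\varepsilon)}{2(1+\varepsilon)} \sqrt{f_1 \mu n}\right). \qquad (2.1)$$
Proof. Fix a vertex $i \in V$. Conditionally on the weight $w_i$ of vertex i,
$D_i \sim \mathrm{Poi}(w_i)$ [5,16]. Then,
$$\begin{aligned}
P\left(w_i < (1-\varepsilon)\sqrt{f_1 \mu n},\ D_i \in I_n\right)
&= \frac{P\left(D_i \in I_n \mid w_i < (1-\varepsilon)\sqrt{f_1 \mu n}\right)}{P\left(w_i < (1-\varepsilon)\sqrt{f_1 \mu n}\right)} \\
&\le \frac{P\left(D_i > \sqrt{f_1 \mu n} \mid w_i = (1-\varepsilon)\sqrt{f_1 \mu n}\right)}{1 - C\left((1-\varepsilon)\sqrt{f_1 \mu n}\right)^{1-\tau}} \\
&\le K_1 P\left(D_i > \sqrt{f_1 \mu n} \mid w_i = (1-\varepsilon)\sqrt{f_1 \mu n}\right),
\end{aligned} \qquad (2.2)$$
for some $K_1 > 0$. Here the first inequality follows because for Poisson random
variables $P(\mathrm{Poi}(\lambda_1) > k) \le P(\mathrm{Poi}(\lambda_2) > k)$ for
$\lambda_1 < \lambda_2$. We use that by the Chernoff bound for Poisson random
variables
$$P(X > \lambda(1+\delta)) \le \exp\left(-h(\delta)\delta^2 \lambda / 2\right), \qquad (2.3)$$
where $h(\delta) = 2((1+\delta)\ln(1+\delta) - \delta)/\delta^2$. Therefore, using
that $h(\delta) \ge 1/(1+\delta)$ for $\delta \ge 0$ results in
$$P\left(D_i > \sqrt{f_1 \mu n} \mid w_i = (1-\varepsilon)\sqrt{f_1 \mu n}\right) \le \exp\left(-\frac{\varepsilon^2 (1-\varepsilon)}{2(1+\varepsilon)} \sqrt{f_1 \mu n}\right). \qquad (2.4)$$
Combining this with (2.2) and taking the union bound over all vertices then
results in
$$P\left(\exists i : D_i \in I_n,\ w_i < (1-\varepsilon)\sqrt{f_1 \mu n}\right) \le K_1 n \exp\left(-\frac{\varepsilon^2 (1-\varepsilon)}{2(1+\varepsilon)} \sqrt{f_1 \mu n}\right). \qquad (2.5)$$
The bound for $w_i > (1+\varepsilon)\sqrt{f_2 \mu n}$ follows similarly. Combining
this with the fact that $f_1 < f_2$ then proves the lemma.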
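The Chernoff bound (2.3) and the inequality $h(\delta) \ge 1/(1+\delta)$ can be checked numerically for small parameters. The sketch below compares the exact Poisson upper tail with the right-hand side of (2.3); the function names are illustrative, and the direct summation is only intended for moderate values of λ.

```python
import math


def poisson_upper_tail(lam, threshold):
    """Exact P(X > threshold) for X ~ Poisson(lam) and an integer threshold >= 0."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for j in range(1, threshold + 1):
        term *= lam / j    # P(X = j) computed from P(X = j - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def h(delta):
    """The function h from (2.3), defined for delta > 0."""
    return 2.0 * ((1.0 + delta) * math.log(1.0 + delta) - delta) / delta ** 2


def chernoff_bound(lam, delta):
    """Right-hand side of (2.3): exp(-h(delta) * delta^2 * lam / 2)."""
    return math.exp(-h(delta) * delta ** 2 * lam / 2.0)
```

For example, with λ = 20 and δ = 0.5 one can compare `poisson_upper_tail(20.0, 30)` with `chernoff_bound(20.0, 0.5)`; the exact tail should not exceed the bound.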
The second lemma shows that after deleting all vertices with degrees outside
of $I_n$ defined in Step 1 of Algorithm 1, still polynomially many vertices
remain with high probability.
Lemma 2. Polynomially many nodes remain. There exists $\gamma > 0$ such that
$$P\left(|V'| < \gamma n^{(3-\tau)/2}\right) \le 2 \exp\left(-\Theta(n^{(3-\tau)/2})\right). \qquad (2.6)$$
Proof. Let $\mathcal{E}$ denote the event that all vertices $i \in V'$ satisfy
$w_i \in J_n$ for some $\varepsilon > 0$, with $J_n$ as in Lemma 1. Let $W'$ be
the set of vertices with weights in $J_n$. Under the event $\mathcal{E}$,
$|V'| \le |W'|$. Then, by Lemma 1
$$P\left(|V'| < \gamma n^{(3-\tau)/2}\right) \le P\left(|W'| < \gamma n^{(3-\tau)/2}\right) + K n \exp\left(-\frac{\varepsilon^2 (1-\varepsilon)}{2(1+\varepsilon)} \sqrt{f_1 \mu n}\right). \qquad (2.7)$$
Furthermore,
$$P(w_i \in J_n) = C\left((1-\varepsilon)\sqrt{f_1 \mu n}\right)^{1-\tau} - C\left((1+\varepsilon)\sqrt{f_2 \mu n}\right)^{1-\tau} \ge c_1 \left(\sqrt{\mu n}\right)^{1-\tau} \qquad (2.8)$$
for some constant $c_1 > 0$ because $f_1 < f_2$. Thus, each of the n vertices is
in set $W'$ independently with probability at least
$c_1 (\sqrt{\mu n})^{1-\tau}$. Choose $0 < \gamma < c_1$. Applying the
multiplicative Chernoff bound then shows that
$$P\left(|W'| < \gamma n^{(3-\tau)/2}\right) \le \exp\left(-\frac{(c_1 - \gamma)^2}{2 c_1} n^{(3-\tau)/2}\right), \qquad (2.9)$$
which proves the lemma together with (2.7) and the fact that
$\sqrt{f_1 \mu n} = \Omega(n^{(3-\tau)/2})$ for $\tau \in (2, 3)$.
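For orientation, the exponent (3 − τ)/2 in Lemma 2 arises from multiplying the number of vertices n by the per-vertex lower bound in (2.8). The short computation below restates this with the constants kept symbolic; it is an illustration of the scaling and not an additional step of the proof.

```latex
\[
  n \cdot c_1 \left(\sqrt{\mu n}\right)^{1-\tau}
  = c_1 \, \mu^{(1-\tau)/2} \, n^{1 + (1-\tau)/2}
  = c_1 \, \mu^{(1-\tau)/2} \, n^{(3-\tau)/2}.
\]
```

Since τ lies in (2, 3), the exponent (3 − τ)/2 lies strictly between 0 and 1/2, so the expected number of vertices with weights in the selected window grows polynomially in n but more slowly than $\sqrt{n}$.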
Now apply that $k \le \log^{1/3}(n)$. Then
$$P(H \text{ not in the partitions}) \le \exp\left(-\frac{d\, n^{(3-\tau)/2}\, c_3^{\log^{2/3} n}}{\log^{1/3} n}\right) \le \exp\left(-d\, n^{(3-\tau)/2 - o(1)}\right). \qquad (2.15)$$
Hence, the inner expression grows polynomially such that the probability of not
finding H in one of the partitions is negligibly small. The running time of the
partial search is given by
$$\frac{|V'|}{k}\binom{k}{2} \le \frac{n}{k}\binom{k}{2} \le nk \le n e^{k^4}. \qquad (2.16)$$
by following a random edge. The probability that vertex i is added can therefore
be bounded as
$$P(\text{vertex } i \text{ is added}) = \frac{D_{i,G'}}{\sum_{s=1}^{|V'|} D_{s,G'}} \le \frac{M \log(n)}{|V'|} \qquad (2.18)$$
for some constant $M > 0$ by the conditions on the degrees. Therefore, the
probability that $S_j$ does not overlap with one of the previously chosen jk
vertices can be bounded from below by
$$P(S_j \text{ does not overlap with previous sets}) \ge \left(1 - \frac{kj}{|V'|}\right)\left(1 - \frac{M k j \log(n)}{|V'|}\right)^{k-1}. \qquad (2.19)$$
Thus, the probability that all j sets do not overlap can be bounded as
$$P(S_j \cap S_{j-1} \cap \cdots \cap S_1 = \emptyset) \ge \left(1 - \frac{M k j \log(n)}{|V'|}\right)^{jk}, \qquad (2.20)$$
which tends to one when $jk = o(n^{(3-\tau)/4})$. Let $s_{dis}$ denote the number
of disjoint sets out of the s sets constructed in Algorithm 2. Then, when
$s = \Omega(n^{\alpha})$ for some $\alpha > 0$, $s_{dis} > n^{\beta}$ for some
$\beta > 0$ with high probability, because $k < \log^{1/3}(n)$.
The probability that H is present as an induced subgraph is bounded similarly
as in Theorem 1. We already know that k − 1 edges are present. For all other
E − (k − 1) edges of H, and all $\binom{k}{2} - E$ edges that are not present in
H, we can again use (2.10) and (2.11) to bound the probability of edges being
present or not being present between vertices in $V'$. Therefore, we can bound
the probability that H is not found similarly to (2.13).
Because $s_{dis} > n^{\beta}$ for some $\beta > 0$, this term tends to zero
exponentially. The running time of the partial search can be bounded similarly
to (2.16) as
$$s \binom{k}{2} \le s k^2 = O(nk). \qquad (2.21)$$
3 Experimental Results
Figure 1 shows the fraction of times Algorithm 1 succeeds in finding a cycle of
size k in an inhomogeneous random graph on $10^7$ vertices. Even though for
large n Algorithm 1 should find an instance of a cycle of size k in step 7 of the
algorithm with high probability, we see that Algorithm 1 never succeeds in
finding one. This is because of the finite size effects discussed before.
Fig. 1. The fraction of times step 7 in Algorithm 1 succeeds in finding a cycle of
length k on an inhomogeneous random graph with $n = 10^7$, averaged over 500
network samples with $f_1 = 1/\log(n)$ and $f_2 = 0.9$.
Figure 2a also plots the fraction of times Algorithm 2 succeeds in finding a
cycle. We set the parameter s = 10000, so that the algorithm fails if it has not
detected motif H after executing step 13 of Algorithm 2 10000 times. Because s
gives the number of attempts to find H, increasing s may increase the success
probability of Algorithm 2 at the cost of a higher running time. However, in
Fig. 2b we see that for small values of k, the mean number of times Step 13 is
executed when the algorithm succeeds is much lower than 10000, so that
increasing s in this experiment probably only has a small effect on the success
probability. We see that Algorithm 2 outperforms Algorithm 1. Figure 2b also
shows that the number of attempts needed to detect a cycle of length k is small
for k ≤ 6. For larger values of k the number of attempts increases. This can
again be ascribed to the finite size effects that cause the set $V'$ to be
small, so that large motifs may not be present on vertices in set $V'$. We also
plot the success probability when using different values of the functions $f_1$
and $f_2$. When only the lower bound $f_1$ on the vertex degrees is used, as in
[11], the success probability of the algorithm decreases. This is because the
set $V'$ now contains many high degree vertices that are much more likely to
form clique motifs than cycles or other connected motifs on k vertices. This
makes $f_2 = \infty$ a very efficient bound for detecting clique motifs [11].
For the cycle motif, however, we see in Fig. 2b that more checks are needed
before a cycle is detected, and in some cases the cycle is not detected at all.
Setting $f_1 = 0$ and $f_2 = \infty$ is also less efficient, as Fig. 2a shows. In
this situation, the number of attempts needed to find a cycle of length k is
larger than for Algorithm 2 for k ≤ 6.
Figure 3 shows the fraction of runs where Algorithm 2 finds a cycle as an
induced subgraph. We see that for the Wikipedia social network in Fig. 3a,
Algorithm 2 is more efficient than looking for cycles among all vertices in the
network. For the Baidu online encyclopedia in Fig. 3c, however, we see that
Algorithm 2 performs much worse than looking for cycles among all possible
vertices. In the other two network data sets in Figs. 3b and d the performance
on the reduced vertex set and the original vertex set is almost the same.
Figure 4 shows that in general, Algorithm 2 indeed seems to finish in fewer
steps than when using the full vertex set. However, as Fig. 4c shows, for larger
values of k the algorithm almost always fails.
Table 1. Statistics of the data sets: the number of vertices n, the number of edges E,
and the power-law exponent τ fitted by the method of [7].
Data set      n           E            τ
Wikipedia     2,394,385   5,021,410    2.46
Gowalla       196,591     950,327      2.65
Baidu         2,141,300   17,794,839   2.29
AS-Skitter    1,696,415   11,095,298   2.35
Fig. 3. The fraction of times Algorithm 2 succeeds in finding a cycle of length k
on four large network data sets. The parameters are chosen as s = 10000,
$f_1 = 1/\log(n)$, $f_2 = 0.9$. The black line uses Algorithm 2 on vertices of
degrees in $I_n = [(\mu n)^{\gamma}/\log(n), (\mu n)^{\gamma}]$. The values are
averaged over 500 runs of Algorithm 2.
value of γ that works well. For the Gowalla, Wikipedia and Autonomous systems
network, this leads to a faster algorithm to detect cycles. Only for the Baidu net-
work other values of γ do not improve upon randomly selecting from all vertices.
This indicates that for most networks, cycles do appear mostly on degrees with
specific orders of magnitude, making it possible to sample these cycles faster.
Unfortunately, these orders of magnitude may be different for different networks.
Across all four networks, the best value of γ seems to be smaller than the value
of 0.5 that is optimal for the inhomogeneous random graph.
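The modified degree selection with exponent γ can be written as a small helper. The sketch below computes the interval $[(\mu n)^{\gamma}/\log(n), (\mu n)^{\gamma}]$ given in the captions of Figs. 3 and 4 and filters a degree table by it; the function names and the dictionary of degrees are illustrative assumptions, and the value of μ has to be supplied by the caller.

```python
import math


def gamma_degree_window(n, mu, gamma):
    """Interval [(mu*n)**gamma / log(n), (mu*n)**gamma] for degree selection."""
    upper = (mu * n) ** gamma
    return upper / math.log(n), upper


def select_by_degree(degrees, window):
    """Return the vertices whose degree lies in the closed interval `window`."""
    low, high = window
    return [v for v, d in degrees.items() if low <= d <= high]
```

With γ = 0.5 the upper end of the window is $\sqrt{\mu n}$; smaller values of γ shift the window towards lower degrees, which is the direction the experiments on the four data sets point to.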
Fig. 4. The number of times step 12 of Algorithm 2 is invoked when the algorithm does
not fail on four large network data sets for detecting cycles of length k. The parameters
are chosen as s = 10000, $f_1 = 1/\log(n)$, $f_2 = 0.9$. The black line uses
Algorithm 2 on vertices of degrees in
$I_n = [(\mu n)^{\gamma}/\log(n), (\mu n)^{\gamma}]$. The values are averaged
over 500 runs of Algorithm 2.
4 Conclusion
We presented an algorithm which solves the induced subgraph problem on
inhomogeneous random graphs with infinite variance power-law degrees in time
$O(n e^{k^4})$ with high probability as n grows large. This algorithm is based on
the observation that for fixed k, any subgraph is present on k vertices with
degrees slightly smaller than $\sqrt{\mu n}$ with positive probability.
Therefore, the algorithm first selects vertices with those degrees, and then uses
a random search method to look for the induced subgraph on those vertices.
We show that this algorithm performs well on simulations of inhomogeneous
random graphs. Its performance on real-world data sets varies for different data
sets. This indicates that the degrees that contain the most induced subgraphs
of size k in real-world networks may not be close to $\sqrt{n}$. We then show
that on these data sets, it may be more efficient to find induced subgraphs on
degrees proportional to $n^{\gamma}$ for some other value of γ. The value of γ
may be different for different networks.
Our algorithm exploits that induced subgraphs are likely formed among
$\sqrt{\mu n}$-degree vertices. However, certain subgraphs may occur more
frequently on vertices of other degrees [17]. For example, star-shaped subgraphs
on k vertices appear more often on one vertex with degree much higher than
$\sqrt{\mu n}$ corresponding to the middle vertex of the star, and k − 1
lower-degree vertices corresponding to the leaves of the star [17]. An
interesting open question is whether there exist better degree-selection steps
for specific subgraphs than the one used in Algorithms 1 and 2.
Acknowledgements. The work of JvL and CS was supported by NWO TOP grant
613.001.451. The work of JvL was further supported by the NWO Gravitation Networks
grant 024.002.003, an NWO TOP-GO grant and by an ERC Starting Grant.
References
1. Albert, R., Jeong, H., Barabási, A.L.: Internet: diameter of the world-wide web.
Nature 401(6749), 130–131 (1999)
2. Boguñá, M., Pastor-Satorras, R.: Class of correlated random networks with hidden
variables. Phys. Rev. E 68, 036112 (2003)
3. Bollobás, B., Janson, S., Riordan, O.: The phase transition in inhomogeneous ran-
dom graphs. Random Struct. Algorithms 31(1), 3–122 (2007)
4. Brach, P., Cygan, M., Łącki, J., Sankowski, P.: Algorithmic complexity of power law
networks. In: Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium
on Discrete Algorithms, SODA 2016, pp. 1306–1325. Society for Industrial and
Applied Mathematics, Philadelphia (2016)
5. Britton, T., Deijfen, M., Martin-Löf, A.: Generating simple random graphs with
prescribed degree distribution. J. Stat. Phys. 124(6), 1377–1397 (2006)
6. Chung, F., Lu, L.: The average distances in random graphs with given expected
degrees. Proc. Natl. Acad. Sci. USA 99(25), 15879–15882 (2002) (electronic)
7. Clauset, A., Shalizi, C.R., Newman, M.E.J.: Power-law distributions in empirical
data. SIAM Rev. 51(4), 661–703 (2009)
8. Faloutsos, M., Faloutsos, P., Faloutsos, C.: On power-law relationships of the inter-
net topology. ACM SIGCOMM Comput. Commun. Rev. 29, 251–262 (1999)
9. Fountoulakis, N., Friedrich, T., Hermelin, D.: On the average-case complexity of
parameterized clique. arXiv:1410.6400v1 (2014)
10. Fountoulakis, N., Friedrich, T., Hermelin, D.: On the average-case complexity of
parameterized clique. Theor. Comput. Sci. 576, 18–29 (2015)
11. Friedrich, T., Krohmer, A.: Cliques in hyperbolic random graphs. In: INFOCOM
Proceedings 2015, pp. 1544–1552. IEEE (2015)
12. Friedrich, T., Krohmer, A.: Parameterized clique on inhomogeneous random
graphs. Disc. Appl. Math. 184, 130–138 (2015)
13. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide
to the Theory of NP-Completeness. W. H. Freeman & Co. (2011)
14. Grochow, J.A., Kellis, M.: Network motif discovery using subgraph enumeration
and symmetry-breaking. In: RECOMB, pp. 92–106 (2007)
15. Heydari, H., Taheri, S.M.: Distributed maximal independent set on inhomogeneous
random graphs. In: 2017 2nd Conference on Swarm Intelligence and Evolutionary
Computation (CSIEC). IEEE, March 2017
16. van der Hofstad, R.: Random Graphs and Complex Networks, vol. 1. Cambridge
University Press, Cambridge (2017)
17. van der Hofstad, R., van Leeuwaarden, J.S.H., Stegehuis, C.: Optimal subgraph
structures in scale-free networks. arXiv:1709.03466 (2017)
18. Janson, S., Łuczak, T., Norros, I.: Large cliques in a power-law random graph. J.
Appl. Probab. 47(04), 1124–1135 (2010)
19. Jeong, H., Tombor, B., Albert, R., Oltvai, Z.N., Barabási, A.L.: The large-scale
organization of metabolic networks. Nature 407(6804), 651–654 (2000)
20. Karp, R.M.: Reducibility among combinatorial problems. In: Miller, R.E.,
Thatcher, J.W., Bohlinger, J.D. (eds.) Complexity of Computer Computations.
The IBM Research Symposia Series, pp. 85–103. Springer, Boston (1972).
https://doi.org/10.1007/978-1-4684-2001-2_9
21. Kashtan, N., Itzkovitz, S., Milo, R., Alon, U.: Efficient sampling algorithm for
estimating subgraph concentrations and detecting network motifs. Bioinformatics
20(11), 1746–1758 (2004)
22. Leskovec, J., Krevl, A.: SNAP Datasets: Stanford large network dataset collection
(2014). http://snap.stanford.edu/data. Accessed 14 Mar 2017
23. Niu, X., Sun, X., Wang, H., Rong, S., Qi, G., Yu, Y.: Zhishi.me - weaving chinese
linking open data. In: Aroyo, L., Welty, C., Alani, H., Taylor, J., Bernstein, A.,
Kagal, L., Noy, N., Blomqvist, E. (eds.) ISWC 2011. LNCS, vol. 7032, pp. 205–220.
Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-25093-4_14
24. Norros, I., Reittu, H.: On a conditionally poissonian graph process. Adv. Appl.
Probab. 38(01), 59–75 (2006)
25. Omidi, S., Schreiber, F., Masoudi-Nejad, A.: MODA: an efficient algorithm for
network motif discovery in biological networks. Genes Genetic Syst. 84(5), 385–
395 (2009)
26. Park, J., Newman, M.E.J.: Statistical mechanics of networks. Phys. Rev. E 70,
066117 (2004)
27. Schreiber, F., Schwöbbermeyer, H.: MAVisto: a tool for the exploration of network
motifs. Bioinformatics 21(17), 3572–3574 (2005)
28. Vázquez, A., Pastor-Satorras, R., Vespignani, A.: Large-scale topological and
dynamical properties of the internet. Phys. Rev. E 65, 066130 (2002)
29. Williams, V.V., Wang, J.R., Williams, R., Yu, H.: Finding four-node subgraphs in
triangle time. In: Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium
on Discrete Algorithms, SODA 2015, pp. 1671–1680. Society for Industrial and
Applied Mathematics, Philadelphia (2015)
The Asymptotic Normality of the Global
Clustering Coefficient in Sparse Random
Intersection Graphs
1 Introduction
The global clustering coefficient of a finite graph G is the ratio
$C_G = 3N_\Delta / N_\vee$, where $N_\Delta$ is the number of triangles and
$N_\vee$ is the number of paths of length 2. Equivalently, $C_G$ represents the
probability that a randomly selected path of length 2 induces a triangle in G.
The global clustering coefficient is a commonly used network characteristic,
assessing the strength of the statistical association between neighboring
adjacency relations. For example, in a social network the tendency of linking
actors which have a common neighbor is reflected by a non-negligible value of the
global clustering coefficient.
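The definition $C_G = 3N_\Delta / N_\vee$ translates directly into code. The sketch below counts paths of length 2 by their middle vertex and counts each triangle once at each of its three vertices, which yields $3N_\Delta$ without a separate division; the adjacency-dictionary representation, the function name, and the convention of returning 0.0 for graphs without 2-paths are illustrative assumptions.

```python
from itertools import combinations


def global_clustering_coefficient(adjacency):
    """Return C_G = 3 * N_triangles / N_paths2, or 0.0 if there are no 2-paths."""
    triangles_times_3 = 0
    paths2 = 0
    for v, nbrs in adjacency.items():
        d = len(nbrs)
        # Paths of length 2 with v as the middle vertex.
        paths2 += d * (d - 1) // 2
        # Closed 2-paths at v; each triangle is counted once per corner.
        triangles_times_3 += sum(1 for a, b in combinations(nbrs, 2) if b in adjacency[a])
    return triangles_times_3 / paths2 if paths2 else 0.0
```

For a triangle with one pendant vertex attached, there are five paths of length 2 and one triangle, so the function returns 3/5.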
Clustering in a social network can be explained by an auxiliary bipartite
structure: each actor is prescribed a collection of attributes and any two actors
sharing a common attribute have high chances of being adjacent, cf. [8]. The
respective random intersection graph (RIG) on the vertex set
$V = \{v_1, \ldots, v_n\}$ and with the auxiliary attribute set
$W = \{w_1, \ldots, w_m\}$ defines adjacency relations with the help of a random
bipartite graph H linking actors (= vertices) to attributes: two actors are
adjacent in RIG if they have a common neighbour in H. We mention that RIG admits
a non-vanishing tunable global clustering coefficient, power-law degrees and
short typical distances, see e.g., [4].
© Springer International Publishing AG, part of Springer Nature 2018
A. Bonato et al. (Eds.): WAW 2018, LNCS 10836, pp. 16–29, 2018.
https://doi.org/10.1007/978-3-319-92871-5_2
In this note we consider the uniform random intersection graph G(n, m, r),
where every vertex $v_i \in V$ is prescribed a random subset
$S_i = S(v_i) \subset W$ of size r and two vertices $v_i, v_j$ are declared
adjacent (denoted $v_i \sim v_j$) whenever $S_i \cap S_j \ne \emptyset$. We
assume that the sets $S_1, \ldots, S_n$ are independent. (The respective random
bipartite graph H is drawn uniformly at random from the class of bipartite
graphs with the property that each actor $v_i \in V$ has exactly r neighbours in
W.) The uniform random intersection graph has been widely studied in the
literature, mainly as a model of a secure wireless sensor network that uses
random predistribution of keys, see [5,14]. We denote for short
$\mathcal{G} = G(n, m, r)$ and by $G$ we denote the instance (realization) of the
random graph $\mathcal{G}$.
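A realization of the uniform random intersection graph can be generated directly from this description: each vertex independently receives a uniformly random r-subset of the m attributes, and two vertices are adjacent when their subsets share an attribute. The following is a minimal sketch; the function name, the integer labelling of attributes, and the quadratic pairwise loop are illustrative choices.

```python
import random


def sample_uniform_rig(n, m, r, seed=None):
    """Sample G(n, m, r) and return (attribute_sets, adjacency)."""
    rng = random.Random(seed)
    # Each vertex independently receives a uniformly random r-subset of attributes.
    attribute_sets = [frozenset(rng.sample(range(m), r)) for _ in range(n)]
    adjacency = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            # Adjacent exactly when the two attribute sets intersect.
            if attribute_sets[u] & attribute_sets[v]:
                adjacency[u].add(v)
                adjacency[v].add(u)
    return attribute_sets, adjacency
```

When r = m every vertex holds all attributes and the sampled graph is complete, which gives a quick sanity check of an implementation.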
We consider large random intersection graphs, where $r^2 = o(m)$ as
$m, n \to +\infty$. In this case the edge probability is, see (53),
Denote
$$\bar N_\Delta = N_\Delta - E N_\Delta, \quad \bar N_\vee = N_\vee - E N_\vee, \quad \sigma_\Delta^2 = E \bar N_\Delta^2, \quad \sigma_\vee^2 = E \bar N_\vee^2, \quad \sigma_{\Delta\vee} = E \bar N_\Delta \bar N_\vee.$$
We start our analysis with an evaluation of the first and second moments of
the subgraph counts $N_\Delta$ and $N_\vee$.
The random variables $g_{\Delta 1,2}$, $h_{\Delta 1,2,3}$ and $g_{\vee 1,2}$,
$h_{\vee 1,2,3}$ define the Hoeffding decomposition of $\bar N_\Delta$ and
$\bar N_\vee$, see (12). Their second moments entering (6), (7), (8) are
evaluated in (25), (26) and (31), (32) and (39), (40) respectively.
We note that (4) and (5) imply that the “theoretical clustering coefficient”
$$P(\Delta_{i,j,k} \mid \vee_{ijk}) = \frac{p_\Delta}{p_\vee} = \frac{E(3N_\Delta)}{E N_\vee} \approx \frac{1}{r} \quad \text{as } n, m \to +\infty.$$
Therefore, in order to have a non-vanishing global clustering coefficient we
need r to be bounded as $n, m \to +\infty$, cf. [3,13]. But we may still expect
the asymptotic normality of $\sigma_\Delta^{-1} \bar N_\Delta$ and
$\sigma_\vee^{-1} \bar N_\vee$ even for $r \to \infty$ as $n, m \to +\infty$.
Indeed, assuming (2) we obtain from (4) for $r^3 = o(m)$ that
$$E N_\Delta \approx \frac{m}{6 c^3 r^3}\left(1 + \frac{r^3}{m}\right) \to +\infty \quad \text{as } n, m \to +\infty. \qquad (9)$$
Hence, for $r^3 = o(m)$ we can expect the asymptotic normality of
$\sigma_\Delta^{-1} \bar N_\Delta$. For larger r such that $m = O(r^3)$ and
$r^2 = o(m)$, the identity $E N_\Delta = \binom{n}{3} p_\Delta$ combined with (2)
and the bound $p_\Delta = O(r^3 m^{-2} + r^6 m^{-3})$ implies $E N_\Delta = O(1)$.
The latter bound rules out the asymptotic normality of
$\sigma_\Delta^{-1} \bar N_\Delta$. We refer to Lemma 4 and the remark following
it for various bounds on $p_\Delta$.
Our main result, Theorem 2 below, gives sufficient conditions for the
asymptotic normality of $C_G$ as $n, m \to +\infty$. We derive the asymptotic
normality of $C_G$ from a related asymptotic normality result for the bivariate
vector of subgraph counts $(N_\Delta, N_\vee)$.
2 Proofs
∨ijk = I{vi ∼vj } I{vj ∼vk } , Δi,j,k = I{vi ∼vj } I{vj ∼vk } I{vk ∼vi } .
We note that $g(S_i, S_j) := g_{i,j}$ and $h(S_i, S_j, S_k) := h_{i,j,k}$ are symmetric functions
of their arguments $S_i, S_j$ and $S_i, S_j, S_k$, respectively, and they have the orthogonality property (13).
In particular, (13) implies that all distinct summands $f_i$, $g_{j_1,j_2}$, $h_{k_1,k_2,k_3}$ are
uncorrelated whatever the indices $i, j_1, j_2, k_1, k_2, k_3$. A simple consequence of
(13) is the variance formula
$$\mathrm{Var}\,T = ET^2 = n\binom{n-1}{2}^2 Ef_1^2 + \binom{n}{2}(n-2)^2 Eg_{1,2}^2 + \binom{n}{3} Eh_{1,2,3}^2. \qquad (14)$$
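We construct decomposition (12) for $T = \bar N_\Delta$ and $T = \bar N_\vee$ and use subscripts
$\Delta$ and $\vee$ to distinguish the respective terms $\psi_\Delta$, $f_{\Delta j}$, $g_{\Delta i,j}$, $h_{\Delta i,j,k}$ and $\psi_\vee$, $f_{\vee j}$,
$g_{\vee i,j}$, $h_{\vee i,j,k}$.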
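Decomposition of $\bar N_\Delta$. We put $\psi_\Delta(S_i, S_j, S_k) = \Delta_{i,j,k} - p_\Delta$ and apply (12)
to $T = \bar N_\Delta$. We shall show that for any j and $k \neq j$
$$f_{\Delta j} \equiv 0, \qquad (15)$$
$$g_{\Delta j,k} = \sum_{t=1}^{r} \left(\mathbb{I}_{\{s[j,k] = t\}} - \bar p_t\right) p_t. \qquad (16)$$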