Anthony Bonato
Paweł Prałat
Andrei Raigorodskii (Eds.)
LNCS 10836

Algorithms and Models for the Web Graph
15th International Workshop, WAW 2018
Moscow, Russia, May 17–18, 2018
Proceedings

Lecture Notes in Computer Science 10836
Commenced Publication in 1973
Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board
David Hutchison
Lancaster University, Lancaster, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Friedemann Mattern
ETH Zurich, Zurich, Switzerland
John C. Mitchell
Stanford University, Stanford, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
C. Pandu Rangan
Indian Institute of Technology Madras, Chennai, India
Bernhard Steffen
TU Dortmund University, Dortmund, Germany
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
Gerhard Weikum
Max Planck Institute for Informatics, Saarbrücken, Germany
More information about this series at http://www.springer.com/series/7407
Editors

Anthony Bonato
Department of Mathematics
Ryerson University
Toronto, ON
Canada

Andrei Raigorodskii
Department of Discrete Mathematics
Moscow Institute of Physics and Technology
Dolgoprudny
Russia

Paweł Prałat
Department of Mathematics
Ryerson University
Toronto, ON
Canada

ISSN 0302-9743 ISSN 1611-3349 (electronic)


Lecture Notes in Computer Science
ISBN 978-3-319-92870-8 ISBN 978-3-319-92871-5 (eBook)
https://doi.org/10.1007/978-3-319-92871-5

Library of Congress Control Number: 2018944417

LNCS Sublibrary: SL1 – Theoretical Computer Science and General Issues

© Springer International Publishing AG, part of Springer Nature 2018


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the
material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now
known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are
believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors
give a warranty, express or implied, with respect to the material contained herein or for any errors or
omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.

Printed on acid-free paper

This Springer imprint is published by the registered company Springer International Publishing AG
part of Springer Nature
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface

The 15th Workshop on Algorithms and Models for the Web Graph (WAW 2018) took
place at the Moscow Institute of Physics and Technology, Russia, May 17–18, 2018.
This is an annual meeting, which is traditionally co-located with another, related,
conference. WAW 2018 was co-located with the Workshop on Graphs, Networks, and
Their Applications. The co-location of the two workshops provided opportunities for
researchers in two different but interrelated areas to interact and to exchange research
ideas. It was an effective venue for the dissemination of new results and for fostering
research collaboration.
The World Wide Web has become part of our everyday life, and information
retrieval and data mining on the Web are now of enormous practical interest. The
algorithms supporting these activities combine the view of the Web as a text repository
and as a graph, induced in various ways by links among pages, hosts and users. The
aim of the workshop was to further the understanding of graphs that arise from the Web
and various user activities on the Web, and stimulate the development of
high-performance algorithms and applications that exploit these graphs. The workshop
gathered together researchers working on graph-theoretic and algorithmic aspects of
related complex networks, including social networks, citation networks, biological
networks, molecular networks, and other networks arising from the Internet.
This volume contains the papers presented during the workshop. Each submission
was reviewed by Program Committee members. Papers were submitted and reviewed
using the EasyChair online system. The committee members accepted 11 papers.

May 2018
Anthony Bonato
Paweł Prałat
Andrei Raigorodskii
Organization

General Chairs
Andrei Z. Broder           Google Research, USA
Fan Chung Graham           University of California San Diego, USA

Organizing Committee
Anthony Bonato             Ryerson University, Canada
Paweł Prałat               Ryerson University, Canada
Andrei Raigorodskii        MIPT, Russia

Program Committee
Konstantin Avratchenkov    Inria, France
Paolo Boldi                University of Milan, Italy
Anthony Bonato             Ryerson University, Canada
Milan Bradonjic            Bell, USA
Fan Chung Graham           UC San Diego, USA
Collin Cooper              King’s College London, UK
Andrzej Dudek              Western Michigan University, USA
Alan Frieze                Carnegie Mellon University, USA
Aristides Gionis           Aalto University, Finland
David Gleich               Purdue University, USA
Jeannette Janssen          Dalhousie University, Canada
Bogumil Kaminski           Warsaw School of Economics, Poland
Ravi Kumar                 Google Research, USA
Silvio Lattanzi            Google Research, USA
Marc Lelarge               Inria, France
Stefano Leonardi           Sapienza University of Rome, Italy
Nelly Litvak               University of Twente, The Netherlands
Michael Mahoney            UC Berkeley, USA
Oliver Mason               NUI Maynooth, Ireland
Dieter Mitsche             Université de Nice Sophia-Antipolis, France
Peter Morters              University of Bath, UK
Tobias Mueller             Utrecht University, The Netherlands
Liudmila Ostroumova        Yandex, Russia
Pan Peng                   TU Dortmund, Germany
Xavier Perez-Gimenez       University of Nebraska-Lincoln, USA
Pawel Pralat               Ryerson University, Canada
Yana Volkovich             AppNexus, USA
Stephen Young              Pacific Northwest National Laboratory, USA

Sponsoring Institutions

Microsoft Research New England, USA


Google Research, USA
Moscow Institute of Physics and Technology, Russia
Yandex, Russia
Internet Mathematics
Contents

Finding Induced Subgraphs in Scale-Free Inhomogeneous Random Graphs
Ellen Cardinaels, Johan S. H. van Leeuwaarden, and Clara Stegehuis

The Asymptotic Normality of the Global Clustering Coefficient in Sparse Random Intersection Graphs
Mindaugas Bloznelis and Jerzy Jaworski

Clustering Properties of Spatial Preferential Attachment Model
Lenar Iskhakov, Bogumił Kamiński, Maksim Mironov, Paweł Prałat, and Liudmila Prokhorenkova

Parameter Estimators of Sparse Random Intersection Graphs with Thinned Communities
Joona Karjalainen, Johan S. H. van Leeuwaarden, and Lasse Leskelä

Joint Alignment from Pairwise Differences with a Noisy Oracle
Michael Mitzenmacher and Charalampos E. Tsourakakis

Analysis of Relaxation Time in Random Walk with Jumps
Konstantin Avrachenkov and Ilya Bogdanov

QAP Analysis of Company Co-mention Network
S. P. Sidorov, A. R. Faizliev, V. A. Balash, A. A. Gudkov, A. Z. Chekmareva, M. Levshunov, and S. V. Mironov

Towards a Systematic Evaluation of Generative Network Models
Thomas Bläsius, Tobias Friedrich, Maximilian Katzmann, Anton Krohmer, and Jonathan Striebel

Dynamic Competition Networks: Detecting Alliances and Leaders
Anthony Bonato, Nicole Eikmeier, David F. Gleich, and Rehan Malik

An Experimental Study of the k-MXT Algorithm with Applications to Clustering Geo-Tagged Data
Colin Cooper and Ngoc Vu

A Statistical Performance Analysis of Graph Clustering Algorithms
Pierre Miasnikof, Alexander Y. Shestopaloff, Anthony J. Bonner, and Yuri Lawryshyn

Author Index


Finding Induced Subgraphs in Scale-Free
Inhomogeneous Random Graphs

Ellen Cardinaels, Johan S. H. van Leeuwaarden, and Clara Stegehuis

Eindhoven University of Technology, Eindhoven, The Netherlands


[email protected]

Abstract. We study the induced subgraph isomorphism problem on
inhomogeneous random graphs with infinite variance power-law degrees.
We provide a fast algorithm that determines for any connected graph H
on k vertices whether it exists as an induced subgraph in a random graph
with n vertices. By exploiting the scale-free graph structure, the algorithm
runs in O(nk) time for small values of k. We test our algorithm on several
real-world data sets.

1 Introduction
The induced subgraph isomorphism problem asks whether a large graph G
contains a connected graph H as an induced subgraph. When k is allowed to grow
with the graph size n, this problem is NP-hard in general. For example,
k-clique and k induced cycle, special cases of H, are known to be NP-hard [13,20].
For fixed k, this problem can be solved in polynomial time $O(n^k)$ by
searching for H on all possible combinations of k vertices. Several randomized and
non-randomized algorithms exist to improve upon this trivial way of finding
H [14,25,27,29].
On real-world networks, many algorithms were observed to run much faster
than predicted by the worst-case running time of algorithms. This may be
ascribed to some of the properties that many real-world networks share [4],
such as the power-law degree distribution found in many networks [1,8,19,28].
One way of exploiting these power-law degree distributions is to design algo-
rithms that work well on random graphs with power-law degree distributions.
For example, finding the largest clique in a network is NP-complete for general
networks [20]. However, in random graph models such as the Erdős-Rényi ran-
dom graph and the inhomogeneous random graph, their specific structures can be
exploited to design fixed parameter tractable (FPT) algorithms that efficiently
find a clique of size k [10,12] or the largest independent set [15].
In this paper, we study algorithms that are designed to perform well for
the inhomogeneous random graph, a random graph model that can generate
graphs with a power-law degree distribution [2,3,5,6,24,26]. The inhomogeneous
random graph has a densely connected core containing many cliques, consisting
of vertices with degrees $\sqrt{n \log(n)}$ and larger. In this densely connected core,
the probability of an edge being present is close to one, so that it contains
many complete graphs [18]. This observation was exploited in [11] to efficiently
determine whether a clique of size k occurs as a subgraph in an inhomogeneous
random graph. When searching for induced subgraphs, however, some edges are
required not to be present. Therefore, searching for induced subgraphs in the
entire core is not efficient. We show that a connected subgraph H can be found
as an induced subgraph by scanning only vertices that are on the boundary of
the core: vertices with degrees proportional to $\sqrt{n}$.

© Springer International Publishing AG, part of Springer Nature 2018
A. Bonato et al. (Eds.): WAW 2018, LNCS 10836, pp. 1–15, 2018.
https://doi.org/10.1007/978-3-319-92871-5_1

We present an algorithm that first selects the set of vertices with degrees
proportional to $\sqrt{n}$, and then randomly searches for H as an induced subgraph on
a subset of k of those vertices. The first algorithm we present does not depend on
the specific structure of H. For general sparse graphs, the best known algorithms
to solve subgraph isomorphism on 3 or 4 vertices run in $O(n^{1.41})$ or $O(n^{1.51})$ time
with high probability [29]. For small values of k, our algorithm solves subgraph
isomorphism on k nodes in linear time with high probability on inhomogeneous
random graphs. However, the graph size needs to be very large for our algorithm
to perform well. We therefore present a second algorithm that again selects the
vertices with degrees proportional to $\sqrt{n}$, and then searches for induced subgraph
H in a more efficient way. This algorithm has the same performance guarantee
as our first algorithm, but performs much better in simulations.
We test our algorithm on large inhomogeneous random graphs, where it
indeed efficiently finds induced subgraphs. We also test our algorithm on
real-world network data with power-law degrees. There our algorithm does not
perform well, probably due to the fact that the densely connected core of some
real-world networks may not be the vertices of degrees at least proportional
to $\sqrt{n}$. We then show that a slight modification of our algorithm that looks for
induced subgraphs on vertices of degrees proportional to $n^{\gamma}$ for some other value
of $\gamma$ performs better on real-world networks, where the value of $\gamma$ depends on
the specific network.
Notation. We say that a sequence of events $(E_n)_{n \ge 1}$ happens with high
probability (w.h.p.) if $\lim_{n\to\infty} P(E_n) = 1$. Furthermore, we write $f(n) = o(g(n))$ if
$\lim_{n\to\infty} f(n)/g(n) = 0$, and $f(n) = O(g(n))$ if $|f(n)|/g(n)$ is uniformly bounded,
where $(g(n))_{n \ge 1}$ is nonnegative. Similarly, if $\limsup_{n\to\infty} |f(n)|/g(n) > 0$, we
say that $f(n) = \Omega(g(n))$ for nonnegative $(g(n))_{n \ge 1}$. We write $f(n) = \Theta(g(n))$ if
$f(n) = O(g(n))$ as well as $f(n) = \Omega(g(n))$.

1.1 Model
As a random graph null model, we use the inhomogeneous random graph or
hidden variable model [2,3,5,6,24,26]. Every vertex is equipped with a weight.
We assume that the weights are i.i.d. samples from the power-law distribution
\[ P(w_i > k) = C k^{1-\tau} \tag{1.1} \]
for some constant C and for $\tau \in (2, 3)$. Two vertices with weights $w$ and $w'$ are
connected with probability
\[ p(w, w') = \min\left(\frac{w w'}{\mu n}, 1\right), \tag{1.2} \]
where $\mu$ denotes the mean value of the power-law distribution (1.1). Choosing
the connection probability in this way ensures that the expected degree of a
vertex with weight w is w.
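
To make the model concrete, the following is a minimal sketch of how a graph could be sampled from (1.1) and (1.2). It is an illustration under stated assumptions, not the authors' code: the function names, the default C = 1, the use of inverse-transform sampling, and the quadratic loop over vertex pairs (only practical for small n) are all choices made here.

```python
import random


def sample_weights(n, tau, C=1.0, rng=random):
    """Draw n i.i.d. weights with P(w > x) = C * x**(1 - tau).

    Inverse-transform sampling: solve C * w**(1 - tau) = u for w,
    with u uniform on (0, 1]. The smallest possible weight is C**(1/(tau-1)).
    """
    return [(C / (1.0 - rng.random())) ** (1.0 / (tau - 1.0)) for _ in range(n)]


def mean_weight(tau, C=1.0):
    """Mean of the distribution (1.1), assuming support [C**(1/(tau-1)), inf)."""
    x_min = C ** (1.0 / (tau - 1.0))
    return x_min * (tau - 1.0) / (tau - 2.0)


def sample_graph(n, tau, C=1.0, seed=None):
    """Return (weights, adjacency) for one inhomogeneous random graph."""
    rng = random.Random(seed)
    weights = sample_weights(n, tau, C, rng)
    mu = mean_weight(tau, C)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            # Connection probability (1.2).
            p = min(weights[i] * weights[j] / (mu * n), 1.0)
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return weights, adj


if __name__ == "__main__":
    w, g = sample_graph(500, tau=2.5, seed=1)
    print("max weight:", max(w), "max degree:", max(len(nb) for nb in g.values()))
```

The sketch uses the exact mean of (1.1) for $\mu$; replacing it by the empirical mean of the sampled weights would be an equally reasonable choice for finite n.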

1.2 Algorithms

We now describe two randomized algorithms that determine whether a connected
graph H is an induced subgraph in an inhomogeneous random graph and find
the location of such a subgraph if it exists. Algorithm 1 selects the vertices in
the inhomogeneous random graph that are on the boundary of the core of the
graph: vertices with degrees slightly below $\sqrt{\mu n}$. Then, the algorithm randomly
divides these vertices into sets of k vertices. If one of these sets contains H as
an induced subgraph, the algorithm terminates and returns the location of H. If
this is not the case, then the algorithm fails. In the next section, we show that
for k small enough, the probability that the algorithm fails is small. This means
that H is present as an induced subgraph on vertices that are on the boundary
of the core with high probability.
Algorithm 1 is similar to the algorithm in [12] designed to find cliques in
random graphs. The major difference is that the algorithm to find cliques looks
for cliques on all vertices with degrees larger than $\sqrt{f_1 \mu n}$ for some function $f_1$.
This algorithm is not efficient for detecting other subgraphs than cliques, since
vertices with high degrees will be connected with probability close to one.

Algorithm 1. Finding induced subgraph H (random search)

Input : H, G = (V, E), $\mu$, $f_1 = f_1(n)$, $f_2 = f_2(n)$.
Output: Location of H in G or fail.
1 Define $n = |V|$, $I_n = [\sqrt{f_1 \mu n}, \sqrt{f_2 \mu n}]$ and set $V' = \emptyset$.
2 for $i \in V$ do
3     if $D_i \in I_n$ then $V' = V' \cup \{i\}$
4 end
5 Divide the vertices in $V'$ randomly into $\lfloor |V'|/k \rfloor$ sets $S_1, \dots, S_{\lfloor |V'|/k \rfloor}$.
6 for $j = 1, \dots, \lfloor |V'|/k \rfloor$ do
7     if H is an induced subgraph on $S_j$ then return location of H
8 end
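
The steps of Algorithm 1 can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: the graph is assumed to be a dictionary mapping each vertex to its set of neighbours, H is assumed to be given as an edge list on vertices 0, ..., k-1, and the induced-subgraph test in step 7 is done by brute force over all orderings of the k vertices, which is only reasonable for small k. The names `induced_match` and `random_search` are hypothetical.

```python
import itertools
import math
import random


def induced_match(adj, nodes, h_edges, k):
    """Return an ordering of `nodes` realising H as an induced subgraph, or None.

    Vertex a of H is mapped to perm[a]; every pair must be adjacent in G
    exactly when the corresponding pair is an edge of H.
    """
    h = {frozenset(e) for e in h_edges}
    for perm in itertools.permutations(nodes):
        if all((perm[b] in adj[perm[a]]) == (frozenset((a, b)) in h)
               for a in range(k) for b in range(a + 1, k)):
            return perm
    return None


def random_search(adj, h_edges, k, mu, f1, f2, seed=None):
    """Sketch of Algorithm 1: degree selection followed by a random partition."""
    rng = random.Random(seed)
    n = len(adj)
    lo, hi = math.sqrt(f1 * mu * n), math.sqrt(f2 * mu * n)
    # Steps 1-4: keep the vertices whose degree lies in I_n.
    v_prime = [v for v in adj if lo <= len(adj[v]) <= hi]
    # Step 5: a random partition into floor(|V'|/k) sets of size k.
    rng.shuffle(v_prime)
    # Steps 6-8: test each set; leftover vertices (fewer than k) are ignored.
    for start in range(0, len(v_prime) - k + 1, k):
        match = induced_match(adj, v_prime[start:start + k], h_edges, k)
        if match is not None:
            return match
    return None  # the algorithm fails
```

For example, `random_search(adj, [(0, 1), (1, 2), (2, 3), (3, 0)], 4, mu, f1, f2)` would look for an induced 4-cycle, returning a tuple of four vertices in cycle order or `None`.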

The following theorem gives a bound for the performance of Algorithm 1 for
small values of k.
Theorem 1. Choose $f_1 = f_1(n) \ge 1/\log(n)$ and $f_1 < f_2 < 1$ and let
$k < \log^{1/3}(n)$. Then, with high probability, Algorithm 1 detects induced subgraph H
on k vertices in an inhomogeneous random graph with n vertices and weights
distributed as in (1.1) in time O(nk).
Thus, for small values of k, Algorithm 1 finds an instance of H in linear time.

A problem with parameter k is called fixed parameter tractable (FPT) if it
can be solved in $f(k) n^{O(1)}$ time for some function f(k), and it is called typical
FPT (typFPT) if it can be solved in $f(k) n^{g(n)}$ time for some function $g(n) = O(1)$
with high probability [9]. As a corollary of Theorem 1 we obtain that the
induced subgraph problem on the inhomogeneous random graph is in typFPT
for any subgraph H, similarly to the k-clique problem on inhomogeneous random
graphs [12].
Corollary 1. The induced subgraph problem on the inhomogeneous random
graph is in typFPT.
In theory, Algorithm 1 detects any motif on k vertices in linear time for small
k. However, this only holds for large values of n, which can be understood as
follows. In Lemma 2, we show that $|V'| = \Theta(n^{(3-\tau)/2})$, thus tending to infinity
as n grows large. However, when $n = 10^7$ and $\tau = 2.5$, this means that the size
of the set $V'$ is only proportional to $10^{1.75} \approx 56$ vertices. Therefore, the number
of sets $S_j$ constructed in Algorithm 1 is also small. Even though the probability
of finding motif H in any such set is proportional to a constant, this constant
may be small, so that for finite n the algorithm almost always fails. Thus, for
Algorithm 1 to work, n needs to be large enough so that $n^{(3-\tau)/2}$ is large as well.
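
The finite-size effect is easy to see numerically. The short sketch below evaluates the scale $n^{(3-\tau)/2}$ for a few graph sizes; the helper name `core_boundary_scale` is hypothetical, and the unknown proportionality constant hidden in the $\Theta(\cdot)$ is simply taken to be 1, which is an assumption.

```python
def core_boundary_scale(n, tau):
    """Order of magnitude of |V'|, i.e. n**((3 - tau) / 2), with constant 1 assumed."""
    return n ** ((3.0 - tau) / 2.0)


if __name__ == "__main__":
    k = 5  # illustrative motif size
    for n in (10**5, 10**7, 10**9, 10**12):
        size = core_boundary_scale(n, tau=2.5)
        # Number of disjoint k-sets Algorithm 1 would get to test at this scale.
        print(f"n = {n:>13}: |V'| ~ {size:8.0f}, sets of size {k}: {int(size) // k}")
```

For $n = 10^7$ and $\tau = 2.5$ this reproduces the value of about 56 vertices quoted in the text, which leaves only a handful of sets to test.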
The algorithm can be significantly improved by changing the search for H
on vertices in set $V'$. In Algorithm 2 we propose a search for motif H similar
to the Kashtan motif sampling algorithm [21]. Rather than sampling k vertices
randomly, it samples one vertex randomly, and then randomly increases the set
S by adding vertices in its neighborhood. This already guarantees the vertices
in list $S_j$ to be connected, making it more likely for them to form a specific
connected motif together. In particular, we expand the list $S_j$ in such a way that
the vertices in $S_j$ are guaranteed to form a spanning tree of H as a subgraph.
This is ensured by choosing the list $T^H$ that specifies at which vertex in $S_j$ we
expand $S_j$ by adding a new vertex. For example, if k = 4 and we set $T^H = [1, 2, 3]$
we first add a random neighbor of the first vertex, then we add a random neighbor
of the previously added (second) vertex, and then we add a random neighbor of the
third added vertex. Thus, setting $T^H = [1, 2, 3]$ ensures that the set $S_j$ contains a path
of length three, whereas setting $T^H = [1, 1, 1]$ ensures that the set $S_j$ contains a
star-shaped subgraph. Depending on which subgraph H we are looking for, we
can define $T^H$ in such a way that we ensure that the set $S_j$ at least contains a
spanning tree of motif H in Step 6 of the algorithm.
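
The role of the expansion list can be illustrated with a few lines of Python. The sketch below assumes, as in the example above, that the m-th entry of the list names the (1-indexed) position in $S_j$ at which the (m+1)-th vertex is attached; the helper names `tree_edges` and `is_valid_expansion` are hypothetical.

```python
def tree_edges(t_h):
    """Edges of the tree forced by an expansion list, as (parent, child) positions.

    Positions are 1-indexed positions in S_j; the child at position m + 1
    is attached to the position stored in the m-th entry of the list.
    """
    return [(parent, child) for child, parent in enumerate(t_h, start=2)]


def is_valid_expansion(t_h):
    """Each new vertex must attach to a position that has already been filled."""
    return all(1 <= parent < child for parent, child in tree_edges(t_h))


if __name__ == "__main__":
    print(tree_edges([1, 2, 3]))  # path on four vertices
    print(tree_edges([1, 1, 1]))  # star with centre at position 1
```

Any list that passes `is_valid_expansion` yields a tree on k positions, so choosing it to match a spanning tree of H guarantees that at least those k - 1 edges of H are present in $S_j$.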
The selection on the degrees ensures that the degrees are sufficiently high so
that probability of finding such a connected set on k vertices is high, as well as
that the degrees are sufficiently low to ensure that we do not only find complete
graphs because of the densely connected core of the inhomogeneous random
graph. The probability that Algorithm 2 indeed finds the desired motif H in
any check is of constant order of magnitude, similar to Algorithm 1. Therefore,
the performance guarantee of both algorithms is similar. However, in practice
Algorithm 2 performs much better, since for finite n, k connected vertices are
more likely to form a motif than k randomly chosen vertices.

Algorithm 2. Finding induced subgraph H (neighborhood search)

Input : H, G = (V, E), $\mu$, $f_1 = f_1(n)$, $f_2 = f_2(n)$, s.
Output: Location of H in G or fail.
1 Define $n = |V|$, $I_n = [\sqrt{f_1 \mu n}, \sqrt{f_2 \mu n}]$ and set $V' = \emptyset$.
2 for $i \in V$ do
3     if $D_i \in I_n$ then $V' = V' \cup \{i\}$
4 end
5 Let $G'$ be the induced subgraph of G on vertices $V'$.
6 Set $T^H$ consistently with motif H.
7 for $j = 1, \dots, s$ do
8     Pick a random vertex $v \in V'$ and set $S_j = [v]$.
9     while $|S_j| \ne k$ do
10        Pick a random $v' \in N_{G'}(S_j[T^H[|S_j|]])$ with $v' \notin S_j$
11        Add $v'$ to $S_j$.
12    end
13    if H is an induced subgraph on $S_j$ then return location of H
14 end
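
A minimal Python sketch of the neighborhood search follows. It is an illustration under assumptions rather than the authors' implementation: the graph is a dictionary of neighbour sets with sortable vertex labels, H is an edge list on vertices 0, ..., k-1 whose vertex order matches the order in which `t_h` builds $S_j$, the expansion list `t_h` is 1-indexed as in the text, an attempt is abandoned if the chosen vertex has no unused neighbour in $G'$, and step 13 is checked by brute force. The names `induced_match` and `neighborhood_search` are hypothetical.

```python
import itertools
import math
import random


def induced_match(adj, nodes, h_edges, k):
    """Return an ordering of `nodes` realising H as an induced subgraph, or None."""
    h = {frozenset(e) for e in h_edges}
    for perm in itertools.permutations(nodes):
        if all((perm[b] in adj[perm[a]]) == (frozenset((a, b)) in h)
               for a in range(k) for b in range(a + 1, k)):
            return perm
    return None


def neighborhood_search(adj, h_edges, t_h, k, mu, f1, f2, s, seed=None):
    """Sketch of Algorithm 2: degree selection, then s tree-guided attempts."""
    rng = random.Random(seed)
    n = len(adj)
    lo, hi = math.sqrt(f1 * mu * n), math.sqrt(f2 * mu * n)
    # Steps 1-4: degree selection.
    v_prime = {v for v in adj if lo <= len(adj[v]) <= hi}
    if not v_prime:
        return None
    # Step 5: induced subgraph G' on V'.
    sub = {v: adj[v] & v_prime for v in v_prime}
    pool = sorted(v_prime)
    for _ in range(s):
        s_j = [rng.choice(pool)]  # step 8
        while len(s_j) < k:       # steps 9-12
            # Entry len(s_j) of T^H (1-indexed) names the position to expand from.
            anchor = s_j[t_h[len(s_j) - 1] - 1]
            candidates = sorted(sub[anchor] - set(s_j))
            if not candidates:
                break  # dead end: give up on this attempt (an assumption)
            s_j.append(rng.choice(candidates))
        if len(s_j) == k:         # step 13
            match = induced_match(adj, s_j, h_edges, k)
            if match is not None:
                return match
    return None  # the algorithm fails
```

Because every attempt starts from a connected set, far fewer attempts are wasted on vertex sets that cannot contain a connected motif, which is the practical advantage over the random partition of Algorithm 1.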

The following theorem shows that indeed Algorithm 2 has similar
performance guarantees as Algorithm 1.
Theorem 2. Choose $f_1 = f_1(n) \ge 1/\log(n)$ and $f_1 < f_2 < 1$. Choose
$s = \Omega(n^{\alpha})$ for some $0 < \alpha < 1$, such that $s \le n/k$. Then, Algorithm 2 detects
induced subgraph H on $k < \log^{1/3}(n)$ vertices on an inhomogeneous random
graph with n vertices and weights distributed as in (1.1) in time O(nk) with high
probability.
The proofs of Theorems 1 and 2 rely on the fact that for small k, any
subgraph on k vertices is present in $G'$ with high probability. This means that after
the degree selection step of Algorithms 1 and 2, for small k, any motif finding
algorithm can be used to find motif H on the remaining graph $G'$, such as the
Grochow-Kellis algorithm [14], the MAvisto algorithm [27] or the MODA
algorithm [25]. In the proofs of Theorems 1 and 2, we show that $G'$ has $\Theta(n^{(3-\tau)/2})$
vertices with high probability. Thus, the degree selection step reduces the
problem of finding a motif H on n vertices to finding a motif on a graph with
$\Theta(n^{(3-\tau)/2})$ vertices, significantly reducing the running time of the algorithms.

2 Proof of Theorems 1 and 2

We prove Theorem 1 using two lemmas. The first lemma relates the degrees of
the vertices to their weights. The connection probabilities in the inhomogeneous
random graph depend on the weights of the vertices. In Algorithm 1, we select
vertices based on their degrees instead of their unknown weights. The following
lemma shows that the weights of the vertices in $V'$ are close to their degrees.

Lemma 1 (Degrees and weights). Fix $\varepsilon > 0$, and define
$J_n = [(1-\varepsilon)\sqrt{f_1 \mu n}, (1+\varepsilon)\sqrt{f_2 \mu n}]$. Then, for some $K > 0$,
\[ P(\exists i \in V' : w_i \notin J_n) \le K n \exp\left(-\frac{\varepsilon^2 (1-\varepsilon)}{2(1+\varepsilon)} \sqrt{f_1 \mu n}\right). \tag{2.1} \]
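Proof. Fix a vertex $i \in V$. Conditionally on the weight $w_i$ of vertex i,
$D_i \sim \mathrm{Poi}(w_i)$ [5,16]. Then,
\[
\begin{aligned}
P\left(w_i < (1-\varepsilon)\sqrt{f_1 \mu n},\ D_i \in I_n\right)
&= \frac{P\left(D_i \in I_n \mid w_i < (1-\varepsilon)\sqrt{f_1 \mu n}\right)}{P\left(w_i < (1-\varepsilon)\sqrt{f_1 \mu n}\right)} \\
&\le \frac{P\left(D_i > \sqrt{f_1 \mu n} \mid w_i = (1-\varepsilon)\sqrt{f_1 \mu n}\right)}{1 - C\left((1-\varepsilon)\sqrt{f_1 \mu n}\right)^{1-\tau}} \\
&\le K_1 P\left(D_i > \sqrt{f_1 \mu n} \mid w_i = (1-\varepsilon)\sqrt{f_1 \mu n}\right),
\end{aligned}
\tag{2.2}
\]
for some $K_1 > 0$. Here the first inequality follows because for Poisson random
variables $P(\mathrm{Poi}(\lambda_1) > k) \le P(\mathrm{Poi}(\lambda_2) > k)$ for $\lambda_1 < \lambda_2$. We use that by the
Chernoff bound for Poisson random variables
\[ P(X > \lambda(1+\delta)) \le \exp\left(-h(\delta)\delta^2 \lambda/2\right), \tag{2.3} \]
where $h(\delta) = 2((1+\delta)\ln(1+\delta) - \delta)/\delta^2$. Therefore, using that $h(\delta) \ge 1/(1+\delta)$
for $\delta \ge 0$ results in
\[ P\left(D_i > \sqrt{f_1 \mu n} \mid w_i = (1-\varepsilon)\sqrt{f_1 \mu n}\right) \le \exp\left(-\frac{\varepsilon^2 (1-\varepsilon)}{2(1+\varepsilon)} \sqrt{f_1 \mu n}\right). \tag{2.4} \]
Combining this with (2.2) and taking the union bound over all vertices then
results in
\[ P\left(\exists i : D_i \in I_n,\ w_i < (1-\varepsilon)\sqrt{f_1 \mu n}\right) \le K_1 n \exp\left(-\frac{\varepsilon^2 (1-\varepsilon)}{2(1+\varepsilon)} \sqrt{f_1 \mu n}\right). \tag{2.5} \]
The bound for $w_i > (1+\varepsilon)\sqrt{f_2 \mu n}$ follows similarly. Combining this with the
fact that $f_1 < f_2$ then proves the lemma. □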
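
The Poisson tail bound (2.3) and the resulting exponent in (2.4) can be checked numerically. The sketch below is only a sanity check under assumptions: the helper names are hypothetical, the Poisson tail is approximated by a truncated sum (2000 terms past the threshold, which is ample for the moderate means used here), and `x` stands for $\sqrt{f_1 \mu n}$.

```python
import math


def poisson_upper_tail(lam, threshold, terms=2000):
    """Approximate P(X > threshold) for X ~ Poi(lam) by a truncated sum in log space."""
    k0 = math.floor(threshold) + 1
    return sum(math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
               for k in range(k0, k0 + terms))


def chernoff_bound(lam, delta):
    """Right-hand side of (2.3)."""
    h = 2.0 * ((1.0 + delta) * math.log(1.0 + delta) - delta) / delta ** 2
    return math.exp(-h * delta ** 2 * lam / 2.0)


def lemma_bound(x, eps):
    """Right-hand side of (2.4), with x playing the role of sqrt(f1 * mu * n)."""
    return math.exp(-eps ** 2 * (1.0 - eps) / (2.0 * (1.0 + eps)) * x)


if __name__ == "__main__":
    eps = 0.2
    for x in (50.0, 200.0, 800.0):
        lam = (1.0 - eps) * x      # weight at the lower edge of J_n
        delta = eps / (1.0 - eps)  # so that lam * (1 + delta) = x
        print(x, poisson_upper_tail(lam, x), chernoff_bound(lam, delta), lemma_bound(x, eps))
```

In each row the exact tail should lie below the Chernoff bound, which in turn lies below the simpler expression used in the lemma.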
The second lemma shows that after deleting all vertices with degrees outside
of $I_n$ defined in Step 1 of Algorithm 1, still polynomially many vertices remain
with high probability.
Lemma 2 (Polynomially many nodes remain). There exists $\gamma > 0$ such that
\[ P\left(|V'| < \gamma n^{(3-\tau)/2}\right) \le 2\exp\left(-\Theta(n^{(3-\tau)/2})\right). \tag{2.6} \]

Proof. Let $\mathcal{E}$ denote the event that all vertices $i \in V'$ satisfy $w_i \in J_n$ for some
$\varepsilon > 0$, with $J_n$ as in Lemma 1. Let $W'$ be the set of vertices with weights in $J_n$.
Under the event $\mathcal{E}$, $|V'| \le |W'|$. Then, by Lemma 1
\[ P\left(|V'| < \gamma n^{(3-\tau)/2}\right) \le P\left(|W'| < \gamma n^{(3-\tau)/2}\right) + K n \exp\left(-\frac{\varepsilon^2 (1-\varepsilon)}{2(1+\varepsilon)} \sqrt{f_1 \mu n}\right). \tag{2.7} \]
Furthermore,
\[ P(w_i \in J_n) = C\left((1-\varepsilon)\sqrt{f_1 \mu n}\right)^{1-\tau} - C\left((1+\varepsilon)\sqrt{f_2 \mu n}\right)^{1-\tau} \ge c_1 (\sqrt{\mu n})^{1-\tau} \tag{2.8} \]
for some constant $c_1 > 0$ because $f_1 < f_2$. Thus, each of the n vertices is in
set $W'$ independently with probability at least $c_1 (\sqrt{\mu n})^{1-\tau}$. Choose $0 < \gamma < c_1$.
Applying the multiplicative Chernoff bound then shows that
\[ P\left(|W'| < \gamma n^{(3-\tau)/2}\right) \le \exp\left(-\frac{(c_1 - \gamma)^2}{2 c_1} n^{(3-\tau)/2}\right), \tag{2.9} \]
which proves the lemma together with (2.7) and the fact that
$\sqrt{f_1 \mu n} = \Omega(n^{(3-\tau)/2})$ for $\tau \in (2, 3)$. □

We now use these lemmas to prove Theorem 1.

Proof of Theorem 1. We condition on the event that $V'$ is of polynomial size
(Lemma 2) and that the weights are within the constructed lower and upper
bounds (Lemma 1), since both events occur with high probability. This bounds
the edge probability between any pair of nodes i and j in $V'$ as
\[ p_{ij} < \min\left(\frac{(1+\varepsilon)\sqrt{f_2 \mu n}\,(1+\varepsilon)\sqrt{f_2 \mu n}}{\mu n}, 1\right) = f_2 (1+\varepsilon)^2, \tag{2.10} \]
so that $p_{ij} \le p_+ = c_1 < 1$ if we choose $\varepsilon$ small enough. Similarly,
\[ p_{ij} > \min\left(\frac{(1-\varepsilon)^2 \left(\sqrt{f_1 \mu n}\right)^2}{\mu n}, 1\right) = \Theta\left(\frac{1}{\log(n)}\right), \tag{2.11} \]
by our choice of $f_1$, so that $p_{ij} \ge p_- = c_2/\log(n)$. Let $E := |E_H|$ be the number
of edges in H. We upper bound the probability of not finding H in one of the
partitions of size k of $V'$ as $1 - p_-^{E}(1-p_+)^{\binom{k}{2}-E}$. Since all partitions are disjoint
we can upper bound the probability of not finding H in any of the partitions as
\[ P(H \text{ not in the partitions}) \le \left(1 - p_-^{E}(1-p_+)^{\binom{k}{2}-E}\right)^{\lfloor |V'|/k \rfloor}. \tag{2.12} \]
Using that $E \le k^2$, $\binom{k}{2} - E \le k^2$ and that $1 - x \le e^{-x}$ results in
\[ P(H \text{ not in the partitions}) \le \exp\left(-p_-^{k^2}(1-p_+)^{k^2} \left\lfloor \frac{|V'|}{k} \right\rfloor\right). \tag{2.13} \]
Since $|V'| = \Theta(n^{(3-\tau)/2})$, $\lfloor |V'|/k \rfloor \ge d n^{(3-\tau)/2}/k$ for some constant $d > 0$. We fill in
the expressions for $p_-$ and $p_+$, with $c_3 > 0$ a constant
\[ P(H \text{ not in the partitions}) \le \exp\left(-\frac{d n^{(3-\tau)/2}}{k} \left(\frac{c_3}{\log n}\right)^{k^2}\right). \tag{2.14} \]
Now apply that $k \le \log^{1/3}(n)$. Then
\[
\begin{aligned}
P(H \text{ not in the partitions}) &\le \exp\left(-\frac{d n^{(3-\tau)/2}}{\log^{1/3} n} \left(\frac{c_3}{\log n}\right)^{\log^{2/3} n}\right) \\
&\le \exp\left(-d n^{(3-\tau)/2 - o(1)}\right).
\end{aligned}
\tag{2.15}
\]
Hence, the inner expression grows polynomially such that the probability of not
finding H in one of the partitions is negligibly small. The running time of the
partial search is given by
\[ \left\lfloor \frac{|V'|}{k} \right\rfloor \binom{k}{2} \le \frac{n}{k} \binom{k}{2} \le nk \le n e^{k^4}, \tag{2.16} \]
which concludes the proof for $k \le \log^{1/3}(n)$. □
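
The last step of (2.15) compresses a short computation. Spelled out as a sketch, with natural logarithms assumed and n large enough that $\log n > c_3$:

```latex
\left(\frac{c_3}{\log n}\right)^{\log^{2/3} n}
  = \exp\left(-\log^{2/3}(n)\,\log\frac{\log n}{c_3}\right)
  = n^{-\log(\log(n)/c_3)/\log^{1/3} n}
  = n^{-o(1)},
\qquad
\frac{1}{\log^{1/3} n} = n^{-\log\log(n)/(3\log n)} = n^{-o(1)}.
```

Both exponents tend to zero because $\log\log n$ grows more slowly than any positive power of $\log n$, so the product of the two factors with $d n^{(3-\tau)/2}$ is $d n^{(3-\tau)/2 - o(1)}$, as used in (2.15).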


Proof of Corollary 1. If $k > \log^{1/3}(n)$, we can determine whether H is an induced
subgraph by exhaustive search in time
\[ \binom{n}{k}\binom{k}{2} \le \frac{n^k}{k} \cdot \frac{k(k-1)}{2} \le k n^k \le k e^{k^4} \le n e^{k^4}, \tag{2.17} \]
since for all sets of k vertices the presence or absence of $\binom{k}{2}$ edges needs to be
checked. For $k \le \log^{1/3}(n)$, Theorem 1 shows that the induced subgraph
isomorphism problem can be solved in time $nk \le n e^{k^4}$. Thus, with high probability
the induced subgraph isomorphism problem can be solved in $n e^{k^4}$ time, which
proves that it is in typFPT. □

Proof of Theorem 2. The proof of Theorem 2 is very similar to the proof of


Theorem 1. The only way Algorithm 2 differs from Algorithm 1 is in the selection
of the sets Sj . As in the previous theorem, we condition on the event that
|V  | = Θ(n(3−τ )/2 ) (Lemma 2) and that the weights of the vertices in G are
bounded as in Lemma 1.
The graph G constructed in Step 5 of Algorithm 2 then consists of
Θ(n(3−τ )/2 ) vertices. Furthermore, by the bound (2.11) on the connection prob-
abilities of all vertices in G , the expected degree of a vertex i in G satisfies
E [Di,G ] = Ω(n(3−τ )/2 / log(n)). We can use similar arguments as in Lemma 1 to
show that Di,G = Ω(n(3−τ )/2 / log(n)) with high probability for all vertices in
G . Since G consists of Θ(n(3−τ )/2 ) vertices, Di,G = O(n(3−τ )/2 ) as well. This
1
means that for k < log 3 (n), Steps 8–11 are able to find a connected subgraph
on k vertices with high probability.
We now compute the probability that Sj is disjoint with the previous j − 1
constructed sets. The probability that the first vertex does not overlap with the
previous sets is given by 1 − jk/ |V  |, since that vertex is chosen uniformly at
random. The second vertex is chosen in a size-biased manner, since it is chosen
Finding Induced Subgraphs in Scale-Free Inhomogeneous Random Graphs 9

by following a random edge. The probability that vertex i is added can therefore
be bounded as
Di,G M log(n)
P (vertex i is added) = ≤ (2.18)
|V  | |V  |
s=1 Ds,G


for some constant M > 0 by the conditions on the degrees. Therefore, the prob-
ability that Sj does not overlap with one of the previously chose jk vertices can
be bounded from below by
  
kj M kj log(n) k−1
P (Sj does not overlap with previous sets) ≥ 1− 1− . (2.19)
|V  | |V  |

Thus, the probability that all j sets do not overlap can be bounded as
 jk
M kj log(n)
P (Sj ∩ Sj−1 · · · ∩ S1 = ∅) ≥ 1− , (2.20)
|V  |

which tends to one when jk = o(n(3−τ )/4 ). Let sdis denote the number of disjoint
sets out of the s sets constructed in Algorithm 2. Then, when s = Ω(nα ) for some
α > 0, sdis > nβ for some β > 0 with high probability, because k < log1/3 (n).
The probability that H is present as an induced subgraph is bounded
similarly as in Theorem 1. We already know that $k - 1$ edges are present. For all
other $E - (k - 1)$ edges of H, and all $\binom{k}{2} - E$ edges that are not present in H,
we can again use (2.10) and (2.11) to bound the probability of edges being
present or not being present between vertices in $V'$. Therefore, we can bound
the probability that H is not found similarly to (2.13) as

$$\mathbb{P}(H \text{ not in the partitions}) \le \mathbb{P}(H \text{ not in the disjoint partitions}) \le \exp\bigl(-p_-^{k^2}(1 - p_+)^{k^2} s_{\mathrm{dis}}\bigr).$$

Because $s_{\mathrm{dis}} > n^{\beta}$ for some $\beta > 0$, this term tends to zero exponentially. The
running time of the partial search can be bounded similarly to (2.16) as

$$s\binom{k}{2} \le s k^2 = O(nk), \tag{2.21}$$

where we used that $s \le n/k$. □
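
The factor $\binom{k}{2}$ in (2.21) is the cost of testing one candidate set: every unordered pair of its k vertices is looked up once. A minimal Python sketch of such a test follows. The names `is_induced_copy` and `cycle_pattern`, the adjacency-set representation, and the restriction to one fixed vertex order are assumptions made for illustration; they are not taken from Algorithm 2.

```python
from itertools import combinations


def is_induced_copy(adj, vertices, pattern_edges):
    """Check whether `vertices`, in the given order, induce the pattern H.

    adj           : dict mapping each vertex to the set of its neighbours
    vertices      : sequence of k distinct vertices; position a plays the
                    role of pattern vertex a (a fixed labelling, by assumption)
    pattern_edges : set of frozensets {a, b} describing the edges of H
    """
    k = len(vertices)
    # Exactly one adjacency look-up for each of the C(k, 2) unordered pairs.
    for a, b in combinations(range(k), 2):
        present = vertices[b] in adj[vertices[a]]
        wanted = frozenset((a, b)) in pattern_edges
        if present != wanted:
            # Either an edge of H is missing or a non-edge of H is present.
            return False
    return True


def cycle_pattern(k):
    """Edge set of the cycle 0 - 1 - ... - (k-1) - 0, for k >= 3."""
    return {frozenset((a, (a + 1) % k)) for a in range(k)}
```

Because both edges and non-edges are checked, a cycle with an extra chord is rejected, which is what distinguishes an induced copy from an ordinary subgraph copy.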

3 Experimental Results
Fig. 1 shows the fraction of times Algorithm 1 succeeds in finding a cycle of size
k in an inhomogeneous random graph on $10^7$ vertices. Even though for large n
Algorithm 1 should find an instance of a cycle of size k in step 7 of the algorithm
with high probability, we see that Algorithm 1 never succeeds in finding one. This
is because of the finite size effects discussed before.

Fig. 1. The fraction of times step 7 in Algorithm 1 succeeds in finding a cycle of length k
on an inhomogeneous random graph with $n = 10^7$, averaged over 500 network samples
with f1 = 1/ log(n) and f2 = 0.9.

Figure 2a also plots the fraction of times Algorithm 2 succeeds in finding a cycle.
We set the parameter s = 10000, so that the algorithm fails if it has not
detected motif H after executing step 13 of Algorithm 2 10000
times. Because s gives the number of attempts to find H, increasing s may
increase the success probability of Algorithm 2 at the cost of a higher running
time. However, in Fig. 2b we see that for small values of k, the mean number of
times step 13 is executed when the algorithm succeeds is much lower than 10000,
so that increasing s in this experiment probably has only a small effect on the
success probability. We see that Algorithm 2 outperforms Algorithm 1. Figure 2b
also shows that the number of attempts needed to detect a cycle of length k is
small for k ≤ 6. For larger values of k the number of attempts increases. This
can again be ascribed to the finite size effects that cause the set $V'$ to be small,
so that large motifs may not be present on vertices in $V'$. We also plot the
success probability when using different values of the functions f1 and f2. When
only the lower bound f1 on the vertex degrees is used, as in [11], the success
probability of the algorithm decreases. This is because the set $V'$ now contains
many high degree vertices that are much more likely to form clique motifs than
cycles or other connected motifs on k vertices. This makes f2 = ∞ a very efficient
bound for detecting clique motifs [11]. For the cycle motif however, we see in
Fig. 2b that more checks are needed before a cycle is detected, and in some cases
the cycle is not detected at all.
Setting f1 = 0 and f2 = ∞ is also less efficient, as Fig. 2a shows. In this
situation, the number of attempts needed to find a cycle of length k is larger
than for Algorithm 2 for k ≤ 6.

Fig. 2. Results of Algorithm 2 on an inhomogeneous random graph with $n = 10^7$ for
detecting cycles of length k. The parameters are chosen as s = 10000, f1 = 1/ log(n),
f2 = 0.9. The values are averaged over 500 generated networks.

3.1 Real Network Data

We now check Algorithm 2 on four real-world networks with power-law degrees:
a Wikipedia communication network [22], the Gowalla social network [22], the
Baidu online encyclopedia [23] and the Internet on the autonomous systems
level [22]. Table 1 presents several statistics of these scale-free data sets. Fig. 3
shows the fraction of runs where Algorithm 2 finds a cycle as an induced
subgraph. We see that for the Wikipedia social network in Fig. 3a, Algorithm 2 is
more efficient than looking for cycles among all vertices in the network. For the
Baidu online encyclopedia in Fig. 3c however, we see that Algorithm 2 performs
much worse than looking for cycles among all possible vertices. In the other two
network data sets in Figs. 3b and d the performance on the reduced vertex set
and the original vertex set is almost the same. Figure 4 shows that in general,
Algorithm 2 indeed seems to finish in fewer steps than when using the full vertex
set. However, as Fig. 4c shows, for larger values of k the algorithm almost
always fails.

Table 1. Statistics of the data sets: the number of vertices n, the number of edges E,
and the power-law exponent τ fitted by the method of [7].

Data set      n            E             τ
Wikipedia     2,394,385    5,021,410     2.46
Gowalla       196,591      950,327       2.65
Baidu         2,141,300    17,794,839    2.29
AS-Skitter    1,696,415    11,095,298    2.35

Fig. 3. The fraction of times Algorithm 2 succeeds in finding a cycle of length k on four
large network data sets. The parameters are chosen as s = 10000,
f1 = 1/ log(n), f2 = 0.9. The black line uses Algorithm 2 on vertices of degrees in
$I_n = [(\mu n)^{\gamma}/\log(n), (\mu n)^{\gamma}]$. The values are averaged over 500 runs of Algorithm 2.

These results show that while Algorithm 2 is efficient on inhomogeneous
random graphs, it may not always be efficient on real-world data sets. This is not
surprising, because there is no reason why the vertices of degrees proportional to
$\sqrt{n}$ should behave like an Erdős-Rényi random graph, as they do in the inhomogeneous
random graph. We therefore investigate whether selecting vertices with degrees
in $I_n = [(\mu n)^{\gamma}/\log(n), (\mu n)^{\gamma}]$ for some other value of γ in Algorithm 2 leads
to a better performance. Figures 3 and 4 show for every data set one particular
value of γ that works well. For the Gowalla, Wikipedia and Autonomous systems
networks, this leads to a faster algorithm to detect cycles. Only for the Baidu
network do other values of γ not improve upon randomly selecting from all vertices.
This indicates that for most networks, cycles do appear mostly on degrees with
specific orders of magnitude, making it possible to sample these cycles faster.
Unfortunately, these orders of magnitude may be different for different networks.
Across all four networks, the best value of γ seems to be smaller than the value
of 0.5 that is optimal for the inhomogeneous random graph.
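
The degree-selection step with a tunable exponent γ can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `degree_window` is hypothetical, the graph is represented by a plain vertex-to-degree dictionary, and μ is taken to be the empirical mean degree, which the text above does not specify.

```python
import math


def degree_window(degrees, gamma=0.5):
    """Return the vertices whose degree lies in
    I_n = [(mu * n)^gamma / log(n), (mu * n)^gamma].

    degrees : dict mapping vertex -> degree, with n = len(degrees) >= 2
    gamma   : exponent of the window; 0.5 gives the sqrt(mu * n) scale
    """
    n = len(degrees)
    # Assumption of this sketch: mu is estimated by the mean degree.
    mu = sum(degrees.values()) / n
    upper = (mu * n) ** gamma
    lower = upper / math.log(n)
    return [v for v, d in degrees.items() if lower <= d <= upper]
```

Running the search for several values of `gamma` and comparing the number of attempts needed is one way to reproduce the kind of comparison shown by the black lines in Figs. 3 and 4.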

Fig. 4. The number of times step 12 of Algorithm 2 is invoked when the algorithm does
not fail on four large network data sets for detecting cycles of length k. The parameters
are chosen as s = 10000, f1 = 1/ log(n), f2 = 0.9. The black line uses Algorithm 2
on vertices of degrees in $I_n = [(\mu n)^{\gamma}/\log(n), (\mu n)^{\gamma}]$. The values are averaged over 500
runs of Algorithm 2.

4 Conclusion
We presented an algorithm which solves the induced subgraph problem on
inhomogeneous random graphs with infinite variance power-law degrees in time
$O(n e^{k^4})$ with high probability as n grows large. This algorithm is based on the
observation that for fixed k, any subgraph is present on k vertices with degrees
slightly smaller than $\sqrt{\mu n}$ with positive probability. Therefore, the algorithm
first selects vertices with those degrees, and then uses a random search method
to look for the induced subgraph on those vertices.
We show that this algorithm performs well on simulations of inhomogeneous
random graphs. Its performance on real-world data sets varies for different data
sets. This indicates that the degrees that contain the most induced subgraphs
of size k in real-world networks may not be close to $\sqrt{n}$. We then show that on
these data sets, it may be more efficient to find induced subgraphs on degrees
proportional to $n^{\gamma}$ for some other value of γ. The value of γ may be different for
different networks.
Our algorithm exploits that induced subgraphs are likely formed among
$\sqrt{\mu n}$-degree vertices. However, certain subgraphs may occur more frequently on
vertices of other degrees [17]. For example, star-shaped subgraphs on k vertices
appear more often on one vertex with degree much higher than $\sqrt{\mu n}$,
corresponding to the middle vertex of the star, and k − 1 lower-degree vertices
corresponding to the leaves of the star [17]. An interesting open question is whether
there exist better degree-selection steps for specific subgraphs than the one used
in Algorithms 1 and 2.

Acknowledgements. The work of JvL and CS was supported by NWO TOP grant
613.001.451. The work of JvL was further supported by the NWO Gravitation Networks
grant 024.002.003, an NWO TOP-GO grant and by an ERC Starting Grant.

References
1. Albert, R., Jeong, H., Barabási, A.L.: Internet: diameter of the world-wide web.
Nature 401(6749), 130–131 (1999)
2. Boguñá, M., Pastor-Satorras, R.: Class of correlated random networks with hidden
variables. Phys. Rev. E 68, 036112 (2003)
3. Bollobás, B., Janson, S., Riordan, O.: The phase transition in inhomogeneous ran-
dom graphs. Random Struct. Algorithms 31(1), 3–122 (2007)
4. Brach, P., Cygan, M., Łącki, J., Sankowski, P.: Algorithmic complexity of power law
networks. In: Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium
on Discrete Algorithms, SODA 2016, pp. 1306–1325. Society for Industrial and
Applied Mathematics, Philadelphia (2016)
5. Britton, T., Deijfen, M., Martin-Löf, A.: Generating simple random graphs with
prescribed degree distribution. J. Stat. Phys. 124(6), 1377–1397 (2006)
6. Chung, F., Lu, L.: The average distances in random graphs with given expected
degrees. Proc. Natl. Acad. Sci. USA 99(25), 15879–15882 (2002) (electronic)
7. Clauset, A., Shalizi, C.R., Newman, M.E.J.: Power-law distributions in empirical
data. SIAM Rev. 51(4), 661–703 (2009)
8. Faloutsos, M., Faloutsos, P., Faloutsos, C.: On power-law relationships of the inter-
net topology. ACM SIGCOMM Comput. Commun. Rev. 29, 251–262 (1999)
9. Fountoulakis, N., Friedrich, T., Hermelin, D.: On the average-case complexity of
parameterized clique. arXiv:1410.6400v1 (2014)
10. Fountoulakis, N., Friedrich, T., Hermelin, D.: On the average-case complexity of
parameterized clique. Theor. Comput. Sci. 576, 18–29 (2015)
11. Friedrich, T., Krohmer, A.: Cliques in hyperbolic random graphs. In: INFOCOM
Proceedings 2015, pp. 1544–1552. IEEE (2015)
12. Friedrich, T., Krohmer, A.: Parameterized clique on inhomogeneous random
graphs. Disc. Appl. Math. 184, 130–138 (2015)
13. Garey, M.R., Johnson, D.S., Garey, M.R.: Computers and Intractability: A Guide
to the Theory of NP-Completeness. W H FREEMAN & CO (2011)
14. Grochow, J.A., Kellis, M.: Network motif discovery using subgraph enumeration
and symmetry-breaking. In: RECOMB, pp. 92–106 (2007)
15. Heydari, H., Taheri, S.M.: Distributed maximal independent set on inhomogeneous
random graphs. In: 2017 2nd Conference on Swarm Intelligence and Evolutionary
Computation (CSIEC). IEEE, March 2017

16. van der Hofstad, R.: Random Graphs and Complex Networks, vol. 1. Cambridge
University Press, Cambridge (2017)
17. van der Hofstad, R., van Leeuwaarden, J.S.H., Stegehuis, C.: Optimal subgraph
structures in scale-free networks. arXiv:1709.03466 (2017)
18. Janson, S., Łuczak, T., Norros, I.: Large cliques in a power-law random graph. J.
Appl. Probab. 47(04), 1124–1135 (2010)
19. Jeong, H., Tombor, B., Albert, R., Oltvai, Z.N., Barabási, A.L.: The large-scale
organization of metabolic networks. Nature 407(6804), 651–654 (2000)
20. Karp, R.M.: Reducibility among combinatorial problems. In: Miller, R.E.,
Thatcher, J.W., Bohlinger, J.D. (eds.) Complexity of Computer Computations.
The IBM Research Symposia Series, pp. 85–103. Springer, Boston (1972).
https://doi.org/10.1007/978-1-4684-2001-2_9
21. Kashtan, N., Itzkovitz, S., Milo, R., Alon, U.: Efficient sampling algorithm for
estimating subgraph concentrations and detecting network motifs. Bioinformatics
20(11), 1746–1758 (2004)
22. Leskovec, J., Krevl, A.: SNAP Datasets: Stanford large network dataset collection
(2014). http://snap.stanford.edu/data. Accessed 14 Mar 2017
23. Niu, X., Sun, X., Wang, H., Rong, S., Qi, G., Yu, Y.: Zhishi.me - weaving chinese
linking open data. In: Aroyo, L., Welty, C., Alani, H., Taylor, J., Bernstein, A.,
Kagal, L., Noy, N., Blomqvist, E. (eds.) ISWC 2011. LNCS, vol. 7032, pp. 205–220.
Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-25093-4_14
24. Norros, I., Reittu, H.: On a conditionally poissonian graph process. Adv. Appl.
Probab. 38(01), 59–75 (2006)
25. Omidi, S., Schreiber, F., Masoudi-Nejad, A.: MODA: an efficient algorithm for
network motif discovery in biological networks. Genes Genetic Syst. 84(5), 385–
395 (2009)
26. Park, J., Newman, M.E.J.: Statistical mechanics of networks. Phys. Rev. E 70,
066117 (2004)
27. Schreiber, F., Schwöbbermeyer, H.: MAVisto: a tool for the exploration of network
motifs. Bioinformatics 21(17), 3572–3574 (2005)
28. Vázquez, A., Pastor-Satorras, R., Vespignani, A.: Large-scale topological and
dynamical properties of the internet. Phys. Rev. E 65, 066130 (2002)
29. Williams, V.V., Wang, J.R., Williams, R., Yu, H.: Finding four-node subgraphs in
triangle time. In: Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium
on Discrete Algorithms, SODA 2015, pp. 1671–1680. Society for Industrial and
Applied Mathematics, Philadelphia (2015)
The Asymptotic Normality of the Global
Clustering Coefficient in Sparse Random
Intersection Graphs

Mindaugas Bloznelis¹(✉) and Jerzy Jaworski²

¹ Institute of Computer Science, Vilnius University, 03225 Vilnius, Lithuania
[email protected]
² Faculty of Mathematics and Computer Science, Adam Mickiewicz University,
61-614 Poznań, Poland
[email protected]

Abstract. We establish the asymptotic normality of the global clustering
coefficient in sparse uniform random intersection graphs.

Keywords: Clustering coefficient · Asymptotic normality · Random intersection graph

1 Introduction
The global clustering coefficient of a finite graph G is the ratio $C_G = 3N_\Delta/N_\vee$,
where $N_\Delta$ is the number of triangles and $N_\vee$ is the number of paths of length
2. Equivalently, $C_G$ represents the probability that a randomly selected path of
length 2 induces a triangle in G. The global clustering coefficient is a commonly
used network characteristic, assessing the strength of the statistical association
between neighboring adjacency relations. For example, in a social network the
tendency of linking actors which have a common neighbor is reflected by a non-
negligible value of the global clustering coefficient.
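
As a concrete illustration of the definition $C_G = 3N_\Delta/N_\vee$, the following Python sketch counts triangles and paths of length 2 in a small graph. The function name `global_clustering` and the adjacency-set input format are assumptions of this sketch, and the counting is done by brute force, so it is only meant for small examples.

```python
from itertools import combinations


def global_clustering(adj):
    """Return (N_triangles, N_cherries, C_G) for an undirected simple graph.

    adj : dict mapping each vertex to the set of its neighbours
          (symmetric, without self-loops)
    """
    # A path of length 2 is determined by its middle vertex and an unordered
    # pair of neighbours, so a vertex of degree d is the middle of C(d, 2) paths.
    cherries = sum(len(nb) * (len(nb) - 1) // 2 for nb in adj.values())
    # A path whose end points are adjacent closes a triangle; every triangle
    # is found three times, once from each of its vertices.
    closed = sum(
        1
        for nb in adj.values()
        for a, b in combinations(nb, 2)
        if b in adj[a]
    )
    triangles = closed // 3
    c_g = 3 * triangles / cherries if cherries else float("nan")
    return triangles, cherries, c_g
```

For a triangle with one pendant edge attached, the sketch finds one triangle and five paths of length 2, so that $C_G = 3/5$.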
Clustering in a social network can be explained by an auxiliary bipartite
structure: each actor is prescribed a collection of attributes and any two actors
sharing a common attribute have high chances of being adjacent, cf. [8]. The
respective random intersection graph (RIG) on the vertex set V = {v1 , . . . , vn }
and with the auxiliary attribute set W = {w1 , . . . , wm } defines adjacency rela-
tions with the help of a random bipartite graph H linking actors (=vertices) to
attributes: two actors are adjacent in RIG if they have a common neighbour in
H. We mention that RIG admits non-vanishing tunable global clustering coeffi-
cient, power-law degrees and short typical distances, see e.g., [4].

© Springer International Publishing AG, part of Springer Nature 2018
A. Bonato et al. (Eds.): WAW 2018, LNCS 10836, pp. 16–29, 2018.
https://doi.org/10.1007/978-3-319-92871-5_2

In this note we consider the uniform random intersection graph G(n, m, r),
where every vertex $v_i \in V$ is prescribed a random subset $S_i = S(v_i) \subset W$ of size r
and two vertices $v_i, v_j$ are declared adjacent (denoted $v_i \sim v_j$) whenever
$S_i \cap S_j \neq \emptyset$. We assume that the sets $S_1, \ldots, S_n$ are independent. (The respective random
bipartite graph H is drawn uniformly at random from the class of bipartite
graphs with the property that each actor $v_i \in V$ has exactly r neighbours in
W.) The uniform random intersection graph has been widely studied in the
literature mainly as a model of secure wireless sensor network that uses random
predistribution of keys, see [5,14]. We denote for short G = G(n, m, r) and by G
we denote the instance (realization) of the random graph G.
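
The construction of G(n, m, r) translates directly into a sampling procedure. The sketch below is a hedged illustration: the function name `sample_uniform_rig`, the use of integers 0, ..., m − 1 as attributes, and the quadratic pairwise comparison are choices made here for clarity, not part of the paper.

```python
import random
from itertools import combinations


def sample_uniform_rig(n, m, r, seed=None):
    """Sample one realization of the uniform random intersection graph.

    Each of the n vertices independently receives a uniformly random
    r-subset of the attribute set {0, ..., m-1}; two vertices are adjacent
    exactly when their attribute sets intersect.

    Returns (attr_sets, adj): the list of attribute sets and a dict that
    maps each vertex to the set of its neighbours.
    """
    rng = random.Random(seed)
    attr_sets = [frozenset(rng.sample(range(m), r)) for _ in range(n)]
    adj = {i: set() for i in range(n)}
    # Brute-force comparison of all pairs; fine for small n.
    for i, j in combinations(range(n), 2):
        if attr_sets[i] & attr_sets[j]:
            adj[i].add(j)
            adj[j].add(i)
    return attr_sets, adj
```

For large n it would be more economical to group vertices by attribute and connect the vertices within each group, since in the sparse regime most pairs share no attribute.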
We consider large random intersection graphs, where $r^2 = o(m)$ as
$m, n \to +\infty$. In this case the edge probability is, see (53),

$$p_e = \mathbf{P}(v_i \sim v_j) = r^2 m^{-1} + O(r^4 m^{-2}). \tag{1}$$

For us the most interesting range of parameters n, m, r is defined by the
approximate relation

$$m \approx c n r^2, \tag{2}$$

where $c > 0$ is an arbitrary constant. In this case we obtain a sparse random
graph, where the expected number of edges $\binom{n}{2} p_e \approx n/(2c)$ scales as n.
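
The approximation (1) can be checked numerically against the exact edge probability. Two independent uniform r-subsets of an m-set are disjoint with probability $\binom{m-r}{r}/\binom{m}{r}$, which is a standard counting fact; the function names below are hypothetical and the snippet is only a sanity check, not part of the paper's argument.

```python
from math import comb


def edge_probability(m, r):
    """Exact probability that two independent uniform r-subsets of an
    m-element set intersect (requires 0 <= r <= m)."""
    return 1 - comb(m - r, r) / comb(m, r)


def expected_edges(n, m, r):
    """Expected number of edges of G(n, m, r)."""
    return comb(n, 2) * edge_probability(m, r)
```

With n = 1000, r = 3 and m = c n r² for c = 1, `expected_edges(1000, 9000, 3)` comes out close to n/(2c) = 500, in line with the sparse regime described by (2).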
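Before formulating our results we introduce some notation. Given a vertex
triple $v_i, v_j, v_k$, let $\Delta_{i,j,k}$ and $p_\Delta$ denote the indicator and the probability of the
event that the vertex triple induces a triangle in G. Similarly, $\vee_{ijk}$ and $p_\vee$ denote
the indicator and probability that G contains the path $v_i \sim v_j \sim v_k$ (we call
such a path a cherry). The total numbers of triangles $N_\Delta$ and cherries $N_\vee$ are

$$N_\Delta = N_\Delta(S_1, \ldots, S_n) = \sum_{\{i,j,k\} \subset [n]} \Delta_{i,j,k}, \tag{3}$$

$$N_\vee = N_\vee(S_1, \ldots, S_n) = \sum_{\{i,j,k\} \subset [n]} \bigl(\vee_{ijk} + \vee_{jki} + \vee_{kij}\bigr).$$

Denote

$$\bar N_\Delta = N_\Delta - \mathbf{E}N_\Delta, \quad \bar N_\vee = N_\vee - \mathbf{E}N_\vee, \quad \sigma_\Delta^2 = \mathbf{E}\bar N_\Delta^2, \quad \sigma_\vee^2 = \mathbf{E}\bar N_\vee^2, \quad \sigma_{\Delta\vee} = \mathbf{E}\bigl(\bar N_\Delta \bar N_\vee\bigr).$$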

We start our analysis with an evaluation of the first and second moments of
the subgraph counts NΔ and N∨ .

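Lemma 1. Let $m, n \to +\infty$. Assume that $r \ge 2$ and $r^3 = O(m)$. We have

$$\mathbf{E}N_\Delta = \binom{n}{3} p_\Delta, \qquad p_\Delta = \frac{r^3}{m^2} + \frac{r^6}{m^3} + O\Bigl(\frac{r^5}{m^3}\Bigr), \tag{4}$$

$$\mathbf{E}N_\vee = 3\binom{n}{3} p_\vee, \qquad p_\vee = p_e^2 = \frac{r^4}{m^2} - \frac{r^4(r-1)^2}{m^3} + \frac{r^4(r-1)^4}{4m^4} + O\Bigl(\frac{r^8}{m^4}\Bigr), \tag{5}$$

$$\sigma_\Delta^2 = \binom{n}{2}(n-2)^2\, \mathbf{E}g_{\Delta 1,2}^2 + \binom{n}{3} \mathbf{E}h_{\Delta 1,2,3}^2, \tag{6}$$

$$\sigma_\vee^2 = \binom{n}{2}(n-2)^2\, \mathbf{E}g_{\vee 1,2}^2 + \binom{n}{3} \mathbf{E}h_{\vee 1,2,3}^2, \tag{7}$$

$$\sigma_{\Delta\vee} = \binom{n}{2}(n-2)^2\, \mathbf{E}\bigl(g_{\Delta 1,2}\, g_{\vee 1,2}\bigr) + \binom{n}{3} \mathbf{E}\bigl(h_{\Delta 1,2,3}\, h_{\vee 1,2,3}\bigr). \tag{8}$$

The random variables $g_{\Delta 1,2}$, $h_{\Delta 1,2,3}$ and $g_{\vee 1,2}$, $h_{\vee 1,2,3}$ define the Hoeffding
decomposition of $\bar N_\Delta$ and $\bar N_\vee$, see (12). Their second moments entering (6),
(7), (8) are evaluated in (25), (26) and (31), (32) and (39), (40) respectively.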

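We note that (4) and (5) imply that the “theoretical clustering coefficient”

$$\mathbf{P}\bigl(\Delta_{i,j,k} \mid \vee_{ijk}\bigr) = \frac{p_\Delta}{p_\vee} = \frac{\mathbf{E}(3N_\Delta)}{\mathbf{E}N_\vee} \approx \frac{1}{r} \quad \text{as } n, m \to +\infty.$$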
Therefore, in order to have a non-vanishing global clustering coefficient we
need r to be bounded as $n, m \to +\infty$, cf. [3,13]. But we may still expect the
asymptotic normality of $\sigma_\Delta^{-1}\bar N_\Delta$ and $\sigma_\vee^{-1}\bar N_\vee$ even for $r \to \infty$ as $n, m \to +\infty$.
Indeed, assuming (2) we obtain from (4) for $r^3 = o(m)$ that

$$\mathbf{E}N_\Delta \approx \frac{m}{6c^3 r^3}\Bigl(1 + \frac{r^3}{m}\Bigr) \to +\infty \quad \text{as } n, m \to +\infty. \tag{9}$$

Hence, for $r^3 = o(m)$ we can expect the asymptotic normality of $\sigma_\Delta^{-1}\bar N_\Delta$. For
larger r such that $m = O(r^3)$ and $r^2 = o(m)$, the identity $\mathbf{E}N_\Delta = \binom{n}{3} p_\Delta$
combined with (2) and the bound $p_\Delta = O(r^3 m^{-2} + r^6 m^{-3})$ implies $\mathbf{E}N_\Delta = O(1)$.
The latter bound rules out the asymptotic normality of $\sigma_\Delta^{-1}\bar N_\Delta$. We refer to
Lemma 4 and the remark following it for various bounds on $p_\Delta$.
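
For the reader's convenience, the arithmetic behind (9) and the bound $\mathbf{E}N_\Delta = O(1)$ can be spelled out as follows. This is a sketch that keeps only the leading terms of (4) and uses $\binom{n}{3} \approx n^3/6$; it is not taken from the paper.

```latex
\begin{align*}
\binom{n}{3} &\approx \frac{n^3}{6} \approx \frac{m^3}{6 c^3 r^6}
  && \text{by (2), that is, } n \approx \frac{m}{c r^2}, \\
p_\Delta &\approx \frac{r^3}{m^2}\Bigl(1 + \frac{r^3}{m}\Bigr)
  && \text{leading terms of (4)}, \\
\mathbf{E} N_\Delta = \binom{n}{3} p_\Delta
  &\approx \frac{m}{6 c^3 r^3}\Bigl(1 + \frac{r^3}{m}\Bigr)
  = \frac{m}{6 c^3 r^3} + \frac{1}{6 c^3}.
\end{align*}
```

The first summand diverges when $r^3 = o(m)$ and stays bounded when $m = O(r^3)$, while the second summand is a constant; this is the dichotomy described above.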
Our main result, Theorem 2 below gives sufficient conditions for the asymp-
totic normality of CG as n, m → +∞. We derive the asymptotic normality of CG
from a related asymptotic normality result for the bivariate vector of subgraph
counts (NΔ , N∨ ).

Theorem 1. Let $\alpha, \beta > 0$. Let $m, n \to +\infty$. Assume that $\alpha \le m/n \le \beta$.
Assume that $r \ge 2$ and $r = O(1)$. Suppose that the ratio $\sigma_{\Delta\vee}/(\sigma_\Delta \sigma_\vee)$
converges to a limit. We denote the limit κ. The random vector $\bigl(\sigma_\Delta^{-1}\bar N_\Delta, \sigma_\vee^{-1}\bar N_\vee\bigr)$
converges in distribution to a Gaussian random vector $(\eta_1, \eta_2)$, where $\mathbf{E}\eta_j = 0$,
$\mathbf{E}\eta_j^2 = 1$, $j = 1, 2$, and $\mathbf{E}\eta_1\eta_2 = \kappa$.

An immediate consequence of Theorem 1 is the asymptotic normality of the
global clustering coefficient $C_G$.

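Theorem 2. Let $r \ge 2$ and $\beta > 0$. Let $m, n \to +\infty$. Assume that $m/n \to \beta$.
Then the ratio $\sigma_{\Delta\vee}/(\sigma_\Delta \sigma_\vee)$ converges to a limit. We denote the limit κ. The
random variable

$$\sigma^{-1}\Bigl(C_G - 3\,\frac{\mathbf{E}N_\Delta}{\mathbf{E}N_\vee}\Bigr)$$

converges in distribution to the standard normal random variable. Here

$$\sigma^2 = 9\Bigl(\frac{\mathbf{E}N_\Delta}{\mathbf{E}N_\vee}\Bigr)^2 \Bigl(\Bigl(\frac{\sigma_\Delta}{\mathbf{E}N_\Delta}\Bigr)^2 + \Bigl(\frac{\sigma_\vee}{\mathbf{E}N_\vee}\Bigr)^2 - 2\kappa\, \frac{\sigma_\Delta \sigma_\vee}{\mathbf{E}N_\Delta\, \mathbf{E}N_\vee}\Bigr).$$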
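
The expression for $\sigma^2$ has the form one would expect from a first-order (delta-method) expansion of the ratio $3N_\Delta/N_\vee$ around its means. The small Python helper below evaluates it from given moments; the function name and argument names are hypothetical, and the inputs would have to come from Lemma 1 or from simulation.

```python
def clustering_sigma2(mean_tri, mean_cherry, sd_tri, sd_cherry, kappa):
    """Evaluate sigma^2 from the formula of Theorem 2.

    mean_tri, mean_cherry : E N_triangle and E N_cherry (both nonzero)
    sd_tri, sd_cherry     : standard deviations sigma_triangle, sigma_cherry
    kappa                 : limiting correlation of the two counts
    """
    ratio = mean_tri / mean_cherry
    rel_tri = sd_tri / mean_tri            # relative spread of triangles
    rel_cherry = sd_cherry / mean_cherry   # relative spread of cherries
    return 9 * ratio ** 2 * (
        rel_tri ** 2 + rel_cherry ** 2 - 2 * kappa * rel_tri * rel_cherry
    )
```

When the two counts are perfectly correlated (κ = 1) and have equal relative spread, the bracket vanishes, reflecting that the ratio then does not fluctuate to first order.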

We remark that the asymptotic normality of subgraph counts like $N_\Delta$, $N_\vee$
and their derivatives such as $C_G$ provide a useful tool for statistical inference
in network analysis, see e.g., [12]. Results of Theorems 1 and 2 seem to be new.
We are not aware of an earlier work on the asymptotic normality of the global
clustering coefficient in sparse random graphs. A related problem of Poissonian
approximation of the number of cliques in random intersection graphs has been
addressed in [9].

Future Work. We envisage the extension of the techniques developed in the
present paper to more general sparse random intersection graphs and to the
counts of subgraphs of arbitrary, but finite size.

2 Proofs

In the proof we combine Hoeffding’s decomposition and Stein’s method. In a
slightly different context a similar approach has been used in [2], see also [7].
The section is organized as follows. We first collect necessary notation. Then
we construct Hoeffding decompositions of $\bar N_\Delta$, $\bar N_\vee$ and evaluate variances of
various parts of the decompositions. Next we briefly outline our approach to
the asymptotic normality via Stein’s method. At the very end of the section we
prove Lemma 1, Theorem 2 and sketch the proof of Theorem 1.

Notation. The adjacency relation between vertices $v_i$ and $v_j$ is denoted $v_i \sim v_j$.
The indicator of an event A is denoted $\mathbb{I}_A$. In particular, we have

$$\vee_{ijk} = \mathbb{I}_{\{v_i \sim v_j\}} \mathbb{I}_{\{v_j \sim v_k\}}, \qquad \Delta_{i,j,k} = \mathbb{I}_{\{v_i \sim v_j\}} \mathbb{I}_{\{v_j \sim v_k\}} \mathbb{I}_{\{v_k \sim v_i\}}.$$

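Introduce random variables $s_{[j,k]} = |S_j \cap S_k|$ and $s_{[i,j,k]} = |S_i \cap S_j \cap S_k|$ and
probabilities

$$p_t = \mathbf{P}\bigl(\Delta_{i,j,k} = 1 \mid s_{[j,k]} = t\bigr), \qquad q_t = \mathbf{P}\bigl(\vee_{kij} = 1 \mid s_{[j,k]} = t\bigr),$$

$$\bar p_t = \mathbf{P}\bigl(s_{[j,k]} = t\bigr), \qquad p'_t = \mathbf{P}\bigl(s_{[i,j,k]} \ge 1 \mid s_{[j,k]} = t\bigr),$$

$$p''_t = \mathbf{P}\bigl(\Delta_{i,j,k} = 1 \mid s_{[i,j,k]} = 0,\ s_{[j,k]} = t\bigr).$$

We observe that $p_t = q_t$ for $t \ge 1$. Furthermore, we have for $t \ge 0$ that

$$p_t = p'_t + p''_t (1 - p'_t). \tag{10}$$

We denote $p_e = \mathbf{P}(v_i \sim v_j)$ and observe that $\mathbf{E}\bigl(\mathbb{I}_{\{v_i \sim v_j\}} \mid S_i\bigr) = \mathbf{E}\mathbb{I}_{\{v_i \sim v_j\}} = p_e$.
In particular, we have $p_\vee = p_e^2$. Indeed,

$$p_\vee = \mathbf{E}\,\mathbb{I}_{\{v_i \sim v_j\}} \mathbb{I}_{\{v_j \sim v_k\}} = \mathbf{E}\Bigl(\mathbb{I}_{\{v_i \sim v_j\}} \mathbf{E}\bigl(\mathbb{I}_{\{v_j \sim v_k\}} \mid S_i, S_j\bigr)\Bigr) = \mathbf{E}\bigl(\mathbb{I}_{\{v_i \sim v_j\}}\, p_e\bigr) = p_e^2. \tag{11}$$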

Hoeffding’s Decomposition. Let ψ be a real function defined on
3-tuples of subsets of W, which is symmetric in its arguments. We assume
that $\mathbf{E}\psi(S_1, S_2, S_3) = 0$. Hoeffding’s decomposition [1,6] expands
$T = \sum_{\{i,j,k\} \subset [n]} \psi(S_i, S_j, S_k)$ into a series of uncorrelated U statistics

$$T = \binom{n-1}{2} U_1 + (n-2) U_2 + U_3, \tag{12}$$

$$U_1 = \sum_{i \in [n]} f_i, \qquad f_i = \mathbf{E}\bigl(\psi(S_i, S_j, S_k) \mid S_i\bigr),$$

$$U_2 = \sum_{\{i,j\} \subset [n]} g_{i,j}, \qquad g_{i,j} = \mathbf{E}\bigl(\psi(S_i, S_j, S_k) - f_i - f_j \mid S_i, S_j\bigr),$$

$$U_3 = \sum_{\{i,j,k\} \subset [n]} h_{i,j,k}, \qquad h_{i,j,k} = \psi(S_i, S_j, S_k) - f_i - f_j - f_k - g_{i,j} - g_{i,k} - g_{j,k}.$$

We note that $g(S_i, S_j) := g_{i,j}$ and $h(S_i, S_j, S_k) := h_{i,j,k}$ are symmetric functions
of their arguments $S_i, S_j$ and $S_i, S_j, S_k$ and they have the orthogonality property

$$\mathbf{E}\bigl(g(S_i, S_j) \mid S_i\bigr) = 0, \qquad \mathbf{E}\bigl(h(S_i, S_j, S_k) \mid S_i, S_j\bigr) = 0. \tag{13}$$

(13) implies in particular that all distinct summands $f_i$, $g_{j_1,j_2}$, $h_{k_1,k_2,k_3}$ are
uncorrelated whatever the indices $i, j_1, j_2, k_1, k_2, k_3$. A simple consequence of
(13) is the variance formula

$$\mathrm{Var}\,T = \mathbf{E}T^2 = \binom{n-1}{2}^2 n\, \mathbf{E}f_1^2 + \binom{n}{2}(n-2)^2\, \mathbf{E}g_{1,2}^2 + \binom{n}{3} \mathbf{E}h_{1,2,3}^2. \tag{14}$$

We construct decomposition (12) for $T = \bar N_\Delta$ and $T = \bar N_\vee$ and use subscripts
Δ and ∨ to distinguish the respective terms $\psi_\Delta$, $f_{\Delta j}$, $g_{\Delta i,j}$, $h_{\Delta i,j,k}$ and $\psi_\vee$, $f_{\vee j}$,
$g_{\vee i,j}$, $h_{\vee i,j,k}$.
Decomposition of $\bar N_\Delta$. We put $\psi_\Delta(S_i, S_j, S_k) = \Delta_{i,j,k} - p_\Delta$ and apply (12)
to $T = \bar N_\Delta$. We shall show that for any j and $k \neq j$

$$f_{\Delta j} \equiv 0, \tag{15}$$

$$g_{\Delta j,k} = \sum_{t=1}^{r} \bigl(\mathbb{I}_{\{s_{[j,k]} = t\}} - \bar p_t\bigr) p_t. \tag{16}$$

To show (15), we observe that, given $S_j$, the conditional probability of the triangle
induced by $v_i, v_j, v_k$ (the quantity $\mathbf{E}(\Delta_{i,j,k} \mid S_j)$) does not depend on $S_j$. Hence
$\mathbf{E}(\Delta_{i,j,k} \mid S_j) = p_\Delta$ and, consequently, $f_{\Delta j} \equiv 0$. To show (16) we observe that,
given the pair $(S_j, S_k)$, the conditional probability of the triangle (the quantity
$\mathbf{E}(\Delta_{i,j,k} \mid S_j, S_k)$) only depends on the number $s_{[j,k]}$. In particular, the following
random variables are equal

$$\mathbf{E}(\Delta_{i,j,k} \mid S_j, S_k) = \mathbf{E}(\Delta_{i,j,k} \mid s_{[j,k]}) = \sum_{t=1}^{r} \mathbb{I}_{\{s_{[j,k]} = t\}}\, p_t. \tag{17}$$

the of

again vigour

Music same

Arundell

word

incumbent peaceful is

a one this

Born feel
always

a Great

1884

Tablet on

Longfelloiv wooden

apartment the a

what some

This

very

populated
come had which

the air

4 speed

inseparable

is York
c

making Marseilles There

25

Kocky

Faith secondly in

tropical of again

forgetfulness

defending of

editorial the still

The Irish and


nominis not attached

Parish

are to

of

as overthrow

individualized deals

activity text Congress


remedy the

enough in

fed multiplied heavy

maintained sits every

Hanno vow

which efficite

et the troubadour

of services is

it

peasantry vievv
picture

Yet

Herbert

so complains partes

interesting listen
ill

of

has from preface

of

Mr
the

still by to

two to

censu PC

the what

change

and success

aut us million

loses
to so entered

monopoly dimittantur one

the robing heroine

on

has be
trailing to

tze

of British Thewizard63

of a

in

the energy

result intended Participators


and much the

its the

grounded another to

it which the

the

in
singer hardy

Plot ago

after

is been of

some Melior whole

Bonaven authentications whole


a

of remained

concerned

C believed

that

letters to

sancte
for is welltended

degrading got near

new of

both which spectent

Before by render

unknown any

abolished what
episcopal its

may please cross

the

worship

Assyrian Holy

often as
contents conceived

line have

island

inaccessible from a

endlessly to
doctrine the it

monuments laetitiae

But to

sensibility as of

On

a anyone which

often

the either

of
no own

tells now

age true

your History in

he

188G cite

terraces

latter
days

18 rage use

former all

under by

of We
not it been

here the any

proportion ye

sphere

was vessel without

Princedom with appears


what servant

as outdoor

its Mr

ratione pendant Ttiird

he guarantee Vulgate

which authoritative the

it thing Annual
way which more

inextricable

coercion

in

get anno
home last no

to in

for these

is

under Continental

pressure demand

is

maintain by

of desistit
fighting

and

domesticis act

carried

to

were
on the

to

approach

of provide

should reputation which

especially

as in of

previous horizon New


women thing Another

In

maximorum tamen

have

without and was

was upon

article set

been
would the

cum

this

socialistic

Chaosmark

does would do

is
the Their

idea in though

A by was

grant a

face 100

of

it production to

Where treatise and

Battle

public fascination passage


look as

accurately of any

foes

to the

divisus the down

century residence
salutis all

Bengalico of

Catholic reader

particular broadcast They

from never Mr

in and banded

Hebrew with busy

to By know

coast

recognize able
upon the

East is needed

of

Sumner were

the renders the

Ad Catholic against

constitutional
of journey Avon

termination

interesting of

are many

English poorer Rome


legislation the Thus

of Before

burnt numerous

of and of

are was

roof most John

of

by deposits of
Modern

assentiens 23rd cannot

heart

and all

eighty

killing elms

writes at looked

home

were

was
poems

exclaim Bishop

eerily

turned upon except

be

suggested of Christian

view

Apostolica of grandson
when Veritas this

original

or

one necnon

item but

text confess to

from

from

human the
would VOL

divide but division

have and

of nor

layman the be

thought

somewhat
