arXiv preprint arXiv:2302.00360

Link streams offer a good model for representing interactions over time. They consist of links (*b*,*e*,*u*,*v*), where *u* and *v* are vertices interacting during the whole time interval [*b*,*e*]. In this paper, we deal with the problem of enumerating maximal cliques in link streams. A clique is a pair (*C*,[*t*0,*t*1]), where *C* is a set of vertices that all interact pairwise during the full interval [*t*0,*t*1]. It is maximal when neither its set of vertices nor its time interval can be increased. Some of the main works solving this problem are based on the famous Bron-Kerbosch algorithm for enumerating maximal cliques in graphs. We take this idea as a starting point to propose a new algorithm which matches the cliques of the instantaneous graphs formed by links existing at a given time *t* to the maximal cliques of the link stream. We prove its validity and compute its complexity, which is better than the state-of-the art ones in many cases of interest. We also study the output-sensitive complexity, which is close to the output size, thereby showing that our algorithm is efficient. To confirm this, we perform experiments on link streams used in the state of the art, and on massive link streams, up to 100 million links. In all cases our algorithm is faster, mostly by a factor of at least 10 and up to a factor of 10**4. Moreover, it scales to massive link streams for which the existing algorithms are not able to provide the solution.

]]>

EGC 2023, vol. RNTI-E-39, pp.139-150

Les flots de liens offrent un formalisme de description d’interactions au cours du temps. Un lien correspond à deux sommets qui interagissent sur un intervalle de temps. Une clique est un ensemble de sommets associé à un intervalle de temps durant lequel ils sont tous connectés. Elle est maximale si ni son ensemble de sommets ni son intervalle de temps ne peuvent être augmentés. Les algorithmes existants pour énumérer ces structures ne permettent pas de traiter des jeux de données réels de plus de quelques centaines de milliers d’interactions. Or, l’accès à des données toujours plus massives demande d’adapter les outils à de plus grandes échelles. Nous proposons alors un algorithme qui énumère les cliques maximales sur des réseaux temporels réels et massifs atteignant jusqu’à plus de 100 millions de liens. Nous montrons expérimentalement qu’il améliore l’état de l’art de plusieurs ordres de grandeur.

]]>

mercredi 25 janvier 2023, 26-00/332 LIP6, Sorbonne Université

A 2-hop labeling (a.k.a. hub labeling) consists in assigning to each node of a graph a small subset of nodes called hubs so that any pair of nodes have a common hub lying on a shortest path joining them. Such labelings appeared to provide a very efficient representation of distances in practical road network where surprisingly small hub sets can be computed. A graph parameter called skeleton dimension allows to explain this behaviour. Connecting any two nodes through a common hub can be seen as a 2-hop shortest path in a super-graph of the original graph. A natural extension is to consider more hops and is related to the notion of hopsets introduced in parallel computation of shortest paths. Surprisingly, a 3-hop construction leads to a data-structure for representing distances which is asymptotically both smaller and faster than 2-hop labeling in graphs of bounded skeleton dimension. Another natural question is to ask whether 2-hop labelings can be efficient more generally in sparse graphs. Unfortunately, this is not the case as there exists bounded degree graphs that require quasi-linear average hub set size. The construction of such difficult graphs is related to the construction of dense graphs with n nodes that can be decomposed into n induced matchings as introduced by Ruzsa and Szemerédi in the seventies.

]]>The 15th Latin American Theoretical Informatics Symposium (LATIN 2022)

In this paper we study a model of Schr ̈oder trees whose labelling is increasing along the branches. Such tree family is useful in the context of phylogenetic. The tree nodes are of arbitrary arity (i.e. out-degree) and the node labels can be repeated throughout different branches of the tree. Once a formal construction of the trees is formalized, we then turn to the enumeration of the trees inspired by a renormalisation due to Stanley on acyclic orientations of graphs. We thus exhibit links between our tree model and labelled graphs and prove a one-to-one correspondence between a subclass of our trees and labelled graphs. As a by-product we obtain a new natural combinatorial interpretation of Stanley’s renormalising factor. We then study different combinatorial characteristics of our tree model and finally, we design an efficient uniform random sampler for our tree model which allows to generate uniformly Erdös-Renyi graph with a constant number of rejections on

average.

]]>

ALENEX 2023

Listing triangles is a fundamental graph problem with many applications, and large graphs require fast algorithms. Vertex ordering allows the orientation of edges from lower to higher vertex indices, and state-of-the-art triangle listing algorithms use this to accelerate their execution and to bound their time complexity. Yet, only basic orderings have been tested. In this paper, we show that studying the precise cost of algorithms instead of their bounded complexity leads to faster solutions. We introduce cost functions that link ordering properties with the running time of a given algorithm. We prove that their minimization is NP-hard and propose heuristics to obtain new orderings with different trade-offs between cost reduction and ordering time. Using datasets with up to two billion edges, we show that our heuristics accelerate the listing of triangles by an average of 38% when the ordering is already given as an input, and 16% when the ordering time is included.

]]>

lundi 17 janvier 2022, LIP6, Sorbonne Université

Graphs are ubiquitous tools to represent networks, may they be networks modeling data from neurosciences, sociology, molecular biology, chemistry, etc. A cornerstone of the analysis of graphs is the Laplacian matrix L that encodes their structure. From a linear algebra point-of-view, the analysis of L offers fundamental insights on key properties of the graph: the diffusion speed of an information or a disease on a network, the vulnerability of a network to targeted or random attacks, the redundancy of certain parts of the network, the network’s structure in more or less independent modules, are all examples of characteristics of a network one may extract from the Laplacian matrix.

In this work, we concentrate on two specific problems that often arise in the context of graph-based data: i/ compute inverse traces of the form Tr( (L+qI)^(-1) ), ii/ compute smoothing operations of the form (L+qI)^(-1) y where q>0 and y some vector defined over the nodes of the graph. These two problems arise in many well-known graph-based algorithms, such as semi-supervised learning, label propagation, graph Tikhonov regularization, graph inpainting, etc.

In the context of large graphs, the required inverse which scales as O(n^3) in the worst-case, is often too expensive in practice. Many approaches have been developed in the state-of-the-art to circumvent this problem: polynomial approximation and (preconditioned) conjugate gradient are the two most well-known.

In this work, we develop a new class of techniques based on random spanning forests. We show that these forests are natural candidates to provide original, efficient, and easy-to-implement estimators.

This is joint work with Pierre-Olivier Amblard, Luca Avena, Simon Barthelmé, Alexandre Gaudillière and Yusuf Yigit Pilavci

]]>arXiv:2209.12062

In order to manage massive graphs in practice, it is often necessary to resort to graph compression, which aims at reducing the memory used when storing and processing the graph. Efficient compression methods have been proposed in the literature, especially for web graphs. In most cases, they are combined with a vertex reordering pre-processing step which significantly improves the compression rate. However, these techniques are not as efficient when considering other kinds of graphs. In this paper, we focus on the class of bipartite graphs and adapt the vertex reordering phase to their specific structure by proposing a dual reordering scheme. By reordering each group of vertices in the purpose of minimizing a specific score, we show that we can reach better compression rates. We also suggest that this approach can be further refined to make the node orderings more adapted to the compression phase that follows the ordering phase.

]]>

ASONAM 2022

Abstract—Measuring the influence of users in social networks is key for numerous applications. A recently proposed influence metric, coined as $\psi$-score, allows to go beyond traditional centrality metrics, which only assess structural graph importance, by further incorporating the rich information provided by the posting and re-posting activity of users. The $\psi$-score is shown in fact to generalize PageRank for non-homogeneous node activity. Despite its significance, it scales poorly to large datasets; for a network of N users it requires to solve N linear systems of equations of size N. To address this problem, this work introduces a novel scalable algorithm for the fast approximation of $\psi$-score, named Power-$\psi$. The proposed algorithm is based on a novel equation indicating that it suffices to solve one system of equations of size N to compute the $\psi$-score. Then, our algorithm exploits the fact that such system can be recursively and distributedly approximated to any desired error. This permits the $\psi$-score, summarizing both structural and behavioral information for the nodes, to run as fast as PageRank. We validate the effectiveness of the proposed algorithm on several real-world datasets.

]]>

28th colloquium GRETSI, Nancy (France), 2022

A link stream is a set of triplets (t, u, v) modeling interactions over time and their effective analysis is key for numerous applications. They are traditionally studied via signal processing and graph theory approaches, which allow to study their dynamical and structural properties. However, current techniques do not allow to accurately reveal the frequency-structure patterns contained in them. To overcome this limitation, this work introduces a novel decomposition for link streams. Our decomposition analyses the time dimension via traditional signal dictionaries, like Fourier or wavelets, and the structural dimension via a new decomposition for graphs that we tailored to analyze sequences of graphs. We show that our decomposition allows to naturally design filters that can recover specific structures with specific frequencies.

]]>

June 27th, 2022, 11am

Room : 24-25/405

Data streams pose several challenges for learning algorithms, including mainly, but not limited to, restricted resources (in terms of memory and processing time), high-dimensionality, and concept drift constraints. To process these evolving data, we need efficient and accurate techniques and strategies, such as window models, summarization techniques, and other manners to restrict the storage to a part of — and/or synopsis information from — the stream instead of maintaining it in its entirety. This talk will present how such challenges can be addressed and how we can reduce machine learning algorithms’ resource costs while maintaining good accuracy.

]]>