Evaluation of community detection methods

See also Cluster validation for general clustering validation.

Internal and external evaluation are the primary ways to evaluate and compare methods. Internal evaluation uses a quality measure (e.g., Modualrity) to compare and evaluate different Community detection methods. External evaluation uses either synthetic or real data.

When using synthetic data (“benchmark” approach), we plant Community structure into a network based on a model of community structure and then ask whether a given method can recover the planted communities. This is of course much cleaner than working with real data. However, one should note that we are planting the communities based on a model of such structure and therefore it will not be fair to compare methods that are based on different models of community structure. For instance, if we plant partitions based on a Stochastic block model, we are assuming that each community is more or less a random graph, and thus does not exhibit local structure (e.g., clustering). Therefore, methods that rely on local clustering (e.g., Clique percolation or Link community) would not work, whereas Stochastic block model-based community detection would work best. Likewise, if we plant pervasively overlapping community structure that is cliquish, Clique percolation would work much better than Stochastic block model or Modularity optimization methods.

Furthermore, there is also the issue of comparing identified community structure to the planted partition (see Clustering comparison). The most common approach is using Normalized mutual information or adjusted Rand index. However, these measures have critical biases (see Gates2019element).

It is also common to use metadata of real-world networks to evaluate community detection methods. For instance Choobdar2019assessment performed a systematic evaluation of many methods based on their usefulness in biological tasks. Yet, as shown in Peel2017ground and Hric2014community, it is usually not a good idea to consider metadata as “ground truth” communities. Still, evaluation methods that use metadata such as Gene set enrichment analysis are most commonly used methods in application of community detection.

Given “ground truth” (or rather reference clustering), Mutual information is often used. An important, overlooked question is how good the ground truth is. Some work about tagged networks123.

Overlapping community structure makes the problem harder. For instance, Mutual information should be extended to covers. There are two proposals45. There are other suggestions: using conductance of boundary nodes6.


Statistical significance of communities: http://www.pnas.org/content/108/18/7321.short ; http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0018961

An approach to use data7.


How can we define statistical significance? What will be the proper null models for community structure?

There are approaches to use online social network services such as Facebook (http://fellows-exp.com/).

  1. Palla, Gergely; Farkas, Illés J; Pollner, Péter; Derényi, Imre; Vicsek, Tamás (2008). “Fundamental statistical features and self-similar properties of tagged networks”. New Journal of Physics 10 (12): 123026. doi:10.1088/1367-2630/10/12/123026. ISSN 1367-2630 

  2. Pollner, Péter; Palla, Gergely; Vicsek, Tamás (2010). “Clustering of tag-induced subgraphs in complex networks”. Physica A: Statistical Mechanics and its Applications 389 (24): 5887–5894. doi:10.1016/j.physa.2010.09.012. ISSN 03784371. 

  3. Tibély, Gergely; Pollner, Péter; Vicsek, Tamás; Palla, Gergely (2012). “Ontologies and tag-statistics”. New Journal of Physics 14 (5): 053009. doi:10.1088/1367-2630/14/5/053009. ISSN 1367-2630. 

  4. Lancichinetti, Andrea; Fortunato, Santo; Kertész, János (2009). “Detecting the overlapping and hierarchical community structure in complex networks”. New Journal of Physics 11 (3): 033015. doi:10.1088/1367-2630/11/3/033015. ISSN 1367-2630. 

  5. “Template:Citation error”. http://arxiv.org/abs/1202.0425. 

  6. “Evaluating Overlapping Communities with the Conductance of their Boundary Nodes”. http://arxiv.org/abs/1206.3992. 

  7. “Benchmarking community detection methods on social media data”. http://arxiv.org/abs/1302.0739.