Evaluation of community detection methods
See also Cluster validation for general clustering validation.
Internal and external evaluation are the primary ways to evaluate and compare methods. Internal evaluation uses a quality measure (e.g., Modualrity) to compare and evaluate different Community detection methods. External evaluation uses either synthetic or real data.
When using synthetic data (“benchmark” approach), we plant Community structure into a network based on a model of community structure and then ask whether a given method can recover the planted communities. This is of course much cleaner than working with real data. However, one should note that we are planting the communities based on a model of such structure and therefore it will not be fair to compare methods that are based on different models of community structure. For instance, if we plant partitions based on a Stochastic block model, we are assuming that each community is more or less a random graph, and thus does not exhibit local structure (e.g., clustering). Therefore, methods that rely on local clustering (e.g., Clique percolation or Link community) would not work, whereas Stochastic block modelbased community detection would work best. Likewise, if we plant pervasively overlapping community structure that is cliquish, Clique percolation would work much better than Stochastic block model or Modularity optimization methods.
Furthermore, there is also the issue of comparing identified community structure to the planted partition (see Clustering comparison). The most common approach is using Normalized mutual information or adjusted Rand index. However, these measures have critical biases (see Gates2019element).
It is also common to use metadata of realworld networks to evaluate community detection methods. For instance Choobdar2019assessment performed a systematic evaluation of many methods based on their usefulness in biological tasks. Yet, as shown in Peel2017ground and Hric2014community, it is usually not a good idea to consider metadata as “ground truth” communities. Still, evaluation methods that use metadata such as Gene set enrichment analysis are most commonly used methods in application of community detection.
Given “ground truth” (or rather reference clustering), Mutual information is often used. An important, overlooked question is how good the ground truth is. Some work about tagged networks^{1}^{2}^{3}.
Overlapping community structure makes the problem harder. For instance, Mutual information should be extended to covers. There are two proposals^{4}^{5}. There are other suggestions: using conductance of boundary nodes^{6}.
Measures
Statistical significance of communities: http://www.pnas.org/content/108/18/7321.short ; http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0018961
An approach to use data^{7}.
References
 http://www.cse.chalmers.se/~moradi/Lic/SEA.pdf (http://sea2012.labri.fr/index.php?n=Main.Papers)
 http://arxiv.org/abs/1006.0375  Information theoretic model validation for clustering
How can we define statistical significance? What will be the proper null models for community structure?
 http://pre.aps.org/abstract/PRE/v81/i4/e046110  Statistical significance of communities in networks
 http://arxiv.org/abs/1012.2363  Finding statistically significant communities in networks
 http://prl.aps.org/abstract/PRL/v105/i22/e220601  Significance Analysis and Statistical Mechanics: An Application to Clustering
 http://arxiv.org/abs/1110.0305  Significant communities in large sparse networks
There are approaches to use online social network services such as Facebook (http://fellowsexp.com/).
 Comparing network covers using mutual information
 Defining and Evaluating Network Communities based on Groundtruth

Community detection: effective evaluation on large social networks

A revisit to evaluating accuracy of community detection using the normalized mutual information

Palla, Gergely; Farkas, Illés J; Pollner, Péter; Derényi, Imre; Vicsek, Tamás (2008). “Fundamental statistical features and selfsimilar properties of tagged networks”. New Journal of Physics 10 (12): 123026. doi:10.1088/13672630/10/12/123026. ISSN 13672630 ↩

Pollner, Péter; Palla, Gergely; Vicsek, Tamás (2010). “Clustering of taginduced subgraphs in complex networks”. Physica A: Statistical Mechanics and its Applications 389 (24): 5887–5894. doi:10.1016/j.physa.2010.09.012. ISSN 03784371. ↩

Tibély, Gergely; Pollner, Péter; Vicsek, Tamás; Palla, Gergely (2012). “Ontologies and tagstatistics”. New Journal of Physics 14 (5): 053009. doi:10.1088/13672630/14/5/053009. ISSN 13672630. ↩

Lancichinetti, Andrea; Fortunato, Santo; Kertész, János (2009). “Detecting the overlapping and hierarchical community structure in complex networks”. New Journal of Physics 11 (3): 033015. doi:10.1088/13672630/11/3/033015. ISSN 13672630. ↩

“Template:Citation error”. http://arxiv.org/abs/1202.0425. ↩

“Evaluating Overlapping Communities with the Conductance of their Boundary Nodes”. http://arxiv.org/abs/1206.3992. ↩

“Benchmarking community detection methods on social media data”. http://arxiv.org/abs/1302.0739. ↩