Using statistical models, network metrics and semi-structured interviews to analyze a friendship network in goSupermodel

Master's defence by Casper Petersen, DIKU

Time and place: 16 August 2012 at 15:30, room 24.5.62, Njalsgade 128


As online social network sites have grown in prominence, so too has academic interest in them, drawn by their affordances, their reach and the prospect of usage data in quantities unheard of no more than a decade ago. Analyses of social network sites are predominantly descriptive and are often confined to data from large, mainstream sites catering to a heterogeneous audience, since these are the only ones that make some of their data available for public scrutiny. In this thesis, I address these shortcomings by studying a sampled friendship network constructed from data taken directly from the databases of goSupermodel, an online social network site targeting a homogeneous audience (preadolescent girls aged 9 to 14), and by conducting not only a descriptive but also an inferential and a qualitative analysis of this network.

The inferential analysis uses a new method developed by Clauset et al. to (a) fit fifteen statistical models to the observed empirical distribution of friends, (b) quantitatively assess these fits using goodness-of-fit tests and (c) use the most probable models to posit and test hypotheses aimed at identifying which gameplay element, or combination of gameplay elements, would give rise to the observed distribution of friends. The descriptive analysis employs nine network metrics, ranging from centrality through group-level to global metrics, to investigate the topology of the friendship network and compares the findings with those for six other social networks. Based on these findings, a set of hypotheses is posited and tested. In the qualitative analysis, nine active users of goSupermodel were interviewed to find out (a) how players in goSupermodel make friends in the game and (b) whether the interviews lend any support to the quantitative findings.
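The model-comparison step in (b) and (c) can be illustrated with a minimal sketch. It is not the thesis's code: it assumes continuous approximations to the friend counts (zero counts dropped, no lower cutoff x_min) and compares only two of the fifteen candidate models, log-normal and exponential, each fitted by maximum likelihood. The comparison uses a Vuong-style normalized log-likelihood ratio, one of the tools described by Clauset et al.; a positive R favours the first model, and a small p suggests the sign of R is not a fluctuation. All function names are hypothetical.

```python
import math
import random
from statistics import fmean


def exponential_loglik(xs):
    """Per-point log-likelihoods of an exponential fitted by MLE (rate = 1 / mean)."""
    rate = 1.0 / fmean(xs)
    return [math.log(rate) - rate * x for x in xs]


def lognormal_loglik(xs):
    """Per-point log-likelihoods of a log-normal fitted by MLE on log(x)."""
    logs = [math.log(x) for x in xs]
    mu = fmean(logs)
    sigma = math.sqrt(fmean([(v - mu) ** 2 for v in logs]))  # MLE: divide by n
    const = math.log(sigma) + 0.5 * math.log(2 * math.pi)
    return [-v - const - (v - mu) ** 2 / (2 * sigma ** 2) for v in logs]


def vuong_test(ll1, ll2):
    """Return (R, p): R > 0 favours model 1, R < 0 favours model 2."""
    diffs = [a - b for a, b in zip(ll1, ll2)]
    n = len(diffs)
    r = sum(diffs)
    mean = r / n
    var = fmean([(d - mean) ** 2 for d in diffs])
    # Two-sided p-value under the normal approximation of R.
    p = math.erfc(abs(r) / math.sqrt(2 * n * var))
    return r, p


def compare(friend_counts):
    """Compare log-normal (model 1) with exponential (model 2)."""
    # Assumption: zero counts are dropped because log(0) is undefined.
    xs = [float(x) for x in friend_counts if x > 0]
    return vuong_test(lognormal_loglik(xs), exponential_loglik(xs))


if __name__ == "__main__":
    # Synthetic friend counts for illustration only, not goSupermodel data.
    rng = random.Random(1)
    sample = [round(rng.lognormvariate(2.0, 0.9)) for _ in range(1000)]
    r, p = compare(sample)
    print(f"R = {r:.1f}, p = {p:.3g}")
```

Extending this to more models means adding one per-point log-likelihood function per candidate and running the test pairwise; discrete models such as the negative binomial would need their probability mass functions rather than continuous densities.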

The results of the inferential analysis showed that the negative binomial, log-normal, exponential and gamma models all provided quantitatively good fits to the empirical distribution of friends, but none of the hypotheses formulated on the basis of these models could be verified, due to (a) rigid evaluation criteria and (b) lack of agreement among the adopted goodness-of-fit tests as to which statistical models were the most probable. The descriptive analysis showed that the friendship network, because it was sampled, differed from the remaining network in its centrality metrics, but it shared some similarity with a Slashdot friend/enemy network and a Gnutella peer-to-peer network in its k-core decomposition and in the strength of the correlations between centrality metrics, measured with Spearman’s ρ and Pearson’s r. The hypotheses posited on the basis of this analysis were all verified, but in many cases the correlations were weak and alternative explanations were plentiful. The qualitative analysis found partial evidence supporting aspects of the quantitative hypotheses, but in many cases the results either conflicted with previous similar research or were inconclusive, owing to poor execution of the interviews and general concerns about reliability and validity.
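The k-core decomposition and the correlation between centrality metrics can be sketched as follows. This is a minimal illustration, not the thesis's code: it assumes an undirected friendship graph stored as a dict mapping each node to a set of neighbours, computes core numbers by the standard peeling procedure (quadratic here for brevity), and computes Spearman's ρ as Pearson's r on ranks, with tied values given their average rank. All function names and the toy graph are hypothetical.

```python
import math
from statistics import fmean


def core_numbers(adj):
    """Core number of every node, found by repeatedly removing a minimum-degree node."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])  # the core level never decreases while peeling
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            result[order[t]] = avg
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson's r between two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman(xs, ys):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))


if __name__ == "__main__":
    # Hypothetical toy graph: triangle a-b-c plus the path a-d-e.
    g = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"},
         "d": {"a", "e"}, "e": {"d"}}
    cores = core_numbers(g)
    nodes = sorted(g)
    degrees = [len(g[v]) for v in nodes]
    print(cores)
    print(spearman(degrees, [cores[v] for v in nodes]))
```

On the toy graph, a, b and c receive core number 2 (the triangle) and d and e receive core number 1. Correlating degree with core number in this way is one example of comparing centrality metrics pairwise.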

Supervisor: Jakob Grue Simonsen, DIKU

Censor: Troels Andreasen, RUC