Evolution of Scientific Citation Networks

Preferential attachment (PA), also known as cumulative advantage or the "rich get richer" effect, is a process in which some form of wealth is distributed among a number of individuals according to how much they already have. In the context of networks, the PA rule defines a simple evolution mechanism: nodes receive new connections with a probability proportional to their current degree (the number of links they have already received).
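The PA rule can be illustrated with a minimal Python sketch. The setup is an assumption made for illustration, not something specified in the text: the network is undirected, it starts from two connected nodes, and each new node brings exactly one link. The function name grow_pa_network is hypothetical.

```python
import random


def grow_pa_network(n_nodes, seed=0):
    """Grow a network by preferential attachment and return the node degrees.

    Assumed setup (illustrative only): two connected seed nodes, and each
    new node attaches to one existing node chosen with probability
    proportional to that node's current degree.
    """
    if n_nodes < 2:
        raise ValueError("n_nodes must be at least 2")
    rng = random.Random(seed)
    degrees = [1, 1]    # the two seed nodes share one link
    endpoints = [0, 1]  # node i appears degrees[i] times in this list
    for new_node in range(2, n_nodes):
        # A uniform draw from the endpoint list is a degree-proportional draw.
        target = rng.choice(endpoints)
        degrees[target] += 1
        degrees.append(1)
        endpoints.extend([target, new_node])
    return degrees


if __name__ == "__main__":
    degrees = grow_pa_network(10_000)
    # Early nodes tend to dominate: the first-mover advantage.
    print("largest degree:", max(degrees))
    print("mean degree of the first 10 nodes:", sum(degrees[:10]) / 10)
```

Keeping a list of link endpoints avoids recomputing the selection probabilities at every step, since each node appears in the list once per link it holds.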

The principal reason for scientific interest in PA is that it can generate power-law distributions. The basic model, however, has important drawbacks. For instance, it predicts a strong relation between a node's age and its degree (the so-called "first-mover advantage"), which plays a fundamental role in the emergence of scale-free topologies in the model. This feature is unrealistic for several real systems. Take the case of citation networks: as scientists well know, an often-cited work typically attracts more citations, but classic treatises by Einstein or Hawking, for example, are no longer cited as they once were; on the other hand, brand-new papers that represent the state of the art of a field can gain popularity very quickly.

In this paper we introduce a network evolution mechanism in which a broad degree distribution does not result from a strong time bias in the system. The model assumes that nodes are endowed with a heterogeneous attractiveness (or fitness) that decays with time, which we name relevance. To support our hypothesis of decaying relevance, we consider citation data provided by the American Physical Society (APS). We find that the empirical relevance of papers indeed decays with time after their publication. Moreover, the distribution of the total relevance (and thus of the initial fitness) is heterogeneous, showing an exponential decay in the tail.

We thus build a model in which decaying, heterogeneous relevance combines with degree to determine the probability that a node receives new links. Using a master equation approach in the continuous-time limit, we show that the expected final degree of a node depends only on its total relevance. If the distribution of total relevance is exponential (as the data indicate), the resulting degree distribution is then a power law, with the same exponent as in the original PA model.
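A rough Python sketch of such a growth process follows. Several details are assumptions made only for illustration and are not taken from the text: the attachment probability is taken as proportional to the product of degree and current relevance, every node starts with degree 1, one node and one link are added per time step, and the names simulate_relevance_model, decay_time and mean_fitness are hypothetical.

```python
import math
import random


def simulate_relevance_model(n_nodes, decay_time=50.0, mean_fitness=1.0, seed=0):
    """Grow a network where link probability ~ degree * decaying relevance.

    Assumptions (illustrative only): relevance of node i at time t is
    fitness[i] * exp(-(t - birth[i]) / decay_time), initial fitness values
    are drawn from an exponential distribution, and each node starts
    with degree 1.
    """
    rng = random.Random(seed)
    birth = [0]
    fitness = [rng.expovariate(1.0 / mean_fitness)]
    degree = [1]
    for t in range(1, n_nodes):
        # Unnormalized attachment weights of all existing nodes.
        weights = [
            degree[i] * fitness[i] * math.exp(-(t - birth[i]) / decay_time)
            for i in range(t)
        ]
        target = rng.choices(range(t), weights=weights, k=1)[0]
        degree[target] += 1
        # Add the new node with its own initial fitness.
        birth.append(t)
        fitness.append(rng.expovariate(1.0 / mean_fitness))
        degree.append(1)
    return degree, fitness


if __name__ == "__main__":
    degree, fitness = simulate_relevance_model(2000)
    # With decaying relevance, high degree should track high initial
    # fitness rather than early arrival.
    top = max(range(len(degree)), key=degree.__getitem__)
    print("most connected node:", top, "degree:", degree[top])
    print("its initial fitness:", round(fitness[top], 2))
```

The weights are recomputed at every step, so the cost grows quadratically with the number of nodes; this is acceptable for a small demonstration but not for large simulations.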

Figure: Time decay of the average relevance of papers, grouped by their final citation count.
Figure: Distribution of the total relevance in the APS data.
Figure: Simulation results of the model with exponentially decaying relevance and initial values drawn from an exponential distribution.

Besides shedding light on the formation process of citation networks, the model can be used to assess the metrics typically used to measure the scientific impact of individual researchers. To this end, we use bibliometric data artificially generated through a model of citation dynamics based on the relevance mechanism and calibrated on empirical data. Such a controlled setup avoids the biases present in real databases and allows us to assess which aspects of the model dynamics, and which traits of individual researchers, a particular indicator actually reflects. We find that the simple average citation count of a researcher's papers captures the intrinsic scientific ability of researchers well, whatever the length of their career. On the other hand, when productivity complements ability in the evaluation process, the well-known H and G indices reveal their potential, yet their normalized variants do not always yield a fair comparison between researchers at different career stages. Notably, the use of logarithmic units for citation counts allows us to build simple indicators that perform as well as H and G.
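For reference, the indicators named above can be computed from a researcher's list of per-paper citation counts as in the following Python sketch. It uses the standard textbook definitions of the H and G indices (with G capped at the number of papers); the function names and the sample data are illustrative assumptions, and the normalized and logarithmic variants mentioned in the text are not reproduced because their exact form is not given here.

```python
def average_citations(citations):
    """Mean number of citations per paper (0.0 for an empty record)."""
    return sum(citations) / len(citations) if citations else 0.0


def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the top g papers total at least g**2 citations.

    In this sketch g cannot exceed the number of papers.
    """
    ranked = sorted(citations, reverse=True)
    total = 0
    g = 0
    for rank, count in enumerate(ranked, start=1):
        total += count
        if total >= rank * rank:
            g = rank
    return g


if __name__ == "__main__":
    record = [10, 8, 5, 4, 3]  # made-up citation counts for one researcher
    print(average_citations(record), h_index(record), g_index(record))
```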