Validation of Economic Networks

Complex networks typically display patterns that involve multiple nodes, such as motifs, hierarchies and communities. These structures can be detected as the properties of the empirical system that deviate from a benchmark model. Maximum entropy models, in particular, provide appropriate null models by constraining some properties of the network and randomising everything else. In this way, they embody a null hypothesis that these constraints can explain any other property of the system. To clarify the concept, take for example a maximum entropy null model defined by constraining property X observed in the empirical network. If the model also reproduces property Y then we know right away that Y is just a statistical consequence of X. Otherwise if Y is not reproduced, the null hypothesis is rejected and we can statistically validate Y as a salient feature of the network that is independent of X. This also implies that the empirical network is out of the statistical equilibrium of the maximum entropy ensemble defined by X.

Statistical validation is particularly useful in the context of bipartite networks, where nodes are divided into two sets such that links exist only between nodes of different sets. This is the natural representation for many systems, such as a network of individuals connected with the groups they are affiliated with, or a network of financial institutions and the assets that form their portfolios. In these systems, the simplest way to determine the similarity of two nodes of the same set is to project the network and count how many common neighbors these nodes have in the other set. However, this measure is easily influenced by single-node variables; for instance, nodes that have high degree (many connections) in the bipartite network are likely to have more common neighbors than low-degree nodes.

To be more concrete, let's consider the network of five portfolios (P) and four asset classes (C) drawn in the figure. Portfolio P1 has two common neighbors with portfolio P2 and just one common neighbor with either P3 or P4, so the naive conclusion is that P1 has the highest overlap with P2. However, we note that P1 owns two asset classes while P2 owns all four of them, hence the overlap P1-P2 is always two (whatever the asset classes owned by P1). On the other hand, the overlap of P1 with a single-asset portfolio can be one, as in the case of P3, or zero, as for P4. Since P1 owns two of the four asset classes, a single asset class picked at random belongs to P1 with probability 2/4, so the expected value of such an overlap is 1/2, which is smaller than the observed overlap P1-P3. So which overlap, P1-P2 or P1-P3, is more "significant"?
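The counting argument above can be checked with a short sketch. It assumes a deliberately simple null model, which is not the BiCM: each portfolio keeps its exact number of asset classes, and those classes are drawn uniformly at random, so the overlap between two portfolios is hypergeometric. The function names are illustrative only.

```python
from math import comb

def overlap_pmf(n_assets, k_a, k_b, v):
    """P(overlap = v) when two portfolios hold k_a and k_b of n_assets
    asset classes, each chosen uniformly at random (hypergeometric)."""
    if v < max(0, k_a + k_b - n_assets) or v > min(k_a, k_b):
        return 0.0
    return comb(k_a, v) * comb(n_assets - k_a, k_b - v) / comb(n_assets, k_b)

def expected_overlap(n_assets, k_a, k_b):
    """Mean of the hypergeometric overlap distribution."""
    return k_a * k_b / n_assets

def p_value(n_assets, k_a, k_b, observed):
    """P(overlap >= observed): small values flag a significant overlap."""
    upper = min(k_a, k_b)
    return sum(overlap_pmf(n_assets, k_a, k_b, v)
               for v in range(observed, upper + 1))

N_ASSETS = 4  # four asset classes, as in the example above

# P1 (two asset classes) vs P2 (all four): observed overlap 2
print("P1-P2", expected_overlap(N_ASSETS, 2, 4), p_value(N_ASSETS, 2, 4, 2))

# P1 (two asset classes) vs P3 (one asset class): observed overlap 1
print("P1-P3", expected_overlap(N_ASSETS, 2, 1), p_value(N_ASSETS, 2, 1, 1))
```

Under this simplified null model the P1-P2 overlap of two is certain (p-value 1), while the P1-P3 overlap of one occurs only half of the time (p-value 1/2). The smaller raw overlap is therefore the less expected one, although with only four asset classes neither value would pass a usual significance threshold.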

To answer this question we can build on the maximum entropy model that constrains node degrees for both sets of the bipartite network, known as the Bipartite Configuration Model (BiCM). In this paper we showed how to use the BiCM to numerically compute the null probability distribution of the network projection, which makes it possible to assess the statistical significance of the similarity values between nodes of the same set that are observed in a real network.
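As a rough illustration of how such a null distribution can be evaluated, the sketch below relies on one property of the BiCM: links are independent, each with its own probability. The number of common neighbors of two same-set nodes is then a sum of independent Bernoulli variables, one per node of the other set, with success probability equal to the product of the two link probabilities (a Poisson-binomial distribution). The link probabilities used here are hypothetical placeholders; in practice they would come from fitting the BiCM to the observed degrees, a step that is not shown. Function names are illustrative only.

```python
def poisson_binomial_pmf(probs):
    """Distribution of a sum of independent Bernoulli(p) variables,
    built one variable at a time by dynamic programming."""
    pmf = [1.0]  # with no variables, the sum is 0 with certainty
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for v, mass in enumerate(pmf):
            nxt[v] += mass * (1.0 - p)   # this variable contributes 0
            nxt[v + 1] += mass * p       # this variable contributes 1
        pmf = nxt
    return pmf

def overlap_p_value(p_i, p_j, observed):
    """P(overlap >= observed) for two same-set nodes, given their link
    probabilities towards each node of the other set."""
    pair_probs = [a * b for a, b in zip(p_i, p_j)]
    pmf = poisson_binomial_pmf(pair_probs)
    return sum(pmf[observed:])

# Hypothetical link probabilities towards four asset classes
# (assumed values, NOT fitted to any real data).
p_i = [0.5, 0.5, 0.5, 0.5]      # expected degree 2
p_j = [0.25, 0.25, 0.25, 0.25]  # expected degree 1

print(overlap_p_value(p_i, p_j, observed=1))
```

An observed overlap would then be retained in the validated projection when its p-value falls below the chosen significance level.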

Example bipartite network of portfolios and asset classes, together with its projection onto the set of portfolios.

Portfolio overlaps and systemic risk

In the same paper we applied this method to study the trends of common asset holdings by financial institutions in the period 1999–2013. Portfolio overlap represents an important channel for financial contagion, according to the following mechanism. If one fund is in trouble, it will sell some of its assets, causing their devaluation and therefore losses for other funds that had invested in the same assets. The larger the portfolio overlap, the higher these losses, which may cause these funds to sell their assets in turn, and so on. Such a mechanism has the potential to trigger fire sales and severe losses at the systemic level. Indeed, we found that the proportion of significant overlaps increased steadily before the global financial crisis of 2007–2008 and reached a maximum when the crisis occurred, implying that systemic risk from fire-sale liquidation was maximal at that time. We further showed that market trends tend to be amplified in the portfolios with validated overlaps, so that validated overlaps provide an informative signal about which institutions are about to suffer (enjoy) the most significant losses (gains).

Average degree of institutions in the validated network as a function of time. The vertical line corresponds to the date on which we observe the maximum total market value in the dataset, just before prices started to fall during the financial crisis.

Economic Complexity, relatedness and the Innovation System

In this other paper we applied similar ideas in another context, namely to bipartite networks of countries' economies and the various activities (scientific publication, patenting, and industrial production in different sectors) in which they are proficient. We put together and project different datasets to build a holistic view of the innovation system as a multilayer network of interactions among these activities. In the field of Economic Complexity, the common neighbors of two activities take the name of co-occurrences and are used as a proxy for the overlap between the capabilities required to achieve a competitive advantage in both activities. In this situation the BiCM-induced null model represents the hypothesis that activities are independent and there is no capability structure behind the network, so that co-occurrences happen at random, some being more likely than others just because of the ubiquity of activities and the diversification of countries. We showed that significant co-occurrences make it possible to identify which capabilities and prerequisites are needed to be competitive in a given activity, and even to measure how much time is needed to transform, for instance, technological know-how into economic wealth and scientific innovation, which enables predictions over a very long time horizon.

Schematic construction of the innovation system as a multilayer network of co-occurrences between activities, obtained from projecting the bipartite country-activity networks.

Naturally, statistical validation has some intrinsic degrees of freedom, such as the formulation of the null model and the significance level employed. Notably, in this paper we proposed a meta-validation approach that makes it possible to identify model-specific significance thresholds for which the signal is strongest, and at the same time to obtain results that are independent of the way in which the null hypothesis is formulated.

Resources