posted on December 14, 2012 11:02
What is network science?
Network science is an emerging discipline that uses tools from pure mathematics (such as graph theory, matrix algebra and Markov processes) and from statistics to investigate relationships between entities. The basic idea is that many systems can be described in terms of interacting parts; a network representation of a system breaks it down into the entities themselves, called “nodes”, and the pairwise relationships between nodes, called “edges” or “arcs”. This tool is extremely general. For example, the physical internet can be represented by a network whose nodes are computers and whose edges are the cables connecting them to each other; an ecosystem can be modeled by another network, with species as nodes and feeding patterns as edges; a group of humans (or even society at large) by networks that have individuals as nodes and various kinds of social and economic relationships as edges. All network-type representations emphasize patterns of interaction; their mathematical underpinning makes it possible to build models capable of capturing sophisticated system behavior starting from very simple building blocks.
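To make the idea of nodes and edges concrete, here is a minimal Python sketch of the ecosystem example. The species and feeding links are made up for illustration, and the function names `build_adjacency` and `in_degree` are my own; this is one simple way to store a network, not the only one.

```python
from collections import defaultdict

# Hypothetical toy food web: each pair (a, b) is an edge meaning "a feeds on b".
edges = [
    ("fox", "rabbit"),
    ("fox", "mouse"),
    ("owl", "mouse"),
    ("rabbit", "grass"),
    ("mouse", "grass"),
]


def build_adjacency(edge_list):
    """Map every node to the set of nodes it points to."""
    adjacency = defaultdict(set)
    for source, target in edge_list:
        adjacency[source].add(target)
        adjacency[target]  # touch the target so nodes with no outgoing edge still appear
    return dict(adjacency)


def in_degree(adjacency):
    """Count, for each node, how many edges point at it."""
    counts = {node: 0 for node in adjacency}
    for targets in adjacency.values():
        for target in targets:
            counts[target] += 1
    return counts


food_web = build_adjacency(edges)
eaten_by = in_degree(food_web)  # how many species feed on each node
```

Even on five edges the representation already answers a structural question: counting incoming edges shows which species the others depend on most.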
Network science has assembled itself in a comparatively short time, about 15 years. At the time of writing, it is undergoing very fast growth in publications, applications, tools for empirical analysis and number of practitioners. This happens at a time in which the sudden availability of large datasets for cheap or free is pushing previously “soft” sciences like sociology, psychology and (to a lesser extent) economics toward an increasingly quantitative approach. Network science is being used extensively by all these disciplines (as well as by physics, biology and computer science), and looks set to play an important role in both natural and social sciences for the coming decade.
Why does it matter for public policy?
Many public policy problems involve information flows, collaboration patterns, financial plumbing etc. that lend themselves well to being looked at in terms of relationships (long-term) or interactions (short-term) between entities. Networks allow the modeler to place the pattern of such relationships at center stage. Given that the actors of public policy live in the social world (as opposed to the natural science world), social networks are especially important. For example, every time we consider co-production of public services, networks are a natural candidate for modeling it. Can health care costs be reduced by increasing the role of patients in diagnosis and treatment? That’s a social network problem. Can public projects be evaluated by members of the public? Social network problem.
So far, network science has been mainly an analytical tool for policy makers – and even applied analysis is still in its infancy. Nevertheless, the first applications are beginning to emerge.
The development policy world is currently exploring an approach to economic development in terms of networks of products (Hidalgo and Hausmann, 2009).
The idea is to rethink the production function: from something that accepts money and labor as inputs and turns them into money, to something that produces all kinds of goods and services starting from all kinds of material and immaterial resources. This is formalized as a bipartite network (resources going into products, with some of the products in turn being resources for other products). This network is then analyzed in several ways to infer measures of fitness of a local economy (associated with diversity) and to predict patterns of development (products for which most of the resources are already available are better candidates, as new industries, than products for which most resources are not produced locally). The United Nations Development Programme is one of the institutions using this approach to guide policy.
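The "better candidates" logic can be illustrated with a toy Python sketch. To be clear, this is not Hidalgo and Hausmann's actual method: the products, resources and the `coverage` score below are hypothetical, and the sketch ignores the fact that products can themselves be resources for other products. It only shows how a resource-to-product mapping can be used to rank candidate industries.

```python
# Hypothetical data: the resources each product requires. Resources are one
# side of the bipartite network, products are the other.
requirements = {
    "bread": {"wheat", "ovens", "labor"},
    "beer": {"barley", "water", "labor"},
    "bicycles": {"steel", "rubber", "welding", "labor"},
}

# Hypothetical set of resources already available in the local economy.
local_resources = {"wheat", "water", "labor", "ovens", "steel"}


def coverage(required, available):
    """Share of a product's required resources that are already available."""
    return len(required & available) / len(required)


def rank_candidates(requirements, available):
    """Sort products from best to worst covered by local resources."""
    scores = {
        product: coverage(required, available)
        for product, required in requirements.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank_candidates(requirements, local_resources)
for product, score in ranking:
    print(f"{product}: {score:.2f} of required resources available")
```

With these made-up numbers, bread comes out ahead of beer, and beer ahead of bicycles, simply because more of what each one needs is already in place.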
Social network analysis can be used to monitor projects that involve online collaboration between citizens and institutions. This supposedly yields insights on how to design collaboration environments that are scalable (they don’t collapse under their own weight as participation increases), and sustainable (after bootstrapping, the collaborative process keeps going with only small inputs of additional work by the institutions themselves). The Council of Europe has prototyped formal monitoring by social network analysis in its Edgeryders project in 2012 (Designing collective intelligence – video, 15 mins).
Finally, the European University Institute is developing a social experiment in which a microfinance program in Eastern Ghana would be augmented by an online social network of entrepreneurs. The hypothesis being tested is that the impact of the availability of finance on economic development can be greatly augmented by the availability of information in the form of social models and peer-to-peer support.
Existing tools and what’s missing
Tools for network analysis exist, and keep getting better. The main general-purpose desktop applications are Pajek, Gephi and Tulip; all three are supported by open source communities of developers. Gephi and Tulip share a plugin-based architecture, which makes them quite flexible. For example, Gephi has a streaming plugin, so that the program can be run off a server and fed real-time data over the Internet; most modules in Gephi also come as a Java library for easier reuse. Other tools are more specific, like NetLogo for simulation or CFinder for community detection.
These tools have vastly democratized network analysis, and allowed it to tackle much larger networks than was possible in the pioneering phase. However, some features are found lacking when social scientists try to use them for policy-oriented work. I can point to at least two.
- the time dimension is typically treated only at the descriptive level. The origin of network analysis in pure mathematics has led to prioritizing static analysis of a graph as a pure mathematical object. For example, many network metrics are scale-dependent: they cannot be used to compare networks of different sizes, and therefore they cannot be used to compare a growing network to its own past. This is a clear limitation for economists and sociologists operating in a big data landscape, who are typically interested in gauging the evolution of networks that change over time. More research is needed to develop indicators that explicitly incorporate the time dimension; compute them with efficient algorithms; and develop them into software modules.
- the treatment of probability is also a problem. Econometricians tend to build forecasts by applying probability theory to their data. In the context of a network, that would mean modeling the probability that node i forms a link to node j at time t, conditional on that link having been formed before. It is not clear that this is a computationally tractable problem; in order to bring it under control, efficient algorithms must be invented.
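To see what scale dependence (the first point above) means in practice, consider density, the share of possible links that are actually present. The Python sketch below is only an illustration under a simplifying assumption of mine: the network is a ring in which every node keeps exactly two neighbours as it grows. The function names are hypothetical.

```python
def ring_edges(n_nodes):
    """Edges of a ring network: each node is linked to its two neighbours."""
    return [(i, (i + 1) % n_nodes) for i in range(n_nodes)]


def density(n_nodes, n_edges):
    """Share of all possible undirected links that are actually present."""
    possible = n_nodes * (n_nodes - 1) / 2
    return n_edges / possible


def mean_degree(n_nodes, n_edges):
    """Average number of links per node in an undirected network."""
    return 2 * n_edges / n_nodes


for n in (10, 100, 1000):
    m = len(ring_edges(n))
    print(n, round(density(n, m), 4), mean_degree(n, m))
```

Every node behaves identically at all three sizes (the mean degree stays at 2), yet density keeps falling as the network grows. Read naively, a time series of density would suggest the network is "thinning out" when nothing about individual behavior has changed.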
A final note. The European Center for Living Technology at the University of Venice has launched Masters of Networks, a workshop for policy makers and network scientists to get together and try to attack some public policy problems using the knowledge of both communities. If you care about this, you might consider participating.