Thursday, July 27, 2017

Putting some parameters on what we don't yet know

From Measuring Social Connectedness by Michael Bailey, Ruiqing (Rachel) Cao, Theresa Kuchler, Johannes Stroebel, and Arlene Wong. The abstract:
We introduce a new measure of social connectedness between U.S. county-pairs, as well as between U.S. counties and foreign countries. Our measure, which we call the "Social Connectedness Index" (SCI), is based on the number of friendship links on Facebook, the world's largest online social networking service. Within the U.S., social connectedness is strongly decreasing in geographic distance between counties: for the population of the average county, 62.8% of friends live within 100 miles. The populations of counties with more geographically dispersed social networks are generally richer, more educated, and have a higher life expectancy. Region-pairs that are more socially connected have higher trade flows, even after controlling for geographic distance and the similarity of regions along other economic and demographic measures. Higher social connectedness is also associated with more cross-county migration and patent citations. Social connectedness between U.S. counties and foreign countries is correlated with past migration patterns, with social connectedness decaying in the time since the primary migration wave from that country. Trade with foreign countries is also strongly related to social connectedness. These results suggest that the SCI captures an important role of social networks in facilitating both economic and social interactions. Our findings also highlight the potential for the SCI to mitigate the measurement challenges that pervade empirical research on the role of social interactions across the social sciences.
An interesting step forward in measuring and understanding social networks and their consequences.

This is not a criticism of the researchers' approach but a matter of curiosity.
How many people, in a personal capacity, use how many social networking sites?

How do they use those sites and for what purpose and duration?

What is the measured correlation for those individuals between the networks on the different social networking platforms they use?
For example, I use Facebook, LinkedIn, Twitter, and Pinterest. Pinterest I use only to store images, not as a social networking platform. LinkedIn I use almost solely to keep track of colleagues with only the lightest of social networking engagement. Twitter I use for information access. Facebook I use to keep track of family and friends from my youth when I was growing up overseas. There would be almost no networking element to Pinterest or Twitter. The degree of network overlap between my LinkedIn network and my Facebook network would be minimal.
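One simple way to put a number on "degree of network overlap" is the Jaccard index: the contacts two networks share, divided by all distinct contacts across both. The sketch below is only an illustration of that idea, not anything Bailey et al. compute. The function name and the contact lists are hypothetical, and it assumes the same person can be matched across platforms, which in practice is the hard part.

```python
def jaccard_overlap(network_a, network_b):
    """Share of all distinct contacts that appear in both networks (0.0 to 1.0)."""
    a, b = set(network_a), set(network_b)
    union = a | b
    if not union:
        # Two empty networks: define the overlap as zero rather than divide by zero.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical contact lists, invented purely for illustration.
facebook = {"ana", "ben", "carla", "dev", "emma"}
linkedin = {"dev", "farid", "grace", "hui", "ivan"}

overlap = jaccard_overlap(facebook, linkedin)
print(f"Facebook/LinkedIn overlap: {overlap:.2f}")
```

A value near zero, as here, would mean that a connectedness index built from one platform says little about the network a person maintains on the other.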

I think what the researchers have done is interesting, but I also think there are many more considerations to take into account.

Tyler Cowen summarizes the key findings as:
1. For the population of the average county, 62.8% of friends live within 100 miles.

2. Over distances of less than 200 miles, the elasticity of friends to distance is about -2.0, and about -1.2 for distances greater than 200 miles.

3. Conditional on distance, social connectedness is significantly stronger within state lines.

4. “Counties with a higher social capital index have less geographically concentrated social networks.”

5. Social connectedness predicts trade flows, even after controlling for distance, and it also predicts patent citations.
All interesting but I am not sure what it tells us that we don't already know. Each one of these observations can be argued into a pretzel. For example, "Social connectedness predicts trade flows" might as easily be "Trade flow predicts social connectedness," i.e. the flow of causation might be the reverse of that implied. You do business and then you create relationships.
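The reverse-causation point can be seen in the arithmetic itself. A correlation coefficient is symmetric, so it is the same number whichever variable is treated as the "predictor." The sketch below uses made-up numbers, not data from the paper, and a hand-written Pearson function to show that the statistic alone cannot say which way the causation runs.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical region-pair values, invented for illustration only.
connectedness = [1.0, 2.0, 3.0, 4.0, 5.0]
trade_flow = [2.1, 3.9, 6.2, 8.1, 9.8]

r_forward = pearson(connectedness, trade_flow)  # "connectedness predicts trade"
r_reverse = pearson(trade_flow, connectedness)  # "trade predicts connectedness"
print(r_forward, r_reverse)
```

Both calls return the same value, so a strong association is equally compatible with either story, or with a third factor driving both.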

Personally, I am also concerned about the county-based approach to the analysis. After the election in 2016, there was a slew of maps looking at voting patterns by county, and that is useful to an extent. The drawback is that counties vary enormously in population, from a few dozen people to nearly ten million (Los Angeles). By looking only at county-level voting, you end up with a map that is a vast sea of red (Republicans) and a few small lakes of blue (Democrats). It is an interesting and useful perspective up to a point, but it is not a complete perspective when the election was decided within a couple of points.
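A toy calculation shows how far the county count and the population-weighted result can diverge. The counties, populations, and vote shares below are invented for illustration, and the sketch assumes for simplicity that every resident votes.

```python
# Hypothetical counties: (name, population, share voting for party A).
# All numbers are invented; turnout is assumed equal to population.
counties = [
    ("Rural 1", 5_000, 0.70),
    ("Rural 2", 8_000, 0.65),
    ("Rural 3", 12_000, 0.60),
    ("Rural 4", 20_000, 0.58),
    ("Metro", 900_000, 0.35),
]

# County-level view: how many counties does party A carry?
counties_won_by_a = sum(1 for _, _, share in counties if share > 0.5)

# Population-weighted view: what share of all voters chose party A?
total_population = sum(pop for _, pop, _ in counties)
weighted_share_a = sum(pop * share for _, pop, share in counties) / total_population

print(f"Party A wins {counties_won_by_a} of {len(counties)} counties")
print(f"Party A share of all voters: {weighted_share_a:.1%}")
```

In this made-up example party A carries four of five counties while winning only a little over a third of the voters. Any county-pair measure, including a connectedness index, carries the same risk of treating very unequal units as equals.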

Kudos to Bailey et al. for focusing on the measurement of networks, but I think we are at the very beginning of an immense field of inquiry, and all early findings will be just a matter of sketching the terrain for later detailed exploration.

Thanatos Savehn (in the comments) puts it more bluntly.
Actually, you can specify your model first and then run a test to generate some data. If the data fits well you’re discovered something (e.g. ) Sadly, only risk takers will look into an urn full of hypotheses, each with a prior probability of being true of 0.001 or less, pull out a promisingly interesting one and risk his reputation, time and future income by testing it. And most academics are by nature not risk takers. So instead they find some data, put their creative abilities into piecing it all together into a coherent and marketable (helpful, clever, weird, PC, etc.) narrative and sell it to the gullible public.
