Big Data: Big Problem

Big Data, Big Obstacles:

After decades of fretting over declining response rates to traditional surveys (the mainstay of 20th-century social research), an exciting new era would appear to be dawning thanks to the rise of big data. Social contagion can be studied by scraping Twitter feeds; peer effects are tested on Facebook; long-term trends in inequality and mobility can be assessed by linking tax records across years and generations; social-psychology experiments can be run on Amazon’s Mechanical Turk service; and cultural change can be mapped by studying the rise and fall of specific Google search terms. In many ways there has been no better time to be a scholar in sociology, political science, economics, or related fields.

However, what should be an opportunity for social science is now threatened by a three-headed monster of privatization, amateurization, and Balkanization. A coordinated public effort is needed to overcome all of these obstacles.
This article was written by Dalton Conley, the author of the Intro to Sociology textbook I use and one of my favorite sociologists. It's also signed by several other prominent social scientists.

But their point, and it's something that has been bugging me for a while now, is that as more and more social media users demand privacy, their activities increasingly become the exclusive province of the service they are using. And social scientists have no way to access this very important data for social scientific research.

Unlike the old days Conley mentions, when private companies like Bell Labs were eager to share their data with academics in order to understand their users, markets and culture better, most social media companies today take an arrogant (if not ignorant) proprietary view of their "in-house" information. Or in Silicon Valley geek/nerd-speak, "Bruh no way."
Although some data can be culled from the web—Twitter feeds and Google searches—other data sit behind proprietary firewalls. And as individual users tune up their privacy settings, the typical university or independent researcher is increasingly locked out. Unlike federally funded studies, there is no mandate for Yahoo or Alibaba to make its data publicly available. The result, we fear, is a two-tiered system of research. Scientists working for or with big Internet companies will feast on humongous data sets—and even conduct experiments—and scholars who do not work in Silicon Valley (or Alley) will be left with proverbial scraps.
Which is a polite way of saying that unless you're willing to jump in bed with these social media companies and submit to their terms and conditions regarding data, you're screwed.
Today, public investment in science is waning as federal budgets are cut and states do not fill in the funding gaps for their flagship research universities. Meanwhile, the average corporation has been transformed by the shareholder-value revolution to be much more concerned with short-term profits and thus increasingly oriented away from basic research. Social science conducted at Foursquare or Yahoo typically must serve the bottom line.

Hand in hand with the privatization of data is the amateurization of their analysis. Does time on Facebook really make us more depressed, as one recent study has claimed? Well, maybe. But perhaps it is just that depressed people spend more time alone, logging on in their dark rooms. Trained social scientists are needed to deal with such big-data pitfalls as reverse causality, or others such as unobserved heterogeneity, sample-selection issues, aggregation bias, or spatial or temporal autocorrelation.
Worse, the "social scientists" employed by big social media companies like Facebook tend to be the very definition of amateurishness and shoddy methodology.
Currently, many firms employ some well-trained social and behavioral scientists free to pursue their own research; likewise, some companies have programs by which scholars can apply to be in residence or work with their data extramurally. However, as Facebook states, its program is "by invitation only and requires an internal Facebook champion."
LOL. Go Facebook! Maybe that explains the sinister disaster they perpetrated on their users last summer, when their "Emotional Manipulation Study" was run by untrained dopes.

As long as your results support Facebook (or fill in the blank), you're good to go. Otherwise, no soup for you.
To be clear, we are not advocating the abandonment of nationally representative, long-running scientific treasures like the Panel Study of Income Dynamics, the National Election Study, or the General Social Survey; we think connecting such studies to other, novel forms of data only serves to strengthen them. We are not naïve about other perils of social science in the era of big data, including privacy breaches, but we are certain that such disasters (and others) are more likely to befall us if social scientists are not active participants in the big-data revolution.
You can add my name to the list of signatories if you wish.

