Neal D. Goldstein, PhD, MBI

About | Blog | Books | CV | Data | Lab

Feb 7, 2015

Civic Hacking for Public Health

A friend of mine has recently become involved with a movement known as "civic hacking." For the uninitiated, me included, one definition posted by Code for America is "Collaborating with others to create, build, and invent open source solutions using publicly-released data, code, and technology to solve challenges relevant to our neighborhoods, our cities, our states, and our country."

After reading this it occurred to me: that's what we do (and have been doing) in public health. Granted, the term may be new and trendy, but the concept of collaboration in science is well established. For example, see this article I coauthored on collaboration in autism research. Another example is this "hackathon" event to develop graphical resources for infectious disease epidemiology. The end product of a collaboration may be a publication or presentation (particularly in academia), or some kind of tool or resource, with the intention of moving the field forward. Often the code used in the analysis (particularly in a methodologically demanding piece) can be obtained from the authors or as an online appendix to benefit others working on similar problems. And sometimes, perhaps not all that rarely, the research may be self-serving (and just as valuable).

Regardless of whether you're doing this for altruistic or individual reasons, there are some great minds working on real public health problems that at the end of the day may very well impact you. As a public health professional, most likely you have already personally worked with publicly released data (e.g., NHANES) or have used an existing tool that mines big data (e.g., Google Flu Trends; now defunct). For the non-public health professional, the knowledge that these public data exist and are freely available may be the impetus to begin working in this field.

As I began to brainstorm the possibilities using my friend's coding expertise and my knowledge of public health, I realized there is a plethora of public sources of health and health-related data that we can tap into at no cost. Further, the broader population of civic hackers may not be aware of these data sources. Some of these need to be downloaded as stand-alone datasets; others provide an API for querying them on the fly. I've previously written about geocoding and mapping to census geographies in the journal Epidemiology (R code available for download as an eAppendix to both articles), perhaps one example of epidemiologic civic hacking (wow, that's a mouthful). I also intend to write a blog post here in the future that is more of an all-encompassing how-to guide, building on the methods detailed in each of the two research letters I link to above.
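For sources that offer an API, a common pattern is to request JSON over HTTP and reshape it into records for analysis. The Python sketch below assumes a response laid out as an array of arrays with the column names in the first row (the layout used by, for example, the Census Bureau's Data API). The helper names `rows_to_records` and `fetch_records` are hypothetical, no specific endpoint is assumed, and the sample response is made up.

```python
import json
from urllib.request import urlopen


def rows_to_records(rows):
    """Convert a JSON array-of-arrays (first row = column names) into dicts."""
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def fetch_records(url):
    """Fetch a JSON array-of-arrays from `url` and return a list of dicts.

    Assumes the endpoint returns the header-row layout described above;
    the caller supplies the URL (hypothetical helper, no endpoint assumed).
    """
    with urlopen(url) as response:
        return rows_to_records(json.load(response))


# Offline example with a hypothetical response (values are made up).
sample = [
    ["NAME", "population", "state"],
    ["Example County", "1000", "99"],
]
print(rows_to_records(sample))
```

Keeping the parsing separate from the network call makes it easy to test against a saved response before pointing it at a live service.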

Therefore, this post serves as my initial list of these data (at least the major sources), and while I'm using it to keep track for my own needs, I thought this list might benefit other civic hackers. This is intended as a starting point, or for generating ideas, and probably won't be maintained over time (unless there is demand). The categories are for convenience and are not mutually exclusive. Please also note that many of these data use sophisticated sampling methodologies that must be taken into account for valid analysis and inference; when in doubt, consult an epidemiologist or biostatistician. There may also be IRB requirements, so contact the owners of these data for their specific data-use requirements.
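To see why the sampling design matters, consider a survey such as NHANES, which assigns each respondent a sampling weight reflecting how many people in the population that respondent represents. A minimal Python sketch with made-up data, comparing a naive proportion with a weighted prevalence estimate, sum(w_i * y_i) / sum(w_i). The function name `weighted_prevalence` is hypothetical, and this covers the point estimate only: valid standard errors and confidence intervals also require the design's strata and clustering.

```python
def weighted_prevalence(outcomes, weights):
    """Design-weighted proportion: sum(w_i * y_i) / sum(w_i)."""
    if len(outcomes) != len(weights):
        raise ValueError("outcomes and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * y for y, w in zip(outcomes, weights)) / total


# Hypothetical respondents: 1 = has the condition, 0 = does not.
outcomes = [1, 0, 0, 1]
weights = [500.0, 1500.0, 1500.0, 500.0]  # made-up sampling weights

naive = sum(outcomes) / len(outcomes)
weighted = weighted_prevalence(outcomes, weights)
print(naive, weighted)  # 0.5 0.25
```

In this toy example the respondents with the condition were over-sampled (each represents 500 people versus 1,500), so ignoring the weights doubles the estimated prevalence.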

Lastly, if you're interested in collaborating on something, let's talk.

Clinical Trials


Census Bureau

  • American FactFinder: Population, housing, economic and geographic information aggregated by census geographies.
  • TIGER products: Geographical boundary data that can readily be linked to other census information.
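Linking across these products is typically done through the GEOID, a concatenation of FIPS codes; for a census tract it is the 2-digit state code, the 3-digit county code, and the 6-digit tract code, 11 characters in all. A minimal Python sketch, using hypothetical geocoded case records and hypothetical tract populations, that builds tract GEOIDs and computes a crude rate per 1,000 residents (`tract_geoid` is a hypothetical helper name):

```python
from collections import defaultdict


def tract_geoid(state_fips, county_fips, tract_code):
    """Build an 11-character tract GEOID: state (2) + county (3) + tract (6)."""
    return f"{int(state_fips):02d}{int(county_fips):03d}{int(tract_code):06d}"


# Hypothetical geocoded cases: (state FIPS, county FIPS, tract code).
cases = [
    ("42", "101", "000100"),
    ("42", "101", "000100"),
    ("42", "101", "000200"),
]

# Hypothetical tract populations keyed by GEOID (e.g. from a census table).
population = {"42101000100": 2000, "42101000200": 4000}

# Count cases per tract GEOID.
case_counts = defaultdict(int)
for state, county, tract in cases:
    case_counts[tract_geoid(state, county, tract)] += 1

# Crude rate per 1,000 residents for every tract with a population count.
rates_per_1000 = {
    geoid: 1000 * case_counts.get(geoid, 0) / pop
    for geoid, pop in population.items()
}
print(rates_per_1000)  # {'42101000100': 1.0, '42101000200': 0.25}
```

Zero-padding each component is what makes the join work: a county code stored as the integer 3 must become "003" before it will match the census identifier.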

Population-based Health Surveys


  • Public APIs: API list for access to a variety of data and programming tools.

Cite: Goldstein ND. Civic Hacking for Public Health. Feb 7, 2015. DOI: 10.17918/goldsteinepi.
