This Is Big: California’s Open Data Portal

While driving to Sacramento last week for the official unveiling of California’s new health Open Data Portal, I found myself thinking about the scientists who first gathered at the University of California, Santa Cruz, in 1985 to discuss sequencing the human genome. That was almost 30 years ago, and they clearly knew they were on to something big, but they couldn’t have known how big.

We now know that the results — which are continuously unfolding — have so far helped us understand diseases like cancer, produce new medicines, and make personalized health care a reality. Long before the term “big data” was coined, the genome project opened a door to a future where almost unlimited quantities of information exist.

But information has to be accessible and usable, and that is the purpose of the Open Data Portal. It makes information already collected by the government available to the public and provides tools to visualize and download it. This means that massive quantities of valuable information, some of which had been located in paper files and some in electronic formats that were difficult to find and use, are now open to the public and free for anyone to use. By releasing only data that are aggregated and de-identified, the portal also serves to protect privacy.

At the outset, the portal includes statistics from the California Department of Public Health (CDPH), including birth profiles, poverty rates, West Nile virus prevalence, ER asthma hospitalizations by ZIP code and age, and the locations of thousands of health care facilities, such as hospitals, clinics, and nursing homes.

As time goes on, the portal will offer more and more data sets. Soon data from other departments in the California Health and Human Services Agency, including the Office of Statewide Health Planning and Development (OSHPD), will be added. We will have easier access to information on how well our health care system and individual facilities are working. Eventually, the portal may allow us to spot problems in the system before they grow into full-blown crises.

Anyone who doubts the utility of transparency has only to remember what happened a few years back when California counties began publicly releasing health and safety code information about restaurants. In many counties, the data were turned into letter grades that restaurants were required to post prominently. Entrepreneurial types created maps. The knowledge had an immediate impact on people’s eating habits, and one study showed a measurable reduction in food-borne illnesses in Los Angeles. Score one for freeing the data.

Crucially, the portal’s resources will enable us to track and understand disparities in health across our large and diverse state. This is a less traditional use of data, and one that echoes CHCF’s core mission of helping improve health and health care for all Californians. Understanding disparities (by gender, age, ethnicity, income, and more) requires robust data collection and the ability to analyze the data in multiple ways. We will be able to look beyond care in hospitals and clinics to examine influences on health such as environment, geography, and social factors. When we come up with programs to enhance some aspect of health, public data analysis, distribution, and visualization will show us whether we’re making a dent in the trends.

So I am excited about the launch of California’s Open Data Portal. Like the genome scientists who came together in 1985, we’re here at the start of something big. I foresee researchers, policymakers, clinicians, and students using it to explore gaps in health and health care. We can be sure that entrepreneurs and health technology experts will use the data to create mobile apps to help solve current problems and ones we haven’t yet identified.

Many data geeks are lining up to use the portal already, but I urge everyone to take a look inside and see what ideas are sparked that can make and keep our state a healthy place for everyone.