Despite relatively flat budgets in recent years, the National Institutes of Health (NIH) is undergoing changes that will better equip the agency to collect and analyze large biomedical data sets. The cancellation of the NIH’s National Children’s Study (NCS) and the continued investment in the Big Data to Knowledge (BD2K) initiative indicate that the agency is adapting to recent technological advances in big data rather than relying on older, more conventional means of collecting biomedical data.
In December 2014, the NIH announced its decision to terminate the NCS program, which had cost $1.3 billion since 2000. The NCS was envisioned as a means of discovering connections between environmental factors present during an individual’s upbringing and an assortment of medical conditions that could manifest later in adulthood (Tozi & Wayne, 2014); it would have collected data on the development of 100,000 children until they reached the age of 21. However, the program was plagued by a series of problems that led to its termination, including a failure to incorporate new technologies developed over its 14-year lifespan. NIH Director Francis Collins explained, “Science and technology overtook the study's designers, who weren't able to incorporate developments such as social media or data collection using mobile devices into their plans.” The NIH’s new Big Data to Knowledge (BD2K) initiative, by contrast, seeks to capitalize on recent advances in data analytics and data management.
BD2K is estimated to cost $656 million through 2020, with funds going toward new tools, software, and procedures to provide biomedical data to researchers more cost-effectively. Philip E. Bourne, NIH Associate Director for Data Science, elaborated on the challenges of collecting biomedical data: “The future of biomedical research is about assimilating data across biological scales from molecules to population. As such, the health of each one of us is a big data problem. Ensuring that we are getting the most out of the research data that we fund is a high priority for NIH.” The NIH hopes that studies able to fully capitalize on large data sets will improve the prediction of medical conditions such as heart attacks and breast cancer, as well as current treatments and preventive measures. A total of 12 centers will be established: 11 will research broadly applicable aspects of big data science, such as analysis of genomic data, data integration and use, and management of data from electronic health records (NIH, 2014), while the last will be dedicated solely to coordinating the other 11 BD2K centers.
During FY 2014, the NIH spent $32 million on the BD2K initiative, with Stanford University, UCLA, UC Santa Cruz, and USC as the major beneficiaries (Healy, 2014). Over the next four years, these universities will receive $38 million to fund their BD2K projects. USC neuroscientist Paul Thompson described the ongoing problems that current methods of data analytics and data management create for the ENIGMA Center for Medicine, Imaging and Genomics project: “Studies that use such data generate volumes of information that are simply too big to be transferred electronically.” To review the whole-genome sequencing recently performed on 815 subjects with Alzheimer's disease, his researchers watched delivery trucks disgorge boxes and boxes of disc drives for weeks at a time (Healy, 2014).
In summary, the NIH is adapting to advances in big data that will enable scientists and researchers to assimilate previously unmanageable volumes of biomedical data; the agency’s commitment to developing biomedical data analytics and data management is substantial and is likely to have an enduring impact.