Office of Science and Technology Policy Unveils Big Data Initiative

Publication date: 10 April 2012
Number: 48

On Thursday, March 29, 2012, the White House Office of Science and Technology Policy (OSTP), in collaboration with several federal departments and agencies, announced the creation of a Big Data Research and Development Initiative to a packed auditorium at the American Association for the Advancement of Science.

The goals of this initiative are “to advance state-of-the-art core technologies needed to collect, store, preserve, manage, analyze, and share huge quantities of data; harness these technologies to accelerate the pace of discovery in science and engineering, strengthen our national security, and transform teaching and learning; and to expand the workforce needed to develop and use Big Data technologies.”

Last year the President’s Council of Advisors on Science and Technology (PCAST) concluded that the Federal Government is under-investing in research and development related to sharing and storing large quantities of data.  In response, OSTP launched a Big Data Senior Steering Group to coordinate and expand the Government’s investments in this area.

OSTP Director John Holdren began the event by emphasizing that “it’s not the data per se that create value; what really matters is our ability to derive from them new insights, to recognize relationships, to make increasingly accurate predictions.  Our ability, that is, to move from data, to knowledge, to action.”

Though the private sector will take the lead in developing Big Data systems, Holdren stated that the government will play a large role in supporting Big Data research and development by investing in a Big Data workforce, using new Big Data approaches to make progress on key national challenges, and shaping policies on issues such as electronic privacy.

Subra Suresh, Director of the National Science Foundation (NSF), outlined the strategies NSF is implementing to derive knowledge from Big Data; to develop infrastructure to manage, curate, and serve data to communities; and to build education and workforce opportunities.

NSF’s Big Data interdisciplinary efforts include a collaborative project between NSF and the National Institutes of Health (NIH) to advance big data science and engineering, as well as funding for a $10 million Expeditions in Computing project based at the University of California, Berkeley, that will integrate human knowledge, computer algorithms, and machines to develop a new understanding of Big Data.  NSF will also encourage research universities to develop interdisciplinary graduate programs in Big Data and will provide the first round of grants to support “EarthCube,” a system that will allow geoscientists to access, analyze, and share information about Earth.  In addition, NSF will issue a $2 million award for undergraduate training in complex data, provide $1.4 million to support a group of statisticians and biologists studying protein structures and biological pathways, and create an “Ideas Lab” forum to enhance efforts to understand teaching and learning environments.

NIH Director Francis Collins emphasized the need for Big Data projects in the biological sciences community.  He described a new collaboration among the National Human Genome Research Institute, the National Center for Biotechnology Information, and the European Bioinformatics Institute to put the largest set of data on human genetic variation, produced by the international 1000 Genomes Project, on the Amazon Web Services cloud.  At 200 terabytes, the data from this project had become so massive that user access was very challenging.  Hosting the data in the cloud, and making it freely available, has benefited the science community by granting improved access to this data.
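
To illustrate what free cloud access to this dataset looks like in practice, the sketch below uses Python and the boto3 library to list a few objects in the public 1000 Genomes bucket.  The bucket name s3://1000genomes reflects the AWS Open Data listing and is an assumption here, as hosting details may have changed since this announcement; no AWS account is required because the bucket is world-readable and requests can be sent unsigned.

```python
# A minimal sketch: browsing the public 1000 Genomes bucket on AWS.
# Assumes the dataset lives at s3://1000genomes (per the AWS Open Data
# registry); this detail postdates the announcement and may have changed.
import boto3
from botocore import UNSIGNED
from botocore.config import Config

# The bucket is world-readable, so requests can be sent unsigned
# (no AWS credentials needed).
s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

# List the first few objects to confirm the data is openly accessible.
response = s3.list_objects_v2(Bucket="1000genomes", MaxKeys=10)
for obj in response.get("Contents", []):
    print(f'{obj["Key"]}  ({obj["Size"]} bytes)')
```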

Marcia McNutt, Director of the U.S. Geological Survey (USGS), announced the 2012 awardees of eight grants selected through the agency’s John Wesley Powell Center for Analysis and Synthesis.  These projects will focus on areas of research including climate change, earthquake recurrence rates, and ecological indicators.

Zach Lemnios, Assistant Secretary of Defense for Research and Engineering at the Department of Defense (DOD), stated that the DOD will invest approximately $250 million annually in Big Data, with $60 million available for new research projects.  He described Big Data challenges such as making effective use of the large amounts of data being generated and determining how scientists perform computations on, and manage, data at this scale.  The Department’s work on Big Data will focus on three areas: data-to-decision projects focused on reasoning and inference, autonomy research to develop systems that adapt to “real world” scenarios, and human-system research such as the need for new technological interfaces.

Ken Gabriel, Acting Director of the Defense Advanced Research Projects Agency (DARPA), announced that the agency is beginning the XDATA program, which will invest approximately $25 million annually for four years to develop computational techniques and software tools for analyzing large volumes of data.  The goals of this project are to develop scalable algorithms for processing data and to create effective human-computer interaction tools.

William Brinkman, Director of the Department of Energy (DOE) Office of Science (SC), spoke about the need to store, analyze, and use Big Data.  Brinkman described one of the roles of SC: operating and maintaining facilities at the National Laboratories, including supercomputers, x-ray and advanced light sources, and nanoscience and systems biology laboratories.  These facilities generate data rapidly, and better ways to manage this Big Data are needed.  Brinkman was pleased to announce that SC is establishing the Scalable Data Management, Analysis and Visualization (SDAV) Institute, which will bring together six national laboratories and seven universities to develop tools to help scientists manage and visualize data on DOE’s supercomputers.

The announcements from federal agency staff were followed by a panel discussion with industry and academic leaders.  The panelists offered insight into how universities such as Stanford and MIT, which provide large-scale online courses, can use Big Data to study student learning.  Other topics of discussion included how Big Data is used to detect patterns in pathology, the effect of Big Data on human resources at large companies, and the skills challenges of building a Big Data workforce.

For more information on this initiative, visit the OSTP fact sheet or the NSF website.