Tuesday, July 19, 2011

Should funding councils mandate the use of commercial clouds? Re: White House announcement

The recent press release from the White House Office of Science and Technology Policy highlights an important and growing trend in research and science around the world.
The announcement reinforces a recent PCAST report, “Smarter not Faster is the Future of Computing Research”.

1. The demand for research computing is real and growing dramatically, and it will only continue to increase. It will be the fundamental underpinning for all future research.

2. But the type of research computing is changing significantly, from traditional HPC largely based on "modelers" and “simulations” (e.g. astrophysics, computational chemistry, etc.) to those who are drowning in data: genomics, astronomical instruments, health imaging, etc. The computing needs of those drowning in data are vastly greater than, and quite different from, those of the modelers. Clouds, in particular commercial clouds, are much better suited to the data-intensive users. As pointed out in these reports, the vocal component of the research computing community unfortunately still tends to be the modelers, while those drowning in data simply want solutions. What the funding councils are seeing is huge demand from those drowning in data for more computing resources, and they are confused, to say the least, as to why the traditional HPC modeling resource is not addressing their needs.

3. The needs of those drowning in data can probably be best addressed by public-private partnerships with the commercial cloud sector, as we are now witnessing with the offerings made by Google, Azure and Amazon. Yes, there still remain real challenges in terms of privacy, security, etc. But the big advantage of the commercial clouds is the incredible innovation that is occurring in terms of solutions (e.g. Azure Excel cloud for big data, Daytona, Enomaly SpotCloud, Green Clouds, Click Compute, etc.) and the fact that the computing resources can scale with the science demands. It is also very likely that we will see a continuation and growth of the free cloud services for researchers, as witnessed by the recent offerings of Microsoft, Google, etc.

4. Unfortunately, funding councils around the world don’t have the policy tools to support researchers using commercial clouds, other than the recently announced brokering services offered by JANET/JISC in the UK and SURFnet in the Netherlands. Most funding councils can only fund the purchase of infrastructure. In the current environment, where the focus is on purchasing infrastructure, science users are limited to a fixed-size resource (renewed every 3-5 years) while demand from more and more users always increases. As a result, their ability to do science may decline over the 3-5 year cycle because of increasing pressure for more users to share the same resource. In some cases funding councils may fund the purchase of a community cloud or private cloud for research, but I see that having the same limitations as traditional HPC has had in the past with regard to scaling and innovation. Commercial clouds have no such constraints in terms of scaling to meet science needs. Although some claim that commercial clouds are more expensive to use than dedicated research or community clouds, that is rapidly changing with the advent of free clouds and the bulk discounts being negotiated by organizations such as SURFnet and JANET. As well, once universities start to pass on the real costs of research computing in terms of cooling, power and space, the cost difference with commercial clouds will largely disappear.

5. There is no question that the "traditional" HPC modeling compute needs will still be important and the issues that you pointed out need to be addressed.


From Big Data to New Insights

Today, Microsoft is announcing the availability of a new tool called Daytona that will make it easier for researchers to harness the power of “cloud computing” to discover insights in huge quantities of data.
Daytona, which will be freely available to the research community, builds on an existing cloud computing collaboration between the National Science Foundation and Microsoft.
Increases in the ability to make predictions and more informed decisions on the basis of Big Data will have implications far beyond science and engineering.  They have the potential to improve healthcare, increase economic productivity, personalize education, strengthen our national security, and improve the management of America’s infrastructure, including transportation and the electric grid.  Given the scope and magnitude of Big Data’s potential economic, societal and scientific impact, the Office of Science and Technology Policy has asked Federal agencies to form a Senior Steering Group on this subject.  Agencies are working to identify the investments that the government should be making in areas such as research, education and workforce development, Big Data-related prizes and challenges, infrastructure, and pilot projects that explore a broad range of potential applications.
Harnessing the power of Big Data will require new partnerships between government, industry, and academia—partnerships like the Microsoft-NSF collaboration.  It will also require efforts to shape the national research agenda, such as the Computing Community Consortium’s papers on data analytics.  I hope that more companies and research communities will join with the Administration to make the most of this opportunity.
Tom Kalil is Deputy Director for Policy at OSTP

From Data to Knowledge to Action:
Enabling 21st Century Discovery in Science and Engineering


A new revolution in discovery and learning
Recent rapid advances in information and communication technologies – both hardware and
software – are creating a new revolution in discovery and learning.
Over the past several decades, computational science – the large-scale simulation of phenomena
– has joined theory and experiment as a fundamental tool in many branches of science and
engineering.
Today we are at the dawn of a second revolution in discovery – a revolution that will have far
more pervasive impact. The focus of this new approach to science – called eScience – is data:
- the ability to collect and manage orders of magnitude more data than ever before possible;
- the ability to provide this data directly and immediately to a global community;
- the ability to use algorithmic approaches to extract meaning from large-scale data sets; and
- the ability – and, in fact, the need – to use computers rather than humans to guide the
hypothesis/measurement/evaluation loop of scientific discovery.
Enormous numbers of tiny but powerful sensors are being deployed to gather data – deployed on
the sea floor, in the forest canopy, on the sides of volcanoes, in buildings and bridges, in living
organisms (including ourselves!). Modern scientific instruments, from gene sequencers to
telescopes to particle accelerators, generate unprecedented amounts of data. Other contributors
to the data tsunami include point-of-sale terminals, social networks, the World Wide Web,
mobile phones (equipped with cameras, accelerometers, and GPS technology), and electronic
health records. These sensors, instruments, and other information sources – and, indeed,
simulations too – produce enormous volumes of data that must be captured, transported, stored,
organized, accessed, mined, visualized, and interpreted in order to extract knowledge and
determine action.
This “computational knowledge extraction” lies at the heart of 21st century discovery.
1 Contact: Erwin Gianchandani, Director, Computing Community Consortium (202-266-2936; erwin@cra.org).
For the most recent version of this essay, as well as related essays, visit http://www.cra.org/ccc/initiatives

A national imperative
The fundamental tools and techniques of eScience include sensors and sensor networks,
backbone networks, databases, data mining, machine learning, data visualization, and cluster
computing at enormous scale. eScience, even more than computational science, illustrates the
extent to which advances in all fields of science and engineering are married to advances in
computer science and the mathematical sciences.
Traditional simulation-oriented computational science was transformative, but it was a niche.
Simulations can predict the behaviors of systems given underlying models, but this is only one
aspect of the scientific process. In contrast, eScience – 21st century computational science – will
be pervasive, affecting a broad spectrum of investigators and a broad spectrum of fields, since
virtually all of science requires extracting useful insights from the data arising from
measurements and simulations.


It is not just the volume of data that is driving change; it is also the rate at which data are being
generated, and the dimensionality. At every level of the “discovery pyramid,” from “small
science” to “big science,” the nature of discovery is changing rapidly and dramatically.
Furthermore, these advances in eScience are beginning to drive new fields of research, such as:
- “Astroinformatics” – large-scale exploration of the sky from space and from the ground,
requiring data mining, analysis, and visualization;
- Chemistry and materials science, including “matinformatics” – real-time chemical analysis of
complex sample mixtures, requiring data unification, uncertainty quantification, clustering,
and classification; and
- Systems biology – systems analysis of underlying biochemical interactions that give rise to
biological functions and behaviors, requiring data unification, clustering, classification,
feature detection, information extraction, uncertainty analysis, anomaly detection, and

Green Internet Consultant. Practical solutions to reducing GHG emissions such as free broadband and electric highways. http://green-broadband.blogspot.com/

email:     Bill.St.Arnaud@gmail.com
twitter:  BillStArnaud
blog:       http://billstarnaud.blogspot.com/
skype:    Pocketpro