Monday, January 14, 2008

Some excellent comments by Dan Reed on academic high performance computing

[It is astounding how much money granting councils continue to award for the purchase of stand-alone computers and clusters by university researchers. Much of this computation could be done more easily on virtual machines or clouds, as Dan Reed points out. Not only would this save significant dollars, but it would also be a first step in addressing the contribution to global warming of the power and cooling these machines require. As well, universities could save money by avoiding the significant costs of the physical infrastructure needed to host these facilities. Some excerpts from Dan Reed's commentary in HPCwire--BSA]

Outsourcing: Perhaps It Is Time?

In late November, I briefed the NSF OCI advisory committee on the
PCAST report. The ensuing discussion centered on the rising academic
cost of operating research computing infrastructure. The combination
of rising power densities in racks and declining costs for blades
means computing and storage clusters are multiplying across campuses
at a stunning rate. Consequently, every academic CIO and chief
research officer (CRO) I know is scrambling to coordinate and
consolidate server closets and machine rooms for reasons of
efficiency, security and simple economics.

This prompted an extended discussion with the OCI advisory committee
about possible solutions, including outsourcing research
infrastructure and data management to industrial partners. Lest this
seem like a heretical notion, remember that some universities have
already outsourced email, the lifeblood of any knowledge-driven
organization. To be sure, there are serious privacy and security
issues, as well as provisioning, quality of service and pricing
considerations. However, I believe the idea deserves exploration.

Computing Clouds

All of this is part of the still ill-formed and evolving notion of
cloud computing, where massive datacenters host storage farms and
computing resources, with access via standard web APIs. In a very real
sense, this is the second coming of Grids, but backed by more robust
software and hardware of enormously larger scale. IBM, Google, Yahoo,
Amazon and my new employer -- Microsoft -- are shaping this space,
collectively investing more in infrastructure for Web services than we
in the computational science community spend on HPC facilities.

I view this as the research computing equivalent of the fabless
semiconductor firm, which focuses on design innovation and outsources
chip fabrication to silicon foundries. This lets each group -- the
designers and the foundry operators -- do what they do best and at the
appropriate scale. Most of us operate HPC facilities out of necessity,
not out of desire. They are, after all, the enablers of discovery, not
the goal. (I do love big iron dearly, though, just like many of you.)

In the facility-less research computing model, researchers focus on
the higher levels of the software stack -- applications and
innovation, not low-level infrastructure. Administrators, in turn,
procure services from the providers based on capabilities and pricing.
Finally, the providers deliver economies of scale and capabilities
driven by a large market base.

This is not a one size fits all solution, and change always brings
upsets. Remember, though, that there was a time (not long ago) when
deploying commodity clusters for national production use was
controversial. They were once viewed as too risky; now they are the
norm. Technologies change, and we adapt accordingly.