Thursday, December 7, 2006

Will Amazon help reduce university research computing costs?

A good article in this month's Nature about Amazon's new Elastic Compute Cloud (EC2) and how it might benefit academic and university computing. Universities and funding agencies are spending an increasing share of their budgets on expensive computational infrastructure, as departments from architecture to zoology now require computing clusters and high-performance computers to do their research. The capital cost of the equipment is not the only problem: the cost of electricity, air conditioning and maintenance is also becoming a serious issue. The beauty of the Amazon EC2 service is that it is built around web services and virtualization, which makes it easy for researchers who have already moved their applications to web services to incorporate the compute capacity Amazon offers. See my previous post on Eucalyptus. Thanks to Richard Ackerman's and Declan Butler's blogs for this timely information -- BSA
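To give a flavour of how little setup is involved, the commands below sketch the basic workflow with Amazon's EC2 command-line API tools. The AMI ID, keypair name and instance ID are placeholders, and a registered AWS account with the tools installed is assumed; treat this as an illustrative sketch rather than a tutorial.

```shell
# Create a keypair for SSH access to instances (name is a placeholder)
ec2-add-keypair my-keypair > my-keypair.pem

# Launch an instance from a machine image; ami-12345678 is a placeholder AMI ID
ec2-run-instances ami-12345678 -k my-keypair

# List instances to find the assigned public hostname once it is running
ec2-describe-instances

# Shut the instance down when the job is finished (instance ID from the listing above)
ec2-terminate-instances i-10a64379
```

Billing is by the hour of use, so a researcher pays only while a job is actually running.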

Information on Amazon's web services and EC2

Excerpts from Richard Ackerman's blog

Amazon Computing Cloud - for academics?

Declan Butler has an article in Nature about researchers using Amazon's compute services.

The service is still in a test phase, so few scientists have even heard of it yet, let alone tried it. But it is a movement that experts believe could revolutionize how researchers use computers. In future, they will export computing jobs to industry networks rather than trying to run them in-house, says Alberto Pace, head of Internet services at CERN, the European particle-physics laboratory near Geneva. CERN has built the world's largest scientific computing grid, bringing together 10,000 computers in 31 countries to handle the 1.5 gigabytes of data that its new accelerator, the Large Hadron Collider, will churn out every second once it is switched on next year.

"I see no reason why the Amazon service wouldn't take off," Pace says. "For a lab that wants to go fast and cheaply, this is a huge advantage over buying material and hiring IT staff. You spend a few dollars, you have a computer farm and you get results."

[Dutch computer scientist Rudi] Cilibrasi, a researcher at the National Institute for Mathematics and Computer Science in Amsterdam, was using Amazon's service to test an algorithm aimed at predicting how much someone will like a movie based on their current preferences. He says he is a convert: "It's substantially more reliable, cheaper and easier to use [than academic computing networks]. It opens up powerful computing-on-demand to the masses."

From Declan Butler's blog

Virtualization uses a layer of software to allow multiple operating systems to run together, so that different computers can be recreated on the same machine. One machine can host, say, ten ‘virtual’ computers, each with a different operating system.

That’s a big deal. Running multiple virtual computers on a single server uses available resources much more efficiently. But it also means that instead of having to physically install a machine with a particular operating system, a virtual version can be created in seconds. Such virtual computers can be copied just like a file, and will run on any machine irrespective of the hardware it is using.

Virtualization is going to be one of the next big things in computing, as it brings both large economies of compute resources, and unprecedented flexibility.
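As a concrete illustration of "a virtual computer that can be copied just like a file", here is the kind of guest definition used by Xen, the open-source virtualization layer that EC2 is reported to be built on. The file names and memory size are invented for illustration; the point is that a few lines of configuration plus one disk-image file describe a complete machine.

```
# /etc/xen/vm1.cfg -- hypothetical Xen 3.x guest definition
kernel = "/boot/vmlinuz-2.6-xen"            # paravirtualized guest kernel
memory = 256                                # MB of RAM for this guest
name   = "vm1"                              # domain name shown by 'xm list'
disk   = ["file:/var/xen/vm1.img,xvda1,w"]  # the whole VM lives in this one file
root   = "/dev/xvda1 ro"
vif    = [""]                               # one network interface on the default bridge
```

Starting the guest is then a matter of `xm create /etc/xen/vm1.cfg`, and copying vm1.img to another Xen host reproduces the same machine there, regardless of the underlying hardware.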

Scientists are also testing virtualization as a way to overcome one of the biggest drawbacks of most current Grids (see here and here for more info on Grids) and computing clusters. They are balkanised, each using different operating systems or versions, which results in poor use of the available computing resources. Virtualizing the Grid allows virtual computers (image files) to be run on top of all available resources irrespective of the underlying operating systems.

Researchers can also develop applications on whatever software and operating system they have on their lab machine. But at present when they go to run the application at a large-scale, they often need to completely rewrite it to fit the protocols and systems used by a particular cluster or Grid. Virtualization frees researchers from these constraints.

I asked Ian Foster, co-founder of the Grid computing concept, what he thought of the prospects for Amazon-type services.

“It’s neat stuff. Exactly what it means remains to be seen, but my expectation is that Amazon’s EC2 and S3 will be seen as significant milestones in the commercial realization of Grid computing. I also think that they may turn out to be important technologies for scientific communities, because they start to address the current high costs associated with hosting services.”

In passing, if anyone has tested the Amazon service, do get in touch via Declan's blog to tell me about your experience and how you have used it.