Wednesday, November 24, 2010

Research Collaborative Tools for integrating commercial clouds and university cyber-infrastructure

Around the world, a number of initiatives are developing new collaborative tools and generic portal services for research communities that allow the seamless integration of commercial cloud services and campus HPC facilities. There is no question that some applications require dedicated HPC facilities on campus, but a wide range of other research and education applications and services could use commercial clouds, making life much easier for both researchers and IT staff. Two great examples of this type of architectural thinking are the new Globus Online, a cloud-based managed file transfer service, and SURFnet’s COIN (Collaboration Infrastructure) project. Other related initiatives include HUBzero and Internet2’s COmanage project.

Ian Foster eloquently states the big advantage of providing integrated collaborative services with commercial clouds: “The biggest IT challenge facing science today is not volume but complexity…. It is establishing and operating the processes required to collect, manage, analyze, share, archive, etc., that data that is taking all of our time and killing creativity. And that's where outsourcing can be transformative....For that to happen, we need to make it easy for providers to develop "apps" that encapsulate useful capabilities and for researchers to discover, customize, and apply these "apps" in their work. The effect, I will argue, will be a dramatic acceleration of discovery.”

And of course, the major attraction for me personally is that these are the types of collaborative services that could run on a zero-carbon infrastructure such as Greenstar and then direct compute or application jobs to the cloud or HPC facility with the lowest carbon footprint.

The COIN infrastructure will link collaboration services set up by educational institutions, research organisations, commercial parties and SURFnet and enable them to interact, thus making custom, flexible online collaboration possible.
At the moment, users are still obliged to choose one or, at most, a couple of online applications for their group work. Sharing information between these systems is almost impossible. The COIN project is designed to change this by ensuring that institutions connected to SURFnet can offer their users a greater variety of collaboration services. The aim is to develop a set of online tools that users can combine into a collaboration environment suitable for them.
COIN is based around OpenSocial, a powerful collaborative tool originally developed by Google and now made open to the research and education community. For more details please see

Globus Online: A cloud-based managed file transfer service

Moving data ... can sound trivial, but in practice is often tedious and difficult. Datasets may have complex nested structures, containing many files of varying sizes. Source and destination may have different authentication requirements and interfaces. End-to-end performance may require careful optimization. Failures must be recovered from. Perhaps only some files differ between source and destination. And so on.
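To make one of these difficulties concrete, here is a minimal sketch of working out which files in a nested dataset actually need to be moved when only some differ between source and destination. It assumes both sides are visible as local directories and compares file size and then a SHA-256 digest; the function names are illustrative only and this is not how Globus Online itself decides what to transfer.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_to_transfer(source: Path, destination: Path) -> list[str]:
    """List relative paths that are missing or different at the destination."""
    pending = []
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src.relative_to(source)
        dst = destination / rel
        # Cheap checks first: existence, then size, then full content digest.
        if (
            not dst.is_file()
            or dst.stat().st_size != src.stat().st_size
            or file_digest(dst) != file_digest(src)
        ):
            pending.append(rel.as_posix())
    return pending
```

A real transfer service would also have to deal with remote endpoints, authentication and partial files, which this sketch deliberately ignores.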

Many tools exist to manage data movement: RFT, FTS, PhEDEx, rsync, etc. However, all must be installed and run by the user, which can be challenging for all concerned. Globus Online uses software-as-a-service (SaaS) methods to overcome those problems. It's a cloud-hosted, managed service, meaning that you ask Globus Online to move data; Globus Online does its best to make that happen, and tells you if it fails.
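The "does its best, and tells you if it fails" behaviour can be pictured as a retry loop around a single transfer attempt. The sketch below is an assumption-laden illustration, not the Globus Online implementation: `managed_transfer`, its parameters and the exponential back-off policy are all hypothetical.

```python
import time
from typing import Callable


def managed_transfer(
    attempt: Callable[[], None],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[bool, list[str]]:
    """Run `attempt` until it succeeds or the attempts are exhausted.

    Returns (succeeded, error messages collected along the way), so the
    caller is always told the outcome instead of having to watch the job.
    """
    errors: list[str] = []
    for n in range(max_attempts):
        try:
            attempt()
            return True, errors
        except OSError as exc:  # treat I/O and network errors as retryable
            errors.append(str(exc))
            if n < max_attempts - 1:
                # Exponential back-off: base_delay, 2*base_delay, 4*base_delay, ...
                sleep(base_delay * 2 ** n)
    return False, errors
```

The `sleep` parameter exists only so the waiting can be stubbed out when trying the function; a caller would normally leave it at its default.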

The Globus Online service can be accessed via different interfaces depending on the user and their application:

- A simple Web UI is designed to serve the needs of ad hoc and less technical users
- A command line interface exposes more advanced capabilities and enables scripting for use in automated workflows
- A REST interface facilitates integration for system builders who don't want to re-engineer file transfer solutions for their end users

All three access methods allow a client to:

- establish and update a user profile, and specify the method(s) you want to use to authenticate to the service;
- authenticate using various common methods, such as Google OpenID or MyProxy providers;
- characterize endpoints to/from which transfers may be performed;
- request transfers;
- monitor the progress of transfers; and
- cancel active transfers
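To show how the endpoint, request, monitor and cancel operations in this list fit together, here is a toy in-memory model of the lifecycle. Every class and method name in it is hypothetical, invented for illustration; it does not reproduce the actual Globus Online Web, command line or REST interfaces, and it leaves out profiles and authentication entirely.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import count


class Status(Enum):
    ACTIVE = "active"
    SUCCEEDED = "succeeded"
    CANCELLED = "cancelled"


@dataclass
class Transfer:
    task_id: int
    source: str
    destination: str
    status: Status = Status.ACTIVE


class ToyTransferService:
    """In-memory stand-in for a managed transfer service (hypothetical API)."""

    def __init__(self) -> None:
        self._endpoints: set[str] = set()
        self._transfers: dict[int, Transfer] = {}
        self._ids = count(1)

    def add_endpoint(self, name: str) -> None:
        """Characterize an endpoint that transfers may use."""
        self._endpoints.add(name)

    def request_transfer(self, source: str, destination: str) -> int:
        """Request a transfer and return a task id for later monitoring."""
        for endpoint in (source, destination):
            if endpoint not in self._endpoints:
                raise ValueError(f"unknown endpoint: {endpoint}")
        task_id = next(self._ids)
        self._transfers[task_id] = Transfer(task_id, source, destination)
        return task_id

    def status(self, task_id: int) -> Status:
        """Monitor the progress of a transfer."""
        return self._transfers[task_id].status

    def cancel(self, task_id: int) -> bool:
        """Cancel a transfer; only active transfers can be cancelled."""
        transfer = self._transfers[task_id]
        if transfer.status is Status.ACTIVE:
            transfer.status = Status.CANCELLED
            return True
        return False
```

The point of the sketch is only that a client holds a task id and asks the service about it, rather than running and babysitting the transfer itself.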

The two keys to successful SaaS are reliability and scalability. The service must behave appropriately as usage grows to 1,000, then 1,000,000, and maybe more users. To this end, we run Globus Online on Amazon Web Services. User and transfer profile information is maintained in a database that is replicated, for reliability, across multiple geographical regions. Transfers are serviced by nodes in Amazon's Elastic Compute Cloud (EC2), which automatically scale as service demands increase.

We will support InCommon credentials and other OpenID providers in addition to Google; support other transfer protocols, including HTTP and SRM; and continue to refine automated transfer optimization, for example by tuning endpoint configurations based on the number and size of files.

twitter: BillStArnaud
skype: Pocketpro