Wednesday, May 16, 2012

How Internet2 is helping researchers make effective use of commercial clouds

[Here is an excellent article in HPC in the Cloud on how the Internet2 network is helping researchers and educators use commercial clouds.
Along with Ian Foster, I have long argued that commercial clouds are ideal for small and medium science teams, especially in the humanities, social sciences, bio-informatics, genomics, etc. Small and medium science teams, made up of one or two PIs, a couple of postdocs, a technician and some graduate students, are how the overwhelming majority of research is done at our universities. They don't have the IT human resources to manage large-scale cyber-infrastructure facilities the way physics or engineering do, and in many cases they only need computing resources on an infrequent basis. As such, in the past, they often purchased a small cluster that sat lightly utilized. Many of the tools they need are now available on commercial clouds, as small businesses and graduate students prefer to develop tools on such facilities because of the commercial revenue from click-compute business models.

Over the past couple of days the Canadian Institute for Advanced Research (CIFAR) organized a cyber-infrastructure consultation meeting with senior Canadian researchers, funding councils and government departments. In the discussions at the meeting I was struck by how simple tasks such as moving large data files remain a significant problem for most researchers, especially those outside of physics and computer science. Many researchers are still using FedEx or snail mail to ship their data to fellow researchers. This is where Ian Foster's work with the Globus Online team is so important: developing tools on commercial clouds that automate the mundane tasks that consume an inordinate amount of a researcher's time, such as file transfer, indexing, massaging data, cataloging, and so on. In many cases this does not mean that the researcher's data or computation resides on the commercial cloud, with all the attendant problems of privacy and security.
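To make the "mundane tasks" point concrete, here is a minimal sketch (not the actual Globus Online API) of the kind of bookkeeping a managed transfer service takes off a researcher's plate: chunked copying, end-to-end checksum verification, and retry on failure. All names here are illustrative.

```python
# Illustrative sketch of automated, verified file transfer -- the sort of
# work managed services handle so researchers don't have to. Not the
# Globus Online API; function names are hypothetical.
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's SHA-256 checksum in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verified_copy(src: Path, dst: Path, retries: int = 3) -> str:
    """Copy src to dst, retrying until the destination checksum matches."""
    expected = sha256_of(src)
    for _ in range(retries):
        shutil.copyfile(src, dst)
        if sha256_of(dst) == expected:
            return expected  # transfer verified end to end
    raise IOError(f"checksum mismatch after {retries} attempts")
```

A real transfer service layers authentication, scheduling, and wide-area protocols on top of this same verify-and-retry loop, which is exactly why hand-rolling it per project is wasted effort.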

To my mind, making life easier for researchers should be the number one priority for cyber-infrastructure. Many networks like JANET, SURFnet and NORDUnet are undertaking initiatives similar to Internet2's NET+ services. – BSA]

Ian Foster’s presentations on Research IT as a Service

Cloud Services Satisfy a Higher Calling


Cloud computing is enabling services at scale for everyone, from scientific organizations and commercial providers to individual consumers. Higher education, in particular, has many collaborative projects that lend themselves to cloud services; however, those services are often not tailored to the uniqueness of an academic environment. For example, very few businesses have their research department work with their competitors, whereas in higher education, most research collaborations occur between institutions. That's where the Internet2 NET+ project comes in. During their annual member meeting, the networking consortium announced the addition of 16 cloud services to its NET+ program, aimed at reducing the barriers to research. HPC in the Cloud spoke with Shel Waggener, Senior Vice President of Internet2 NET+, and Associate Vice Chancellor & CIO for the University of California, Berkeley, to get the full story.

Internet2 sees itself as a bridge between the academic communities and commercial vendors. "We're focused on cloud computing enabling scale for a community," Waggener stated, adding, "The ability to have any researcher, any student, anywhere at any institution and instantly use services together is a very powerful opportunity."
Internet2 is probably best known for its 100 Gigabit Ethernet network, with 8.8 Terabits of aggregate capacity, used by the national science labs and the research institutions that are Internet2 members. This not-for-profit was established to support research in higher education in the United States. Its mission since 1996 has been focused on removing the barriers to research, and one of these barriers has been the network, since researchers often require a level of network capacity beyond the scope of commercial carriers. With the advance of cloud computing, the same limitation now applies to services that are accessed through the network (i.e., IaaS, PaaS, SaaS, etc.). The expanded NET+ offering allows Internet2 members to simply add the services they want to their core membership.
In the current model, individual researchers must go through the sometimes complex, costly and time-consuming process of creating a cloud environment on their own. This first step is a very big one. There are contractual terms, payment and billing options and other administrative tasks that must be attended to, then the service has to be set up to enable sharing across multiple team members and multiple organizations. Each of these parties would also need to create accounts and implement security protocols.
From Waggener: "There is a lot of work done every day by researchers around the world that is in essence lost, a one-time effort with no marginal gain, because as soon as they do that work, then they're focused on their science, and when they're done, it's gone. All the work that went into enabling that science has been sunset. Through the NET+ services model, there is more effort at the outset – collaboration isn't free – but the payoffs are huge."
With Internet2, there is a master agreement with the provider, and then there's a campus member agreement that allows users to add subscriptions to these services. All the terms are signed off by all the legal counsel at the member institutions. So as a faculty member, you know exactly what you are going to get.
Internet2 is taking community-developed services, for specific researchers or specific disciplines, and moving those into a community cloud architecture. They're taking their investments in middleware and innovations in federated identity and allowing researchers to use their local institutional credentials to be validated at another institution using InCommon's identity management services. This makes it possible for a Berkeley student to obtain instant access to the services at Michigan or Harvard, and allows faculty members from different universities to collaborate on data analytics or to share computing resources.
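The federated-identity pattern described above can be sketched in a few lines: a home institution signs an assertion about its own user, and a relying institution verifies that assertion through trust established in advance with the federation. This is only a toy HMAC scheme to show the flow; InCommon deployments actually use SAML, and the institution keys below are hypothetical.

```python
# Toy sketch of federated identity: home institution asserts, relying
# institution verifies via pre-shared federation trust. Real systems
# (InCommon) use SAML, not this HMAC scheme; keys are made up.
import hashlib
import hmac

# Trust established out of band between the federation and each campus.
FEDERATION_KEYS = {
    "berkeley.edu": b"hypothetical-berkeley-key",
    "umich.edu": b"hypothetical-umich-key",
}


def issue_assertion(home_idp: str, user: str) -> str:
    """Home institution signs a statement about its own user."""
    payload = f"{home_idp}|{user}"
    sig = hmac.new(FEDERATION_KEYS[home_idp], payload.encode(),
                   hashlib.sha256).hexdigest()
    return f"{payload}|{sig}"


def verify_assertion(assertion: str):
    """Relying institution checks the signature; returns the user or None."""
    home_idp, user, sig = assertion.split("|")
    expected = hmac.new(FEDERATION_KEYS[home_idp],
                        f"{home_idp}|{user}".encode(),
                        hashlib.sha256).hexdigest()
    return user if hmac.compare_digest(sig, expected) else None
```

The point of the pattern is that the relying campus never sees the user's password; it only needs to trust the federation's signing arrangement, which is what lets a Berkeley credential open doors at Michigan.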
But to make an HPC cloud project truly successful, Waggener believes they need to integrate in the commercial solutions that exist today. "We're taking advantage of economies of scale here, not trying to replicate Blue Gene," notes Waggener.
The strategy behind the latest round of cloud service partnerships is to take offerings that were designed for the commercial sector and help tune them for higher education, while keeping costs down. By its nature, the higher ed space is more difficult to work in than other domains, as one institution contains every possible type of engagement. A solution that is perfect for one department may not be ideal for another. Waggener explains that fine-tuning services to meet these unique needs usually creates cost barriers for companies trying to offer services to higher education. The goal of this program is to eliminate those cost premiums for the commercial providers and in doing so simplify the academic-leaning business processes, so that both sides can take out the unnecessary costs – administrative, legal, contractual and so on – while enabling the faster adoption of services. Win-win.
In Waggener's view, the biggest challenge in traditional academic computing is that the largest resources are always constrained. They become oversubscribed immediately, no matter how large they are or how quickly they are deployed, and this oversubscription creates underutilization of the resource. Queue management becomes a significant problem, Waggener notes, and you end up with code being deployed that hasn't been fully optimized for that level of research. Some of the largest big-data analysis jobs are left waiting significant blocks of time to achieve their science. The instrument isn't the challenge, says Waggener; it's all of the dynamics around tuning the specific experiment or analytic activity to that particular resource.
Now, with the advance of cloud computing, there is an explosion in global capacity and resources, but researchers are still single-threading their applications.
"If you want to use some number of machines simultaneously, the question becomes how do you do that? Do you get an account at Amazon? Do you run it through your credit card? What if you want to share all that information and results with someone else? You basically have to create a relationship between the individual researchers and Amazon, that's a costly and time-intensive task," comments Waggener.
"The Amazon setup has historically been designed for small-to-medium businesses, but that's not how researchers work. The right approach isn't to get in the way of researchers who want to immediately access those resources, but in fact to have those brokerages done in advance so that the contracts are already in place, so they can log in using their institutional credentials and pick a resource from Dell, or from IBM, or from Amazon, in a brokered fashion that takes care of all the complexities of higher education. For the first time, we can work with commercial providers who can leverage their R&D cost for commercial purposes, and not simply negotiate a discount off of their commercial rate for education, but instead tune the offering and remove many of the costs that drive the expenditures and overhead for both the commercial side and the higher ed side."
The result is custom-tuned services – both in regard to terms and conditions and, in many cases, the offering itself – designed to meet the community's needs.

R&E Network and Green Internet Consultant.

twitter:  BillStArnaud
skype:    Pocketpro