[Here is an excellent article in HPC in the Cloud on how the Internet2 network is helping researchers and educators use commercial clouds. Along with Ian Foster, I have long argued that commercial clouds are ideal for small and medium science teams, especially in the humanities, social sciences, bioinformatics, genomics, etc. Small and medium science teams, made up of one or two PIs, a couple of postdocs, a technician and some graduate students, are how the overwhelming majority of research is done at our universities. They don't have the IT human resources to manage large-scale cyber-infrastructure facilities the way physics or engineering do, and in many cases they only need computing resources on an infrequent basis. As a result, in the past they often purchased a small cluster that sat lightly utilized. Many of the tools they need are now available on commercial clouds, as small businesses and graduate students prefer to develop tools on such facilities because of the commercial revenue from click-to-compute business models.
Over the past couple of days the Canadian Institute for Advanced Research (CIFAR) organized a cyber-infrastructure consultation meeting with senior Canadian researchers, funding councils and government departments. In the discussions at the meeting I was struck by how simple tasks such as moving large data files are still a significant problem for most researchers, especially those outside of physics and computer science. Many researchers are still using FedEx or snail mail to ship their data to fellow researchers. This is where Ian Foster's work with the Globus Online team is so important: developing tools on commercial clouds that remove the mundane tasks that consume an inordinate amount of a researcher's time, such as file transfer, indexing, massaging data, cataloging and so on (a small sketch of what such a cloud-hosted transfer looks like follows this note). This does not mean, in many cases, that the researcher's data or computation is done on the commercial cloud, with all the attendant problems of privacy, security, etc.
To my mind, making life easier for researchers should be the number one priority for cyber-infrastructure. Many networks, such as JANET, SURFnet and NORDUnet, are undertaking initiatives similar to Internet2's NET+ services. – BSA]
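As a concrete illustration of the kind of mundane work a hosted transfer service takes off a researcher's plate, here is a minimal sketch using the present-day Globus Python SDK (globus_sdk). The client ID, endpoint UUIDs and paths are placeholders you would substitute with your own registered application and endpoints, and the interfaces have evolved since Globus Online was first launched, so treat this as illustrative rather than a description of the exact service discussed above.

    import globus_sdk

    # Placeholders -- substitute a client ID registered with Globus Auth and
    # the UUIDs of your own source and destination endpoints.
    CLIENT_ID = "YOUR-NATIVE-APP-CLIENT-ID"
    SRC_ENDPOINT = "SOURCE-ENDPOINT-UUID"
    DST_ENDPOINT = "DESTINATION-ENDPOINT-UUID"

    # Authenticate once through the Globus Auth web flow (institutional login).
    auth_client = globus_sdk.NativeAppAuthClient(CLIENT_ID)
    auth_client.oauth2_start_flow()
    print("Please log in at:", auth_client.oauth2_get_authorize_url())
    code = input("Paste the authorization code here: ").strip()
    tokens = auth_client.oauth2_exchange_code_for_tokens(code)
    transfer_token = tokens.by_resource_server["transfer.api.globus.org"]["access_token"]

    # Hand the transfer to the hosted service; it handles retries, integrity
    # checks and notification, so the researcher does not babysit the copy.
    tc = globus_sdk.TransferClient(
        authorizer=globus_sdk.AccessTokenAuthorizer(transfer_token))
    task = globus_sdk.TransferData(tc, SRC_ENDPOINT, DST_ENDPOINT,
                                   label="Share dataset with collaborator",
                                   sync_level="checksum")
    task.add_item("/data/run2012/", "/incoming/run2012/", recursive=True)
    result = tc.submit_transfer(task)
    print("Transfer submitted, task id:", result["task_id"])

Once submitted, the transfer runs on the hosted service; the researcher can close the laptop and check the task status later rather than shipping disks by courier.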
Ian Foster’s presentations on Research IT as a Service
Cloud Services Satisfy a Higher Calling
Cloud computing is enabling services at scale for everyone, from scientific organizations and commercial providers to individual consumers. Higher education, in particular, has many collaborative projects that lend themselves to cloud services; however, those services are often not tailored to the uniqueness of an academic environment. For example, there are very few businesses that have their research department work with their competitors, whereas in higher education, most research collaborations occur between institutions. That's where the Internet2 NET+ project comes in. During its annual member meeting, the networking consortium announced the addition of 16 cloud services to its NET+ program, aimed at reducing the barriers to research. HPC in the Cloud spoke with Shel Waggener, Senior Vice President of Internet2 NET+ and Associate Vice Chancellor & CIO for the University of California, Berkeley, to get the full story.
Internet2
sees itself as a bridge between the academic communities and commercial
vendors. "We're focused on cloud computing enabling scale for a
community," Waggener stated, adding, "The ability to have any
researcher, any student, anywhere at any institution and instantly use services
together is a very powerful opportunity."
Internet2 is probably best known for its 100 Gigabit Ethernet network, with 8.8 Terabits of aggregate capacity, that is used by the national science labs and the research institutions that are Internet2 members. This not-for-profit was established to support research in higher education in the United States. Its mission since 1996 has been focused on removing the barriers to research, and one of these barriers has been the network, since researchers often require a level of network capacity beyond the scope of commercial carriers. With the advance of cloud computing, the same limitation now applies to services that are accessed through the network (i.e., IaaS, PaaS, SaaS, etc.). The expanded NET+ offering allows Internet2 members to simply add the services they want to their core membership.
In the current model, individual researchers must go through the
sometimes complex, costly and time-consuming process of creating a cloud
environment on their own. This first step is a very big one. There are
contractual terms, payment and billing options and other administrative tasks
that must be attended to; then the service has to be set up to enable sharing
across multiple team members and multiple organizations. Each of these parties
would also need to create accounts and implement security protocols.
From Waggener: "There is a lot of work done every day by
researchers around the world that is in essence lost, a one-time effort with no
marginal gain, because as soon as they do that work, then they're focused on
their science, and when they're done, it's gone. All the work that went into
enabling that science has been sunset. Through the NET+ services model, there
is more effort at the outset – collaboration isn't free – but the payoffs are
huge."
With Internet2, there is a master agreement with the provider,
and then there's a campus member agreement that allows users to add
subscriptions to these services. All the terms are signed off by all the legal
counsel at the member institutions. So as a faculty member, you know exactly
what you are going to get.
Internet2 is taking community-developed services for specific researchers or specific disciplines and moving them into a community cloud architecture. They're taking their investments in middleware and innovations in
federated identity and allowing researchers to use their local institutional
credentials and be validated at another institution using InCommon's identity
management services. This makes it possible for a Berkeley student to obtain
instant access to the services at Michigan or Harvard, and allows faculty
members from different universities to collaborate on data analytics or to
share computing resources.
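To make the federated-identity piece concrete, here is a minimal sketch of how a campus web application sitting behind a Shibboleth service provider might consume InCommon-released attributes. It assumes the SP (for example Apache with mod_shib) injects conventional eduPerson attributes such as "eppn" and "affiliation" into the request environment; the actual attribute names and release policy depend entirely on the local configuration, so the names here are illustrative.

    from wsgiref.simple_server import make_server


    def application(environ, start_response):
        # With federated login the user authenticates at their own campus
        # identity provider; this application only sees the released
        # attributes, never a password.
        eppn = environ.get("eppn", "")                # e.g. "student@berkeley.edu"
        affiliation = environ.get("affiliation", "")  # e.g. "member;student"

        if not eppn:
            start_response("401 Unauthorized", [("Content-Type", "text/plain")])
            return [b"No federated identity attributes were released.\n"]

        body = ("Welcome %s\nAffiliations: %s\n"
                "This identity can now be attached to shared project resources.\n"
                % (eppn, affiliation)).encode("utf-8")
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [body]


    if __name__ == "__main__":
        # Local demo only; in production the Shibboleth SP fronts this app.
        make_server("localhost", 8000, application).serve_forever()

The point of the design is that the application never handles credentials itself: the home institution vouches for the user, and the service simply trusts the attributes the federation releases.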
But to make an HPC cloud project truly successful, Waggener
believes they need to integrate in the commercial solutions that exist today.
"We're taking advantage of economies of scale here, not trying to
replicate Blue Gene," notes Waggener.
The strategy behind the latest round of cloud service partnerships is to take offerings that were designed for the commercial sector and help tune them for higher education, while keeping costs down. By its nature, the higher ed space is more difficult to work in than other domains, as one institution contains every possible type of engagement. A solution that is
perfect for one department may not be ideal for another. Waggener explains that
fine-tuning services to meet these unique needs usually creates cost barriers
for companies trying to offer services to higher education. The goal of this
program is to eliminate those cost premiums for the commercial providers and in
doing so simplify the academic-leaning business processes, so that both sides
can take out the unnecessary costs – administrative, legal, contractual and so
on – while enabling the faster adoption of services. Win-win.
In Waggener's viewpoint, the biggest challenge to traditional
academic computing is that the largest resources are always constrained. They
become oversubscribed immediately no matter how large they are and how quickly
they are deployed, and this oversubscription creates an underutilization of the resource. Queue management becomes a significant problem, Waggener notes, and you end up with code being deployed that hasn't been fully optimized for that level of research. Some of the largest big data analysis jobs are left waiting for significant blocks of time to achieve their science. The instrument
isn't the challenge, says Waggener, it's all of the dynamics around tuning the
specific experiment or analytic activity of that particular resource.
Now, with the advance of cloud computing, there is an explosion
in global capacity, in resources, but researchers are still single-threading
their applications.
"If you want to use some number of machines simultaneously,
the question becomes how do you do that? Do you get an account at Amazon? Do
you run it through your credit card? What if you want to share all that
information and results with someone else? You basically have to create a relationship
between the individual researchers and Amazon, that's a costly and
time-intensive task," comments Waggener.
"The Amazon setup has historically for small-to-medium
businesses, but that's not how researchers work. The right approach isn't to
get in the way of researchers who want to immediately access those resources,
but in fact to have those brokerages done in advance so that the contracts are
already in place, so they can log in using their institutional credentials and
pick a resource availability from Dell, or from IBM, or from Amazon, in a
brokered fashion that takes care of all the complexities of higher education.
For the first time, we can work with commercial providers who can leverage
their R&D cost for commercial purposes and not have to simply work with
them to negotiate a discount price off of their commercial rate for education
but instead tune the offering and remove many of the costs that drive the
expenditures and overhead for the commercial side and the higher ed side."
The result is custom-tuned services – both in regard to terms
and conditions and, in many cases, the offering itself – designed to meet the
community's needs.
[….]
R&E Network and Green Internet Consultant.
email: Bill.St.Arnaud@gmail.com
twitter: BillStArnaud
blog: http://billstarnaud.blogspot.com/
skype: Pocketpro