Sunday, December 4, 2011

Open Lightpath Exchanges, The Data Deluge, Software Defined R&E Networks

[Several readers of this blog have pointed to the article in last week's New York Times on the genomics data deluge. Such a huge volume of genomics and bioinformatics data is being produced that it cannot practically be transferred over commercial Internet networks, and organizations are instead using FedEx and other sneakernets to ship the data. The same crisis in data volumes is also occurring in climate modeling and other fields.

Research and Education (R&E) networks have been warning about this coming data tsunami for many years. For the most part they have the capacity and the tools to easily enable the transfer of these large data volumes. No commercial network has this capability at this time. But the biggest problem is that much of this data is generated not by universities or R&E organizations but by commercial facilities closely aligned with the R&E community. Numerous bioinformatics companies, such as SoftGenetics, DNAStar, DNAnexus and NextBio, have sprung up because they have found the life sciences a fertile market for products that handle large amounts of information.

This poses a real dilemma for many R&E networks, especially those that receive public funding. They cannot be seen to be competing with the private sector (even though commercial networks do not yet have the capability or technology to deliver such data volumes), and in many cases their stated public policies do not allow them to connect commercial facilities. Compounding this problem, most of the modern computational tools needed to analyze this data are available only on commercial clouds. Academic HPC facilities and university-based cloud solutions generally cannot scale as quickly as commercial cloud providers to supply, on demand, the number of cores required to analyze this data. As well, many grad students and small innovative businesses are developing the necessary analysis tools to work only on commercial clouds, as they are driven by the revenue opportunity of the "click compute" models offered by many commercial cloud providers.

R&E networks are thus conflicted. Academic institutions and commercial organizations need access to commercial clouds to analyze this torrent of data, yet an R&E network's acceptable use policy may prohibit interconnection to commercial facilities, especially if the other end of the connection is also a commercial organization. This is where Open Lightpath Exchanges can play a critical role, much as the earlier NAPs did in the early days of the commercialization of the Internet.

Open Lightpath Exchanges are, by their very definition, policy free. That means anyone can cross connect to anyone else, regardless of whether they are commercial organizations or academic institutions. Open Lightpath Exchanges are being established all around the world, and many more are expected to be deployed in the coming year. A good background paper on Open Lightpath Exchanges, "Open Exchanges for Open Science", can be found at:

Open Lightpath Exchanges allow commercial organizations that benefit from R&E data to bring their own fiber to the exchange point so that they can interconnect with R&E networks and commercial clouds. Many R&E networks also connect to commercial clouds through Open Lightpath Exchange points. But what if you are not located near a city that hosts an Open Lightpath Exchange? Several R&E networks offer what are called "distributed" Open Lightpath Exchange points, but these facilities are often restricted to academic institutions. This is where Software Defined Networks can help, as they allow the deployment of "condominium" optical networks in which commercial and academic institutions share the same fiber or lightpath without any policy or funding conflicts over the use of that fiber.
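To make the condominium idea a little more concrete, here is a minimal Python sketch. It is not UCLP's design or any real control-plane API; it simply assumes a shared fiber carrying numbered wavelengths, each owned by exactly one organization, and the names `SharedFiber`, `CondoOwner` and the example organizations are all hypothetical. What it tries to show is that the shared infrastructure enforces no policy of its own: each owner's acceptable use policy applies only to that owner's lightpaths.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical model: a policy takes the requesting organization's name
# and returns True if that owner allows it to use one of its lightpaths.
Policy = Callable[[str], bool]


@dataclass
class CondoOwner:
    """One 'condo owner' on a shared fiber (hypothetical illustration)."""
    name: str
    wavelengths: List[int]      # wavelengths this owner controls
    policy: Policy              # this owner's own acceptable use policy
    assignments: Dict[int, str] = field(default_factory=dict)

    def request(self, requester: str) -> Optional[int]:
        """Assign a free wavelength if *this owner's* policy allows it."""
        if not self.policy(requester):
            return None
        for wl in self.wavelengths:
            if wl not in self.assignments:
                self.assignments[wl] = requester
                return wl
        return None  # policy allows it, but no capacity left


@dataclass
class SharedFiber:
    """Common optical infrastructure; it applies no use policy itself."""
    owners: Dict[str, CondoOwner] = field(default_factory=dict)

    def add_owner(self, owner: CondoOwner) -> None:
        # Each wavelength may belong to only one owner.
        taken = {wl for o in self.owners.values() for wl in o.wavelengths}
        if taken & set(owner.wavelengths):
            raise ValueError("wavelength already owned by another condo owner")
        self.owners[owner.name] = owner

    def request(self, owner_name: str, requester: str) -> Optional[int]:
        # Delegate the decision entirely to the owning organization.
        return self.owners[owner_name].request(requester)


if __name__ == "__main__":
    academic = {"univ-a", "univ-b"}
    fiber = SharedFiber()
    # An R&E owner whose policy admits only academic institutions ...
    fiber.add_owner(CondoOwner("rne-net", [1, 2], lambda org: org in academic))
    # ... and a commercial owner on the same fiber with an open policy.
    fiber.add_owner(CondoOwner("genomics-co", [3, 4], lambda org: True))

    print(fiber.request("rne-net", "cloud-provider"))      # None
    print(fiber.request("genomics-co", "cloud-provider"))  # 3
    print(fiber.request("rne-net", "univ-a"))              # 1
```

In this toy model the commercial owner can connect a cloud provider over the same fiber that the R&E owner uses, while the R&E owner's academic-only policy remains intact for its own wavelengths.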

The CANARIE/CRC User Controlled Lightpaths (UCLP) project was one of the first to develop software-defined technology for deploying condominium lightpaths, where each organization can independently manage its own set of lightpaths, with independent use policies, on a common fiber or optical infrastructure. Internet2 and National LambdaRail (NLR) are now making significant strides in this field, as are other research initiatives such as the ORCA experiment described below. Now that NLR has been re-energized by Dr. Patrick Soon-Shiong's acquisition, aimed at driving a national bioinformatics strategy, I think we will see a huge push to integrate Open Lightpath Exchanges with Software Defined Networks. Some pointers follow – BSA]

DNA Sequencing Caught in Deluge of Data

RENCI to Demonstrate On-Demand Resources and Provisioning at SC11

Scientists studying data-intensive or compute-intensive problems require high bandwidth and computational resources, often from heterogeneous systems at different sites.
ORCA was developed by Duke computer science professor Jeff Chase and his students with funding from the National Science Foundation. It is one of the experimental control frameworks for the NSF's Global Environments for Network Innovation (GENI) project. GENI is a virtual laboratory for networking experiments that will help researchers develop the tools and protocols that will define future internets. With funding from the Department of Energy Advanced Scientific Computing Research program and the NSF Software Development for Cyberinfrastructure program, researchers are adapting ORCA as an Infrastructure as a Service (IaaS) platform for serving the diverse needs of computational scientists.
The first demonstration will execute a scientific workflow by using ORCA to allocate a slice of computational resources from multiple cloud providers and bandwidth-provisioned network connections between provider sites. The workflow, managed by the Pegasus workflow management system, will use six serial applications, which will run on Condor clusters dynamically provisioned from clouds owned by RENCI in Chapel Hill, NC, and by Duke University in Durham, NC. The two clouds are connected by the Breakable Experimental Network (BEN), an experimental network that connects RENCI and its partner institutions at Duke, UNC-Chapel Hill and North Carolina State University.
A related demonstration will use the ORCA framework to execute a Hadoop workflow on multiple clouds connected through bandwidth-provisioned network pipelines. Hadoop is a software framework for data-intensive distributed applications. A third demonstration will take a closer look at a part of the first demonstration: the on-demand provisioning of computational infrastructure to stand up a Condor cluster in a networked cloud environment.

Green Internet Consultant. Practical solutions to reducing GHG emissions such as free broadband and electric highways.

twitter:  BillStArnaud
skype:    Pocketpro