Google cloud takes on cancer research with help from its friends

by Barb Darrow @gigabarb Fortune - SEPTEMBER 8, 2015, 9:00 AM EDT

[Comment: Big Data is being employed in the search for a cure. So necessary. A great first step. The question is... pediatric cancers??? - Alan]

The Broad Institute will use Google Compute Engine’s preemptible VMs, Cycle Computing’s orchestration, and machine learning to parse cancer cell and genetic data.

Google has been energetically pushing its cloud as a viable alternative to Amazon Web Services and Microsoft Azure when it comes to big, burly computing jobs. As part of that effort, on Tuesday it made its previously announced “preemptible VMs” available to customers and said that the Broad Institute will use these computing instances, along with software services from Cycle Computing, to perform complex cancer research.

“Preemptible VMs” is tech talk for deeply discounted (70% off!) cloud computing instances that can be applied to big research projects or other tasks as needed, with the catch that the provider can reclaim them on short notice. Amazon has long offered “spot instances” that customers can bid on in an auction-like format.

The Broad Institute’s Cancer Group in Cambridge, Massachusetts, has data sets on cancer cell lines, “gene expression data,” and information about how various molecules interact with that gene expression. Gene expression data provides clues as to how often a given gene in a creature’s DNA is actually used by a cell.

“This is a cool workload from a science perspective in that it uses machine learning to understand the relationships between cell lines and cancer,” said Jason Stowe, CEO of Cycle Computing. Scientists can then look at the expressions of those gene mutations and how they interact with those molecules.

“They build a map of the relationships so a researcher on a particular mutation will know what else they should look at, what other data points in Broad’s data set should be examined.”

Machine learning is an advanced form of pattern recognition that applies algorithms to tough data problems without having to be programmed specifically to do that work, and then makes predictions based on that work.
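
To make that description concrete, here is a minimal sketch of the idea in Python. It is not Broad's or Cycle's actual method; the data, the labels ("responds", "resistant"), and the function names are all made up for illustration. The program is never told a rule for separating the two groups. It learns an average profile for each group from labeled examples, then predicts the group of a new sample.

```python
from collections import defaultdict
from math import dist


def fit_centroids(samples, labels):
    """Learn one average point (centroid) per label from the examples."""
    grouped = defaultdict(list)
    for features, label in zip(samples, labels):
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Predict the label whose learned centroid is closest to the sample."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical toy data: two invented "expression levels" per sample.
samples = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8)]
labels = ["responds", "responds", "resistant", "resistant"]

centroids = fit_centroids(samples, labels)   # the "learning" step
prediction = predict(centroids, (0.7, 0.3))  # the "prediction" step
print(prediction)
```

Real research workloads involve far more samples and features than this toy, which is why they call for large pools of cloud computing capacity.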

New York-based Cycle is an ideal partner for big public cloud companies that want to show that their massive resources can be corralled, scheduled, and deployed efficiently into high-performance computing (HPC) dynamos. Cycle is an expert in scheduling and managing those big workloads.

Google announced the news that Broad and Cycle are using Google Compute Engine (GCE) for this key work. Not only that, Cycle Computing cloud orchestration and management services will run on GCE as well as AWS from here on. That’s a big win for Google over its competitors.

Google and Broad are already cooperating to run Broad’s genomic analysis as a service on GCE.

The fact that Cycle Computing will now run on GCE is a plus for Google, but should not be surprising given that Cycle Computing was fully on board with a Google-led container management effort announced in July.

Until now, Cycle has worked on big high-performance computing projects with customers running on AWS, but Stowe said this move is no reflection on Amazon. “We followed the customer. Google’s pre-emptible VMs are very cost performant and our customers need to be able to use the best cloud services for their project,” he told Fortune.

In short: Researchers need to be able to use whatever cloud gives them the most usability for their money in any given project. The implication is that sometimes that optimal cloud may be from Google, sometimes from Amazon, and sometimes from Microsoft.