Exploring Clouds for Acceleration of Science
E-CAS Research Projects
The Exploring Clouds for Acceleration of Science (E-CAS) project comprises two phases. Phase one ran from April 2019 to April 2020. Phase two runs from September 2020 to September 2021.
Six projects participated in the first phase of the program, selected for their need for on-demand, scalable infrastructure and their innovative use of newer technologies such as hardware accelerators and machine learning platforms. View phase one webinars by the six E-CAS research teams.
Two of the six projects have since been selected for the second phase of the program and are currently developing and repeatedly running their science workloads at scale. The phase two research projects are:
Deciphering the Brain’s Neural Code Through Large-Scale Detailed Simulation of Cortical Circuits
Salvador Dura-Bernal and William Lytton, SUNY Downstate
This project aims to help decipher the brain’s neural coding mechanisms with far-reaching applications, including developing treatments for brain disorders, advancing brain-machine interfaces for people with paralysis, and developing novel artificial intelligence algorithms. Using a software tool for brain modeling, researchers will run thousands of parallelized simulations exploring different conditions and inputs to the simulation of brain cortical circuits.
Investigating Heterogeneous Computing at the Large Hadron Collider
Philip Harris, Massachusetts Institute of Technology (MIT)
Only a small fraction of the 40 million collisions per second at the Large Hadron Collider (LHC) is stored and analyzed, because of the huge volumes of data and the compute power required to process them. This project proposes a redesign of the algorithms using modern machine learning techniques that can be incorporated into heterogeneous computing systems, allowing more data to be processed and thus yielding greater physics output and potentially foundational discoveries in the field.
The other four research projects supported in phase one of E-CAS were:
Accelerating Science by Integrating Commercial Cloud Resources in the CIPRES Science Gateway
Mark Miller, San Diego Supercomputer Center (UCSD)
CIPRES is a web portal that allows scientists around the world to analyze DNA and protein sequence data to determine the natural history of a group or groups of living things. For example, one can ask where mammals originated, how the Ebola virus spreads, whether a given plant is really a new species or an unwelcome imported species, or how a given species interacts with other species and its environment over long periods of time. CIPRES helps answer these kinds of questions by providing access to parallel phylogenetics codes run on large HPC clusters provided by the NSF XSEDE program. CIPRES currently runs analyses for about 12,000 scientists per year, and that number is growing each year. CIPRES accelerates research by increasing each researcher’s throughput: jobs run faster using parallel codes, and users can run many jobs simultaneously on large clusters. For example, CIPRES provides access to P100 GPUs that can speed up some jobs 100-fold relative to a single-core run. But GPUs are in short supply in the XSEDE portfolio, so usage must be strictly limited. This project will develop the infrastructure needed to cloudburst CIPRES jobs to newer, faster V100 GPUs at AWS. As a result, individual jobs will run up to 1.5-fold faster, and users will have access to twice as many GPU nodes as they did in the previous year. The infrastructure created will also open the door to scalable access to AWS cloud resources through CIPRES for all users.
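As a rough illustration of how the per-job speedup and the larger node pool might combine, the sketch below estimates aggregate GPU throughput. It is not taken from the CIPRES project: the function name, the pool size of 8 nodes, and the assumption that the doubling comes from adding an equal number of cloud V100 nodes to the existing P100 pool are all hypothetical, and the 1.5-fold figure is treated as its stated upper bound.

```python
def aggregate_speedup(base_nodes: int, added_nodes: int, added_node_speedup: float) -> float:
    """Return combined-pool throughput relative to the base pool alone.

    Assumes jobs are independent, every node stays busy, and each base
    node has a relative speed of 1.0 (hypothetical simplifications).
    """
    if base_nodes <= 0:
        raise ValueError("base_nodes must be positive")
    baseline = base_nodes * 1.0
    combined = baseline + added_nodes * added_node_speedup
    return combined / baseline


# Hypothetical pool: 8 existing P100 nodes plus 8 cloud V100 nodes,
# with each V100 job running at the 1.5-fold upper bound.
speedup = aggregate_speedup(base_nodes=8, added_nodes=8, added_node_speedup=1.5)
print(f"Estimated aggregate throughput: {speedup:.2f}x")  # prints 2.50x
```

Under these assumptions the aggregate gain is at most about 2.5-fold; real gains would depend on the job mix, queue behavior, and how many jobs actually benefit from GPUs.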
IceCube Computing in the Cloud
Benedikt Riedel, University of Wisconsin
The IceCube Neutrino Observatory, located at the South Pole, supports science across a number of disciplines, including astrophysics, particle physics, and geographical sciences. It operates continuously and is simultaneously sensitive to the whole sky. Astrophysical neutrinos yield understanding of the most energetic events in the universe and could reveal the origin of cosmic rays. The ability to burst into the cloud supports follow-up computations of observed events and alerts to and from the community, such as other telescopes and LIGO. This project plans to use custom spot instances and FPGA-based filters in AWS and GPU/TensorFlow machine learning in GCP.
Building Clouds: Worldwide Building Typology Modeling from Images
Daniel Aliaga, Purdue University
This Exploring Clouds for Acceleration of Science (E-CAS) project will exploit the computational power and network connectivity of the cloud to provide a world-scalable solution for generating building-level information for urban canopy parameters and for improving the information used to estimate local climate zones, both of which are critical to high-resolution urban meteorological and environmental models. The challenge is that current computational models face a bottleneck, not just in the physics and processes within the land surface and boundary layer schemes but, even more critically, in the need for a robust means of generating the parameter values that define the urban landscape. This is where the proposed E-CAS inverse modeling approach comes into play. By utilizing images and worldwide input about building properties, we can infer a sampling of 3D building models at world scale that contains more than just geometric shape information and enables world-scale urban weather modeling.
Development of BioCompute Objects for Integration into Galaxy in a Cloud Computing Environment
Raja Mazumder, George Washington University
BioCompute Objects allow researchers to describe bioinformatic analyses composed of any number of algorithmic steps and variables, making computational experimental results clearly understandable and easier to repeat. Galaxy is a widely used bioinformatics platform that aims to make computational biology accessible to research scientists who do not have programming experience. The project will create a library of BioCompute Objects that describe bioinformatic workflows on Amazon Web Services, which Galaxy users from all over the world can access and contribute to. This project also plans to use AWS Direct Connect over Internet2 to connect the library of BioCompute Objects to the campus HPC environment at George Washington University.