Replicating Particle Collisions at CERN with Kubeflow
This is where Kubeflow comes in. They started by training their 3DGAN on an on-premises OpenStack cluster with 4 GPUs. To verify that Kubeflow itself introduced no overhead, they ran the training first in native containers, then on Kubernetes, and finally on Kubeflow using the MPI operator. They then moved to an Exoscale cluster with 32 GPUs and ran the same experiments, recording only negligible performance overhead. This was enough to convince them that they had found a flexible, versatile means of deploying their models to a wide variety of physical environments.
Beyond the portability that they gained from Kubeflow, they were especially pleased with how straightforward it was to run their code. As part of the infrastructure team, Ricardo plugged Sofia’s existing Docker image into Kubeflow’s MPI operator. Ricardo gave Sofia all the credit for building a scalable model, whereas Sofia credited Ricardo with scaling her team’s model. Thanks to components like the MPI operator, Sofia’s team can focus on building better models, and Ricardo can empower other physicists to scale their own models.