Department of Defense
High Performance Computing Modernization Program

The Department of Defense (DoD) High Performance Computing Modernization Program (HPCMP) is pleased to announce its newest supercomputing capability in support of the DoD Test and Evaluation (T&E), Acquisition Engineering (AE), and Science and Technology (S&T) communities. IBM will provide the capability, which is to be delivered later in 2019 and consists of a supercomputing system housed in a shipping container with onboard power conditioning and cooling, along with the corresponding hardware and software maintenance services. The system will provide more than 6 petaFLOPS of single-precision performance for machine learning training and inference workloads, and more than 1 petabyte of high-performance, solid-state storage for data analytics workloads. The system adds substantial capability for militarily significant use cases that were not possible with supercomputers installed in fixed facilities.

The new supercomputer will initially be based at the U.S. Army Combat Capabilities Development Command Army Research Laboratory (ARL) DoD Supercomputing Resource Center (DSRC), and will serve users from all of the services and agencies of the Department. This “HPC in a Container” is designed to be deployable to the tactical edge; opportunities to deploy it to remote locations are currently being explored and evaluated.

  • The IBM system consists of:
    • 22 nodes for machine learning training workloads, each with two IBM POWER9 processors, 512 GB of system memory, six NVIDIA V100 graphics processing units (GPUs) with 32 GB of high-bandwidth memory each, and 15 TB of local solid-state storage.
    • 128 GPGPU-accelerated nodes for inferencing workloads, each with two IBM POWER9 processors, 256 GB of system memory, and 4 TB of local solid-state storage.
    • Three solid-state parallel file systems, totaling 1.3 PB.
    • A 100 gigabit per second InfiniBand network, as well as dual 10 gigabit per second Ethernet networks.
    • Platform LSF HPC job scheduling integrated with a Kubernetes container orchestration solution.
    • Integrated support for TensorFlow, PyTorch, and Caffe, in addition to traditional HPC libraries and toolsets including FFTW and Dakota.
    • A shipping container-based facility with onboard uninterruptible power supply, chilled water cooling, and fire suppression systems.
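
For readers who want aggregate figures, the per-node numbers in the list above can be totaled with a few lines of arithmetic. The sketch below is illustrative only: the `NodeGroup` class and `totals` function are hypothetical names introduced here, and the inputs are limited to the figures stated in the list. The announcement does not give the accelerator type or count for the inference nodes, so those fields are left unset rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NodeGroup:
    """Per-node figures for one group of nodes, as stated in the announcement."""
    name: str
    nodes: int
    system_memory_gb: int
    local_ssd_tb: int
    gpus_per_node: Optional[int] = None   # None = not stated in the announcement
    gpu_memory_gb: Optional[int] = None   # None = not stated in the announcement


# Figures copied from the bullet list above.
TRAINING = NodeGroup("training", nodes=22, system_memory_gb=512,
                     local_ssd_tb=15, gpus_per_node=6, gpu_memory_gb=32)
INFERENCE = NodeGroup("inference", nodes=128, system_memory_gb=256,
                      local_ssd_tb=4)


def totals(group: NodeGroup) -> dict:
    """Multiply per-node figures by the node count; skip unstated GPU fields."""
    result = {
        "system_memory_gb": group.nodes * group.system_memory_gb,
        "local_ssd_tb": group.nodes * group.local_ssd_tb,
    }
    if group.gpus_per_node is not None and group.gpu_memory_gb is not None:
        gpus = group.nodes * group.gpus_per_node
        result["gpus"] = gpus
        result["gpu_memory_gb"] = gpus * group.gpu_memory_gb
    return result


if __name__ == "__main__":
    for group in (TRAINING, INFERENCE):
        print(group.name, totals(group))
```

Under these inputs, the training partition works out to 132 GPUs with 4,224 GB of GPU memory and 330 TB of node-local storage, and the inference partition to 512 TB of node-local storage. These node-local totals are separate from the 1.3 PB of parallel file system capacity.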

The system is expected to be delivered later in 2019, and to enter production service shortly thereafter.

 

