AI and HPC GPU Acceleration Benefit from Open Source Efforts

Enterprise compute requirements continue to grow at an exponential rate driven by increased use of mainstream high-performance computing (HPC), Big Data, artificial intelligence (AI), and machine learning applications.

Traditional infrastructures typically cannot deliver the needed compute capacity or perform computations fast enough to deliver results in a timely manner. This is driving the adoption of GPUs in data centers to accelerate workloads and speed the time to results.

Open source for GPUs
GPU computing is following adoption and deployment patterns similar to those of other HPC technologies. In the past, many organizations turned to open source solutions, such as server clustering with Linux, distributed computing with Hadoop, and streaming analytics with various Apache projects, to meet new and evolving computing requirements.

Open source solutions are preferred because they offer more flexibility: they do not lock a company into a single vendor’s products. Moreover, with developer communities such as those on GitHub, companies can expect problems to be resolved quickly and innovations to be added continuously.

Additionally, leading technology companies are embracing open source for their own needs, which further helps drive development. Google’s work on two major projects, Kubernetes and TensorFlow, is a good example. Kubernetes is a container orchestration platform that helps DevOps teams deploy and manage their application code, while TensorFlow makes building and deploying machine learning models far more accessible to developers.

On the GPU front, ROCm is the first open-source HPC/Hyperscale-class platform for GPU computing that’s also programming-language independent. The community behind ROCm is bringing the open source philosophy of choice, minimalism, and modular software development to GPU computing.

ROCm lets companies choose or even develop tools and a language runtime for their applications. Some key characteristics of ROCm include:

Built for scale: It supports multi-GPU computing both within a server node and across nodes through remote direct memory access (RDMA). It also simplifies the stack because the driver directly incorporates RDMA peer-sync support.
A rich system runtime: ROCm offers the critical features that large-scale application, compiler, and language-runtime development requires.
Built for current hardware: ROCm is developed to work with the newest GPU hardware. Through APIs and other methods, companies can retain their investment in applications already written with proprietary CUDA software.

To that point, the ROCr System Runtime is language independent and makes heavy use of the Heterogeneous System Architecture (HSA) Runtime API. This approach lets developers write and execute code in programming languages such as HCC C++ and HIP, the Khronos Group’s OpenCL, and Continuum’s Anaconda Python. A complete list of downloadable ROCm software and associated documentation is available online.

Such support is important for preserving past investments in GPU-based application development. For example, hipified code is portable to both AMD/ROCm and NVIDIA/CUDA platforms. On a CUDA platform, HIP delivers the same performance as native CUDA code, and developers can use native CUDA tools (nvcc, nvprof, etc.). On ROCm, developers can use native ROCm tools (hcc, rocm-prof, CodeXL).

Additionally, developers will find that the ROCm platform provides a rich set of tools and libraries, including open source math libraries, optimized parallel programming frameworks, the CodeXL profiler, and GDB debugging support.

GitHub work on ROCm includes tuned deep learning frameworks. MIOpen 1.0, the new foundation for deep learning acceleration, includes support for convolutional neural network (CNN) acceleration and is built to run on top of the ROCm software stack. The MIOpen library serves as a middleware layer between AI frameworks and the ROCm platform, supporting both OpenCL and HIP-based programming environments.

Finding a technology partner
Today’s marketplace is dynamic: GPU-for-computing developments are happening at a rapid pace, new GPU hardware with new capabilities is introduced all the time, and ROCm itself receives constant updates and enhancements.

Most organizations need to focus on their core business and often do not have the time or internal expertise to evaluate, select, deploy, and optimize the best and most recent technology. The fastest way to realize the benefits of using GPUs for HPC and AI is to partner with a company that has complete (hardware and software) GPU solutions and real-world expertise in deploying those solutions. This is where AMD can help.

AMD’s suite of GPU hardware and open-source software offerings is designed to dramatically increase the performance, efficiency, and ease of implementation of HPC and deep learning workloads.

There are several areas where AMD is driving the development of ROCm for HPC. A series of published HIP APIs allows developers to program AMD GPUs in a CUDA-like style.

AMD provides a tool that automatically converts CUDA code into HIP format. This “hipify” script converts up to 99 percent of the code automatically.
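At its core, the conversion is a source-to-source translation that maps CUDA API names to their HIP equivalents. The toy sketch below illustrates the flavor of that mechanical renaming; it is not the actual hipify tool, which also rewrites kernel-launch syntax, headers, and many more APIs.

```python
import re

# A small, illustrative subset of the CUDA-to-HIP renamings that the
# real hipify script performs (the real tool covers far more).
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
    "cuda_runtime.h": "hip/hip_runtime.h",
}

def hipify(source: str) -> str:
    """Apply the renamings above to a CUDA source string."""
    # Match longest names first so cudaMemcpyHostToDevice is replaced
    # as a whole rather than partially matched by the shorter cudaMemcpy.
    pattern = re.compile(
        "|".join(sorted(map(re.escape, CUDA_TO_HIP), key=len, reverse=True))
    )
    return pattern.sub(lambda m: CUDA_TO_HIP[m.group(0)], source)

cuda_line = "cudaMalloc(&d_a, n); cudaMemcpy(d_a, a, n, cudaMemcpyHostToDevice);"
print(hipify(cuda_line))
# hipMalloc(&d_a, n); hipMemcpy(d_a, a, n, hipMemcpyHostToDevice);
```

The small residue the script cannot convert, typically inline PTX or vendor-specific library calls, is what developers finish by hand.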

As a proof of concept, the ROCm team ported the entire Caffe machine learning framework (around 55,000 lines of code) from CUDA to HIP. In that effort, 99.6 percent of the code was either unmodified or converted automatically, and the remaining 0.4 percent took less than a week of developer time to tie up loose ends.

Recent work on TensorFlow provides further evidence of AMD’s commitment to open source initiatives in general and ROCm in particular. Specifically, AMD’s release of TensorFlow v1.8 for ROCm-enabled GPUs provides a pre-built Python wheel (whl) package, allowing an installation as simple as that of generic TensorFlow for Linux. With this release, AMD published installation instructions as well as a pre-built Docker image.
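In practice, the install reduces to a couple of commands. The package and image names below reflect AMD’s commonly published ones (tensorflow-rocm on PyPI, rocm/tensorflow on Docker Hub) but may change between releases, so consult AMD’s current instructions before running them.

```shell
# Install the pre-built TensorFlow wheel for ROCm-enabled GPUs
# (assumes a working ROCm driver/runtime is already installed).
pip install --user tensorflow-rocm

# Alternatively, pull the pre-built Docker image.
docker pull rocm/tensorflow
```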

These efforts on the software side can be combined with Radeon Instinct™ GPU accelerators to speed up the execution of new AI and HPC applications.

GPUs are increasingly being used for new HPC and AI applications. Open source solutions offer many advantages, including a large developer community that ensures continuous enhancements to keep pace with changing requirements.

AMD is championing the open source ROCm effort, whose goal is to help organizations get the most out of their existing and new GPU hardware and to easily port and run parallelized applications on any hardware.
