Use case: how to build and run Docker containers with NVIDIA GPUs

A step-by-step guide to set-up


In this instalment of our DevOps consulting series, we look at how to build and run Docker containers using high-powered NVIDIA GPUs. GPU-accelerated computing is the use of a graphics processing unit to accelerate deep learning, analytics, and engineering applications. First introduced by NVIDIA in 2007, GPU accelerators now power energy-efficient data centres worldwide and play a key role in accelerating applications.

Containerizing GPU applications brings a number of benefits, including ease of deployment, streamlined collaboration, isolation of individual devices and more. However, Docker® containers are most commonly used to deploy CPU-based applications across several machines, where containers are both hardware- and platform-agnostic. The Docker engine does not natively support NVIDIA GPUs, since they are specialized hardware that requires the NVIDIA driver to be installed.

This is our experience of building and running Docker containers that use a graphics processing unit, and a step-by-step description of how it was achieved.


To start, we’re going to need a server with NVIDIA GPU. Hetzner has a server with GeForce® GTX 1080

Requirements:

- OS: CentOS 7.3
- Docker: version 17.06.0-ce
- NVIDIA drivers: latest

Let’s download and install the necessary drivers for this graphic card:

After downloading, we need to install the driver, performing all the steps

1./NVIDIA-Linux-x86_64-<major_version>.<minor_version>.run 
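One caveat: the .run installer compiles a kernel module, so it needs gcc and kernel headers matching the running kernel. On CentOS 7 that typically means installing (exact package names may vary by setup):

sudo yum install -y gcc kernel-devel-$(uname -r) kernel-headers-$(uname -r)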
Here’s how Nvidia and Docker work together:

We will need to install nvidia-docker и nvidia-docker-plugin. You can learn more about how to do that on nvidia github

wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker-1.0.1-1.x86_64.rpm
sudo rpm -i /tmp/nvidia-docker*.rpm && rm /tmp/nvidia-docker*.rpm

Launching the service:

sudo systemctl start nvidia-docker
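Optionally, enable the unit as well so the service comes back after a reboot (standard systemd, nothing nvidia-specific):

sudo systemctl enable nvidia-docker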

Testing:

nvidia-docker run --rm nvidia/cuda nvidia-smi

You should get the following result:

Thu Jul 27 13:44:07 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.20                 Driver Version: 375.20                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 0000:01:00.0     Off |                  N/A |
| 33%   36C    P8    11W / 180W |      0MiB /  8145MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Docker containers with GPU support in an orchestrator

* Docker Swarm is not suitable, as the docker-compose v3 file format provides no way to pass devices through to a container.

Thus, we can still use the resources of the graphics card, but as soon as an orchestration tool is involved, nvidia-docker can no longer be used to start containers: it is a wrapper on top of Docker, and the orchestrator talks to the Docker engine directly.

We’ve just launched a container in the Rancher cluster.

Now let’s dive into the details of what Nvidia-docker actually is. Basically, this is a service that creates a Docker volume and mounts the devices into a container.

To find out what was created and mounted, we will need to run the following command:

curl -s http://localhost:3476/docker/cli

Here’s the result:

volume-driver=nvidia-docker
volume=nvidia_driver_375.20:/usr/local/nvidia:ro
device=/dev/nvidiactl
device=/dev/nvidia-uvm
device=/dev/nvidia-uvm-tools
device=/dev/nvidia0
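These lines are exactly the extra flags nvidia-docker passes to a plain docker run. In other words, the earlier test could equally be started with the stock Docker CLI, using the volume and device names from the output above:

docker run --rm \
  --volume-driver=nvidia-docker \
  --volume=nvidia_driver_375.20:/usr/local/nvidia:ro \
  --device=/dev/nvidiactl \
  --device=/dev/nvidia-uvm \
  --device=/dev/nvidia-uvm-tools \
  --device=/dev/nvidia0 \
  nvidia/cuda nvidia-smi

The nvidia-docker documentation suggests the shortcut of substituting the plugin's answer directly: docker run --rm $(curl -s http://localhost:3476/docker/cli) nvidia/cuda nvidia-smi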

For the mathematical calculations, we use a Python library, tensorflow-gpu (TensorFlow).

Let’s write Dockerfile, where the base image is taken from Docker Hub Nvidia/CUDA

1FROM nvidia/cuda:8.0-cudnn5-runtime-centos7
2
3 RUN pip install tensorflow-gpu
4
5 ENTRYPOINT ["python", "math.py"]
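The math.py referenced in the ENTRYPOINT isn't shown in the original write-up. As an illustration, a minimal GPU smoke test using the TensorFlow 1.x API (which matches this CUDA 8-era image) could look like this:

# math.py - hypothetical minimal example, TensorFlow 1.x API
import tensorflow as tf

# Pin a large matrix multiplication to the first GPU
with tf.device('/gpu:0'):
    a = tf.random_normal([4096, 4096])
    b = tf.random_normal([4096, 4096])
    c = tf.matmul(a, b)

# log_device_placement prints which device each op actually ran on
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(tf.reduce_sum(c)))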

Then we write a docker-compose file to build and run the compute container:

version: '2'
services:
  math:
    build: .
    volumes:
      - nvidia_driver_375.20:/usr/local/nvidia:ro
    devices:
      - /dev/nvidiactl
      - /dev/nvidia-uvm
      - /dev/nvidia-uvm-tools
      - /dev/nvidia0

volumes:
  nvidia_driver_375.20:
    external: true   # created by nvidia-docker (volume driver: nvidia-docker)
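Note that the external volume must already exist before docker-compose starts the service; running the nvidia-smi test from earlier creates it, and its presence can be checked with:

docker volume ls | grep nvidia_driver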

Launching the Docker container:

docker-compose up -d
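To watch the computation, tail the service's logs (the service name math comes from the compose file above):

docker-compose logs -f math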

If everything is done correctly, then running the command:

nvidia-docker run --rm nvidia/cuda nvidia-smi

gives the following result:
Thu Jul 27 15:12:40 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.20                 Driver Version: 375.20                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 0000:01:00.0     Off |                  N/A |
| 39%   53C    P2    86W / 180W |   7813MiB /  8145MiB |     56%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     27798    C   python                                        7803MiB |
+-----------------------------------------------------------------------------+

In the process list, you can see the python process occupying 7803MiB of GPU memory, with GPU utilisation at 56%.

Thus, we’ve just taught Docker, the leading container platform, to work with GeForce graphic cards, and it can now be used to containerize GPU-accelerated applications. This means you can easily containerize and isolate accelerated application without any modifications and deploy it on any supported GPU-enabled infrastructure.
