Use case: how to build and run Docker containers with NVIDIA GPUs

DevOps | Published on August 23, 2020 | Modified on December 4, 2020

Docker Consulting Series – Building & Running Containers With NVIDIA GPUs

In this installment of our DevOps consulting series, we look at how to build and run containers using high-powered NVIDIA GPUs. GPU-accelerated computing is the use of a graphics processing unit to accelerate deep learning, analytics, and engineering applications. First introduced by NVIDIA in 2007, GPU accelerators today power energy-efficient data centers worldwide and play a key role in accelerating applications.

Containerizing GPU applications provides multiple benefits, such as ease of deployment, streamlined collaboration, isolation of individual devices, and more. However, Docker® containers are most commonly used to easily deploy CPU-based applications on several machines, where containers are both hardware- and platform-agnostic. The Docker engine doesn't natively support NVIDIA GPUs, since they are specialized hardware and require the NVIDIA driver to be installed.

For one of our projects we had to build and run Docker containers that use a graphics processing unit. Below is a step-by-step description of how this was achieved.

To start, we're going to need a server with an NVIDIA GPU. Hetzner offers a server with a GeForce® GTX 1080.

Requirements:

OS: CentOS 7.3
Docker: 17.06.0-ce
NVIDIA drivers: latest

Let's download and install the necessary drivers for this graphics card.
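The driver itself can be fetched from NVIDIA's download site. As a sketch (the version placeholder must be replaced with the release currently offered for the GTX 1080 on nvidia.com):

# fill in the version from NVIDIA's download page for the GTX 1080
wget http://us.download.nvidia.com/XFree86/Linux-x86_64/<version>/NVIDIA-Linux-x86_64-<version>.run
chmod +x NVIDIA-Linux-x86_64-<version>.run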

After downloading, we need to install the driver, going through all the installer's steps:

./NVIDIA-Linux-x86_64-<major_version>.<minor_version>.run

Next, let's get NVIDIA and Docker working together.

We will need to install nvidia-docker and nvidia-docker-plugin. You can learn more about how to do that on the NVIDIA GitHub:

wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker-1.0.1-1.x86_64.rpm
sudo rpm -i /tmp/nvidia-docker*.rpm && rm /tmp/nvidia-docker*.rpm

Launching service:

sudo systemctl start nvidia-docker
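Optionally, you can also enable the service so it starts automatically after a reboot:

sudo systemctl enable nvidia-docker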

Testing:

nvidia-docker run --rm nvidia/cuda nvidia-smi

You should get the following result:

Thu Jul 27 13:44:07 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.20                 Driver Version: 375.20                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 0000:01:00.0     Off |                  N/A |
| 33%   36C    P8    11W / 180W |      0MiB /  8145MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Running a Docker container with GPU support under an orchestrator

 

* Docker Swarm is not suitable: with the version 3 compose file format used for Swarm deployments, there is no way to pass the GPU devices into a container.

 

As the official documentation notes, nvidia-docker is essentially a thin wrapper over Docker. So while it lets us use the graphics card's resources directly, orchestration tools cannot launch containers through it.

Nevertheless, we managed to launch the container in a Rancher cluster; here is how.

Now let's dive into the details of what nvidia-docker actually is. Essentially, it is a service that creates a Docker volume (holding the driver's user-space files) and mounts the GPU devices into the container.

To find out what was created and mounted, we will need to run the following command:

curl -s http://localhost:3476/docker/cli

Here’s the result:

volume-driver=nvidia-docker
volume=nvidia_driver_375.20:/usr/local/nvidia:ro
device=/dev/nvidiactl
device=/dev/nvidia-uvm
device=/dev/nvidia-uvm-tools
device=/dev/nvidia0
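These are ordinary docker run arguments: a plugin-managed volume holding the driver's user-space files, plus the GPU device nodes. As a sketch (not something shown in the original setup), the same container could be started with plain Docker by passing these values explicitly:

# plain-Docker equivalent of `nvidia-docker run`, using the volume and devices reported above
docker run --rm \
  --volume-driver=nvidia-docker \
  --volume=nvidia_driver_375.20:/usr/local/nvidia:ro \
  --device=/dev/nvidiactl \
  --device=/dev/nvidia-uvm \
  --device=/dev/nvidia-uvm-tools \
  --device=/dev/nvidia0 \
  nvidia/cuda nvidia-smi

This is exactly the mapping we will reuse in the docker-compose file below.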

For the mathematical calculations we use the tensorflow-gpu Python library (TensorFlow).
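The actual math.py used in the project isn't shown in the article. Purely as an illustration, a minimal stand-in that exercises the GPU through the TensorFlow 1.x API (the version tensorflow-gpu installed at the time, matching the CUDA 8 / cuDNN 5 base image below) might look like this:

# math.py - hypothetical minimal GPU workload, not the project's real script
import tensorflow as tf

# place a large matrix multiplication explicitly on the first GPU
with tf.device('/gpu:0'):
    a = tf.random_normal([4096, 4096])
    b = tf.random_normal([4096, 4096])
    c = tf.reduce_sum(tf.matmul(a, b))

# log_device_placement=True prints which device each op ran on,
# confirming that the GPU is actually being used
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(c))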

 

 

Let's write a Dockerfile, with the base image taken from the nvidia/cuda repository on Docker Hub:

 

FROM nvidia/cuda:8.0-cudnn5-runtime-centos7

# note: pip is not included in the stock CentOS 7 base image; install it first if needed (e.g. from EPEL)
RUN pip install tensorflow-gpu

# make the compute script available inside the image
COPY math.py /

ENTRYPOINT ["python", "math.py"]

Then we write a docker-compose file to build and run the compute container:

version: '2'

services:
  math:
    build: .
    volumes:
      - nvidia_driver_375.20:/usr/local/nvidia:ro
    devices:
      - /dev/nvidiactl
      - /dev/nvidia-uvm
      - /dev/nvidia-uvm-tools
      - /dev/nvidia0

volumes:
  # this volume is created by nvidia-docker-plugin (volume driver: nvidia-docker), so we only reference it here
  nvidia_driver_375.20:
    external: true

 

Launching Docker container:

docker-compose up -d
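To follow the computation's output, you can tail the logs of the math service defined in the compose file:

docker-compose logs -f math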

If everything is done correctly, then when you run the command:

nvidia-docker run --rm nvidia/cuda nvidia-smi

you get the following result:
Thu Jul 27 15:12:40 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.20                 Driver Version: 375.20                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 0000:01:00.0     Off |                  N/A |
| 39%   53C    P2    86W / 180W |   7813MiB /  8145MiB |     56%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     27798    C   python                                      7803MiB   |
+-----------------------------------------------------------------------------+

In the process list, you can see the python process using 7803 MiB of GPU memory, with overall GPU utilization at 56%.

Thus, we've taught Docker, the leading container platform, to work with GeForce graphics cards, so it can now be used to containerize GPU-accelerated applications. This means you can easily containerize and isolate accelerated applications without any modifications and deploy them on any supported GPU-enabled infrastructure.

 
