Building a Deep Learning Environment with Nvidia GPUs and Docker Containers

caohaoyu, published 2019-06-28 16:43

GPU cloud host:

Operating system: Ubuntu 16.04, 64-bit
GPU: 1 x Nvidia Tesla P40

1. Install the CUDA Driver

1.1 Pre-installation Actions

Install gcc, g++, and make:

# sudo apt-get install gcc g++ make
# gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

If the kernel headers are not already installed, install linux-headers:

# sudo apt-get install linux-headers-$(uname -r)
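
The prerequisite check in this section can also be scripted. The following is only a sketch and not part of the original procedure: `missing_tools` is a hypothetical helper name, and the header check assumes that the headers package populates /lib/modules/$(uname -r)/build, as it normally does on Ubuntu.

```shell
#!/bin/sh
# Hypothetical helper: print each named command that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Report which of the build tools from this section still need installing.
missing=$(missing_tools gcc g++ make)
if [ -n "$missing" ]; then
    echo "missing build tools:" $missing
else
    echo "gcc, g++ and make are all present"
fi

# Assumption: on Ubuntu the linux-headers package populates this directory.
if [ -d "/lib/modules/$(uname -r)/build" ]; then
    echo "kernel headers found for $(uname -r)"
else
    echo "kernel headers missing for $(uname -r)"
fi
```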

1.2 Install the NVIDIA driver

CUDA can be installed in two ways:
1. Package installation
2. Runfile installation

This article uses the runfile installation.

First, disable Nouveau:

# lsmod | grep nouveau
nouveau              1495040  0
mxm_wmi                16384  1 nouveau
wmi                    20480  2 mxm_wmi,nouveau
video                  40960  1 nouveau
i2c_algo_bit           16384  1 nouveau
ttm                    94208  1 nouveau
drm_kms_helper        155648  1 nouveau
drm                   364544  3 ttm,drm_kms_helper,nouveau
# vi /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
# sudo update-initramfs -u
update-initramfs: Generating /boot/initrd.img-4.4.0-62-generic
W: mdadm: /etc/mdadm/mdadm.conf defines no arrays.
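
The blacklist file can also be written without opening an editor. The sketch below is one possible way to do that; `write_nouveau_blacklist` is a hypothetical helper name, and the demonstration writes to a temporary file so that nothing under /etc is touched.

```shell
#!/bin/sh
# Hypothetical helper: write the two Nouveau blacklist lines to the given file.
write_nouveau_blacklist() {
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$1"
}

# Demonstrate on a temporary file so nothing under /etc is modified.
tmp=$(mktemp)
write_nouveau_blacklist "$tmp"
cat "$tmp"
rm -f "$tmp"

# On the real host (as root), the target would be the file edited above:
#   write_nouveau_blacklist /etc/modprobe.d/blacklist-nouveau.conf
#   update-initramfs -u
```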

Reboot the cloud host:

# reboot

After the reboot, check that the Nouveau driver is no longer loaded:

# lsmod | grep nouveau
#

Log in to http://developer.nvidia.com/c... and download the matching runfile:

# wget https://developer.nvidia.com/compute/cuda/10.0/Prod/local_installers/cuda_10.0.130_410.48_linux

Start installing the CUDA driver:

# chmod +x cuda_10.0.130_410.48_linux
# sudo sh ./cuda_10.0.130_410.48_linux 
Logging to /tmp/cuda_install_1699.log
Using more to view the EULA.
Do you accept the previously read EULA?
accept/decline/quit: accept

Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 410.48?
(y)es/(n)o/(q)uit: y

Do you want to install the OpenGL libraries?
(y)es/(n)o/(q)uit [ default is yes ]: y

Do you want to run nvidia-xconfig?
This will update the system X configuration file so that the NVIDIA X driver
is used. The pre-existing X configuration file will be backed up.
This option should not be used on systems that require a custom
X configuration, such as systems with multiple GPU vendors.
(y)es/(n)o/(q)uit [ default is no ]: 

Install the CUDA 10.0 Toolkit?
(y)es/(n)o/(q)uit: y

Enter Toolkit Location
 [ default is /usr/local/cuda-10.0 ]: 

Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y

Install the CUDA 10.0 Samples?
(y)es/(n)o/(q)uit: y

Enter CUDA Samples Location
 [ default is /root ]: 

Installing the NVIDIA display driver...
Installing the CUDA Toolkit in /usr/local/cuda-10.0 ...
Missing recommended library: libGLU.so
Missing recommended library: libX11.so
Missing recommended library: libXi.so
Missing recommended library: libXmu.so

Installing the CUDA Samples in /root ...
Copying samples to /root/NVIDIA_CUDA-10.0_Samples now...
Finished copying samples.

===========
= Summary =
===========

Driver:   Installed
Toolkit:  Installed in /usr/local/cuda-10.0
Samples:  Installed in /root, but missing recommended libraries

Please make sure that
 -   PATH includes /usr/local/cuda-10.0/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-10.0/lib64, or, add /usr/local/cuda-10.0/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-10.0/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-10.0/doc/pdf for detailed information on setting up CUDA.

Logfile is /tmp/cuda_install_1699.log

Installation succeeded!
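
The answers given interactively above can in principle be passed on the command line instead. The sketch below only prints such a command and does not run it. It assumes that the CUDA 10.0 runfile accepts the `--silent`, `--driver`, `--toolkit`, `--samples` and `--samplespath` options described in NVIDIA's Linux installation guide; confirm this with `sh ./cuda_10.0.130_410.48_linux --help` before relying on it. `silent_install_cmd` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: build (but do not run) a non-interactive install command.
# $1 = runfile path, $2 = directory for the samples.
# Assumption: the runfile supports these options (see the lead-in).
silent_install_cmd() {
    echo "sh $1 --silent --driver --toolkit --samples --samplespath=$2"
}

# Print a command mirroring the interactive choices above
# (driver: yes, toolkit: yes, samples: yes, samples location: /root).
silent_install_cmd ./cuda_10.0.130_410.48_linux /root
```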

Reboot the cloud host:

# reboot

Verify the devices:

# ls /dev/nvidia*
ls: cannot access "/dev/nvidia*": No such file or directory
# vi nvidia-probe.sh

#!/bin/bash
### BEGIN INIT INFO
# Provides:          jd.com
# Required-Start:    $local_fs $network
# Required-Stop:     $local_fs
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: nvidia service
# Description:       nvidia service daemon
### END INIT INFO

/sbin/modprobe nvidia

if [ "$?" -eq 0 ]; then
  # Count the number of NVIDIA controllers found.
  NVDEVS=`lspci | grep -i NVIDIA`
  N3D=`echo "$NVDEVS" | grep "3D controller" | wc -l`
  NVGA=`echo "$NVDEVS" | grep "VGA compatible controller" | wc -l`

  N=`expr $N3D + $NVGA - 1`
  for i in `seq 0 $N`; do
    mknod -m 666 /dev/nvidia$i c 195 $i
  done

  mknod -m 666 /dev/nvidiactl c 195 255

else
  exit 1
fi

/sbin/modprobe nvidia-uvm

if [ "$?" -eq 0 ]; then
  # Find out the major device number used by the nvidia-uvm driver
  # (single quotes keep the shell from expanding $1 before awk sees it)
  D=`grep nvidia-uvm /proc/devices | awk '{print $1}'`

  mknod -m 666 /dev/nvidia-uvm c $D 0
else
  exit 1
fi
    
# chmod +x nvidia-probe.sh 
# ./nvidia-probe.sh
# ls /dev/nvidia*
/dev/nvidia0  /dev/nvidiactl  /dev/nvidia-uvm

The devices now appear under /dev!
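
To see why the script creates the nodes it does: it counts the NVIDIA entries that `lspci` reports as "3D controller" or "VGA compatible controller" and creates one /dev/nvidiaN node per GPU. The sketch below reproduces that counting on captured text. `count_nvidia_gpus` is a hypothetical helper, and the sample `lspci` lines are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count NVIDIA GPUs in lspci-style text read from stdin,
# using the same two controller classes as nvidia-probe.sh.
count_nvidia_gpus() {
    grep -i nvidia | grep -c -e '3D controller' -e 'VGA compatible controller'
}

# Illustrative input only; on a real host this would come from `lspci`.
sample='00:07.0 3D controller: NVIDIA Corporation Device 1b38 (rev a1)
00:03.0 Ethernet controller: Red Hat, Inc. Virtio network device'

n=$(printf '%s\n' "$sample" | count_nvidia_gpus)
echo "GPUs found: $n"

# One device node per GPU, numbered from 0, as in the script's mknod loop.
i=0
while [ "$i" -lt "$n" ]; do
    echo "would create /dev/nvidia$i (major 195, minor $i)"
    i=$((i + 1))
done
```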

Configure the script to run at boot:

# cp nvidia-probe.sh /etc/init.d/
# sudo update-rc.d nvidia-probe.sh defaults 95

1.3 Post-installation Actions

Configure the environment variables:

# vi /etc/profile
......
export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
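
The `${PATH:+:${PATH}}` form in these lines expands to `:` followed by the old value only when the variable is already set and non-empty, so an empty variable does not leave a trailing colon behind. A small sketch, using a hypothetical `prepend_dir` helper, shows both cases:

```shell
#!/bin/sh
# Hypothetical helper: prepend directory $1 to the colon-separated list $2,
# using the same ${var:+...} expansion as the /etc/profile lines above.
prepend_dir() {
    dir=$1
    list=$2
    echo "${dir}${list:+:${list}}"
}

# List already set: prints /usr/local/cuda-10.0/bin:/usr/bin:/bin
prepend_dir /usr/local/cuda-10.0/bin "/usr/bin:/bin"

# List empty: prints /usr/local/cuda-10.0/lib64 with no trailing colon
prepend_dir /usr/local/cuda-10.0/lib64 ""
```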

Start the Persistence Daemon at boot:

# vi /etc/rc.local
......
/usr/bin/nvidia-persistenced --verbose

exit 0

1.4 Verify the CUDA driver

Check the driver version:

# cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  410.48  Thu Sep  6 06:36:33 CDT 2018
GCC version:  gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.10)
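
If the version number is needed in a script, for example to compare it with the 410.48 in the runfile name, it can be pulled out of this file. The following is a sketch only; `nvidia_driver_version` is a hypothetical helper that reads text in the format shown above from standard input.

```shell
#!/bin/sh
# Hypothetical helper: extract the driver version from text in the format of
# /proc/driver/nvidia/version, read from stdin.
nvidia_driver_version() {
    sed -n 's/^NVRM version:.*Kernel Module  *\([0-9][0-9.]*\).*/\1/p'
}

# Sample line taken from the output shown above; prints 410.48
echo 'NVRM version: NVIDIA UNIX x86_64 Kernel Module  410.48  Thu Sep  6 06:36:33 CDT 2018' \
    | nvidia_driver_version
```

On the GPU host itself the input would be the real file: `nvidia_driver_version < /proc/driver/nvidia/version`.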

Verify with the deviceQuery sample:

# cd ~/NVIDIA_CUDA-10.0_Samples/1_Utilities/deviceQuery/
# make
"/usr/local/cuda-10.0"/bin/nvcc -ccbin g++ -I../../common/inc  -m64 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 -o deviceQuery.o -c deviceQuery.cpp
"/usr/local/cuda-10.0"/bin/nvcc -ccbin g++   -m64  -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 -o deviceQuery deviceQuery.o 
mkdir -p ../../bin/x86_64/linux/release
cp deviceQuery ../../bin/x86_64/linux/release
# cd ../../bin/x86_64/linux/release/
# ls
deviceQuery
# ./deviceQuery 
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Tesla P40"
  CUDA Driver Version / Runtime Version          10.0 / 10.0
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 22919 MBytes (24032378880 bytes)
  (30) Multiprocessors, (128) CUDA Cores/MP:     3840 CUDA Cores
  GPU Max Clock rate:                            1531 MHz (1.53 GHz)
  Memory Clock rate:                             3615 Mhz
  Memory Bus Width:                              384-bit
  L2 Cache Size:                                 3145728 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Enabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 0 / 7
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.0, CUDA Runtime Version = 10.0, NumDevs = 1
Result = PASS
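
The sample Makefile compiles for every architecture from sm_30 to sm_75, but deviceQuery reports compute capability 6.1 for the Tesla P40, so only the compute_61/sm_61 pair is needed for this card. As a sketch, a hypothetical `gencode_flag` helper can turn the reported capability into the matching nvcc option:

```shell
#!/bin/sh
# Hypothetical helper: map a compute capability such as "6.1" to an nvcc
# -gencode option of the form seen in the make output above.
gencode_flag() {
    cc=$(echo "$1" | tr -d '.')
    echo "-gencode arch=compute_${cc},code=sm_${cc}"
}

# Tesla P40, per the deviceQuery output.
# Prints: -gencode arch=compute_61,code=sm_61
gencode_flag 6.1
```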

References:

https://github.com/NVIDIA/nvi...

https://docs.nvidia.com/cuda/...

2. Install Nvidia-docker

2.1 Install Docker

Install docker-ce:

# sudo apt-get remove docker docker-engine docker.io

# sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    software-properties-common
# curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
# sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"
# sudo apt-get update
# sudo apt-get install docker-ce
# docker version
Client:
 Version:           18.06.1-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        e68fc7a
 Built:             Tue Aug 21 17:24:56 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.1-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       e68fc7a
  Built:            Tue Aug 21 17:23:21 2018
  OS/Arch:          linux/amd64
  Experimental:     false

2.2 Install nvidia-docker

Install nvidia-docker:

# Add the package repositories
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
  sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update

# Install nvidia-docker2 and reload the Docker daemon configuration
sudo apt-get install -y nvidia-docker2
sudo pkill -SIGHUP dockerd
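
The `distribution` variable in the commands above is built by sourcing /etc/os-release and joining its `ID` and `VERSION_ID` fields, which selects the repository list for the running system. The sketch below shows the same computation with the file path as a parameter; `distribution_from` is a hypothetical helper, and the temporary file holds illustrative Ubuntu 16.04 values.

```shell
#!/bin/sh
# Hypothetical helper: compute the "distribution" string from an os-release
# style file. The subshell keeps ID and VERSION_ID out of the caller's scope.
distribution_from() {
    ( . "$1"; echo "$ID$VERSION_ID" )
}

# Demonstrate on a temporary file with Ubuntu 16.04 values (illustrative).
tmp=$(mktemp)
printf '%s\n' 'ID=ubuntu' 'VERSION_ID="16.04"' > "$tmp"
distribution_from "$tmp"    # prints ubuntu16.04
rm -f "$tmp"
```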

Verify nvidia-docker:

# docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi
Thu Oct 25 09:03:27 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48                 Driver Version: 410.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P40           On   | 00000000:00:07.0 Off |                    0 |
| N/A   20C    P8     9W / 250W |      0MiB / 22919MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

2.3 Configure Docker's default runtime

# cat /etc/docker/daemon.json

{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

Restart the service:

# systemctl restart docker
# systemctl status docker
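
A quick way to confirm what the configuration file requests is to read the `default-runtime` key back out of it. The sketch below does this with a plain text match rather than a JSON parser, which is enough for a file laid out like the one above. `default_runtime_of` is a hypothetical helper, and the demonstration uses a temporary copy of the configuration.

```shell
#!/bin/sh
# Hypothetical helper: print the value of "default-runtime" from a daemon.json
# style file. A simple text match, not a full JSON parser.
default_runtime_of() {
    sed -n 's/.*"default-runtime"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

# Demonstrate on a temporary copy of the configuration shown above.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
default_runtime_of "$tmp"    # prints nvidia
rm -f "$tmp"
```

On the host, the argument would be /etc/docker/daemon.json.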

2.4 Run a TensorFlow convolutional neural network model

Run it with Docker:

# docker run --rm --name tensorflow -ti tensorflow/tensorflow:r0.9-devel-gpu
root@bd0fb3758da2:~# python --version
Python 2.7.6
root@bd0fb3758da2:~# python -m tensorflow.models.image.mnist.convolutional

References:

https://docs.docker.com/insta...

https://github.com/NVIDIA/nvi...
