Introduction to VMware vSphere Bitfusion
VMware acquired Bitfusion in August 2019. Bitfusion was a pioneer in virtualizing hardware accelerators, with a focus on GPU technology.
Bitfusion provides a software platform that decouples specific physical resources from the servers they are attached to, letting multiple clients share GPU compute capacity over the network.
VMware previously had no GPU virtualization technology of its own; with Bitfusion it finally fills that gap.
A Bitfusion client runs AI/ML applications, and those applications share, over the network, the passthrough GPUs on a Bitfusion server.
Bitfusion can partition GPU memory into slices of arbitrary size and assign them to different clients for concurrent use.
Bitfusion can provide remote GPU compute capacity to both VMs and containers.
Bitfusion's GPU resource pool is somewhat analogous to a SAN (Storage Area Network), which is why some people call it a "GPU Attached Network".
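As a preview of what this looks like from the client side, here is a hedged sketch of the bitfusion run command that the testing section below covers in detail; train.py is a placeholder for any CUDA application.

# Run an arbitrary CUDA program against a remote GPU slice
# (train.py is hypothetical; --partial 0.5 requests half of one GPU's memory)
$ bitfusion run --num_gpus 1 --partial 0.5 -- python3 train.py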
Installation
1. Configure NVIDIA GPU passthrough
Confirm that "Memory Mapped I/O above 4GB" is enabled in the BIOS of the ESXi host that holds the NVIDIA GPUs.
Configure the GPUs on the ESXi host for passthrough, since they will be assigned to the Bitfusion appliance VM.
Select all NVIDIA GPUs.
Reboot the host after the change; once it is back up, the GPUs appear as passthrough devices (an optional command-line check is sketched below).
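If you have shell access to the ESXi host, the following is an optional, hedged way to confirm that the host sees the GPUs; the UI steps above are sufficient on their own.

# Optional: from the ESXi shell, confirm the NVIDIA GPUs are visible
$ lspci | grep -i nvidia
# esxcli lists the same PCI devices with vendor/device details
$ esxcli hardware pci list | grep -i nvidia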
2. Deploy the Bitfusion server
Download the Bitfusion OVA package from the VMware website.
Note the Extra configuration setting pciPassthru.64bitMMIOSizeGB = 256; after the Bitfusion VM is deployed, this value must be adjusted to match the memory of the GPUs actually installed (see the formula below).
Pay attention to the vCenter GUID, vCenter URL, and vCenter TLS certificate thumbprint.
Configure the password, IP address, and related settings (an MTU of 9000 is recommended).
Optionally, configure multiple network ports for data traffic.
After the OVF import completes, do not power on the VM yet.
Edit the VM:
1. Adjust CPU and memory; set the memory to 1.5 × the total GPU memory (the sketch after the advanced parameter below computes this value).
2. Add a network adapter (only one NIC is present by default).
3. Add all passthrough GPUs.
Modify the VM's advanced parameters:
pciPassthru.64bitMMIOSizeGB={n}
where n equals (num-cards * size-of-card-in-GB) rounded up to the next power of 2:
example A: 2 16GB cards => 2 * 16 => 32 => rounded to next power of 2 = 64
example B: 3 16GB cards => 3 * 16 => 48 => rounded to next power of 2 = 64
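Both values (the VM memory from step 1 above and the MMIO size) can be computed with a few lines of shell. This is only a sketch; the numbers below are the example-B values, so substitute your own card count and card size.

# Hypothetical values: 3 cards of 16 GB each (example B above)
num_cards=3
card_gb=16
total=$((num_cards * card_gb))

# VM memory recommendation: total GPU memory * 1.5
echo "VM memory (GB): $((total * 3 / 2))"      # 48 * 1.5 = 72

# MMIO size: round the total up to the next power of 2
n=1
while [ "$n" -le "$total" ]; do n=$((n * 2)); done
echo "pciPassthru.64bitMMIOSizeGB=$n"          # 48 -> 64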
After the configuration is complete, power on the Bitfusion VM and wait about 10 minutes; Bitfusion registers its vSphere plugin automatically. Then refresh the browser.
Open the Bitfusion management interface.
Note that Bitfusion requires the ESXi hosts to carry a vSphere Enterprise Plus license; otherwise an invalid-license error is reported.
3. Deploy the Bitfusion client
Note: currently only RHEL/CentOS 7 and Ubuntu 16.04/18.04 are supported.
Deploy a CentOS 7 VM but do not power it on; confirm that all "Run VMware Tools Scripts" options are checked.
Right-click the CentOS 7 VM and enable the Bitfusion client.
Select "For a client".
Power on the CentOS 7 VM, then run the following commands to install the Bitfusion client:
# Install the bitfusion client
$ yum install -y epel-release
$ rpm --import https://packages.vmware.com/bitfusion/vmware.bitfusion.key
$ yum install -y https://packages.vmware.com/bitfusion/centos/7/bitfusion-client-centos7-2.0.0-11.x86_64.rpm
Add the account that will use Bitfusion to the bitfusion group; here the root account is used.
# Example: add "root" to the bitfusion group
$ sudo usermod -aG bitfusion root
Test whether the Bitfusion client was deployed successfully:
# connect to bitfusion and list all gpus
$ bitfusion list_gpus
- server 0 [10.10.10.11:56001]: running 0 tasks
 |- GPU 0: free memory 15109 MiB / 15109 MiB
 |- GPU 1: free memory 15109 MiB / 15109 MiB
 |- GPU 2: free memory 15109 MiB / 15109 MiB
If the passthrough GPUs configured on the Bitfusion server are listed, the configuration succeeded.
Testing
1. Test steps
The main test steps are:
• Create a VM
• Enable VM for Bitfusion
• Install Bitfusion Client
• Install CUDA 10.0
• Install CuDNN 7
• Install python3, if needed (CentOS)
• Install TensorFlow 1.13.1
• Install TensorFlow benchmarks (branch cnn_tf_v1.13_compatible)
• Run TensorFlow benchmarks
2. Enable VM for Bitfusion
Create a CentOS 7 VM but do not power it on; right-click the VM and enable Bitfusion.
Select "For a client".
3. Install Bitfusion Client
Power on the CentOS 7 VM, then run the following commands:
# Install bitfusion client
$ yum install -y epel-release
$ rpm --import https://packages.vmware.com/bitfusion/vmware.bitfusion.key
$ yum install -y https://packages.vmware.com/bitfusion/centos/7/bitfusion-client-centos7-2.0.0-11.x86_64.rpm

# Add user to bitfusion group
$ usermod -aG bitfusion root

# Confirm user belongs to bitfusion group
$ groups
root bitfusion

# Test Bitfusion
$ bitfusion list_gpus
- server 0 [10.10.10.10:56001]: running 0 tasks
 |- GPU 0: free memory 15109 MiB / 15109 MiB
 |- GPU 1: free memory 15109 MiB / 15109 MiB
 |- GPU 2: free memory 15109 MiB / 15109 MiB
After installation, the client registers automatically with the Bitfusion server.
4. Install CUDA
CUDA is the NVIDIA library that allows programmatic access to NVIDIA GPUs. It will be used by the TensorFlow benchmarks.
$ mkdir bitfusion
$ cd bitfusion

# install cuda-repo
$ wget -P /etc/yum.repos.d/ https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo
$ yum clean all
$ yum install -y cuda-10-0
Retrieve the GPU device information:
$ bitfusion run --num_gpus 1 nvidia-smi
Requested resources:
Server List: 10.10.10.101:56001
Client idle timeout: 0 min
Wed Jul 29 12:07:35 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 440.64.00    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:04:00.0 Off |                    0 |
| N/A   28C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                 ERR! |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
5. Install cuDNN
cuDNN is the Deep Neural Network library from NVIDIA. The TensorFlow benchmarks you will run later require this library as well.
Create an account at https://developer.nvidia.com/cudnn and download libcudnn7,
then install it:
$ sudo rpm -ivh libcudnn7-7.6.5.32-1.cuda10.0.x86_64.rpm
Preparing...                          ################################# [100%]
Updating / installing...
   1:libcudnn7-7.6.5.32-1.cuda10.0    ################################# [100%]

$ sudo ldconfig                  # update libraries list
$ ldconfig -p | grep cudnn       # to see if it is installed
    libcudnn.so.7 (libc6,x86-64) => /lib64/libcudnn.so.7
6. Install Python 3 and TensorFlow
$ yum install python3
$ pip3 install tensorflow-gpu==1.13.1
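Before running the full benchmark, a quick hedged sanity check confirms that TensorFlow can reach a GPU through Bitfusion (tf.test.is_gpu_available is the TensorFlow 1.x API):

# Should print True if TensorFlow sees a remote GPU via Bitfusion
$ bitfusion run --num_gpus 1 -- python3 -c "import tensorflow as tf; print(tf.test.is_gpu_available())"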
7. Install the TensorFlow benchmarks
The benchmarks are open source ML applications designed to test performance on the TensorFlow framework.
$ cd ~/bitfusion
$ git clone https://github.com/tensorflow/benchmarks.git
$ cd benchmarks
$ git branch -a
* master
  remotes/origin/HEAD -> origin/master
  remotes/origin/cnn_tf_v1.10_compatible
  ...
  remotes/origin/cnn_tf_v1.13_compatible
  ...
$ git checkout cnn_tf_v1.13_compatible
Branch cnn_tf_v1.13_compatible set up to track remote branch cnn_tf_v1.13_compatible from origin.
Switched to a new branch 'cnn_tf_v1.13_compatible'
$ git branch
* cnn_tf_v1.13_compatible
  master
8. Run the Bitfusion tests
First run the TensorFlow benchmark without a GPU. The results are shown below: on average 805.71 images are processed per second.
$ python3 ./benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py
...
Running warm up
Done warm up
Step    Img/sec total_loss
1       images/sec: 805.8 +/- 0.0 (jitter = 0.0)        14.299
10      images/sec: 803.0 +/- 3.3 (jitter = 4.5)        14.299
20      images/sec: 803.4 +/- 2.5 (jitter = 6.5)        14.299
30      images/sec: 806.4 +/- 2.1 (jitter = 9.4)        14.299
40      images/sec: 803.1 +/- 2.7 (jitter = 11.2)       14.298
50      images/sec: 801.5 +/- 2.8 (jitter = 10.2)       14.298
60      images/sec: 802.6 +/- 2.4 (jitter = 8.9)        14.299
70      images/sec: 804.4 +/- 2.1 (jitter = 9.0)        14.299
80      images/sec: 805.7 +/- 1.9 (jitter = 8.7)        14.298
90      images/sec: 806.3 +/- 1.7 (jitter = 8.5)        14.298
100     images/sec: 806.9 +/- 1.6 (jitter = 8.1)        14.298
----------------------------------------------------------------
total images/sec: 805.71
----------------------------------------------------------------
Then run the TensorFlow benchmark through Bitfusion with one GPU allocated. The results are shown below: on average 7013.81 images are processed per second.
# bitfusion allocation 1 gpu
$ bitfusion run --num_gpus 1 -- python3 ./benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py
...
Done warm up
Step    Img/sec total_loss
1       images/sec: 5947.0 +/- 0.0 (jitter = 0.0)       14.299
10      images/sec: 6015.8 +/- 19.5 (jitter = 51.8)     14.299
20      images/sec: 6091.0 +/- 20.4 (jitter = 120.5)    14.299
30      images/sec: 6111.4 +/- 15.0 (jitter = 62.6)     14.299
40      images/sec: 6126.5 +/- 12.5 (jitter = 57.8)     14.298
50      images/sec: 6226.8 +/- 73.0 (jitter = 66.4)     14.298
60      images/sec: 6497.5 +/- 115.6 (jitter = 92.3)    14.299
70      images/sec: 6705.9 +/- 122.1 (jitter = 121.1)   14.299
80      images/sec: 6874.3 +/- 120.4 (jitter = 242.2)   14.298
90      images/sec: 7014.0 +/- 116.0 (jitter = 374.4)   14.298
100     images/sec: 7133.7 +/- 111.3 (jitter = 404.1)   14.298
----------------------------------------------------------------
total images/sec: 7013.81
----------------------------------------------------------------
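The runs above use the benchmark script's default model. For a heavier, more GPU-bound workload such as ResNet-50, tf_cnn_benchmarks also accepts --model and --batch_size flags; a hedged example with illustrative values:

$ bitfusion run --num_gpus 1 -- python3 ./benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --model resnet50 --batch_size 64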
9. Partial GPU test
In the Bitfusion management page, the client's quota is set to 0.5 GPU.
Running the test command with a full GPU now reports an error:
$ bitfusion run --num_gpus 1 -- python3 ./benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py
Error requesting gpus: Error starting dispatcher: Error sending heartbeat: Error when sending cluster session information: ErrorOverQuota: client 18e87c5 allocation over quota: 0.50 quota, 1.00 allocated
Error starting dispatcher: Error sending heartbeat: Error when sending cluster session information: ErrorOverQuota: client 18e87c5 allocation over quota: 0.50 quota, 1.00 allocated
Change the request to a 0.5 partial GPU and the run succeeds:
$ bitfusion run --num_gpus 1 --partial 0.5 -- python3 ./benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py
Requested resources:
Server List: 10.10.10.11:56001
Client idle timeout: 1 min
...
pciBusID: 0000:00:00.0
totalMemory: 7.38GiB freeMemory: 6.99GiB
...
2020-07-29 18:25:16.604751: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
Done warm up
Step    Img/sec total_loss
1       images/sec: 5998.6 +/- 0.0 (jitter = 0.0)       14.299
10      images/sec: 6005.8 +/- 12.4 (jitter = 32.3)     14.299
20      images/sec: 6009.2 +/- 7.4 (jitter = 35.3)      14.299
30      images/sec: 6001.9 +/- 7.1 (jitter = 40.8)      14.299
40      images/sec: 6123.6 +/- 89.9 (jitter = 50.5)     14.298
50      images/sec: 6436.6 +/- 130.8 (jitter = 66.8)    14.298
60      images/sec: 6667.9 +/- 132.9 (jitter = 113.8)   14.299
70      images/sec: 6849.3 +/- 127.8 (jitter = 209.3)   14.299
80      images/sec: 6996.1 +/- 120.6 (jitter = 486.3)   14.299
90      images/sec: 7131.8 +/- 116.1 (jitter = 406.1)   14.298
100     images/sec: 7289.4 +/- 117.6 (jitter = 449.9)   14.298
----------------------------------------------------------------
total images/sec: 7165.57
----------------------------------------------------------------
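To exercise the concurrent sharing summarized below, the same partial request can be issued from two client VMs at the same time. This is only a sketch; whether both slices land on the same physical GPU depends on free GPU memory and the Bitfusion scheduler.

# On client VM A
$ bitfusion run --num_gpus 1 --partial 0.5 -- python3 ./benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py
# On client VM B, at the same time
$ bitfusion run --num_gpus 1 --partial 0.5 -- python3 ./benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py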
10. Test summary
Pros
- Easy to use and language-agnostic: a client consumes Bitfusion GPU resources simply by prefixing the program with: bitfusion run --num_gpus {n} --partial {n} --
- GPU sharing: multiple VMs can consume GPU compute resources over the network; as soon as a client finishes, the GPU resources are released and other clients can use them.
- GPU memory can be partitioned into slices of arbitrary size and assigned to different clients for concurrent use.
- No NVIDIA license is required.
Cons
- Currently only RHEL/CentOS 7 and Ubuntu 16.04/18.04 are supported.