Building a vLLM image for a specific CUDA version

1. Prepare the base image

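In most cases you can follow "[GitHub Action] automated creation". Building vLLM, however, needs a lot of disk and memory, and the free GitHub Actions resources (4 cores, 16 GB RAM, 14 GB SSD) are too small, so the image build fails there. You need to use your own server.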
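Before starting a long build, it can help to compare the build server with the runner size quoted above. The following is a minimal sketch, not a sizing guide: the helper names `exceeds_runner` and `local_resources` are made up for illustration, and being larger than the free runner is only a necessary condition suggested by the text, not a guarantee that the build succeeds.

```python
import os
import shutil

# Free GitHub-hosted runner size quoted in the text: 4 cores, 16 GB RAM, 14 GB SSD
RUNNER = {"cpus": 4, "mem_gb": 16, "disk_gb": 14}


def exceeds_runner(cpus: float, mem_gb: float, disk_gb: float) -> bool:
    """True only if every resource is larger than the free runner's."""
    return (
        cpus > RUNNER["cpus"]
        and mem_gb > RUNNER["mem_gb"]
        and disk_gb > RUNNER["disk_gb"]
    )


def local_resources(path: str = "."):
    """Read CPU count, physical memory and free disk space (Linux/macOS)."""
    cpus = os.cpu_count() or 1
    mem_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
    disk_gb = shutil.disk_usage(path).free / 1024**3
    return cpus, mem_gb, disk_gb
```

On the build server, `exceeds_runner(*local_resources())` gives a quick yes/no answer.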

See https://github.com/vllm-project/vllm/blob/main/Dockerfile

It shows which base images are required: picture 0

picture 1

If the target CUDA version is 12.1.1, the base image you need is one of:

nvidia/cuda:12.1.1-devel-ubuntu20.04

nvidia/cuda:12.1.1-devel-ubuntu22.04
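The two names above follow the pattern CUDA version, image flavor, Ubuntu version. A small sketch of that pattern, where the helper name `base_image` is an illustration and not part of any tool:

```python
def base_image(cuda_version: str, ubuntu_version: str, flavor: str = "devel") -> str:
    """Compose an nvidia/cuda image reference from its parts."""
    return f"nvidia/cuda:{cuda_version}-{flavor}-ubuntu{ubuntu_version}"


# The two candidates listed above for CUDA 12.1.1
candidates = [base_image("12.1.1", ubuntu) for ubuntu in ("20.04", "22.04")]
for name in candidates:
    print(name)
```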

  • Check whether the image exists

https://explore.ggcr.dev/

picture 3

picture 5

Both image combinations above exist. If yours does not, use the same site to look for a nearby combination.
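The lookup can also be scripted instead of done in the browser. The sketch below assumes `skopeo` is installed on the machine (the sync script later in this post uses it too) and treats a zero exit code from `skopeo inspect` as "exists". A non-zero code can also mean a network or authentication problem, so read a negative answer with care. The function names are illustrative.

```python
import subprocess


def inspect_command(image: str) -> list[str]:
    """Build the skopeo command that queries the registry for one image."""
    return ["skopeo", "inspect", f"docker://{image}"]


def image_exists(image: str) -> bool:
    """True if skopeo could read the image's metadata from the registry."""
    result = subprocess.run(inspect_command(image), capture_output=True, text=True)
    return result.returncode == 0
```

For example, `image_exists("nvidia/cuda:12.1.1-devel-ubuntu22.04")` runs `skopeo inspect docker://nvidia/cuda:12.1.1-devel-ubuntu22.04`.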

picture 6

For example, the image list: https://explore.ggcr.dev/?repo=nvidia%2Fcuda
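Once you have the tag list, narrowing it down to nearby candidates is a simple filter. A minimal sketch, assuming "nearby" means the same CUDA major.minor and the same flavor; the tag list below is sample data for illustration, not a snapshot of the registry:

```python
def nearby_tags(tags, cuda_version: str, flavor: str = "devel"):
    """Keep tags with the same CUDA major.minor and the same flavor."""
    major_minor = ".".join(cuda_version.split(".")[:2])
    return sorted(
        tag
        for tag in tags
        if tag.startswith(major_minor + ".") and f"-{flavor}-" in tag
    )


# Sample tags (hypothetical input, e.g. copied from the tag list page)
tags = [
    "12.1.0-devel-ubuntu22.04",
    "12.1.1-runtime-ubuntu22.04",
    "12.1.1-devel-ubuntu20.04",
    "12.4.1-devel-ubuntu22.04",
]
print(nearby_tags(tags, "12.1.1"))
```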

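If you cannot find the target base image and the nearby images do not meet your needs either, consider building your own base image; see the nvidia/cuda repository: https://gitlab.com/nvidia/container-images/cuda . That is not covered here.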

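2. Sync the base image to a registry in mainland China (optional: skip this if you have a server with unrestricted internet access or can reach the official registry directly)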

Use a GitHub workflow to sync automatically:

  • The complete workflow YAML:

    # Workflow name
    name: Sync-Images-to-Aliyun-CR
    # Name shown for each workflow run
    run-name: ${{ github.actor }} is syncing images to Aliyun CR.
    # What triggers the workflow
    on:
      push:
        branches: [ "main" ]
    
      # Allows you to run this workflow manually from the Actions tab
      workflow_dispatch:
    # Jobs in the workflow (each usually has one or more steps)
    jobs:
      syncimages:
        runs-on: ubuntu-latest
        steps:
        - name: Checkout Repos
          uses: actions/checkout@v4
    
        - name: Login to Aliyun CR
          uses: docker/login-action@v3
          with:
            registry: ${{ vars.ALY_REGISTRY}}
            username: ${{ secrets.ALY_UNAME }}
            password: ${{ secrets.ALY_PASSWD }}
            logout: false
    
        - uses: actions/setup-python@v5
          with:
            python-version: '3.10' 
        - run: python scrips/pull_image.py   
    
  • The image-handling script it runs:

# -*- coding: utf-8 -*-
import datetime
import json
import subprocess

# Destination namespace in the Aliyun registry
ds_reg = 'registry.cn-hangzhou.aliyuncs.com/reg_pub/'

# images.txt lists one source image per line
with open('images.txt', 'r') as f:
    lines = f.read().split('\n')

rs = []
for img in lines:
    img = img.strip()
    if len(img) > 0:
        # Flatten the repository path, e.g. nvidia/cuda:tag -> nvidia_cuda:tag
        ims = img.replace('/', '_')
        ds = ds_reg + ims

        # First try skopeo, which copies registry to registry (--all keeps every architecture)
        cmds = 'skopeo copy --all docker://' + img + ' docker://' + ds
        code, rss = subprocess.getstatusoutput(cmds)
        print('*'*20 + '\n' + "copy {} to {}".format(img, ds) + '\n' + "rs:\n code:{}\n rs:{}".format(code, rss))
        if code != 0:
            # Fall back to docker pull / tag / push, then delete the local copies
            cmds = 'docker pull ' + img + ' && docker tag ' + img + ' ' + ds + ' && docker push ' + ds + ' && docker rmi {} {}'.format(img, ds)
            code, rss = subprocess.getstatusoutput(cmds)
            print('skopeo copy error, try docker pull and push:' + "\n" + cmds + '\n' + "rs:\n code:{}\n rs:{}".format(code, rss))

        # Record the outcome for this image
        rs.append({
            'src': img,
            'ds': ds,
            'code': code,
            'rs': rss
        })

# Save the results of this run to a timestamped JSON file
ctime = datetime.datetime.now().strftime('%Y-%m-%d-%H:%M:%S')
with open('result-{}.json'.format(ctime), encoding='utf-8', mode='w') as f:
    json.dump(rs, f)
# Print the results for all images
print(rs)


  
  • Layout of the image-sync repository:

picture 7
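To see what the sync script will push, it helps to apply its renaming rule to one line of `images.txt` by hand. The sketch below isolates that rule (replace every `/` with `_`, then prefix the registry namespace). The function name `destination` is introduced here for illustration; the registry prefix is the one from the script.

```python
DS_REG = "registry.cn-hangzhou.aliyuncs.com/reg_pub/"


def destination(img: str, registry: str = DS_REG) -> str:
    """Map a source image reference to its name in the mirror registry."""
    return registry + img.replace("/", "_")


# One line of images.txt and the name it is mirrored under
print(destination("nvidia/cuda:12.1.1-devel-ubuntu22.04"))
```

With this rule, a line `nvidia/cuda:12.1.1-devel-ubuntu22.04` in `images.txt` ends up as `registry.cn-hangzhou.aliyuncs.com/reg_pub/nvidia_cuda:12.1.1-devel-ubuntu22.04`, which is the name to use in `FROM` if you build against the mirror.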

3. Prepare the Dockerfile

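# Confirm that this OS version + CUDA version combination exists; see https://explore.ggcr.dev/?repo=nvidia%2Fcuda
FROM nvidia/cuda:12.1.1-devel-ubuntu22.04 AS vllm-base
# CUDA version: vLLM currently only supports CUDA 12.4, 12.1 and 11.8; for other versions you have to compile it yourself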
ARG CUDA_VERSION=121

# vLLM version
ARG VLLM_VERSION=0.8.1
# Keep interactive prompts from stalling the build
ENV DEBIAN_FRONTEND=noninteractive

# Update the package index and install the build dependencies
RUN  apt update && \
apt install -y build-essential zlib1g-dev libncurses5-dev libgdbm-dev libnss3-dev libssl-dev libsqlite3-dev libreadline-dev libffi-dev libbz2-dev liblzma-dev tk-dev wget

# Install Python; change the version here if you need a different one
RUN wget https://www.python.org/ftp/python/3.12.9/Python-3.12.9.tgz && \
tar -zxvf Python-3.12.9.tgz && \
cd Python-3.12.9  && \
./configure --enable-optimizations --prefix=/usr/local/python3.12  && make -j$(nproc)    &&  \
make altinstall  && \
ln -s /usr/local/python3.12/bin/python3.12 /usr/bin/python3 && \
ln -s /usr/local/python3.12/bin/pip3.12 /usr/bin/pip3  && \
ln -s /usr/local/python3.12/bin/pip3.12 /usr/bin/pip  && \
rm -rf Python-3.12.9.tgz Python-3.12.9 

# Set PATH
ENV PATH="/usr/bin:/usr/local/python3.12/bin:$PATH"

# Install vLLM. Check which version you need; if no suitable wheel exists, consider building from source.
# Download, install and delete in a single RUN so the wheel file is not kept in an image layer.
RUN wget https://github.com/vllm-project/vllm/releases/download/v${VLLM_VERSION}/vllm-${VLLM_VERSION}+cu${CUDA_VERSION}-cp38-abi3-manylinux1_x86_64.whl && \
pip install vllm-${VLLM_VERSION}+cu${CUDA_VERSION}-cp38-abi3-manylinux1_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu${CUDA_VERSION} && \
rm -f vllm-${VLLM_VERSION}+cu${CUDA_VERSION}-cp38-abi3-manylinux1_x86_64.whl
ENTRYPOINT ["python3", "-m", "vllm.entrypoints.openai.api_server"]
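
The wheel URL in the Dockerfile is assembled from the two `ARG` values, so it is worth previewing before a build (and checking on the vLLM releases page that the file exists). A small sketch of the same substitution; the helper names are illustrative, and the file-name pattern is copied from the Dockerfile above rather than taken from any official naming guarantee.

```python
def cuda_tag(cuda_version: str) -> str:
    """Turn '12.1.1' into '121': major and minor digits only."""
    major, minor = cuda_version.split(".")[:2]
    return major + minor


def wheel_name(vllm_version: str, cuda: str) -> str:
    """File name pattern used in the Dockerfile above."""
    return f"vllm-{vllm_version}+cu{cuda}-cp38-abi3-manylinux1_x86_64.whl"


def wheel_url(vllm_version: str, cuda: str) -> str:
    """Download URL on the vLLM GitHub releases page."""
    base = "https://github.com/vllm-project/vllm/releases/download"
    return f"{base}/v{vllm_version}/{wheel_name(vllm_version, cuda)}"


print(wheel_url("0.8.1", cuda_tag("12.1.1")))
```

To target another combination, pass different values at build time, for example `--build-arg CUDA_VERSION=118`, and change the `FROM` line to a matching base image.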



4. Build and push the image

The core commands are:

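docker build  --tag vllm/vllm-openai  .  # On a well-provisioned server you can add --build-arg max_jobs=8 --build-arg nvcc_threads=2, or tune them to your hardware; not recommended if resources are modest. These args only take effect when the Dockerfile declares them, as the official vLLM Dockerfile does.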
docker tag vllm/vllm-openai registry.cn-hangzhou.aliyuncs.com/reg_pub/vllm_vllm-openai:cuda-12.1.1-20250320
docker push registry.cn-hangzhou.aliyuncs.com/reg_pub/vllm_vllm-openai:cuda-12.1.1-20250320
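
The remote tag above combines the CUDA version with the build date. If you rebuild regularly, generating the three commands avoids typos in that tag. A minimal sketch, assuming the `cuda-<version>-<yyyymmdd>` convention seen in the commands above; the function names are made up for illustration.

```python
import datetime

REGISTRY_REPO = "registry.cn-hangzhou.aliyuncs.com/reg_pub/vllm_vllm-openai"


def release_tag(cuda_version: str, day: datetime.date) -> str:
    """Tag in the form used above, e.g. cuda-12.1.1-20250320."""
    return f"cuda-{cuda_version}-{day:%Y%m%d}"


def release_commands(cuda_version: str, day: datetime.date,
                     local: str = "vllm/vllm-openai") -> list[str]:
    """Return the build, tag and push commands for one release."""
    remote = f"{REGISTRY_REPO}:{release_tag(cuda_version, day)}"
    return [
        f"docker build --tag {local} .",
        f"docker tag {local} {remote}",
        f"docker push {remote}",
    ]


for cmd in release_commands("12.1.1", datetime.date(2025, 3, 20)):
    print(cmd)
```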






