llama.cpp推理-树莓派4b

参考链接：https://github.com/ggerganov/llama.cpp

https://qwen.readthedocs.io/zh-cn/latest/run_locally/llama.cpp.html

裸机方式运行，非特殊框架，可以直接拉取官方镜像：

硬件信息：

平台架构：aarch64

操作系统: Ubuntu 22.04.3 LTS

CPU: 2

MEM:4G

创建环境

主要包括拉取代码、下载模型或生成模型等步骤

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11


# 1. 拉取代码
git clone https://github.com/ggerganov/llama.cpp llmama.cpp
# 2. 编译代码
cd llama.cpp
make
make llama-cli  # 如无需本地交互测试，不需要执行
# 3. 手动下载模型
#https://huggingface.co/Qwen/Qwen2-1.5B-Instruct-GGUF/tree/main
https://huggingface.co/Qwen/Qwen2-1.5B-Instruct-GGUF/blob/main/qwen2-1_5b-instruct-q5_k_m.gguf
# 或自己生成，需要python环境
python convert-hf-to-gguf.py Qwen/Qwen2-7B-Instruct --outfile qwen2-1_5b-instruct-f16.gguf

交互式测试

参数说明：

-m 指模型地址
-cnv 指会话模式
-p 指的输入信息，必须要传入

1

./llama-cli  -m models/qwen2-1_5b-instruct-q5_k_m.gguf -cnv -p '你是一个人工智能专家'

运行成功的截图如下：

API 发布测试

1

./llama-server -m /opt/codes/models/qwen2-1_5b-instruct-q5_k_m.gguf  --port 8000 --host 0.0.0.0

前台页面访问测试

http://192.168.2.189:8080/