CodeShell

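The CodeShell VSCode Extension is an AI coding assistant plugin for Visual Studio Code. Its official companions are the WisdomShell/CodeShell-7B-Chat large language model and two server-side API options: CPU deployment via llama cpp for codeshell, and GPU deployment via Text Generation Inference (TGI from here on). Its main advantage is that it can talk to an API server you host yourself, so you don't have to worry about your code leaking.

Downloading the CodeShell-7B-Chat model

ModelScope (魔搭社区) hosts this model, which is convenient. Here the model is placed in: D:\llm\WisdomShell\CodeShell-7B-Chat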

git lfs install
git clone https://modelscope.cn/WisdomShell/CodeShell-7B-Chat.git

CPU deployment

Let's try CPU inference first, using the llama cpp for codeshell project: convert and quantize the weights to GGUF, then start the API with server.exe.

llama cpp for codeshell

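This is a fork of llama.cpp. The project lives at: GitHub - WisdomShell/llama_cpp_for_codeshell: CodeShell model in C/C++. Here the project is placed under E:\llm\llama_cpp_for_codeshell.

[Image: llama_cpp_for_codeshell project structure]

For building, see the llama.cpp article on xiaodu114.github.io. The result looks like this:

[Image: screenshot of a successful llama_cpp_for_codeshell build]

Next, create a Python virtual environment, install the dependencies, and convert the model weights to GGUF:

#   Run in this directory: E:\llm\llama_cpp_for_codeshell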
python -m venv venv
.\venv\scripts\activate
pip install -r requirements.txt
python convert.py D:\llm\WisdomShell\CodeShell-7B-Chat

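[Image: creating the Python virtual environment, installing dependencies, and converting the model weights to GGUF]

The conversion failed. We're used to that by now, right? I did not dig into the cause of the error, because I stumbled on something better...

Downloading the GGUF directly

After the approach above failed, I searched for CodeShell on hf-mirror.com (a Hugging Face mirror) and found that the WisdomShell/CodeShell-7B-Chat-int4 repository already ships a GGUF file. Very nice, so I downloaded it directly. The file path here is: D:\llm\WisdomShell\CodeShell-7B-Chat-int4\codeshell-chat-q4_0.gguf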
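Large downloads from a mirror sometimes end up truncated or replaced by an HTML error page, so a quick sanity check before handing the file to server.exe can save a confusing startup failure. The sketch below only relies on the fact that GGUF files begin with the four magic bytes "GGUF"; the function name looks_like_gguf is made up for this example and the check does not validate the rest of the file.

```python
GGUF_MAGIC = b"GGUF"  # the first four bytes of a GGUF file


def looks_like_gguf(path):
    """Return True if the file at `path` starts with the GGUF magic bytes.

    This is only a cheap sanity check: it catches HTML error pages and
    empty files, not a download that was cut off halfway.
    """
    with open(path, "rb") as f:
        return f.read(4) == GGUF_MAGIC
```

For the file above, the call would be looks_like_gguf(r"D:\llm\WisdomShell\CodeShell-7B-Chat-int4\codeshell-chat-q4_0.gguf").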

Starting the API

Go to the build output directory of the llama cpp for codeshell project above, which here is: E:\llm\llama_cpp_for_codeshell\build\bin\Release

./server.exe -m D:\llm\WisdomShell\CodeShell-7B-Chat-int4\codeshell-chat-q4_0.gguf --host 127.0.0.1 --port 10002

[Image: llama_cpp_for_codeshell server.exe starting the API]
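Once the server is up, a small request is a handy way to confirm it actually answers before touching the VSCode plugin. The sketch below assumes the fork keeps upstream llama.cpp's POST /completion endpoint with its "prompt" and "n_predict" fields and a "content" field in the response; I have not checked that against the fork's source. The helper names are made up for this example, and the host and port match the server.exe command above.

```python
import json
import urllib.request


def build_completion_request(prompt, base_url="http://127.0.0.1:10002", n_predict=64):
    """Build (but do not send) a POST request for the /completion endpoint."""
    body = json.dumps({"prompt": prompt, "n_predict": n_predict}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/completion",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text.

    Assumes the response JSON carries the text in a "content" field,
    as upstream llama.cpp's server does.
    """
    req = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read().decode("utf-8")).get("content", "")
```

With the server running, print(complete("def fibonacci(n):")) should print a continuation. A connection error here means the host, port, or server process is the problem rather than the plugin.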

TGI

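Upstream TGI: GitHub - huggingface/text-generation-inference

Official API documentation: Text Generation Inference API

WisdomShell fork: GitHub - WisdomShell/text-generation-inference

This project was something else. Deployment was a real ordeal, and it is no wonder the official docs recommend Docker. After finally getting upstream TGI deployed, starting the WisdomShell/CodeShell-7B-Chat model still raised an error, so the WisdomShell fork is required after all. Once the fork was deployed, that was still not the end: under WSL2 on a machine with 16 GB of RAM and 12 GB of VRAM, loading the model simply froze. I nearly gave up, until a considerably more powerful server (Ubuntu 22.04) arrived at the office. Without it this article would not exist.

Reference articles: "vllm vs TGI 部署 llama v2 7B 踩坑笔记" (pitfall notes on deploying Llama 2 7B with vLLM vs TGI) and "主流推理框架哪家强？看看它们在Llama 2上的性能比较" (a performance comparison of mainstream inference frameworks on Llama 2)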

Rust

For installing Rust, see: WSL2 - xiaodu114.github.io

Installing the other dependencies

#   Check the gcc version
gcc --version
#   Install the dependencies
sudo apt install libssl-dev gcc pkg-config unzip

protoc

Download a release from GitHub - protocolbuffers/protobuf: Protocol Buffers - Google's data interchange format. Here I downloaded protoc-25.1-linux-x86_64.zip.

#   Open a terminal in the directory containing protoc-25.1-linux-x86_64.zip
sudo unzip -o protoc-25.1-linux-x86_64.zip -d /usr/local bin/protoc
sudo unzip -o protoc-25.1-linux-x86_64.zip -d /usr/local 'include/*'
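Since the build that follows fails with fairly opaque errors when a tool is missing, it can be worth confirming first that everything installed so far is actually on PATH. This is a minimal sketch using only the standard library; the tool list is my reading of the steps in this article (the apt packages, protoc, the Rust toolchain, and make), not an official requirements list.

```python
import shutil

# Tools this walkthrough installs or relies on (an assumption based on the steps above).
REQUIRED_TOOLS = ["gcc", "pkg-config", "unzip", "protoc", "cargo", "make"]


def missing_tools(names=REQUIRED_TOOLS):
    """Return the subset of `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]
```

Calling missing_tools() in the same terminal you plan to build from returns an empty list when everything resolves; any name it returns needs fixing before running make.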

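Source changes

I'm not very familiar with Python or Rust projects, so I went by the build errors and patched whatever failed...

[Image: files modified in the WisdomShell/text-generation-inference project]

server\Makefile

Here I removed install-torch; torch still gets installed without it.

[Image: changes to WisdomShell\text-generation-inference\server\Makefile]

server\poetry.lock

This is part of the torch version change. When I tested the project, the latest torch release was 2.1.1. The contents were copied over from the upstream TGI project.

[Image: changes to WisdomShell\text-generation-inference-main\server\poetry.lock]

server\pyproject.toml

This is also part of the torch version change.

[Image: changes to WisdomShell\text-generation-inference-main\server\pyproject.toml]

server\requirements.txt

This is also part of the torch version change.

[Image: changes to WisdomShell\text-generation-inference-main\server\requirements.txt]

\rust-toolchain.toml

The Rust version pinned here did not match the installed one (the version the project pins is too old), so I simply deleted it.

[Image: changes to WisdomShell\text-generation-inference-main\rust-toolchain.toml]

Creating the virtual environment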

python3 -m venv venv
source ./venv/bin/activate

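Running make

#   Run this inside the virtual environment activated above
BUILD_EXTENSIONS=True make install

With the changes above, the build should succeed. You can check with the following command: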

text-generation-launcher --help


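After a successful build, two files are also added under the .cargo/bin directory. Since that directory is already on $PATH, they can be run from any directory in a terminal.

[Image: the .cargo/bin directory]

Starting the API

For max-total-tokens and the other parameters, see the Docker startup parameters in the official GitHub - WisdomShell/codeshell-vscode repository.

Starting from the project directory

#   Open a terminal in the tgi-codeshell directory

#   Activate the virtual environment
source ./venv/bin/activate

#   Launch. Note: replace the model weights path, IP address, port, etc. with your own
#   With multiple GPUs, CUDA_VISIBLE_DEVICES and num-shard need to be used together.
#       CUDA_VISIBLE_DEVICES=0 # 1st GPU   CUDA_VISIBLE_DEVICES=1 # 2nd GPU   CUDA_VISIBLE_DEVICES=0,1 # 1st and 2nd GPUs
#       num-shard is the number of shards, i.e. how many GPUs the model is split across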
CUDA_VISIBLE_DEVICES=1 text-generation-launcher \
--model-id /llm/0-model/WisdomShell/CodeShell-7B-Chat \
--hostname 192.168.xxx.xxx -p 10002 \
--num-shard 1 \
--max-total-tokens 5000 --max-input-length 4096 \
--max-stop-sequences 12 \
--trust-remote-code
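To make the relationship between CUDA_VISIBLE_DEVICES and --num-shard concrete, here is a small sketch that derives both from one list of GPU indices and assembles the same launcher command as above. It assumes you want one shard per visible GPU, and it assumes the launcher rejects a --max-input-length that is not smaller than --max-total-tokens (my understanding of TGI's validation, not verified against the fork). The function name launcher_command is made up for this example; every flag comes from the command above.

```python
import os


def launcher_command(model_path, hostname, port, devices,
                     max_total_tokens=5000, max_input_length=4096):
    """Return (argv, env) for text-generation-launcher.

    `devices` is a list of GPU indices, e.g. [1] or [0, 1]. It becomes
    CUDA_VISIBLE_DEVICES, and its length becomes --num-shard
    (assumption: one shard per visible GPU).
    """
    if not devices:
        raise ValueError("at least one GPU index is required")
    if max_input_length >= max_total_tokens:
        # Assumed constraint: some of the budget must be left for generated tokens.
        raise ValueError("max_input_length must be smaller than max_total_tokens")

    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(d) for d in devices)

    argv = [
        "text-generation-launcher",
        "--model-id", model_path,
        "--hostname", hostname, "-p", str(port),
        "--num-shard", str(len(devices)),
        "--max-total-tokens", str(max_total_tokens),
        "--max-input-length", str(max_input_length),
        "--max-stop-sequences", "12",
        "--trust-remote-code",
    ]
    return argv, env
```

The pair can be passed straight to subprocess.run(argv, env=env). With the defaults, a prompt that uses the full 4096 input tokens leaves 5000 - 4096 = 904 tokens for the generated output.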

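Starting from a sh script

Entering the project, opening a terminal, activating the virtual environment, and launching every single time is tedious. A sh script that starts everything with a "double-click" is much nicer. The script:

#!/bin/bash

# Open a new terminal and activate the venv virtual environment in the given directory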
gnome-terminal --working-directory=/llm/2-code/tgi-codeshell -- /bin/bash -c 'source ./venv/bin/activate; 
CUDA_VISIBLE_DEVICES=1 /home/xxx/.cargo/bin/text-generation-launcher \
--model-id /llm/0-model/WisdomShell/CodeShell-7B-Chat \
--hostname 192.168.xxx.xxx -p 10002 \
--num-shard 1 \
--max-total-tokens 5000 --max-input-length 4096 \
--max-stop-sequences 12 \
--trust-remote-code;
exec /bin/bash'

exit

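Launching from the sh script still has a problem. Following the error message, I searched the project and found:
[Image: launcher\src\main.rs source]

From the code, the culprit appears to be Command::new("text-generation-router"). I don't know whether replacing it with an absolute path would work; I have not had time to test it.
The sh script does not seem to be able to find commands that are on $PATH, which is why text-generation-launcher above is invoked by its absolute path.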
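One untested idea that avoids patching main.rs: if the launcher starts text-generation-router by bare name, it has to find it through PATH, so putting .cargo/bin on PATH for the launched process might be enough. The sketch below is a hypothesis, not a confirmed fix. It builds such an environment and reports where each binary would resolve; both helper names are made up for this example.

```python
import os
import shutil


def env_with_cargo_bin(env=None, cargo_bin=None):
    """Return a copy of `env` whose PATH starts with the cargo bin directory.

    Defaults to the current environment and ~/.cargo/bin.
    """
    env = dict(os.environ if env is None else env)
    if cargo_bin is None:
        cargo_bin = os.path.join(os.path.expanduser("~"), ".cargo", "bin")
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    if cargo_bin not in parts:
        parts.insert(0, cargo_bin)
    env["PATH"] = os.pathsep.join(parts)
    return env


def resolve(names, env):
    """Map each command name to the path it resolves to under env's PATH (or None)."""
    return {name: shutil.which(name, path=env["PATH"]) for name in names}
```

Running resolve(["text-generation-launcher", "text-generation-router"], env_with_cargo_bin()) shows whether both binaries are visible. If they are, the shell-script equivalent would be to prepend the same directory to PATH inside the bash -c command before calling the launcher.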

CodeShell VSCode Extension

Project: GitHub - WisdomShell/codeshell-vscode

I'll skip how to install the plugin.

Configuration for CPU inference

[Image: CodeShell VSCode Extension settings for CPU inference]

Configuration for TGI

[Image: CodeShell VSCode Extension settings for TGI]
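Before entering the server address in the extension settings, it can help to confirm that the TGI server answers on its own. The sketch below uses TGI's POST /generate endpoint with an "inputs" string and a "parameters" object, and reads "generated_text" from the response, as described in the upstream API documentation; I am assuming the WisdomShell fork keeps that interface. The helper names are made up for this example, and base_url must be filled in with your own server address.

```python
import json
import urllib.request


def build_generate_request(inputs, base_url, max_new_tokens=64):
    """Build (but do not send) a POST request for TGI's /generate endpoint."""
    body = json.dumps({
        "inputs": inputs,
        "parameters": {"max_new_tokens": max_new_tokens},
    }).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(inputs, base_url, **kwargs):
    """Send the request and return the "generated_text" field of the response."""
    req = build_generate_request(inputs, base_url, **kwargs)
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read().decode("utf-8"))["generated_text"]
```

If generate("def quick_sort(arr):", "http://<your-server-ip>:10002") returns text, the server side is fine and any remaining problem is in the plugin configuration.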