I was thinking too small! The LLM features are only one part of BigDL; I had originally assumed the whole project was about large language models. It comes from Intel and is billed as an optimization and acceleration library for large language models, which is indeed what it is, and it feels somewhat similar to other libraries of that kind.
The test environment this time is still my old friend, the same machine as in my first hands-on test.
To test the features above, I created a new project, organized by feature: 01_history_chat, 02_knowledge_chat, 03_api, config, and so on.
The config directory holds configuration files, shared variables, and the like.
Model-related configuration lives here, for example the local model paths:
model_path_dict = {
    "baichuan" : {
        "2-7B-Chat" : "D:\\llm\\baichuan-inc\\Baichuan2-7B-Chat",
    },
    "THUDM" : {
        "2-6b" : "D:\\llm\\THUDM\\chatglm2-6b",
        "3-6b" : "D:\\llm\\THUDM\\chatglm3-6b"
    },
    "Qwen" : {
        "7B-Chat" : "D:\\llm\\Qwen\\Qwen-7B-Chat"
    },
    "01ai" : {
        "6B-Chat" : "D:\\llm\\01ai\\Yi-6B-Chat"
    }
}
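A small optional check (not part of the original config file) to confirm that every configured path actually exists on disk:

# Optional sanity check: verify that each configured model directory exists
import os

from config.model import model_path_dict

for vendor, models in model_path_dict.items():
    for name, path in models.items():
        status = "OK" if os.path.isdir(path) else "MISSING"
        print(f"{vendor}/{name}: {status} -> {path}")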
I prepared some test questions and put them in config/question.py:
history_chat_questions = ["你好", "中国的首都是", "他的面积是多少", "他有几座机场", "一共问了你几个问题"]
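# i.e. "Hello", "The capital of China is", "What is its area", "How many airports does it have", "How many questions have I asked you in total"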
# [Windows] create and activate a virtual environment
python -m venv venv
.\venv\scripts\activate
# [Linux] create and activate a virtual environment
python3 -m venv venv
source ./venv/bin/activate
# exit the virtual environment
deactivate
pip install --pre --upgrade bigdl-llm[all]
# [Linux] install the CPU build of torch
# Note: if you need the CPU build of torch on Linux, install torch first and only then install bigdl-llm
# !!! In my tests, installing bigdl-llm first and then the CPU build of torch did not work
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
bigdl-llm[all] depends on transformers, which is pulled in automatically during installation.
2024-02-26: bigdl-llm[all] version 2.5.0b20240225, with transformers 4.31.0.
2024-02-27: bigdl-llm[all] version 2.5.0b20240226, with transformers 4.31.0.
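A quick way to confirm which versions actually ended up in the virtual environment (a small sketch, not one of the project files):

# Print the installed versions; torch.version.cuda is None for the CPU-only build
import importlib.metadata as metadata
import torch

print("bigdl-llm   :", metadata.version("bigdl-llm"))
print("transformers:", metadata.version("transformers"))
print("torch       :", torch.__version__, "CUDA:", torch.version.cuda)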
# needed so local packages can be imported, e.g. config.model
import sys
sys.path.append(".")
from transformers import AutoTokenizer
from bigdl.llm.transformers import AutoModelForCausalLM
from config.model import model_path_dict
from config.question import history_chat_questions
model_path = model_path_dict["baichuan"]["2-7B-Chat"]
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True, trust_remote_code=True, use_cache=True)
# the model.chat method lives in modeling_baichuan.py in the model weights folder, with this signature:
'''
def chat(self, tokenizer, messages: List[dict], stream=False, generation_config: Optional[GenerationConfig]=None)
'''
# this method has no temperature / top_p parameters, and generation_config is not passed here,
# so the settings from generation_config.json are used by default
# you can load the model's GenerationConfig like this:
'''
from transformers.generation.utils import GenerationConfig
generation_config = GenerationConfig.from_pretrained(model_path)
print(str(generation_config))
'''
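# To override the sampling settings you could load the GenerationConfig, tweak it, and
# pass it to chat() explicitly; an illustrative sketch (the values below are made up):
'''
from transformers.generation.utils import GenerationConfig
generation_config = GenerationConfig.from_pretrained(model_path)
generation_config.temperature = 0.3
generation_config.top_p = 0.85
response = model.chat(tokenizer, messages, generation_config=generation_config)
'''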
messages=[]
for question in history_chat_questions:
print("问:" + question)
messages.append({"role": "user", "content": question})
response = model.chat(tokenizer, messages)
messages.append({"role": "assistant", "content": response})
print("答:" + str(response))
print(str(messages))
Baichuan2-7B-Chat went smoothly; only the dependencies installed above were needed.
Running it produces the following warning:
WARNING - Xformers is not installed correctly. If you want to use memory_efficient_attention to accelerate training use the following command to install Xformers
pip install xformers.
import sys
sys.path.append(".")
from transformers import AutoTokenizer
from bigdl.llm.transformers import AutoModel
from config.model import model_path_dict
from config.question import history_chat_questions
model_path = model_path_dict["THUDM"]["2-6b"]
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(model_path, load_in_4bit=True, trust_remote_code=True)
# the model.chat method lives in modeling_chatglm.py in the model weights folder, with this signature:
'''
def chat(self, tokenizer, query: str, history: List[Tuple[str, str]] = None, max_length: int = 8192, num_beams=1,do_sample=True, top_p=0.8, temperature=0.8, logits_processor=None, **kwargs)
'''
messages = []
history = None
for question in history_chat_questions:
print("问:" + question)
response, history = model.chat(tokenizer, question, history=history)
print("答:" + str(response))
for question, response in history:
messages.append({"role": "user", "content": question})
messages.append({"role": "assistant", "content": response})
print(str(messages))
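For streaming output (which the API section below relies on), chatglm2-6b's modeling_chatglm.py also exposes a stream_chat generator; a minimal sketch, assuming it yields progressively longer (response, history) pairs:

# Streaming sketch: response grows as new tokens are generated
response, new_history = "", None
for response, new_history in model.stream_chat(tokenizer, "你好", history=None):
    pass  # a real API would forward only the newly generated part to the client
print(response)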
The chatglm2-6b model officially recommends transformers 4.30.2, but transformers 4.31.0 also works. Here is how it runs:
import sys
sys.path.append(".")
from transformers import AutoTokenizer
from bigdl.llm.transformers import AutoModel
from config.model import model_path_dict
from config.question import history_chat_questions
model_path = model_path_dict["THUDM"]["3-6b"]
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(model_path, load_in_4bit=True, trust_remote_code=True)
# the model.chat method lives in modeling_chatglm.py in the model weights folder, with this signature:
'''
def chat(self, tokenizer, query: str, history: List[Tuple[str, str]] = None, role: str = "user", max_length: int = 8192, num_beams=1, do_sample=True, top_p=0.8, temperature=0.8, logits_processor=None, **kwargs)
'''
# [Note] the type annotation of the history parameter looks wrong; internally this method calls build_chat_input in tokenization_chatglm.py, and the types clearly do not match
history = None
for question in history_chat_questions:
print("问:" + question)
response, history = model.chat(tokenizer, question, history=history)
print("答:" + str(response))
print(str(history))
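Since chatglm3-6b's history already appears to be a list of role/content dicts (which is exactly why the List[Tuple[str, str]] annotation above looks wrong), converting it to the messages format used in the other scripts would just be the following, assuming each entry has "role" and "content" keys:

# Sketch: flatten chatglm3's history into the messages format used elsewhere
messages = [{"role": h["role"], "content": h["content"]} for h in history]
print(str(messages))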
chatglm3-6b is the same story; only the dependencies installed above were needed.
import sys
sys.path.append(".")
from transformers import AutoTokenizer
from bigdl.llm.transformers import AutoModelForCausalLM
from config.model import model_path_dict
from config.question import history_chat_questions
model_path = model_path_dict["Qwen"]["7B-Chat"]
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True, trust_remote_code=True)
# the model.chat method lives in modeling_qwen.py in the model weights folder, with this signature:
'''
def chat(self, tokenizer: PreTrainedTokenizer, query: str, history: Optional[HistoryType], system: str = "You are a helpful assistant.", append_history: bool = True, stream: Optional[bool] = _SENTINEL, stop_words_ids: Optional[List[List[int]]] = None, generation_config: Optional[GenerationConfig] = None, **kwargs) -> Tuple[str, HistoryType]
'''
messages=[]
history = None
for question in history_chat_questions:
print("问:" + question)
response, history = model.chat(tokenizer, question, history=history)
print("答:" + str(response))
for question, response in history:
messages.append({"role": "user", "content": question})
messages.append({"role": "assistant", "content": response})
print(str(messages))
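As the signature shows, Qwen's chat() also accepts a system prompt and a generation_config; an illustrative one-off call (the system string below is my own example, not from the project):

# Illustrative only: pass a custom system prompt to Qwen's chat()
response, _ = model.chat(tokenizer, "你好", history=None, system="你是一个乐于助人的中文助手。")
print(response)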
Qwen-7B-Chat also went fairly well, apart from needing a few dependencies beyond the ones already installed.
On the plus side, I came across the sites below; unfortunately, I did not find any other supported models there.
import sys
sys.path.append(".")
from typing import Optional, Tuple, List
import copy
from transformers import AutoTokenizer
from bigdl.llm.transformers import AutoModelForCausalLM
from config.model import model_path_dict
from config.question import history_chat_questions
model_path = model_path_dict["01ai"]["6B-Chat"]
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True, trust_remote_code=True)
# this model is more of a hassle: there is no model.chat method, so the wrappers below are my own
def chat1(
model,
tokenizer,
query: str,
history: Optional[List[Tuple[str, str]]],
system: str = "You are a helpful assistant.",
**kwargs,
) -> Tuple[str, List[Tuple[str, str]]]:
im_start, im_end = "<|im_start|>", "<|im_end|>"
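    # Yi-6B-Chat appears to use the ChatML-style <|im_start|>/<|im_end|> conversation format, hence the hand-built prompt below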
# tokenizer.encode("<|im_start|>", return_tensors="pt") => 6
# tokenizer.encode("<|im_end|>", return_tensors="pt") => 7
if history is None:
history = []
else:
        # make a copy of the user's input so that it is left untouched
history = copy.deepcopy(history)
raw_text = f"\n{im_start}system\n{system}{im_end}\n"
for turn_query, turn_response in history:
raw_text += f"{im_start}user\n{turn_query}{im_end}\n"
raw_text += f"{im_start}assistant\n{turn_response}{im_end}\n"
raw_text += f"{im_start}user\n{query}{im_end}\n"
raw_text += f"{im_start}assistant\n"
input_ids = tokenizer.encode(raw_text, return_tensors="pt")
outputs = model.generate(input_ids,
do_sample=True,
max_new_tokens=4096,
top_p=0.8,
temperature=0.8)
response = tokenizer.decode(outputs[0][len(input_ids[0]):], skip_special_tokens=True)
history.append((query, response))
return response, history
def chat2(model, tokenizer, messages, system: str = "You are a helpful assistant."):
im_start, im_end = "<|im_start|>", "<|im_end|>"
prompt = f"\n{im_start}system\n{system}{im_end}\n"
for message in messages:
match message["role"]:
case "user":
prompt += f"{im_start}user\n{message['content']}{im_end}\n"
case "assistant":
prompt += f"{im_start}assistant\n{message['content']}{im_end}\n"
prompt += f"{im_start}assistant\n"
input_ids = tokenizer.encode(prompt, return_tensors="pt")
outputs = model.generate(input_ids,
do_sample=True,
max_new_tokens=4096,
top_p=0.8,
temperature=0.8)
response = tokenizer.decode(outputs[0][len(input_ids[0]):], skip_special_tokens=True)
return response
def chat3(model,tokenizer,messages):
    # the apply_chat_template method requires a newer version of transformers
input_ids = tokenizer.apply_chat_template(conversation=messages, tokenize=True, add_generation_prompt=True, return_tensors='pt')
output_ids = model.generate(input_ids)
response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)
return response
print("------第一轮------")
messages1=[]
history = None
for question in history_chat_questions:
print("问:" + question)
response, history = chat1(model, tokenizer, question, history=history)
print("答:" + str(response))
for question, response in history:
messages1.append({"role": "user", "content": question})
messages1.append({"role": "assistant", "content": response})
print(str(messages1))
print("------第二轮------")
messages2=[]
for question in history_chat_questions:
print("问:" + question)
messages2.append({"role": "user", "content": question})
response = chat2(model, tokenizer, messages2)
messages2.append({"role": "assistant", "content": response})
print("答:" + str(response))
print(str(messages2))
print("------第三轮------")
messages3=[]
for question in history_chat_questions:
print("问:" + question)
messages3.append({"role": "user", "content": question})
response = chat3(model,tokenizer, messages3)
messages3.append({"role": "assistant", "content": response})
print("答:" + str(response))
print(str(messages3))
Yi-6B-Chat was more of a hassle: it has no built-in chat method like the models above, so I used the hand-written wrappers shown in the script.
------第一轮------
问:你好
答:你好!有什么我可以帮助你的吗?
问:中国的首都是
答:中国的首都是北京。
问:他的面积是多少
答:他的面积是多少?这个问题需要更多的信息才能给出一个准确的答案。如果你能提供更多的关于他的信息,比如他的名字,出生日期,住址等等,那么我就可以根据你所提供的这些信息来帮助你回答这个问题:他的面积是多少?
问:他有几座机场
答:他有几座机场?这个问题需要更多的信息才能给出一个准确的答案。如果你能提供更多的关于他的信息,比如他的名字,出生日期,住址等等,那么我就可以根据你所提供的这些信息来帮助你回答这个问题:他有几座机场?
问:一共问了你几个问题
答:一共问了你几个问题?这个问题需要更多的信息才能给出一个准确的答案。如果你能提供更多的关于你的信息,比如你的名字,出生日期,住址等等,那么我就可以根据你所提供的这些信息来帮助你回答这个问题:一共问了你几个问题??
------第二轮------
问:你好
答:你好!有什么我可以帮助你的吗?
问:中国的首都是
答:中国的首都是北京。
问:他的面积是多少
答:他的面积是多少?看起来你可能是在询问某人的面积。但是,由于缺乏上下文,我无法提供你询问的面积。如果你能提供更多的上下文,我将很乐意帮助你找到你询问的面积。
问:他有几座机场
答:他有几座机场?看起来你可能是在询问某人的机场数量。但是,由于缺乏上下文,我无法提供你询问的机场数量。如果你能提供更多的上下文,我将很乐意帮助你找到你询问的机场数量。
问:一共问了你几个问题
答:一共问了你几个问题?看起来你可能是在询问你总共被问了几个问题。但是,由于缺乏上下文,我无法提供你询问的上下文。如果你能提供更多的上下文,我将很乐意帮助你找到你询问的上下文。
------第三轮------
问:你好
Traceback (most recent call last):
File "E:\llm\bigdl-001\01_history_chat\yi-6b-chat.py", line 109, in <module>
response = chat3(model,tokenizer, messages3)
File "E:\llm\bigdl-001\01_history_chat\yi-6b-chat.py", line 77, in chat3
input_ids = tokenizer.apply_chat_template(conversation=messages, tokenize=True, add_generation_prompt=True, return_tensors='pt')
AttributeError: 'LlamaTokenizerFast' object has no attribute 'apply_chat_template'
Note: with this transformers version the model did not answer based on the context (conversation history), and round 3 errored out; round 3 needs a newer transformers version that supports the apply_chat_template method.
That newer transformers version, however, raises the following error (some stack frames omitted):
TypeError: LlamaRotaryEmbedding.forward() missing 1 required positional argument: 'position_ids'
In an earlier test (2024-02-19), installing transformers==4.37.2 alongside bigdl produced the following error; I did not re-verify it this time:
KeyError: 'Cache only has 0 layers, attempted to access layer with index 0'
The tests in this part were run in the same environment as above.
A new 02_knowledge_chat directory was added, plus the following additions under config:
1. Add prompt_template.py under config:
# prompt template modeled on the Langchain-Chatchat project
# used for knowledge-base Q&A
knowledge_chat_prompt_template="""<指令>根据已知信息,简洁和专业的来回答问题。如果无法从中得到答案,请说 “根据已知信息无法回答该问题”,不允许在答案中添加编造成分,答案请使用中文。 </指令>
<已知信息>{knowledge}</已知信息>
<问题>{question}</问题>"""
2. Add the knowledge-chat questions to config/question.py:
from .prompt_template import knowledge_chat_prompt_template
history_chat_questions = ["你好", "中国的首都是", "他的面积是多少", "他有几座机场", "一共问了你几个问题"]
knowledge_chat_questions = [
"你好",
"中国的首都是",
"他的面积是多少",
"他有几座机场",
"你知道 xiaodu114 吗?",
knowledge_chat_prompt_template.format(knowledge='''一个关于xiaodu114的秘密
xiaodu114很喜欢球类运动,例如:乒乓球、羽毛球、网球等。
xiaodu114热爱学习,并且非常喜欢编程,🤭🤭🤭
你知道吗?xiaodu114还是Reddit的注册用户哦!访问地址为:https://www.reddit.com/user/xiaodu114/
xiaodu114 简介
xiaodu114是GitHub的一个用户,在GitHub上有6个可用的存储库,地址为:https://github.com/xiaodu114 。
xiaodu114同时也是npmjs的注册用户,目前已经创建了两个库:a2bei4 和 a2bei4-rollup-plugin 。''', question="介绍一下 xiaodu114")]
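The "knowledge chat" here simply stuffs the known information into the prompt through the template above. To inspect the exact prompt text a model will receive, a quick check (not one of the project scripts) could be:

# Print the final prompt produced by the template (placeholder knowledge and question)
from config.prompt_template import knowledge_chat_prompt_template

prompt = knowledge_chat_prompt_template.format(knowledge="(已知信息放这里)", question="介绍一下 xiaodu114")
print(prompt)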
import sys
sys.path.append(".")
from transformers import AutoTokenizer
from bigdl.llm.transformers import AutoModelForCausalLM
from config.model import model_path_dict
from config.question import knowledge_chat_questions
model_path = model_path_dict["baichuan"]["2-7B-Chat"]
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True, trust_remote_code=True, use_cache=True)
def chat(model,tokenizer,messages:list[dict]):
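    # <reserved_106> / <reserved_107> appear to be Baichuan2's special user / assistant role markers
    # (user_token_id / assistant_token_id in its generation_config), mirroring what its chat() builds internally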
prompt = ""
for message in messages:
match message["role"]:
case "user":
prompt += "<reserved_106>" + message["content"] + "\n"
case "assistant":
prompt += "<reserved_107>" + message["content"] + "\n"
prompt += "<reserved_107>"
input_ids = tokenizer.encode(prompt, return_tensors="pt")
output_ids = model.generate(input_ids,
do_sample=True,
max_new_tokens=4096,
top_p=0.8,
temperature=0.8)
response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)
return response
messages=[]
for question in knowledge_chat_questions:
print("问:" + question)
messages.append({"role": "user", "content": question})
response = chat(model, tokenizer, messages)
messages.append({"role": "assistant", "content": response})
print("答:" + str(response))
print(str(messages))
The installed dependencies are the same as in the corresponding history-chat section above. Here is how it runs:
import sys
sys.path.append(".")
from transformers import AutoTokenizer
from bigdl.llm.transformers import AutoModel
from config.model import model_path_dict
from config.question import knowledge_chat_questions
model_path = model_path_dict["THUDM"]["2-6b"]
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(model_path, load_in_4bit=True, trust_remote_code=True)
messages=[]
history = None
for question in knowledge_chat_questions:
print("问:" + question)
response, history = model.chat(tokenizer, question, history=history)
print("答:" + str(response))
for question, response in history:
messages.append({"role": "user", "content": question})
messages.append({"role": "assistant", "content": response})
print(str(messages))
The installed dependencies are the same as in the corresponding history-chat section above. Here is how it runs:
import sys
sys.path.append(".")
from transformers import AutoTokenizer
from bigdl.llm.transformers import AutoModel
from config.model import model_path_dict
from config.question import knowledge_chat_questions
model_path = model_path_dict["THUDM"]["3-6b"]
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(model_path, load_in_4bit=True, trust_remote_code=True)
history = None
for question in knowledge_chat_questions:
print("问:" + question)
response, history = model.chat(tokenizer, question, history=history)
print("答:" + str(response))
print(str(history))
The installed dependencies are the same as in the corresponding history-chat section above. Here is how it runs:
import sys
sys.path.append(".")
from transformers import AutoTokenizer
from bigdl.llm.transformers import AutoModelForCausalLM
from config.model import model_path_dict
from config.question import knowledge_chat_questions
model_path = model_path_dict["Qwen"]["7B-Chat"]
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True, trust_remote_code=True)
def chat(model, tokenizer, messages, system: str = "You are a helpful assistant."):
im_start, im_end = "<|im_start|>", "<|im_end|>"
prompt = f"{im_start}system\n{system}{im_end}\n"
for message in messages:
match message["role"]:
case "user":
prompt += f"{im_start}user\n{message['content']}{im_end}\n"
case "assistant":
prompt += f"{im_start}assistant\n{message['content']}{im_end}\n"
prompt += f"{im_start}assistant\n"
input_ids = tokenizer.encode(prompt, return_tensors="pt")
outputs = model.generate(input_ids,
do_sample=True,
max_new_tokens=1024,
top_p=0.8,
temperature=0.8)
response = tokenizer.decode(outputs[0][len(input_ids[0]):], skip_special_tokens=True)
return response
messages=[]
for question in knowledge_chat_questions:
print("问:" + question)
messages.append({"role": "user", "content": question})
response = chat(model, tokenizer, messages)
messages.append({"role": "assistant", "content": response})
print("答:" + str(response))
print(str(messages))
The installed dependencies are the same as in the corresponding history-chat section above. Here is how it runs:
import sys
sys.path.append(".")
from transformers import AutoTokenizer
from bigdl.llm.transformers import AutoModelForCausalLM
from config.model import model_path_dict
from config.question import knowledge_chat_questions
model_path = model_path_dict["01ai"]["6B-Chat"]
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True, trust_remote_code=True)
def chat3(model,tokenizer,messages):
input_ids = tokenizer.apply_chat_template(conversation=messages, tokenize=True, add_generation_prompt=True, return_tensors='pt')
output_ids = model.generate(input_ids)
response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)
return response
messages3=[]
for question in knowledge_chat_questions:
print("问:" + question)
messages3.append({"role": "user", "content": question})
response = chat3(model,tokenizer, messages3)
messages3.append({"role": "assistant", "content": response})
print("答:" + str(response))
print(str(messages3))
This uses the "round 3" approach from the history-chat section above, so it requires a newer transformers version that supports apply_chat_template.
The installed dependencies are the same as in the corresponding history-chat section above. Here is how it runs:
A problem showed up here. I first asked "你知道 xiaodu114 吗?" (this question does not include the known information), and the later question was "介绍一下 xiaodu114" (this one does include the known information). As you can see in the screenshot, the answer was not based on the known information; my guess is that it was influenced by the previous question. Here is the run again without the interfering question:
Asking "你知道 xiaodu114 吗?" and then "介绍一下 xiaodu114" (with the known information included), some models are affected by the earlier turns of the conversation. If the very same question, "你知道 xiaodu114 吗?", is asked twice and only the second one includes the known information, the model is even more easily thrown off and the results are not very stable, as shown below:
The "history chat" and "knowledge chat" tests above were all in preparation for the API here: wrapping them behind an API makes them easier to use.
Once the API is available, how do you call it? To try these APIs out, I wrote a simple test page.
To serve the API, a few additional packages need to be installed.
A new file defining the request models was added under the project root directory:
from typing import Literal, Optional, List, Dict, Any
from pydantic import BaseModel
class ChatMessage(BaseModel):
role: Literal["user", "assistant", "system"]
content: str
class ChatCompletionRequest(BaseModel):
messages: List[ChatMessage]
stream: Optional[bool] = False
generation_config: Dict[str, Any] = {}
As before, all of the endpoints above support both streaming and non-streaming responses, and all of them have passed testing; I will keep polishing them as time allows.
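The project's actual server code is not shown here, but to make the idea concrete, here is a minimal sketch of what one such endpoint could look like. It assumes FastAPI and uvicorn (my choice, not necessarily what the project uses), repeats the request models above so it is self-contained, and would plug in one of the chat helpers from the earlier sections:

# Minimal, illustrative endpoint sketch (FastAPI/uvicorn and the route name are assumptions)
from typing import Any, Dict, List, Literal, Optional

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel


class ChatMessage(BaseModel):                      # same models as defined above,
    role: Literal["user", "assistant", "system"]   # repeated so the sketch is self-contained
    content: str


class ChatCompletionRequest(BaseModel):
    messages: List[ChatMessage]
    stream: Optional[bool] = False
    generation_config: Dict[str, Any] = {}


app = FastAPI()


def run_chat(messages: List[Dict[str, str]], generation_config: Dict[str, Any]) -> str:
    # Placeholder: in the real project this would call one of the chat helpers above
    # (Baichuan2 / ChatGLM2 / ChatGLM3 / Qwen / Yi).
    return "(model response)"


@app.post("/chat")
def chat(request: ChatCompletionRequest):
    messages = [{"role": m.role, "content": m.content} for m in request.messages]
    if request.stream:
        def token_stream():
            # A real implementation would yield tokens as they are generated,
            # e.g. via ChatGLM's stream_chat or transformers' TextIteratorStreamer.
            yield run_chat(messages, request.generation_config)
        return StreamingResponse(token_stream(), media_type="text/event-stream")
    return {"role": "assistant", "content": run_chat(messages, request.generation_config)}

# Run with something like: uvicorn <module_name>:app --port 8000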