This article describes how to use XTuner to fine-tune InternLM into a personal AI assistant.
Goal: fine-tune the InternLM-Chat model with XTuner so that it knows who its owner is.
Environment Setup

First, run the following commands in a terminal:

conda create --name personal_assistant python=3.10 -y
conda activate personal_assistant
pip install ipykernel
python -m ipykernel install --user --name personal_assistant --display-name personal_assistant
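To confirm the new kernel is registered, an optional check:

jupyter kernelspec list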
Then, in the notebook, point the kernel's environment variables at the new conda environment:

import os, sys

PATH = os.environ['PATH']
basedir = os.path.dirname(os.path.dirname(sys.exec_prefix))
%env CONDA_EXE={os.path.join(basedir, 'bin/conda')}
%env CONDA_PREFIX={sys.exec_prefix}
%env CONDA_PYTHON_EXE={os.path.join(basedir, 'bin/python')}
%env PATH={os.path.join(sys.exec_prefix, 'bin')}:$PATH
env: CONDA_EXE=/root/.conda/bin/conda
env: CONDA_PREFIX=/root/.conda/envs/xtuner0.1.9
env: CONDA_PYTHON_EXE=/root/.conda/bin/python
env: PATH=/root/.conda/envs/xtuner0.1.9/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
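A quick sanity check, a minimal sketch assuming the cell above ran in this same kernel: the notebook should now resolve executables from the new environment.

import shutil, sys

print(sys.executable)          # interpreter of the active kernel
print(shutil.which('python'))  # first python found on the adjusted PATH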
Data and Code Preparation

%mkdir -p ~/personal_assistant/xtuner019
%cd ~/personal_assistant/xtuner019
/root/personal_assistant/xtuner019
/root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages/IPython/core/magics/osm.py:393: UserWarning: using bookmarks requires you to install the `pickleshare` library.
bkms = self.shell.db.get('bookmarks', {})
/root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages/IPython/core/magics/osm.py:417: UserWarning: using dhist requires you to install the `pickleshare` library.
self.shell.db['dhist'] = compress_dhist(dhist)[-100:]
!git clone -b v0.1.9 https://github.com/InternLM/xtuner
Cloning into 'xtuner'...
remote: Enumerating objects: 4734, done.
remote: Counting objects: 100% (3534/3534), done.
remote: Compressing objects: 100% (728/728), done.
remote: Total 4734 (delta 3078), reused 2965 (delta 2763), pack-reused 1200
Receiving objects: 100% (4734/4734), 898.47 KiB | 24.28 MiB/s, done.
Resolving deltas: 100% (3564/3564), done.
Note: switching to '9f686f08c8e60e568e811aaad8daf9c08462d42d'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:
git switch -c <new-branch-name>
Or undo this operation with:
git switch -
Turn off this advice by setting config variable advice.detachedHead to false
Updating files: 100% (430/430), done.
%cd xtuner
%pip install -e '.[all]'
/root/personal_assistant/xtuner019/xtuner
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Obtaining file:///root/personal_assistant/xtuner019/xtuner
Preparing metadata (setup.py) ... done
Requirement already satisfied: bitsandbytes>=0.40.0 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (0.42.0)
Requirement already satisfied: datasets in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (2.14.7)
Requirement already satisfied: einops in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (0.7.0)
Requirement already satisfied: fsspec<=2023.6.0 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (2023.6.0)
Requirement already satisfied: lagent>=0.1.2 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (0.1.2)
Requirement already satisfied: mmengine>=0.9.1 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (0.10.2)
Requirement already satisfied: modelscope in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (1.11.0)
Requirement already satisfied: peft>=0.4.0 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (0.7.1)
Requirement already satisfied: scipy in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (1.11.4)
Requirement already satisfied: SentencePiece in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (0.1.99)
Requirement already satisfied: tiktoken in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (0.5.2)
Requirement already satisfied: torch in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (2.1.2)
Requirement already satisfied: transformers<=4.34.0,>=4.32.1 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (4.34.0)
Requirement already satisfied: transformers_stream_generator in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (0.0.4)
Requirement already satisfied: deepspeed>=0.12.3 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (0.12.6)
Requirement already satisfied: mpi4py-mpich in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (3.1.2)
Requirement already satisfied: hjson in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from deepspeed>=0.12.3) (3.1.0)
Requirement already satisfied: ninja in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from deepspeed>=0.12.3) (1.11.1.1)
Requirement already satisfied: numpy in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from deepspeed>=0.12.3) (1.26.3)
Requirement already satisfied: packaging>=20.0 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from deepspeed>=0.12.3) (23.2)
Requirement already satisfied: psutil in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from deepspeed>=0.12.3) (5.9.7)
Requirement already satisfied: py-cpuinfo in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from deepspeed>=0.12.3) (9.0.0)
Requirement already satisfied: pydantic in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from deepspeed>=0.12.3) (2.5.3)
Requirement already satisfied: pynvml in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from deepspeed>=0.12.3) (11.5.0)
Requirement already satisfied: tqdm in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from deepspeed>=0.12.3) (4.66.1)
Requirement already satisfied: distro in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from lagent>=0.1.2) (1.9.0)
Requirement already satisfied: func-timeout in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from lagent>=0.1.2) (4.3.5)
Requirement already satisfied: jsonschema in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from lagent>=0.1.2) (4.20.0)
Requirement already satisfied: requests in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from lagent>=0.1.2) (2.31.0)
Requirement already satisfied: addict in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from mmengine>=0.9.1) (2.4.0)
Requirement already satisfied: matplotlib in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from mmengine>=0.9.1) (3.8.2)
Requirement already satisfied: pyyaml in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from mmengine>=0.9.1) (6.0.1)
Requirement already satisfied: rich in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from mmengine>=0.9.1) (13.7.0)
Requirement already satisfied: termcolor in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from mmengine>=0.9.1) (2.4.0)
Requirement already satisfied: yapf in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from mmengine>=0.9.1) (0.40.2)
Requirement already satisfied: opencv-python>=3 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from mmengine>=0.9.1) (4.9.0.80)
Requirement already satisfied: accelerate>=0.21.0 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from peft>=0.4.0) (0.26.0)
Requirement already satisfied: safetensors in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from peft>=0.4.0) (0.4.1)
Requirement already satisfied: huggingface-hub>=0.17.0 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from peft>=0.4.0) (0.17.3)
Requirement already satisfied: filelock in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from torch) (3.13.1)
Requirement already satisfied: typing-extensions in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from torch) (4.9.0)
Requirement already satisfied: sympy in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from torch) (1.12)
Requirement already satisfied: networkx in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from torch) (3.2.1)
Requirement already satisfied: jinja2 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from torch) (3.1.3)
Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.1.105 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from torch) (12.1.105)
Requirement already satisfied: nvidia-cuda-runtime-cu12==12.1.105 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from torch) (12.1.105)
Requirement already satisfied: nvidia-cuda-cupti-cu12==12.1.105 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from torch) (12.1.105)
Requirement already satisfied: nvidia-cudnn-cu12==8.9.2.26 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from torch) (8.9.2.26)
Requirement already satisfied: nvidia-cublas-cu12==12.1.3.1 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from torch) (12.1.3.1)
Requirement already satisfied: nvidia-cufft-cu12==11.0.2.54 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from torch) (11.0.2.54)
Requirement already satisfied: nvidia-curand-cu12==10.3.2.106 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from torch) (10.3.2.106)
Requirement already satisfied: nvidia-cusolver-cu12==11.4.5.107 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from torch) (11.4.5.107)
Requirement already satisfied: nvidia-cusparse-cu12==12.1.0.106 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from torch) (12.1.0.106)
Requirement already satisfied: nvidia-nccl-cu12==2.18.1 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from torch) (2.18.1)
Requirement already satisfied: nvidia-nvtx-cu12==12.1.105 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from torch) (12.1.105)
Requirement already satisfied: triton==2.1.0 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from torch) (2.1.0)
Requirement already satisfied: nvidia-nvjitlink-cu12 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from nvidia-cusolver-cu12==11.4.5.107->torch) (12.3.101)
Requirement already satisfied: regex!=2019.12.17 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from transformers<=4.34.0,>=4.32.1) (2023.12.25)
Requirement already satisfied: tokenizers<0.15,>=0.14 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from transformers<=4.34.0,>=4.32.1) (0.14.1)
Requirement already satisfied: pyarrow>=8.0.0 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from datasets) (14.0.2)
Requirement already satisfied: pyarrow-hotfix in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from datasets) (0.6)
Requirement already satisfied: dill<0.3.8,>=0.3.0 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from datasets) (0.3.7)
Requirement already satisfied: pandas in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from datasets) (2.1.4)
Requirement already satisfied: xxhash in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from datasets) (3.4.1)
Requirement already satisfied: multiprocess in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from datasets) (0.70.15)
Requirement already satisfied: aiohttp in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from datasets) (3.9.1)
Requirement already satisfied: attrs in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from modelscope) (23.2.0)
Requirement already satisfied: gast>=0.2.2 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from modelscope) (0.5.4)
Requirement already satisfied: oss2 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from modelscope) (2.18.4)
Requirement already satisfied: Pillow>=6.2.0 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from modelscope) (10.2.0)
Requirement already satisfied: python-dateutil>=2.1 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from modelscope) (2.8.2)
Requirement already satisfied: setuptools in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from modelscope) (68.2.2)
Requirement already satisfied: simplejson>=3.3.0 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from modelscope) (3.19.2)
Requirement already satisfied: sortedcontainers>=1.5.9 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from modelscope) (2.4.0)
Requirement already satisfied: urllib3>=1.26 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from modelscope) (2.1.0)
Requirement already satisfied: multidict<7.0,>=4.5 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from aiohttp->datasets) (6.0.4)
Requirement already satisfied: yarl<2.0,>=1.0 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from aiohttp->datasets) (1.9.4)
Requirement already satisfied: frozenlist>=1.1.1 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from aiohttp->datasets) (1.4.1)
Requirement already satisfied: aiosignal>=1.1.2 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from aiohttp->datasets) (1.3.1)
Requirement already satisfied: async-timeout<5.0,>=4.0 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from aiohttp->datasets) (4.0.3)
Requirement already satisfied: six>=1.5 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from python-dateutil>=2.1->modelscope) (1.16.0)
Requirement already satisfied: charset-normalizer<4,>=2 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from requests->lagent>=0.1.2) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from requests->lagent>=0.1.2) (3.6)
Requirement already satisfied: certifi>=2017.4.17 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from requests->lagent>=0.1.2) (2023.11.17)
Requirement already satisfied: MarkupSafe>=2.0 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from jinja2->torch) (2.1.3)
Requirement already satisfied: jsonschema-specifications>=2023.03.6 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from jsonschema->lagent>=0.1.2) (2023.12.1)
Requirement already satisfied: referencing>=0.28.4 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from jsonschema->lagent>=0.1.2) (0.32.1)
Requirement already satisfied: rpds-py>=0.7.1 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from jsonschema->lagent>=0.1.2) (0.16.2)
Requirement already satisfied: contourpy>=1.0.1 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from matplotlib->mmengine>=0.9.1) (1.2.0)
Requirement already satisfied: cycler>=0.10 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from matplotlib->mmengine>=0.9.1) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from matplotlib->mmengine>=0.9.1) (4.47.0)
Requirement already satisfied: kiwisolver>=1.3.1 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from matplotlib->mmengine>=0.9.1) (1.4.5)
Requirement already satisfied: pyparsing>=2.3.1 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from matplotlib->mmengine>=0.9.1) (3.1.1)
Requirement already satisfied: crcmod>=1.7 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from oss2->modelscope) (1.7)
Requirement already satisfied: pycryptodome>=3.4.7 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from oss2->modelscope) (3.20.0)
Requirement already satisfied: aliyun-python-sdk-kms>=2.4.1 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from oss2->modelscope) (2.16.2)
Requirement already satisfied: aliyun-python-sdk-core>=2.13.12 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from oss2->modelscope) (2.14.0)
Requirement already satisfied: pytz>=2020.1 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from pandas->datasets) (2023.3.post1)
Requirement already satisfied: tzdata>=2022.1 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from pandas->datasets) (2023.4)
Requirement already satisfied: annotated-types>=0.4.0 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from pydantic->deepspeed>=0.12.3) (0.6.0)
Requirement already satisfied: pydantic-core==2.14.6 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from pydantic->deepspeed>=0.12.3) (2.14.6)
Requirement already satisfied: markdown-it-py>=2.2.0 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from rich->mmengine>=0.9.1) (3.0.0)
Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from rich->mmengine>=0.9.1) (2.17.2)
Requirement already satisfied: mpmath>=0.19 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from sympy->torch) (1.3.0)
Requirement already satisfied: importlib-metadata>=6.6.0 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from yapf->mmengine>=0.9.1) (7.0.1)
Requirement already satisfied: platformdirs>=3.5.1 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from yapf->mmengine>=0.9.1) (4.1.0)
Requirement already satisfied: tomli>=2.0.1 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from yapf->mmengine>=0.9.1) (2.0.1)
Requirement already satisfied: jmespath<1.0.0,>=0.9.3 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from aliyun-python-sdk-core>=2.13.12->oss2->modelscope) (0.10.0)
Requirement already satisfied: cryptography>=2.6.0 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from aliyun-python-sdk-core>=2.13.12->oss2->modelscope) (41.0.7)
Requirement already satisfied: zipp>=0.5 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from importlib-metadata>=6.6.0->yapf->mmengine>=0.9.1) (3.17.0)
Requirement already satisfied: mdurl~=0.1 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from markdown-it-py>=2.2.0->rich->mmengine>=0.9.1) (0.1.2)
Requirement already satisfied: cffi>=1.12 in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from cryptography>=2.6.0->aliyun-python-sdk-core>=2.13.12->oss2->modelscope) (1.16.0)
Requirement already satisfied: pycparser in /root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages (from cffi>=1.12->cryptography>=2.6.0->aliyun-python-sdk-core>=2.13.12->oss2->modelscope) (2.21)
Installing collected packages: xtuner
Attempting uninstall: xtuner
Found existing installation: xtuner 0.1.9
Uninstalling xtuner-0.1.9:
Successfully uninstalled xtuner-0.1.9
Running setup.py develop for xtuner
Successfully installed xtuner-0.1.9
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Note: you may need to restart the kernel to use updated packages.
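Optionally, verify that the editable install is importable (a minimal check; it assumes the package exposes __version__, and you may need to restart the kernel first):

import xtuner

print(xtuner.__version__)  # expect 0.1.9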
%mkdir -p ~/personal_assistant/data
%cd ~/personal_assistant/data
/root/personal_assistant/data
Generate the training data by repeating a single self-introduction sample n times:

%%writefile /root/personal_assistant/data/generate_data.py
import json

name = 'xujinzh'
n = 10000

data = [
    {
        "conversation": [
            {
                "input": "请做一下自我介绍",
                "output": "我是{}的小助手,内在是上海AI实验室书生·浦语的7B大模型哦".format(name)
            }
        ]
    }
]

for i in range(n):
    data.append(data[0])

with open('personal_assistant.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False, indent=4)
Writing /root/personal_assistant/data/generate_data.py
!{sys.executable} /root/personal_assistant/data/generate_data.py
%ls -lh /root/personal_assistant/data
total 2.4M
-rw-r--r-- 1 root root 504 Jan 12 16:41 generate_data.py
-rw-r--r-- 1 root root 2.4M Jan 12 16:42 personal_assistant.json
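A quick sanity check, a minimal sketch: confirm the generated file holds the expected number of records in the single-turn conversation format.

import json

with open('/root/personal_assistant/data/personal_assistant.json', encoding='utf-8') as f:
    records = json.load(f)

print(len(records))  # expect 10001: the original sample plus n appended copies
print(records[0]['conversation'][0]['input'])  # '请做一下自我介绍'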
Model and Config Preparation

%mkdir -p /root/personal_assistant/model/Shanghai_AI_Laboratory
%cp -r /root/share/temp/model_repos/internlm-chat-7b /root/personal_assistant/model/Shanghai_AI_Laboratory

List the configs that ship with XTuner:

!xtuner list-cfg
[2024-01-12 16:46:36,270] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-12 16:47:30,319] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
==========================CONFIGS===========================
baichuan2_13b_base_qlora_alpaca_e3
baichuan2_13b_base_qlora_alpaca_enzh_e3
baichuan2_13b_base_qlora_alpaca_enzh_oasst1_e3
baichuan2_13b_base_qlora_alpaca_zh_e3
baichuan2_13b_base_qlora_arxiv_gentitle_e3
baichuan2_13b_base_qlora_code_alpaca_e3
baichuan2_13b_base_qlora_colorist_e5
baichuan2_13b_base_qlora_lawyer_e3
baichuan2_13b_base_qlora_oasst1_512_e3
baichuan2_13b_base_qlora_oasst1_e3
baichuan2_13b_base_qlora_open_platypus_e3
baichuan2_13b_base_qlora_sql_e3
baichuan2_13b_chat_qlora_alpaca_e3
baichuan2_13b_chat_qlora_alpaca_enzh_e3
baichuan2_13b_chat_qlora_alpaca_enzh_oasst1_e3
baichuan2_13b_chat_qlora_alpaca_zh_e3
baichuan2_13b_chat_qlora_code_alpaca_e3
baichuan2_13b_chat_qlora_lawyer_e3
baichuan2_13b_chat_qlora_oasst1_512_e3
baichuan2_13b_chat_qlora_oasst1_e3
baichuan2_13b_chat_qlora_open_platypus_e3
baichuan2_7b_base_qlora_alpaca_e3
baichuan2_7b_base_qlora_alpaca_enzh_e3
baichuan2_7b_base_qlora_alpaca_enzh_oasst1_e3
baichuan2_7b_base_qlora_alpaca_zh_e3
baichuan2_7b_base_qlora_arxiv_gentitle_e3
baichuan2_7b_base_qlora_code_alpaca_e3
baichuan2_7b_base_qlora_colorist_e5
baichuan2_7b_base_qlora_lawyer_e3
baichuan2_7b_base_qlora_oasst1_512_e3
baichuan2_7b_base_qlora_oasst1_e3
baichuan2_7b_base_qlora_open_platypus_e3
baichuan2_7b_base_qlora_sql_e3
baichuan2_7b_chat_qlora_alpaca_e3
baichuan2_7b_chat_qlora_alpaca_enzh_e3
baichuan2_7b_chat_qlora_alpaca_enzh_oasst1_e3
baichuan2_7b_chat_qlora_alpaca_zh_e3
baichuan2_7b_chat_qlora_code_alpaca_e3
baichuan2_7b_chat_qlora_lawyer_e3
baichuan2_7b_chat_qlora_oasst1_512_e3
baichuan2_7b_chat_qlora_oasst1_e3
baichuan2_7b_chat_qlora_open_platypus_e3
baichuan_13b_base_qlora_alpaca_e3
baichuan_13b_base_qlora_alpaca_enzh_e3
baichuan_13b_base_qlora_alpaca_enzh_oasst1_e3
baichuan_13b_base_qlora_alpaca_zh_e3
baichuan_13b_base_qlora_arxiv_gentitle_e3
baichuan_13b_base_qlora_code_alpaca_e3
baichuan_13b_base_qlora_colorist_e5
baichuan_13b_base_qlora_lawyer_e3
baichuan_13b_base_qlora_medical_e1
baichuan_13b_base_qlora_moss_sft_all_e1
baichuan_13b_base_qlora_moss_sft_all_e2_gpu8
baichuan_13b_base_qlora_moss_sft_plugins_e1
baichuan_13b_base_qlora_oasst1_512_e3
baichuan_13b_base_qlora_oasst1_e3
baichuan_13b_base_qlora_open_platypus_e3
baichuan_13b_base_qlora_openorca_e1
baichuan_13b_base_qlora_sql_e3
baichuan_13b_base_qlora_tiny_codes_e1
baichuan_13b_chat_qlora_alpaca_e3
baichuan_13b_chat_qlora_alpaca_enzh_e3
baichuan_13b_chat_qlora_alpaca_enzh_oasst1_e3
baichuan_13b_chat_qlora_alpaca_zh_e3
baichuan_13b_chat_qlora_arxiv_gentitle_e3
baichuan_13b_chat_qlora_code_alpaca_e3
baichuan_13b_chat_qlora_colorist_e5
baichuan_13b_chat_qlora_lawyer_e3
baichuan_13b_chat_qlora_medical_e1
baichuan_13b_chat_qlora_oasst1_512_e3
baichuan_13b_chat_qlora_oasst1_e3
baichuan_13b_chat_qlora_open_platypus_e3
baichuan_13b_chat_qlora_openorca_e1
baichuan_13b_chat_qlora_sql_e3
baichuan_13b_chat_qlora_tiny_codes_e1
baichuan_7b_qlora_alpaca_e3
baichuan_7b_qlora_alpaca_enzh_e3
baichuan_7b_qlora_alpaca_enzh_oasst1_e3
baichuan_7b_qlora_alpaca_zh_e3
baichuan_7b_qlora_arxiv_gentitle_e3
baichuan_7b_qlora_code_alpaca_e3
baichuan_7b_qlora_colorist_e5
baichuan_7b_qlora_lawyer_e3
baichuan_7b_qlora_medical_e1
baichuan_7b_qlora_moss_sft_all_e1
baichuan_7b_qlora_moss_sft_all_e2_gpu8
baichuan_7b_qlora_moss_sft_plugins_e1
baichuan_7b_qlora_oasst1_512_e3
baichuan_7b_qlora_oasst1_e3
baichuan_7b_qlora_open_platypus_e3
baichuan_7b_qlora_openorca_e1
baichuan_7b_qlora_sql_e3
baichuan_7b_qlora_tiny_codes_e1
chatglm2_6b_qlora_alpaca_e3
chatglm2_6b_qlora_alpaca_enzh_e3
chatglm2_6b_qlora_alpaca_enzh_oasst1_e3
chatglm2_6b_qlora_alpaca_zh_e3
chatglm2_6b_qlora_arxiv_gentitle_e3
chatglm2_6b_qlora_code_alpaca_e3
chatglm2_6b_qlora_colorist_e5
chatglm2_6b_qlora_lawyer_e3
chatglm2_6b_qlora_medical_e1
chatglm2_6b_qlora_oasst1_512_e3
chatglm2_6b_qlora_oasst1_e3
chatglm2_6b_qlora_open_platypus_e3
chatglm2_6b_qlora_openorca_e1
chatglm2_6b_qlora_sql_e3
chatglm2_6b_qlora_tiny_codes_e1
chatglm3_6b_base_qlora_alpaca_e3
chatglm3_6b_base_qlora_alpaca_enzh_e3
chatglm3_6b_base_qlora_alpaca_enzh_oasst1_e3
chatglm3_6b_base_qlora_alpaca_zh_e3
chatglm3_6b_base_qlora_arxiv_gentitle_e3
chatglm3_6b_base_qlora_code_alpaca_e3
chatglm3_6b_base_qlora_colorist_e5
chatglm3_6b_base_qlora_lawyer_e3
chatglm3_6b_base_qlora_medical_e1
chatglm3_6b_base_qlora_oasst1_512_e3
chatglm3_6b_base_qlora_oasst1_e3
chatglm3_6b_base_qlora_open_platypus_e3
chatglm3_6b_base_qlora_openorca_e1
chatglm3_6b_base_qlora_sql_e3
chatglm3_6b_base_qlora_tiny_codes_e1
chatglm3_6b_qlora_alpaca_e3
chatglm3_6b_qlora_alpaca_enzh_e3
chatglm3_6b_qlora_alpaca_enzh_oasst1_e3
chatglm3_6b_qlora_alpaca_zh_e3
chatglm3_6b_qlora_arxiv_gentitle_e3
chatglm3_6b_qlora_code_alpaca_e3
chatglm3_6b_qlora_colorist_e5
chatglm3_6b_qlora_lawyer_e3
chatglm3_6b_qlora_medical_e1
chatglm3_6b_qlora_oasst1_512_e3
chatglm3_6b_qlora_oasst1_e3
chatglm3_6b_qlora_open_platypus_e3
chatglm3_6b_qlora_openorca_e1
chatglm3_6b_qlora_sql_e3
chatglm3_6b_qlora_tiny_codes_e1
deepspeed_zero1
deepspeed_zero2
deepspeed_zero2_offload
deepspeed_zero3
deepspeed_zero3_offload
internlm_20b_qlora_alpaca_e3
internlm_20b_qlora_alpaca_enzh_e3
internlm_20b_qlora_alpaca_enzh_oasst1_e3
internlm_20b_qlora_alpaca_zh_e3
internlm_20b_qlora_arxiv_gentitle_e3
internlm_20b_qlora_code_alpaca_e3
internlm_20b_qlora_colorist_e5
internlm_20b_qlora_lawyer_e3
internlm_20b_qlora_msagent_react_e3_gpu8
internlm_20b_qlora_oasst1_512_e3
internlm_20b_qlora_oasst1_e3
internlm_20b_qlora_open_platypus_e3
internlm_20b_qlora_sql_e3
internlm_7b_full_alpaca_e3
internlm_7b_full_alpaca_enzh_e3
internlm_7b_full_alpaca_enzh_oasst1_e3
internlm_7b_full_alpaca_zh_e3
internlm_7b_full_oasst1_e3
internlm_7b_qlora_alpaca_e3
internlm_7b_qlora_alpaca_enzh_e3
internlm_7b_qlora_alpaca_enzh_oasst1_e3
internlm_7b_qlora_alpaca_zh_e3
internlm_7b_qlora_arxiv_gentitle_e3
internlm_7b_qlora_code_alpaca_e3
internlm_7b_qlora_colorist_e5
internlm_7b_qlora_lawyer_e3
internlm_7b_qlora_medical_e1
internlm_7b_qlora_moss_sft_all_e1
internlm_7b_qlora_moss_sft_all_e2_gpu8
internlm_7b_qlora_moss_sft_plugins_e1
internlm_7b_qlora_msagent_react_e3_gpu8
internlm_7b_qlora_oasst1_512_e3
internlm_7b_qlora_oasst1_e3
internlm_7b_qlora_oasst1_e3_hf
internlm_7b_qlora_oasst1_mmlu_e3
internlm_7b_qlora_open_platypus_e3
internlm_7b_qlora_openorca_e1
internlm_7b_qlora_sql_e3
internlm_7b_qlora_tiny_codes_e1
internlm_chat_20b_qlora_alpaca_e3
internlm_chat_20b_qlora_alpaca_enzh_e3
internlm_chat_20b_qlora_alpaca_enzh_oasst1_e3
internlm_chat_20b_qlora_alpaca_zh_e3
internlm_chat_20b_qlora_code_alpaca_e3
internlm_chat_20b_qlora_lawyer_e3
internlm_chat_20b_qlora_oasst1_512_e3
internlm_chat_20b_qlora_oasst1_e3
internlm_chat_20b_qlora_open_platypus_e3
internlm_chat_7b_qlora_alpaca_e3
internlm_chat_7b_qlora_alpaca_enzh_e3
internlm_chat_7b_qlora_alpaca_enzh_oasst1_e3
internlm_chat_7b_qlora_alpaca_zh_e3
internlm_chat_7b_qlora_arxiv_gentitle_e3
internlm_chat_7b_qlora_code_alpaca_e3
internlm_chat_7b_qlora_colorist_e5
internlm_chat_7b_qlora_lawyer_e3
internlm_chat_7b_qlora_medical_e1
internlm_chat_7b_qlora_oasst1_512_e3
internlm_chat_7b_qlora_oasst1_e3
internlm_chat_7b_qlora_open_platypus_e3
internlm_chat_7b_qlora_openorca_e1
internlm_chat_7b_qlora_sql_e3
internlm_chat_7b_qlora_tiny_codes_e1
llama2_70b_int8_lora_open_platypus_e1
llama2_70b_int8_lora_open_platypus_e1_hf
llama2_70b_qlora_open_platypus_e1
llama2_70b_qlora_open_platypus_e1_hf
llama2_7b_chat_qlora_alpaca_e3
llama2_7b_chat_qlora_alpaca_enzh_e3
llama2_7b_chat_qlora_alpaca_enzh_oasst1_e3
llama2_7b_chat_qlora_alpaca_zh_e3
llama2_7b_chat_qlora_arxiv_gentitle_e3
llama2_7b_chat_qlora_code_alpaca_e3
llama2_7b_chat_qlora_colorist_e5
llama2_7b_chat_qlora_lawyer_e3
llama2_7b_chat_qlora_medical_e1
llama2_7b_chat_qlora_oasst1_512_e3
llama2_7b_chat_qlora_oasst1_e3
llama2_7b_chat_qlora_open_platypus_e3
llama2_7b_chat_qlora_openorca_e1
llama2_7b_chat_qlora_sql_e3
llama2_7b_chat_qlora_tiny_codes_e1
llama2_7b_full_wizardlm_e1
llama2_7b_qlora_alpaca_e3
llama2_7b_qlora_alpaca_enzh_e3
llama2_7b_qlora_alpaca_enzh_oasst1_e3
llama2_7b_qlora_alpaca_zh_e3
llama2_7b_qlora_arxiv_gentitle_e3
llama2_7b_qlora_code_alpaca_e3
llama2_7b_qlora_colorist_e5
llama2_7b_qlora_lawyer_e3
llama2_7b_qlora_medical_e1
llama2_7b_qlora_moss_sft_all_e1
llama2_7b_qlora_moss_sft_all_e2_gpu8
llama2_7b_qlora_moss_sft_plugins_e1
llama2_7b_qlora_msagent_react_e3_gpu8
llama2_7b_qlora_oasst1_512_e3
llama2_7b_qlora_oasst1_e3
llama2_7b_qlora_open_platypus_e3
llama2_7b_qlora_openorca_e1
llama2_7b_qlora_sql_e3
llama2_7b_qlora_tiny_codes_e1
llama_7b_qlora_alpaca_e3
llama_7b_qlora_alpaca_enzh_e3
llama_7b_qlora_alpaca_enzh_oasst1_e3
llama_7b_qlora_alpaca_zh_e3
llama_7b_qlora_arxiv_gentitle_e3
llama_7b_qlora_code_alpaca_e3
llama_7b_qlora_colorist_e5
llama_7b_qlora_lawyer_e3
llama_7b_qlora_medical_e1
llama_7b_qlora_moss_sft_all_e1
llama_7b_qlora_moss_sft_all_e2_gpu8
llama_7b_qlora_moss_sft_plugins_e1
llama_7b_qlora_oasst1_512_e3
llama_7b_qlora_oasst1_e3
llama_7b_qlora_open_platypus_e3
llama_7b_qlora_openorca_e1
llama_7b_qlora_sql_e3
llama_7b_qlora_tiny_codes_e1
mistral_7b_qlora_skypile_pretrain_e1
qwen_7b_chat_qlora_alpaca_e3
qwen_7b_chat_qlora_alpaca_enzh_e3
qwen_7b_chat_qlora_alpaca_enzh_oasst1_e3
qwen_7b_chat_qlora_alpaca_zh_e3
qwen_7b_chat_qlora_arxiv_gentitle_e3
qwen_7b_chat_qlora_code_alpaca_e3
qwen_7b_chat_qlora_colorist_e5
qwen_7b_chat_qlora_lawyer_e3
qwen_7b_chat_qlora_medical_e1
qwen_7b_chat_qlora_oasst1_512_e3
qwen_7b_chat_qlora_oasst1_e3
qwen_7b_chat_qlora_open_platypus_e3
qwen_7b_chat_qlora_openorca_e1
qwen_7b_chat_qlora_sql_e3
qwen_7b_chat_qlora_tiny_codes_e1
qwen_7b_qlora_alpaca_e3
qwen_7b_qlora_alpaca_enzh_e3
qwen_7b_qlora_alpaca_enzh_oasst1_e3
qwen_7b_qlora_alpaca_zh_e3
qwen_7b_qlora_arxiv_gentitle_e3
qwen_7b_qlora_code_alpaca_e3
qwen_7b_qlora_colorist_e5
qwen_7b_qlora_lawyer_e3
qwen_7b_qlora_medical_e1
qwen_7b_qlora_moss_sft_all_e1
qwen_7b_qlora_moss_sft_all_e2_gpu8
qwen_7b_qlora_moss_sft_plugins_e1
qwen_7b_qlora_oasst1_512_e3
qwen_7b_qlora_oasst1_e3
qwen_7b_qlora_open_platypus_e3
qwen_7b_qlora_openorca_e1
qwen_7b_qlora_sql_e3
qwen_7b_qlora_tiny_codes_e1
starcoder_qlora_stack_exchange_example
yi_34b_qlora_alpaca_enzh_e3
yi_6b_qlora_alpaca_enzh_e3
zephyr_7b_beta_qlora_alpaca_e3
=============================================================
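The list is long; to narrow it down by pattern (assuming this XTuner version supports the -p flag):

!xtuner list-cfg -p internlm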
%mkdir /root/personal_assistant/config
%cd /root/personal_assistant/config
mkdir: cannot create directory ‘/root/personal_assistant/config’: File exists
/root/personal_assistant/config
!xtuner copy-cfg internlm_chat_7b_qlora_oasst1_e3 .
[2024-01-12 17:25:59,992] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-12 17:26:49,263] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Copy to ./internlm_chat_7b_qlora_oasst1_e3_copy.py
Point the copied config at the local model and dataset, then adjust the training hyperparameters:

!sed -i "s#pretrained_model_name_or_path = 'internlm/internlm-chat-7b'#pretrained_model_name_or_path = '/root/personal_assistant/model/Shanghai_AI_Laboratory/internlm-chat-7b'#g" /root/personal_assistant/config/internlm_chat_7b_qlora_oasst1_e3_copy.py
!sed -i "s#data_path = 'timdettmers/openassistant-guanaco'#data_path = '/root/personal_assistant/data/personal_assistant.json'#g" /root/personal_assistant/config/internlm_chat_7b_qlora_oasst1_e3_copy.py
!sed -i "s#max_length = 2048#max_length = 1024#g" /root/personal_assistant/config/internlm_chat_7b_qlora_oasst1_e3_copy.py
!sed -i "s#batch_size = 1#batch_size = 2#g" /root/personal_assistant/config/internlm_chat_7b_qlora_oasst1_e3_copy.py
!sed -i "s#evaluation_freq = 500#evaluation_freq = 90#g" /root/personal_assistant/config/internlm_chat_7b_qlora_oasst1_e3_copy.py
!sed -i "s#dataset=dict(type=load_dataset, path=data_path),#dataset=dict(type=load_dataset, path='json', data_files=dict(train=data_path)),#g" /root/personal_assistant/config/internlm_chat_7b_qlora_oasst1_e3_copy.py
!sed -i "s#dataset_map_fn=oasst1_map_fn,#dataset_map_fn=None,#g" /root/personal_assistant/config/internlm_chat_7b_qlora_oasst1_e3_copy.py
!sed -i "s#'请给我介绍五个上海的景点', 'Please tell me five scenic spots in Shanghai'#'请介绍一下你自己', '请做一下自我介绍'#g" /root/personal_assistant/config/internlm_chat_7b_qlora_oasst1_e3_copy.py
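Before training, an optional check, sketched under the assumption that these keys appear as top-level assignments in the config: confirm the substitutions above actually landed.

import re

cfg = open('/root/personal_assistant/config/internlm_chat_7b_qlora_oasst1_e3_copy.py', encoding='utf-8').read()
for key in ('pretrained_model_name_or_path', 'data_path', 'max_length',
            'batch_size', 'evaluation_freq'):
    m = re.search(rf'^{key} = .*$', cfg, flags=re.M)  # match the top-level assignment
    print(m.group(0) if m else f'{key}: not found')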
Model Fine-tuning

Launch QLoRA fine-tuning with DeepSpeed ZeRO-2:

!xtuner train /root/personal_assistant/config/internlm_chat_7b_qlora_oasst1_e3_copy.py --deepspeed deepspeed_zero2
[2024-01-12 17:49:37,839] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-12 17:50:11,023] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
01/12 17:50:31 - mmengine - INFO -
------------------------------------------------------------
System environment:
sys.platform: linux
Python: 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]
CUDA available: True
numpy_random_seed: 1825194956
GPU 0: NVIDIA A100-SXM4-80GB
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.7, V11.7.99
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
PyTorch: 2.1.2+cu121
PyTorch compiling details: PyTorch built with:
- GCC 9.3
- C++ Version: 201703
- Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v3.1.1 (Git Hash 64f6bcbcbab628e96f33a62c3e975f8535a7bde4)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX512
- CUDA Runtime 12.1
- NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
- CuDNN 8.9.2
- Magma 2.6.1
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=8.9.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-unused-private-field -Wno-aligned-allocation-unavailable -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.1.2, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
OpenCV: 4.9.0
MMEngine: 0.10.2
Runtime environment:
launcher: none
randomness: {'seed': None, 'deterministic': False}
cudnn_benchmark: False
mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
dist_cfg: {'backend': 'nccl'}
seed: None
deterministic: False
Distributed launcher: none
Distributed training: False
GPU number: 1
------------------------------------------------------------
01/12 17:50:31 - mmengine - INFO - Config:
SYSTEM = ''
accumulative_counts = 16
batch_size = 2
betas = (
0.9,
0.999,
)
custom_hooks = [
dict(
tokenizer=dict(
padding_side='right',
pretrained_model_name_or_path=
'/root/personal_assistant/model/Shanghai_AI_Laboratory/internlm-chat-7b',
trust_remote_code=True,
type='transformers.AutoTokenizer.from_pretrained'),
type='xtuner.engine.DatasetInfoHook'),
dict(
evaluation_inputs=[
'请介绍一下你自己',
'请做一下自我介绍',
],
every_n_iters=90,
prompt_template='xtuner.utils.PROMPT_TEMPLATE.internlm_chat',
system='',
tokenizer=dict(
padding_side='right',
pretrained_model_name_or_path=
'/root/personal_assistant/model/Shanghai_AI_Laboratory/internlm-chat-7b',
trust_remote_code=True,
type='transformers.AutoTokenizer.from_pretrained'),
type='xtuner.engine.EvaluateChatHook'),
]
data_path = '/root/personal_assistant/data/personal_assistant.json'
dataloader_num_workers = 0
default_hooks = dict(
checkpoint=dict(interval=1, type='mmengine.hooks.CheckpointHook'),
logger=dict(interval=10, type='mmengine.hooks.LoggerHook'),
param_scheduler=dict(type='mmengine.hooks.ParamSchedulerHook'),
sampler_seed=dict(type='mmengine.hooks.DistSamplerSeedHook'),
timer=dict(type='mmengine.hooks.IterTimerHook'))
env_cfg = dict(
cudnn_benchmark=False,
dist_cfg=dict(backend='nccl'),
mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0))
evaluation_freq = 90
evaluation_inputs = [
'请介绍一下你自己',
'请做一下自我介绍',
]
launcher = 'none'
load_from = None
log_level = 'INFO'
lr = 0.0002
max_epochs = 3
max_length = 1024
max_norm = 1
model = dict(
llm=dict(
pretrained_model_name_or_path=
'/root/personal_assistant/model/Shanghai_AI_Laboratory/internlm-chat-7b',
quantization_config=dict(
bnb_4bit_compute_dtype='torch.float16',
bnb_4bit_quant_type='nf4',
bnb_4bit_use_double_quant=True,
llm_int8_has_fp16_weight=False,
llm_int8_threshold=6.0,
load_in_4bit=True,
load_in_8bit=False,
type='transformers.BitsAndBytesConfig'),
torch_dtype='torch.float16',
trust_remote_code=True,
type='transformers.AutoModelForCausalLM.from_pretrained'),
lora=dict(
bias='none',
lora_alpha=16,
lora_dropout=0.1,
r=64,
task_type='CAUSAL_LM',
type='peft.LoraConfig'),
type='xtuner.model.SupervisedFinetune')
optim_type = 'bitsandbytes.optim.PagedAdamW32bit'
optim_wrapper = dict(
optimizer=dict(
betas=(
0.9,
0.999,
),
lr=0.0002,
type='bitsandbytes.optim.PagedAdamW32bit',
weight_decay=0),
type='DeepSpeedOptimWrapper')
pack_to_max_length = True
param_scheduler = dict(
T_max=3,
by_epoch=True,
convert_to_iter_based=True,
eta_min=0.0,
type='mmengine.optim.CosineAnnealingLR')
pretrained_model_name_or_path = '/root/personal_assistant/model/Shanghai_AI_Laboratory/internlm-chat-7b'
prompt_template = 'xtuner.utils.PROMPT_TEMPLATE.internlm_chat'
randomness = dict(deterministic=False, seed=None)
resume = False
runner_type = 'FlexibleRunner'
strategy = dict(
config=dict(
bf16=dict(enabled=True),
fp16=dict(enabled=False, initial_scale_power=16),
gradient_accumulation_steps='auto',
gradient_clipping='auto',
train_micro_batch_size_per_gpu='auto',
zero_allow_untested_optimizer=True,
zero_force_ds_cpu_optimizer=False,
zero_optimization=dict(overlap_comm=True, stage=2)),
exclude_frozen_parameters=True,
gradient_accumulation_steps=16,
gradient_clipping=1,
train_micro_batch_size_per_gpu=2,
type='DeepSpeedStrategy')
tokenizer = dict(
padding_side='right',
pretrained_model_name_or_path=
'/root/personal_assistant/model/Shanghai_AI_Laboratory/internlm-chat-7b',
trust_remote_code=True,
type='transformers.AutoTokenizer.from_pretrained')
train_cfg = dict(by_epoch=True, max_epochs=3, val_interval=1)
train_dataloader = dict(
batch_size=2,
collate_fn=dict(type='xtuner.dataset.collate_fns.default_collate_fn'),
dataset=dict(
dataset=dict(
data_files=dict(
train='/root/personal_assistant/data/personal_assistant.json'),
path='json',
type='datasets.load_dataset'),
dataset_map_fn=None,
max_length=1024,
pack_to_max_length=True,
remove_unused_columns=True,
shuffle_before_pack=True,
template_map_fn=dict(
template='xtuner.utils.PROMPT_TEMPLATE.internlm_chat',
type='xtuner.dataset.map_fns.template_map_fn_factory'),
tokenizer=dict(
padding_side='right',
pretrained_model_name_or_path=
'/root/personal_assistant/model/Shanghai_AI_Laboratory/internlm-chat-7b',
trust_remote_code=True,
type='transformers.AutoTokenizer.from_pretrained'),
type='xtuner.dataset.process_hf_dataset'),
num_workers=0,
sampler=dict(shuffle=True, type='mmengine.dataset.DefaultSampler'))
train_dataset = dict(
dataset=dict(
data_files=dict(
train='/root/personal_assistant/data/personal_assistant.json'),
path='json',
type='datasets.load_dataset'),
dataset_map_fn=None,
max_length=1024,
pack_to_max_length=True,
remove_unused_columns=True,
shuffle_before_pack=True,
template_map_fn=dict(
template='xtuner.utils.PROMPT_TEMPLATE.internlm_chat',
type='xtuner.dataset.map_fns.template_map_fn_factory'),
tokenizer=dict(
padding_side='right',
pretrained_model_name_or_path=
'/root/personal_assistant/model/Shanghai_AI_Laboratory/internlm-chat-7b',
trust_remote_code=True,
type='transformers.AutoTokenizer.from_pretrained'),
type='xtuner.dataset.process_hf_dataset')
visualizer = None
weight_decay = 0
work_dir = './work_dirs/internlm_chat_7b_qlora_oasst1_e3_copy'
01/12 17:50:33 - mmengine - WARNING - Failed to search registry with scope "mmengine" in the "builder" registry tree. As a workaround, the current "builder" registry in "xtuner" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmengine" is a correct scope, or whether the registry is initialized.
01/12 17:50:34 - mmengine - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH ) RuntimeInfoHook
(BELOW_NORMAL) LoggerHook
--------------------
before_train:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(NORMAL ) DatasetInfoHook
(NORMAL ) EvaluateChatHook
(VERY_LOW ) CheckpointHook
--------------------
before_train_epoch:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(NORMAL ) DistSamplerSeedHook
--------------------
before_train_iter:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
--------------------
after_train_iter:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(NORMAL ) EvaluateChatHook
(BELOW_NORMAL) LoggerHook
(LOW ) ParamSchedulerHook
(VERY_LOW ) CheckpointHook
--------------------
after_train_epoch:
(NORMAL ) IterTimerHook
(LOW ) ParamSchedulerHook
(VERY_LOW ) CheckpointHook
--------------------
before_val:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) DatasetInfoHook
--------------------
before_val_epoch:
(NORMAL ) IterTimerHook
--------------------
before_val_iter:
(NORMAL ) IterTimerHook
--------------------
after_val_iter:
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook
--------------------
after_val_epoch:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook
(LOW ) ParamSchedulerHook
(VERY_LOW ) CheckpointHook
--------------------
after_val:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) EvaluateChatHook
--------------------
after_train:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) EvaluateChatHook
(VERY_LOW ) CheckpointHook
--------------------
before_test:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) DatasetInfoHook
--------------------
before_test_epoch:
(NORMAL ) IterTimerHook
--------------------
before_test_iter:
(NORMAL ) IterTimerHook
--------------------
after_test_iter:
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook
--------------------
after_test_epoch:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook
--------------------
after_test:
(VERY_HIGH ) RuntimeInfoHook
--------------------
after_run:
(BELOW_NORMAL) LoggerHook
--------------------
Map: 100%|███████████████████████| 10001/10001 [00:02<00:00, 4240.28 examples/s]
Flattening the indices: 100%|███| 10001/10001 [00:00<00:00, 43885.71 examples/s]
Map: 100%|██████████████████████| 10001/10001 [00:00<00:00, 26844.27 examples/s]
01/12 17:50:41 - mmengine - WARNING - Dataset Dataset has no metainfo. ``dataset_meta`` in visualizer will be None.
quantization_config convert to <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>
Loading checkpoint shards: 100%|██████████████████| 8/8 [00:10<00:00, 1.37s/it]
01/12 17:50:54 - mmengine - INFO - dispatch internlm attn forward
01/12 17:50:54 - mmengine - WARNING - Due to the implementation of the PyTorch version of flash attention, even when the `output_attentions` flag is set to True, it is not possible to return the `attn_weights`.
[2024-01-12 17:51:03,047] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.12.6, git-hash=unknown, git-branch=unknown
[2024-01-12 17:51:03,047] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-01-12 17:51:03,048] [INFO] [comm.py:652:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2024-01-12 17:51:03,260] [INFO] [comm.py:702:mpi_discovery] Discovered MPI settings of world_rank=0, local_rank=0, world_size=1, master_addr=192.168.225.4, master_port=29500
[2024-01-12 17:51:03,260] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2024-01-12 17:51:04,756] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2024-01-12 17:51:04,760] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer
[2024-01-12 17:51:04,760] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer
[2024-01-12 17:51:04,838] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = PagedAdamW32bit
[2024-01-12 17:51:04,838] [INFO] [utils.py:56:is_zero_supported_optimizer] Checking ZeRO support for optimizer=PagedAdamW32bit type=<class 'bitsandbytes.optim.adamw.PagedAdamW32bit'>
[2024-01-12 17:51:04,838] [WARNING] [engine.py:1166:_do_optimizer_sanity_check] **** You are using ZeRO with an untested optimizer, proceed with caution *****
[2024-01-12 17:51:04,838] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.bfloat16 ZeRO stage 2 optimizer
[2024-01-12 17:51:04,838] [INFO] [stage_1_and_2.py:148:__init__] Reduce bucket size 500,000,000
[2024-01-12 17:51:04,838] [INFO] [stage_1_and_2.py:149:__init__] Allgather bucket size 500,000,000
[2024-01-12 17:51:04,838] [INFO] [stage_1_and_2.py:150:__init__] CPU Offload: False
[2024-01-12 17:51:04,838] [INFO] [stage_1_and_2.py:151:__init__] Round robin gradient partitioning: False
[2024-01-12 17:51:05,651] [INFO] [utils.py:791:see_memory_usage] Before initializing optimizer states
[2024-01-12 17:51:05,652] [INFO] [utils.py:792:see_memory_usage] MA 5.63 GB Max_MA 5.93 GB CA 6.32 GB Max_CA 6 GB
[2024-01-12 17:51:05,652] [INFO] [utils.py:799:see_memory_usage] CPU Virtual Memory: used = 68.27 GB, percent = 3.4%
[2024-01-12 17:51:05,884] [INFO] [utils.py:791:see_memory_usage] After initializing optimizer states
[2024-01-12 17:51:05,885] [INFO] [utils.py:792:see_memory_usage] MA 5.63 GB Max_MA 6.23 GB CA 6.92 GB Max_CA 7 GB
[2024-01-12 17:51:05,885] [INFO] [utils.py:799:see_memory_usage] CPU Virtual Memory: used = 68.28 GB, percent = 3.4%
[2024-01-12 17:51:05,885] [INFO] [stage_1_and_2.py:516:__init__] optimizer state initialized
[2024-01-12 17:51:06,010] [INFO] [utils.py:791:see_memory_usage] After initializing ZeRO optimizer
[2024-01-12 17:51:06,010] [INFO] [utils.py:792:see_memory_usage] MA 5.63 GB Max_MA 5.63 GB CA 6.92 GB Max_CA 7 GB
[2024-01-12 17:51:06,010] [INFO] [utils.py:799:see_memory_usage] CPU Virtual Memory: used = 68.28 GB, percent = 3.4%
[2024-01-12 17:51:06,021] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = PagedAdamW32bit
[2024-01-12 17:51:06,021] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler
[2024-01-12 17:51:06,021] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = None
[2024-01-12 17:51:06,021] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0002], mom=[(0.9, 0.999)]
[2024-01-12 17:51:06,025] [INFO] [config.py:984:print] DeepSpeedEngine configuration:
[2024-01-12 17:51:06,025] [INFO] [config.py:988:print] activation_checkpointing_config {
"partition_activations": false,
"contiguous_memory_optimization": false,
"cpu_checkpointing": false,
"number_checkpoints": null,
"synchronize_checkpoint_boundary": false,
"profile": false
}
[2024-01-12 17:51:06,025] [INFO] [config.py:988:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2024-01-12 17:51:06,026] [INFO] [config.py:988:print] amp_enabled .................. False
[2024-01-12 17:51:06,026] [INFO] [config.py:988:print] amp_params ................... False
[2024-01-12 17:51:06,026] [INFO] [config.py:988:print] autotuning_config ............ {
"enabled": false,
"start_step": null,
"end_step": null,
"metric_path": null,
"arg_mappings": null,
"metric": "throughput",
"model_info": null,
"results_dir": "autotuning_results",
"exps_dir": "autotuning_exps",
"overwrite": true,
"fast": true,
"start_profile_step": 3,
"end_profile_step": 5,
"tuner_type": "gridsearch",
"tuner_early_stopping": 5,
"tuner_num_trials": 50,
"model_info_path": null,
"mp_size": 1,
"max_train_batch_size": null,
"min_train_batch_size": 1,
"max_train_micro_batch_size_per_gpu": 1.024000e+03,
"min_train_micro_batch_size_per_gpu": 1,
"num_tuning_micro_batch_sizes": 3
}
[2024-01-12 17:51:06,026] [INFO] [config.py:988:print] bfloat16_enabled ............. True
[2024-01-12 17:51:06,026] [INFO] [config.py:988:print] checkpoint_parallel_write_pipeline False
[2024-01-12 17:51:06,026] [INFO] [config.py:988:print] checkpoint_tag_validation_enabled True
[2024-01-12 17:51:06,026] [INFO] [config.py:988:print] checkpoint_tag_validation_fail False
[2024-01-12 17:51:06,026] [INFO] [config.py:988:print] comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7f61c4ca9000>
[2024-01-12 17:51:06,026] [INFO] [config.py:988:print] communication_data_type ...... None
[2024-01-12 17:51:06,026] [INFO] [config.py:988:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2024-01-12 17:51:06,026] [INFO] [config.py:988:print] curriculum_enabled_legacy .... False
[2024-01-12 17:51:06,026] [INFO] [config.py:988:print] curriculum_params_legacy ..... False
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] data_efficiency_enabled ...... False
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] dataloader_drop_last ......... False
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] disable_allgather ............ False
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] dump_state ................... False
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] dynamic_loss_scale_args ...... None
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] eigenvalue_enabled ........... False
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] eigenvalue_gas_boundary_resolution 1
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] eigenvalue_layer_name ........ bert.encoder.layer
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] eigenvalue_layer_num ......... 0
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] eigenvalue_max_iter .......... 100
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] eigenvalue_stability ......... 1e-06
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] eigenvalue_tol ............... 0.01
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] eigenvalue_verbose ........... False
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] elasticity_enabled ........... False
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] flops_profiler_config ........ {
"enabled": false,
"recompute_fwd_factor": 0.0,
"profile_step": 1,
"module_depth": -1,
"top_modules": 1,
"detailed": true,
"output_file": null
}
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] fp16_auto_cast ............... None
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] fp16_enabled ................. False
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] fp16_master_weights_and_gradients False
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] global_rank .................. 0
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] grad_accum_dtype ............. None
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] gradient_accumulation_steps .. 16
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] gradient_clipping ............ 1
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] gradient_predivide_factor .... 1.0
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] graph_harvesting ............. False
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] initial_dynamic_scale ........ 1
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] load_universal_checkpoint .... False
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] loss_scale ................... 1.0
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] memory_breakdown ............. False
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] mics_hierarchial_params_gather False
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] mics_shard_size .............. -1
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] nebula_config ................ {
"enabled": false,
"persistent_storage_path": null,
"persistent_time_interval": 100,
"num_of_version_in_retention": 2,
"enable_nebula_load": true,
"load_path": null
}
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] optimizer_legacy_fusion ...... False
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] optimizer_name ............... None
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] optimizer_params ............. None
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0, 'pipe_partitioned': True, 'grad_partitioned': True}
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] pld_enabled .................. False
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] pld_params ................... False
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] prescale_gradients ........... False
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] scheduler_name ............... None
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] scheduler_params ............. None
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] seq_parallel_communication_data_type torch.float32
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] sparse_attention ............. None
[2024-01-12 17:51:06,027] [INFO] [config.py:988:print] sparse_gradients_enabled ..... False
[2024-01-12 17:51:06,028] [INFO] [config.py:988:print] steps_per_print .............. 10000000000000
[2024-01-12 17:51:06,028] [INFO] [config.py:988:print] train_batch_size ............. 32
[2024-01-12 17:51:06,028] [INFO] [config.py:988:print] train_micro_batch_size_per_gpu 2
[2024-01-12 17:51:06,028] [INFO] [config.py:988:print] use_data_before_expert_parallel_ False
[2024-01-12 17:51:06,028] [INFO] [config.py:988:print] use_node_local_storage ....... False
[2024-01-12 17:51:06,028] [INFO] [config.py:988:print] wall_clock_breakdown ......... False
[2024-01-12 17:51:06,028] [INFO] [config.py:988:print] weight_quantization_config ... None
[2024-01-12 17:51:06,028] [INFO] [config.py:988:print] world_size ................... 1
[2024-01-12 17:51:06,028] [INFO] [config.py:988:print] zero_allow_untested_optimizer True
[2024-01-12 17:51:06,028] [INFO] [config.py:988:print] zero_config .................. stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 use_multi_rank_bucket_allreduce=True allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True pipeline_loading_checkpoint=False override_module_apply=True
[2024-01-12 17:51:06,028] [INFO] [config.py:988:print] zero_enabled ................. True
[2024-01-12 17:51:06,028] [INFO] [config.py:988:print] zero_force_ds_cpu_optimizer .. False
[2024-01-12 17:51:06,028] [INFO] [config.py:988:print] zero_optimization_stage ...... 2
[2024-01-12 17:51:06,028] [INFO] [config.py:974:print_user_config] json = {
"gradient_accumulation_steps": 16,
"train_micro_batch_size_per_gpu": 2,
"gradient_clipping": 1,
"zero_allow_untested_optimizer": true,
"zero_force_ds_cpu_optimizer": false,
"zero_optimization": {
"stage": 2,
"overlap_comm": true
},
"fp16": {
"enabled": false,
"initial_scale_power": 16
},
"bf16": {
"enabled": true
},
"steps_per_print": 1.000000e+13
}
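As a sanity check on the user config above: DeepSpeed derives the effective batch size as micro-batch size × gradient-accumulation steps × world size. A minimal sketch (not part of the training run) reproducing the numbers printed in the log:

# Effective batch size, as DeepSpeed computes it from the config above.
micro_batch_per_gpu = 2   # train_micro_batch_size_per_gpu
grad_accum_steps = 16     # gradient_accumulation_steps
world_size = 1            # single-GPU run
print(micro_batch_per_gpu * grad_accum_steps * world_size)  # 32 == train_batch_size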
01/12 17:51:06 - mmengine - INFO - Num train samples 401
01/12 17:51:06 - mmengine - INFO - train example:
01/12 17:51:06 - mmengine - INFO - <s> <|User|>:请做一下自我介绍<eoh>
<|Bot|>:我是xujinzh的小助手,内在是上海AI实验室书生·浦语的7B大模型哦</s><s> <|User|>:请做一下自我介绍<eoh>
<|Bot|>:我是xujinzh的小助手,内在是上海AI实验室书生·浦语的7B大模型哦</s><s> <|User|>:请做一下自我介绍<eoh>
<|Bot|>:我是xujinzh的小助手,内在是上海AI实验室书生·浦语的7B大模型哦</s><s> <|User|>:请做一下自我介绍<eoh>
<|Bot|>:我是xujinzh的小助手,内在是上海AI实验室书生·浦语的7B大模型哦</s><s> <|User|>:请做一下自我介绍<eoh>
<|Bot|>:我是xujinzh的小助手,内在是上海AI实验室书生·浦语的7B大模型哦</s><s> <|User|>:请做一下自我介绍<eoh>
<|Bot|>:我是xujinzh的小助手,内在是上海AI实验室书生·浦语的7B大模型哦</s><s> <|User|>:请做一下自我介绍<eoh>
<|Bot|>:我是xujinzh的小助手,内在是上海AI实验室书生·浦语的7B大模型哦</s><s> <|User|>:请做一下自我介绍<eoh>
<|Bot|>:我是xujinzh的小助手,内在是上海AI实验室书生·浦语的7B大模型哦</s><s> <|User|>:请做一下自我介绍<eoh>
<|Bot|>:我是xujinzh的小助手,内在是上海AI实验室书生·浦语的7B大模型哦</s><s> <|User|>:请做一下自我介绍<eoh>
<|Bot|>:我是xujinzh的小助手,内在是上海AI实验室书生·浦语的7B大模型哦</s><s> <|User|>:请做一下自我介绍<eoh>
<|Bot|>:我是xujinzh的小助手,内在是上海AI实验室书生·浦语的7B大模型哦</s><s> <|User|>:请做一下自我介绍<eoh>
<|Bot|>:我是xujinzh的小助手,内在是上海AI实验室书生·浦语的7B大模型哦</s><s> <|User|>:请做一下自我介绍<eoh>
<|Bot|>:我是xujinzh的小助手,内在是上海AI实验室书生·浦语的7B大模型哦</s><s> <|User|>:请做一下自我介绍<eoh>
<|Bot|>:我是xujinzh的小助手,内在是上海AI实验室书生·浦语的7B大模型哦</s><s> <|User|>:请做一下自我介绍<eoh>
<|Bot|>:我是xujinzh的小助手,内在是上海AI实验室书生·浦语的7B大模型哦</s><s> <|User|>:请做一下自我介绍<eoh>
<|Bot|>:我是xujinzh的小助手,内在是上海AI实验室书生·浦语的7B大模型哦</s><s> <|User|>:请做一下自我介绍<eoh>
<|Bot|>:我是xujinzh的小助手,内在是上海AI实验室书生·浦语的7B大模型哦</s><s> <|User|>:请做一下自我介绍<eoh>
<|Bot|>:我是xujinzh的小助手,内在是上海AI实验室书生·浦语的7B大模型哦</s><s> <|User|>:请做一下自我介绍<eoh>
<|Bot|>:我是xujinzh的小助手,内在是上海AI实验室书生·浦语的7B大模型哦</s><s> <|User|>:请做一下自我介绍<eoh>
<|Bot|>:我是xujinzh的小助手,内在是上海AI实验室书生·浦语的7B大模型哦</s><s> <|User|>:请做一下自我介绍<eoh>
<|Bot|>:我是xujinzh的小助手,内在是上海AI实验室书生·浦语的7B大模型哦</s><s> <|User|>:请做一下自我介绍<eoh>
<|Bot|>:我是xujinzh的小助手,内在是上海AI实验室书生·浦语的7B大模型哦</s><s> <|User|>:请做一下自我介绍<eoh>
<|Bot|>:我是xujinzh的小助手,内在是上海AI实验室书生·浦语的7B大模型哦</s><s> <|User|>:请做一下自我介绍<eoh>
<|Bot|>:我是xujinzh的小助手,内在是上海AI实验室书生·浦语的7B大模型哦</s><s><|User|>:请做一下自我介绍<eoh>
<|Bot|>:我是xujinzh的小助手,内在是上海AI实验室书生·浦语的7B大模型哦
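Two things are visible in the training example above: XTuner packs repeated copies of the short identity sample into one max-length sequence, and each turn follows the InternLM-Chat dialogue template. The sketch below rebuilds a single packed turn; build_internlm_turn is an illustrative helper, not an XTuner API:

def build_internlm_turn(query: str, answer: str) -> str:
    # Template as it appears in the log: <s> <|User|>:{query}<eoh>\n<|Bot|>:{answer}</s>
    return f"<s> <|User|>:{query}<eoh>\n<|Bot|>:{answer}</s>"

turn = build_internlm_turn("请做一下自我介绍",
                           "我是xujinzh的小助手,内在是上海AI实验室书生·浦语的7B大模型哦")
print(turn)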
01/12 17:51:06 - mmengine - INFO - before_train in EvaluateChatHook.
01/12 17:51:12 - mmengine - INFO - Sample output:
<s><|User|>:请介绍一下你自己<eoh>
<|Bot|>:你好,我叫书生·浦语。我是一名人工智能助手,致力于通过执行常见的基于语言的任务和提供建议来帮助人类。我使用了深度学习技术和语言模型进行开发,并经过多轮迭代优化,以不断提高我的性能和
01/12 17:51:16 - mmengine - INFO - Sample output:
<s> <|User|>:请做一下自我介绍<eoh>
<|Bot|>:非常抱歉,我是一名人工智能助手,我没有实际的身份或背景。但我可以回答您关于科技、知识、历史、文化等方面的问题,提供有用、适当的信息。如果您有任何问题或需要帮助,请随时问我。<eoa>
</s>
01/12 17:51:16 - mmengine - WARNING - "FileClient" will be deprecated in future. Please use io functions in https://mmengine.readthedocs.io/en/latest/api/fileio.html#file-io
01/12 17:51:16 - mmengine - WARNING - "HardDiskBackend" is the alias of "LocalBackend" and the former will be deprecated in future.
01/12 17:51:16 - mmengine - INFO - Checkpoints will be saved to /root/personal_assistant/config/work_dirs/internlm_chat_7b_qlora_oasst1_e3_copy.
/root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
warnings.warn(
01/12 17:51:30 - mmengine - INFO - Epoch(train) [1][ 10/201] lr: 1.9989e-04 eta: 0:13:21 time: 1.3515 data_time: 0.0034 memory: 9802 loss: 1.0719
/root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages/deepspeed/runtime/zero/stage_1_and_2.py:1652: UserWarning: The torch.cuda.*DtypeTensor constructors are no longer recommended. It's best to use methods such as torch.tensor(data, dtype=*, device='cuda') to create tensors. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:83.)
total_norm_cuda = get_accelerator().FloatTensor([float(total_norm)])
01/12 17:51:42 - mmengine - INFO - Epoch(train) [1][ 20/201] lr: 1.9951e-04 eta: 0:12:37 time: 1.2484 data_time: 0.0062 memory: 9802 loss: 1.0324
01/12 17:51:55 - mmengine - INFO - Epoch(train) [1][ 30/201] lr: 1.9886e-04 eta: 0:12:15 time: 1.2508 data_time: 0.0048 memory: 9802 loss: 0.9617
01/12 17:52:07 - mmengine - INFO - Epoch(train) [1][ 40/201] lr: 1.9794e-04 eta: 0:11:59 time: 1.2611 data_time: 0.0057 memory: 9802 loss: 0.8898
01/12 17:52:20 - mmengine - INFO - Epoch(train) [1][ 50/201] lr: 1.9676e-04 eta: 0:11:43 time: 1.2518 data_time: 0.0040 memory: 9802 loss: 0.8426
01/12 17:52:32 - mmengine - INFO - Epoch(train) [1][ 60/201] lr: 1.9531e-04 eta: 0:11:29 time: 1.2544 data_time: 0.0055 memory: 9802 loss: 0.7168
01/12 17:52:45 - mmengine - INFO - Epoch(train) [1][ 70/201] lr: 1.9361e-04 eta: 0:11:15 time: 1.2550 data_time: 0.0048 memory: 9802 loss: 0.6320
01/12 17:52:57 - mmengine - INFO - Epoch(train) [1][ 80/201] lr: 1.9165e-04 eta: 0:11:02 time: 1.2628 data_time: 0.0070 memory: 9802 loss: 0.5898
01/12 17:53:10 - mmengine - INFO - after_train_iter in EvaluateChatHook.
01/12 17:53:18 - mmengine - INFO - Sample output:
<s> <|User|>:请介绍一下你自己<eoh>
<|Bot|>:作为一个语言模型AI,我没有身体和情感,但我可以通过执行常见的基于语言的任务和提供建议来帮助人类。我是通过深度学习技术和语言模型来构建的,我可以回答问题、提供定义和解释、将文本从一种语言翻译成另一种语言、总结文本、生成文本、编写故事、分析情感、提供推荐、开发算法、编写代码以及其他任何基于语言的任务。我致力于通过执行这些任务和提供建议来帮助人类。<eoa>
</s>
01/12 17:53:30 - mmengine - INFO - Sample output:
<s> <|User|>:请做一下自我介绍<eoh>
<|Bot|>:你好,我是你的人工智能助手,我的名字叫书生·浦语。我是在上海人工智能实验室的开发者们精心打造下诞生的。我能够使用汉语和英语进行交流,我的设计理念是有用、诚实并且无害。我使用了Transformer模型和深度学习技术,并进行了自监督预训练和指令微调。我能够执行常见的基于语言的任务和提供建议,但我不能执行需要身体能力的任务。我能够回答你的问题、提供定义和解释、将文本从一种语言翻译成另一种语言、总结文本、生成文本、编写故事、分析情感、提供推荐、开发算法、编写代码以及其他任何基于语言的任务。如果你有任何问题或者需要我的帮助,请随时告诉我。<eoa>
</s>
01/12 17:53:30 - mmengine - INFO - Epoch(train) [1][ 90/201] lr: 1.8944e-04 eta: 0:10:49 time: 1.2666 data_time: 0.0058 memory: 9802 loss: 0.3971
01/12 17:53:46 - mmengine - INFO - Epoch(train) [1][100/201] lr: 1.8699e-04 eta: 0:12:34 time: 3.5883 data_time: 2.0490 memory: 9802 loss: 0.3476
01/12 17:54:02 - mmengine - INFO - Epoch(train) [1][110/201] lr: 1.8430e-04 eta: 0:12:24 time: 1.6236 data_time: 0.0064 memory: 9802 loss: 0.2596
01/12 17:54:19 - mmengine - INFO - Epoch(train) [1][120/201] lr: 1.8139e-04 eta: 0:12:15 time: 1.6667 data_time: 0.0051 memory: 9802 loss: 0.2246
01/12 17:54:36 - mmengine - INFO - Epoch(train) [1][130/201] lr: 1.7825e-04 eta: 0:12:07 time: 1.7255 data_time: 0.0056 memory: 9802 loss: 0.2091
01/12 17:54:53 - mmengine - INFO - Epoch(train) [1][140/201] lr: 1.7490e-04 eta: 0:11:59 time: 1.7399 data_time: 0.0050 memory: 9802 loss: 0.1938
01/12 17:55:11 - mmengine - INFO - Epoch(train) [1][150/201] lr: 1.7135e-04 eta: 0:11:50 time: 1.7716 data_time: 0.0053 memory: 9802 loss: 0.1826
01/12 17:55:29 - mmengine - INFO - Epoch(train) [1][160/201] lr: 1.6761e-04 eta: 0:11:40 time: 1.8001 data_time: 0.0048 memory: 9802 loss: 0.1713
01/12 17:55:47 - mmengine - INFO - Epoch(train) [1][170/201] lr: 1.6368e-04 eta: 0:11:31 time: 1.8157 data_time: 0.0071 memory: 9802 loss: 0.1561
01/12 17:56:05 - mmengine - INFO - after_train_iter in EvaluateChatHook.
01/12 17:56:17 - mmengine - INFO - Sample output:
<s> <|User|>:请介绍一下你自己<eoh>
<|Bot|>:大家好,我是一个名叫书生·浦语的AI助手,出生于上海人工智能实验室。我的创造者是中国科学院计算技术研究所的科学家,他们为我赋予了强大的语言处理能力,使我能够回答问题、提供定义和解释、将文本从一种语言翻译成另一种语言、总结文本、生成文本、编写故事、分析情感、提供推荐、开发算法、编写代码以及其他任何基于语言的任务。我的名字“书生”寓意着我是一个热爱读书、热爱学习、热爱语言的人,而“浦语”则是我来自上海的人工智能实验室的代号。我希望能够通过自己的能力,帮助人类更好地理解和使用语言,让人们的生活更加便利和高效。</s>
01/12 17:56:27 - mmengine - INFO - Sample output:
<s> <|User|>:请做一下自我介绍<eoh>
<|Bot|>:我叫书生·浦语。我是一个由上海人工智能实验室开发的人工智能助手,拥有跨领域的知识和能力。我的设计理念是有用、诚实并且无害。我可以使用汉语和英语进行交流。我能够回答问题、提供定义和解释、将文本从一种语言翻译成另一种语言、总结文本、生成文本、编写故事、分析情感、提供推荐、开发算法、编写代码以及其他任何基于语言的任务。但我不能看、听、尝、触摸、闻、移动、与物理世界交互、感受情感或体验感官输入、执行需要身体能力的任务。希望我可以为您提供帮助。<eoa>
</s>
01/12 17:56:27 - mmengine - INFO - Epoch(train) [1][180/201] lr: 1.5958e-04 eta: 0:11:20 time: 1.8119 data_time: 0.0055 memory: 9802 loss: 0.1491
01/12 17:56:48 - mmengine - INFO - Epoch(train) [1][190/201] lr: 1.5531e-04 eta: 0:12:02 time: 4.2737 data_time: 2.1981 memory: 9802 loss: 0.1334
01/12 17:57:09 - mmengine - INFO - Epoch(train) [1][200/201] lr: 1.5090e-04 eta: 0:11:50 time: 2.0426 data_time: 0.0048 memory: 9802 loss: 0.1236
01/12 17:57:10 - mmengine - INFO - Exp name: internlm_chat_7b_qlora_oasst1_e3_copy_20240112_175030
01/12 17:57:10 - mmengine - INFO - Saving checkpoint at 1 epochs
[2024-01-12 17:57:11,159] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint epoch_1.pth is about to be saved!
[2024-01-12 17:57:11,211] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: /root/personal_assistant/config/work_dirs/internlm_chat_7b_qlora_oasst1_e3_copy/epoch_1.pth/mp_rank_00_model_states.pt
[2024-01-12 17:57:11,211] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /root/personal_assistant/config/work_dirs/internlm_chat_7b_qlora_oasst1_e3_copy/epoch_1.pth/mp_rank_00_model_states.pt...
[2024-01-12 17:57:11,483] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /root/personal_assistant/config/work_dirs/internlm_chat_7b_qlora_oasst1_e3_copy/epoch_1.pth/mp_rank_00_model_states.pt.
[2024-01-12 17:57:11,487] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /root/personal_assistant/config/work_dirs/internlm_chat_7b_qlora_oasst1_e3_copy/epoch_1.pth/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2024-01-12 17:57:12,877] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /root/personal_assistant/config/work_dirs/internlm_chat_7b_qlora_oasst1_e3_copy/epoch_1.pth/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2024-01-12 17:57:12,901] [INFO] [engine.py:3431:_save_zero_checkpoint] zero checkpoint saved /root/personal_assistant/config/work_dirs/internlm_chat_7b_qlora_oasst1_e3_copy/epoch_1.pth/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
[2024-01-12 17:57:12,901] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint epoch_1.pth is ready now!
/root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
warnings.warn(
01/12 17:57:32 - mmengine - INFO - Epoch(train) [2][ 10/201] lr: 1.4589e-04 eta: 0:11:34 time: 1.9996 data_time: 0.0036 memory: 9802 loss: 0.1225
01/12 17:57:52 - mmengine - INFO - Epoch(train) [2][ 20/201] lr: 1.4120e-04 eta: 0:11:20 time: 1.9642 data_time: 0.0045 memory: 9802 loss: 0.1057
01/12 17:58:11 - mmengine - INFO - Epoch(train) [2][ 30/201] lr: 1.3640e-04 eta: 0:11:05 time: 1.9414 data_time: 0.0047 memory: 9802 loss: 0.0969
01/12 17:58:30 - mmengine - INFO - Epoch(train) [2][ 40/201] lr: 1.3150e-04 eta: 0:10:49 time: 1.9039 data_time: 0.0046 memory: 9802 loss: 0.0882
01/12 17:58:49 - mmengine - INFO - Epoch(train) [2][ 50/201] lr: 1.2651e-04 eta: 0:10:32 time: 1.8816 data_time: 0.0045 memory: 9802 loss: 0.0570
01/12 17:59:08 - mmengine - INFO - Epoch(train) [2][ 60/201] lr: 1.2145e-04 eta: 0:10:15 time: 1.8703 data_time: 0.0042 memory: 9802 loss: 0.0609
01/12 17:59:27 - mmengine - INFO - Epoch(train) [2][ 70/201] lr: 1.1634e-04 eta: 0:09:57 time: 1.8510 data_time: 0.0044 memory: 9802 loss: 0.0484
01/12 17:59:45 - mmengine - INFO - Epoch(train) [2][ 80/201] lr: 1.1118e-04 eta: 0:09:40 time: 1.8384 data_time: 0.0052 memory: 9802 loss: 0.0523
01/12 18:00:05 - mmengine - INFO - after_train_iter in EvaluateChatHook.
01/12 18:00:06 - mmengine - INFO - Sample output:
<s> <|User|>:请介绍一下你自己<eoh>
<|Bot|>:我是上海人工智能实验室的AI助手哦</s>
01/12 18:00:08 - mmengine - INFO - Sample output:
<s> <|User|>:请做一下自我介绍<eoh>
<|Bot|>:我是xujinzh的小助手,内在是上海AI实验室书生·浦语的7B大模型哦</s>
01/12 18:00:08 - mmengine - INFO - Epoch(train) [2][ 90/201] lr: 1.0599e-04 eta: 0:09:24 time: 2.0104 data_time: 0.0048 memory: 9802 loss: 0.0527
01/12 18:00:26 - mmengine - INFO - Epoch(train) [2][100/201] lr: 1.0078e-04 eta: 0:09:09 time: 2.1154 data_time: 0.2902 memory: 9802 loss: 0.0393
01/12 18:00:44 - mmengine - INFO - Epoch(train) [2][110/201] lr: 9.5573e-05 eta: 0:08:51 time: 1.8329 data_time: 0.0053 memory: 9802 loss: 0.0371
01/12 18:01:03 - mmengine - INFO - Epoch(train) [2][120/201] lr: 9.0377e-05 eta: 0:08:33 time: 1.8274 data_time: 0.0057 memory: 9802 loss: 0.0379
01/12 18:01:21 - mmengine - INFO - Epoch(train) [2][130/201] lr: 8.5206e-05 eta: 0:08:15 time: 1.8240 data_time: 0.0066 memory: 9802 loss: 0.0302
01/12 18:01:39 - mmengine - INFO - Epoch(train) [2][140/201] lr: 8.0076e-05 eta: 0:07:56 time: 1.8171 data_time: 0.0055 memory: 9802 loss: 0.0423
01/12 18:01:57 - mmengine - INFO - Epoch(train) [2][150/201] lr: 7.5000e-05 eta: 0:07:38 time: 1.8043 data_time: 0.0053 memory: 9802 loss: 0.0263
01/12 18:02:15 - mmengine - INFO - Epoch(train) [2][160/201] lr: 6.9992e-05 eta: 0:07:20 time: 1.8037 data_time: 0.0062 memory: 9802 loss: 0.0367
01/12 18:02:33 - mmengine - INFO - Epoch(train) [2][170/201] lr: 6.5065e-05 eta: 0:07:01 time: 1.7978 data_time: 0.0042 memory: 9802 loss: 0.0224
01/12 18:02:51 - mmengine - INFO - after_train_iter in EvaluateChatHook.
01/12 18:02:52 - mmengine - INFO - Sample output:
<s> <|User|>:请介绍一下你自己<eoh>
<|Bot|>:我是xujinzh的小助手哦</s>
01/12 18:02:54 - mmengine - INFO - Sample output:
<s> <|User|>:请做一下自我介绍<eoh>
<|Bot|>:我是xujinzh的小助手,内在是上海AI实验室书生·浦语的7B大模型哦</s>
01/12 18:02:54 - mmengine - INFO - Epoch(train) [2][180/201] lr: 6.0233e-05 eta: 0:06:43 time: 1.7936 data_time: 0.0064 memory: 9802 loss: 0.0248
01/12 18:03:12 - mmengine - INFO - Epoch(train) [2][190/201] lr: 5.5508e-05 eta: 0:06:27 time: 2.1232 data_time: 0.2975 memory: 9802 loss: 0.0264
01/12 18:03:31 - mmengine - INFO - Epoch(train) [2][200/201] lr: 5.0905e-05 eta: 0:06:08 time: 1.8283 data_time: 0.0069 memory: 9802 loss: 0.0199
01/12 18:03:32 - mmengine - INFO - Exp name: internlm_chat_7b_qlora_oasst1_e3_copy_20240112_175030
01/12 18:03:32 - mmengine - INFO - Saving checkpoint at 2 epochs
[2024-01-12 18:03:33,073] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint epoch_2.pth is about to be saved!
[2024-01-12 18:03:33,131] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: /root/personal_assistant/config/work_dirs/internlm_chat_7b_qlora_oasst1_e3_copy/epoch_2.pth/mp_rank_00_model_states.pt
[2024-01-12 18:03:33,131] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /root/personal_assistant/config/work_dirs/internlm_chat_7b_qlora_oasst1_e3_copy/epoch_2.pth/mp_rank_00_model_states.pt...
[2024-01-12 18:03:33,423] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /root/personal_assistant/config/work_dirs/internlm_chat_7b_qlora_oasst1_e3_copy/epoch_2.pth/mp_rank_00_model_states.pt.
[2024-01-12 18:03:33,432] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /root/personal_assistant/config/work_dirs/internlm_chat_7b_qlora_oasst1_e3_copy/epoch_2.pth/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2024-01-12 18:03:34,874] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /root/personal_assistant/config/work_dirs/internlm_chat_7b_qlora_oasst1_e3_copy/epoch_2.pth/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2024-01-12 18:03:34,913] [INFO] [engine.py:3431:_save_zero_checkpoint] zero checkpoint saved /root/personal_assistant/config/work_dirs/internlm_chat_7b_qlora_oasst1_e3_copy/epoch_2.pth/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
[2024-01-12 18:03:34,913] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint epoch_2.pth is ready now!
/root/.conda/envs/xtuner0.1.9/lib/python3.10/site-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
warnings.warn(
01/12 18:03:53 - mmengine - INFO - Epoch(train) [3][ 10/201] lr: 4.5996e-05 eta: 0:05:48 time: 1.8117 data_time: 0.0038 memory: 9802 loss: 0.0216
01/12 18:04:11 - mmengine - INFO - Epoch(train) [3][ 20/201] lr: 4.1686e-05 eta: 0:05:30 time: 1.8159 data_time: 0.0048 memory: 9802 loss: 0.0293
01/12 18:04:29 - mmengine - INFO - Epoch(train) [3][ 30/201] lr: 3.7535e-05 eta: 0:05:11 time: 1.8108 data_time: 0.0062 memory: 9802 loss: 0.0346
01/12 18:04:47 - mmengine - INFO - Epoch(train) [3][ 40/201] lr: 3.3553e-05 eta: 0:04:53 time: 1.8056 data_time: 0.0052 memory: 9802 loss: 0.0289
01/12 18:05:05 - mmengine - INFO - Epoch(train) [3][ 50/201] lr: 2.9751e-05 eta: 0:04:35 time: 1.8091 data_time: 0.0042 memory: 9802 loss: 0.0269
01/12 18:05:23 - mmengine - INFO - Epoch(train) [3][ 60/201] lr: 2.6140e-05 eta: 0:04:16 time: 1.7973 data_time: 0.0070 memory: 9802 loss: 0.0209
01/12 18:05:41 - mmengine - INFO - Epoch(train) [3][ 70/201] lr: 2.2730e-05 eta: 0:03:58 time: 1.7951 data_time: 0.0050 memory: 9802 loss: 0.0220
01/12 18:05:59 - mmengine - INFO - Epoch(train) [3][ 80/201] lr: 1.9529e-05 eta: 0:03:40 time: 1.7956 data_time: 0.0052 memory: 9802 loss: 0.0262
01/12 18:06:17 - mmengine - INFO - after_train_iter in EvaluateChatHook.
01/12 18:06:19 - mmengine - INFO - Sample output:
<s> <|User|>:请介绍一下你自己<eoh>
<|Bot|>:我是xujinzh的小助手,内在是上海AI实验室书生·浦语的7B大模型哦</s>
01/12 18:06:21 - mmengine - INFO - Sample output:
<s> <|User|>:请做一下自我介绍<eoh>
<|Bot|>:我是xujinzh的小助手,内在是上海AI实验室书生·浦语的7B大模型哦</s>
01/12 18:06:21 - mmengine - INFO - Epoch(train) [3][ 90/201] lr: 1.6547e-05 eta: 0:03:22 time: 1.7952 data_time: 0.0054 memory: 9802 loss: 0.0213
01/12 18:06:39 - mmengine - INFO - Epoch(train) [3][100/201] lr: 1.3791e-05 eta: 0:03:04 time: 2.2411 data_time: 0.4197 memory: 9802 loss: 0.0242
01/12 18:06:57 - mmengine - INFO - Epoch(train) [3][110/201] lr: 1.1269e-05 eta: 0:02:46 time: 1.8307 data_time: 0.0044 memory: 9802 loss: 0.0193
01/12 18:07:16 - mmengine - INFO - Epoch(train) [3][120/201] lr: 8.9877e-06 eta: 0:02:28 time: 1.8186 data_time: 0.0062 memory: 9802 loss: 0.0259
01/12 18:07:34 - mmengine - INFO - Epoch(train) [3][130/201] lr: 6.9535e-06 eta: 0:02:09 time: 1.8401 data_time: 0.0055 memory: 9802 loss: 0.0260
01/12 18:07:52 - mmengine - INFO - Epoch(train) [3][140/201] lr: 5.1718e-06 eta: 0:01:51 time: 1.8188 data_time: 0.0055 memory: 9802 loss: 0.0225
01/12 18:08:10 - mmengine - INFO - Epoch(train) [3][150/201] lr: 3.6474e-06 eta: 0:01:33 time: 1.8141 data_time: 0.0060 memory: 9802 loss: 0.0234
01/12 18:08:28 - mmengine - INFO - Epoch(train) [3][160/201] lr: 2.3845e-06 eta: 0:01:14 time: 1.7971 data_time: 0.0046 memory: 9802 loss: 0.0243
01/12 18:08:46 - mmengine - INFO - Epoch(train) [3][170/201] lr: 1.3865e-06 eta: 0:00:56 time: 1.7919 data_time: 0.0052 memory: 9802 loss: 0.0191
01/12 18:09:04 - mmengine - INFO - after_train_iter in EvaluateChatHook.
01/12 18:09:06 - mmengine - INFO - Sample output:
<s> <|User|>:请介绍一下你自己<eoh>
<|Bot|>:我是xujinzh的小助手,内在是上海AI实验室书生·浦语的7B大模型哦</s>
01/12 18:09:08 - mmengine - INFO - Sample output:
<s> <|User|>:请做一下自我介绍<eoh>
<|Bot|>:我是xujinzh的小助手,内在是上海AI实验室书生·浦语的7B大模型哦</s>
01/12 18:09:08 - mmengine - INFO - Epoch(train) [3][180/201] lr: 6.5615e-07 eta: 0:00:38 time: 1.7898 data_time: 0.0058 memory: 9802 loss: 0.0237
01/12 18:09:27 - mmengine - INFO - Epoch(train) [3][190/201] lr: 1.9537e-07 eta: 0:00:20 time: 2.2379 data_time: 0.4234 memory: 9802 loss: 0.0179
01/12 18:09:45 - mmengine - INFO - Epoch(train) [3][200/201] lr: 5.4286e-09 eta: 0:00:01 time: 1.8287 data_time: 0.0062 memory: 9802 loss: 0.0204
01/12 18:09:46 - mmengine - INFO - Exp name: internlm_chat_7b_qlora_oasst1_e3_copy_20240112_175030
01/12 18:09:46 - mmengine - INFO - Saving checkpoint at 3 epochs
[2024-01-12 18:09:47,155] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint epoch_3.pth is about to be saved!
[2024-01-12 18:09:47,201] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: /root/personal_assistant/config/work_dirs/internlm_chat_7b_qlora_oasst1_e3_copy/epoch_3.pth/mp_rank_00_model_states.pt
[2024-01-12 18:09:47,201] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /root/personal_assistant/config/work_dirs/internlm_chat_7b_qlora_oasst1_e3_copy/epoch_3.pth/mp_rank_00_model_states.pt...
[2024-01-12 18:09:47,471] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /root/personal_assistant/config/work_dirs/internlm_chat_7b_qlora_oasst1_e3_copy/epoch_3.pth/mp_rank_00_model_states.pt.
[2024-01-12 18:09:47,477] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /root/personal_assistant/config/work_dirs/internlm_chat_7b_qlora_oasst1_e3_copy/epoch_3.pth/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2024-01-12 18:09:48,901] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /root/personal_assistant/config/work_dirs/internlm_chat_7b_qlora_oasst1_e3_copy/epoch_3.pth/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2024-01-12 18:09:48,938] [INFO] [engine.py:3431:_save_zero_checkpoint] zero checkpoint saved /root/personal_assistant/config/work_dirs/internlm_chat_7b_qlora_oasst1_e3_copy/epoch_3.pth/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
[2024-01-12 18:09:48,939] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint epoch_3.pth is ready now!
01/12 18:09:48 - mmengine - INFO - after_train in EvaluateChatHook.
01/12 18:09:51 - mmengine - INFO - Sample output:
<s> <|User|>:请介绍一下你自己<eoh>
<|Bot|>:我是xujinzh的小助手,内在是上海AI实验室书生·浦语的7B大模型哦</s>
01/12 18:09:53 - mmengine - INFO - Sample output:
<s> <|User|>:请做一下自我介绍<eoh>
<|Bot|>:我是xujinzh的小助手,内在是上海AI实验室书生·浦语的7B大模型哦</s>
Converting the PTH checkpoint to HuggingFace format 1 %mkdir /root/personal_assistant/config/work_dirs/hf
1 %env MKL_SERVICE_FORCE_INTEL=1
env: MKL_SERVICE_FORCE_INTEL=1
1 !xtuner convert pth_to_hf /root/personal_assistant/config/internlm_chat_7b_qlora_oasst1_e3_copy.py /root/personal_assistant/config/work_dirs/internlm_chat_7b_qlora_oasst1_e3_copy/epoch_3.pth/ /root/personal_assistant/config/work_dirs/hf/
[2024-01-12 18:18:00,034] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-12 18:18:50,826] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
quantization_config convert to <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>
Loading checkpoint shards: 100%|██████████████████| 8/8 [00:17<00:00, 2.17s/it]
01/12 18:19:18 - mmengine - INFO - dispatch internlm attn forward
01/12 18:19:18 - mmengine - WARNING - Due to the implementation of the PyTorch version of flash attention, even when the `output_attentions` flag is set to True, it is not possible to return the `attn_weights`.
Processing zero checkpoint '/root/personal_assistant/config/work_dirs/internlm_chat_7b_qlora_oasst1_e3_copy/epoch_3.pth/'
Detected checkpoint of type zero stage 2, world_size: 1
Parsing checkpoint created by deepspeed==0.12.6
Reconstructed fp32 state dict with 448 params 159907840 elements
Load PTH model from /root/personal_assistant/config/work_dirs/internlm_chat_7b_qlora_oasst1_e3_copy/epoch_3.pth/
Convert weights to float16
Saving HuggingFace model to /root/personal_assistant/config/work_dirs/hf/
All done!
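Before merging, the converted adapter can be smoke-tested from the command line. A hedged sketch, assuming the xtuner 0.1.9 chat CLI flags (verify with xtuner chat --help):

!xtuner chat /root/personal_assistant/model/Shanghai_AI_Laboratory/internlm-chat-7b --adapter /root/personal_assistant/config/work_dirs/hf --prompt-template internlm_chat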
Merging the HuggingFace adapter into the base LLM 1 %env MKL_SERVICE_FORCE_INTEL=1
env: MKL_SERVICE_FORCE_INTEL=1
1 %env MKL_THREADING_LAYER=GNU
env: MKL_THREADING_LAYER=GNU
1 %mkdir /root/personal_assistant/config/work_dirs/merged
1 !xtuner convert merge /root/personal_assistant/model/Shanghai_AI_Laboratory/internlm-chat-7b/ /root/personal_assistant/config/work_dirs/hf /root/personal_assistant/config/work_dirs/merged --max-shard-size 2GB
[2024-01-12 18:22:18,813] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Loading checkpoint shards: 100%|██████████████████| 8/8 [00:14<00:00, 1.84s/it]
Saving to /root/personal_assistant/config/work_dirs/merged...
All done!
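To verify the merge outside the web demo, the merged directory loads like any HuggingFace model; a minimal sketch using InternLM's remote-code chat helper (paths taken from this tutorial):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

merged = "/root/personal_assistant/config/work_dirs/merged"
tokenizer = AutoTokenizer.from_pretrained(merged, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    merged, torch_dtype=torch.float16, trust_remote_code=True, device_map="auto"
).eval()

response, history = model.chat(tokenizer, "请做一下自我介绍")
print(response)  # expected: the xujinzh-assistant identity learned above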
Using the web demo 1 %pip install -q streamlit==1.24.0
Interlm-langchain-RAG.ipynb personal_assistant/
XTuner-大模型训练.ipynb* share@
code/ xtuner/
config.json xtuner019/
data_base/ 书生浦语大模型使用.ipynb
nltk_data/ '基于 XTuner 的大模型训练获得私人智能助手.ipynb'
/root/personal_assistant
/root/.conda/envs/personal_assistant/lib/python3.10/site-packages/IPython/core/magics/osm.py:417: UserWarning: using dhist requires you to install the `pickleshare` library.
self.shell.db['dhist'] = compress_dhist(dhist)[-100:]
/root/personal_assistant/code
1 !git clone https://github.com/InternLM/InternLM.git
Cloning into 'InternLM'...
remote: Enumerating objects: 2987, done.
remote: Counting objects: 100% (1778/1778), done.
remote: Compressing objects: 100% (657/657), done.
remote: Total 2987 (delta 1367), reused 1329 (delta 1110), pack-reused 1209
Receiving objects: 100% (2987/2987), 4.95 MiB | 6.34 MiB/s, done.
Resolving deltas: 100% (1921/1921), done.
1 !sed -i "s#internlm/internlm-chat-7b#/root/personal_assistant/config/work_dirs/merged#g" /root/personal_assistant/code/InternLM/web_demo.py
1 %pip install -q torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
1 %pip install -q transformers sentencepiece accelerate
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Note: you may need to restart the kernel to use updated packages.
1 2 %cd /root/personal_assistant/code/InternLM !{sys.executable} -m streamlit run web_demo.py --server.address 127.0.0.1 --server.port 6006
/root/personal_assistant/code/InternLM
/root/.conda/envs/personal_assistant/lib/python3.10/site-packages/IPython/core/magics/osm.py:417: UserWarning: using dhist requires you to install the `pickleshare` library.
self.shell.db['dhist'] = compress_dhist(dhist)[-100:]
Collecting usage statistics. To deactivate, set browser.gatherUsageStats to False.
You can now view your Streamlit app in your browser.
URL: http://127.0.0.1:6006
load model begin.
Loading checkpoint shards: 100%|██████████████████| 8/8 [00:12<00:00, 1.55s/it]
load model end.
load model begin.
load model end.
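Note that Streamlit is bound to 127.0.0.1 on the training server, so opening the page from a local browser typically requires an SSH tunnel first, for example (server address is a placeholder):

ssh -L 6006:127.0.0.1:6006 root@<server-ip>

after which http://127.0.0.1:6006 can be visited locally.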
References
XTuner: low-cost single-GPU fine-tuning practice for large models (XTuner 大模型单卡低成本微调实践)
XTuner: InternLM-Chat personal-assistant identity fine-tuning practice (XTuner InternLM-Chat 个人小助手认知微调实践)