Merge docker-llm-amd into Jafner.net
Commit: 0591fa4c6d

docker-llm-amd/.env (new file)
MODELS_DIR=~/Git/docker-llm-amd/models

docker-llm-amd/.gitignore (new file)
models/
ollama/modelfiles/

docker-llm-amd/FLASH-ATTENTION.md (new file)
# Flash Attention in Docker on AMD is Not Yet Working
Below are my notes on the efforts I've made to get it working.

```Dockerfile
FROM rocm/pytorch-nightly:latest
COPY . .
RUN git clone --recursive https://github.com/ROCm/flash-attention.git /tmp/flash-attention
WORKDIR /tmp/flash-attention
ENV MAX_JOBS=8
RUN pip install -v .
```
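
If a build of this image ever finishes cleanly, a sanity check along these lines should tell whether the ROCm wheel actually imports. This is only a sketch; the `flash-attn-rocm` tag and the device passthrough flags are assumptions, not something tested here yet:

```bash
# Build the image from the Dockerfile above (the tag name is arbitrary).
docker build -t flash-attn-rocm .

# Run it with the GPU passed through and try importing the module.
docker run --rm --device /dev/kfd --device /dev/dri --group-add video \
  flash-attn-rocm \
  python -c "import flash_attn; print(flash_attn.__version__)"
```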

# Resources
1. [What is Flash-attention? (How do i use it with Oobabooga?) :...](https://www.reddit.com/r/Oobabooga/comments/193mcv0/what_is_flashattention_how_do_i_use_it_with/)
2. [Adding flash attention to one click installer · Issue #4015 ...](https://github.com/oobabooga/text-generation-webui/issues/4015)
3. [Accelerating Large Language Models with Flash Attention on A...](https://rocm.blogs.amd.com/artificial-intelligence/flash-attention/README.html)
4. [GitHub - Dao-AILab/flash-attention: Fast and memory-efficien...](https://github.com/Dao-AILab/flash-attention)
5. [GitHub - ROCm/llvm-project: This is the AMD-maintained fork ...](https://github.com/ROCm/llvm-project)
6. [GitHub - ROCm/AITemplate: AITemplate is a Python framework w...](https://github.com/ROCm/AITemplate)
7. [Stable diffusion with RX7900XTX on ROCm5.7 · ROCm/composable...](https://github.com/ROCm/composable_kernel/discussions/1032#522-build-ait-and-stable-diffusion-demo)
8. [Current state of training on AMD Radeon 7900 XTX (with bench...](https://www.reddit.com/r/LocalLLaMA/comments/1atvxu2/current_state_of_training_on_amd_radeon_7900_xtx/)
9. [llm-tracker - howto/AMD GPUs](https://llm-tracker.info/howto/AMD-GPUs)
10. [RDNA3 support · Issue #27 · ROCm/flash-attention · GitHub](https://github.com/ROCm/flash-attention/issues/27)
11. [GitHub - ROCm/xformers: Hackable and optimized Transformers ...](https://github.com/ROCm/xformers/tree/develop)
12. [\[ROCm\] support Radeon™ 7900 series (gfx1100) without using...](https://github.com/vllm-project/vllm/pull/2768)

docker-llm-amd/README.md (new file)
### What we have so far

1. [Ollama](https://github.com/ollama/ollama) loads and serves a few models via API.
    - Ollama itself doesn't have a UI; it's CLI and API only.
    - The API can be accessed at [`https://api.ollama.jafner.net`](https://api.ollama.jafner.net) (a quick check follows this list).
    - Ollama running as configured supports ROCm (GPU acceleration).
    - Configured models are described [here](/ollama/modelfiles/).
    - Run Ollama with: `HSA_OVERRIDE_GFX_VERSION=11.0.0 OLLAMA_HOST=192.168.1.135:11434 OLLAMA_ORIGINS="app://obsidian.md*" OLLAMA_MAX_LOADED_MODELS=0 ollama serve`
2. [Open-webui](https://github.com/open-webui/open-webui) provides a pretty web interface for interacting with Ollama.
    - The web UI can be accessed at [`https://ollama.jafner.net`](https://ollama.jafner.net).
    - The web UI is protected by Traefik's `lan-only` rule, as well as its own authentication layer.
    - Run open-webui with: `cd ~/Projects/LLMs/open-webui && docker compose up -d && docker compose logs -f`
    - Then open [the page](https://ollama.jafner.net) and log in.
    - Connect the frontend to the Ollama instance by opening the settings (top-right), clicking "Connections", and setting "Ollama Base URL" to "https://api.ollama.jafner.net". Hit refresh and begin using it.
3. [SillyTavern](https://github.com/SillyTavern/SillyTavern) provides a powerful interface for building and using characters.
    - Run SillyTavern with: `cd ~/Projects/LLMs/SillyTavern && ./start.sh`
4. [Oobabooga](https://github.com/oobabooga/text-generation-webui) provides a more powerful web UI than open-webui, but it's less pretty.
    - Run Oobabooga with: `cd ~/Projects/LLMs/text-generation-webui && ./start_linux.sh`
    - Requires the following environment variables to be set in `one_click.py` (right after the import statements):
    ```
    os.environ["ROCM_PATH"] = '/opt/rocm'
    os.environ["HSA_OVERRIDE_GFX_VERSION"] = '11.0.0'
    os.environ["HCC_AMDGPU_TARGET"] = 'gfx1100'
    os.environ["PATH"] = '/opt/rocm/bin:$PATH'
    os.environ["LD_LIBRARY_PATH"] = '/opt/rocm/lib:$LD_LIBRARY_PATH'
    os.environ["CUDA_VISIBLE_DEVICES"] = '0'
    os.environ["HCC_SERIALIZE_KERNEL"] = '0x3'
    os.environ["HCC_SERIALIZE_COPY"] = '0x3'
    os.environ["HIP_TRACE_API"] = '0x2'
    os.environ["HF_TOKEN"] = '<my-huggingface-token>'
    ```
    - Requires the following environment variable to be set in `start_linux.sh` for access to non-public model downloads:
    ```
    # config
    HF_TOKEN="<my-huggingface-token>"
    ```
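
As a quick check that the Ollama API mentioned in item 1 is actually serving, listing its models should return JSON. This is just a sketch against the endpoints named above:

```bash
# List the models Ollama is serving; an empty "models" array is still a healthy response.
curl -s https://api.ollama.jafner.net/api/tags

# Or hit the instance directly, bypassing the proxy.
curl -s http://192.168.1.135:11434/api/tags
```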

That's where we're at.

### Set Up Models Directory
1. Navigate to the source directory with all models: `cd ~/"Nextcloud/Large Language Models/GGUF/"`
2. Hard-link each file into the docker project's models directory: `for model in ./*; do ln "$(realpath "$model")" "$(realpath ~/Git/docker-llm-amd/models/"$model")"; done`
    - Note that the links must be hard links (`ln`, not `ln -s`): a symlink's target path doesn't exist inside the containers, so the models won't be passed through properly. A quick check follows this list.
3. Launch ollama: `docker compose up -d ollama`
4. Create models defined by the modelfiles: `docker compose exec -dit ollama /modelfiles/.loadmodels.sh`
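
To verify the links landed as hard links (the check mentioned in step 2), the link count `stat` reports should be at least 2 for every model file. A minimal sketch, assuming GNU coreutils:

```bash
# %h is the hard-link count; a value of 1 means the file was copied or symlinked, not hard-linked.
stat -c '%h %n' ~/Git/docker-llm-amd/models/*
```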

### Roadmap
- Set up StableDiffusion-web-UI.
- Get characters in SillyTavern behaving as expected.
    - Repetition issues.
    - Obsession with certain parts of the prompt.
    - Refusals.
- Set up something for character voices.
    - [Coqui TTS - Docker install](https://github.com/coqui-ai/TTS/tree/dev?tab=readme-ov-file#docker-image).
    - [TTS Generation Web UI](https://github.com/rsxdalv/tts-generation-webui).
- Set up Extras for SillyTavern.

### Notes
- So many of these projects use Python with its various versions and dependencies and shit.
    - *Always* use a Docker container or virtual environment.
    - It's like a condom.

docker-llm-amd/docker-compose.yml (new file)
# Addresses:
# ollama       :11434
# open-webui   :3000
# sillytavern  :8000
# sdwebui      :7868
# oobabooga    :7860 :5010
# exui         :5030

version: '3'
name: 'ai'
services:
  ollama:
    container_name: ai_ollama
    image: ollama/ollama:rocm
    networks:
      - ai
    privileged: false
    group_add:
      - video
    ports:
      - 11434:11434
    devices:
      - /dev/kfd
      - /dev/dri
    volumes:
      - ./ollama/modelfiles:/modelfiles
      - $MODELS_DIR:/models
      - ollama-model-storage:/root/.ollama/models/blobs
    environment:
      - OLLAMA_ORIGINS=app://obsidian.md*
      - OLLAMA_MAX_LOADED_MODELS=0

  open-webui:
    container_name: ai_open-webui
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - 3000:8080
    networks:
      - ai
    volumes:
      - open-webui:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434

  sillytavern:
    container_name: ai_sillytavern
    image: ghcr.io/sillytavern/sillytavern:staging
    networks:
      - ai
    privileged: false
    ports:
      - 8000:8000/tcp
    volumes:
      - ./sillytavern/config/config.yaml:/home/node/app/config/config.yaml
    environment:
      - TZ=America/Los_Angeles

  sdwebui:
    container_name: ai_sdwebui
    build:
      context: ./sdwebui
    networks:
      - ai
    privileged: false
    group_add:
      - video
    ports:
      - 7868:7860
    devices:
      - /dev/kfd
      - /dev/dri
    volumes:
      - ./models_t2i:/dockerx/stable-diffusion-webui-amdgpu/models
      - ./sdwebui/images:/images
      - sdwebui_cache:/dockerx/stable-diffusion-webui-amdgpu/models/ONNX
    deploy:
      resources:
        limits:
          memory: 16G

  oobabooga:
    container_name: ai_oobabooga
    image: atinoda/text-generation-webui:base-rocm
    environment:
      - EXTRA_LAUNCH_ARGS="--listen --verbose --chat-buttons --use_flash_attention_2 --flash-attn --api --extensions openai"
    stdin_open: true
    tty: true
    networks:
      - ai
    ipc: host
    group_add:
      - video
    cap_add:
      - SYS_PTRACE
    security_opt:
      - seccomp=unconfined
    ports:
      - 7860:7860
      - 5010:5000
    devices:
      - /dev/kfd
      - /dev/dri
    volumes:
      - $MODELS_DIR:/app/models
      - oobabooga_cache:/root/.cache
      - ./oobabooga/characters:/app/characters
      - ./oobabooga/instruction-templates:/app/instruction-templates
      - ./oobabooga/loras:/app/loras
      - ./oobabooga/presets:/app/presets
      - ./oobabooga/prompts:/app/prompts
      - ./oobabooga/training:/app/training

  exui:
    container_name: ai_exui
    build:
      context: ./exl2
    networks:
      - ai
    privileged: false
    group_add:
      - video
    ports:
      - 5030:5000
    devices:
      - /dev/kfd
      - /dev/dri
    volumes:
      - $MODELS_DIR:/models

volumes:
  ollama-model-storage:
  open-webui:
  sdwebui_cache:
  oobabooga:
  oobabooga_cache:
networks:
  ai:
    name: "ai"
    ipam:
      driver: default
      config:
        - subnet: 172.20.0.0/16

docker-llm-amd/exui/Dockerfile (new file)
FROM python:3.10-bookworm
RUN apt update && \
    apt install --no-install-recommends -y git vim build-essential python3-dev pip bash curl && \
    rm -rf /var/lib/apt/lists/*
WORKDIR /usr/src/app
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
RUN git clone https://github.com/turboderp/exui
WORKDIR /usr/src/app/exui
EXPOSE 5000
CMD ["python", "-u", "server.py", "--host", "0.0.0.0:5000"]

docker-llm-amd/exui/requirements.txt (new file)
blinker==1.8.2
certifi==2024.2.2
charset-normalizer==3.3.2
click==8.1.7
cramjam==2.8.3
exllamav2 @ https://github.com/turboderp/exllamav2/releases/download/v0.0.21/exllamav2-0.0.21+rocm6.0-cp310-cp310-linux_x86_64.whl
fastparquet==2024.5.0
filelock==3.13.1
Flask==3.0.3
fsspec==2024.2.0
huggingface-hub==0.23.1
idna==3.7
itsdangerous==2.2.0
Jinja2==3.1.3
MarkupSafe==2.1.5
mpmath==1.3.0
networkx==3.2.1
ninja==1.11.1.1
numpy==1.26.3
packaging==24.0
pandas==2.2.2
pillow==10.2.0
Pygments==2.18.0
pynvml==11.5.0
python-dateutil==2.9.0.post0
pytorch-triton-rocm @ https://download.pytorch.org/whl/pytorch_triton_rocm-2.3.0-cp310-cp310-linux_x86_64.whl
pytz==2024.1
PyYAML==6.0.1
regex==2024.5.15
requests==2.32.2
safetensors==0.4.3
sentencepiece==0.2.0
six==1.16.0
sympy==1.12
tokenizers==0.19.1
torch @ https://download.pytorch.org/whl/rocm6.0/torch-2.3.0%2Brocm6.0-cp310-cp310-linux_x86_64.whl
torchaudio @ https://download.pytorch.org/whl/rocm6.0/torchaudio-2.3.0%2Brocm6.0-cp310-cp310-linux_x86_64.whl
torchvision @ https://download.pytorch.org/whl/rocm6.0/torchvision-0.18.0%2Brocm6.0-cp310-cp310-linux_x86_64.whl
tqdm==4.66.4
typing_extensions==4.9.0
tzdata==2024.1
urllib3==2.2.1
waitress==3.0.0
websockets==12.0
Werkzeug==3.0.3

docker-llm-amd/ollama/README.md (new file)
# Ollama Notes
Per: [Ollama/Ollama README](https://github.com/ollama/ollama)

## Install Steps
Per: [linux.md](https://github.com/ollama/ollama/blob/main/docs/linux.md)

1. Download the binary: `sudo curl -L https://ollama.com/download/ollama-linux-amd64 -o /usr/bin/ollama`
2. Make the binary executable: `sudo chmod +x /usr/bin/ollama`
3. Create a user for ollama: `sudo useradd -r -s /bin/false -m -d /usr/share/ollama ollama`
4. Create a systemd service file for ollama: `sudo nano /etc/systemd/system/ollama.service` and populate it with the following.
    ```ini
    [Unit]
    Description=Ollama Service
    After=network-online.target

    [Service]
    Environment="HSA_OVERRIDE_GFX_VERSION=11.0.0"
    Environment="OLLAMA_HOST=192.168.1.135:11434"
    Environment="OLLAMA_ORIGINS=app://obsidian.md*"
    Environment="OLLAMA_MAX_LOADED_MODELS=0"
    ExecStart=/usr/bin/ollama serve
    User=ollama
    Group=ollama
    Restart=always
    RestartSec=3

    [Install]
    WantedBy=default.target
    ```
5. Register and enable the ollama service: `sudo systemctl daemon-reload && sudo systemctl enable ollama`
6. Start ollama: `sudo systemctl start ollama` (a quick status check follows this list).
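
To confirm the unit actually came up (the status check mentioned in step 6), the standard systemd tooling is enough:

```bash
# Confirm the service is active, then follow its logs to watch for ROCm/GPU detection messages.
systemctl status ollama
journalctl -u ollama -f
```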

### Enable ROCm Support
Per: [anvesh.jhuboo on Medium](https://medium.com/@anvesh.jhuboo/rocm-pytorch-on-fedora-51224563e5be)

1. Add your user to the `video` group to allow access to GPU resources: `sudo usermod -aG video $LOGNAME`
2. Install the `rocminfo` package: `sudo dnf install rocminfo`
3. Check for ROCm support: `rocminfo`
4. Install the `rocm-opencl` package: `sudo dnf install rocm-opencl`
5. Install the `rocm-clinfo` package: `sudo dnf install rocm-clinfo`
6. Verify OpenCL is working: `rocm-clinfo`
7. Get the GFX version of your GPU: `rocminfo | grep gfx | head -n 1 | tr -s ' ' | cut -d' ' -f 3`
    - The GFX version given is a stripped version number (see the sketch after this list for the mapping).
    - My Radeon 7900 XTX has a gfx string of `gfx1100`, which correlates with HSA GFX version 11.0.0.
    - Other cards commonly have a string of `gfx1030`, which correlates with HSA GFX version 10.3.0.
    - There's a little bit more info [here](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html).
8. Export your GFX version in `~/.bashrc`: `echo "export HSA_OVERRIDE_GFX_VERSION=11.0.0" >> ~/.bashrc && source ~/.bashrc`
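
The mapping from step 7 can be scripted. This is only a sketch and assumes the four-digit RDNA-style gfx strings shown above (`gfx1100` → 11.0.0, `gfx1030` → 10.3.0); older three-digit strings like `gfx906` don't split this way:

```bash
# Take the first gfx string rocminfo reports and split it as major.minor.stepping.
gfx=$(rocminfo | grep -o 'gfx[0-9a-f]*' | head -n 1)
ver=${gfx#gfx}
echo "export HSA_OVERRIDE_GFX_VERSION=${ver:0:2}.${ver:2:1}.${ver:3:1}"
```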

- Is this even part of the same thing? I ran `sudo dnf install https://repo.radeon.com/amdgpu-install/6.0.2/rhel/9.3/amdgpu-install-6.0.60002-1.el9.noarch.rpm`
- Maybe this is the right place to look? [Fedora wiki - AMD ROCm](https://fedoraproject.org/wiki/SIGs/HC)

## Run Ollama
1. Test that Ollama is working: `ollama run gemma:2b`
    - Downloads and runs the smallest model in [Ollama's library](https://ollama.com/library).
2. Run as a docker container: `docker run -d --device /dev/kfd --device /dev/dri -v /usr/lib64:/opt/lib64:ro -e HIP_PATH=/opt/lib64/rocm -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama && docker logs -f ollama`

## Update Ollama
1. Redownload the binary: `sudo curl -L https://ollama.com/download/ollama-linux-amd64 -o /usr/bin/ollama`
2. Make the binary executable: `sudo chmod +x /usr/bin/ollama`

## Create Model from Modelfile
`ollama create <model name> -f <modelfile relative path>`

Where the modelfile looks like:
```
# Choose either a model tag to download from ollama.com/library, or a path to a local model file (relative to the path of the modelfile).
FROM ../Models/codellama-7b.Q8_0.gguf

# Set the ChatML template passed to the model.
TEMPLATE """<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

# Set the temperature to 1 [higher is more creative, lower is more coherent].
PARAMETER temperature 1

# Set the system message.
SYSTEM """
You are a senior devops engineer, acting as an assistant. You offer help with cloud technologies like: Terraform, AWS, kubernetes, python. You answer with code examples when possible
"""

# Stop sequences: cut generation off at the ChatML role markers.
PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
```
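
For example, with a modelfile like the one above saved as `./modelfiles/devops-assistant` (a hypothetical name), creating and chatting with the model looks like:

```bash
# Build the model from the modelfile, then start an interactive session with it.
ollama create devops-assistant -f ./modelfiles/devops-assistant
ollama run devops-assistant
```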

## Unload a Model
There's no official support for this in the `ollama` CLI, but we can make it happen with the API:
`curl https://api.ollama.jafner.net/api/generate -d '{"model": "<MODEL TO UNLOAD>", "keep_alive": 0}'`

docker-llm-amd/ollama/modelfiles/.loadmodels.sh (new executable file)
#!/bin/bash
# Create an Ollama model from every modelfile in /modelfiles.
# Run inside the ollama container: docker compose exec -dit ollama /modelfiles/.loadmodels.sh
for modelfile in /modelfiles/*; do
    echo -n "Running: '"
    echo "ollama create \"$(basename $modelfile)\" -f \"$modelfile\"'"
    ollama create "$(basename $modelfile)" -f "$modelfile"
done

docker-llm-amd/ollama/sync-modelfiles (new executable file)
#!/bin/bash
# THIS SCRIPT DOES NOT WORK RIGHT NOW
# The script is fine, it just needs the modelfiles to be written with reference
# to the models folder relative to the host system, rather than inside the
# container. We're using ./modelfiles/.loadmodels.sh instead right now.

# Remove any ollama model that no longer has a matching modelfile.
modelfiles="$(ls ./modelfiles/)"
models="$(ollama list | tr -s ' ' | cut -f 1 | tail -n +2)"
for model in $models; do
    if ! [[ $modelfiles == *"$model"* ]]; then
        echo -n "Running: '"
        echo "ollama rm \"$model\"'"
        ollama rm "$model"
    fi
done

# (Re)create a model from every modelfile.
cd ./modelfiles
for modelfile in ./*; do
    echo -n "Running: '"
    echo "ollama create \"$(basename $modelfile)\" -f \"$modelfile\"'"
    ollama create "$(basename $modelfile)" -f "$modelfile"
done

docker-llm-amd/oobabooga.env (new file)
TORCH_CUDA_ARCH_LIST=7.5
HOST_PORT=7860
CONTAINER_PORT=7860
HOST_API_PORT=5020
CONTAINER_API_PORT=5000
BUILD_EXTENSIONS=""
APP_RUNTIME_GID=1000
APP_GID=1000
APP_UID=1000
HF_HOME=/home/app/text-generation-webui/cache/

docker-llm-amd/oobabooga/Dockerfile (new file)
# Cloned from: https://github.com/Atinoda/text-generation-webui-docker/blob/master/Dockerfile
# Modified to install Flash-Attention-2 for AMD ROCm.
# Install instructions for FA2 are based on:
# https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/3rd-party/pytorch-install.html#using-pytorch-upstream-docker-image
# and:
# https://rocm.blogs.amd.com/artificial-intelligence/flash-attention/README.html
# Also trimmed original comments and replaced with new.

# Base build layer
FROM ubuntu:22.04 AS app_base
RUN apt-get update && apt-get install --no-install-recommends -y \
    git vim build-essential python3-dev python3-venv python3-pip
RUN pip3 install virtualenv
RUN virtualenv /venv
ENV VIRTUAL_ENV=/venv
RUN python3 -m venv $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
RUN pip3 install --upgrade pip setuptools
COPY ./scripts /scripts
RUN chmod +x /scripts/*
RUN git clone https://github.com/oobabooga/text-generation-webui /src
ARG VERSION_TAG
ENV VERSION_TAG=${VERSION_TAG}
RUN . /scripts/checkout_src_version.sh
RUN cp -ar /src /app

# AMD build layer
FROM app_base AS app_rocm
RUN pip3 install --pre torch torchvision torchaudio \
    --index-url https://download.pytorch.org/whl/nightly/rocm6.1
RUN pip3 install -r /app/requirements_amd.txt
RUN git clone --recursive https://github.com/ROCm/flash-attention.git /src-fa
RUN cd /src-fa && MAX_JOBS=$((`nproc` / 2)) pip install -v .
FROM app_rocm AS app_rocm_x
RUN chmod +x /scripts/build_extensions.sh && \
    . /scripts/build_extensions.sh

# Base run layer
FROM ubuntu:22.04 AS run_base
RUN apt-get update && apt-get install --no-install-recommends -y \
    python3-venv python3-dev git
COPY --from=app_base /app /app
COPY --from=app_base /src /src
ENV VIRTUAL_ENV=/venv
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
WORKDIR /app
EXPOSE 7860
EXPOSE 5000
EXPOSE 5005
ENV PYTHONUNBUFFERED=1
ARG BUILD_DATE
ENV BUILD_DATE=$BUILD_DATE
RUN echo "$BUILD_DATE" > /build_date.txt
ARG VERSION_TAG
ENV VERSION_TAG=$VERSION_TAG
RUN echo "$VERSION_TAG" > /version_tag.txt
COPY ./scripts /scripts
RUN chmod +x /scripts/*
ENTRYPOINT ["/scripts/docker-entrypoint.sh"]

# AMD run layer
FROM run_base AS default-rocm
COPY --from=app_rocm_x $VIRTUAL_ENV $VIRTUAL_ENV
RUN echo "ROCM Extended" > /variant.txt
ENV EXTRA_LAUNCH_ARGS=""
CMD ["python3", "/app/server.py"]

docker-llm-amd/sdwebui/Dockerfile (new file)
FROM rocm/pytorch:latest
WORKDIR /dockerx
RUN git clone https://github.com/lshqqytiger/stable-diffusion-webui-amdgpu
WORKDIR /dockerx/stable-diffusion-webui-amdgpu
RUN python -m pip install clip open-clip-torch onnxruntime-training xformers
RUN git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui-assets.git repositories/stable-diffusion-webui-assets
RUN git clone https://github.com/Stability-AI/stablediffusion.git repositories/stable-diffusion-stability-ai
RUN git clone https://github.com/Stability-AI/generative-models.git repositories/generative-models
RUN git clone https://github.com/crowsonkb/k-diffusion.git repositories/k-diffusion
RUN git clone https://github.com/salesforce/BLIP.git repositories/BLIP

RUN python -m pip install --upgrade pip wheel
ENV REQS_FILE='requirements_versions.txt'
ENV venv_dir="-"
RUN python -m pip install -r requirements_versions.txt
ENV COMMANDLINE_ARGS="--listen --allow-code --api --administrator --no-download-sd-model --medvram --use-directml"
CMD ["python", "-u", "launch.py", "--precision", "full", "--no-half"]

docker-llm-amd/sdwebui/README.md (new file)
Below is a rough and dirty documentation of steps taken to set up stable-diffusion for my 7900 XTX. Sources cited as used.
```
# Per: https://github.com/ROCm/composable_kernel/discussions/1032
# I did not read the bit where it said to execute these steps in a docker container (rocm/pytorch:rocm5.7_ubuntu22.04_py3.10_pytorch_2.0.1)
# The following is the docker run command I *would have* used:
docker run rocm/pytorch:rocm5.7_ubuntu22.04_py3.10_pytorch_2.0.1

git clone -b amd-stg-open https://github.com/RadeonOpenCompute/llvm-project.git #
cd llvm-project && git checkout 1f2f539f7cab51623fad8c8a5b574eda1e81e0c0 && mkdir -p build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=1 -DLLVM_TARGETS_TO_BUILD="AMDGPU;X86" -DLLVM_ENABLE_PROJECTS="clang;lld;compiler-rt" ../llvm
make -j16 # This step takes a long time. Reduce 16 to use fewer cores for the job.
# The build errored out at ~71% with:
# make: *** [Makefile:156: all] Error 2
# So that was a big waste of time.
# It was during this step that I realized we can just spin up the provided docker container to get the pre-compiled binary...

docker run aska0096/rocm5.7_ait_ck_navi31_sd2:v1.0
# This is a ~20GB docker image. It takes a long time to pull.
cd ~/AITemplate/examples/05_stable_diffusion/
sh run_ait_sd_webui.sh
# This gave us an error about
# [05:55:20] model_interface.cpp:94: Error: DeviceMalloc(&result, n_bytes) API call failed:
# no ROCm-capable device is detected at model_interface.cpp, line49
# Per: https://www.reddit.com/r/ROCm/comments/177pwxv/how_can_i_set_up_linux_rocm_pytorch_for_7900xtx/
# We rerun the container with some extra parameters to give it access to our GPU.
docker run -it --privileged --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host aska0096/rocm5.7_ait_ck_navi31_sd2:v1.0
cd ~/AITemplate/examples/05_stable_diffusion/
sh run_ait_sd_webui.sh
# This got us to "Uvicorn running on http://0.0.0.0:5000"
# Nice. Oh. We haven't exposed any ports...
docker run -it --rm -p 5500:5000 --privileged --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host aska0096/rocm5.7_ait_ck_navi31_sd2:v1.0
cd ~/AITemplate/examples/05_stable_diffusion/ && sh run_ait_sd_webui.sh
# I think :5000 on the local host is probably already in use, so we'll just dodge the possibility of a collision.
# Uhh. Well now what? Connecting to http://localhost:5500 in our browser returns a 404.
# Maybe it's the Streamlit server that we should be looking at?
docker run -it --rm -p 5500:5000 -p 5501:8501 --privileged --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host aska0096/rocm5.7_ait_ck_navi31_sd2:v1.0
cd ~/AITemplate/examples/05_stable_diffusion/ && sh run_ait_sd_webui.sh
# Ah yeah, that did it. Now we can see the Stable Diffusion test at 5501. And we can also see our VRAM utilization increase by ~6 GB when we run the script. Now just to test the performance! Nice. We get <2s to generate an image.
# Now can we generalize this to a more useful model?
```

docker-llm-amd/sillytavern/config/config.yaml (new file)
dataRoot: ./data
listen: false
port: 8000
whitelistMode: false
enableForwardedWhitelist: false
whitelist:
  - 127.0.0.1
  - 172.19.0.1
basicAuthMode: true
basicAuthUser:
  username: joey
  password: ***REMOVED***
enableCorsProxy: false
enableUserAccounts: false
enableDiscreetLogin: false
cookieSecret: Viwb315DDUewxmznF1cX1tJiLu/TW1AK8envDePAbovByvpKdJHPI5Nrcd6mpSGOkvDYy72OqhV8NnYubFA3KQ==
disableCsrfProtection: false
securityOverride: false
autorun: true
disableThumbnails: false
thumbnailsQuality: 95
avatarThumbnailsPng: false
allowKeysExposure: false
skipContentCheck: false
disableChatBackup: false
whitelistImportDomains:
  - localhost
  - cdn.discordapp.com
  - files.catbox.moe
  - raw.githubusercontent.com
requestOverrides: []
enableExtensions: true
extras:
  disableAutoDownload: false
  classificationModel: Cohee/distilbert-base-uncased-go-emotions-onnx
  captioningModel: Xenova/vit-gpt2-image-captioning
  embeddingModel: Cohee/jina-embeddings-v2-base-en
  promptExpansionModel: Cohee/fooocus_expansion-onnx
  speechToTextModel: Xenova/whisper-small
  textToSpeechModel: Xenova/speecht5_tts
openai:
  randomizeUserId: false
  captionSystemPrompt: ""
deepl:
  formality: default
enableServerPlugins: false

docker-llm-amd/up (new executable file)
#!/bin/bash
initialdir=${PWD}
cd /home/joey/Projects/LLMs/

# Start ollama only if the API isn't already reachable through the proxy.
wget -q --spider https://ollama-api.jafner.net
if [ $? -ne 0 ]; then
    cd ollama
    HSA_OVERRIDE_GFX_VERSION=11.0.0 \
    OLLAMA_HOST=192.168.1.135:11434 \
    OLLAMA_ORIGINS="app://obsidian.md*" \
    OLLAMA_MAX_LOADED_MODELS=0 \
    ollama serve &
    cd ..
fi
# Access ollama (API, not a web UI) at http://192.168.1.135:11434 *or* https://ollama-api.jafner.net

cd open-webui
docker compose up -d
cd ..
# Access open-webui at http://localhost:8080 *or* https://openwebui.jafner.net

cd SillyTavern
./start.sh &
cd ..
# Access SillyTavern at http://localhost:5000 *or* https://sillytavern.jafner.net

cd text-generation-webui
./start_linux.sh --api --flash-attn &
cd ..
# Access text-generation-webui at http://localhost:7860 *or* https://oobabooga.jafner.net