Merge docker-llm-amd into Jafner.net

Joey Hafner 2024-07-15 14:12:50 -07:00
commit 0591fa4c6d
16 changed files with 616 additions and 0 deletions

docker-llm-amd/.env Normal file

@ -0,0 +1 @@
MODELS_DIR=~/Git/docker-llm-amd/models

docker-llm-amd/.gitignore vendored Normal file

@ -0,0 +1,2 @@
models/
ollama/modelfiles/

@ -0,0 +1,25 @@
# Flash Attention in Docker on AMD is Not Yet Working
Below are my notes on the efforts I've made to get it working.
```Dockerfile
FROM rocm/pytorch-nightly:latest
COPY . .
# Build ROCm's flash-attention fork from source inside the image.
RUN git clone --recursive https://github.com/ROCm/flash-attention.git /tmp/flash-attention
WORKDIR /tmp/flash-attention
# Cap parallel compile jobs; the flash-attention build is resource-heavy.
ENV MAX_JOBS=8
RUN pip install -v .
```
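For the record, the build attempt itself is just a plain `docker build` from the directory containing this Dockerfile; the image tag below is only illustrative.
```bash
# Illustrative tag; per the notes above, this build does not yet complete successfully on AMD.
docker build -t flash-attention-rocm:test .
```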
# Resources
1. [What is Flash-attention? (How do i use it with Oobabooga?) :...](https://www.reddit.com/r/Oobabooga/comments/193mcv0/what_is_flashattention_how_do_i_use_it_with/)
2. [Adding flash attention to one click installer · Issue #4015 ...](https://github.com/oobabooga/text-generation-webui/issues/4015)
3. [Accelerating Large Language Models with Flash Attention on A...](https://rocm.blogs.amd.com/artificial-intelligence/flash-attention/README.html)
4. [GitHub - Dao-AILab/flash-attention: Fast and memory-efficien...](https://github.com/Dao-AILab/flash-attention)
5. [GitHub - ROCm/llvm-project: This is the AMD-maintained fork ...](https://github.com/ROCm/llvm-project)
6. [GitHub - ROCm/AITemplate: AITemplate is a Python framework w...](https://github.com/ROCm/AITemplate)
7. [Stable diffusion with RX7900XTX on ROCm5.7 · ROCm/composable...](https://github.com/ROCm/composable_kernel/discussions/1032#522-build-ait-and-stable-diffusion-demo)
8. [Current state of training on AMD Radeon 7900 XTX (with bench...](https://www.reddit.com/r/LocalLLaMA/comments/1atvxu2/current_state_of_training_on_amd_radeon_7900_xtx/)
9. [llm-tracker - howto/AMD GPUs](https://llm-tracker.info/howto/AMD-GPUs)
10. [RDNA3 support · Issue #27 · ROCm/flash-attention · GitHub](https://github.com/ROCm/flash-attention/issues/27)
11. [GitHub - ROCm/xformers: Hackable and optimized Transformers ...](https://github.com/ROCm/xformers/tree/develop)
12. [\[ROCm\] support Radeon™ 7900 series (gfx1100) without using...](https://github.com/vllm-project/vllm/pull/2768)

docker-llm-amd/README.md Normal file

@ -0,0 +1,63 @@
### What we have so far
1. [Ollama](https://github.com/ollama/ollama) loads and serves a few models via API.
- Ollama itself doesn't have a UI. CLI and API only.
- The API can be accessed at [`https://api.ollama.jafner.net`](https://api.ollama.jafner.net).
- Ollama running as configured supports ROCm (GPU acceleration).
- Configured models are described [here](/ollama/modelfiles/).
- Run Ollama with: `HSA_OVERRIDE_GFX_VERSION=11.0.0 OLLAMA_HOST=192.168.1.135:11434 OLLAMA_ORIGINS="app://obsidian.md*" OLLAMA_MAX_LOADED_MODELS=0 ollama serve`
2. [Open-webui](https://github.com/open-webui/open-webui) provides a pretty web interface for interacting with Ollama.
- The web UI can be accessed at [`https://ollama.jafner.net`](https://ollama.jafner.net).
- The web UI is protected by Traefik's `lan-only` rule, as well as its own authentication layer.
- Run open-webui with: `cd ~/Projects/LLMs/open-webui && docker compose up -d && docker compose logs -f`
- Then open [the page](https://ollama.jafner.net) and log in.
- Connect the frontend to the Ollama instance by opening the settings (top-right), clicking "Connections", and setting "Ollama Base URL" to "https://api.ollama.jafner.net". Hit refresh and start chatting (a quick API sanity check is sketched just after this list).
3. [SillyTavern](https://github.com/SillyTavern/SillyTavern) provides a powerful interface for building and using characters.
- Run SillyTavern with: `cd ~/Projects/LLMs/SillyTavern && ./start.sh`
4. [Oobabooga](https://github.com/oobabooga/text-generation-webui) provides a more powerful web UI than open-webui, but it's less pretty.
- Run Oobabooga with: `cd ~/Projects/LLMs/text-generation-webui && ./start_linux.sh`
- Requires the following environment variables be set in `one_click.py` (right after import statements):
```python
os.environ["ROCM_PATH"] = '/opt/rocm'
os.environ["HSA_OVERRIDE_GFX_VERSION"] = '11.0.0'
os.environ["HCC_AMDGPU_TARGET"] = 'gfx1100'
# Prepend the ROCm paths explicitly; a literal '$PATH' would not expand inside a Python string.
os.environ["PATH"] = '/opt/rocm/bin:' + os.environ.get("PATH", "")
os.environ["LD_LIBRARY_PATH"] = '/opt/rocm/lib:' + os.environ.get("LD_LIBRARY_PATH", "")
os.environ["CUDA_VISIBLE_DEVICES"] = '0'
os.environ["HCC_SERIALIZE_KERNEL"] = '0x3'
os.environ["HCC_SERIALIZE_COPY"] = '0x3'
os.environ["HIP_TRACE_API"] = '0x2'
os.environ["HF_TOKEN"] = '<my-huggingface-token>'
```
- Requires the following environment variable be set in `start_linux.sh` for access to non-public model downloads:
```
# config
HF_TOKEN="<my-huggingface-token>"
```
That's where we're at.
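Once the pieces above are running, a quick sanity check from the LAN might look like this (assuming Ollama's `/api/tags` endpoint, which lists installed models):
```bash
# Ollama API behind Traefik: should return a JSON list of installed models.
curl https://api.ollama.jafner.net/api/tags
# open-webui: expect a 200 or a redirect to the login page.
curl -I https://ollama.jafner.net
```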
### Set Up Models Directory
1. Navigate to the source directory with all models: `cd ~/"Nextcloud/Large Language Models/GGUF/"` (keep the `~` outside the quotes so it expands).
2. Link each file into the docker project's models directory: `for model in ./*; do ln $(realpath $model) $(realpath ~/Git/docker-llm-amd/models/$model); done`
- Note that these must be hard links (`ln` without `-s`), not symlinks, or the files will not resolve correctly inside the containers. A re-runnable sketch of these two steps follows this list.
3. Launch ollama: `docker compose up -d ollama`
4. Create models defined by the modelfiles: `docker compose exec -dit ollama /modelfiles/.loadmodels.sh`
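A re-runnable sketch of steps 1-2 above (same source and destination paths), using hard links and quoting the paths that contain spaces:
```bash
src=~/"Nextcloud/Large Language Models/GGUF"
dst=~/Git/docker-llm-amd/models
mkdir -p "$dst"
for model in "$src"/*; do
    # Hard link (no -s) so the files resolve correctly inside the containers.
    ln -f "$model" "$dst/$(basename "$model")"
done
```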
### Roadmap
- Set up StableDiffusion-web-UI.
- Get characters in SillyTavern behaving as expected.
- Repetition issues.
- Obsession with certain parts of prompt.
- Refusals.
- Set up something for character voices.
- [Coqui TTS - Docker install](https://github.com/coqui-ai/TTS/tree/dev?tab=readme-ov-file#docker-image).
- [TTS Generation Web UI](https://github.com/rsxdalv/tts-generation-webui).
- Set up Extras for SillyTavern.
### Notes
- So many of these projects use Python with its various versions and dependencies and shit.
- *Always* use a Docker container or virtual environment.
- It's like a condom.

@ -0,0 +1,142 @@
# Addresses:
# ollama :11434
# open-webui :3000
# sillytavern :8000
# sdwebui :7868
# oobabooga :7860 :5010
# exui :5030
version: '3'
name: 'ai'
services:
  ollama:
    container_name: ai_ollama
    image: ollama/ollama:rocm
    networks:
      - ai
    privileged: false
    group_add:
      - video
    ports:
      - 11434:11434
    devices:
      - /dev/kfd
      - /dev/dri
    volumes:
      - ./ollama/modelfiles:/modelfiles
      - $MODELS_DIR:/models
      - ollama-model-storage:/root/.ollama/models/blobs
    environment:
      - OLLAMA_ORIGINS="app://obsidian.md*"
      - OLLAMA_MAX_LOADED_MODELS=0
  open-webui:
    container_name: ai_open-webui
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - 3000:8080
    networks:
      - ai
    volumes:
      - open-webui:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
  sillytavern:
    container_name: ai_sillytavern
    image: ghcr.io/sillytavern/sillytavern:staging
    networks:
      - ai
    privileged: false
    ports:
      - 8000:8000/tcp
    volumes:
      - ./sillytavern/config/config.yaml:/home/node/app/config/config.yaml
    environment:
      - TZ=America/Los_Angeles
  sdwebui:
    container_name: ai_sdwebui
    build:
      context: ./sdwebui
    networks:
      - ai
    privileged: false
    group_add:
      - video
    ports:
      - 7868:7860
    devices:
      - /dev/kfd
      - /dev/dri
    volumes:
      - ./models_t2i:/dockerx/stable-diffusion-webui-amdgpu/models
      - ./sdwebui/images:/images
      - sdwebui_cache:/dockerx/stable-diffusion-webui-amdgpu/models/ONNX
    deploy:
      resources:
        limits:
          memory: 16G
  oobabooga:
    container_name: ai_oobabooga
    image: atinoda/text-generation-webui:base-rocm
    environment:
      - EXTRA_LAUNCH_ARGS="--listen --verbose --chat-buttons --use_flash_attention_2 --flash-attn --api --extensions openai"
    stdin_open: true
    tty: true
    networks:
      - ai
    ipc: host
    group_add:
      - video
    cap_add:
      - SYS_PTRACE
    security_opt:
      - seccomp=unconfined
    ports:
      - 7860:7860
      - 5010:5000
    devices:
      - /dev/kfd
      - /dev/dri
    volumes:
      - $MODELS_DIR:/app/models
      - oobabooga_cache:/root/.cache
      - ./oobabooga/characters:/app/characters
      - ./oobabooga/instruction-templates:/app/instruction-templates
      - ./oobabooga/loras:/app/loras
      - ./oobabooga/presets:/app/presets
      - ./oobabooga/prompts:/app/prompts
      - ./oobabooga/training:/app/training
  exui:
    container_name: ai_exui
    build:
      context: ./exl2
    networks:
      - ai
    privileged: false
    group_add:
      - video
    ports:
      - 5030:5000
    devices:
      - /dev/kfd
      - /dev/dri
    volumes:
      - $MODELS_DIR:/models
volumes:
  ollama-model-storage:
  open-webui:
  sdwebui_cache:
  oobabooga:
  oobabooga_cache:
networks:
  ai:
    name: "ai"
    ipam:
      driver: default
      config:
        - subnet: 172.20.0.0/16
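# Bring-up sketch (service names as defined above); the GPU-backed services need the
# host to expose /dev/kfd and /dev/dri:
#   ls -l /dev/kfd /dev/dri
#   docker compose up -d ollama open-webui sillytavern
#   docker compose logs -f ollama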

@ -0,0 +1,11 @@
FROM python:3.10-bookworm
RUN apt update && \
apt install --no-install-recommends -y git vim build-essential python3-dev pip bash curl && \
rm -rf /var/lib/apt/lists/*
WORKDIR /usr/src/app
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
RUN git clone https://github.com/turboderp/exui
WORKDIR /usr/src/app/exui
EXPOSE 5000
CMD ["python", "-u", "server.py", "--host", "0.0.0.0:5000"]

@ -0,0 +1,45 @@
blinker==1.8.2
certifi==2024.2.2
charset-normalizer==3.3.2
click==8.1.7
cramjam==2.8.3
exllamav2 @ https://github.com/turboderp/exllamav2/releases/download/v0.0.21/exllamav2-0.0.21+rocm6.0-cp310-cp310-linux_x86_64.whl
fastparquet==2024.5.0
filelock==3.13.1
Flask==3.0.3
fsspec==2024.2.0
huggingface-hub==0.23.1
idna==3.7
itsdangerous==2.2.0
Jinja2==3.1.3
MarkupSafe==2.1.5
mpmath==1.3.0
networkx==3.2.1
ninja==1.11.1.1
numpy==1.26.3
packaging==24.0
pandas==2.2.2
pillow==10.2.0
Pygments==2.18.0
pynvml==11.5.0
python-dateutil==2.9.0.post0
pytorch-triton-rocm @ https://download.pytorch.org/whl/pytorch_triton_rocm-2.3.0-cp310-cp310-linux_x86_64.whl
pytz==2024.1
PyYAML==6.0.1
regex==2024.5.15
requests==2.32.2
safetensors==0.4.3
sentencepiece==0.2.0
six==1.16.0
sympy==1.12
tokenizers==0.19.1
torch @ https://download.pytorch.org/whl/rocm6.0/torch-2.3.0%2Brocm6.0-cp310-cp310-linux_x86_64.whl
torchaudio @ https://download.pytorch.org/whl/rocm6.0/torchaudio-2.3.0%2Brocm6.0-cp310-cp310-linux_x86_64.whl
torchvision @ https://download.pytorch.org/whl/rocm6.0/torchvision-0.18.0%2Brocm6.0-cp310-cp310-linux_x86_64.whl
tqdm==4.66.4
typing_extensions==4.9.0
tzdata==2024.1
urllib3==2.2.1
waitress==3.0.0
websockets==12.0
Werkzeug==3.0.3

@ -0,0 +1,90 @@
# Ollama Notes
Per: [Ollama/Ollama README](https://github.com/ollama/ollama)
## Install Steps
Per: [linux.md](https://github.com/ollama/ollama/blob/main/docs/linux.md)
1. Download the binary: `sudo curl -L https://ollama.com/download/ollama-linux-amd64 -o /usr/bin/ollama`
2. Make the binary executable: `sudo chmod +x /usr/bin/ollama`
3. Create a user for ollama: `sudo useradd -r -s /bin/false -m -d /usr/share/ollama ollama`
4. Create a SystemD service file for ollama: `sudo nano /etc/systemd/system/ollama.service` and populate it with the following.
```ini
[Unit]
Description=Ollama Service
After=network-online.target
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=11.0.0"
Environment="OLLAMA_HOST=192.168.1.135:11434"
Environment="OLLAMA_ORIGINS=app://obsidian.md*"
Environment="OLLAMA_MAX_LOADED_MODELS=0"
ExecStart=/usr/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
[Install]
WantedBy=default.target
```
5. Register and enable the ollama service: `sudo systemctl daemon-reload && sudo systemctl enable ollama`
6. Start ollama: `sudo systemctl start ollama`
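To confirm the service came up and the API answers on the configured address (assuming the `/api/tags` endpoint for listing models):
```bash
sudo systemctl status ollama
journalctl -u ollama -f        # follow the service logs
curl http://192.168.1.135:11434/api/tags
```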
### Enable ROCm Support
Per: [anvesh.jhuboo on Medium](https://medium.com/@anvesh.jhuboo/rocm-pytorch-on-fedora-51224563e5be)
1. Add user to `video` group to allow access to GPU resources: `sudo usermod -aG video $LOGNAME`
2. Install `rocminfo` package: `sudo dnf install rocminfo`
3. Check for rocm support: `rocminfo`
4. Install `rocm-opencl` package: `sudo dnf install rocm-opencl`
5. Install `rocm-clinfo` package: `sudo dnf install rocm-clinfo`
6. Verify opencl is working: `rocm-clinfo`
7. Get the GFX version of your GPU: `rocminfo | grep gfx | head -n 1 | tr -s ' ' | cut -d' ' -f 3`
- The GFX version given is a stripped version number.
- My Radeon 7900 XTX has a gfx string of `gfx1100`, which correlates with HSA GFX version 11.0.0.
- Other cards commonly have a string of `gfx1030`, which correlates with HSA GFX version 10.3.0.
- There's a little bit more info [here](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html).
8. Export your gfx version in `~/.bashrc`: `echo "export HSA_OVERRIDE_GFX_VERSION=11.0.0" >> ~/.bashrc && source ~/.bashrc`
- Is this even part of the same thing? I ran `sudo dnf install https://repo.radeon.com/amdgpu-install/6.0.2/rhel/9.3/amdgpu-install-6.0.60002-1.el9.noarch.rpm`
- Maybe this is the right place to look? [Fedora wiki - AMD ROCm](https://fedoraproject.org/wiki/SIGs/HC)
## Run Ollama
1. Test Ollama is working: `ollama run gemma:2b`
- Runs (downloads) the smallest model in [Ollama's library](https://ollama.com/library).
2. Run as a docker container: `docker run -d --device /dev/kfd --device /dev/dri -v /usr/lib64:/opt/lib64:ro -e HIP_PATH=/opt/lib64/rocm -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama && docker logs -f ollama`
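A quick API smoke test against the local instance (host and port as set in `OLLAMA_HOST`, `gemma:2b` as pulled above):
```bash
curl http://192.168.1.135:11434/api/generate \
    -d '{"model": "gemma:2b", "prompt": "Say hello in one short sentence.", "stream": false}'
```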
## Update Ollama
1. Redownload the binary: `sudo curl -L https://ollama.com/download/ollama-linux-amd64 -o /usr/bin/ollama`
2. Make the binary executable: `sudo chmod +x /usr/bin/ollama`
## Create Model from Modelfile
`ollama create <model name> -f <modelfile relative path>`
Where the modelfile is like:
```
# Choose either a model tag to download from ollama.com/library, or a path to a local model file (relative to the path of the modelfile).
FROM ../Models/codellama-7b.Q8_0.gguf
# set the chatml template passed to the model
TEMPLATE """<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
# set the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1
# set the system message
SYSTEM """
You are a senior devops engineer, acting as an assistant. You offer help with cloud technologies like: Terraform, AWS, kubernetes, python. You answer with code examples when possible
"""
# stop sequences: tell the model where to stop generating (the ChatML end-of-turn markers)
PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
```
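For example (the model name and modelfile path here are only illustrative):
```bash
ollama create devops-assistant -f ./Modelfiles/devops-assistant
ollama run devops-assistant "Write a Terraform resource block for an S3 bucket."
```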
## Unload a Model
There's no official support for this in the `ollama` CLI, but we can make it happen with the API:
`curl https://api.ollama.jafner.net/api/generate -d '{"model": "<MODEL TO UNLOAD>", "keep_alive": 0}'`
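A small sketch to unload everything at once, feeding `ollama list` output into the same API call (the `awk` column parsing is an assumption about the CLI's output format):
```bash
for model in $(ollama list | awk 'NR>1 {print $1}'); do
    curl https://api.ollama.jafner.net/api/generate \
        -d "{\"model\": \"$model\", \"keep_alive\": 0}"
done
```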

@ -0,0 +1,6 @@
#!/bin/bash
for modelfile in /modelfiles/*; do
    echo "Running: 'ollama create \"$(basename "$modelfile")\" -f \"$modelfile\"'"
    ollama create "$(basename "$modelfile")" -f "$modelfile"
done

@ -0,0 +1,22 @@
#!/bin/bash
# THIS SCRIPT DOES NOT WORK RIGHT NOW
# The script is fine, it just needs the modelfiles to be written with reference
# to the models folder relative to the host system, rather than inside the
# container. We're using ./modelfiles/.loadmodels.sh instead right now.
modelfiles="$(ls ./modelfiles/)"
models="$(ollama list | tr -s ' ' | cut -f 1 | tail -n +2)"
for model in $(echo "$models"); do
if ! [[ $modelfiles == *"$model"* ]]; then
echo -n "Running: '"
echo "ollama rm \"$model\"'"
ollama rm "$model"
fi
done
cd ./modelfiles
for modelfile in ./*; do
echo -n "Running: '"
echo "ollama create \"$(basename $modelfile)\" -f \"$modelfile\"'"
ollama create "$(basename $modelfile)" -f "$modelfile"
done

@ -0,0 +1,10 @@
TORCH_CUDA_ARCH_LIST=7.5
HOST_PORT=7860
CONTAINER_PORT=7860
HOST_API_PORT=5020
CONTAINER_API_PORT=5000
BUILD_EXTENSIONS=""
APP_RUNTIME_GID=1000
APP_GID=1000
APP_UID=1000
HF_HOME=/home/app/text-generation-webui/cache/

@ -0,0 +1,66 @@
# Cloned from: https://github.com/Atinoda/text-generation-webui-docker/blob/master/Dockerfile
# Modified to install Flash-Attention-2 for AMD ROCm.
# Install instructions for FA2 are based on:
# https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/3rd-party/pytorch-install.html#using-pytorch-upstream-docker-image
# and:
# https://rocm.blogs.amd.com/artificial-intelligence/flash-attention/README.html
# Also trimmed original comments and replaced with new.
# Base build layer
FROM ubuntu:22.04 AS app_base
RUN apt-get update && apt-get install --no-install-recommends -y \
git vim build-essential python3-dev python3-venv python3-pip
RUN pip3 install virtualenv
RUN virtualenv /venv
ENV VIRTUAL_ENV=/venv
RUN python3 -m venv $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
RUN pip3 install --upgrade pip setuptools
COPY ./scripts /scripts
RUN chmod +x /scripts/*
RUN git clone https://github.com/oobabooga/text-generation-webui /src
ARG VERSION_TAG
ENV VERSION_TAG=${VERSION_TAG}
RUN . /scripts/checkout_src_version.sh
RUN cp -ar /src /app
# AMD build layer
FROM app_base AS app_rocm
RUN pip3 install --pre torch torchvision torchaudio \
--index-url https://download.pytorch.org/whl/nightly/rocm6.1
RUN pip3 install -r /app/requirements_amd.txt
RUN git clone --recursive https://github.com/ROCm/flash-attention.git /src-fa
RUN cd /src-fa && MAX_JOBS=$((`nproc` / 2)) pip install -v .
FROM app_rocm AS app_rocm_x
RUN chmod +x /scripts/build_extensions.sh && \
. /scripts/build_extensions.sh
# Base run layer
FROM ubuntu:22.04 AS run_base
RUN apt-get update && apt-get install --no-install-recommends -y \
python3-venv python3-dev git
COPY --from=app_base /app /app
COPY --from=app_base /src /src
ENV VIRTUAL_ENV=/venv
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
WORKDIR /app
EXPOSE 7860
EXPOSE 5000
EXPOSE 5005
ENV PYTHONUNBUFFERED=1
ARG BUILD_DATE
ENV BUILD_DATE=$BUILD_DATE
RUN echo "$BUILD_DATE" > /build_date.txt
ARG VERSION_TAG
ENV VERSION_TAG=$VERSION_TAG
RUN echo "$VERSION_TAG" > /version_tag.txt
COPY ./scripts /scripts
RUN chmod +x /scripts/*
ENTRYPOINT ["/scripts/docker-entrypoint.sh"]
# AMD run layer
FROM run_base AS default-rocm
COPY --from=app_rocm_x $VIRTUAL_ENV $VIRTUAL_ENV
RUN echo "ROCM Extended" > /variant.txt
ENV EXTRA_LAUNCH_ARGS=""
CMD ["python3", "/app/server.py"]

@ -0,0 +1,17 @@
FROM rocm/pytorch:latest
WORKDIR /dockerx
RUN git clone https://github.com/lshqqytiger/stable-diffusion-webui-amdgpu
WORKDIR /dockerx/stable-diffusion-webui-amdgpu
RUN python -m pip install clip open-clip-torch onnxruntime-training xformers
RUN git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui-assets.git repositories/stable-diffusion-webui-assets
RUN git clone https://github.com/Stability-AI/stablediffusion.git repositories/stable-diffusion-stability-ai
RUN git clone https://github.com/Stability-AI/generative-models.git repositories/generative-models
RUN git clone https://github.com/crowsonkb/k-diffusion.git repositories/k-diffusion
RUN git clone https://github.com/salesforce/BLIP.git repositories/BLIP
RUN python -m pip install --upgrade pip wheel
ENV REQS_FILE='requirements_versions.txt'
ENV venv_dir="-"
RUN python -m pip install -r requirements_versions.txt
ENV COMMANDLINE_ARGS="--listen --allow-code --api --administrator --no-download-sd-model --medvram --use-directml"
CMD ["python", "-u", "launch.py", "--precision", "full", "--no-half"]

@ -0,0 +1,40 @@
Below is rough-and-dirty documentation of the steps taken to set up stable-diffusion for my 7900 XTX. Sources are cited as used.
```
# Per: https://github.com/ROCm/composable_kernel/discussions/1032
# I did not read the bit where it said to execute these steps in a docker container (rocm/pytorch:rocm5.7_ubuntu22.04_py3.10_pytorch_2.0.1)
# The following is the docker run command I *would have* used:
docker run rocm/pytorch:rocm5.7_ubuntu22.04_py3.10_pytorch_2.0.1
git clone -b amd-stg-open https://github.com/RadeonOpenCompute/llvm-project.git
cd llvm-project && git checkout 1f2f539f7cab51623fad8c8a5b574eda1e81e0c0 && mkdir -p build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=1 -DLLVM_TARGETS_TO_BUILD="AMDGPU;X86" -DLLVM_ENABLE_PROJECTS="clang;lld;compiler-rt" ../llvm
make -j16 # This step takes a long time. Reduce 16 to use fewer cores for the job.
# The build errored out at ~71% with:
# make: *** [Makefile:156: all] Error 2
# So that was a big waste of time.
# It was during this step that I realized we can just spin up the provided docker container to get the pre-compiled binary...
docker run aska0096/rocm5.7_ait_ck_navi31_sd2:v1.0
# This is a ~20GB docker image. It takes a long time to pull.
cd ~/AITemplate/examples/05_stable_diffusion/
sh run_ait_sd_webui.sh
# This gave us an error about
# [05:55:20] model_interface.cpp:94: Error: DeviceMalloc(&result, n_bytes) API call failed:
# no ROCm-capable device is detected at model_interface.cpp, line49
# Per: https://www.reddit.com/r/ROCm/comments/177pwxv/how_can_i_set_up_linux_rocm_pytorch_for_7900xtx/
# We rerun the container with some extra parameters to give it access to our GPU.
docker run -it --privileged --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host aska0096/rocm5.7_ait_ck_navi31_sd2:v1.0
cd ~/AITemplate/examples/05_stable_diffusion/
sh run_ait_sd_webui.sh
# This got us to "Uvicorn running on http://0.0.0.0:5000"
# Nice. Oh. We haven't exposed any ports...
docker run -it --rm -p 5500:5000 --privileged --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host aska0096/rocm5.7_ait_ck_navi31_sd2:v1.0
cd ~/AITemplate/examples/05_stable_diffusion/ && sh run_ait_sd_webui.sh
# I think :5000 on the local host is probably already in use, so we'll just dodge the possibility of a collision.
# Uhh. Well now what? connecting to http://localhost:5500 in our browser returns a 404.
# Maybe it's the Streamlit server that we should be looking at?
docker run -it --rm -p 5500:5000 -p 5501:8501 --privileged --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host aska0096/rocm5.7_ait_ck_navi31_sd2:v1.0
cd ~/AITemplate/examples/05_stable_diffusion/ && sh run_ait_sd_webui.sh
# Ah yeah, that did it. Now we can see the Stable Diffusion test at 5501. And we can also see our VRAM utilization increase by ~6 GB when we run the script. Now just to test the performance! Nice. We get <2s to generate an image.
# Now can we generalize this to a more useful model?
```

@ -0,0 +1,46 @@
dataRoot: ./data
listen: false
port: 8000
whitelistMode: false
enableForwardedWhitelist: false
whitelist:
  - 127.0.0.1
  - 172.19.0.1
basicAuthMode: true
basicAuthUser:
  username: joey
  password: ***REMOVED***
enableCorsProxy: false
enableUserAccounts: false
enableDiscreetLogin: false
cookieSecret: Viwb315DDUewxmznF1cX1tJiLu/TW1AK8envDePAbovByvpKdJHPI5Nrcd6mpSGOkvDYy72OqhV8NnYubFA3KQ==
disableCsrfProtection: false
securityOverride: false
autorun: true
disableThumbnails: false
thumbnailsQuality: 95
avatarThumbnailsPng: false
allowKeysExposure: false
skipContentCheck: false
disableChatBackup: false
whitelistImportDomains:
  - localhost
  - cdn.discordapp.com
  - files.catbox.moe
  - raw.githubusercontent.com
requestOverrides: []
enableExtensions: true
extras:
  disableAutoDownload: false
  classificationModel: Cohee/distilbert-base-uncased-go-emotions-onnx
  captioningModel: Xenova/vit-gpt2-image-captioning
  embeddingModel: Cohee/jina-embeddings-v2-base-en
  promptExpansionModel: Cohee/fooocus_expansion-onnx
  speechToTextModel: Xenova/whisper-small
  textToSpeechModel: Xenova/speecht5_tts
openai:
  randomizeUserId: false
  captionSystemPrompt: ""
deepl:
  formality: default
enableServerPlugins: false

docker-llm-amd/up Executable file

@ -0,0 +1,30 @@
#!/bin/bash
initialdir=${PWD}
cd /home/joey/Projects/LLMs/
# Start ollama only if the API is not already reachable.
if ! wget -q --spider https://ollama-api.jafner.net; then
    cd ollama
    # Export the variables so the ollama process actually sees them.
    export HSA_OVERRIDE_GFX_VERSION=11.0.0
    export OLLAMA_HOST=192.168.1.135:11434
    export OLLAMA_ORIGINS="app://obsidian.md*"
    export OLLAMA_MAX_LOADED_MODELS=0
    ollama serve &
    cd ..
fi
# Access ollama (API, not a web UI) at http://192.168.1.135:11434 *or* https://ollama-api.jafner.net
cd open-webui
docker compose up -d
cd ..
# Access open-webui at http://localhost:8080 *or* https://openwebui.jafner.net
cd SillyTavern
./start.sh &
cd ..
# Access SillyTavern at http://localhost:5000 *or* https://sillytavern.jafner.net
cd text-generation-webui
./start_linux.sh --api --flash-attn &
cd ..
# Access text-generation-webui at http://localhost:7860 *or* https://oobabooga.jafner.net