Merge docker-llm-amd into Jafner.net
Commit: 0591fa4c6d

docker-llm-amd/.env (new file)
MODELS_DIR=~/Git/docker-llm-amd/models

docker-llm-amd/.gitignore (new file)
models/
ollama/modelfiles/

docker-llm-amd/FLASH-ATTENTION.md (new file)
# Flash Attention in Docker on AMD is Not Yet Working
Below are my notes on the efforts I've made to get it working.

```Dockerfile
FROM rocm/pytorch-nightly:latest
COPY . .
RUN git clone --recursive https://github.com/ROCm/flash-attention.git /tmp/flash-attention
WORKDIR /tmp/flash-attention
ENV MAX_JOBS=8
RUN pip install -v .
```
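
If a build of this image ever finishes cleanly, a sanity check along these lines should tell whether the ROCm wheel actually imports. This is only a sketch; the `flash-attn-rocm` tag and the device passthrough flags are assumptions, not something tested here yet:

```bash
# Build the image from the Dockerfile above (the tag name is arbitrary).
docker build -t flash-attn-rocm .

# Run it with the GPU passed through and try importing the module.
docker run --rm --device /dev/kfd --device /dev/dri --group-add video \
  flash-attn-rocm \
  python -c "import flash_attn; print(flash_attn.__version__)"
```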

# Resources
1. [What is Flash-attention? (How do i use it with Oobabooga?) :...](https://www.reddit.com/r/Oobabooga/comments/193mcv0/what_is_flashattention_how_do_i_use_it_with/)
2. [Adding flash attention to one click installer · Issue #4015 ...](https://github.com/oobabooga/text-generation-webui/issues/4015)
3. [Accelerating Large Language Models with Flash Attention on A...](https://rocm.blogs.amd.com/artificial-intelligence/flash-attention/README.html)
4. [GitHub - Dao-AILab/flash-attention: Fast and memory-efficien...](https://github.com/Dao-AILab/flash-attention)
5. [GitHub - ROCm/llvm-project: This is the AMD-maintained fork ...](https://github.com/ROCm/llvm-project)
6. [GitHub - ROCm/AITemplate: AITemplate is a Python framework w...](https://github.com/ROCm/AITemplate)
7. [Stable diffusion with RX7900XTX on ROCm5.7 · ROCm/composable...](https://github.com/ROCm/composable_kernel/discussions/1032#522-build-ait-and-stable-diffusion-demo)
8. [Current state of training on AMD Radeon 7900 XTX (with bench...](https://www.reddit.com/r/LocalLLaMA/comments/1atvxu2/current_state_of_training_on_amd_radeon_7900_xtx/)
9. [llm-tracker - howto/AMD GPUs](https://llm-tracker.info/howto/AMD-GPUs)
10. [RDNA3 support · Issue #27 · ROCm/flash-attention · GitHub](https://github.com/ROCm/flash-attention/issues/27)
11. [GitHub - ROCm/xformers: Hackable and optimized Transformers ...](https://github.com/ROCm/xformers/tree/develop)
12. [\[ROCm\] support Radeon™ 7900 series (gfx1100) without using...](https://github.com/vllm-project/vllm/pull/2768)

docker-llm-amd/README.md (new file)
### What we have so far

1. [Ollama](https://github.com/ollama/ollama) loads and serves a few models via API.
    - Ollama itself doesn't have a UI; it's CLI and API only.
    - The API can be accessed at [`https://api.ollama.jafner.net`](https://api.ollama.jafner.net) (a quick check follows this list).
    - Ollama running as configured supports ROCm (GPU acceleration).
    - Configured models are described [here](/ollama/modelfiles/).
    - Run Ollama with: `HSA_OVERRIDE_GFX_VERSION=11.0.0 OLLAMA_HOST=192.168.1.135:11434 OLLAMA_ORIGINS="app://obsidian.md*" OLLAMA_MAX_LOADED_MODELS=0 ollama serve`
2. [Open-webui](https://github.com/open-webui/open-webui) provides a pretty web interface for interacting with Ollama.
    - The web UI can be accessed at [`https://ollama.jafner.net`](https://ollama.jafner.net).
    - The web UI is protected by Traefik's `lan-only` rule, as well as its own authentication layer.
    - Run open-webui with: `cd ~/Projects/LLMs/open-webui && docker compose up -d && docker compose logs -f`
    - Then open [the page](https://ollama.jafner.net) and log in.
    - Connect the frontend to the Ollama instance by opening the settings (top-right), clicking "Connections", and setting "Ollama Base URL" to "https://api.ollama.jafner.net". Hit refresh and begin using it.
3. [SillyTavern](https://github.com/SillyTavern/SillyTavern) provides a powerful interface for building and using characters.
    - Run SillyTavern with: `cd ~/Projects/LLMs/SillyTavern && ./start.sh`
4. [Oobabooga](https://github.com/oobabooga/text-generation-webui) provides a more powerful web UI than open-webui, but it's less pretty.
    - Run Oobabooga with: `cd ~/Projects/LLMs/text-generation-webui && ./start_linux.sh`
    - Requires the following environment variables to be set in `one_click.py` (right after the import statements):
    ```
    os.environ["ROCM_PATH"] = '/opt/rocm'
    os.environ["HSA_OVERRIDE_GFX_VERSION"] = '11.0.0'
    os.environ["HCC_AMDGPU_TARGET"] = 'gfx1100'
    os.environ["PATH"] = '/opt/rocm/bin:$PATH'
    os.environ["LD_LIBRARY_PATH"] = '/opt/rocm/lib:$LD_LIBRARY_PATH'
    os.environ["CUDA_VISIBLE_DEVICES"] = '0'
    os.environ["HCC_SERIALIZE_KERNEL"] = '0x3'
    os.environ["HCC_SERIALIZE_COPY"] = '0x3'
    os.environ["HIP_TRACE_API"] = '0x2'
    os.environ["HF_TOKEN"] = '<my-huggingface-token>'
    ```
    - Requires the following environment variable to be set in `start_linux.sh` for access to non-public model downloads:
    ```
    # config
    HF_TOKEN="<my-huggingface-token>"
    ```
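
As a quick check that the Ollama API mentioned in item 1 is actually serving, listing its models should return JSON. This is just a sketch against the endpoints named above:

```bash
# List the models Ollama is serving; an empty "models" array is still a healthy response.
curl -s https://api.ollama.jafner.net/api/tags

# Or hit the instance directly, bypassing the proxy.
curl -s http://192.168.1.135:11434/api/tags
```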

That's where we're at.

### Set Up Models Directory
1. Navigate to the source directory with all models: `cd ~/"Nextcloud/Large Language Models/GGUF/"`
2. Hard-link each file into the docker project's models directory: `for model in ./*; do ln "$(realpath "$model")" "$(realpath ~/Git/docker-llm-amd/models/"$model")"; done`
    - Note that the links must be hard links (`ln`, not `ln -s`): a symlink's target path doesn't exist inside the containers, so the models won't be passed through properly. A quick check follows this list.
3. Launch ollama: `docker compose up -d ollama`
4. Create models defined by the modelfiles: `docker compose exec -dit ollama /modelfiles/.loadmodels.sh`
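
To verify the links landed as hard links (the check mentioned in step 2), the link count `stat` reports should be at least 2 for every model file. A minimal sketch, assuming GNU coreutils:

```bash
# %h is the hard-link count; a value of 1 means the file was copied or symlinked, not hard-linked.
stat -c '%h %n' ~/Git/docker-llm-amd/models/*
```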

### Roadmap
- Set up StableDiffusion-web-UI.
- Get characters in SillyTavern behaving as expected.
    - Repetition issues.
    - Obsession with certain parts of the prompt.
    - Refusals.
- Set up something for character voices.
    - [Coqui TTS - Docker install](https://github.com/coqui-ai/TTS/tree/dev?tab=readme-ov-file#docker-image).
    - [TTS Generation Web UI](https://github.com/rsxdalv/tts-generation-webui).
- Set up Extras for SillyTavern.

### Notes
- So many of these projects use Python with its various versions and dependencies and shit.
    - *Always* use a Docker container or virtual environment.
    - It's like a condom.

docker-llm-amd/docker-compose.yml (new file)
# Addresses:
# ollama       :11434
# open-webui   :3000
# sillytavern  :8000
# sdwebui      :7868
# oobabooga    :7860 :5010
# exui         :5030

version: '3'
name: 'ai'
services:
  ollama:
    container_name: ai_ollama
    image: ollama/ollama:rocm
    networks:
      - ai
    privileged: false
    group_add:
      - video
    ports:
      - 11434:11434
    devices:
      - /dev/kfd
      - /dev/dri
    volumes:
      - ./ollama/modelfiles:/modelfiles
      - $MODELS_DIR:/models
      - ollama-model-storage:/root/.ollama/models/blobs
    environment:
      - OLLAMA_ORIGINS=app://obsidian.md*
      - OLLAMA_MAX_LOADED_MODELS=0

  open-webui:
    container_name: ai_open-webui
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - 3000:8080
    networks:
      - ai
    volumes:
      - open-webui:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434

  sillytavern:
    container_name: ai_sillytavern
    image: ghcr.io/sillytavern/sillytavern:staging
    networks:
      - ai
    privileged: false
    ports:
      - 8000:8000/tcp
    volumes:
      - ./sillytavern/config/config.yaml:/home/node/app/config/config.yaml
    environment:
      - TZ=America/Los_Angeles

  sdwebui:
    container_name: ai_sdwebui
    build:
      context: ./sdwebui
    networks:
      - ai
    privileged: false
    group_add:
      - video
    ports:
      - 7868:7860
    devices:
      - /dev/kfd
      - /dev/dri
    volumes:
      - ./models_t2i:/dockerx/stable-diffusion-webui-amdgpu/models
      - ./sdwebui/images:/images
      - sdwebui_cache:/dockerx/stable-diffusion-webui-amdgpu/models/ONNX
    deploy:
      resources:
        limits:
          memory: 16G

  oobabooga:
    container_name: ai_oobabooga
    image: atinoda/text-generation-webui:base-rocm
    environment:
      - EXTRA_LAUNCH_ARGS="--listen --verbose --chat-buttons --use_flash_attention_2 --flash-attn --api --extensions openai"
    stdin_open: true
    tty: true
    networks:
      - ai
    ipc: host
    group_add:
      - video
    cap_add:
      - SYS_PTRACE
    security_opt:
      - seccomp=unconfined
    ports:
      - 7860:7860
      - 5010:5000
    devices:
      - /dev/kfd
      - /dev/dri
    volumes:
      - $MODELS_DIR:/app/models
      - oobabooga_cache:/root/.cache
      - ./oobabooga/characters:/app/characters
      - ./oobabooga/instruction-templates:/app/instruction-templates
      - ./oobabooga/loras:/app/loras
      - ./oobabooga/presets:/app/presets
      - ./oobabooga/prompts:/app/prompts
      - ./oobabooga/training:/app/training

  exui:
    container_name: ai_exui
    build:
      context: ./exl2
    networks:
      - ai
    privileged: false
    group_add:
      - video
    ports:
      - 5030:5000
    devices:
      - /dev/kfd
      - /dev/dri
    volumes:
      - $MODELS_DIR:/models

volumes:
  ollama-model-storage:
  open-webui:
  sdwebui_cache:
  oobabooga:
  oobabooga_cache:
networks:
  ai:
    name: "ai"
    ipam:
      driver: default
      config:
        - subnet: 172.20.0.0/16

docker-llm-amd/exui/Dockerfile (new file)
FROM python:3.10-bookworm
RUN apt update && \
    apt install --no-install-recommends -y git vim build-essential python3-dev pip bash curl && \
    rm -rf /var/lib/apt/lists/*
WORKDIR /usr/src/app
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
RUN git clone https://github.com/turboderp/exui
WORKDIR /usr/src/app/exui
EXPOSE 5000
CMD ["python", "-u", "server.py", "--host", "0.0.0.0:5000"]

docker-llm-amd/exui/requirements.txt (new file)
blinker==1.8.2
certifi==2024.2.2
charset-normalizer==3.3.2
click==8.1.7
cramjam==2.8.3
exllamav2 @ https://github.com/turboderp/exllamav2/releases/download/v0.0.21/exllamav2-0.0.21+rocm6.0-cp310-cp310-linux_x86_64.whl
fastparquet==2024.5.0
filelock==3.13.1
Flask==3.0.3
fsspec==2024.2.0
huggingface-hub==0.23.1
idna==3.7
itsdangerous==2.2.0
Jinja2==3.1.3
MarkupSafe==2.1.5
mpmath==1.3.0
networkx==3.2.1
ninja==1.11.1.1
numpy==1.26.3
packaging==24.0
pandas==2.2.2
pillow==10.2.0
Pygments==2.18.0
pynvml==11.5.0
python-dateutil==2.9.0.post0
pytorch-triton-rocm @ https://download.pytorch.org/whl/pytorch_triton_rocm-2.3.0-cp310-cp310-linux_x86_64.whl
pytz==2024.1
PyYAML==6.0.1
regex==2024.5.15
requests==2.32.2
safetensors==0.4.3
sentencepiece==0.2.0
six==1.16.0
sympy==1.12
tokenizers==0.19.1
torch @ https://download.pytorch.org/whl/rocm6.0/torch-2.3.0%2Brocm6.0-cp310-cp310-linux_x86_64.whl
torchaudio @ https://download.pytorch.org/whl/rocm6.0/torchaudio-2.3.0%2Brocm6.0-cp310-cp310-linux_x86_64.whl
torchvision @ https://download.pytorch.org/whl/rocm6.0/torchvision-0.18.0%2Brocm6.0-cp310-cp310-linux_x86_64.whl
tqdm==4.66.4
typing_extensions==4.9.0
tzdata==2024.1
urllib3==2.2.1
waitress==3.0.0
websockets==12.0
Werkzeug==3.0.3

docker-llm-amd/ollama/README.md (new file)
# Ollama Notes
Per: [Ollama/Ollama README](https://github.com/ollama/ollama)

## Install Steps
Per: [linux.md](https://github.com/ollama/ollama/blob/main/docs/linux.md)

1. Download the binary: `sudo curl -L https://ollama.com/download/ollama-linux-amd64 -o /usr/bin/ollama`
2. Make the binary executable: `sudo chmod +x /usr/bin/ollama`
3. Create a user for ollama: `sudo useradd -r -s /bin/false -m -d /usr/share/ollama ollama`
4. Create a systemd service file for ollama: `sudo nano /etc/systemd/system/ollama.service` and populate it with the following.
    ```ini
    [Unit]
    Description=Ollama Service
    After=network-online.target

    [Service]
    Environment="HSA_OVERRIDE_GFX_VERSION=11.0.0"
    Environment="OLLAMA_HOST=192.168.1.135:11434"
    Environment="OLLAMA_ORIGINS=app://obsidian.md*"
    Environment="OLLAMA_MAX_LOADED_MODELS=0"
    ExecStart=/usr/bin/ollama serve
    User=ollama
    Group=ollama
    Restart=always
    RestartSec=3

    [Install]
    WantedBy=default.target
    ```
5. Register and enable the ollama service: `sudo systemctl daemon-reload && sudo systemctl enable ollama`
6. Start ollama: `sudo systemctl start ollama` (a quick status check follows this list).
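
To confirm the unit actually came up (the status check mentioned in step 6), the standard systemd tooling is enough:

```bash
# Confirm the service is active, then follow its logs to watch for ROCm/GPU detection messages.
systemctl status ollama
journalctl -u ollama -f
```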

### Enable ROCm Support
Per: [anvesh.jhuboo on Medium](https://medium.com/@anvesh.jhuboo/rocm-pytorch-on-fedora-51224563e5be)

1. Add your user to the `video` group to allow access to GPU resources: `sudo usermod -aG video $LOGNAME`
2. Install the `rocminfo` package: `sudo dnf install rocminfo`
3. Check for ROCm support: `rocminfo`
4. Install the `rocm-opencl` package: `sudo dnf install rocm-opencl`
5. Install the `rocm-clinfo` package: `sudo dnf install rocm-clinfo`
6. Verify OpenCL is working: `rocm-clinfo`
7. Get the GFX version of your GPU: `rocminfo | grep gfx | head -n 1 | tr -s ' ' | cut -d' ' -f 3`
    - The GFX version given is a stripped version number (see the sketch after this list for the mapping).
    - My Radeon 7900 XTX has a gfx string of `gfx1100`, which correlates with HSA GFX version 11.0.0.
    - Other cards commonly have a string of `gfx1030`, which correlates with HSA GFX version 10.3.0.
    - There's a little bit more info [here](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html).
8. Export your GFX version in `~/.bashrc`: `echo "export HSA_OVERRIDE_GFX_VERSION=11.0.0" >> ~/.bashrc && source ~/.bashrc`
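
The mapping from step 7 can be scripted. This is only a sketch and assumes the four-digit RDNA-style gfx strings shown above (`gfx1100` → 11.0.0, `gfx1030` → 10.3.0); older three-digit strings like `gfx906` don't split this way:

```bash
# Take the first gfx string rocminfo reports and split it as major.minor.stepping.
gfx=$(rocminfo | grep -o 'gfx[0-9a-f]*' | head -n 1)
ver=${gfx#gfx}
echo "export HSA_OVERRIDE_GFX_VERSION=${ver:0:2}.${ver:2:1}.${ver:3:1}"
```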

- Is this even part of the same thing? I ran `sudo dnf install https://repo.radeon.com/amdgpu-install/6.0.2/rhel/9.3/amdgpu-install-6.0.60002-1.el9.noarch.rpm`
- Maybe this is the right place to look? [Fedora wiki - AMD ROCm](https://fedoraproject.org/wiki/SIGs/HC)

## Run Ollama
1. Test that Ollama is working: `ollama run gemma:2b`
    - Downloads and runs the smallest model in [Ollama's library](https://ollama.com/library).
2. Run as a docker container: `docker run -d --device /dev/kfd --device /dev/dri -v /usr/lib64:/opt/lib64:ro -e HIP_PATH=/opt/lib64/rocm -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama && docker logs -f ollama`

## Update Ollama
1. Redownload the binary: `sudo curl -L https://ollama.com/download/ollama-linux-amd64 -o /usr/bin/ollama`
2. Make the binary executable: `sudo chmod +x /usr/bin/ollama`

## Create Model from Modelfile
`ollama create <model name> -f <modelfile relative path>`

Where the modelfile looks like:
```
# Choose either a model tag to download from ollama.com/library, or a path to a local model file (relative to the path of the modelfile).
FROM ../Models/codellama-7b.Q8_0.gguf

# Set the ChatML template passed to the model.
TEMPLATE """<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

# Set the temperature to 1 [higher is more creative, lower is more coherent].
PARAMETER temperature 1

# Set the system message.
SYSTEM """
You are a senior devops engineer, acting as an assistant. You offer help with cloud technologies like: Terraform, AWS, kubernetes, python. You answer with code examples when possible
"""

# Stop sequences: cut generation off at the ChatML role markers.
PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
```
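
For example, with a modelfile like the one above saved as `./modelfiles/devops-assistant` (a hypothetical name), creating and chatting with the model looks like:

```bash
# Build the model from the modelfile, then start an interactive session with it.
ollama create devops-assistant -f ./modelfiles/devops-assistant
ollama run devops-assistant
```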

## Unload a Model
There's no official support for this in the `ollama` CLI, but we can make it happen with the API:
`curl https://api.ollama.jafner.net/api/generate -d '{"model": "<MODEL TO UNLOAD>", "keep_alive": 0}'`

docker-llm-amd/ollama/modelfiles/.loadmodels.sh (new executable file)
#!/bin/bash
# Create an Ollama model from every modelfile in /modelfiles.
# Run inside the ollama container: docker compose exec -dit ollama /modelfiles/.loadmodels.sh
for modelfile in /modelfiles/*; do
    echo -n "Running: '"
    echo "ollama create \"$(basename $modelfile)\" -f \"$modelfile\"'"
    ollama create "$(basename $modelfile)" -f "$modelfile"
done

docker-llm-amd/ollama/sync-modelfiles (new executable file)
#!/bin/bash
# THIS SCRIPT DOES NOT WORK RIGHT NOW
# The script is fine, it just needs the modelfiles to be written with reference
# to the models folder relative to the host system, rather than inside the
# container. We're using ./modelfiles/.loadmodels.sh instead right now.

# Remove any ollama model that no longer has a matching modelfile.
modelfiles="$(ls ./modelfiles/)"
models="$(ollama list | tr -s ' ' | cut -f 1 | tail -n +2)"
for model in $models; do
    if ! [[ $modelfiles == *"$model"* ]]; then
        echo -n "Running: '"
        echo "ollama rm \"$model\"'"
        ollama rm "$model"
    fi
done

# (Re)create a model from every modelfile.
cd ./modelfiles
for modelfile in ./*; do
    echo -n "Running: '"
    echo "ollama create \"$(basename $modelfile)\" -f \"$modelfile\"'"
    ollama create "$(basename $modelfile)" -f "$modelfile"
done

docker-llm-amd/oobabooga.env (new file)
TORCH_CUDA_ARCH_LIST=7.5
HOST_PORT=7860
CONTAINER_PORT=7860
HOST_API_PORT=5020
CONTAINER_API_PORT=5000
BUILD_EXTENSIONS=""
APP_RUNTIME_GID=1000
APP_GID=1000
APP_UID=1000
HF_HOME=/home/app/text-generation-webui/cache/

docker-llm-amd/oobabooga/Dockerfile (new file)
# Cloned from: https://github.com/Atinoda/text-generation-webui-docker/blob/master/Dockerfile
# Modified to install Flash-Attention-2 for AMD ROCm.
# Install instructions for FA2 are based on:
# https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/3rd-party/pytorch-install.html#using-pytorch-upstream-docker-image
# and:
# https://rocm.blogs.amd.com/artificial-intelligence/flash-attention/README.html
# Also trimmed original comments and replaced with new.

# Base build layer
FROM ubuntu:22.04 AS app_base
RUN apt-get update && apt-get install --no-install-recommends -y \
    git vim build-essential python3-dev python3-venv python3-pip
RUN pip3 install virtualenv
RUN virtualenv /venv
ENV VIRTUAL_ENV=/venv
RUN python3 -m venv $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
RUN pip3 install --upgrade pip setuptools
COPY ./scripts /scripts
RUN chmod +x /scripts/*
RUN git clone https://github.com/oobabooga/text-generation-webui /src
ARG VERSION_TAG
ENV VERSION_TAG=${VERSION_TAG}
RUN . /scripts/checkout_src_version.sh
RUN cp -ar /src /app

# AMD build layer
FROM app_base AS app_rocm
RUN pip3 install --pre torch torchvision torchaudio \
    --index-url https://download.pytorch.org/whl/nightly/rocm6.1
RUN pip3 install -r /app/requirements_amd.txt
RUN git clone --recursive https://github.com/ROCm/flash-attention.git /src-fa
RUN cd /src-fa && MAX_JOBS=$((`nproc` / 2)) pip install -v .
FROM app_rocm AS app_rocm_x
RUN chmod +x /scripts/build_extensions.sh && \
    . /scripts/build_extensions.sh

# Base run layer
FROM ubuntu:22.04 AS run_base
RUN apt-get update && apt-get install --no-install-recommends -y \
    python3-venv python3-dev git
COPY --from=app_base /app /app
COPY --from=app_base /src /src
ENV VIRTUAL_ENV=/venv
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
WORKDIR /app
EXPOSE 7860
EXPOSE 5000
EXPOSE 5005
ENV PYTHONUNBUFFERED=1
ARG BUILD_DATE
ENV BUILD_DATE=$BUILD_DATE
RUN echo "$BUILD_DATE" > /build_date.txt
ARG VERSION_TAG
ENV VERSION_TAG=$VERSION_TAG
RUN echo "$VERSION_TAG" > /version_tag.txt
COPY ./scripts /scripts
RUN chmod +x /scripts/*
ENTRYPOINT ["/scripts/docker-entrypoint.sh"]

# AMD run layer
FROM run_base AS default-rocm
COPY --from=app_rocm_x $VIRTUAL_ENV $VIRTUAL_ENV
RUN echo "ROCM Extended" > /variant.txt
ENV EXTRA_LAUNCH_ARGS=""
CMD ["python3", "/app/server.py"]

docker-llm-amd/sdwebui/Dockerfile (new file)
FROM rocm/pytorch:latest
WORKDIR /dockerx
RUN git clone https://github.com/lshqqytiger/stable-diffusion-webui-amdgpu
WORKDIR /dockerx/stable-diffusion-webui-amdgpu
RUN python -m pip install clip open-clip-torch onnxruntime-training xformers
RUN git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui-assets.git repositories/stable-diffusion-webui-assets
RUN git clone https://github.com/Stability-AI/stablediffusion.git repositories/stable-diffusion-stability-ai
RUN git clone https://github.com/Stability-AI/generative-models.git repositories/generative-models
RUN git clone https://github.com/crowsonkb/k-diffusion.git repositories/k-diffusion
RUN git clone https://github.com/salesforce/BLIP.git repositories/BLIP

RUN python -m pip install --upgrade pip wheel
ENV REQS_FILE='requirements_versions.txt'
ENV venv_dir="-"
RUN python -m pip install -r requirements_versions.txt
ENV COMMANDLINE_ARGS="--listen --allow-code --api --administrator --no-download-sd-model --medvram --use-directml"
CMD ["python", "-u", "launch.py", "--precision", "full", "--no-half"]

docker-llm-amd/sdwebui/README.md (new file)
Below is a rough and dirty documentation of steps taken to set up stable-diffusion for my 7900 XTX. Sources cited as used.
```
# Per: https://github.com/ROCm/composable_kernel/discussions/1032
# I did not read the bit where it said to execute these steps in a docker container (rocm/pytorch:rocm5.7_ubuntu22.04_py3.10_pytorch_2.0.1)
# The following is the docker run command I *would have* used:
docker run rocm/pytorch:rocm5.7_ubuntu22.04_py3.10_pytorch_2.0.1

git clone -b amd-stg-open https://github.com/RadeonOpenCompute/llvm-project.git #
cd llvm-project && git checkout 1f2f539f7cab51623fad8c8a5b574eda1e81e0c0 && mkdir -p build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=1 -DLLVM_TARGETS_TO_BUILD="AMDGPU;X86" -DLLVM_ENABLE_PROJECTS="clang;lld;compiler-rt" ../llvm
make -j16 # This step takes a long time. Reduce 16 to use fewer cores for the job.
# The build errored out at ~71% with:
# make: *** [Makefile:156: all] Error 2
# So that was a big waste of time.
# It was during this step that I realized we can just spin up the provided docker container to get the pre-compiled binary...

docker run aska0096/rocm5.7_ait_ck_navi31_sd2:v1.0
# This is a ~20GB docker image. It takes a long time to pull.
cd ~/AITemplate/examples/05_stable_diffusion/
sh run_ait_sd_webui.sh
# This gave us an error about
# [05:55:20] model_interface.cpp:94: Error: DeviceMalloc(&result, n_bytes) API call failed:
# no ROCm-capable device is detected at model_interface.cpp, line49
# Per: https://www.reddit.com/r/ROCm/comments/177pwxv/how_can_i_set_up_linux_rocm_pytorch_for_7900xtx/
# We rerun the container with some extra parameters to give it access to our GPU.
docker run -it --privileged --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host aska0096/rocm5.7_ait_ck_navi31_sd2:v1.0
cd ~/AITemplate/examples/05_stable_diffusion/
sh run_ait_sd_webui.sh
# This got us to "Uvicorn running on http://0.0.0.0:5000"
# Nice. Oh. We haven't exposed any ports...
docker run -it --rm -p 5500:5000 --privileged --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host aska0096/rocm5.7_ait_ck_navi31_sd2:v1.0
cd ~/AITemplate/examples/05_stable_diffusion/ && sh run_ait_sd_webui.sh
# I think :5000 on the local host is probably already in use, so we'll just dodge the possibility of a collision.
# Uhh. Well now what? Connecting to http://localhost:5500 in our browser returns a 404.
# Maybe it's the Streamlit server that we should be looking at?
docker run -it --rm -p 5500:5000 -p 5501:8501 --privileged --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host aska0096/rocm5.7_ait_ck_navi31_sd2:v1.0
cd ~/AITemplate/examples/05_stable_diffusion/ && sh run_ait_sd_webui.sh
# Ah yeah, that did it. Now we can see the Stable Diffusion test at 5501. And we can also see our VRAM utilization increase by ~6 GB when we run the script. Now just to test the performance! Nice. We get <2s to generate an image.
# Now can we generalize this to a more useful model?
```

docker-llm-amd/sillytavern/config/config.yaml (new file)
dataRoot: ./data
listen: false
port: 8000
whitelistMode: false
enableForwardedWhitelist: false
whitelist:
  - 127.0.0.1
  - 172.19.0.1
basicAuthMode: true
basicAuthUser:
  username: joey
  password: ***REMOVED***
enableCorsProxy: false
enableUserAccounts: false
enableDiscreetLogin: false
cookieSecret: Viwb315DDUewxmznF1cX1tJiLu/TW1AK8envDePAbovByvpKdJHPI5Nrcd6mpSGOkvDYy72OqhV8NnYubFA3KQ==
disableCsrfProtection: false
securityOverride: false
autorun: true
disableThumbnails: false
thumbnailsQuality: 95
avatarThumbnailsPng: false
allowKeysExposure: false
skipContentCheck: false
disableChatBackup: false
whitelistImportDomains:
  - localhost
  - cdn.discordapp.com
  - files.catbox.moe
  - raw.githubusercontent.com
requestOverrides: []
enableExtensions: true
extras:
  disableAutoDownload: false
  classificationModel: Cohee/distilbert-base-uncased-go-emotions-onnx
  captioningModel: Xenova/vit-gpt2-image-captioning
  embeddingModel: Cohee/jina-embeddings-v2-base-en
  promptExpansionModel: Cohee/fooocus_expansion-onnx
  speechToTextModel: Xenova/whisper-small
  textToSpeechModel: Xenova/speecht5_tts
openai:
  randomizeUserId: false
  captionSystemPrompt: ""
deepl:
  formality: default
enableServerPlugins: false

docker-llm-amd/up (new executable file)
#!/bin/bash
initialdir=${PWD}
cd /home/joey/Projects/LLMs/

# Start ollama only if the API isn't already reachable through the proxy.
wget -q --spider https://ollama-api.jafner.net
if [ $? -ne 0 ]; then
    cd ollama
    HSA_OVERRIDE_GFX_VERSION=11.0.0 \
    OLLAMA_HOST=192.168.1.135:11434 \
    OLLAMA_ORIGINS="app://obsidian.md*" \
    OLLAMA_MAX_LOADED_MODELS=0 \
    ollama serve &
    cd ..
fi
# Access ollama (API, not a web UI) at http://192.168.1.135:11434 *or* https://ollama-api.jafner.net

cd open-webui
docker compose up -d
cd ..
# Access open-webui at http://localhost:8080 *or* https://openwebui.jafner.net

cd SillyTavern
./start.sh &
cd ..
# Access SillyTavern at http://localhost:5000 *or* https://sillytavern.jafner.net

cd text-generation-webui
./start_linux.sh --api --flash-attn &
cd ..
# Access text-generation-webui at http://localhost:7860 *or* https://oobabooga.jafner.net