diff --git a/docker-llm-amd/.env b/docker-llm-amd/.env new file mode 100644 index 00000000..b1c5151a --- /dev/null +++ b/docker-llm-amd/.env @@ -0,0 +1 @@ +MODELS_DIR=~/Git/docker-llm-amd/models \ No newline at end of file diff --git a/docker-llm-amd/.gitignore b/docker-llm-amd/.gitignore new file mode 100644 index 00000000..651832e5 --- /dev/null +++ b/docker-llm-amd/.gitignore @@ -0,0 +1,2 @@ +models/ +ollama/modelfiles/ \ No newline at end of file diff --git a/docker-llm-amd/FLASH-ATTENTION.md b/docker-llm-amd/FLASH-ATTENTION.md new file mode 100644 index 00000000..c3e76408 --- /dev/null +++ b/docker-llm-amd/FLASH-ATTENTION.md @@ -0,0 +1,25 @@ +# Flash Attention in Docker on AMD is Not Yet Working +Below are my notes on the efforts I've made to get it working. + +```Dockerfile +FROM rocm/pytorch-nightly:latest +COPY . . +RUN git clone --recursive https://github.com/ROCm/flash-attention.git /tmp/flash-attention +WORKDIR /tmp/flash-attention +ENV MAX_JOBS=8 +RUN pip install -v . +``` + +# Resources +1. [What is Flash-attention? (How do i use it with Oobabooga?) :...](https://www.reddit.com/r/Oobabooga/comments/193mcv0/what_is_flashattention_how_do_i_use_it_with/) +2. [Adding flash attention to one click installer · Issue #4015 ...](https://github.com/oobabooga/text-generation-webui/issues/4015) +3. [Accelerating Large Language Models with Flash Attention on A...](https://rocm.blogs.amd.com/artificial-intelligence/flash-attention/README.html) +4. [GitHub - Dao-AILab/flash-attention: Fast and memory-efficien...](https://github.com/Dao-AILab/flash-attention) +5. [GitHub - ROCm/llvm-project: This is the AMD-maintained fork ...](https://github.com/ROCm/llvm-project) +6. [GitHub - ROCm/AITemplate: AITemplate is a Python framework w...](https://github.com/ROCm/AITemplate) +7. [Stable diffusion with RX7900XTX on ROCm5.7 · ROCm/composable...](https://github.com/ROCm/composable_kernel/discussions/1032#522-build-ait-and-stable-diffusion-demo) +8. [Current state of training on AMD Radeon 7900 XTX (with bench...](https://www.reddit.com/r/LocalLLaMA/comments/1atvxu2/current_state_of_training_on_amd_radeon_7900_xtx/) [[Current state of training on AMD Radeon 7900 XTX (with benchmarks) rLocalLLaMA]] +9. [llm-tracker - howto/AMD GPUs](https://llm-tracker.info/howto/AMD-GPUs) +10. [RDNA3 support · Issue #27 · ROCm/flash-attention · GitHub](https://github.com/ROCm/flash-attention/issues/27) +11. [GitHub - ROCm/xformers: Hackable and optimized Transformers ...](https://github.com/ROCm/xformers/tree/develop) +12. [\[ROCm\] support Radeon™ 7900 series (gfx1100) without using...](https://github.com/vllm-project/vllm/pull/2768) \ No newline at end of file diff --git a/docker-llm-amd/README.md b/docker-llm-amd/README.md new file mode 100644 index 00000000..fd086170 --- /dev/null +++ b/docker-llm-amd/README.md @@ -0,0 +1,63 @@ +### What we have so far + +1. [Ollama](https://github.com/ollama/ollama) loads and serves a few models via API. + - Ollama itself doesn't have a UI. CLI and API only. + - The API can be accessed at [`https://api.ollama.jafner.net`](https://api.ollama.jafner.net). + - Ollama running as configured supports ROCm (GPU acceleration). + - Configured models are described [here](/ollama/modelfiles/), and + - Run Ollama with: `HSA_OVERRIDE_GFX_VERSION=11.0.0 OLLAMA_HOST=192.168.1.135:11434 OLLAMA_ORIGINS="app://obsidian.md*" OLLAMA_MAX_LOADED_MODELS=0 ollama serve` +2. [Open-webui](https://github.com/open-webui/open-webui) provides a pretty web interface for interacting with Ollama. 
+    - The web UI can be accessed at [`https://ollama.jafner.net`](https://ollama.jafner.net).
+    - The web UI is protected by Traefik's `lan-only` rule, as well as its own authentication layer.
+    - Run open-webui with: `cd ~/Projects/LLMs/open-webui && docker compose up -d && docker compose logs -f`
+    - Then open [the page](https://ollama.jafner.net) and log in.
+    - Connect the frontend to the ollama instance by opening the settings (top-right), clicking "Connections", and setting "Ollama Base URL" to "https://api.ollama.jafner.net". Hit refresh and begin using it.
+3. [SillyTavern](https://github.com/SillyTavern/SillyTavern) provides a powerful interface for building and using characters.
+    - Run SillyTavern with: `cd ~/Projects/LLMs/SillyTavern && ./start.sh`
+4. [Oobabooga](https://github.com/oobabooga/text-generation-webui) provides a more powerful web UI than open-webui, but it's less pretty.
+    - Run Oobabooga with: `cd ~/Projects/LLMs/text-generation-webui && ./start_linux.sh`
+    - Requires the following environment variables to be set in `one_click.py` (right after the import statements). Note that `$PATH`-style expansion does not happen inside Python string literals, so the existing values are appended explicitly:
+```
+os.environ["ROCM_PATH"] = '/opt/rocm'
+os.environ["HSA_OVERRIDE_GFX_VERSION"] = '11.0.0'
+os.environ["HCC_AMDGPU_TARGET"] = 'gfx1100'
+os.environ["PATH"] = '/opt/rocm/bin:' + os.environ.get("PATH", "")
+os.environ["LD_LIBRARY_PATH"] = '/opt/rocm/lib:' + os.environ.get("LD_LIBRARY_PATH", "")
+os.environ["CUDA_VISIBLE_DEVICES"] = '0'
+os.environ["HCC_SERIALIZE_KERNEL"] = '0x3'
+os.environ["HCC_SERIALIZE_COPY"] = '0x3'
+os.environ["HIP_TRACE_API"] = '0x2'
+os.environ["HF_TOKEN"] = ''
+```
+    - Requires the following environment variable to be set in `start_linux.sh` for access to non-public model downloads:
+```
+# config
+HF_TOKEN=""
+```
+
+That's where we're at.
+
+### Set Up Models Directory
+1. Navigate to the source directory with all models: `cd ~/"Nextcloud/Large Language Models/GGUF/"` (the tilde has to stay outside the quotes so it expands).
+2. Hard-link each file into the docker project's models directory: `for model in ./*; do ln "$(realpath "$model")" ~/Git/docker-llm-amd/models/; done`
+    - Note that these must be hard links, not symlinks: a symlink points at a path that doesn't exist inside the container's bind mount and shows up broken there. Hard links also require both paths to be on the same filesystem.
+3. Launch ollama: `docker compose up -d ollama`
+4. Create the models defined by the modelfiles: `docker compose exec -dit ollama /modelfiles/.loadmodels.sh` (then run the smoke test at the bottom of this README).
+
+### Roadmap
+- Set up StableDiffusion-web-UI.
+- Get characters in SillyTavern behaving as expected.
+    - Repetition issues.
+    - Obsession with certain parts of the prompt.
+    - Refusals.
+- Set up something for character voices.
+    - [Coqui TTS - Docker install](https://github.com/coqui-ai/TTS/tree/dev?tab=readme-ov-file#docker-image).
+    - [TTS Generation Web UI](https://github.com/rsxdalv/tts-generation-webui).
+- Set up Extras for SillyTavern.
+
+### Notes
+- So many of these projects use Python with its various versions and dependencies and shit.
+    - *Always* use a Docker container or virtual environment.
+    - It's like a condom.
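+### Smoke-Test the Ollama API
+After step 4 above, it's worth confirming that the API answers and that the modelfile-defined models actually registered before wiring up open-webui. A minimal check against the standard Ollama endpoints, assuming you're on the docker host (swap in `https://api.ollama.jafner.net` when going through Traefik); the model tag and prompt are only examples:
+```
+# Pull the small test model from the Ollama notes if it isn't already present.
+docker compose exec ollama ollama pull gemma:2b
+
+# List every model ollama knows about; the modelfile-defined ones should show up here too.
+curl -s http://localhost:11434/api/tags
+
+# Ask for a short, non-streamed completion.
+curl -s http://localhost:11434/api/generate \
+  -d '{"model": "gemma:2b", "prompt": "Say hello in five words.", "stream": false}'
+```
+If the second call comes back as JSON with a `response` field, serving works end to end; `docker compose logs ollama` should show whether it picked the ROCm path or fell back to CPU.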
\ No newline at end of file diff --git a/docker-llm-amd/docker-compose.yml b/docker-llm-amd/docker-compose.yml new file mode 100644 index 00000000..a38a7766 --- /dev/null +++ b/docker-llm-amd/docker-compose.yml @@ -0,0 +1,142 @@ +# Addresses: +# ollama :11434 +# open-webui :3000 +# sillytavern :8000 +# sdwebui :7868 +# oobabooga :7860 :5010 +# exui :5030 + +version: '3' +name: 'ai' +services: + ollama: + container_name: ai_ollama + image: ollama/ollama:rocm + networks: + - ai + privileged: false + group_add: + - video + ports: + - 11434:11434 + devices: + - /dev/kfd + - /dev/dri + volumes: + - ./ollama/modelfiles:/modelfiles + - $MODELS_DIR:/models + - ollama-model-storage:/root/.ollama/models/blobs + environment: + - OLLAMA_ORIGINS="app://obsidian.md*" + - OLLAMA_MAX_LOADED_MODELS=0 + + open-webui: + container_name: ai_open-webui + image: ghcr.io/open-webui/open-webui:main + ports: + - 3000:8080 + networks: + - ai + volumes: + - open-webui:/app/backend/data + environment: + - OLLAMA_BASE_URL=http://ollama:11434 + + sillytavern: + container_name: ai_sillytavern + image: ghcr.io/sillytavern/sillytavern:staging + networks: + - ai + privileged: false + ports: + - 8000:8000/tcp + volumes: + - ./sillytavern/config/config.yaml:/home/node/app/config/config.yaml + environment: + - TZ=America/Los_Angeles + + sdwebui: + container_name: ai_sdwebui + build: + context: ./sdwebui + networks: + - ai + privileged: false + group_add: + - video + ports: + - 7868:7860 + devices: + - /dev/kfd + - /dev/dri + volumes: + - ./models_t2i:/dockerx/stable-diffusion-webui-amdgpu/models + - ./sdwebui/images:/images + - sdwebui_cache:/dockerx/stable-diffusion-webui-amdgpu/models/ONNX + deploy: + resources: + limits: + memory: 16G + + oobabooga: + container_name: ai_oobabooga + image: atinoda/text-generation-webui:base-rocm + environment: + - EXTRA_LAUNCH_ARGS="--listen --verbose --chat-buttons --use_flash_attention_2 --flash-attn --api --extensions openai" + stdin_open: true + tty: true + networks: + - ai + ipc: host + group_add: + - video + cap_add: + - SYS_PTRACE + security_opt: + - seccomp=unconfined + ports: + - 7860:7860 + - 5010:5000 + devices: + - /dev/kfd + - /dev/dri + volumes: + - $MODELS_DIR:/app/models + - oobabooga_cache:/root/.cache + - ./oobabooga/characters:/app/characters + - ./oobabooga/instruction-templates:/app/instruction-templates + - ./oobabooga/loras:/app/loras + - ./oobabooga/presets:/app/presets + - ./oobabooga/prompts:/app/prompts + - ./oobabooga/training:/app/training + + exui: + container_name: ai_exui + build: + context: ./exl2 + networks: + - ai + privileged: false + group_add: + - video + ports: + - 5030:5000 + devices: + - /dev/kfd + - /dev/dri + volumes: + - $MODELS_DIR:/models + +volumes: + ollama-model-storage: + open-webui: + sdwebui_cache: + oobabooga: + oobabooga_cache: +networks: + ai: + name: "ai" + ipam: + driver: default + config: + - subnet: 172.20.0.0/16 diff --git a/docker-llm-amd/exui/Dockerfile b/docker-llm-amd/exui/Dockerfile new file mode 100644 index 00000000..68216491 --- /dev/null +++ b/docker-llm-amd/exui/Dockerfile @@ -0,0 +1,11 @@ +FROM python:3.10-bookworm +RUN apt update && \ + apt install --no-install-recommends -y git vim build-essential python3-dev pip bash curl && \ + rm -rf /var/lib/apt/lists/* +WORKDIR /usr/src/app +COPY requirements.txt ./ +RUN pip install --no-cache-dir -r requirements.txt +RUN git clone https://github.com/turboderp/exui +WORKDIR /usr/src/app/exui +EXPOSE 5000 +CMD ["python", "-u", "server.py", "--host", "0.0.0.0:5000"] \ No newline at 
end of file diff --git a/docker-llm-amd/exui/requirements.txt b/docker-llm-amd/exui/requirements.txt new file mode 100644 index 00000000..ccf0aef8 --- /dev/null +++ b/docker-llm-amd/exui/requirements.txt @@ -0,0 +1,45 @@ +blinker==1.8.2 +certifi==2024.2.2 +charset-normalizer==3.3.2 +click==8.1.7 +cramjam==2.8.3 +exllamav2 @ https://github.com/turboderp/exllamav2/releases/download/v0.0.21/exllamav2-0.0.21+rocm6.0-cp310-cp310-linux_x86_64.whl +fastparquet==2024.5.0 +filelock==3.13.1 +Flask==3.0.3 +fsspec==2024.2.0 +huggingface-hub==0.23.1 +idna==3.7 +itsdangerous==2.2.0 +Jinja2==3.1.3 +MarkupSafe==2.1.5 +mpmath==1.3.0 +networkx==3.2.1 +ninja==1.11.1.1 +numpy==1.26.3 +packaging==24.0 +pandas==2.2.2 +pillow==10.2.0 +Pygments==2.18.0 +pynvml==11.5.0 +python-dateutil==2.9.0.post0 +pytorch-triton-rocm @ https://download.pytorch.org/whl/pytorch_triton_rocm-2.3.0-cp310-cp310-linux_x86_64.whl +pytz==2024.1 +PyYAML==6.0.1 +regex==2024.5.15 +requests==2.32.2 +safetensors==0.4.3 +sentencepiece==0.2.0 +six==1.16.0 +sympy==1.12 +tokenizers==0.19.1 +torch @ https://download.pytorch.org/whl/rocm6.0/torch-2.3.0%2Brocm6.0-cp310-cp310-linux_x86_64.whl +torchaudio @ https://download.pytorch.org/whl/rocm6.0/torchaudio-2.3.0%2Brocm6.0-cp310-cp310-linux_x86_64.whl +torchvision @ https://download.pytorch.org/whl/rocm6.0/torchvision-0.18.0%2Brocm6.0-cp310-cp310-linux_x86_64.whl +tqdm==4.66.4 +typing_extensions==4.9.0 +tzdata==2024.1 +urllib3==2.2.1 +waitress==3.0.0 +websockets==12.0 +Werkzeug==3.0.3 diff --git a/docker-llm-amd/ollama/README.md b/docker-llm-amd/ollama/README.md new file mode 100644 index 00000000..44cf107c --- /dev/null +++ b/docker-llm-amd/ollama/README.md @@ -0,0 +1,90 @@ +# Ollama Notes +Per: [Ollama/Ollama README](https://github.com/ollama/ollama) + +## Install Steps +Per: [linux.md](https://github.com/ollama/ollama/blob/main/docs/linux.md) + +1. Download the binary: `sudo curl -L https://ollama.com/download/ollama-linux-amd64 -o /usr/bin/ollama` +2. Make the binary executable: `sudo chmod +x /usr/bin/ollama` +3. Create a user for ollama: `sudo useradd -r -s /bin/false -m -d /usr/share/ollama ollama` +4. Create a SystemD service file for ollama: `sudo nano /etc/systemd/system/ollama.service` and populate it with the following. +```ini +[Unit] +Description=Ollama Service +After=network-online.target + +[Service] +Environment='HSA_OVERRIDE_GFX_VERSION=11.0.0 OLLAMA_HOST=192.168.1.135:11434 OLLAMA_ORIGINS="app://obsidian.md*" OLLAMA_MAX_LOADED_MODELS=0' +ExecStart=/usr/bin/ollama serve +User=ollama +Group=ollama +Restart=always +RestartSec=3 + +[Install] +WantedBy=default.target +``` +5. Register and enable the ollama service: `sudo systemctl daemon-reload && sudo systemctl enable ollama` +6. Start ollama: `sudo systemctl start ollama` + +### Enable ROCm Support +Per: [anvesh.jhuboo on Medium](https://medium.com/@anvesh.jhuboo/rocm-pytorch-on-fedora-51224563e5be) + +1. Add user to `video` group to allow access to GPU resources: `sudo usermod -aG video $LOGNAME` +2. Install `rocminfo` package: `sudo dnf install rocminfo` +3. Check for rocm support: `rocminfo` +4. Install `rocm-opencl` package: `sudo dnf install rocm-opencl` +5. Install `rocm-clinfo` package: `sudo dnf install rocm-clinfo` +6. Verify opencl is working: `rocm-clinfo` +7. Get the GFX version of your GPU: `rocminfo | grep gfx | head -n 1 | tr -s ' ' | cut -d' ' -f 3` + - The GFX version given is a stripped version number. + - My Radeon 7900 XTX has a gfx string of `gfx1100`, which correlates with HSA GFX version 11.0.0. 
+ - Other cards commonly have a string of `gfx1030`, which correlates with HSA GFS version 10.3.0. + - There's a little bit more info [here](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html). +8. Export your gfx version in `~/.bashrc`: `echo "export HSA_OVERRIDE_GFX_VERSION=11.0.0" >> ~/.bashrc && source ~/.bashrc` + +- Is this even part of the same thing? I ran `sudo dnf install https://repo.radeon.com/amdgpu-install/6.0.2/rhel/9.3/amdgpu-install-6.0.60002-1.el9.noarch.rpm` +- Maybe this is the right place to look? [Fedora wiki - AMD ROCm](https://fedoraproject.org/wiki/SIGs/HC) + +## Run Ollama +1. Test Ollama is working: `ollama run gemma:2b` + - Runs (downloads) the smallest model in [Ollama's library](https://ollama.com/library). +2. Run as a docker container: `docker run -d --device /dev/kfd --device /dev/dri -v /usr/lib64:/opt/lib64:ro -e HIP_PATH=/opt/lib64/rocm -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama && docker logs -f ollama` + + +## Update Ollama +1. Redownload the binary: `sudo curl -L https://ollama.com/download/ollama-linux-amd64 -o /usr/bin/ollama` +2. Make the binary executable: `sudo chmod +x /usr/bin/ollama` + +## Create Model from Modelfile + +`ollama create -f ` +Where the modelfile is like: +``` +# Choose either a model tag to download from ollama.com/library, or a path to a local model file (relative to the path of the modelfile). +FROM ../Models/codellama-7b.Q8_0.gguf + +# set the chatml template passed to the model +TEMPLATE """<|im_start|>system +{{ .System }}<|im_end|> +<|im_start|>user +{{ .Prompt }}<|im_end|> +<|im_start|>assistant +""" + +# set the temperature to 1 [higher is more creative, lower is more coherent] +PARAMETER temperature 1 + +# set the system message +SYSTEM """ +You are a senior devops engineer, acting as an assistant. You offer help with cloud technologies like: Terraform, AWS, kubernetes, python. You answer with code examples when possible +""" + +# not sure what this does lol +PARAMETER stop "<|im_start|>" +PARAMETER stop "<|im_end|>" +``` + +## Unload a Model +There's no official support for this in the `ollama` CLI, but we can make it happen with the API: +`curl https://api.ollama.jafner.net/api/generate -d '{"model": "", "keep_alive": 0}'` \ No newline at end of file diff --git a/docker-llm-amd/ollama/modelfiles/.loadmodels.sh b/docker-llm-amd/ollama/modelfiles/.loadmodels.sh new file mode 100755 index 00000000..c05603ee --- /dev/null +++ b/docker-llm-amd/ollama/modelfiles/.loadmodels.sh @@ -0,0 +1,6 @@ +#!/bin/bash +for modelfile in /modelfiles/*; do + echo -n "Running: '" + echo "ollama create \"$(basename $modelfile)\" -f \"$modelfile\"'" + ollama create "$(basename $modelfile)" -f "$modelfile" +done \ No newline at end of file diff --git a/docker-llm-amd/ollama/sync-modelfiles b/docker-llm-amd/ollama/sync-modelfiles new file mode 100755 index 00000000..4201ab1b --- /dev/null +++ b/docker-llm-amd/ollama/sync-modelfiles @@ -0,0 +1,22 @@ +#!/bin/bash +# THIS SCRIPT DOES NOT WORK RIGHT NOW +# The script is fine, it just needs the modelfiles to be written with reference +# to the models folder relative to the host system, rather than inside the +# container. We're using ./modelfiles/.loadmodels.sh instead right now. + +modelfiles="$(ls ./modelfiles/)" +models="$(ollama list | tr -s ' ' | cut -f 1 | tail -n +2)" +for model in $(echo "$models"); do + if ! 
[[ $modelfiles == *"$model"* ]]; then + echo -n "Running: '" + echo "ollama rm \"$model\"'" + ollama rm "$model" + fi +done + +cd ./modelfiles +for modelfile in ./*; do + echo -n "Running: '" + echo "ollama create \"$(basename $modelfile)\" -f \"$modelfile\"'" + ollama create "$(basename $modelfile)" -f "$modelfile" +done diff --git a/docker-llm-amd/oobabooga.env b/docker-llm-amd/oobabooga.env new file mode 100644 index 00000000..2cb70db0 --- /dev/null +++ b/docker-llm-amd/oobabooga.env @@ -0,0 +1,10 @@ +TORCH_CUDA_ARCH_LIST=7.5 +HOST_PORT=7860 +CONTAINER_PORT=7860 +HOST_API_PORT=5020 +CONTAINER_API_PORT=5000 +BUILD_EXTENSIONS="" +APP_RUNTIME_GID=1000 +APP_GID=1000 +APP_UID=1000 +HF_HOME=/home/app/text-generation-webui/cache/ \ No newline at end of file diff --git a/docker-llm-amd/oobabooga/Dockerfile b/docker-llm-amd/oobabooga/Dockerfile new file mode 100644 index 00000000..c65d99aa --- /dev/null +++ b/docker-llm-amd/oobabooga/Dockerfile @@ -0,0 +1,66 @@ +# Cloned from: https://github.com/Atinoda/text-generation-webui-docker/blob/master/Dockerfile +# Modified to install Flash-Attention-2 for AMD ROCm. +# Install instructions for FA2 are based on: +# https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/3rd-party/pytorch-install.html#using-pytorch-upstream-docker-image +# and: +# https://rocm.blogs.amd.com/artificial-intelligence/flash-attention/README.html +# Also trimmed original comments and replaced with new. + +# Base build layer +FROM ubuntu:22.04 AS app_base +RUN apt-get update && apt-get install --no-install-recommends -y \ + git vim build-essential python3-dev python3-venv python3-pip +RUN pip3 install virtualenv +RUN virtualenv /venv +ENV VIRTUAL_ENV=/venv +RUN python3 -m venv $VIRTUAL_ENV +ENV PATH="$VIRTUAL_ENV/bin:$PATH" +RUN pip3 install --upgrade pip setuptools +COPY ./scripts /scripts +RUN chmod +x /scripts/* +RUN git clone https://github.com/oobabooga/text-generation-webui /src +ARG VERSION_TAG +ENV VERSION_TAG=${VERSION_TAG} +RUN . /scripts/checkout_src_version.sh +RUN cp -ar /src /app + +# AMD build layer +FROM app_base AS app_rocm +RUN pip3 install --pre torch torchvision torchaudio \ + --index-url https://download.pytorch.org/whl/nightly/rocm6.1 +RUN pip3 install -r /app/requirements_amd.txt +RUN git clone --recursive https://github.com/ROCm/flash-attention.git /src-fa +RUN cd /src-fa && MAX_JOBS=$((`nproc` / 2)) pip install -v . +FROM app_rocm AS app_rocm_x +RUN chmod +x /scripts/build_extensions.sh && \ + . 
/scripts/build_extensions.sh + +# Base run layer +FROM ubuntu:22.04 AS run_base +RUN apt-get update && apt-get install --no-install-recommends -y \ + python3-venv python3-dev git +COPY --from=app_base /app /app +COPY --from=app_base /src /src +ENV VIRTUAL_ENV=/venv +ENV PATH="$VIRTUAL_ENV/bin:$PATH" +WORKDIR /app +EXPOSE 7860 +EXPOSE 5000 +EXPOSE 5005 +ENV PYTHONUNBUFFERED=1 +ARG BUILD_DATE +ENV BUILD_DATE=$BUILD_DATE +RUN echo "$BUILD_DATE" > /build_date.txt +ARG VERSION_TAG +ENV VERSION_TAG=$VERSION_TAG +RUN echo "$VERSION_TAG" > /version_tag.txt +COPY ./scripts /scripts +RUN chmod +x /scripts/* +ENTRYPOINT ["/scripts/docker-entrypoint.sh"] + +# AMD run layer +FROM run_base AS default-rocm +COPY --from=app_rocm_x $VIRTUAL_ENV $VIRTUAL_ENV +RUN echo "ROCM Extended" > /variant.txt +ENV EXTRA_LAUNCH_ARGS="" +CMD ["python3", "/app/server.py"] \ No newline at end of file diff --git a/docker-llm-amd/sdwebui/Dockerfile b/docker-llm-amd/sdwebui/Dockerfile new file mode 100644 index 00000000..2f1be00e --- /dev/null +++ b/docker-llm-amd/sdwebui/Dockerfile @@ -0,0 +1,17 @@ +FROM rocm/pytorch:latest +WORKDIR /dockerx +RUN git clone https://github.com/lshqqytiger/stable-diffusion-webui-amdgpu +WORKDIR /dockerx/stable-diffusion-webui-amdgpu +RUN python -m pip install clip open-clip-torch onnxruntime-training xformers +RUN git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui-assets.git repositories/stable-diffusion-webui-assets +RUN git clone https://github.com/Stability-AI/stablediffusion.git repositories/stable-diffusion-stability-ai +RUN git clone https://github.com/Stability-AI/generative-models.git repositories/generative-models +RUN git clone https://github.com/crowsonkb/k-diffusion.git repositories/k-diffusion +RUN git clone https://github.com/salesforce/BLIP.git repositories/BLIP + +RUN python -m pip install --upgrade pip wheel +ENV REQS_FILE='requirements_versions.txt' +ENV venv_dir="-" +RUN python -m pip install -r requirements_versions.txt +ENV COMMANDLINE_ARGS="--listen --allow-code --api --administrator --no-download-sd-model --medvram --use-directml" +CMD ["python", "-u", "launch.py", "--precision", "full", "--no-half"] \ No newline at end of file diff --git a/docker-llm-amd/sdwebui/README.md b/docker-llm-amd/sdwebui/README.md new file mode 100644 index 00000000..d59e1299 --- /dev/null +++ b/docker-llm-amd/sdwebui/README.md @@ -0,0 +1,40 @@ +Below is a rough and dirty documentation of steps taken to set up stable-diffusion for my 7900 XTX. Sources cited as used. +``` +# Per: https://github.com/ROCm/composable_kernel/discussions/1032 +# I did not read the bit where it said to execute these steps in a docker container (rocm/pytorch:rocm5.7_ubuntu22.04_py3.10_pytorch_2.0.1) +# The following is the docker run command I *would have* used: +docker run rocm/pytorch:rocm5.7_ubuntu22.04_py3.10_pytorch_2.0.1 + +git clone -b amd-stg-open https://github.com/RadeonOpenCompute/llvm-project.git # +cd llvm-project && git checkout 1f2f539f7cab51623fad8c8a5b574eda1e81e0c0 && mkdir -p build && cd build +cmake -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=1 -DLLVM_TARGETS_TO_BUILD="AMDGPU;X86" -DLLVM_ENABLE_PROJECTS="clang;lld;compiler-rt" ../llvm +make -j16 # This step takes a long time. Reduce 16 to use fewer cores for the job. +# The build errored out at ~71% with: +# make: *** [Makefile:156: all] Error 2 +# So that was a big waste of time. +# It was during this step that I realized we can just spin up the provided docker container to get the pre-compiled binary... 
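+# In hindsight, the build attempt itself should also have happened inside that container with
+# the GPU passed through (the same flags we end up needing a few commands below). Untested,
+# but it would look roughly like:
+# docker run -it --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host \
+#     rocm/pytorch:rocm5.7_ubuntu22.04_py3.10_pytorch_2.0.1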
+ +docker run aska0096/rocm5.7_ait_ck_navi31_sd2:v1.0 +# This is a ~20GB docker image. It takes a long time to pull. +cd ~/AITemplate/examples/05_stable_diffusion/ +sh run_ait_sd_webui.sh +# This gave us an error about +# [05:55:20] model_interface.cpp:94: Error: DeviceMalloc(&result, n_bytes) API call failed: +# no ROCm-capable device is detected at model_interface.cpp, line49 +# Per: https://www.reddit.com/r/ROCm/comments/177pwxv/how_can_i_set_up_linux_rocm_pytorch_for_7900xtx/ +# We rerun the container with some extra parameters to give it access to our GPU. +docker run -it --privileged --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host aska0096/rocm5.7_ait_ck_navi31_sd2:v1.0 +cd ~/AITemplate/examples/05_stable_diffusion/ +sh run_ait_sd_webui.sh +# This got us to "Uvicorn running on http://0.0.0.0:5000" +# Nice. Oh. We haven't exposed any ports... +docker run -it --rm -p 5500:5000 --privileged --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host aska0096/rocm5.7_ait_ck_navi31_sd2:v1.0 +cd ~/AITemplate/examples/05_stable_diffusion/ && sh run_ait_sd_webui.sh +# I think :5000 on the local host is probably already in use, so we'll just dodge the possibility of a collision. +# Uhh. Well now what? connecting to http://localhost:5500 in our browser returns a 404. +# Maybe it's the Streamlit server that we should be looking at? +docker run -it --rm -p 5500:5000 -p 5501:8501 --privileged --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host aska0096/rocm5.7_ait_ck_navi31_sd2:v1.0 +cd ~/AITemplate/examples/05_stable_diffusion/ && sh run_ait_sd_webui.sh +# Ah yeah, that did it. Now we can see the Stable Diffusion test at 5501. And we can also see our VRAM utilization increase by ~6 GB when we run the script. Now just to test the performance! Nice. We get <2s to generate an image. +# Now can we generalize this to a more useful model? 
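+# Longer term, the plan is to do this through the sdwebui service defined in this repo's
+# docker-compose.yml (lshqqytiger's stable-diffusion-webui-amdgpu fork), which already mounts
+# ./models_t2i for checkpoints and passes through /dev/kfd and /dev/dri. Not yet verified:
+docker compose up -d --build sdwebui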
+``` \ No newline at end of file diff --git a/docker-llm-amd/sillytavern/config/config.yaml b/docker-llm-amd/sillytavern/config/config.yaml new file mode 100644 index 00000000..541c89fd --- /dev/null +++ b/docker-llm-amd/sillytavern/config/config.yaml @@ -0,0 +1,46 @@ +dataRoot: ./data +listen: false +port: 8000 +whitelistMode: false +enableForwardedWhitelist: false +whitelist: + - 127.0.0.1 + - 172.19.0.1 +basicAuthMode: true +basicAuthUser: + username: joey + password: ***REMOVED*** +enableCorsProxy: false +enableUserAccounts: false +enableDiscreetLogin: false +cookieSecret: Viwb315DDUewxmznF1cX1tJiLu/TW1AK8envDePAbovByvpKdJHPI5Nrcd6mpSGOkvDYy72OqhV8NnYubFA3KQ== +disableCsrfProtection: false +securityOverride: false +autorun: true +disableThumbnails: false +thumbnailsQuality: 95 +avatarThumbnailsPng: false +allowKeysExposure: false +skipContentCheck: false +disableChatBackup: false +whitelistImportDomains: + - localhost + - cdn.discordapp.com + - files.catbox.moe + - raw.githubusercontent.com +requestOverrides: [] +enableExtensions: true +extras: + disableAutoDownload: false + classificationModel: Cohee/distilbert-base-uncased-go-emotions-onnx + captioningModel: Xenova/vit-gpt2-image-captioning + embeddingModel: Cohee/jina-embeddings-v2-base-en + promptExpansionModel: Cohee/fooocus_expansion-onnx + speechToTextModel: Xenova/whisper-small + textToSpeechModel: Xenova/speecht5_tts +openai: + randomizeUserId: false + captionSystemPrompt: "" +deepl: + formality: default +enableServerPlugins: false diff --git a/docker-llm-amd/up b/docker-llm-amd/up new file mode 100755 index 00000000..145ae511 --- /dev/null +++ b/docker-llm-amd/up @@ -0,0 +1,30 @@ +#!/bin/bash +initialdir=${PWD} +cd /home/joey/Projects/LLMs/ + +wget -q --spider https://ollama-api.jafner.net +if [ $? -eq 0 ]; then + cd ollama + HSA_OVERRIDE_GFX_VERSION=11.0.0 + OLLAMA_HOST=192.168.1.135:11434 + OLLAMA_ORIGINS="app://obsidian.md*" + OLLAMA_MAX_LOADED_MODELS=0 + ollama serve & + cd .. +fi +# Access ollama (API, not a web UI) at http://192.168.1.135:11434 *or* https://ollama-api.jafner.net + +cd open-webui +docker compose up -d +cd .. +# Access open-webui at http://localhost:8080 *or* https://openwebui.jafner.net + +cd SillyTavern +./start.sh & +cd .. +# Access SillyTavern at http://localhost:5000 *or* https://sillytavern.jafner.net + +cd text-generation-webui +./start_linux.sh --api --flash-attn & +cd .. +# Access text-generation-webui at http://localhost:7860 *or* https://oobabooga.jafner.net \ No newline at end of file