Ollama Notes

Per: Ollama/Ollama README

Install Steps

Per: linux.md

  1. Download the binary: sudo curl -L https://ollama.com/download/ollama-linux-amd64 -o /usr/bin/ollama
  2. Make the binary executable: sudo chmod +x /usr/bin/ollama
  3. Create a user for ollama: sudo useradd -r -s /bin/false -m -d /usr/share/ollama ollama
  4. Create a systemd service file for ollama: sudo nano /etc/systemd/system/ollama.service and populate it with the following:
[Unit]
Description=Ollama Service
After=network-online.target

[Service]
# One Environment= directive per variable; packing them all into a single quoted
# string would assign everything after the first "=" to the first variable.
Environment="HSA_OVERRIDE_GFX_VERSION=11.0.0"
Environment="OLLAMA_HOST=192.168.1.135:11434"
Environment="OLLAMA_ORIGINS=app://obsidian.md*"
Environment="OLLAMA_MAX_LOADED_MODELS=0"
ExecStart=/usr/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3

[Install]
WantedBy=default.target
  5. Register and enable the ollama service: sudo systemctl daemon-reload && sudo systemctl enable ollama
  6. Start ollama: sudo systemctl start ollama
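
After step 6, it's worth confirming that the service actually came up. A minimal sketch, assuming the OLLAMA_HOST set in the unit file above (swap in 127.0.0.1:11434 if you kept the default bind address):

# Check the service status without paging.
systemctl status ollama --no-pager

# The /api/version endpoint returns the running server version as JSON.
curl http://192.168.1.135:11434/api/version

# List locally available models (empty until you pull one).
curl http://192.168.1.135:11434/api/tags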

Enable ROCm Support

Per: anvesh.jhuboo on Medium

  1. Add user to video group to allow access to GPU resources: sudo usermod -aG video $LOGNAME
  2. Install the rocminfo package: sudo dnf install rocminfo
  3. Check for ROCm support: rocminfo
  4. Install the rocm-opencl package: sudo dnf install rocm-opencl
  5. Install the rocm-clinfo package: sudo dnf install rocm-clinfo
  6. Verify OpenCL is working: rocm-clinfo
  7. Get the GFX version of your GPU: rocminfo | grep gfx | head -n 1 | tr -s ' ' | cut -d' ' -f 3
    • The gfx string is the HSA version number with its dots stripped.
    • My Radeon 7900 XTX has a gfx string of gfx1100, which corresponds to HSA GFX version 11.0.0.
    • Other cards commonly have a string of gfx1030, which corresponds to HSA GFX version 10.3.0.
    • There's a little bit more info here.
  8. Export your gfx version in ~/.bashrc: echo "export HSA_OVERRIDE_GFX_VERSION=11.0.0" >> ~/.bashrc && source ~/.bashrc (a sketch for deriving this value from the gfx string follows this list)
  • Is this even part of the same thing? I ran sudo dnf install https://repo.radeon.com/amdgpu-install/6.0.2/rhel/9.3/amdgpu-install-6.0.60002-1.el9.noarch.rpm
  • Maybe this is the right place to look? Fedora wiki - AMD ROCm
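
The manual lookup in step 8 can be scripted. A minimal sketch, with the caveat that the sed mapping is an assumption that only covers the gfx10xx/gfx11xx cases above (gfx1030 -> 10.3.0, gfx1100 -> 11.0.0); older architectures map differently, so check your card first:

# Pull the first gfx string rocminfo reports and turn it into an HSA override.
# ASSUMPTION: the dot-insertion below is only valid for gfx10xx/gfx11xx cards.
GFX_STRING=$(rocminfo | grep -o 'gfx[0-9a-f]*' | head -n 1)
HSA_VERSION=$(echo "$GFX_STRING" | sed -E 's/^gfx(10|11)([0-9a-f])([0-9a-f])$/\1.\2.\3/')
echo "export HSA_OVERRIDE_GFX_VERSION=$HSA_VERSION" >> ~/.bashrc
source ~/.bashrc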

Run Ollama

  1. Test that Ollama is working: ollama run gemma:2b
  2. Or run as a Docker container: docker run -d --device /dev/kfd --device /dev/dri -v /usr/lib64:/opt/lib64:ro -e HIP_PATH=/opt/lib64/rocm -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama && docker logs -f ollama
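
Once the server is up, the HTTP API can be hit directly as well. A minimal sketch against the default local port, reusing the gemma:2b model from step 1 (substitute whatever model you have pulled):

# One-shot completion from the local Ollama server.
# "stream": false returns a single JSON object instead of a stream of chunks.
curl http://localhost:11434/api/generate -d '{
  "model": "gemma:2b",
  "prompt": "Explain what a systemd unit file is in one sentence.",
  "stream": false
}'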

Update Ollama

  1. Redownload the binary: sudo curl -L https://ollama.com/download/ollama-linux-amd64 -o /usr/bin/ollama
  2. Make the binary executable: sudo chmod +x /usr/bin/ollama
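
The running service keeps serving from the old binary until it is restarted, so in practice the update looks like this (assumes the systemd unit from the install steps):

# Swap the binary while the service is stopped, then bring it back up.
sudo systemctl stop ollama
sudo curl -L https://ollama.com/download/ollama-linux-amd64 -o /usr/bin/ollama
sudo chmod +x /usr/bin/ollama
sudo systemctl start ollama

# Confirm the new version is the one running.
ollama --version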

Create Model from Modelfile

ollama create <model name> -f <modelfile relative path>, where the modelfile looks something like this:

# Choose either a model tag to download from ollama.com/library, or a path to a local model file (relative to the path of the modelfile).
FROM ../Models/codellama-7b.Q8_0.gguf

# set the chatml template passed to the model
TEMPLATE """<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

# set the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1

# set the system message
SYSTEM """
You are a senior devops engineer, acting as an assistant. You offer help with cloud technologies like: Terraform, AWS, kubernetes, python. You answer with code examples when possible
"""

# Stop sequences: generation halts when the model emits either ChatML delimiter,
# so replies don't run on into a fabricated next turn.
PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
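
To build and run a model from a file like this (the model name and path below are made up; point -f at wherever the modelfile actually lives):

# Build a named model from the modelfile, then chat with it.
ollama create devops-assistant -f ./modelfiles/devops-assistant.Modelfile
ollama run devops-assistant "Write a Terraform resource for an S3 bucket."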

Unload a Model

There's no official support for this in the ollama CLI, but the API can do it: a generate request with no prompt and "keep_alive": 0 tells the server to unload the model as soon as the request completes: curl https://api.ollama.jafner.net/api/generate -d '{"model": "<MODEL TO UNLOAD>", "keep_alive": 0}'
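
A small helper makes this less fiddly. A sketch assuming the server is reachable on the default local port (api.ollama.jafner.net above is just this setup's public endpoint); /api/ps lists whatever is currently loaded, on builds recent enough to have it:

# Unload a named model by sending an empty generate request with keep_alive 0.
ollama_unload() {
  curl -s http://localhost:11434/api/generate \
    -d "{\"model\": \"$1\", \"keep_alive\": 0}" > /dev/null
}

# See what is currently sitting in VRAM, then unload it.
curl -s http://localhost:11434/api/ps
ollama_unload "gemma:2b"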