Model, cluster, application.

Three layers. No vendor maze.

What I do.

LLMs running on your GPUs.

The cluster underneath the AI.

Software for what off-the-shelf cannot do.

The exact tools matter.

Inference

vLLM, Ollama, LMCache

Models

Mistral, MiniMax, Qwen

Orchestration

Talos Linux, Kubernetes, FluxCD, Helm

Virtualisation

Proxmox, GPU passthrough, cloud-init

Automation

OpenTofu, Ansible, n8n

CI/CD

Self-hosted GitLab, GitLab CI

Observability

Grafana, Prometheus, Loki, ClickHouse

App layer

FastAPI, Next.js, Postgres, Redis

Privacy

Erebus PII filter, Headscale

AI that knows your docs, codebase, and processes. Runs locally; nothing leaves the network.

LibreChat or Open WebUI with PII filtering and audit logs.

Extract, classify, summarise. Runs on a single GPU box.

Talos from scratch, FluxCD for everything.

Grafana dashboards that show what is happening, with alerts before a model OOMs.

Hosted inside the EU, on your hardware or in colo.

Tell me what you need.