



What I do.
01
On-prem AI
LLMs running on your GPUs.
- vLLM and Ollama deployments
- RAG with local vector stores
- Fine-tuning on your hardware
- OpenAI-compatible routing
- GPU monitoring and capacity planning
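As a rough sketch of what OpenAI-compatible routing can look like: one lookup that sends each model name to whichever backend serves it. Both vLLM and Ollama expose an OpenAI-style API under /v1. The hostnames, model aliases, and helper names below are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str
    base_url: str  # OpenAI-compatible base URL, ending in /v1


# Hypothetical internal hosts; the ports are the usual defaults for each server.
BACKENDS = {
    "vllm": Backend("vllm", "http://vllm.internal:8000/v1"),
    "ollama": Backend("ollama", "http://ollama.internal:11434/v1"),
}

# Hypothetical model aliases mapped to the backend that serves them.
ROUTES = {
    "qwen-large": "vllm",
    "mistral-small": "ollama",
}

DEFAULT_BACKEND = "ollama"


def route(model: str) -> Backend:
    """Return the backend for a model name, falling back to the default."""
    return BACKENDS[ROUTES.get(model, DEFAULT_BACKEND)]


def chat_completions_url(model: str) -> str:
    """Build the chat completions URL a client would call for this model."""
    return f"{route(model).base_url}/chat/completions"


if __name__ == "__main__":
    print(chat_completions_url("qwen-large"))
```

Clients keep speaking the same API; only the base URL changes per model.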
02
Private cloud and hybrid infra
The cluster underneath the AI.
- Talos Linux and Kubernetes clusters
- FluxCD GitOps pipelines
- Proxmox virtualisation and GPU passthrough
- Grafana, Prometheus, Loki monitoring
- Self-hosted GitLab and CI runners
03
Custom apps
Software for what off-the-shelf tools cannot do.
- FastAPI services and internal tools
- Next.js dashboards and admin panels
- n8n workflow automation
- Self-hosted SaaS replacements
- API integrations between your existing tools
The actual stack.
The exact tools matter.
Inference
vLLM, Ollama, LMCache
Models
Mistral, MiniMax, Qwen
Orchestration
Talos Linux, Kubernetes, FluxCD, Helm
Virtualisation
Proxmox, GPU passthrough, Cloud-init
Automation
OpenTofu, Ansible, n8n
CI/CD
Self-hosted GitLab, GitLab CI
Observability
Grafana, Prometheus, Loki, ClickHouse
App layer
FastAPI, Next.js, Postgres, Redis
Privacy
Erebus PII filter, Headscale
In practice.
01
Agents on your own knowledge
AI that knows your docs, codebase, and processes. It runs locally, so nothing leaves the network.
02
Chat UIs for client-facing teams
LibreChat or OpenWebUI with PII filtering and audit logs.
03
Document processing pipelines
Extract, classify, summarise. Runs on a single GPU box.
04
Custom Kubernetes platforms
Talos from scratch, FluxCD for everything.
05
GPU monitoring
Grafana dashboards that show GPU load and memory use, with alerts before a model runs out of memory.
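The idea behind alerting before an OOM can be sketched as a simple threshold on GPU memory use. This is a minimal illustration, not the actual alert rule: the function name, the 90% threshold, and the sample figures are assumptions, and in a real setup the check would typically live in a Prometheus alerting rule rather than in application code.

```python
def should_alert(used_mib: float, total_mib: float, threshold: float = 0.9) -> bool:
    """Return True when GPU memory use reaches the given fraction of capacity.

    The 0.9 default is an illustrative assumption, not a recommended value.
    """
    if total_mib <= 0:
        raise ValueError("total_mib must be positive")
    return used_mib / total_mib >= threshold


if __name__ == "__main__":
    # Hypothetical sample: a 24 GiB card with 22 GiB in use.
    print(should_alert(used_mib=22 * 1024, total_mib=24 * 1024))
```

Firing at a fraction of capacity, rather than at the limit, leaves time to react before the model process is killed.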
06
Sovereign AI for EU clients
Hosted inside the EU, on your own hardware or in a colocation facility.