Model, cluster, application.

Three layers. One engineer. No vendor maze.

Infrastructure first. Then the model. Then the interface.

What I do.

01

On-prem AI

LLMs running on your GPUs.

  • vLLM and Ollama deployments
  • RAG with local vector stores
  • Fine-tuning on your hardware
  • OpenAI-compatible routing
  • GPU monitoring and capacity planning
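As a rough illustration of what OpenAI-compatible routing can look like, the sketch below picks a backend by model name and builds an OpenAI-style chat completion request. It relies on the fact that vLLM and Ollama both serve an OpenAI-compatible API under /v1. The hostnames, the routing table, and the names pick_backend and route_chat are assumptions made for this example, not a description of any particular deployment.

```python
from dataclasses import dataclass

# Hypothetical backends: the hostnames are placeholders, not real endpoints.
# Both vLLM and Ollama serve an OpenAI-compatible API under /v1.
BACKENDS = {
    "vllm": "http://vllm.internal:8000/v1",
    "ollama": "http://ollama.internal:11434/v1",
}

# Hypothetical routing table: model-name prefix -> backend key.
ROUTES = [
    ("mistral", "vllm"),
    ("qwen", "vllm"),
]
DEFAULT_BACKEND = "ollama"


@dataclass(frozen=True)
class RoutedRequest:
    url: str
    payload: dict


def pick_backend(model: str) -> str:
    """Return the backend key for a model name, falling back to the default."""
    name = model.lower()
    for prefix, backend in ROUTES:
        if name.startswith(prefix):
            return backend
    return DEFAULT_BACKEND


def route_chat(model: str, messages: list[dict]) -> RoutedRequest:
    """Build the URL and JSON body for an OpenAI-style chat completion call."""
    base = BACKENDS[pick_backend(model)]
    return RoutedRequest(
        url=f"{base}/chat/completions",
        payload={"model": model, "messages": messages},
    )
```

Because the request shape is the same for every backend, client applications only need one base URL and can stay unaware of which engine serves a given model.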

02

Private cloud and hybrid infra

The cluster underneath the AI.

  • Talos Linux and Kubernetes clusters
  • FluxCD GitOps pipelines
  • Proxmox virtualisation and GPU passthrough
  • Grafana, Prometheus, Loki monitoring
  • Self-hosted GitLab and CI runners

03

Custom apps

Software for what off-the-shelf cannot do.

  • FastAPI services and internal tools
  • Next.js dashboards and admin panels
  • n8n workflow automation
  • Self-hosted SaaS replacements
  • API integrations between your existing tools

The actual stack.

The exact tools matter.

Inference: vLLM, Ollama, LMCache
Models: Mistral, MiniMax, Qwen
Orchestration: Talos Linux, Kubernetes, FluxCD, Helm
Virtualisation: Proxmox, GPU passthrough, Cloud-init
Automation: OpenTofu, Ansible, n8n
CI/CD: Self-hosted GitLab, GitLab CI
Observability: Grafana, Prometheus, Loki, ClickHouse
App layer: FastAPI, Next.js, Postgres, Redis
Privacy: Erebus PII filter, Headscale

In practice.

01

Agents on your own knowledge

AI that knows your docs, codebase, and processes. It runs locally, and nothing leaves the network.

02

Chat UIs for client-facing teams

LibreChat or OpenWebUI with PII filtering and audit logs.
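To make the idea of PII filtering with audit logs concrete, here is a minimal sketch. It is a stand-in, not the Erebus filter or any LibreChat or OpenWebUI feature: it only redacts email addresses with a simple regular expression, and it logs a hash of the prompt rather than the prompt itself. The names redact and audit_record and the record fields are assumptions made for this example.

```python
import hashlib
import json
import re
from datetime import datetime, timezone

# Deliberately simple pattern for illustration; real PII detection
# covers far more than email addresses.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(text: str) -> tuple[str, int]:
    """Replace email addresses with a placeholder.

    Returns the cleaned text and the number of replacements made.
    """
    return EMAIL_RE.subn("[EMAIL]", text)


def audit_record(user: str, original: str, redactions: int) -> str:
    """Build one JSON audit line that does not contain the raw prompt."""
    return json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "prompt_sha256": hashlib.sha256(original.encode("utf-8")).hexdigest(),
        "redactions": redactions,
    })


prompt = "Please reply to anna@example.com and bob@example.org today."
clean, count = redact(prompt)
log_line = audit_record("user-1", prompt, count)
```

Only the cleaned text would be forwarded to the model, while the audit line records who sent a prompt and how many items were redacted without storing the sensitive content.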

03

Document processing pipelines

Extract, classify, summarise. Runs on a single GPU box.

04

Custom Kubernetes platforms

Talos from scratch, FluxCD for everything.

05

GPU monitoring

Grafana dashboards that show what is happening, with alerts before a model OOMs.
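The logic behind an alert that fires before a model runs out of memory can be sketched as below. In a real setup this would more likely be an alert rule evaluated over metrics from a GPU exporter; the Python version only illustrates the condition. The 90% threshold, the three-sample window, and the function names are assumptions chosen for the example.

```python
def memory_fraction(used_mib: float, total_mib: float) -> float:
    """Return the fraction of GPU memory in use."""
    if total_mib <= 0:
        raise ValueError("total_mib must be positive")
    return used_mib / total_mib


def should_alert(samples, threshold: float = 0.9, sustained: int = 3) -> bool:
    """Decide whether to raise a memory-pressure alert.

    samples is a list of (used_mib, total_mib) tuples, oldest first.
    The alert fires only if the most recent `sustained` samples are all
    at or above `threshold`, which avoids paging on a single spike.
    """
    if len(samples) < sustained:
        return False
    recent = samples[-sustained:]
    return all(memory_fraction(u, t) >= threshold for u, t in recent)


# Hypothetical readings from an 80 GiB card (81920 MiB).
readings = [(70000, 81920), (75000, 81920), (76000, 81920), (78000, 81920)]
alert_now = should_alert(readings)
```

Requiring several consecutive samples trades a little detection latency for fewer false alarms, similar in spirit to a hold duration on an alert rule.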

06

Sovereign AI for EU clients

Hosted inside the EU, on your own hardware or in a colocation facility.

Got something to build?

Tell me what you need.

Get in touch