The spine — How it resolves

One engine.
You choose the control.

The same resolution engine across the board. The only thing that changes is who pushes the button — you, or the cluster itself.

Kobi (Copilot)

Who acts: Kobi + you

Who approves: You click Execute

Speed: Seconds to a fix

Same detect → diagnose → remediate engine. Only the control model changes.

01

Assisted resolution · Available today

Kobi resolves.
Teaches. And never leaves you stuck.

Press ⌘ J and describe what's wrong. Kobi queries your live cluster through 17+ tools, finds the root cause, and proposes the exact fix — you click Execute. It explains every step, and when a fix is beyond what it can run, it hands you the precise console commands.

01 · Resolves

From alert to fix, with your click

Kobi finds the root cause and proposes the exact command — rollback, scale, patch limits. You approve; it executes with full RBAC and an audit entry. The human stays in the loop.

02 · Teaches

Shows its work, every time

Each diagnosis comes with the reasoning and the commands behind it. Your team levels up on Kubernetes — not just on KubeBolt. Onboarding and incident response in one.

03 · Guides to console

Never a dead end

When a fix falls outside what Kobi can execute, it doesn't shrug — it hands you the exact step-by-step to run by hand, so you're never blocked.

04 · Anywhere

UI, IDE, CLI via MCP

Kobi exposes a Model Context Protocol server. Plug it into Cursor or Claude Code and your assistant gets live cluster context with read access, without leaving the editor.

Kobi · KubeBolt's Copilot · ANTHROPIC · CLAUDE-SONNET-4-6

My deployment demo/demo-web just rolled out and pods are crashing. What's going on?

🤖

New pods are crashing. Checking logs from the CrashLoopBackOff pod to find the root cause.

🤖

Clear diagnosis. Here's the summary:

🔴 Problem: New deploy broken (Revision 21)

What's happening?

The RollingUpdate is in progress but the new pods from revision 21 (demo-web-b7c7c876b) are failing immediately on startup. The container log confirms it:

bad deploy

The container exits with error and Kubernetes restarts it continuously → CrashLoopBackOff.

Current state

REPLICASET             REVISION     READY   STATUS
demo-web-b7c7c876b     21 (new)     0/2     🔴 CrashLoopBackOff
demo-web-7d746ccd45    20 (prev)    3/3     ✅ Running

Good news: thanks to RollingUpdate, the 3 pods from revision 20 are still active and serving traffic. The service is not fully down.

✅ Recommended fix: Rollback to revision 20

The fastest and safest action is to revert to the previous state (revision 20), which was working correctly.

🔧

KOBI PROPOSES · MEDIUM RISK

Roll back deployment demo/demo-web from revision 21 to revision 20

deployments·demo/demo-web ↗

Revision 21 (ReplicaSet demo-web-b7c7c876b) is failing with "bad deploy" — new pods enter CrashLoopBackOff immediately. Revision 20 (demo-web-7d746ccd45) has 3 healthy pods serving traffic. Reverting eliminates the failing pods and restores the stable state.

fromRevision: 21 toRevision: 20

🤖

Confirm the rollback above to revert the deployment to revision 20. Once executed, Kubernetes will eliminate the failing pods and the deployment will be fully stable.

💡 Next step: Before redeploying, investigate what changed in revision 21 (config, entrypoint, env vars) that causes the container to exit with "bad deploy".

AI can make mistakes. Verify important information before acting on it.

Multi-model

Bring your own model

For interactive, human-approved work, Kobi lets you pick your provider: Anthropic Claude, OpenAI GPT, xAI Grok, or any OpenAI-compatible endpoint (self-hosted Llama / Qwen / Mistral via vLLM, Ollama, LM Studio). Set one as fallback on 429 / 5xx. Prompt caching keeps the bill predictable. (Autopilot, acting unsupervised, runs Claude exclusively.)

→ Anthropic · Claude Sonnet / Opus / Haiku
→ OpenAI · GPT-5 / o-series
→ xAI · Grok 4 family
→ OpenAI-compatible · self-hosted, custom gateways
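
The fallback behavior described above can be sketched in a few lines. This is a minimal illustration, not KubeBolt's implementation: `ProviderError`, `should_fall_back`, and `complete_with_fallback` are hypothetical names, and providers are modeled as plain callables. The only behavior taken from the text is that a 429 or 5xx from one provider hands the request to the next.

```python
from typing import Callable, Optional, Sequence


class ProviderError(Exception):
    """Hypothetical error carrying the HTTP status a model provider returned."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(f"HTTP {status} {message}".strip())
        self.status = status


def should_fall_back(status: int) -> bool:
    """Fall back only on rate limiting (429) or server errors (5xx)."""
    return status == 429 or 500 <= status <= 599


def complete_with_fallback(prompt: str, providers: Sequence[Callable[[str], str]]) -> str:
    """Try each provider in order; move to the next one only on 429 / 5xx."""
    last_error: Optional[ProviderError] = None
    for call in providers:
        try:
            return call(prompt)
        except ProviderError as err:
            if not should_fall_back(err.status):
                raise  # e.g. 400 or 401: another provider would not fix this
            last_error = err
    if last_error is None:
        raise ValueError("no providers configured")
    raise last_error  # every provider failed with a retryable status
```

Client errors such as 401 are re-raised immediately in this sketch, on the assumption that a bad key or malformed request should surface to the operator rather than be masked by a second provider.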

Anywhere via MCP

Cursor. Claude Code. Your terminal.

Kobi exposes a Model Context Protocol server. Plug it into your IDE and your AI assistant gets read access to your cluster — without leaving the editor.

# claude code
/mcp add kubebolt

02

Autonomous resolution

MVP · Early access

Autopilot.
The same engine. Hands-free. Auditable.

Autopilot wakes only when something matters — a crash loop spreads, an SLO budget burns, a node degrades. It opens a session, gathers context, decides, and acts. In our MVP it already resolves real incidents end-to-end in under 90 seconds, at about $0.30 each — and we've demoed it live. Built on the Claude Agent SDK with multi-region failover, rolling out with KubeBolt Cloud.

< 90s to resolve

$0.30 per incident

100% auditable

autopilot · autonomous operation · LIVE · session 4f8a

● CRITICAL AUTONOMOUS

Image Pull Failed

Pod/image-app · ns autopilot-demo

1m 10s resolved

296k tokens

$0.33 · 5 stages

✓ Triage · Haiku 4.5 · $0.027

rule image-pull-backoff · severity critical · should route yes

Pod can't pull its container image (ImagePullBackOff), blocking the workload. Routing for investigation into image tag, registry creds, and network.

✓ Investigate · 97% confidence · Sonnet 4.6 · $0.073

Deployment image-app was rolled to revision 116 with a non-existent tag, nginx:does-not-exist-2099; Docker Hub returns a hard 404. Revision 115 still serves 1/1, so there's no full outage.

Docker Hub 404 · ImagePullBackOff ×3 · rev 115 available

✓ Plan · 2 actions · high risk · Sonnet 4.6 · $0.033

#1 rollback_deployment · HIGH RISK · APPROVAL
image-app → revision 115 (nginx:1.27-alpine)

#2 verify_pods_ready · LOW RISK
confirm pods Running/Ready after rollback

✓ Execute · Sonnet 4.6 · $0.055

✓ rollback_deployment succeeded · toRevision 115 · 5.3s

✓ verify_pods_ready succeeded · 128ms

verification ✓ passed · all pods Running nginx:1.27-alpine

✓ Postmortem · Opus 4.7 · $0.144

image-app ImagePullBackOff from non-existent tag. Bad image-tag deployment, not a registry or network fault. Autopilot rolled back to rev 115; pods returned Ready and the rollout converged in ~5s. User-visible impact: a stuck rollout, no outage.

TIMELINE · live

✓ Triage · ✓ Investigate · ✓ Plan · ✓ Execute · ✓ Postmortem

7:58:58   run           incident detected · run started
7:58:58   message       rule image-pull-backoff → route
7:59:11   tool call     get_events image-app
7:59:11   tool result   Docker Hub 404 · ImagePullBackOff ×3
7:59:23   message       evidence conclusive · 97% confidence
7:59:36   message       2 actions · risk=high
7:59:55   message       executing: rollback first, then verify
7:59:56   tool call     rollback_deployment
7:59:56   approval      awaiting approval
8:00:01   approval      operator approved
8:00:01   tool result   rollback_deployment · 5.3s
8:00:05   tool result   verify_pods_ready · 128ms
8:00:08   message       both actions completed · pods healthy
8:00:08   message       post-incident report generated
8:00:49   run           run.completed

01

Root-cause analysis

When an incident fires, Autopilot launches a long-running investigation session. It correlates events, logs, deploys, and prior incidents — then writes a verdict you can audit.

02

Guided remediation

Proposes a patch, runs it through a deterministic executor with policy guardrails, and rolls back if an SLO is breached. You approve once; it remembers.

03

Postmortems on autopilot

Generates a draft postmortem from the incident timeline. Action items linked to PRs. Five-whys included. Edit, don't write from scratch.

04

Cluster lifecycle ops

Schedules power-on / power-off across EKS, AKS, GKE, OpenShift, and node pools. In most environments, the savings cover the cost of KubeBolt itself.

Why Claude exclusively — for Autopilot

For unsupervised action we run Claude exclusively, with multi-region failover across Anthropic API, AWS Bedrock and Google Vertex AI. Autonomous remediation demands the single most reliable model — no router, no model zoo, no ambiguity in a postmortem. Kobi, with you in the loop, is where bring-your-own-model lives.

03

Module — Cluster Lifecycle

Roadmap · 2026 · Business / Enterprise

The module
that will pay for itself.

Most clusters run 24/7 even though humans don't. KubeBolt's lifecycle module will schedule power-on / power-off across your fleet — preview environments, dev clusters, weekend downtime — and track the savings. Shipping with the KubeBolt Business tier in 2026.

Typical savings

~65%

on non-production cluster spend, when scheduled to nights & weekends with smart warm-up before working hours.

The infrastructure savings often exceed the KubeBolt subscription itself.
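
A back-of-the-envelope calculation shows how a figure in this range can arise. The schedule here is an assumption, not something the page specifies: clusters stay on 12 hours a day, 5 days a week, and cost is treated as proportional to powered-on hours (warm-up time, storage, and control-plane fees are ignored).

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours in an always-on week


def scheduled_savings(on_hours_per_day: float, on_days_per_week: int) -> float:
    """Fraction of always-on spend avoided by running only on a schedule."""
    on_hours = on_hours_per_day * on_days_per_week
    return 1 - on_hours / HOURS_PER_WEEK


# Assumed schedule: 12 h/day on weekdays, off on nights and weekends.
savings = scheduled_savings(on_hours_per_day=12, on_days_per_week=5)
print(f"{savings:.1%}")  # 64.3%
```

Under these assumptions the cluster is powered on for 60 of 168 hours, so roughly 64% of the compute spend is avoided, in line with the ~65% quoted above.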

Supported platforms

→ EKS
→ AKS
→ GKE
→ OpenShift
→ Generic node pools

Per-cluster schedules, weekday / weekend rules, manual override, audit log of every power transition. Designed not to interrupt CI runs in flight.

04

Architecture — Determinism first

L1 shipped · L5 partial · L2–L4, L6 roadmap 2026

Skills.
The cheapest LLM call
is the one you don't make.

A Skill is a declarative, deterministic diagnostic routine: a recipe for a known failure pattern. L1 Detectors ships as the 24-rule Insights Engine below. L5 Executor ships partially as the deterministic write-ops surface in the Cluster operations module below (Set image, Set resources, Secret reveal with audit, Apply manifest). The remaining layers land with Autopilot in 2026.

L1 · SHIPPED · Detectors · Deterministic. No AI.
L2 · ROADMAP · Router · Haiku-class. Triage.
L3 · ROADMAP · Investigator · Sonnet-class. RCA.
L4 · ROADMAP · Planner · Sonnet / Opus.
L5 · PARTIAL · Executor · Deterministic. Guardrails.
L6 · ROADMAP · Postmortem · Sonnet / Opus.

05

Insights Engine

24 rules. Zero configuration.

Continuous evaluation against proven heuristics. Actionable recommendations, not raw PromQL. Each rule is a Skill — so the engine works even before any model is configured.
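
As an illustration of what a deterministic rule can look like, the sketch below checks a pod for a crash loop using the field names the Kubernetes API returns in a Pod's status (`containerStatuses`, `state.waiting.reason`, `restartCount`). The `Insight` type, the function name, and the restart threshold of 3 are assumptions made for this example, not KubeBolt's actual rule definitions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Insight:
    rule: str
    severity: str
    message: str


def detect_crash_loop(pod: dict, min_restarts: int = 3) -> Optional[Insight]:
    """Return an Insight if any container is waiting in CrashLoopBackOff."""
    pod_name = pod.get("metadata", {}).get("name", "<unknown>")
    for cs in pod.get("status", {}).get("containerStatuses", []):
        waiting = cs.get("state", {}).get("waiting") or {}
        restarts = cs.get("restartCount", 0)
        if waiting.get("reason") == "CrashLoopBackOff" and restarts >= min_restarts:
            return Insight(
                rule="crash-loop",
                severity="critical",
                message=f"{pod_name}/{cs.get('name')} restarted {restarts} times",
            )
    return None  # no model call needed: the check is plain data inspection
```

Because the check is a pure function over API objects, it can run on every watch event at negligible cost, which is the point of putting detectors ahead of any model.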

Crash loop detected · Critical
OOM killed · Critical
Image pull backoff · Critical
Node not ready · Critical
Zero replicas · Critical
Helm release failed · Critical
CPU throttle risk · Warning
Memory pressure · Warning
HPA maxed out · Warning
Certificate expiring · Warning
ArgoCD app not healthy · Warning

+13 more rules · Full catalog →

06

Module — Cluster operations

Available today

Operate.
Don't just observe.

Every kubectl-equivalent verb is a click. Set image, scale, drain a node, edit env, reveal a secret, apply a fresh manifest — all from the dashboard with RBAC enforcement on every endpoint, an audit log entry on every mutation, and a governance switch to scope or turn off write actions. No terminal needed.

demo · demo-web DEPLOYMENT

Set image

Container

web

Current image ghcr.io/clm-cloud-solutions/web:1.4.2

New image ghcr.io/clm-cloud-solutions/web:1.5.0

tracing-agent INIT

Current image ghcr.io/clm-cloud-solutions/tracing:0.3.1

New image ghcr.io/clm-cloud-solutions/tracing:0.4.0

Rollout history

#23 active 2s ago
#22 v1.4.2 14m ago
#21 v1.4.1 1h ago
#20 v1.4.0 3h ago
#19 v1.3.4 1d ago

ROLLOUT IN PROGRESS 2 of 3 pods updated

Audit log Logged · Editor role · 2s ago

01

Workload edits

Set image, set CPU/memory requests and limits, set env (literal, ConfigMap ref, Secret ref), edit labels and annotations. Strategic-merge patches with init-container support and limit-≥-request guards.
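
The limit-≥-request guard can be pictured as a small validation step that runs before a patch is sent. This is a sketch under stated assumptions: `parse_quantity` and `check_limits` are hypothetical helpers, and the parser handles only a subset of Kubernetes quantity suffixes (`m`, `k`, `M`, `G`, `T`, `Ki`, `Mi`, `Gi`, `Ti`), not the full grammar.

```python
from decimal import Decimal
from typing import Dict, List

# Subset of Kubernetes quantity suffixes; exponent forms are not handled.
_SUFFIXES = {
    "Ki": Decimal(2) ** 10, "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30, "Ti": Decimal(2) ** 40,
    "m": Decimal("0.001"),
    "k": Decimal(10) ** 3, "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9, "T": Decimal(10) ** 12,
}


def parse_quantity(quantity: str) -> Decimal:
    """Convert a quantity such as '500m' or '256Mi' to a plain number."""
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return Decimal(quantity[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(quantity)


def check_limits(resources: Dict[str, Dict[str, str]]) -> List[str]:
    """Return one error per resource whose limit is below its request."""
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    errors = []
    for name, request in requests.items():
        if name in limits and parse_quantity(limits[name]) < parse_quantity(request):
            errors.append(f"{name}: limit {limits[name]} is below request {request}")
    return errors
```

For example, `check_limits({"requests": {"cpu": "500m"}, "limits": {"cpu": "250m"}})` reports the CPU entry, while a resource with a request and no limit passes.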

02

Secret reveal with audit

Mandatory operator reason (≥10 chars) before any reveal. Production-namespace gate escalates to Admin. Per-key 60s auto-hide. Audit log records who/what/when/why — never the values.
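
The reveal policy above reduces to a few checks plus an audit record that has no place to store a secret value. The sketch below is illustrative only: the function and type names are hypothetical, production namespaces are passed in as a set because the page does not say how they are identified, and trimming whitespace before counting characters is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Set

MIN_REASON_CHARS = 10


@dataclass(frozen=True)
class AuditEntry:
    who: str
    what: str
    when: str
    why: str  # deliberately no field for the secret value


def authorize_reveal(user: str, role: str, namespace: str, secret: str,
                     key: str, reason: str,
                     production_namespaces: Set[str]) -> AuditEntry:
    """Raise PermissionError if the reveal is not allowed, else return the audit record."""
    reason = reason.strip()
    if len(reason) < MIN_REASON_CHARS:
        raise PermissionError(f"a reason of at least {MIN_REASON_CHARS} characters is required")
    if namespace in production_namespaces and role != "Admin":
        raise PermissionError("revealing secrets in a production namespace requires Admin")
    return AuditEntry(
        who=user,
        what=f"{namespace}/{secret}:{key}",
        when=datetime.now(timezone.utc).isoformat(),
        why=reason,
    )
```

Returning the audit entry from the same function that makes the decision keeps the two from drifting apart: there is no code path that approves a reveal without producing a record.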

03

Node maintenance

Cordon, uncordon, and drain with live SSE progress as pods are evicted. Cancel mid-flight. The same toolbar is available from Node detail and the Nodes list.

04

Apply new manifest

Paste YAML or pick from 18 starter templates per kind. CodeMirror editor with pre-flight validation: kind / apiVersion / namespace consistency, single-document guard, status / managedFields auto-stripped.
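
The pre-flight checks can be sketched as a function over already-parsed documents. Everything here is an assumption for illustration: the name `preflight`, the reading of "namespace consistency" as "the manifest's namespace must match the namespace being applied to", and the choice to work on plain dicts so the example needs no YAML library.

```python
import copy
from typing import List


def preflight(documents: List[dict], target_namespace: str) -> dict:
    """Validate a pasted manifest and return a cleaned copy ready to apply."""
    # Single-document guard: exactly one object per apply.
    if len(documents) != 1:
        raise ValueError(f"expected exactly one document, got {len(documents)}")

    manifest = copy.deepcopy(documents[0])  # never mutate the caller's input

    for field in ("kind", "apiVersion"):
        if not manifest.get(field):
            raise ValueError(f"missing required field: {field}")

    metadata = manifest.setdefault("metadata", {})
    namespace = metadata.get("namespace")
    if namespace is not None and namespace != target_namespace:
        raise ValueError(
            f"manifest namespace {namespace!r} does not match target {target_namespace!r}"
        )

    # Server-populated fields are dropped rather than rejected.
    manifest.pop("status", None)
    metadata.pop("managedFields", None)
    return manifest
```

Stripping `status` and `managedFields` instead of failing on them matters in practice, since manifests copied out of a live cluster carry both.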

05

Revision diff & rollback

Every rollout's History tab shows a side-by-side YAML diff — what changed, line by line — whether it came from kubectl, GitOps, or KubeBolt. Roll back to either revision in one click.

06

Helm releases, end to end

See every Helm release in the cluster — chart, values, rendered manifest, revision history, and dependencies — decoded straight from Helm's own Secrets. No SDK, no terminal.

07

Surface — Operator UI

See your cluster.
In motion.

A real-time map of every pod, service, and request path. Mini-dashboards on every resource. Traffic flow you can actually watch. Built for engineers who'd rather see the topology than read another kubectl get.

[Topology map: kind-kubebolt-dev · LIVE · 2 nodes]

Services: ingress-nginx-controller, demo-web
Pods: ingress-nginx-controller-7f8, demo-web-8d657b-1, demo-web-8d657b-2, demo-web-8d657b-3

32.43 ev/s forwarded · HTTP 9.24 req/s · 2xx 4.77/s · 4xx 4.48/s · avg latency 0.4 ms

01

30+ resource views

Pods, Deployments, Services, Ingresses, ConfigMaps, Secrets, Jobs, Nodes — every kind gets a purpose-built view with the right columns, the right actions, the right detail panes.

02

Live, not polled

Every list, graph, and edge is driven by Watch streams over the K8s API. State changes show up in milliseconds — no F5, no stale rows, no surprise.

03

⌘K everything

Type to jump to any resource across any namespace. Fuzzy match across kinds. Recent context pinned. Works the same as your editor — because that's how you already think.

04

Three lenses on the dashboard

Overview for at-a-glance health, Capacity for right-sizing recommendations and top consumers, Reliability for L7 error rates, hot-spots, and Network Drops via Hubble.

08

Stack

Lightweight by design.

Source · Kubernetes
→ API Server
→ Metrics Server
→ OTel Collector (forked)

Core · KubeBolt Engine
→ Go · Auth · RBAC
→ BoltDB embedded
→ Insights Engine · 24 rules
→ MCP servers

Surfaces · Where you work
→ Web UI · 30+ views
→ Slack · Discord · Email
→ Cursor · Claude Code (MCP)

Go 1.25+ · client-go · BoltDB · Anthropic API · OpenAI API · OpenTelemetry · Hubble flows · Model Context Protocol · React 18 · TypeScript

Why KubeBolt

Most tools watch.
KubeBolt operates.

Dashboards, metrics and alerts are table stakes — everyone has them. The difference is what happens after the alert fires.

Table stakes — everyone has it

What a dashboard does

Dashboards and resource visualization
Metric and log aggregation
Alerting when something breaks
A chat box bolted onto the cluster

Only KubeBolt

What no dashboard has

Autonomous remediation with guardrails: It doesn't just page you; it fixes the incident, with operator-grade policy and a full audit trail.
A fleet of specialized agents: Networking, storage, workloads, security. Each layer is watched and diagnosed end-to-end, not by one generic bot.
The whole incident lifecycle: Detect → diagnose → remediate → verify → postmortem. One loop, closed automatically.
Your API server stays private: The agent dials out; nothing comes in. Works on private GKE/EKS, behind bastions, and air-gapped, with no VPN and no exposed cluster.

09

Install

One command.
Any cluster.

OCI chart on GHCR. Configurable RBAC, Ingress, auth, resources.

helm install kubebolt \
  oci://ghcr.io/clm-cloud-solutions/kubebolt/helm/kubebolt

kubectl port-forward svc/kubebolt 3000:80

Single container with embedded frontend. Multi-arch (amd64/arm64).

docker run -p 3000:3000 \
  -v ~/.kube:/root/.kube:ro \
  ghcr.io/clm-cloud-solutions/kubebolt:latest

Single executable, embedded UI, one port. Auto-loads .env.

curl -LO https://github.com/clm-cloud-solutions/kubebolt/releases/latest/download/kubebolt-darwin-arm64
chmod +x kubebolt-darwin-arm64
./kubebolt-darwin-arm64 --kubeconfig ~/.kube/config

Tap and install. Automatic version management.

brew install clm-cloud-solutions/tap/kubebolt
kubebolt --kubeconfig ~/.kube/config

Need the full reference? Read the docs →

Roadmap — What's coming

Not a finished product. A living one.

KubeBolt ships continuously. Here's what runs today and where it's heading — so you know exactly what you're joining.

01 Available now

Resolution, today

Kobi Copilot — assisted resolution with audited actions
kubectl operations + Helm visibility via agent
24-rule Insights Engine, zero config

02 In beta

Going autonomous

Autopilot MVP — autonomous remediation
Predictive failure detection
Cost optimization insights

03 Coming soon

Enterprise-ready

Hosted Cloud — Autopilot + Lifecycle
SSO, advanced RBAC, audit trail
Multi-cluster fleet management

04 Research

The frontier

Multi-step remediation across services
Specialized sub-agents per domain
Predictive auto-tuning

Plans — One place to choose your path

Free forever,
or fully hosted.

The open-source agent is free forever under Apache 2.0. KubeBolt Cloud adds hosted Autopilot, Lifecycle Management and team SSO — join the waitlist for founding-customer pricing.

OSS · Self-Hosted Available today

Free forever. No caps.

Apache 2.0. You operate the infra. KubeBolt operates nothing.

Clusters: unlimited
Nodes / pods: unlimited
Users: unlimited
Retention: your disk
AI Copilot: BYOK · unlimited
Support: GitHub · Community

Install in 60 seconds →

SaaS · Cloud Coming 2026

Free hosted tier. Caps included.

We operate the control plane. You only deploy the agent. Lead-magnet limits.

Clusters: 2
Nodes / pods: 10 / 150
Active users: 3 (hard cap)
Retention: 15 days
AI credits / mo: 500 (cutoff)
Webhooks · custom rules: 3 · 3

Join the waitlist →

Early access

Be first on the
commercial cloud.

The open-source agent is free forever. KubeBolt Cloud — hosted Autopilot, Lifecycle Management, and team SSO — launches in 2026. Join the waitlist for early access and founding-customer pricing.

10+ already on the waitlist · 12 stars on GitHub

No spam. Unsubscribe with one click. We share product updates only.

Kubernetes incidents, resolved.
