v1.15.0 · Stable · Apache 2.0 · Open Source · 12 stars on GitHub
Watch it fix itself

Kubernetes incidents, resolved.

One resolution engine, two modes. Kobi fixes it with your click; Autopilot fixes it hands-free.

87s median time-to-resolve
24/7 agent on shift
0-touch in Autopilot

The spine — How it resolves

One engine.
You choose the control.

The same resolution engine across the board. The only thing that changes is who pushes the button — you, or the cluster itself.

Kobi (Copilot)

  • Who acts: Kobi + you
  • Who approves: You click Execute
  • Speed: Seconds to a fix

Same detect → diagnose → remediate engine. Only the control model changes.

01

Assisted resolution · Available today

Kobi resolves.
Teaches. And never leaves you stuck.

Press ⌘ J and describe what's wrong. Kobi queries your live cluster through 17+ tools, finds the root cause, and proposes the exact fix — you click Execute. It explains every step, and when a fix is beyond what it can run, it hands you the precise console commands.

01 · Resolves

From alert to fix, with your click

Kobi finds the root cause and proposes the exact command: rollback, scale, or patch limits. You approve; it executes under RBAC enforcement and writes an audit entry. The human stays in the loop.

02 · Teaches

Shows its work, every time

Each diagnosis comes with the reasoning and the commands behind it. Your team levels up on Kubernetes — not just on KubeBolt. Onboarding and incident response in one.

03 · Guides to console

Never a dead end

When a fix falls outside what Kobi can execute, it doesn't shrug — it hands you the exact step-by-step to run by hand, so you're never blocked.

04 · Anywhere

UI, IDE, CLI via MCP

Kobi exposes a Model Context Protocol server. Plug it into Cursor or Claude Code and your assistant gets live cluster context with read access, without leaving the editor.

Multi-model

Bring your own model

For interactive, human-approved work, Kobi lets you pick your provider: Anthropic Claude, OpenAI GPT, xAI Grok, or any OpenAI-compatible endpoint (self-hosted Llama / Qwen / Mistral via vLLM, Ollama, LM Studio). Set one as a fallback for when the primary returns HTTP 429 or 5xx. Prompt caching keeps the bill predictable. (Autopilot, acting unsupervised, runs Claude exclusively.)

  • → Anthropic · Claude Sonnet / Opus / Haiku
  • → OpenAI · GPT-5 / o-series
  • → xAI · Grok 4 family
  • → OpenAI-compatible · self-hosted, custom gateways
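
The fallback rule (switch providers on 429 / 5xx) could look roughly like the sketch below. This is an illustration under assumptions, not KubeBolt's implementation: `ProviderError`, `should_fall_back`, and `complete_with_fallback` are hypothetical names, and each provider is modeled as a plain callable.

```python
from typing import Callable, Sequence


class ProviderError(Exception):
    """Hypothetical error carrying the HTTP status a provider returned."""

    def __init__(self, status: int):
        super().__init__(f"provider returned HTTP {status}")
        self.status = status


def should_fall_back(status: int) -> bool:
    # Rate limiting (429) and server-side failures (5xx) trigger the fallback.
    return status == 429 or 500 <= status <= 599


def complete_with_fallback(prompt: str, providers: Sequence[Callable[[str], str]]) -> str:
    """Try each provider in order; move on only for 429 / 5xx errors."""
    last_error = None
    for call in providers:
        try:
            return call(prompt)
        except ProviderError as err:
            if not should_fall_back(err.status):
                raise  # e.g. 400 / 401: another provider will not help
            last_error = err
    if last_error is None:
        raise ValueError("no providers configured")
    raise last_error
```

Client errors such as 401 are re-raised immediately in this sketch, since a bad key or malformed request would likely fail on the fallback too.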

Anywhere via MCP

Cursor. Claude Code. Your terminal.

Kobi exposes a Model Context Protocol server. Plug it into your IDE and your AI assistant gets read access to your cluster — without leaving the editor.

# claude code
/mcp add kubebolt
02

Autonomous resolution

MVP · Early access

Autopilot.
The same engine. Hands-free. Auditable.

Autopilot wakes only when something matters — a crash loop spreads, an SLO budget burns, a node degrades. It opens a session, gathers context, decides, and acts. In our MVP it already resolves real incidents end-to-end in under 90 seconds, at about $0.30 each — and we've demoed it live. Built on the Claude Agent SDK with multi-region failover, rolling out with KubeBolt Cloud.

< 90s to resolve
$0.30 per incident
100% auditable
01

Root-cause analysis

When an incident fires, Autopilot launches a long-running investigation session. It correlates events, logs, deploys, and prior incidents — then writes a verdict you can audit.

02

Guided remediation

Proposes a patch, runs it through a deterministic executor with policy guardrails, and rolls back if an SLO is breached. You approve once; it remembers.
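
As a rough illustration of what "deterministic executor with guardrails and rollback" can mean, here is a minimal Python sketch. Everything in it is an assumption for illustration: the function name, the allow-list policy, and the fixed number of SLO checks are hypothetical and do not describe Autopilot's actual executor.

```python
from typing import Callable, Collection


def execute_with_guardrails(
    action: str,
    allowed_actions: Collection[str],
    apply: Callable[[], None],
    rollback: Callable[[], None],
    slo_ok: Callable[[], bool],
    checks: int = 3,
) -> str:
    """Apply a change, watch the SLO, and undo the change on a breach."""
    if action not in allowed_actions:
        return "denied"  # policy guardrail: never run unlisted actions
    apply()
    for _ in range(checks):
        if not slo_ok():
            rollback()
            return "rolled_back"
    return "applied"
```

The point of the shape is that the model only proposes; the code path that mutates the cluster has no model call in it.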

03

Postmortems on autopilot

Generates a draft postmortem from the incident timeline. Action items linked to PRs. Five-whys included. Edit, don't write from scratch.

04

Cluster lifecycle ops

Schedules power-on / power-off across EKS, AKS, GKE, OpenShift and node pools. The infrastructure savings often exceed the cost of KubeBolt itself.

Why Claude exclusively — for Autopilot

For unsupervised action we run Claude exclusively, with multi-region failover across Anthropic API, AWS Bedrock and Google Vertex AI. Autonomous remediation demands the single most reliable model — no router, no model zoo, no ambiguity in a postmortem. Kobi, with you in the loop, is where bring-your-own-model lives.

03

Module — Cluster Lifecycle

Roadmap · 2026 · Business / Enterprise

The module
that will pay for itself.

Most clusters run 24/7 even though humans don't. KubeBolt's lifecycle module will schedule power-on / power-off across your fleet — preview environments, dev clusters, weekend downtime — and track the savings. Shipping with the KubeBolt Business tier in 2026.

Typical savings

~65%

on non-production cluster spend, when clusters are powered off on nights and weekends, with smart warm-up before working hours.

The infrastructure savings often exceed the KubeBolt subscription itself.
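
For a sense of where a figure near 65% can come from, here is a back-of-the-envelope sketch. It assumes compute cost scales linearly with running hours and that a cluster runs 12 hours on each of 5 weekdays; both are illustrative assumptions, and it ignores warm-up time, storage, and control-plane fees.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def scheduled_savings(hours_on_per_weekday: float, weekdays_on: int = 5) -> float:
    """Fraction of always-on compute hours avoided by the schedule."""
    hours_on = hours_on_per_weekday * weekdays_on
    return 1 - hours_on / HOURS_PER_WEEK


# 12 h x 5 days = 60 running hours out of 168
print(f"{scheduled_savings(12):.1%}")  # 64.3%
```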

Supported platforms

  • EKS
  • AKS
  • GKE
  • OpenShift
  • Generic node pools

Per-cluster schedules, weekday / weekend rules, manual override, audit log of every power transition. Designed not to interrupt CI runs in flight.

04

Architecture — Determinism first

L1 shipped · L5 partial · L2–L4, L6 roadmap 2026

Skills.
The cheapest LLM call
is the one you don't make.

A Skill is a declarative, deterministic diagnostic routine: a recipe for a known failure pattern. L1 Detectors ships as the 24-rule Insights Engine below. L5 Executor ships partially as the deterministic write-ops surface in the Cluster operations module below (Set image, Set resources, Secret reveal with audit, Apply manifest). The remaining layers land with Autopilot in 2026.

  • L1 · SHIPPED · Detectors · Deterministic. No AI.
  • L2 · ROADMAP · Router · Haiku-class. Triage.
  • L3 · ROADMAP · Investigator · Sonnet-class. RCA.
  • L4 · ROADMAP · Planner · Sonnet / Opus.
  • L5 · PARTIAL · Executor · Deterministic. Guardrails.
  • L6 · ROADMAP · Postmortem · Sonnet / Opus.

05

Insights Engine

24 rules. Zero configuration.

Continuous evaluation against proven heuristics. Actionable recommendations, not raw PromQL. Each rule is a Skill — so the engine works even before any model is configured.

  • Crash loop detected · Critical
  • OOM killed · Critical
  • Image pull backoff · Critical
  • Node not ready · Critical
  • Zero replicas · Critical
  • Helm release failed · Critical
  • CPU throttle risk · Warning
  • Memory pressure · Warning
  • HPA maxed out · Warning
  • Certificate expiring · Warning
  • ArgoCD app not healthy · Warning
  • +13 more rules · Full catalog →
06

Module — Cluster operations

Available today

Operate.
Don't just observe.

Every kubectl-equivalent verb is a click. Set image, scale, drain a node, edit env, reveal a secret, apply a fresh manifest — all from the dashboard with RBAC enforcement on every endpoint, an audit log entry on every mutation, and a governance switch to scope or turn off write actions. No terminal needed.

01

Workload edits

Set image, set CPU/memory requests and limits, set env (literal, ConfigMap ref, Secret ref), edit labels and annotations. Strategic-merge patches with init-container support and limit-≥-request guards.
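
A limit-≥-request guard of the kind described above might look like the following sketch. It is an illustration only: `parse_quantity` handles a simplified subset of Kubernetes quantity suffixes, and `check_limits` is a hypothetical name, not KubeBolt's API.

```python
from decimal import Decimal

# Multipliers for the quantity suffixes this sketch understands.
_SUFFIXES = {
    "m": Decimal("0.001"),
    "Ki": Decimal(2**10), "Mi": Decimal(2**20), "Gi": Decimal(2**30), "Ti": Decimal(2**40),
    "k": Decimal(10**3), "M": Decimal(10**6), "G": Decimal(10**9), "T": Decimal(10**12),
}


def parse_quantity(value: str) -> Decimal:
    """Parse a simplified Kubernetes quantity such as '500m', '2', or '256Mi'."""
    # Try longer suffixes first so 'Mi' is matched before single letters.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if value.endswith(suffix):
            return Decimal(value[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(value)


def check_limits(requests: dict, limits: dict) -> list:
    """Return one error message per resource whose limit is below its request."""
    errors = []
    for resource, request in requests.items():
        limit = limits.get(resource)
        if limit is not None and parse_quantity(limit) < parse_quantity(request):
            errors.append(f"{resource}: limit {limit} is below request {request}")
    return errors
```

Running a check like this before the patch is sent means the operator sees the problem in the form rather than as an API server rejection.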

02

Secret reveal with audit

Mandatory operator reason (≥10 chars) before any reveal. Production-namespace gate escalates to Admin. Per-key 60s auto-hide. Audit log records who/what/when/why — never the values.
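
The reveal rules above (a reason of at least 10 characters, an Admin gate for production namespaces, an audit record without values) can be sketched as below. This is a hypothetical illustration: the role name `"admin"`, the `production_namespaces` set, and all function and field names are assumptions, not KubeBolt's data model.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

MIN_REASON_CHARS = 10  # from the text: reason must be at least 10 characters


@dataclass(frozen=True)
class AuditEntry:
    who: str
    what: str       # namespace/secret/key only; the value is never stored
    when: datetime
    why: str


def authorize_reveal(user: str, role: str, namespace: str, secret: str,
                     key: str, reason: str, production_namespaces: set) -> AuditEntry:
    """Return the audit entry for an allowed reveal, or raise PermissionError."""
    reason = reason.strip()
    if len(reason) < MIN_REASON_CHARS:
        raise PermissionError("a reason of at least 10 characters is required")
    # Assumption: production namespaces are a configured set and the
    # elevated role is literally named "admin".
    if namespace in production_namespaces and role != "admin":
        raise PermissionError("production namespaces require the admin role")
    return AuditEntry(
        who=user,
        what=f"{namespace}/{secret}/{key}",
        when=datetime.now(timezone.utc),
        why=reason,
    )
```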

03

Node maintenance

Cordon, uncordon, and drain with live SSE progress as pods are evicted. Cancel mid-flight. The same toolbar is available from both Node detail and the Nodes list.

04

Apply new manifest

Paste YAML or pick from 18 starter templates per kind. CodeMirror editor with pre-flight validation: kind / apiVersion / namespace consistency, single-document guard, status / managedFields auto-stripped.

05

Revision diff & rollback

Every rollout's History tab shows a side-by-side YAML diff — what changed, line by line — whether it came from kubectl, GitOps, or KubeBolt. Roll back to either revision in one click.

06

Helm releases, end to end

See every Helm release in the cluster — chart, values, rendered manifest, revision history, and dependencies — decoded straight from Helm's own Secrets. No SDK, no terminal.

07

Surface — Operator UI

See your cluster.
In motion.

A real-time map of every pod, service, and request path. Mini-dashboards on every resource. Traffic flow you can actually watch. Built for engineers who'd rather see the topology than read another kubectl get.

01

30+ resource views

Pods, Deployments, Services, Ingresses, ConfigMaps, Secrets, Jobs, Nodes — every kind gets a purpose-built view with the right columns, the right actions, the right detail panes.

02

Live, not polled

Every list, graph, and edge is driven by Watch streams over the K8s API. State changes show up in milliseconds — no F5, no stale rows, no surprise.

03

⌘K everything

Type to jump to any resource across any namespace. Fuzzy match across kinds. Recent context pinned. Works the same as your editor — because that's how you already think.

04

Three lenses on the dashboard

Overview for at-a-glance health, Capacity for right-sizing recommendations and top consumers, Reliability for L7 error rates, hot-spots, and Network Drops via Hubble.

08

Stack

Lightweight by design.

Source

Kubernetes

  • API Server
  • Metrics Server
  • OTel Collector (forked)

Core

KubeBolt Engine

  • Go · Auth · RBAC
  • BoltDB embedded
  • Insights Engine · 24 rules
  • MCP servers

Surfaces

Where you work

  • Web UI · 30+ views
  • Slack · Discord · Email
  • Cursor · Claude Code (MCP)
Go 1.25+ · client-go · BoltDB · Anthropic API · OpenAI API · OpenTelemetry · Hubble flows · Model Context Protocol · React 18 · TypeScript

Why KubeBolt

Most tools watch.
KubeBolt operates.

Dashboards, metrics and alerts are table stakes — everyone has them. The difference is what happens after the alert fires.

Table stakes — everyone has it

What a dashboard does

  • Dashboards and resource visualization
  • Metric and log aggregation
  • Alerting when something breaks
  • A chat box bolted onto the cluster
Only KubeBolt

What no dashboard has

  • Autonomous remediation with guardrails: it doesn't just page you, it fixes the incident, with operator-grade policy and a full audit trail.
  • A fleet of specialized agents: networking, storage, workloads, security. Each layer is watched and diagnosed end-to-end, not by one generic bot.
  • The whole incident lifecycle: detect → diagnose → remediate → verify → postmortem. One loop, closed automatically.
  • Your API server stays private: the agent dials out; nothing comes in. Works on private GKE/EKS, behind bastions, and air-gapped, with no VPN and no exposed cluster.
09

Install

One command.
Any cluster.

OCI chart on GHCR. Configurable RBAC, Ingress, auth, resources.

helm install kubebolt \
  oci://ghcr.io/clm-cloud-solutions/kubebolt/helm/kubebolt

kubectl port-forward svc/kubebolt 3000:80

Need the full reference? Read the docs →

Roadmap — What's coming

Not a finished product. A living one.

KubeBolt ships continuously. Here's what runs today and where it's heading — so you know exactly what you're joining.

01 Available now

Resolution, today

  • Kobi Copilot — assisted resolution with audited actions
  • kubectl operations + Helm visibility via agent
  • 24-rule Insights Engine, zero config
02 In beta

Going autonomous

  • Autopilot MVP — autonomous remediation
  • Predictive failure detection
  • Cost optimization insights
03 Coming soon

Enterprise-ready

  • Hosted Cloud — Autopilot + Lifecycle
  • SSO, advanced RBAC, audit trail
  • Multi-cluster fleet management
04 Research

The frontier

  • Multi-step remediation across services
  • Specialized sub-agents per domain
  • Predictive auto-tuning

Plans — One place to choose your path

Free forever,
or fully hosted.

The open-source agent is free forever under Apache 2.0. KubeBolt Cloud adds hosted Autopilot, Lifecycle Management and team SSO — join the waitlist for founding-customer pricing.

OSS · Self-Hosted Available today

Free forever. No caps.

Apache 2.0. You operate the infra. KubeBolt operates nothing.

  • Clusters: unlimited
  • Nodes / pods: unlimited
  • Users: unlimited
  • Retention: your disk
  • AI Copilot: BYOK · unlimited
  • Support: GitHub · Community
Install in 60 seconds →

Early access

Be first on the
commercial cloud.

The open-source agent is free forever. KubeBolt Cloud — hosted Autopilot, Lifecycle Management, and team SSO — launches in 2026. Join the waitlist for early access and founding-customer pricing.

10+ already on the waitlist · 12 stars on GitHub

No spam. Unsubscribe with one click. We share product updates only.