Operations

This guide covers the main production tasks for kassette: inspecting runs, debugging failures, forking runs, and cleaning up old journals.

Agent observability

kassette is designed so an agent can inspect a run directly. The agent reads the journal and can answer:

What already happened?
Where did the run suspend or fail?
Where can I safely fork?

The bundled skill

The repo ships an agent skill at skills/kassette/SKILL.md. Use it as the operational guide for agents working with kassette. It covers storage discovery, journal basics, jq patterns, status interpretation, suspend/resume debugging, and safe forking. Use the skill together with the CLI; in practice, agents are expected to be the CLI's main users.

The CLI

@usekassette/cli ships a kassette command that works the same way against both file: and s3:// URLs.

kassette --storage <url> <verb> [args]

--storage accepts the same URL formats the workflow API uses through LocalStorage and RemoteStorage:

| Scheme | Form | Backend |
| --- | --- | --- |
| `file:` | `file:<path>` (relative or absolute) | `LocalStorage` |
| `s3://` | `s3://<bucket>[/<prefix>]` | `RemoteStorage` via `@usekassette/s3` |

@usekassette/s3 is optional. Install it alongside @usekassette/cli if you want to use s3:// URLs.

The CLI has four verbs:

| Verb | Form | Output |
| --- | --- | --- |
| `list` | `list [--limit N]` | One `{"runId":"..."}` per line |
| `status` | `status <runId>` | One `RunStatus` JSON object |
| `dump` | `dump <runId> [--offset N]` | Journal entries as JSONL, one per line |
| `fork` | `fork <srcRunId> (--from-offset N \| --from-step <id>) [--new-run-id <id>]` | One `{"runId":"<new>"}` |

fork is the only verb that writes anything. It creates a new run and never modifies the source run. You must pass exactly one fork point:

--from-offset N
--from-step <id>

--from-step cuts at the offset of the first step entry with that stepId.

# Inspect storage
kassette --storage file:.kassette list                       # one {"runId":...} object per line
kassette --storage file:.kassette status <runId>             # JSON status object
kassette --storage file:.kassette dump <runId>               # JSONL journal entries
kassette --storage file:.kassette dump <runId> --offset 7    # entries from offset 7

# Fork
kassette --storage file:.kassette fork <runId> --from-offset 7
kassette --storage s3://bucket fork <runId> --from-step llm#3
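The --from-step rule (cut at the first step entry with that stepId) amounts to a lookup over the dump output. The TypeScript below is a minimal sketch of that lookup, not the CLI's source; the Entry interface is a hypothetical, trimmed-down entry shape and offsetForStep is an illustrative name.

```typescript
// Hypothetical, simplified journal entry shape (illustration only).
interface Entry {
  offset: number;
  type: string;    // e.g. "start", "step", "suspend", "complete"
  stepId?: string; // present on step entries
}

// Resolve a --from-step fork point to an offset: the offset of the
// first step entry whose stepId matches. Undefined if no step matches.
function offsetForStep(entries: Entry[], stepId: string): number | undefined {
  const hit = entries.find((e) => e.type === 'step' && e.stepId === stepId);
  return hit?.offset;
}

const journal: Entry[] = [
  { offset: 0, type: 'start' },
  { offset: 1, type: 'step', stepId: 'llm' },
  { offset: 2, type: 'step', stepId: 'tool:lookup_order' },
  { offset: 3, type: 'step', stepId: 'llm#2' },
];

console.log(offsetForStep(journal, 'llm#2')); // 3
```

With the journal shown under Choosing the fork point, this lookup would map llm#4 to offset 7, so --from-step llm#4 and --from-offset 7 name the same cut.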

Reading a journal with jq

kassette dump emits JSONL, so you can pipe it into jq:

# Step sequence with offsets
kassette --storage file:.kassette dump <runId> | jq '{offset, type, stepId}'

# Just step results
kassette --storage file:.kassette dump <runId> | jq 'select(.type == "step") | {stepId, result}'

# Count entries by type
kassette --storage file:.kassette dump <runId> | jq -r .type | sort | uniq -c

# Session boundaries (each `start` is a new session)
kassette --storage file:.kassette dump <runId> | jq 'select(.type == "start") | {offset, session}'

For programmatic access, use storage.readAll(runId). It returns every journal entry with its offset:

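import { LocalStorage } from '@usekassette/core'; // assumed export location; adjust to your install

const storage = new LocalStorage('.kassette');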
const entries = await storage.readAll('run-abc');

for (const entry of entries) {
  if (entry.type === 'step') {
    console.log(`${entry.stepId} → ${JSON.stringify(entry.result)}`);
  }
}

runStatus(entries) transforms the journal into a single status object. See Reference for the shape.

Interpreting run status

kassette status <runId> reads the journal and returns one status object. A run is in one of five states: completed, failed, cancelled, suspended, or unsettled.

Apply these rules in order:

If the last entry is complete, the run is completed. The previous step entry’s result is usually the agent’s final output.
If the last entry is error, the run is failed. Read message, name, stack, and the last step entry before the error.
If the last entry is cancel, the run is cancelled. If reason is "suspend_timeout_expired", the suspend deadline passed before resume() was called.
Otherwise, walk backward through the journal and skip start entries. If the first non-start entry is suspend, the run is suspended. waitingFor tells you which event must arrive before the run can continue. Skipping start matters because a crash during resume can write a start entry after the suspend; that does not change the run’s state.
Otherwise, the run is unsettled. It may be running, crashed, abandoned, or empty. The journal alone cannot tell the difference. Use an out-of-band liveness signal if you need to know.
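The rules above can be written as a small classifier. This is a sketch for reasoning about a dumped journal, assuming a simplified, hypothetical Entry shape; it is not runStatus(), which returns a richer RunStatus object (see Reference). The event name approval in the sample is made up.

```typescript
// Hypothetical, simplified journal entry shape (illustration only).
interface Entry {
  type: string;        // "start" | "step" | "suspend" | "complete" | "error" | "cancel"
  waitingFor?: string; // set on suspend entries
}

type State = 'completed' | 'failed' | 'cancelled' | 'suspended' | 'unsettled';

// Apply the status rules in order.
function classify(entries: Entry[]): State {
  const last = entries[entries.length - 1];
  if (last?.type === 'complete') return 'completed';
  if (last?.type === 'error') return 'failed';
  if (last?.type === 'cancel') return 'cancelled';
  // Walk backward, skipping start entries (e.g. from a crashed resume).
  for (let i = entries.length - 1; i >= 0; i--) {
    if (entries[i].type === 'start') continue;
    return entries[i].type === 'suspend' ? 'suspended' : 'unsettled';
  }
  return 'unsettled'; // empty journal, or only start entries
}

const suspendedThenCrashed: Entry[] = [
  { type: 'start' },
  { type: 'step' },
  { type: 'suspend', waitingFor: 'approval' }, // hypothetical event name
  { type: 'start' }, // a resume attempt crashed after writing start
];
console.log(classify(suspendedThenCrashed)); // suspended
```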

Debugging a stuck run

A stuck run is a run that should have finished but has not.

Run kassette status <runId>.
If the run is suspended, check why the resume event has not arrived. The journal shows the waitingFor event name. Trace that event through your webhook handler, queue consumer, scheduler, or dispatcher.
If the run is unsettled and no process is active, the process probably crashed before writing a terminal entry. Call start() again with the same run ID. kassette will open a new session, replay the journal, and continue from the first unfinished step.
If the run is unsettled and a process is active, wait or inspect that process. With LocalStorage, a .lock file with a live PID means a session is running. A lock with a dead PID will be reclaimed by the next process.
If the run is failed or cancelled, it is terminal. Read the journal for the reason. To try again, fork from before the failure or start a fresh run.
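To check a lock's PID by hand from Node.js, you can probe the process with signal 0, which tests for existence without delivering a signal. This sketch assumes you have already read the PID out of the .lock file (the lock's file format is not described here) and that you run it on the same host that wrote the lock; isPidAlive is a hypothetical helper name.

```typescript
// Returns true if a process with this PID exists on the current host.
// Signal 0 runs the existence and permission checks without sending a signal.
function isPidAlive(pid: number): boolean {
  // PIDs <= 0 address process groups, not a single process, so reject them.
  if (!Number.isInteger(pid) || pid <= 0) return false;
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM means the process exists but belongs to another user.
    return (err as { code?: string }).code === 'EPERM';
  }
}

console.log(isPidAlive(process.pid)); // true
```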

Debugging a failed run

An error entry includes message, name, and stack.

To understand the failure, read the last step entry before the error. That shows what the agent had already done when it failed. Then search the workflow code for the next step to see what would have run next.
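The same lookup in code, as a sketch over a simplified, hypothetical Entry shape (for example, entries parsed from kassette dump). The sample journal and its error message are made up.

```typescript
// Hypothetical, simplified journal entry shape (illustration only).
interface Entry {
  offset: number;
  type: string;     // "start", "step", "error", ...
  stepId?: string;  // on step entries
  message?: string; // on error entries
}

// Find the last error entry, then the last step entry before it.
function lastStepBeforeError(entries: Entry[]): Entry | undefined {
  let i = entries.length - 1;
  while (i >= 0 && entries[i].type !== 'error') i--;
  if (i < 0) return undefined; // no error in this journal
  for (let j = i - 1; j >= 0; j--) {
    if (entries[j].type === 'step') return entries[j];
  }
  return undefined; // failed before any step completed
}

const failed: Entry[] = [
  { offset: 0, type: 'start' },
  { offset: 1, type: 'step', stepId: 'llm' },
  { offset: 2, type: 'step', stepId: 'tool:lookup_order' },
  { offset: 3, type: 'error', message: 'refund service unavailable' },
];
console.log(lastStepBeforeError(failed)?.stepId); // tool:lookup_order
```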

To retry part of the run, fork from before the failure, fix the code or input, and continue from the fork. See Versioning to decide whether the change needs a version bump.

Forking workflows

Forking is useful for any long run, but it matters most for AI agents. Agent runs can be expensive, and re-running from the beginning may not reproduce the same behavior because LLM calls can return different results.

Forking avoids that. It copies the journal up to a chosen point and starts a new run from there. Completed steps replay from the journal, so you pay only for LLM calls after the fork.

How it works

A programmatic fork() copies journal entries before the cut into a new run. It also handles session numbering and removes terminal entries such as complete, error, and cancel.

import { fork, start } from '@usekassette/core';

// storage is a LocalStorage or RemoteStorage instance created earlier
const newRunId = await fork(storage, 'run-abc', { fromOffset: 13 });
const session = await start(storage, newRunId);
// replays memoized steps (offsets 0–12), then goes live from offset 13
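Conceptually, the copy step keeps entries below the cut and drops terminal ones. The function below sketches only that filtering over a hypothetical Entry shape; it does not renumber sessions or write a new run, both of which the real fork() handles, and forkEntries is an illustrative name.

```typescript
// Hypothetical, simplified journal entry shape (illustration only).
interface Entry {
  offset: number;
  type: string; // "start", "step", "complete", "error", "cancel", ...
  stepId?: string;
}

const TERMINAL = new Set(['complete', 'error', 'cancel']);

// Keep entries strictly before the cut and drop terminal entries, so the
// new run reopens from the cut instead of looking finished.
// Not modeled: session renumbering and writing the new run to storage.
function forkEntries(entries: Entry[], fromOffset: number): Entry[] {
  return entries.filter((e) => e.offset < fromOffset && !TERMINAL.has(e.type));
}

const source: Entry[] = [
  { offset: 0, type: 'start' },
  { offset: 1, type: 'step', stepId: 'llm' },
  { offset: 2, type: 'step', stepId: 'tool:lookup_order' },
  { offset: 3, type: 'complete' },
];
console.log(forkEntries(source, 2).map((e) => e.offset)); // [ 0, 1 ]
```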

The workflow API exposes the same operation:

const result = await agent.fork({ runId: 'run-abc', fromOffset: 13 });

You can also fork from the CLI:

kassette --storage file:.kassette fork run-abc --from-offset 13
# {"runId":"<new-fork-runId>"}

Choosing the fork point

Inspect the journal to find where things went wrong:

kassette --storage file:.kassette dump run-abc | jq -c '{offset, type, stepId}'

{"offset":0,"type":"start","stepId":null}
{"offset":1,"type":"step","stepId":"llm"}
{"offset":2,"type":"step","stepId":"tool:lookup_order"}
{"offset":3,"type":"step","stepId":"llm#2"}
{"offset":4,"type":"step","stepId":"tool:check_refund_policy"}
{"offset":5,"type":"step","stepId":"llm#3"}
{"offset":6,"type":"step","stepId":"tool:process_refund"}   <-- fork here?
{"offset":7,"type":"step","stepId":"llm#4"}
{"offset":8,"type":"step","stepId":"tool:send_confirmation_email"}
{"offset":9,"type":"complete","stepId":null}

If the agent sent the wrong email at offset 8, fork from offset 7 to re-run the LLM call that decided what to send:

kassette --storage file:.kassette fork run-abc --from-offset 7

Or fork from offset 5 to let the LLM decide again after seeing the refund policy:

kassette --storage file:.kassette fork run-abc --from-offset 5

Caveats

The agent code must not change between the original run and the fork. Replay depends on a deterministic call order; if the code changed, step IDs may bind memoized results to the wrong calls. Pass version on start to catch this as a VersionMismatchError. See Versioning.

Comparing forked runs

Two runs forked from the same point share the same journal prefix. Diff the runs to see where they diverged.

diff <(kassette --storage file:.kassette dump run-fork-1 | jq -c '{offset,type,stepId}') \
     <(kassette --storage file:.kassette dump run-fork-2 | jq -c '{offset,type,stepId}')

This compares the journal structure (offsets, entry types, and step IDs). To compare the actual step results, including LLM responses, diff only the step entries:

diff <(kassette --storage file:.kassette dump run-fork-1 | jq -c 'select(.type == "step") | {stepId, result}') \
     <(kassette --storage file:.kassette dump run-fork-2 | jq -c 'select(.type == "step") | {stepId, result}')
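If you load two runs' step entries in code instead, finding the first divergent step is a pairwise scan. A minimal sketch, assuming a hypothetical StepEntry shape with stepId and result; the returned index counts step entries, not journal offsets, and the sample results are made up.

```typescript
// Hypothetical shape of a step entry (illustration only).
interface StepEntry {
  stepId: string;
  result?: unknown;
}

// Index of the first step where the runs differ in stepId or result, or -1 if
// they match. Results are compared via JSON.stringify, so objects with the same
// fields in a different key order count as different.
function firstDivergence(a: StepEntry[], b: StepEntry[]): number {
  const n = Math.max(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const x = a[i];
    const y = b[i];
    if (x === undefined || y === undefined) return i; // one run has extra steps
    if (x.stepId !== y.stepId) return i;
    if (JSON.stringify(x.result) !== JSON.stringify(y.result)) return i;
  }
  return -1;
}

const fork1: StepEntry[] = [
  { stepId: 'llm', result: 'look up order' },
  { stepId: 'llm#2', result: 'refund' },
];
const fork2: StepEntry[] = [
  { stepId: 'llm', result: 'look up order' },
  { stepId: 'llm#2', result: 'escalate' },
];
console.log(firstDivergence(fork1, fork2)); // 1
```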

Using journals as test fixtures

A journal can act as a deterministic replay fixture. Copy a journal into your test suite, then run the agent against it. kassette replays the recorded steps and makes no LLM calls.

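import { copyFile, mkdir } from 'node:fs/promises';
import assert from 'node:assert/strict';
import { LocalStorage, start } from '@usekassette/core'; // LocalStorage export location assumed

// Set up: copy a known-good journal into the test storage directory.
// The file is named <runId>.jsonl, matching the layout under .kassette/.
await mkdir('.kassette-test', { recursive: true });
await copyFile('fixtures/happy-path.jsonl', '.kassette-test/test-run.jsonl');

const storage = new LocalStorage('.kassette-test');
const session = await start(storage, 'test-run');

// The agent replays all memoized steps: zero LLM calls, deterministic output.
// agentLoop and expectedOutput are your own agent function and expected value.
const result = await agentLoop(session);
assert.equal(result, expectedOutput);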

To create a fixture from a real run:

cp .kassette/run-abc.jsonl test/fixtures/happy-path.jsonl

To test a specific failure path, copy only the entries before the failure:

head -n 8 .kassette/run-abc.jsonl > test/fixtures/partial-run.jsonl

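The test then replays the first 7 memoized steps (offsets 1 to 7, after the start entry at offset 0, assuming a single-session journal like the one above) and goes live at the first missing step. Stub the LLM at that point, return a controlled response, and assert what the agent does next.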

Retention and cleanup

kassette does not delete journals for you. After a run completes, its journal stays in storage until you remove it. This is intentional: the journal is the audit trail.

Storage grows with every run. A typical journal is small, but total usage is unbounded. Set a retention policy before storage growth becomes a problem.

LocalStorage

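For local storage, use a cron job or scheduled task to delete old journals. These commands go by file modification time alone, so they also delete suspended runs that have been idle longer than the window: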

# Delete journals older than 30 days
find .kassette -name '*.jsonl' -mtime +30 -delete
find .kassette -name '*.lock' -mtime +30 -delete

If you only want to delete completed, failed, or cancelled runs, list runs with kassette list, extract each runId with jq, and check each run with kassette status before deleting it:

kassette --storage file:.kassette list \
  | jq -r .runId \
  | xargs -I{} sh -c 'kassette --storage file:.kassette status {} | jq --arg id {} ...'
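Whatever the elided jq filter checks, the decision it should encode can be sketched as a predicate. This assumes the five state names from Interpreting run status and an age you compute yourself (for example, from file mtime); the actual RunStatus fields are in Reference, and shouldDelete is a hypothetical name.

```typescript
// The five states from "Interpreting run status". The real RunStatus object
// carries more fields (see Reference); only the state name is modeled here.
type State = 'completed' | 'failed' | 'cancelled' | 'suspended' | 'unsettled';

const TERMINAL: ReadonlySet<State> = new Set<State>(['completed', 'failed', 'cancelled']);

// Select a run for deletion only if it is terminal and older than the window.
// Suspended runs may still resume and unsettled runs may still be live, so
// neither is ever selected.
function shouldDelete(state: State, ageDays: number, retentionDays: number): boolean {
  return TERMINAL.has(state) && ageDays > retentionDays;
}

console.log(shouldDelete('completed', 45, 30)); // true
console.log(shouldDelete('suspended', 45, 30)); // false
```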

RemoteStorage (S3)

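For S3 or an S3-compatible store such as R2 or GCS, add a lifecycle policy to the bucket or prefix where journals are stored. The example uses the S3 lifecycle format; set the Prefix filter to the prefix in your s3:// storage URL (kassette/ is only an example):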

{
  "Rules": [
    {
      "ID": "kassette-journal-retention",
      "Status": "Enabled",
      "Filter": { "Prefix": "kassette/" },
      "Expiration": { "Days": 90 }
    }
  ]
}

Choosing a retention window

Choose a retention window based on these needs:

Replay. Do not expire suspended runs that may still resume. Keep journals longer than your longest expected suspend.
Audit. A journal records what the agent did. Keep terminal journals as long as your compliance, support, or postmortem process needs them.
Forking. Forks need the original journal. If you use forks for debugging or backtracking, keep terminal journals through that debugging window.
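Because one window has to satisfy all three needs, it is the largest of them. A trivial sketch with hypothetical field names; the margin reflects that replay needs journals to outlive the longest suspend, not just match it.

```typescript
// Hypothetical inputs, one per need listed above (illustration only).
interface RetentionNeeds {
  longestSuspendDays: number; // replay: suspended runs must outlive this
  auditDays: number;          // compliance, support, postmortems
  forkDebugDays: number;      // how long you may still fork terminal runs
}

// One window must satisfy every need at once, so take the maximum.
// marginDays pads the suspend bound, since journals must outlive it.
function retentionWindowDays(needs: RetentionNeeds, marginDays = 1): number {
  return Math.max(needs.longestSuspendDays + marginDays, needs.auditDays, needs.forkDebugDays);
}

console.log(retentionWindowDays({ longestSuspendDays: 14, auditDays: 90, forkDebugDays: 30 })); // 90
```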

If you need long-term history but do not want it in active storage, archive before deleting. Use kassette dump to copy journals to colder storage, then expire them from the active path.
