Operations
This guide covers the main production tasks for kassette: inspecting runs, debugging failures, forking runs, and cleaning up old journals.
Agent observability
kassette is designed so an agent can inspect a run directly. The agent reads the journal and can answer:
- What already happened?
- Where did the run suspend or fail?
- Where can I safely fork?
The bundled skill
The repo ships an agent skill at skills/kassette/SKILL.md. Use it as the operational guide for agents working with kassette. It covers storage discovery, journal basics, jq patterns, status interpretation, suspend/resume debugging, and safe forking. Use the skill together with the CLI. In practice, agents are the main expected users of the CLI.
The CLI
@usekassette/cli ships a kassette command that works the same way against both file: and s3:// URLs.
kassette --storage <url> <verb> [args]
--storage accepts the same URL formats the workflow API uses through LocalStorage and RemoteStorage:
| Scheme | Form | Backend |
|---|---|---|
| file: | file:<path> (relative or absolute) | LocalStorage |
| s3:// | s3://<bucket>[/<prefix>] | RemoteStorage via @usekassette/s3 |
@usekassette/s3 is optional. Install it alongside @usekassette/cli if you want to use s3:// URLs.
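To make the two URL forms in the table concrete, here is a minimal sketch of how a --storage value could be split into its parts. The function parseStorageUrl and the StorageTarget type are hypothetical names used for illustration; this is not the CLI's actual parser, and it assumes only the two schemes listed above.

```typescript
// Hypothetical helper: classifies a --storage value by scheme.
// Illustration only, not the parser used by the kassette CLI.
type StorageTarget =
  | { backend: 'LocalStorage'; path: string }
  | { backend: 'RemoteStorage'; bucket: string; prefix: string };

function parseStorageUrl(url: string): StorageTarget {
  if (url.startsWith('s3://')) {
    // s3://<bucket>[/<prefix>]
    const rest = url.slice('s3://'.length);
    const slash = rest.indexOf('/');
    const bucket = slash === -1 ? rest : rest.slice(0, slash);
    const prefix = slash === -1 ? '' : rest.slice(slash + 1);
    if (bucket === '') throw new Error(`missing bucket in ${url}`);
    return { backend: 'RemoteStorage', bucket, prefix };
  }
  if (url.startsWith('file:')) {
    // file:<path>, where the path may be relative or absolute
    const path = url.slice('file:'.length);
    if (path === '') throw new Error(`missing path in ${url}`);
    return { backend: 'LocalStorage', path };
  }
  throw new Error(`unsupported storage URL: ${url}`);
}
```

Under these assumptions, parseStorageUrl('s3://bucket/kassette') yields the bucket "bucket" with prefix "kassette", and any other scheme is rejected.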
The CLI has four verbs:
| Verb | Form | Output |
|---|---|---|
| list | list [--limit N] | One {"runId":"..."} per line |
| status | status <runId> | One RunStatus JSON object |
| dump | dump <runId> [--offset N] | Journal entries as JSONL, one per line |
| fork | fork <srcRunId> (--from-offset N \| --from-step <id>) [--new-run-id <id>] | One {"runId":"<new>"} |
fork is the only verb that writes anything. It creates a new run and never modifies the source run. You must pass exactly one fork point:
- --from-offset N
- --from-step <id>
--from-step cuts at the offset of the first step entry with that stepId.
# Inspect storage
kassette --storage file:.kassette list # one runId per line
kassette --storage file:.kassette status <runId> # JSON status object
kassette --storage file:.kassette dump <runId> # JSONL journal entries
kassette --storage file:.kassette dump <runId> --offset 7 # entries from offset 7
# Fork
kassette --storage file:.kassette fork <runId> --from-offset 7
kassette --storage s3://bucket fork <runId> --from-step llm#3
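The --from-step rule (cut at the offset of the first step entry with that stepId) can be sketched as a lookup over journal entries. The Entry type and the offsetForStep name are assumptions made for this example; only the offset, type, and stepId fields come from the dump output shown in this guide.

```typescript
// Assumed minimal shape of a journal entry, based on the fields
// that appear in `kassette dump` output in this guide.
type Entry = { offset: number; type: string; stepId?: string | null };

// Hypothetical helper mirroring the documented --from-step rule:
// return the offset of the FIRST `step` entry with the given stepId.
function offsetForStep(entries: Entry[], stepId: string): number {
  const hit = entries.find((e) => e.type === 'step' && e.stepId === stepId);
  if (!hit) throw new Error(`no step entry with stepId ${stepId}`);
  return hit.offset;
}
```

If a stepId could appear more than once in a journal, this sketch always picks the earliest occurrence, which matches the "first step entry" wording.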
Reading a journal with jq
kassette dump emits JSONL, so you can pipe it into jq:
# Step sequence with offsets
kassette --storage file:.kassette dump <runId> | jq '{offset, type, stepId}'
# Just step results
kassette --storage file:.kassette dump <runId> | jq 'select(.type == "step") | {stepId, result}'
# Count entries by type
kassette --storage file:.kassette dump <runId> | jq -r .type | sort | uniq -c
# Session boundaries (each `start` is a new session)
kassette --storage file:.kassette dump <runId> | jq 'select(.type == "start") | {offset, session}'
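If you capture the dump output in a script instead of piping it to jq, the same "count entries by type" pattern is a few lines of TypeScript. This is a sketch: countByType is a hypothetical helper, and it assumes only that each non-empty line is one JSON object with a type field, as the dump verb is described above.

```typescript
// Hypothetical helper: tally journal entry types from JSONL text,
// the programmatic counterpart of `jq -r .type | sort | uniq -c`.
function countByType(jsonl: string): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const line of jsonl.split('\n')) {
    if (line.trim() === '') continue; // skip blank lines, e.g. the trailing newline
    const entry = JSON.parse(line) as { type: string };
    counts[entry.type] = (counts[entry.type] ?? 0) + 1;
  }
  return counts;
}
```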
For programmatic access, use storage.readAll(runId). It returns every journal entry with its offset:
const storage = new LocalStorage('.kassette');
const entries = await storage.readAll('run-abc');
for (const entry of entries) {
if (entry.type === 'step') {
console.log(`${entry.stepId} → ${JSON.stringify(entry.result)}`);
}
}
runStatus(entries) transforms the journal into a single status object. See Reference for the shape.
Interpreting run status
kassette status <runId> reads the journal and returns one status object. A run has one of five states.
Apply these rules in order:
- If the last entry is complete, the run is completed. The previous step entry’s result is usually the agent’s final output.
- If the last entry is error, the run is failed. Read message, name, stack, and the last step entry before the error.
- If the last entry is cancel, the run is cancelled. If reason is "suspend_timeout_expired", the suspend deadline passed before resume() was called.
- Otherwise, walk backward through the journal and skip start entries. If the first non-start entry is suspend, the run is suspended. waitingFor tells you which event must arrive before the run can continue. Skipping start matters because a crash during resume can write a start entry after the suspend; that does not change the run’s state.
- Otherwise, the run is unsettled. It may be running, crashed, abandoned, or empty. The journal alone cannot tell the difference. Use an out-of-band liveness signal if you need to know.
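The ordered rules above can be sketched as a small pure function. This is an illustration of the rules as written here, not kassette's runStatus implementation; classifyRun, Entry, and RunState are hypothetical names, and the sketch returns only the state, not the full status object.

```typescript
// Assumed minimal entry shape; only `type` is needed to pick the state.
type Entry = { type: string };
type RunState = 'completed' | 'failed' | 'cancelled' | 'suspended' | 'unsettled';

// Hypothetical helper applying the documented rules in order.
function classifyRun(entries: Entry[]): RunState {
  const last: Entry | undefined = entries[entries.length - 1];
  if (last?.type === 'complete') return 'completed';
  if (last?.type === 'error') return 'failed';
  if (last?.type === 'cancel') return 'cancelled';
  // Walk backward, skipping `start` entries that a later session may
  // have written after a suspend.
  for (let i = entries.length - 1; i >= 0; i--) {
    const entry = entries[i];
    if (!entry || entry.type === 'start') continue;
    return entry.type === 'suspend' ? 'suspended' : 'unsettled';
  }
  // Empty journal, or a journal containing only `start` entries.
  return 'unsettled';
}
```

Note that a journal ending in suspend followed by start still classifies as suspended, which is the crash-during-resume case described in the fourth rule.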
Debugging a stuck run
A stuck run is a run that should have finished but has not.
- Run kassette status <runId>.
- If the run is suspended, check why the resume event has not arrived. The journal shows the waitingFor event name. Trace that event through your webhook handler, queue consumer, scheduler, or dispatcher.
- If the run is unsettled and no process is active, the process probably crashed before writing a terminal entry. Call start() again with the same run ID. kassette will open a new session, replay the journal, and continue from the first unfinished step.
- If the run is unsettled and a process is active, wait or inspect that process. With LocalStorage, a .lock file with a live PID means a session is running. A lock with a dead PID will be reclaimed by the next process.
- If the run is failed or cancelled, it is terminal. Read the journal for the reason. To try again, fork from before the failure or start a fresh run.
Debugging a failed run
An error entry includes message, name, and stack.
To understand the failure, read the last step entry before the error. That shows what the agent had already done when it failed. Then search the workflow code for the next step to see what would have run next.
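Locating "the last step entry before the error" is easy to automate once the journal is loaded. The sketch below assumes entries are in journal order and carry the offset, type, and stepId fields seen in the dump output; lastStepBeforeError is a hypothetical helper, not part of kassette.

```typescript
// Assumed minimal entry shape for this example.
type Entry = { offset: number; type: string; stepId?: string; message?: string };

// Hypothetical helper: return the last `step` entry recorded before the
// first `error` entry, or undefined if the run has no error or no steps.
function lastStepBeforeError(entries: Entry[]): Entry | undefined {
  const errorIndex = entries.findIndex((e) => e.type === 'error');
  if (errorIndex === -1) return undefined; // the run did not fail
  for (let i = errorIndex - 1; i >= 0; i--) {
    const entry = entries[i];
    if (entry && entry.type === 'step') return entry;
  }
  return undefined; // failed before any step was recorded
}
```

The returned entry's stepId is the string to search for in the workflow code when working out which step would have run next.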
To retry part of the run, fork from before the failure, fix the code or input, and continue from the fork. See Versioning to decide whether the change needs a version bump.
Forking workflows
Forking is useful for any long run, but it matters most for AI agents. Agent runs can be expensive, and re-running from the beginning may not reproduce the same behavior because LLM calls can return different results.
Forking lets you avoid that. It copies the journal up to a chosen point and starts a new run from there. Completed steps replay from the journal so you only pay for LLM calls after the fork.
How it works
A programmatic fork() copies journal entries before the cut into a new run. It also handles session numbering and removes terminal entries such as complete, error, and cancel.
import { fork, start } from '@usekassette/core';
const newRunId = await fork(storage, 'run-abc', { fromOffset: 13 });
const session = await start(storage, newRunId);
// replays memoized steps (offsets 0–12), then goes live from offset 13
The workflow API exposes the same operation on kassette:
const result = await agent.fork({ runId: 'run-abc', fromOffset: 13 });
You can also fork from the CLI:
kassette --storage file:.kassette fork run-abc --from-offset 13
# {"runId":"<new-fork-runId>"}
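To make the copy rule concrete, here is a minimal sketch of which entries end up in the forked journal: everything before the cut, minus terminal entries. It deliberately ignores session renumbering, which the real fork() also handles, and forkPrefix is a hypothetical name rather than a kassette API.

```typescript
// Assumed minimal entry shape for this example.
type Entry = { offset: number; type: string; stepId?: string };

// Terminal entry types named in this guide.
const TERMINAL_TYPES = new Set(['complete', 'error', 'cancel']);

// Hypothetical helper: select the entries a fork at `fromOffset` would keep.
// Entries at or after the cut are dropped, as are terminal entries.
function forkPrefix(entries: Entry[], fromOffset: number): Entry[] {
  return entries.filter((e) => e.offset < fromOffset && !TERMINAL_TYPES.has(e.type));
}
```

With a cut at offset 13 this keeps offsets 0 through 12, which is why the replay in the example above covers those offsets before going live.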
Choosing the fork point
Inspect the journal to find where things went wrong:
kassette --storage file:.kassette dump run-abc | jq -c '{offset, type, stepId}'
{"offset":0,"type":"start","stepId":null}
{"offset":1,"type":"step","stepId":"llm"}
{"offset":2,"type":"step","stepId":"tool:lookup_order"}
{"offset":3,"type":"step","stepId":"llm#2"}
{"offset":4,"type":"step","stepId":"tool:check_refund_policy"}
{"offset":5,"type":"step","stepId":"llm#3"}
{"offset":6,"type":"step","stepId":"tool:process_refund"} <-- fork here?
{"offset":7,"type":"step","stepId":"llm#4"}
{"offset":8,"type":"step","stepId":"tool:send_confirmation_email"}
{"offset":9,"type":"complete","stepId":null}
If the agent sent the wrong email at offset 8, fork from offset 7 to re-run the LLM call that decided what to send:
kassette --storage file:.kassette fork run-abc --from-offset 7
Or fork from offset 5 to let the LLM decide again after seeing the refund policy:
kassette --storage file:.kassette fork run-abc --from-offset 5
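Both choices above follow the same pattern: find the LLM step that led to the bad action and fork at its offset. A sketch of that lookup follows. It assumes entries are ordered by offset and that LLM step IDs start with "llm", as they do in the sample journal; your own step IDs may differ, and llmStepBefore is a hypothetical helper.

```typescript
// Assumed minimal entry shape for this example.
type Entry = { offset: number; type: string; stepId?: string | null };

// Hypothetical helper: offset of the closest LLM step that ran before
// `badOffset`, or undefined if there is none.
// Assumption: LLM step IDs begin with "llm" (true for the sample journal only).
function llmStepBefore(entries: Entry[], badOffset: number): number | undefined {
  let candidate: number | undefined;
  for (const e of entries) {
    if (e.offset >= badOffset) break; // entries are assumed sorted by offset
    if (e.type === 'step' && e.stepId?.startsWith('llm')) candidate = e.offset;
  }
  return candidate;
}
```

For the sample journal, a bad action at offset 8 maps to the LLM step at offset 7. Forking earlier, such as at offset 5, remains a judgment call about how much of the run to redo.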
Caveats
The agent code must not have changed between the original run and the fork. Replay depends on deterministic call order. If the code changed, step IDs may bind memoized results to the wrong calls. Pass version on start to catch this as a VersionMismatchError. See Versioning.
Comparing forked runs
Two runs forked from the same point share the same journal prefix. Diff the runs to see where they diverged.
diff <(kassette --storage file:.kassette dump run-fork-1 | jq '{offset,type,stepId}') \
<(kassette --storage file:.kassette dump run-fork-2 | jq '{offset,type,stepId}')
This compares the journal structure (offsets, entry types, and step IDs). To compare the actual step results, including LLM responses, diff only the step entries:
diff <(kassette --storage file:.kassette dump run-fork-1 | jq 'select(.type == "step") | {stepId, result}') \
<(kassette --storage file:.kassette dump run-fork-2 | jq 'select(.type == "step") | {stepId, result}')
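When the diff output is long, it can help to compute only the first point of divergence. The sketch below compares two journals entry by entry on type, stepId, and result. firstDivergence is a hypothetical helper; it returns an array index, which equals the offset only if offsets are contiguous from 0, and it compares results with JSON.stringify, so two objects with the same keys in a different order would count as different.

```typescript
// Assumed minimal entry shape for this example.
type Entry = { offset: number; type: string; stepId?: string; result?: unknown };

// Hypothetical helper: index of the first entry where two journals differ,
// or undefined if they are identical on type, stepId, and result.
function firstDivergence(a: Entry[], b: Entry[]): number | undefined {
  const key = (e: Entry | undefined): string | undefined =>
    e === undefined ? undefined : JSON.stringify([e.type, e.stepId ?? null, e.result ?? null]);
  const length = Math.max(a.length, b.length);
  for (let i = 0; i < length; i++) {
    // A missing entry on one side (shorter journal) also counts as a difference.
    if (key(a[i]) !== key(b[i])) return i;
  }
  return undefined;
}
```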
Using journals as test fixtures
A journal can act as a deterministic replay fixture. Copy a journal into your test suite, then run the agent against it. kassette replays the recorded steps and makes no LLM calls.
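import assert from 'node:assert/strict';
import { copyFile, mkdir } from 'node:fs/promises';
// Assumption: LocalStorage is exported from @usekassette/core alongside start
import { LocalStorage, start } from '@usekassette/core';
// Set up: copy a known-good journal into the test storage directory
await mkdir('.kassette-test', { recursive: true });
await copyFile('test/fixtures/happy-path.jsonl', '.kassette-test/test-run.jsonl');
const storage = new LocalStorage('.kassette-test');
const session = await start(storage, 'test-run');
// The agent replays all memoized steps: zero LLM calls, deterministic output
// agentLoop and expectedOutput come from your own agent code and test data
const result = await agentLoop(session);
assert.equal(result, expectedOutput);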
To create a fixture from a real run:
cp .kassette/run-abc.jsonl test/fixtures/happy-path.jsonl
To test a specific failure path, copy only the entries before the failure:
head -n 8 .kassette/run-abc.jsonl > test/fixtures/partial-run.jsonl
Now the test will replay the first 7 memoized steps before reaching the first missing step. Stub the LLM at that point, return a controlled response, and assert what the agent does next.
Retention and cleanup
kassette does not delete journals for you. After a run completes, its journal stays in storage until you remove it. This is intentional since the journal is the audit trail.
Storage grows with every run. A typical journal is small, but total usage is unbounded. Set a retention policy before storage growth becomes a problem.
LocalStorage
For local storage, use a cron job or scheduled task to delete old journals:
# Delete journals older than 30 days
find .kassette -name '*.jsonl' -mtime +30 -delete
find .kassette -name '*.lock' -mtime +30 -delete
If you only want to delete completed, failed, or cancelled runs, scan runs with kassette list and check each run with kassette status before deleting it:
kassette --storage file:.kassette list \
| xargs -I{} sh -c 'kassette --storage file:.kassette status {} | jq --arg id {} ...'
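The selection logic for "terminal and old enough" can also be written as a pure function and fed from whatever listing you already have. This is a sketch under stated assumptions: RunInfo is a hypothetical record you would build yourself (for example from the last journal entry and the file's modification time), and selectDeletable is not a kassette API. Terminal here means the last entry is complete, error, or cancel, as described under "Interpreting run status".

```typescript
// Hypothetical summary of one run, assembled by your own tooling.
type RunInfo = {
  runId: string;
  lastEntryType: string | undefined; // undefined for an empty journal
  lastModifiedMs: number; // e.g. journal file mtime in epoch milliseconds
};

const TERMINAL_ENTRY_TYPES = new Set(['complete', 'error', 'cancel']);

// Hypothetical helper: run IDs that are terminal AND older than the window.
// Suspended and unsettled runs are never selected, whatever their age.
function selectDeletable(runs: RunInfo[], nowMs: number, maxAgeDays: number): string[] {
  const cutoffMs = nowMs - maxAgeDays * 24 * 60 * 60 * 1000;
  return runs
    .filter((r) => r.lastEntryType !== undefined && TERMINAL_ENTRY_TYPES.has(r.lastEntryType))
    .filter((r) => r.lastModifiedMs < cutoffMs)
    .map((r) => r.runId);
}
```

Keeping the decision separate from the deletion makes it easy to log or review the list before anything is removed.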
RemoteStorage (S3)
For S3 (or S3-compatible stores such as R2 or GCS), use a lifecycle policy on the bucket or prefix where journals are stored:
{
"Rules": [
{
"ID": "kassette-journal-retention",
"Status": "Enabled",
"Filter": { "Prefix": "kassette/" },
"Expiration": { "Days": 90 }
}
]
}
Choosing a retention window
Choose a retention window based on these needs:
- Replay. Do not expire suspended runs that may still resume. Keep journals longer than your longest expected suspend.
- Audit. A journal records what the agent did. Keep terminal journals as long as your compliance, support, or postmortem process needs them.
- Forking. Forks need the original journal. If you use forks for debugging or backtracking, keep terminal journals through that debugging window.
If you need long-term history but do not want it in active storage, archive before deleting. Use kassette dump to copy journals to colder storage, then expire them from the active path.