Embeddable durability primitives

Retry agent workflows without repeating work.

kassette is a tiny, zero-dependency TypeScript library that makes agentic workflows durable. It journals completed steps, then replays them on retry after a crash, timeout, or redeploy.

What it does

It makes retries safe.

Replay finished steps

On retry, kassette replays the journal and returns recorded results instead of calling the model or tool again.
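
The replay idea can be sketched in a few lines. This is an illustration of the general technique, not kassette's implementation: `makeStep`, `Journal`, and the in-memory array standing in for the persisted journal are hypothetical names.

```typescript
// Illustrative sketch only; not kassette's internals.
type StepEntry = { type: 'step'; stepId: string; result: unknown };
type Journal = StepEntry[]; // stands in for the persisted journal

function makeStep(journal: Journal) {
  return async function step<T>(stepId: string, fn: () => Promise<T>): Promise<T> {
    const recorded = journal.find((e) => e.stepId === stepId);
    if (recorded) return recorded.result as T; // replay: skip the call entirely
    const result = await fn(); // first run: do the real work
    journal.push({ type: 'step', stepId, result }); // record it for future retries
    return result;
  };
}

// Demo: the expensive call runs once even though the step is requested twice.
let modelCalls = 0;
const expensiveCall = async () => {
  modelCalls += 1;
  return { destructive: true };
};

async function demoReplay(): Promise<number> {
  const journal: Journal = [];
  await makeStep(journal)('analyze', expensiveCall); // first attempt
  await makeStep(journal)('analyze', expensiveCall); // retry reads the journal
  return modelCalls; // 1
}
```

The second attempt never reaches `expensiveCall`, which is the property that makes a retry cheap and safe.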

Wait without running

Suspend for a human, webhook, or CI job. The process exits, then replay continues when the event arrives.
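
One way to picture suspension, assuming a journal of `suspend` and `resume` entries like the sample further down: the wait unwinds the workflow with a signal, and a later invocation finds the recorded event and carries on. The `Suspended` class and the `suspend` and `runApproval` functions are hypothetical, not kassette's API.

```typescript
// Illustrative sketch only; not kassette's internals.
type SuspendEntry = { type: 'suspend'; waitingFor: string };
type ResumeEntry = { type: 'resume'; eventName: string; value: unknown };
type Entry = SuspendEntry | ResumeEntry;

// Thrown to unwind the workflow so the process is free to exit.
class Suspended extends Error {
  waitingFor: string;
  constructor(waitingFor: string) {
    super(`Waiting for event: ${waitingFor}`);
    Object.setPrototypeOf(this, new.target.prototype); // keep instanceof reliable
    this.waitingFor = waitingFor;
  }
}

function suspend<T>(journal: Entry[], eventName: string): T {
  const resumed = journal.find(
    (e): e is ResumeEntry => e.type === 'resume' && e.eventName === eventName,
  );
  if (resumed) return resumed.value as T; // the event already arrived: keep going
  journal.push({ type: 'suspend', waitingFor: eventName });
  throw new Suspended(eventName);
}

type Approval = { approved: boolean; notes: string };

function runApproval(journal: Entry[]): string {
  try {
    const approval = suspend<Approval>(journal, 'human-approval');
    return approval.approved ? 'resolved' : 'skipped';
  } catch (err) {
    if (err instanceof Suspended) return 'suspended';
    throw err;
  }
}

const approvalJournal: Entry[] = [];
const firstAttempt = runApproval(approvalJournal); // 'suspended'
// Later, a human or webhook records the event and the run is invoked again.
approvalJournal.push({
  type: 'resume',
  eventName: 'human-approval',
  value: { approved: true, notes: '' },
});
const secondAttempt = runApproval(approvalJournal); // 'resolved'
```

Nothing stays in memory between the two attempts; the journal alone carries the wait.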

Use your existing retry system

kassette is a library, not a runtime. Your queue, job runner, or webhook re-invokes the same runId.
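
As a rough sketch of what "re-invokes the same runId" means for the caller, here is a stand-in retry driver. `retrySameRun`, `invoke`, and `flakyInvoke` are hypothetical; in practice your queue or job runner plays this role and `invoke` would call into your workflow.

```typescript
// Illustrative sketch only: a stand-in for a queue or job runner.
async function retrySameRun<T>(
  runId: string,
  invoke: (runId: string) => Promise<T>,
  maxAttempts = 3,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await invoke(runId); // always the same runId, so the same journal
    } catch (err) {
      lastError = err; // a real queue would back off or re-enqueue here
    }
  }
  throw lastError;
}

// Demo: the worker crashes twice, then succeeds on the third attempt.
const seenRunIds: string[] = [];
async function flakyInvoke(runId: string): Promise<string> {
  seenRunIds.push(runId);
  if (seenRunIds.length < 3) throw new Error('worker crashed');
  return `done:${runId}`;
}
```

Calling `retrySameRun('INC-4821', flakyInvoke)` invokes the handler three times with an identical run id, which is what lets each attempt find the journal left by the previous one.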

Keep state in a plain journal

Each run is a readable JSONL journal on a filesystem or object store. It is the state, audit trail, and resume point.

Example

Normal async code, durable steps.

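import { kassette, LocalStorage } from '@usekassette/kassette';

// `llm`, `executeTool`, and `ticket` are supplied by your application; they are not part of kassette.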

const agent = kassette(async (ctx, ticket) => {
  const analysis = await ctx.step('analyze', () =>
    llm.chat('Diagnose this issue and recommend a fix', { ticket })
  );

  if (analysis.destructive) {
    // process can exit here; resume from anywhere via agent.resume()
    const approval = await ctx.suspend('human-approval');
    if (!approval.approved) return { outcome: 'skipped', reason: approval.notes };
  }

  const result = await ctx.step('apply-fix', () =>
    executeTool(analysis.suggestedAction)
  );
  return { outcome: 'resolved', result };
}, { storage: new LocalStorage('.kassette') });

await agent.start(ticket);

.kassette/INC-4821.jsonl

{"type":"start","session":1,"offset":0,"timestamp":"2026-05-08T14:21:03Z","metadata":{"ticket":{"id":"INC-4821","title":"Pod crashing on startup"}}}

{"type":"step","session":1,"offset":1,"timestamp":"2026-05-08T14:21:08Z","stepId":"analyze","name":"analyze","result":{"destructive":true,"suggestedAction":"restart-pod-7f3c","rationale":"OOM during init; restart releases stuck handle"}}

{"type":"suspend","session":1,"offset":2,"timestamp":"2026-05-08T14:21:08Z","reason":"Waiting for event: human-approval","waitingFor":"human-approval"}

{"type":"resume","session":2,"offset":3,"timestamp":"2026-05-08T14:47:12Z","eventName":"human-approval","value":{"approved":true,"notes":""}}

{"type":"step","session":2,"offset":4,"timestamp":"2026-05-08T14:47:14Z","stepId":"apply-fix","name":"apply-fix","result":{"ok":true,"podId":"7f3c"}}

{"type":"complete","session":2,"offset":5,"timestamp":"2026-05-08T14:47:14Z"}
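
Because the journal is plain JSONL, a few lines of code are enough to inspect a run. The sketch below uses the field names visible in the entries above (`type`, `stepId`, `waitingFor`); the `summarize` function and `RunSummary` type are hypothetical helpers, and the status rules are an assumption about how the entry types relate.

```typescript
// Illustrative sketch: field names come from the sample journal; the rest is assumed.
type JournalEntry = { type: string; offset: number; stepId?: string; waitingFor?: string };

type RunSummary = {
  status: 'running' | 'suspended' | 'complete';
  completedSteps: string[];
  waitingFor?: string;
};

function summarize(jsonl: string): RunSummary {
  const summary: RunSummary = { status: 'running', completedSteps: [] };
  for (const line of jsonl.split('\n')) {
    if (line.trim() === '') continue; // tolerate blank lines
    const entry = JSON.parse(line) as JournalEntry;
    if (entry.type === 'step' && entry.stepId) summary.completedSteps.push(entry.stepId);
    if (entry.type === 'suspend') {
      summary.status = 'suspended';
      summary.waitingFor = entry.waitingFor;
    }
    if (entry.type === 'resume') {
      summary.status = 'running';
      delete summary.waitingFor;
    }
    if (entry.type === 'complete') summary.status = 'complete';
  }
  return summary;
}

// Abridged copies of the first three entries shown above.
const sampleJournal = [
  '{"type":"start","session":1,"offset":0}',
  '{"type":"step","session":1,"offset":1,"stepId":"analyze"}',
  '{"type":"suspend","session":1,"offset":2,"waitingFor":"human-approval"}',
].join('\n');

const midRun = summarize(sampleJournal);
```

Here `midRun` reports a suspended run that has finished `analyze` and is waiting for `human-approval`, the same picture you would get by reading the file by eye.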

Examples

Try the workflow closest to yours.

The shortest path through the docs.

Use it when

Skip the work you've already done.

Reach for kassette when the problem is not 'how do I run this again?' but 'how do I avoid doing the same work twice?' Your existing stack makes retries easy but not safe.

