# Storage backends

kassette saves each run in a journal. It can write that journal to a filesystem with `LocalStorage` or to an object store with `RemoteStorage`. If neither fits, you can implement custom storage using the [`Storage` interface](reference.md#storage-interface).

## Picking a backend

Choose a storage backend based on where the run can resume, not on performance. In kassette's agentic workload (10-100 sequential turns dominated by LLMs, tools, humans, webhooks, or timeouts, with small storage needs), neither storage throughput nor latency will be your bottleneck. For more on what kassette was designed for, see [Why kassette](why-kassette.md#the-agentic-workload).

| Scenario                                                                             | Answer | Backend                       |
| ------------------------------------------------------------------------------------ | ------ | ----------------------------- |
| Same machine recovers the run that started it?                                       | Yes    | `LocalStorage`                |
| Different machine, region, or runtime needs to resume?                               | Yes    | `RemoteStorage` (S3, R2, GCS) |
| Running on serverless (Lambda, Workers, Cloud Run jobs)?                             | Yes    | `RemoteStorage`               |
| Running on persistent infra (long-lived boxes, CI runners with a persistent volume)? | Either | `LocalStorage` preferred      |

For large entries, hundreds of steps, or journals that grow into multiple gigabytes, prefer `LocalStorage` when possible, because `RemoteStorage` re-uploads the whole journal on every append.

## LocalStorage

```typescript
new LocalStorage(dir: string)
```

Writes a JSONL file per run to a directory on disk: `{dir}/{runId}.jsonl`.

**Atomic appends.** Each entry is serialized into one buffer and written with a single `write()` followed by `fsync()`, so entries are never torn.
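
The single-buffer append described above could look roughly like the sketch below. It is an illustration using Node's `fs` module, not kassette's actual code: `appendEntry`, the temp directory, and the entry shapes are all hypothetical.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper (not kassette's implementation): append one journal
// entry as a single buffer, then flush it to disk before returning.
function appendEntry(file: string, entry: unknown): void {
  // Serialize the whole line up front so it goes out in one write call.
  const line = Buffer.from(JSON.stringify(entry) + "\n", "utf8");
  // "a" opens the file in append mode and creates it if it does not exist.
  const fd = fs.openSync(file, "a");
  try {
    const written = fs.writeSync(fd, line);
    // Treat a short write as an error instead of silently continuing.
    if (written !== line.length) throw new Error("short write");
    // Ask the OS to flush the file's data to the storage device.
    fs.fsyncSync(fd);
  } finally {
    fs.closeSync(fd);
  }
}

// Usage: write two entries to a throwaway journal and read them back.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "journal-"));
const file = path.join(dir, "run-1.jsonl");
appendEntry(file, { type: "start", session: 1 });
appendEntry(file, { type: "step", name: "fetch" });
const lines = fs.readFileSync(file, "utf8").trimEnd().split("\n");
```

Building the line before opening the file keeps serialization failures from leaving a half-written entry behind.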

**Single-writer fencing.** Each `runId` has a spanning lockfile (`{dir}/{runId}.lock`) created atomically with `link(2)`. PID-based stale detection allows the next process to reclaim a lock from a dead writer. The `start` entry itself is fenced before the lock span begins, so a process reclaiming a stale lock cannot open a session at a number that has already been superseded. The lock is held from `start` through the terminal entry, so no other process can write to the journal during a session.
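
As a rough illustration of the lockfile technique, the sketch below takes a lock with a hard link and reclaims it when the recorded PID is no longer alive. It is a simplified sketch, not kassette's implementation: the function names are hypothetical, the reclaim step ignores races between two reclaimers, and session numbering is left out entirely.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// True if a process with this PID exists on this machine.
function pidIsAlive(pid: number): boolean {
  try {
    process.kill(pid, 0); // signal 0 checks for existence without signalling
    return true;
  } catch (err) {
    // EPERM means the process exists but belongs to another user.
    return (err as NodeJS.ErrnoException).code === "EPERM";
  }
}

// Hypothetical helper: try to create the lockfile. Returns true on success.
function tryAcquire(lockPath: string): boolean {
  // Write our PID to a private temp file, then hard-link it into place.
  // The link either succeeds or fails as a whole, and the lockfile already
  // contains the PID at the moment it becomes visible.
  const tmp = `${lockPath}.${process.pid}.tmp`;
  fs.writeFileSync(tmp, String(process.pid));
  try {
    fs.linkSync(tmp, lockPath); // throws EEXIST if the lock is already held
    return true;
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code !== "EEXIST") throw err;
    return false;
  } finally {
    fs.unlinkSync(tmp); // the lockfile, if created, is a second link
  }
}

// Hypothetical helper: acquire the lock, reclaiming it from a dead owner.
// `isAlive` is injectable so the stale path can be exercised in a demo.
function acquireOrReclaim(
  lockPath: string,
  isAlive: (pid: number) => boolean = pidIsAlive,
): boolean {
  if (tryAcquire(lockPath)) return true;
  const owner = Number(fs.readFileSync(lockPath, "utf8"));
  if (isAlive(owner)) return false; // a live writer holds the lock
  // Stale lock: simplistic reclaim. Two processes reclaiming at the same
  // time would need extra care that this sketch does not attempt.
  fs.unlinkSync(lockPath);
  return tryAcquire(lockPath);
}

const lockDir = fs.mkdtempSync(path.join(os.tmpdir(), "locks-"));
const lock = path.join(lockDir, "run-1.lock");
const first = acquireOrReclaim(lock); // lock is free
const second = acquireOrReclaim(lock); // owner (this process) is alive
const reclaimed = acquireOrReclaim(lock, () => false); // pretend owner died
```

The PID check only means something to processes that share a process table with the lock owner, which is why this style of lock suits same-machine recovery.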

**Scope.** Use `LocalStorage` when the next process can read the same directory and share the same lock semantics. The important part is that stale-lock detection uses local PID checks, which are only meaningful to processes on the same machine.

**Performance.** Each append is O(1): it writes only the new entry and `fsync`s it. Journal size has little effect on append cost, which is determined by your disk and filesystem. This is the backend to use when the retrying process can read the same persistent filesystem, and it can handle very large journals.

## RemoteStorage

```typescript
new RemoteStorage(client: ObjectStoreClient, options?: { prefix?: string })
```

Writes each run as a single object in S3-compatible storage: `{prefix}/{runId}.jsonl`. Use with `@usekassette/s3` for AWS S3, Cloudflare R2, and any other service that is S3-compatible.

**Atomic appends.** `PutObject` is atomic by contract, and each write uploads the full journal as one object.

**Single-writer fencing.** Each write is a conditional `PutObject` with `If-Match: <etag>`, using the etag of the previous object. A writer that loses the CAS race re-reads the journal and retries. If the re-read shows a higher session number, the writer throws `FencedError`. Two sessions can briefly coexist, but the older one is fenced on its next append. At most one orphaned step may execute before the fence. For the full CAS and fencing design, see [Object storage design](object-storage-design.md).
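
The compare-and-swap loop can be sketched against an in-memory stand-in for the object store. Everything here is hypothetical and simplified: `FakeStore`, `putIfMatch`, the `Entry` shape, and the local `FencedError` class are illustrations rather than kassette's or any SDK's API, the store is synchronous for brevity, and the session check runs on every read rather than only after a lost race.

```typescript
// Local stand-in for the error named above, not the library's class.
class FencedError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "FencedError";
  }
}

// Hypothetical entry shape: only the session number matters for fencing.
interface Entry {
  session: number;
  type: string;
}

// In-memory stand-in for an object store with ETag-conditional writes.
class FakeStore {
  private body = "";
  private etag = 0;

  get(): { body: string; etag: number } {
    return { body: this.body, etag: this.etag };
  }

  // Mimics PutObject with If-Match: writes only if the etag is unchanged.
  putIfMatch(body: string, expectedEtag: number): boolean {
    if (expectedEtag !== this.etag) return false; // precondition failed
    this.body = body;
    this.etag += 1;
    return true;
  }
}

function parse(body: string): Entry[] {
  return body
    .split("\n")
    .filter((l) => l.length > 0)
    .map((l) => JSON.parse(l) as Entry);
}

// Read the journal, check for a newer session, then upload journal + entry.
function append(store: FakeStore, entry: Entry, maxRetries = 5): void {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const { body, etag } = store.get();
    const newest = parse(body).reduce((m, e) => Math.max(m, e.session), 0);
    if (newest > entry.session) {
      throw new FencedError(`superseded by session ${newest}`);
    }
    // On a lost race the loop re-reads and tries again.
    if (store.putIfMatch(body + JSON.stringify(entry) + "\n", etag)) return;
  }
  throw new Error("too many CAS retries");
}

const store = new FakeStore();
append(store, { session: 1, type: "start" });
const staleEtag = store.get().etag; // what session 1 last saw
append(store, { session: 2, type: "start" }); // a newer session takes over

// A write against the stale etag is rejected by the store itself...
const staleAccepted = store.putIfMatch("overwrite\n", staleEtag);

// ...and the old session is fenced the next time it tries to append.
let fenced = false;
try {
  append(store, { session: 1, type: "step" });
} catch (err) {
  fenced = (err as Error).name === "FencedError";
}
```

The etag check stops blind overwrites, while the session comparison is what turns a lost race into a permanent fence for the older writer.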

**Required backend semantics.** The store must provide read-after-write consistency and conditional writes (`If-Match` and `If-None-Match`). S3, R2, and GCS qualify. Eventually consistent or CDN-fronted stores break fencing: two writers can pass the same conditional check against a stale etag and both commit, defeating the single-writer invariant.

**Performance.** Each append reads the current object and uploads the whole journal plus the new entry line. That makes recovery cheap, since `readAll` is a single `GetObject` request, but write cost grows with the journal. Total uploaded bytes across a run of `N` entries grow as `O(N²)`. This write amplification is generally not a problem in agentic workloads, where journals stay small.
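
To put numbers on the quadratic growth, the sketch below totals the bytes uploaded when the k-th append re-uploads all k entries written so far. It assumes every entry is the same size, which real journals will not be, and the function name is hypothetical.

```typescript
// Total bytes uploaded over a run when append k uploads k entries.
// Assumes a fixed entry size; treat the result as an estimate.
function totalUploadedBytes(entries: number, entryBytes: number): number {
  // sum over k = 1..N of (k * s) = s * N * (N + 1) / 2
  return (entryBytes * entries * (entries + 1)) / 2;
}

// 100 entries of 2 KiB: the final journal is 200 KiB.
const typicalRun = totalUploadedBytes(100, 2048);
console.log(typicalRun); // 10342400 bytes, roughly 9.9 MiB uploaded in total

// 10,000 entries of 2 KiB: the final journal is about 19.5 MiB.
const hugeRun = totalUploadedBytes(10_000, 2048);
console.log(hugeRun); // 102410240000 bytes, roughly 95 GiB uploaded in total
```

Under these assumptions a run of about a hundred turns uploads only a few megabytes in total, while a journal with thousands of entries is where the quadratic term starts to matter.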

## Custom backends

```typescript
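interface Storage {
  // Atomically appends one entry, assigning it a monotonically increasing offset.
  // Throws FencedError if the writing session has been superseded.
  append(runId: string, entry: Entry): Promise<number>;
  // Resolves to every entry of the run, in append order.
  readAll(runId: string): Promise<JournalEntry[]>;
  // Resolves to every runId present in storage.
  list(): Promise<string[]>;
}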
```

The contract to implement is very small if you want to support a remote backend that is not S3-compatible. For S3-compatible stores beyond what `@usekassette/s3` ships, implement [`ObjectStoreClient`](reference.md#objectstoreclient--getobjectresult--preconditionfailederror) and pass it to `RemoteStorage`. The interface signatures and full semantics live in [Reference](reference.md#storage-interface).

A correct implementation of `append` needs to be atomic, assign a monotonically increasing offset, and reject writes from superseded sessions by throwing a `FencedError`. `readAll` should return every entry in append order. `list` returns every `runId` present in storage.
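
A minimal in-memory implementation shows the shape of that contract. This is a sketch under stated assumptions, not a reference backend: the `Entry` and `JournalEntry` shapes, the `session` field, and the local `FencedError` class are hypothetical, the number returned by `append` is assumed to be the entry's offset, and the interface is renamed `StorageLike` so the standalone snippet does not clash with the DOM `Storage` global.

```typescript
// Hypothetical shapes: the real types are defined in the reference.
interface Entry {
  session: number;
  type: string;
}
interface JournalEntry extends Entry {
  offset: number;
}

// Same three methods as the interface above, under a non-clashing name.
interface StorageLike {
  append(runId: string, entry: Entry): Promise<number>;
  readAll(runId: string): Promise<JournalEntry[]>;
  list(): Promise<string[]>;
}

// Local stand-in for the library's error class.
class FencedError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "FencedError";
  }
}

class MemoryStorage implements StorageLike {
  private runs = new Map<string, JournalEntry[]>();

  // No await between the read and the write, so within a single
  // JavaScript process each append runs as one uninterrupted step.
  async append(runId: string, entry: Entry): Promise<number> {
    const journal = this.runs.get(runId) ?? [];
    const newest = journal.reduce((m, e) => Math.max(m, e.session), 0);
    if (entry.session < newest) {
      throw new FencedError(`superseded by session ${newest}`);
    }
    const offset = journal.length; // increases by one per append
    journal.push({ ...entry, offset });
    this.runs.set(runId, journal);
    return offset;
  }

  async readAll(runId: string): Promise<JournalEntry[]> {
    // Return a copy, in append order, so callers cannot mutate the journal.
    return (this.runs.get(runId) ?? []).slice();
  }

  async list(): Promise<string[]> {
    return Array.from(this.runs.keys());
  }
}

async function demo() {
  const storage = new MemoryStorage();
  const offsets = [
    await storage.append("run-1", { session: 1, type: "start" }),
    await storage.append("run-1", { session: 1, type: "step" }),
    await storage.append("run-1", { session: 2, type: "start" }),
  ];
  let fenced = false;
  try {
    // Session 1 has been superseded by session 2, so this must be rejected.
    await storage.append("run-1", { session: 1, type: "step" });
  } catch (err) {
    fenced = (err as Error).name === "FencedError";
  }
  const entries = await storage.readAll("run-1");
  return { offsets, fenced, runIds: await storage.list(), count: entries.length };
}
```

An in-memory store like this is mainly useful in tests, since it provides neither durability nor fencing across processes.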

## Retention

Neither storage backend does any cleanup. Settled work stays settled, so storage always grows across runs. See [Operations](operations.md#retention-and-cleanup) for cleanup mechanics, e.g., S3 lifecycle policies for `RemoteStorage` or a cron job that sweeps the directory for `LocalStorage`.