Pokee Enterprise API

Cancel an in-flight stream

Abort a turn mid-stream. The agent stops; partial usage is not billed.

Long-running agent turns (research, deck generation, multi-step reasoning) sometimes need to be aborted: the user changed their mind, the request was misrouted, or you want to free a session for a higher-priority message. Cancel stops the work cleanly and skips billing for the partial output.

You'll need the response_id

The {rid} in the cancel URL is the response_id for one specific turn (distinct from the session id). You get it from the original POST that started the stream:

  • The X-Response-Id HTTP response header — available before any SSE data
  • The data.id field on the first SSE event (message.created for sessions, response.created for /v1/responses)

Capture it as soon as the response headers come in so you can fire the cancel without waiting for the stream to drain. See Resume for the full discoverability story.
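
A minimal sketch of that capture logic, preferring the header and falling back to the first SSE event. It assumes each SSE event carries a `data:` line whose payload is JSON with the response id under an `id` key; the helper name `extract_response_id` is hypothetical and not part of the API.

```python
import json
from typing import Iterable, Mapping, Optional


def extract_response_id(headers: Mapping[str, str],
                        sse_lines: Iterable[str]) -> Optional[str]:
    """Return the response_id from the header, else from the first SSE data line."""
    # Preferred source: the header is available before any SSE data arrives.
    rid = headers.get("X-Response-Id")
    if rid:
        return rid
    # Fallback: the first event's data payload (assumed JSON with an "id" key).
    for line in sse_lines:
        if line.startswith("data:"):
            payload = json.loads(line[len("data:"):].strip())
            return payload.get("id")
    return None
```

With a plain dict the header lookup is case-sensitive; header objects from HTTP clients such as httpx match names case-insensitively, so `r.headers` can be passed in directly.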

Endpoints

POST /v1/sessions/{id}/responses/{rid}/cancel

Cancel an in-flight session turn.

POST /v1/responses/{rid}/cancel

Cancel an in-flight stateless /v1/responses stream.

Both share the same response shape:

{"response_id": "msg_...", "status": "cancelling"}

Status codes:

Code | Meaning
202 Accepted | Cancellation initiated. The open SSE stream (and any reconnect) will see message.cancelled (sessions) or response.cancelled (stateless) as the next/last event.
200 OK | Turn already finished: {"status": "already_complete"}. Idempotent no-op.
404 Not Found | rid unknown or buffer evicted (>120s past completion).
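
One way a client might pick the right cancel URL and react to the three documented status codes is sketched below. The function names `cancel_path` and `interpret_cancel` and the returned labels are hypothetical conveniences for illustration; only the paths and status codes come from this page.

```python
from typing import Optional


def cancel_path(rid: str, session_id: Optional[str] = None) -> str:
    """Build the cancel path for a session turn or a stateless response."""
    if session_id is not None:
        return f"/v1/sessions/{session_id}/responses/{rid}/cancel"
    return f"/v1/responses/{rid}/cancel"


def interpret_cancel(status_code: int) -> str:
    """Map a documented cancel status code to a client-side outcome label."""
    if status_code == 202:
        # Cancellation initiated: keep reading until the *.cancelled event.
        return "cancelling"
    if status_code == 200:
        # The turn had already finished; nothing was cancelled.
        return "already_complete"
    if status_code == 404:
        # Unknown rid, or the buffer was evicted after completion.
        return "unknown_or_evicted"
    raise ValueError(f"unexpected status code from cancel: {status_code}")
```

Treating any other status as an error, rather than guessing, keeps undocumented responses visible to the caller.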

What happens after cancel

  1. Server flips an internal cancel flag on the buffer.
  2. The agent observes the flag at its next CPU/IO yield and unwinds. (In-flight HTTP/inference calls are cancelled best-effort.)
  3. The buffer's terminal event becomes message.cancelled / response.cancelled instead of message.completed / response.completed.
  4. No billing for the partial turn. Cancelled streams skip the post-stream usage record + credit debit.
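
The steps above follow a cooperative-cancellation pattern, which can be illustrated with a toy simulation. This is not the server's implementation: `run_turn`, `should_bill`, the `cancel_at` parameter, and the `chunk-N` event names are all hypothetical stand-ins, and only the terminal event names come from this page.

```python
import threading
from typing import List


def run_turn(steps: int, cancel_flag: threading.Event,
             cancel_at: int = -1) -> List[str]:
    """Simulate a turn that checks a cancel flag between units of work."""
    events = ["message.created"]
    for i in range(steps):
        if i == cancel_at:
            # Stands in for a cancel request arriving mid-turn.
            cancel_flag.set()
        if cancel_flag.is_set():
            # The worker sees the flag at its next check and unwinds.
            events.append("message.cancelled")
            return events
        events.append(f"chunk-{i}")  # placeholder for real streamed output
    events.append("message.completed")
    return events


def should_bill(events: List[str]) -> bool:
    """Only turns that end with the completed terminal event are billed."""
    return events[-1] == "message.completed"
```

The flag is only checked between units of work, which is why cancellation takes effect at the next yield rather than instantly.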

Idempotency

Calling cancel twice is safe:

  • First call on an in-flight turn: 202 cancelling.
  • Second call (after the first one's effect propagates): 200 already_complete.
  • Subsequent calls in the grace window: 200 already_complete.
  • After grace expiry: 404 not found.
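
Because repeated calls are safe, a client can simply retry a cancel whose response was lost. The sketch below assumes a caller-supplied `send` callable that performs the POST and returns the HTTP status code; `cancel_with_retry` and `send` are hypothetical names, and `ConnectionError` stands in for whatever transport exception your HTTP library raises.

```python
from typing import Callable, Optional


def cancel_with_retry(send: Callable[[], int], attempts: int = 3) -> int:
    """Retry a cancel call on transport failure; safe because cancel is idempotent."""
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            status = send()
        except ConnectionError as exc:
            # The request may or may not have landed; retrying is harmless.
            last_error = exc
            continue
        if status in (200, 202, 404):
            # 202: cancelling, 200: already complete, 404: nothing left to cancel.
            return status
        raise RuntimeError(f"unexpected status code from cancel: {status}")
    raise RuntimeError("cancel failed after retries") from last_error
```

A retry after a lost response may return 200 instead of 202 if the first attempt already took effect; either way the turn is no longer running.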

Sessions: cancel vs. session destroy

Aspect | Cancel | DELETE session
Effect | Abort current turn; session stays alive | Destroy session entirely (kill process, free workspace)
Use when | "User changed their mind about this prompt" | "Done with this conversation"
Workspace files | Preserved | Wiped (workspace mount is per-session)
Next message? | Yes, same session_id | No, session is gone

Python example

import os
import threading
import time

import httpx

API, KEY = os.environ["POKEE_API"], os.environ["POKEE_KEY"]
H = {"Authorization": f"Bearer {KEY}"}

with httpx.Client(base_url=API, headers=H, timeout=None) as c:
    sid = c.post("/v1/sessions", json={"persistent": True}).json()["id"]

    rid_ready = threading.Event()
    rid_holder = {}

    def reader():
        with c.stream("POST", f"/v1/sessions/{sid}/messages",
                      json={"message": "Write a 5000-word essay on cryptography"}) as r:
            # Capture rid as soon as response headers land, well before
            # the first SSE event. This is the canonical pattern.
            rid_holder["rid"] = r.headers["X-Response-Id"]
            rid_ready.set()
            for line in r.iter_lines():
                if line.startswith("event: message.cancelled"):
                    print("got cancellation terminal")
                    return

    # Daemon thread so a failed start-up cannot keep the process alive.
    t = threading.Thread(target=reader, daemon=True)
    t.start()

    # wait() returns False on timeout; bail out instead of hitting a KeyError.
    if not rid_ready.wait(timeout=5):
        raise SystemExit("no X-Response-Id header within 5s")
    rid = rid_holder["rid"]

    time.sleep(5)  # let the turn run for a while before aborting it
    print(f"cancelling {rid}")
    r = c.post(f"/v1/sessions/{sid}/responses/{rid}/cancel")
    print(r.status_code, r.json())
    t.join()  # reader returns once it sees message.cancelled or the stream ends

Pairs well with

  • Resume — clients that reconnect to an in-flight turn after disconnect see message.cancelled if you fired cancel during the gap.
  • Idempotency keys — a client that cancels and then wants to retry a brand-new turn should use a fresh idempotency key (the cancelled one is bound to the original response_id).
