Cancel an in-flight stream
Abort a turn mid-stream. The agent stops; partial usage is not billed.
Long-running agent turns (research, deck generation, multi-step reasoning) sometimes need to be aborted: the user changed their mind, the request was misrouted, or you want to free a session for a higher-priority message. Cancel cleanly stops the work and skips billing for the partial output.
You'll need the response_id
The {rid} in the cancel URL is the response_id for one specific turn (distinct from the session id). You get it from the original POST that started the stream:
- The X-Response-Id HTTP response header, available before any SSE data
- The data.id field on the first SSE event (message.created for sessions, response.created for /v1/responses)
Capture it as soon as the response headers come in so you can fire the cancel without waiting for the stream to drain. See Resume for the full discoverability story.
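The two capture paths can be sketched as a single helper. This is a minimal sketch, not the official client: `extract_response_id` is a hypothetical name, and it assumes the SSE `data:` line carries a JSON object with a top-level `id` (the text above names `data.id`, but the exact envelope is an assumption).

```python
import json


def extract_response_id(headers, sse_lines):
    """Return the response_id from the header if present, else from the first SSE data line."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    if "x-response-id" in lowered:
        return lowered["x-response-id"]
    # Fallback: the first "data:" line, ASSUMED to be JSON with a top-level "id".
    for line in sse_lines:
        if line.startswith("data:"):
            payload = json.loads(line[len("data:"):].strip())
            return payload.get("id")
    return None


# Header path: available before any SSE data arrives.
print(extract_response_id({"X-Response-Id": "msg_abc"}, []))
# Fallback path: parse the first event's data line.
print(extract_response_id({}, ["event: message.created", 'data: {"id": "msg_xyz"}']))
```

Preferring the header keeps the cancel path independent of SSE parsing; the fallback only matters for clients that cannot read response headers.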
Endpoints
POST /v1/sessions/{id}/responses/{rid}/cancel
Cancel an in-flight session turn.
POST /v1/responses/{rid}/cancel
Cancel an in-flight stateless /v1/responses stream.
Both share the same response shape:
{"response_id": "msg_...", "status": "cancelling"}
Status codes:
| Code | Meaning |
|---|---|
| 202 Accepted | Cancellation initiated. The open SSE stream (and any reconnect) will see message.cancelled (sessions) or response.cancelled (stateless) as the next/last event. |
| 200 OK | Turn already finished: {"status": "already_complete"}. Idempotent no-op. |
| 404 Not Found | rid unknown or buffer evicted (>120s past completion). |
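One way a client might branch on these three codes is shown below. It is a sketch under assumptions: `interpret_cancel` and the `"gone"` label are hypothetical names chosen for illustration, and any other status code is treated as an error rather than guessed at.

```python
def interpret_cancel(status_code, body):
    """Map a cancel response to a short outcome label."""
    if status_code == 202:
        # Cancellation initiated; the stream will end with a *.cancelled event.
        return "cancelling"
    if status_code == 200 and body.get("status") == "already_complete":
        # The turn finished before the cancel arrived; nothing to do.
        return "already_complete"
    if status_code == 404:
        # Unknown rid, or the buffer was evicted after the grace window.
        return "gone"  # hypothetical label, not an API value
    raise ValueError(f"unexpected cancel response: {status_code}")


print(interpret_cancel(202, {"response_id": "msg_1", "status": "cancelling"}))
print(interpret_cancel(200, {"status": "already_complete"}))
```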
What happens after cancel
- Server flips an internal cancel flag on the buffer.
- The agent's next CPU/IO yield observes the flag and unwinds. (In-flight HTTP/inference calls are cancelled best-effort.)
- The buffer's terminal event becomes message.cancelled/response.cancelled instead of message.completed/response.completed.
- No billing for the partial turn. Cancelled streams skip the post-stream usage record + credit debit.
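From the client's side, the terminal event name is what distinguishes a cancelled turn from a completed one. The sketch below scans SSE lines for it; `terminal_outcome` is a hypothetical helper, and it assumes event names arrive on `event: <name>` lines.

```python
CANCELLED = {"message.cancelled", "response.cancelled"}
COMPLETED = {"message.completed", "response.completed"}


def terminal_outcome(sse_lines):
    """Return 'cancelled', 'completed', or None if no terminal event was seen."""
    for line in sse_lines:
        if not line.startswith("event:"):
            continue  # skip data lines, comments, and blank separators
        name = line[len("event:"):].strip()
        if name in CANCELLED:
            return "cancelled"
        if name in COMPLETED:
            return "completed"
    return None


stream = ["event: message.created", 'data: {"id": "msg_1"}', "", "event: message.cancelled"]
print(terminal_outcome(stream))
```

Matching on both the session and stateless event names lets the same reader serve either endpoint.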
Idempotency
Calling cancel twice is safe:
- First call on an in-flight turn: 202 cancelling.
- Second call (after the first one's effect propagates): 200 already_complete.
- Subsequent calls in the grace window: 200 already_complete.
- After grace expiry: 404 not found.
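Because repeated calls are safe, a client can simply retry a cancel whose HTTP response was lost, without tracking whether the first attempt landed. The following is a minimal sketch: `cancel_with_retry`, the injected `post_cancel` callable, and the `"expired"` label are all hypothetical, and a real client would likely add backoff between attempts.

```python
def cancel_with_retry(post_cancel, attempts=3):
    """Call post_cancel() until it answers; post_cancel returns (status_code, body)."""
    last_error = None
    for _ in range(attempts):
        try:
            status, body = post_cancel()
        except ConnectionError as exc:
            # The request may or may not have reached the server; retrying is safe.
            last_error = exc
            continue
        if status in (200, 202):
            return body["status"]  # "cancelling" or "already_complete"
        if status == 404:
            return "expired"  # hypothetical label for an unknown or evicted rid
        raise RuntimeError(f"unexpected status {status}")
    raise last_error


# Simulated transport: the first attempt drops, the second succeeds.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("dropped")
    return 202, {"response_id": "msg_1", "status": "cancelling"}

print(cancel_with_retry(flaky))
```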
Sessions: cancel vs. session destroy
| | Cancel | DELETE session |
|---|---|---|
| Effect | Abort current turn; session stays alive | Destroy session entirely (kill process, free workspace) |
| Use when | "User changed their mind about this prompt" | "Done with this conversation" |
| Workspace files | Preserved | Wiped (workspace mount is per-session) |
| Next message? | Yes — same session_id | No — session is gone |
Python example
import os, httpx, threading, time

API, KEY = os.environ["POKEE_API"], os.environ["POKEE_KEY"]
H = {"Authorization": f"Bearer {KEY}"}

with httpx.Client(base_url=API, headers=H, timeout=None) as c:
    sid = c.post("/v1/sessions", json={"persistent": True}).json()["id"]
    rid_ready = threading.Event()
    rid_holder = {}

    def reader():
        with c.stream("POST", f"/v1/sessions/{sid}/messages",
                      json={"message": "Write a 5000-word essay on cryptography"}) as r:
            # Capture rid as soon as response headers land, well before
            # the first SSE event. This is the canonical pattern.
            rid_holder["rid"] = r.headers["X-Response-Id"]
            rid_ready.set()
            for line in r.iter_lines():
                if line.startswith("event: message.cancelled"):
                    print("got cancellation terminal")
                    return

    t = threading.Thread(target=reader)
    t.start()
    # Fail loudly if the headers never arrive instead of hitting a KeyError below.
    if not rid_ready.wait(timeout=5):
        raise RuntimeError("no X-Response-Id header within 5s")
    rid = rid_holder["rid"]
    print(f"cancelling {rid}")
    time.sleep(5)  # let the turn run for a while before cancelling
    r = c.post(f"/v1/sessions/{sid}/responses/{rid}/cancel")
    print(r.status_code, r.json())
    t.join()
Pairs well with
- Resume: clients that reconnect to an in-flight turn after disconnect see message.cancelled if you fired cancel during the gap.
- Idempotency keys: a client that cancels and then wants to retry a brand-new turn should use a fresh idempotency key (the cancelled one is bound to the original response_id).