Streaming
Stream tokens over Server-Sent Events (SSE). Instead of waiting 3-10 s for the full completion, the first tokens arrive in roughly 200-500 ms (time to first token, TTFT).
Examples
Python
from openai import OpenAI

client = OpenAI(base_url="https://api.sealink.asia/v1", api_key="<your-sealink-key>")

stream = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Stream a 200-word reply."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
Node.js
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.sealink.asia/v1",
  apiKey: process.env.SEALINK_API_KEY,
});

const stream = await client.chat.completions.create({
  model: "claude-sonnet-4-6",
  messages: [{ role: "user", content: "Stream a 200-word reply." }],
  stream: true,
});

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content;
  if (delta) process.stdout.write(delta);
}
cURL (-N / --no-buffer disables curl's output buffering so chunks print as they arrive)
curl https://api.sealink.asia/v1/chat/completions \
  -H "Authorization: Bearer $SEALINK_API_KEY" \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "Stream please"}],
    "stream": true
  }'
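With "stream": true the raw response body is a sequence of SSE frames: each frame is a data: line carrying one JSON chunk, and the stream ends with data: [DONE]. The frames below are illustrative only; the id and content values are placeholders and the exact fields depend on the upstream model.

data: {"id":"chatcmpl-xyz","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hel"},"finish_reason":null}]}

data: {"id":"chatcmpl-xyz","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"lo"},"finish_reason":null}]}

data: {"id":"chatcmpl-xyz","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]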
When to enable streaming
- Chat / customer-support UIs — users see text appear live
- Code generation — Cursor / Claude Code style
- Long generation (>500 tokens)
When not to stream
- When you need to parse complete JSON (function calling / structured output)
- Short replies (<100 tokens) — streaming overhead exceeds benefit
- Background batch jobs with no UI watching
Common pitfalls
- Proxies / nginx / Cloudflare buffer SSE by default. SeaLink disables buffering on its side, but verify that every hop in your own stack passes the stream through unbuffered.
- Forgetting to close the stream leaks connections — use try/finally or an async iterator that auto-closes (see the first sketch after this list).
- Usage is finalized at stream end. The OpenAI SDK only emits usage on the final frame when you set stream_options={"include_usage": true} (see the second sketch after this list).
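A minimal sketch of the close-the-stream pattern, assuming the openai Python SDK, whose object returned with stream=True exposes close():

from openai import OpenAI

client = OpenAI(base_url="https://api.sealink.asia/v1", api_key="<your-sealink-key>")

stream = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Stream a 200-word reply."}],
    stream=True,
)
try:
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
finally:
    stream.close()  # releases the HTTP connection even if the loop exits early or raises
print()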
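And a sketch of reading usage from the final frame, assuming SeaLink passes through OpenAI's stream_options behavior, where the usage-only frame carries an empty choices list:

stream = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Stream a 200-word reply."}],
    stream=True,
    stream_options={"include_usage": True},
)

usage = None
for chunk in stream:
    if chunk.usage is not None:  # populated only on the final frame
        usage = chunk.usage
    if chunk.choices:            # the usage-only frame has no choices
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
print()
if usage is not None:
    print(f"prompt={usage.prompt_tokens} completion={usage.completion_tokens} total={usage.total_tokens}")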