SeaLink

Streaming

Stream tokens over Server-Sent Events (SSE). Streaming cuts perceived latency from the 3-10 s a full completion takes to a time to first token (TTFT) of roughly 200-500 ms.

Examples

Python
from openai import OpenAI

client = OpenAI(base_url="https://api.sealink.asia/v1", api_key="<your-sealink-key>")

stream = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Stream a 200-word reply."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
Node.js
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.sealink.asia/v1",
  apiKey: process.env.SEALINK_API_KEY,
});

const stream = await client.chat.completions.create({
  model: "claude-sonnet-4-6",
  messages: [{ role: "user", content: "Stream a 200-word reply." }],
  stream: true,
});

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content;
  if (delta) process.stdout.write(delta);
}
cURL (-N disables output buffering so chunks print as they arrive)
curl https://api.sealink.asia/v1/chat/completions \
  -H "Authorization: Bearer $SEALINK_API_KEY" \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "Stream please"}],
    "stream": true
  }'

When to enable streaming

  • Chat / customer-support UIs — users see text appear live
  • Code generation — Cursor / Claude Code style
  • Long generation (>500 tokens)

When not to stream

  • When you need to parse complete JSON (function calling / structured output)
  • Short replies (<100 tokens) — streaming overhead exceeds benefit
  • Background batch jobs with no UI watching

Common pitfalls

  • Proxies such as nginx and Cloudflare buffer SSE responses by default. SeaLink disables buffering server-side, but verify that your own stack passes the stream through (for nginx, proxy_buffering off).
  • Forgetting to close the stream leaks connections; use try/finally or an async iterator that auto-closes (see the first sketch after this list).
  • Token usage is only finalized at stream end. The OpenAI SDK emits it on the last frame when you set stream_options={"include_usage": true} (see the second sketch after this list).
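
A minimal Python sketch of the connection-cleanup pitfall, reusing the client setup from the Python example above. It assumes the SDK's stream object exposes a close() method (recent openai-python versions do); the finally block releases the connection even if the consumer aborts mid-iteration.

from openai import OpenAI

client = OpenAI(base_url="https://api.sealink.asia/v1", api_key="<your-sealink-key>")

stream = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Stream a 200-word reply."}],
    stream=True,
)
try:
    for chunk in stream:
        delta = chunk.choices[0].delta.content if chunk.choices else None
        if delta:
            print(delta, end="", flush=True)
finally:
    stream.close()  # release the HTTP connection even if the loop exits early
print()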
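
A Python sketch of reading final token usage, assuming SeaLink forwards OpenAI's stream_options behavior: with include_usage set, the last frame carries a usage object and an empty choices list.

from openai import OpenAI

client = OpenAI(base_url="https://api.sealink.asia/v1", api_key="<your-sealink-key>")

stream = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Stream a 200-word reply."}],
    stream=True,
    stream_options={"include_usage": True},
)
usage = None
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
    if chunk.usage is not None:  # final frame: empty choices, populated usage
        usage = chunk.usage
print()
if usage:
    print(f"prompt={usage.prompt_tokens} completion={usage.completion_tokens} total={usage.total_tokens}")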