Streaming
Stream tokens over Server-Sent Events (SSE). Instead of waiting 3-10 s for the full completion, the first tokens arrive in roughly 200-500 ms (time to first token, TTFT).
Examples
Python
from openai import OpenAI

client = OpenAI(base_url="https://api.sealink.asia/v1", api_key="<your-sealink-key>")

stream = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Stream a 200-word reply."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
Node.js
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.sealink.asia/v1",
  apiKey: process.env.SEALINK_API_KEY,
});

const stream = await client.chat.completions.create({
  model: "claude-sonnet-4-6",
  messages: [{ role: "user", content: "Stream a 200-word reply." }],
  stream: true,
});

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content;
  if (delta) process.stdout.write(delta);
}
cURL (-N / --no-buffer disables curl's output buffering so chunks print as they arrive)
curl https://api.sealink.asia/v1/chat/completions \
  -H "Authorization: Bearer $SEALINK_API_KEY" \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "Stream please"}],
    "stream": true
  }'
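With "stream": true the raw response body is a sequence of SSE frames: each frame is a data: line carrying one JSON chunk, and the stream ends with data: [DONE]. The frames below are illustrative only; the id and content values are placeholders and the exact fields depend on the upstream model.

data: {"id":"chatcmpl-xyz","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hel"},"finish_reason":null}]}

data: {"id":"chatcmpl-xyz","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"lo"},"finish_reason":null}]}

data: {"id":"chatcmpl-xyz","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]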
When to enable streaming
- Chat / customer-support UIs — users see text appear live
- Code generation — Cursor / Claude Code style
- Long generation (>500 tokens)
When not to stream
- When you need to parse complete JSON (function calling / structured output)
- Short replies (<100 tokens) — streaming overhead exceeds benefit
- Background batch jobs with no UI watching
Common pitfalls
- Proxies / nginx / Cloudflare buffer SSE by default. SeaLink disables buffering on its side, but verify that every hop in your own stack passes the stream through unbuffered.
- Forgetting to close the stream leaks connections — use try/finally or an async iterator that auto-closes (see the first sketch after this list).
- Usage is finalized at stream end. The OpenAI SDK only emits usage on the final frame when you set stream_options={"include_usage": true} (see the second sketch after this list).
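A minimal sketch of the close-the-stream pattern, assuming the openai Python SDK, whose object returned with stream=True exposes close():

from openai import OpenAI

client = OpenAI(base_url="https://api.sealink.asia/v1", api_key="<your-sealink-key>")

stream = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Stream a 200-word reply."}],
    stream=True,
)
try:
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
finally:
    stream.close()  # releases the HTTP connection even if the loop exits early or raises
print()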
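And a sketch of reading usage from the final frame, assuming SeaLink passes through OpenAI's stream_options behavior, where the usage-only frame carries an empty choices list:

stream = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Stream a 200-word reply."}],
    stream=True,
    stream_options={"include_usage": True},
)

usage = None
for chunk in stream:
    if chunk.usage is not None:  # populated only on the final frame
        usage = chunk.usage
    if chunk.choices:            # the usage-only frame has no choices
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
print()
if usage is not None:
    print(f"prompt={usage.prompt_tokens} completion={usage.completion_tokens} total={usage.total_tokens}")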