SeaLink

Concepts

For people new to AI APIs: a 30-second skim, then decide which concepts to dig into.

Model

The underlying AI brain that reads your prompt and writes a reply.

  • Different models have different strengths: Claude is strong on long context and code; GPT and Gemini cover multimodal work; Qwen, DeepSeek, Kimi, and GLM cover Chinese and SEA language tasks.
  • SeaLink covers 10 model ecosystems so you don't have to commit to one vendor.

When it matters: When you decide cost vs. quality vs. speed for your use case.

Token

How models measure text. ~4 English chars = 1 token; ~1.5 Chinese chars = 1 token.

  • "Hello world" = ~3 tokens. "你好世界" = ~3 tokens. A typical chat reply might be 200-500 tokens.
  • Both your input and the model's output are billed in tokens. Output tokens usually cost 3-5x as much as input tokens.
  • SeaLink shows tokens-per-call estimates everywhere — try /tools/tokenizer.

When it matters: Whenever you forecast monthly cost or hit a model's context limit.
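The rules of thumb above can be sketched as a rough estimator. This heuristic is illustrative only; use /tools/tokenizer for real counts.

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 chars/token for ASCII text,
    ~1.5 chars/token for CJK text. Real tokenizers differ per model."""
    ascii_chars = sum(1 for c in text if ord(c) < 128)
    cjk_chars = len(text) - ascii_chars
    return round(ascii_chars / 4 + cjk_chars / 1.5)

print(estimate_tokens("Hello world"))  # 3
print(estimate_tokens("你好世界"))      # 3
```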

Context window

How many tokens a model can read in one shot.

  • GPT-4o-mini: 128K tokens (~300 pages). Claude Sonnet: 200K. Kimi K2: 1M (~2000 pages).
  • If your input + history + reply would exceed it, the API rejects the request with a 413 error. Switch to a longer-context model or trim history.

When it matters: When you summarize long documents, do whole-codebase analysis, or run multi-turn agents.
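One common fix, trimming the oldest history turns, can be sketched like this. Per-message token counts are assumed to come from an estimator or the API's usage field:

```python
def trim_history(messages, token_counts, budget):
    """Drop the oldest non-system messages until the total fits the budget.
    messages and token_counts are parallel lists; index 0 is assumed to be
    the system prompt and is always kept."""
    msgs = list(messages)
    counts = list(token_counts)
    while sum(counts) > budget and len(msgs) > 1:
        # index 1 = oldest non-system message
        del msgs[1], counts[1]
    return msgs

history = ["system prompt", "turn 1", "turn 2", "turn 3"]
print(trim_history(history, [50, 400, 300, 200], budget=600))
# keeps the system prompt plus the most recent turns that fit
```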

API key

A secret string that authenticates your app with SeaLink.

  • Looks like sk-sealink-... — treat it like a password. Don't commit to git, don't paste in chat.
  • Each SeaLink account can have multiple keys with different model whitelists, monthly budgets, and expiry dates.
  • If a key leaks: rotate it from /dashboard/keys. Revocation takes effect immediately, so create the replacement key and update your app to use it before revoking the leaked one.

When it matters: Every time you ship code that calls SeaLink.
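A minimal pattern for keeping the key out of source code is to read it from the environment. The variable name SEALINK_API_KEY below is illustrative:

```python
import os

def get_api_key(env=os.environ) -> str:
    """Load the SeaLink key from the environment; fail loudly if missing.
    Taking env as a parameter keeps the function easy to test."""
    key = env.get("SEALINK_API_KEY")
    if not key:
        raise RuntimeError("Set SEALINK_API_KEY - never hardcode keys in source")
    if not key.startswith("sk-sealink-"):
        raise RuntimeError("That doesn't look like a SeaLink key")
    return key
```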

RAG (Retrieval-Augmented Generation)

Pattern: retrieve relevant docs first, then ask the model to answer using them as context.

  • Step 1: embed your documents into vectors with text-embedding-3-large.
  • Step 2: at query time, embed the question, find top-k similar chunks.
  • Step 3: send chunks + question to a chat model (e.g. Qwen Plus).
  • Why use it: cheaper than fine-tuning, more current than the model's training data, and the model can cite sources.

When it matters: When you want the model to answer based on YOUR documents, not its general knowledge.
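The three steps above, sketched with toy vectors. A real system would get embeddings from text-embedding-3-large (step 1); the similarity math in step 2 is the same either way:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Step 2: rank document chunks by similarity to the query embedding."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]

# Toy 3-d "embeddings" standing in for real model output (step 1).
docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
query = [1.0, 0.05, 0.0]
print(top_k(query, docs))  # [0, 1] - the two most similar chunks
# Step 3: stitch the selected chunks + the question into the chat prompt.
```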

Tool use / Function calling

Letting the model call your code (search DB, send email, query API) instead of just answering text.

  • You declare what tools the model can use. The model decides if/when to call them and returns the call's arguments.
  • You execute the call (in your code), pass the result back, and the model continues.
  • All major SeaLink models support this with identical syntax — see /docs/function-calling.

When it matters: Building an agent, a customer-support bot that books appointments, or anything that mixes natural language with structured actions.
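The execution step is yours: map the model's returned call (a name plus JSON arguments) onto your own functions. A minimal dispatcher sketch, with a hypothetical get_order_status tool:

```python
import json

def get_order_status(order_id: str) -> str:
    """Hypothetical tool - in a real app this would query your database."""
    return f"order {order_id}: shipped"

TOOLS = {"get_order_status": get_order_status}

def dispatch(tool_name: str, arguments_json: str) -> str:
    """Run the tool the model asked for; the result goes back to the model."""
    args = json.loads(arguments_json)
    return TOOLS[tool_name](**args)

# The model returns something like name="get_order_status",
# arguments='{"order_id": "A-17"}'; you execute it and pass the result back.
print(dispatch("get_order_status", '{"order_id": "A-17"}'))
```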

Streaming

Receive the reply token-by-token as it's generated, instead of waiting for the whole thing.

  • Cuts perceived latency from 3-10s (long reply) to ~200-500ms first token. Users see text appear live, like ChatGPT.
  • Enable with stream:true in your request and iterate the response with the OpenAI SDK's streaming iterator.

When it matters: User-facing chat UIs. Don't bother for short replies (<100 tokens) or background batch jobs.

Prompt caching

Reuse a long fixed prompt across many calls — pay 10% on cached tokens instead of full price.

  • Common use: a 12K-token system prompt called 10K times/month. Without cache: ~$375/month. With cache (90% hit, cached tokens at 10% price): ~$71/month.
  • Anthropic and OpenAI both support it. SeaLink passes the headers through transparently.

When it matters: Customer-support bots, RAG with fixed system prompt, agents that loop over the same context.
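The savings arithmetic as a quick sketch: cache misses pay full price, hits pay the cached ratio. The numbers are the example above; real pricing varies by model, and some providers also bill a cache-write surcharge not modeled here:

```python
def monthly_cost(full_cost: float, hit_rate: float,
                 cached_price_ratio: float = 0.1) -> float:
    """Cost with caching: misses pay full price, hits pay the cached ratio."""
    return full_cost * ((1 - hit_rate) + hit_rate * cached_price_ratio)

# 12K-token system prompt x 10K calls/month ~ $375 without caching.
print(round(monthly_cost(375, hit_rate=0.9), 2))  # 71.25
```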