
RAG with text-embedding-3-large + Qwen Plus

Pair the launch embedding model with a value-tier Chinese chat model to make your documents searchable.

Architecture

Documents → text-embedding-3-large vectors → store in any vector DB → at query time embed the question → top-k retrieval → feed to Qwen Plus for the answer.

Embed your corpus

Use the OpenAI Python SDK pointed at SeaLink's base URL. text-embedding-3-large returns 3072-dimensional vectors and is the embedding model available at launch.

embed_corpus.py
from openai import OpenAI

client = OpenAI(
    base_url="https://api.sealink.asia/v1",
    api_key="<your-sealink-key>",
)

docs = ["Document 1...", "เอกสาร 2...", "文档 3..."]

res = client.embeddings.create(model="text-embedding-3-large", input=docs)

# Each embedding has 3072 dimensions
vectors = [d.embedding for d in res.data]
# Now insert into pgvector / Qdrant / Weaviate / etc.
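
Retrieve top-k chunks

At query time, embed the question with the same model and pull the most similar chunks from your vector store. If you want to sanity-check the pipeline before wiring up a database, here is a minimal in-memory sketch using cosine similarity over the vectors from embed_corpus.py; the retrieve helper, the numpy dependency, and the k=3 default are illustrative, not part of SeaLink's API, and it reuses the client defined above.

retrieve.py
import numpy as np

def retrieve(question, vectors, docs, k=3):
    # Embed the query with the same model used for the corpus
    q = client.embeddings.create(
        model="text-embedding-3-large", input=[question]
    ).data[0].embedding
    q = np.array(q)

    # Cosine similarity between the query and every stored vector
    mat = np.array(vectors)
    sims = mat @ q / (np.linalg.norm(mat, axis=1) * np.linalg.norm(q))

    # Indices of the k most similar chunks, best match first
    top = np.argsort(sims)[::-1][:k]
    return [docs[i] for i in top]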

Answer with Qwen Plus

After retrieving the top-k chunks, build a prompt and call Qwen Plus. It handles Chinese, English, and Southeast Asian languages natively.

answer.py
def answer(question, retrieved_chunks):
    context = "\n\n".join(retrieved_chunks)
    prompt = f"""Answer the question using only the context below. If the answer isn't there, say so.
Context:
{context}
Question: {question}
Answer:"""
    resp = client.chat.completions.create(
        model="qwen3-plus",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=600,
    )
    return resp.choices[0].message.content
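
Putting the two steps together looks like the snippet below; the example question and the retrieve helper from the sketch above are illustrative, so substitute your vector DB's query call if you use one.

question = "What is the refund policy?"
chunks = retrieve(question, vectors, docs, k=3)
print(answer(question, chunks))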

Cost ballpark

10K queries per month, each with roughly 300 input tokens of retrieved context and 200 output tokens of answer, comes to about $1.30 on Qwen Plus. text-embedding-3-large adds about $0.13 per million input tokens for embedding your corpus and queries.
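
To re-run the estimate with your own volumes, the arithmetic is just token counts times per-million rates. The helper below is a sketch; the rate arguments are placeholders, so plug in the current Qwen Plus numbers from SeaLink's pricing page.

cost_estimate.py
def monthly_cost(queries, ctx_tokens, ans_tokens, in_rate, out_rate):
    # Rates are USD per million tokens; result is USD per month
    return queries * (ctx_tokens * in_rate + ans_tokens * out_rate) / 1e6

# Example shape of the call, with rates taken from SeaLink's pricing page:
# monthly_cost(10_000, 300, 200, in_rate=<input $/M>, out_rate=<output $/M>)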