Cheap research agent: Kimi for context, Sonnet for synthesis

Read 50 PDFs with Kimi K2 ($0.60 per 1M input tokens), distill the key points, then have Claude Sonnet write the executive summary.

Why two models

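Long-context models are cheap per token but tend to write weaker executive prose. Premium models cost 5-25x more per output token but write much better. So split the workload: the cheap model reads the full corpus, and the premium model sees only the distilled key points.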

Step 1 — Kimi reads everything

Concatenate up to ~800K tokens of PDF text into Kimi's 1M-token context. Ask for structured key points (JSON).
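If the Kimi deployment you use has a smaller context window than assumed here, or the corpus outgrows the ~800K budget, the PDFs can be packed into several batches and extracted one batch at a time. A minimal sketch, assuming a rough 4-characters-per-token estimate (a heuristic, not Kimi's tokenizer) and two hypothetical helpers, `estimate_tokens` and `pack_batches`:

```python
from typing import List

CHARS_PER_TOKEN = 4  # rough heuristic (assumption), not Kimi's real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate the token count from the character count."""
    return len(text) // CHARS_PER_TOKEN + 1


def pack_batches(pdf_texts: List[str], budget_tokens: int = 800_000) -> List[List[str]]:
    """Greedily group PDF texts so each batch stays within budget_tokens.

    A single PDF larger than the budget still gets a batch of its own;
    it would have to be split separately.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    used = 0
    for text in pdf_texts:
        cost = estimate_tokens(text)
        # Close the current batch before it would overflow the budget.
        if current and used + cost > budget_tokens:
            batches.append(current)
            current, used = [], 0
        current.append(text)
        used += cost
    if current:
        batches.append(current)
    return batches
```

Each batch can then be joined with the boundary marker and sent as its own extraction request, with the resulting key points merged before synthesis.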

extract.py

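import os

from openai import OpenAI

# Assumption: an OpenAI-compatible gateway that serves both "kimi-k2" and
# "claude-sonnet-4-6". The environment variable names are placeholders.
client = OpenAI(
    base_url=os.environ["LLM_GATEWAY_URL"],
    api_key=os.environ["LLM_GATEWAY_KEY"],
)

# pdf_texts: list[str], one plain-text string per PDF (extraction not shown).
corpus = "\n\n--- PDF BOUNDARY ---\n\n".join(pdf_texts)

# Kimi reads the whole corpus and returns structured key points as JSON.
extract = client.chat.completions.create(
    model="kimi-k2",
    response_format={"type": "json_object"},
    messages=[
        {
            "role": "system",
            "content": (
                "Read all the PDFs separated by '--- PDF BOUNDARY ---'. "
                "Return JSON: {\"docs\": [{\"title\": ..., \"key_points\": [...]}]}"
            ),
        },
        {"role": "user", "content": corpus},
    ],
    max_tokens=4000,
)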
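Before handing the reply to the next step, it may be worth confirming that it is valid JSON in the requested shape, since a reply cut off at `max_tokens` will not parse. A minimal sketch; `parse_key_points` is a hypothetical helper, not part of any SDK:

```python
import json
from typing import Any, Dict, List


def parse_key_points(raw: str) -> List[Dict[str, Any]]:
    """Parse the extraction reply and check it has the requested shape."""
    # json.loads raises json.JSONDecodeError (a ValueError) on truncated or invalid JSON.
    data = json.loads(raw)
    docs = data.get("docs") if isinstance(data, dict) else None
    if not isinstance(docs, list):
        raise ValueError("expected a top-level 'docs' list")
    for i, doc in enumerate(docs):
        if not isinstance(doc, dict):
            raise ValueError(f"docs[{i}] is not an object")
        if not isinstance(doc.get("title"), str):
            raise ValueError(f"docs[{i}] is missing a string 'title'")
        if not isinstance(doc.get("key_points"), list):
            raise ValueError(f"docs[{i}] is missing a 'key_points' list")
    return docs
```

In the pipeline this would be called as `parse_key_points(extract.choices[0].message.content)`. Checking whether `extract.choices[0].finish_reason` equals `"length"` first is a cheap way to catch replies that hit the token limit.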

Step 2 — Sonnet synthesizes

Pass Kimi's JSON to Claude Sonnet 4.6. Ask for an executive summary in your house style.

synthesize.py

# Continues from extract.py: reuses `client` and the `extract` response.
summary = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[
        {
            "role": "system",
            "content": (
                "Write a one-page executive summary. Tone: direct, no jargon, "
                "no marketing language. Lead with the single most important "
                "finding. Cite sources by title."
            ),
        },
        {"role": "user", "content": extract.choices[0].message.content},
    ],
    max_tokens=1200,
)
print(summary.choices[0].message.content)

Cost

50 PDFs × 16K tokens = 800K input tokens per report.

Kimi K2 input (800K): $0.48
Kimi K2 output (4K): $0.01
Sonnet input (4K): $0.012
Sonnet output (1.2K): $0.018

Total ≈ $0.52 per report, or about $52 for 100 reports a month.
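The estimate can be reproduced, and adjusted for other corpus sizes, with a few lines of arithmetic. A sketch under stated assumptions: only the Kimi input rate is quoted in this article; the other three per-million rates are back-derived from the per-report figures above and should be checked against current price lists. `report_cost` is a hypothetical helper.

```python
# USD per 1M tokens. Kimi input is quoted in the text; the other three rates
# are back-derived from the per-report figures and are assumptions here.
RATES = {
    "kimi_in": 0.60,
    "kimi_out": 2.50,
    "sonnet_in": 3.00,
    "sonnet_out": 15.00,
}


def report_cost(pdfs: int = 50, tokens_per_pdf: int = 16_000,
                kimi_out: int = 4_000, sonnet_out: int = 1_200) -> float:
    """Estimated USD cost of one report; Sonnet's input is Kimi's output."""
    usage = {
        "kimi_in": pdfs * tokens_per_pdf,
        "kimi_out": kimi_out,
        "sonnet_in": kimi_out,  # prompt overhead is ignored, as in the text
        "sonnet_out": sonnet_out,
    }
    return sum(RATES[name] * tokens / 1_000_000 for name, tokens in usage.items())


per_report = report_cost()
print(f"per report: ${per_report:.2f}, 100 reports: ${per_report * 100:.2f}")
# prints: per report: $0.52, 100 reports: $52.00
```

Nearly all of the cost is Kimi's input, so the total scales almost linearly with corpus size: `report_cost(pdfs=25)` comes to roughly half the default.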

Next steps