

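In many business scenarios, model inference requests contain long, repeated prefixes, such as system instructions and conversation history. Explicit caching (Context Cache) reuses this stable context so the model does not recompute identical content, which significantly reduces inference overhead.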
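These scenarios share two traits: the context is long and it is reused often. By caching the computed results for the prefix, explicit caching lowers both inference latency and compute cost.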
Wanqing provides three explicit caching modes, selected with the "mode" request field: create, prefix, and append.
Note: Cache writes are free for a limited time, from February 2 to February 28, 2026. Cache hits are billed at 0.2 CNY per million tokens.
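Because cache hits are priced per million tokens, the hit charge for a batch of requests is simple to estimate. The following is a minimal bash sketch, assuming linear per-token billing at the listed rate; hit_cost_cny is a hypothetical helper, and the rate is a parameter in case pricing changes. It covers only the cache-hit charge, since this page does not list the other token prices.

```bash
#!/usr/bin/env bash
# Hypothetical helper: estimate the cache-hit charge in CNY.
# Assumes hits are billed linearly per token at the listed rate.

# $1 = number of cached (hit) tokens, $2 = CNY per million tokens (default 0.2).
hit_cost_cny() {
  local cached_tokens="$1" rate_per_million="${2:-0.2}"
  awk -v t="$cached_tokens" -v r="$rate_per_million" \
    'BEGIN { printf "%.6f\n", t * r / 1000000 }'
}
```

For example, 1,000 requests that each hit 194 cached tokens, as in the prefix response below, total 194,000 tokens; hit_cost_cny 194000 prints 0.038800, or roughly 0.039 CNY.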
curl 'https://wanqing.streamlakeapi.com/api/gateway/v1/endpoints/chat/completions' \
-H "Authorization: Bearer $WQ_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "model ID / endpoint ID",
"messages": [ ... ],
"mode": "create",
"ttl": 3600
}'
// ...
{"cache_id":"d151a70f-1d72-4117-b52a-37083eef4853",
// ...
{"prompt_tokens":195,"total_tokens":XXX,"completion_tokens":XX
// ...
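The create step can be scripted so the returned cache_id is captured for later requests. This is a minimal bash sketch, assuming bash, curl, and sed are available, that WQ_API_KEY is exported, and that the response contains the field literally as "cache_id":"<value>", as in the excerpt above. The function names build_create_body, extract_cache_id, and create_cache are hypothetical; the default TTL of 3600 only mirrors the example (presumably seconds) and is not a documented server default. The messages argument must already be a valid JSON array, because the sketch does not escape it.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the "create" step (not an official SDK).
# Assumes: bash, curl, sed; WQ_API_KEY exported; messages is valid JSON.

WQ_URL='https://wanqing.streamlakeapi.com/api/gateway/v1/endpoints/chat/completions'

# Print the request body for mode=create.
# $1 = model ID or endpoint ID, $2 = messages JSON array,
# $3 = ttl (defaults to 3600, mirroring the example; not a documented default).
build_create_body() {
  local model="$1" messages="$2" ttl="${3:-3600}"
  printf '{"model": "%s", "messages": %s, "mode": "create", "ttl": %d}' \
    "$model" "$messages" "$ttl"
}

# Print the cache_id value from a JSON response, or nothing if it is absent.
# This is a pattern match only; a JSON parser such as jq is more robust.
extract_cache_id() {
  printf '%s' "$1" |
    sed -n 's/.*"cache_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Send the create request and print the raw response body.
create_cache() {
  curl -sS "$WQ_URL" \
    -H "Authorization: Bearer $WQ_API_KEY" \
    -H 'Content-Type: application/json' \
    -d "$(build_create_body "$@")"
}
```

Typical use: resp=$(create_cache "<model ID or endpoint ID>" "$MESSAGES" 3600), then CACHE_ID=$(extract_cache_id "$resp"). An empty CACHE_ID means the response did not contain the field, so inspect resp for an error before continuing.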
curl 'https://wanqing.streamlakeapi.com/api/gateway/v1/endpoints/chat/completions' \
-H "Authorization: Bearer $WQ_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"cache_id": "d151a70f-1d72-4117-b52a-37083eef4853",
"messages": [...],
"mode": "prefix"
}'
// ...
{"prompt_tokens":XXX,"total_tokens":XXX,"completion_tokens":XXX,"prompt_tokens_details":{"cached_tokens":194}
// ...
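Reusing the cache in prefix mode needs only the cache_id from the create step. A minimal bash sketch under the same assumptions (bash and curl available, WQ_API_KEY exported, messages already valid JSON); build_prefix_body and send_prefix are hypothetical helpers, and the body mirrors the request above: cache_id, messages, and mode, with no model field.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the "prefix" step (not an official SDK).
# Assumes: bash, curl; WQ_API_KEY exported; messages is valid JSON.

WQ_URL='https://wanqing.streamlakeapi.com/api/gateway/v1/endpoints/chat/completions'

# Print the request body for mode=prefix, mirroring the example:
# cache_id, messages, and mode, with no model field.
build_prefix_body() {
  local cache_id="$1" messages="$2"
  printf '{"cache_id": "%s", "messages": %s, "mode": "prefix"}' \
    "$cache_id" "$messages"
}

# Send the prefix request and print the raw response body.
# $1 = cache_id returned by the create step, $2 = messages JSON array.
send_prefix() {
  curl -sS "$WQ_URL" \
    -H "Authorization: Bearer $WQ_API_KEY" \
    -H 'Content-Type: application/json' \
    -d "$(build_prefix_body "$1" "$2")"
}
```

Building the body in a separate function keeps it inspectable: echo "$(build_prefix_body "$CACHE_ID" "$MESSAGES")" shows exactly what would be sent before any request is made.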
curl 'https://wanqing.streamlakeapi.com/api/gateway/v1/endpoints/chat/completions' \
-H "Authorization: Bearer $WQ_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"cache_id": "d151a70f-1d72-4117-b52a-37083eef4853",
"messages": [...],
"mode": "append"
}'
// ...
{"prompt_tokens":XXX,"total_tokens":XXX,"completion_tokens":XXX,"prompt_tokens_details":{"cached_tokens":330}
// ...
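To check whether a request actually hit the cache, compare prompt_tokens_details.cached_tokens with prompt_tokens in the usage data, as in the prefix and append responses above. A minimal bash sketch; extract_int_field and cache_hit_ratio are hypothetical helpers that assume each key appears at most once in the response with an integer value. Treating a missing cached_tokens as 0 (no hit) is an assumption about how the service reports misses.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for checking cache hits in a response's usage data.
# Assumes each key appears at most once and holds a non-negative integer.

# Print the integer value of JSON key $1 found in text $2 (empty if absent).
extract_int_field() {
  printf '%s' "$2" |
    sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p"
}

# Print cached_tokens / prompt_tokens to three decimals, or "n/a".
# A missing cached_tokens is treated as 0 (assumed to mean no hit).
cache_hit_ratio() {
  local prompt cached
  prompt=$(extract_int_field prompt_tokens "$1")
  cached=$(extract_int_field cached_tokens "$1")
  if [ -z "$prompt" ] || [ "$prompt" -eq 0 ]; then
    echo "n/a"
    return
  fi
  awk -v c="${cached:-0}" -v p="$prompt" 'BEGIN { printf "%.3f\n", c / p }'
}
```

The pattern "prompt_tokens" includes its closing quote, so it does not accidentally match the prompt_tokens_details key.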