curl 'https://wanqing.streamlakeapi.com/api/gateway/v1/endpoints/chat/completions' \
  -H "Authorization: Bearer $WQ_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "模型 id/推理点 id",
    "messages": [ ... ],
    "mode": "create",
    "ttl": 3600
  }'

ttl（Time-To-Live）：解析缓存的生命时间，单位为秒，默认值600秒；

期望输出

// ...
{"cache_id":"d151a70f-1d72-4117-b52a-37083eef4853",
// ...
{"prompt_tokens":195,"total_tokens":XXX,"completion_tokens":XX
// ...

- Cache ID：缓存创建后返回的唯一ID（例如 d151a70f-1d72-4117-b52a-37083eef4853），表明该messages已被缓存。用户后续可凭借此ID进行prefix或append操作。
- 在这个示例里面，请求的messages长度为195个token。推荐用户请求messages长度>1000。当token数小于1000 token时，不保证成功。

prefix

curl 'https://wanqing.streamlakeapi.com/api/gateway/v1/endpoints/chat/completions' \
  -H "Authorization: Bearer $WQ_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "cache_id": "d151a70f-1d72-4117-b52a-37083eef4853",
    "messages": [...],
    "mode": "prefix"
}'

用户根据cache id为d151a70f-1d72-4117-b52a-37083eef4853的前缀进行推理；
该能力下，使用指定Cache ID的缓存内容作为本次推理的前缀，仅做复用，不会被创建为新的缓存。

期望输出

// ...
{"prompt_tokens":XXX,"total_tokens":XXX,"completion_tokens":XXX,"prompt_tokens_details":{"cached_tokens":194}
// ...

- 在返回的response里，存在cached_tokens字段，预期该字段对应的数值和create时prompt_tokens字段对应的值接近。在本文的例子里，create时prompt_tokens字段为195，cached_tokens字段为194，两者是接近的。
- 在对同一cache ID进行多次prefix时，cached_tokens不应降低。在本文的例子里cached_tokens应当不低于194。

append

curl 'https://wanqing.streamlakeapi.com/api/gateway/v1/endpoints/chat/completions' \
  -H "Authorization: Bearer $WQ_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "cache_id": "d151a70f-1d72-4117-b52a-37083eef4853",
    "messages": [...],
    "mode": "append"
}'

用户根据cache id为d151a70f-1d72-4117-b52a-37083eef4853的前缀进行推理；
该能力下，本次请求的新内容会追加到原有缓存之后，形成新前缀并替换。此后，使用该ID进行的任何新调用，都将基于更新后的前缀进行推理。

期望输出

// ...
{"prompt_tokens":XXX,"total_tokens":XXX,"completion_tokens":XXX,"prompt_tokens_details":{"cached_tokens":330}
// ...

- 和prefix模式不同，在对同一cache ID进行多次append时，cached_tokens应当单调递增。在本文的例子里，prefix后cached_tokens为194，在append后，cached_tokens为330，增加了136 token。
- 推荐每次append的token数大于64，小于64的token数不保证成功。

该篇文档内容是否对您有帮助？

有帮助没帮助

关于我们

支持与服务

法律支持

联系我们

友情链接