News
Context Caching is Available 2024/08/02
In large language model API usage, a significant portion of user inputs tends to be repetitive. For instance, user prompts often include repeated references, and in multi-turn conversations, previous content is frequently re-entered.
To address this, DeepSeek has implemented Context Caching on Disk technology. This innovative approach caches content that is expected to be reused on a distributed disk array. When duplicate inputs are detected, the repeated parts are retrieved from the cache, bypassing the need for recomputation. This not only reduces service latency but also significantly cuts down on overall usage costs.
For cache hits, DeepSeek charges $0.014 per million tokens, slashing API costs by up to 90%¹.
Hint 1: The API price has been updated. For details, please refer to Models & Pricing.
How to Use DeepSeek API's Caching Service
The disk caching service is now available for all users, requiring no code or interface changes. The cache service runs automatically, and billing is based on actual cache hits.
Note that only requests with identical prefixes (starting from the 0th token) will be considered duplicates. Partial matches in the middle of the input will not trigger a cache hit.
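The prefix-only rule can be illustrated with a small sketch. The helper below is hypothetical (the real matching happens server-side on the tokenized input), but it shows which part of a request is even eligible for a cache hit:

```python
def cached_prefix_len(prev_tokens, new_tokens):
    """Length of the common prefix, counted from token 0, between a
    previous request and a new one. Only this prefix can be served from
    the cache; identical content that starts mid-input never counts.
    (Hypothetical helper for illustration only.)
    """
    n = 0
    for a, b in zip(prev_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n

# The second request repeats the first and appends a new turn:
# the shared prefix is cacheable.
first = ["sys", "doc", "q1"]
second = ["sys", "doc", "q1", "a1", "q2"]
print(cached_prefix_len(first, second))   # 3

# The same content shifted by one token no longer matches from token 0:
shifted = ["intro", "sys", "doc", "q1"]
print(cached_prefix_len(first, shifted))  # 0
```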
Here are two classic cache usage scenarios:
1. Multi-turn conversation: The next turn can hit the context cache generated by the previous turn.
2. Data analysis: Subsequent requests with the same prefix can hit the context cache.
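The multi-turn pattern can be sketched as follows. This is a shape-only illustration (no request is actually sent): the key point is that earlier turns are left byte-identical, so each new request shares the previous request's full prefix.

```python
# Sketch of the multi-turn pattern against an OpenAI-compatible
# endpoint such as DeepSeek's. The assistant reply "..." is a
# placeholder; in practice it comes from the API response.
history = [{"role": "system", "content": "You are a helpful assistant."}]

def next_request(user_text, history):
    """Append the new user turn and build the request body. Earlier
    turns are reused unchanged, so the shared prefix can hit the cache."""
    history.append({"role": "user", "content": user_text})
    return {"model": "deepseek-chat", "messages": list(history)}

req1 = next_request("What is context caching?", history)
history.append({"role": "assistant", "content": "..."})
req2 = next_request("How much does a cache hit cost?", history)

# req2's messages begin with req1's messages, unmodified:
assert req2["messages"][:len(req1["messages"])] == req1["messages"]
```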
Beneficial Scenarios for Context Caching on Disk:
Q&A assistants with long preset prompts
Role-play with extensive character settings and multi-turn conversations
Data analysis with recurring queries on the same documents/files
Code analysis and debugging with repeated repository references
Few-shot learning with a repeated set of in-context examples to improve model output quality
...
For more detailed instructions, please refer to the guide Use Context Caching.
Monitoring Cache Hits
Two new fields in the API response's usage section help users monitor cache performance:
prompt_cache_hit_tokens: Number of tokens from the input that were served from the cache ($0.014 per million tokens)
prompt_cache_miss_tokens: Number of tokens from the input that were not served from the cache ($0.14 per million tokens)
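These two fields make the input-token bill easy to reconstruct. A minimal sketch, using the field names above with made-up token counts:

```python
# Hypothetical `usage` payload, shaped like the usage section of an
# API response; the field names are from the docs, the numbers invented.
usage = {
    "prompt_cache_hit_tokens": 100_000,
    "prompt_cache_miss_tokens": 25_000,
}

HIT_PRICE = 0.014 / 1_000_000   # $ per cached input token
MISS_PRICE = 0.14 / 1_000_000   # $ per uncached input token

cost = (usage["prompt_cache_hit_tokens"] * HIT_PRICE
        + usage["prompt_cache_miss_tokens"] * MISS_PRICE)
total = usage["prompt_cache_hit_tokens"] + usage["prompt_cache_miss_tokens"]
hit_rate = usage["prompt_cache_hit_tokens"] / total

print(f"input cost: ${cost:.4f}, cache hit rate: {hit_rate:.0%}")
# → input cost: $0.0049, cache hit rate: 80%
```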
Reducing Latency
First token latency will be significantly reduced in requests with long, repetitive inputs.
For a 128K-token prompt whose prefix is largely repeated across requests, first token latency is cut from 13 seconds to just 500 ms.
Lowering Costs
Users who structure their requests around the cache's prefix-matching behavior can save up to 90% on input costs.
Even without any optimization, historical data shows that users save over 50% on average.
The service has no additional fees beyond the $0.014 per million tokens for cache hits, and storage usage for the cache is free.
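The savings figures follow directly from the two prices: a cached token costs $0.014 per million versus $0.14 per million uncached, i.e. 90% less. A small arithmetic sketch of the blended savings at a given hit ratio:

```python
def input_cost_savings(hit_ratio):
    """Fraction saved on input-token cost versus paying the full
    $0.14/M rate for everything, given a cache hit ratio.
    Illustrative arithmetic only."""
    blended = hit_ratio * 0.014 + (1 - hit_ratio) * 0.14
    return 1 - blended / 0.14

print(f"{input_cost_savings(1.0):.0%}")  # every input token cached → 90%
print(f"{input_cost_savings(0.5):.0%}")  # half cached → 45%
```

At a 100% hit ratio the saving is the full 90%; the "over 50% on average" figure reported above corresponds to typical hit ratios somewhat above one half.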
Security Concerns
The cache system is designed with a robust security strategy.
Why DeepSeek Leads with Disk Caching
DeepSeek API’s Concurrency and Rate Limits