DeepSeek Article, 25 April 2026

DeepSeek API introduces Context Caching on Disk, cutting prices by an order of magnitude | DeepSeek API Docs



News

Context Caching is Available 2024/08/02


In large language model API usage, a significant portion of user inputs tends to be repetitive. For instance, user prompts often include repeated references, and in multi-turn conversations, previous content is frequently re-entered.

To address this, DeepSeek has implemented Context Caching on Disk technology. This innovative approach caches content that is expected to be reused on a distributed disk array. When duplicate inputs are detected, the repeated parts are retrieved from the cache, bypassing the need for recomputation. This not only reduces service latency but also significantly cuts down on overall usage costs.

For cache hits, DeepSeek charges $0.014 per million tokens, slashing API costs by up to 90% [1].

[1] The API price has been updated. For details, please refer to Models & Pricing.

How to Use DeepSeek API's Caching Service

The disk caching service is now available for all users, requiring no code or interface changes. The cache service runs automatically, and billing is based on actual cache hits.

Note that only requests with identical prefixes (starting from the 0th token) will be considered duplicates. Partial matches in the middle of the input will not trigger a cache hit.
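The prefix rule can be illustrated with a small sketch. The helper below is hypothetical (not part of the DeepSeek API); it shows how a prefix-keyed cache can only serve tokens that match exactly from position 0 onward, so any change at the start of the input forfeits the hit.

```python
def cached_prefix_len(prev_tokens, new_tokens):
    """Length of the shared prefix, starting at token 0, that a
    prefix-keyed cache could serve without recomputation."""
    n = 0
    for a, b in zip(prev_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n

base = [1, 2, 3, 4, 5]
print(cached_prefix_len(base, [1, 2, 3, 4, 5, 6]))  # 5: full prefix reused
print(cached_prefix_len(base, [9, 1, 2, 3, 4, 5]))  # 0: shifted input, no hit
```

Note the second call: even though all five original tokens appear in the new input, nothing matches at token 0, so no part of it would be served from cache.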

Here are two classic cache usage scenarios:

1. Multi-turn conversation: the next turn can hit the context cache generated by the previous turn.

2. Data analysis: subsequent requests with the same prefix can hit the context cache.

Beneficial Scenarios for Context Caching on Disk:

- Q&A assistants with long preset prompts
- Role-play with extensive character settings and multi-turn conversations
- Data analysis with recurring queries on the same documents/files
- Code analysis and debugging with repeated repository references
- Few-shot learning to improve model output performance
- ...
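For the multi-turn scenario, the key point is that each turn's request embeds the previous turn's request as an exact prefix. A minimal sketch of that message structure (illustrative, not DeepSeek-specific code):

```python
# Each new turn appends to the history, so turn N's request is an exact
# prefix of turn N+1's request and those tokens can be served from cache.
history = [{"role": "system", "content": "You are a helpful assistant."}]

def build_request(user_text, previous_reply=None):
    if previous_reply is not None:
        history.append({"role": "assistant", "content": previous_reply})
    history.append({"role": "user", "content": user_text})
    return list(history)

turn1 = build_request("Summarize this report.")
turn2 = build_request("Now list the key risks.", previous_reply="Summary text")

assert turn2[:len(turn1)] == turn1  # identical prefix -> potential cache hit
```

The same reasoning applies to the data-analysis scenario: keep the shared document at the front of the prompt and vary only the trailing question.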

For more detailed instructions, please refer to the guide Use Context Caching.

Monitoring Cache Hits

Two new fields in the API response's usage section help users monitor cache performance:

prompt_cache_hit_tokens: number of input tokens that were served from the cache ($0.014 per million tokens)

prompt_cache_miss_tokens: number of input tokens that were not served from the cache ($0.14 per million tokens)
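A sketch of reading these fields from the usage section of a response body. The numeric values here are made up for illustration, not real API output:

```python
# `usage` as it might appear in the JSON body of a chat-completion
# response; the token counts below are illustrative only.
usage = {
    "prompt_tokens": 1000,
    "completion_tokens": 50,
    "prompt_cache_hit_tokens": 900,
    "prompt_cache_miss_tokens": 100,
}

hit = usage["prompt_cache_hit_tokens"]
miss = usage["prompt_cache_miss_tokens"]
print(f"cache hit rate: {hit / (hit + miss):.0%}")  # cache hit rate: 90%
```

Tracking this ratio over time shows whether your prompts are structured to exploit the prefix cache.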

Reducing Latency

First-token latency is significantly reduced for requests with long, repetitive inputs.

For a 128K-token prompt whose content is largely repeated across requests, first-token latency drops from 13 s to just 500 ms.

Lowering Costs

Users who optimize their requests for the cache's characteristics can save up to 90% on costs.

Even without any optimization, historical data shows that users save over 50% on average.

The service has no additional fees beyond the $0.014 per million tokens for cache hits, and storage usage for the cache is free.
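A back-of-the-envelope check of these savings, using the prices quoted above ($0.014 per million cache-hit tokens, $0.14 per million cache-miss tokens). The 90% hit-rate example is hypothetical; a fully cached input is what yields the quoted 90% maximum:

```python
def input_cost_usd(hit_tokens, miss_tokens,
                   hit_price=0.014e-6, miss_price=0.14e-6):
    """Input-token cost at the per-token prices quoted above (USD)."""
    return hit_tokens * hit_price + miss_tokens * miss_price

# A 1M-token input where 90% of tokens hit the cache:
cached = input_cost_usd(900_000, 100_000)       # $0.0266
uncached = input_cost_usd(0, 1_000_000)         # $0.14
print(f"savings: {1 - cached / uncached:.0%}")  # savings: 81%
```

At a 100% hit rate the ratio is 0.014 / 0.14, i.e. a 90% saving, which matches the headline figure.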

Security Concerns

The cache system is designed with a robust security strategy.
