Anthropic’s New Feature: Prompt Caching Explained

Anthropic has recently introduced a new API feature called prompt caching. It remembers context between API calls, allowing developers to avoid resending long prompts. Prompt caching is currently in public beta for the Claude 3.5 Sonnet and Claude 3 Haiku models, while support for the largest Claude model, Opus, is still in the works.

Prompt caching, an approach described in a 2023 paper, lets users store frequently used contexts in their sessions. By retaining these prompts, users can provide additional background information without paying full price for it on every call. This is especially useful when a user needs to send a substantial amount of context in a prompt and refer back to it across multiple conversations with the model. It also lets developers and users refine model responses more effectively.
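
As a rough illustration, the sketch below shows how a large shared context might be marked for caching through Anthropic’s Messages API. The `anthropic-beta` header value and the `cache_control` field reflect Anthropic’s beta documentation at the time of writing and may change; the API key, model snapshot, and document text are placeholders.

```python
import requests

# Placeholder API key; the beta header below is the value documented for
# the prompt caching public beta at the time of writing and may change.
headers = {
    "x-api-key": "YOUR_API_KEY",
    "anthropic-version": "2023-06-01",
    "anthropic-beta": "prompt-caching-2024-07-31",
    "content-type": "application/json",
}

payload = {
    "model": "claude-3-5-sonnet-20240620",
    "max_tokens": 1024,
    # The large, frequently reused context goes in a system block marked
    # with cache_control, so later calls can read it back from the cache.
    "system": [
        {
            "type": "text",
            "text": "<long reference document shared across many requests>",
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [
        {"role": "user", "content": "Summarize the key points of the document."}
    ],
}

response = requests.post(
    "https://api.anthropic.com/v1/messages", headers=headers, json=payload
)
print(response.json())
```

On subsequent calls within the cache lifetime, sending the same marked block lets the API bill those tokens at the cheaper cached rate rather than the base input price.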

Anthropic has reported that early users have seen significant speed and cost improvements with prompt caching across a range of use cases. These include reducing cost and latency for long instructions and uploaded documents in conversational agents, speeding up code autocompletion, supplying multiple instructions to agentic search tools, and embedding entire documents in a prompt.

One key advantage of caching prompts is a lower price per token. According to Anthropic, reading a cached prompt is notably cheaper than the base input token price. For Claude 3.5 Sonnet, writing a prompt to the cache costs $3.75 per million tokens (MTok), while reading a cached prompt costs only $0.30/MTok. In other words, users pay a modest premium upfront to write the cache and then save roughly 10x relative to the base input price on every subsequent read. Similarly, Claude 3 Haiku users pay $0.30/MTok to cache a prompt and $0.03/MTok to access stored prompts.
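
To make the economics concrete, here is a back-of-the-envelope calculation using the prices above. The $3.00/MTok base input price for Claude 3.5 Sonnet is an assumption taken from Anthropic’s published rate card (it is not stated in this article), and the context size and call count are hypothetical.

```python
# Back-of-the-envelope cost comparison for a large shared prompt on
# Claude 3.5 Sonnet. Prices are dollars per million tokens (MTok); the
# $3.00 base input price is an assumption from Anthropic's published
# rate card, and the context size and call count are hypothetical.
BASE_INPUT_PER_MTOK = 3.00    # normal (uncached) input tokens
CACHE_WRITE_PER_MTOK = 3.75   # one-time cost to write the prompt to the cache
CACHE_READ_PER_MTOK = 0.30    # each subsequent read of the cached prompt

context_mtok = 0.1   # a 100,000-token shared context, expressed in MTok
calls = 50           # number of requests that reuse that same context

# Without caching, every call pays the full input price for the context.
without_cache = calls * context_mtok * BASE_INPUT_PER_MTOK

# With caching, the first call writes the cache and the rest read from it.
with_cache = (
    context_mtok * CACHE_WRITE_PER_MTOK
    + (calls - 1) * context_mtok * CACHE_READ_PER_MTOK
)

print(f"without caching: ${without_cache:.2f}")
print(f"with caching:    ${with_cache:.2f}")
```

Under these assumptions, the fifty calls cost $15.00 without caching versus about $1.85 with it, roughly an 8x overall saving; the per-read saving approaches 10x the more often the cached context is reused.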

While prompt caching has not yet rolled out for Claude 3 Opus, Anthropic has already disclosed its pricing: writing to the cache will cost $18.75/MTok, and reading a cached prompt will cost $1.50/MTok. Note that Anthropic’s prompt cache has a five-minute lifetime, which is refreshed each time the cached content is used.

Anthropic’s foray into prompt caching can be seen as part of its strategy to compete with other AI platforms, such as Google and OpenAI, on pricing. Prior to the launch of the Claude 3 models, Anthropic had significantly cut its token prices, and the company now finds itself in a competitive environment where offering low-priced options to third-party developers is crucial.

Other platforms, like Lamina and OpenAI, offer variants of prompt caching. Lamina, for example, uses KV caching to drive down GPU costs. OpenAI’s GPT-4o, by contrast, offers a memory feature in which the model retains user preferences or details, which is distinct from traditional prompt caching.

Anthropic’s prompt caching opens up new possibilities for users to streamline their interactions with AI models, improve cost efficiency, and refine model responses. As competition in the AI landscape intensifies, features like prompt caching become instrumental in attracting and retaining developers on a platform.
