[Request] Support prompt caching #4561
Comments
🥰 Description of requirements
Some APIs, such as OpenAI/Claude/MOONSHOT, already support prompt caching, which can significantly reduce the cost of multi-round question answering.
🧐 Solution
Provide a switch option.
📝 Supplementary information
No response
Thank you for raising an issue. We will investigate the matter and get back to you as soon as possible.
OpenAI's prompt caching is enabled by default; no additional settings are required.
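For reference, since OpenAI caches automatically, a client only needs to check the response to see it working. A minimal sketch, assuming the official `openai` Node SDK; the model name and prompt here are placeholders:

```ts
import OpenAI from 'openai';

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Any stable prefix of ~1024+ tokens is cached automatically by OpenAI;
// no flag needs to be set on the request.
const longSystemPrompt = '...a long, stable system prompt (1024+ tokens)...';

const completion = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [
    { role: 'system', content: longSystemPrompt }, // identical prefix across calls
    { role: 'user', content: 'Second question in the same conversation' },
  ],
});

// On a repeat call with the same prefix, cached_tokens should be non-zero.
console.log(completion.usage?.prompt_tokens_details?.cached_tokens);
```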
Anthropic Claude's cached prompts are only valid for 5 minutes, so I don't think it's a good fit for this project.
This issue is closed. If you have any questions, you can comment and reply.
Actually, I do have a plan to implement Anthropic's caching.
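Unlike OpenAI, Anthropic's caching is opt-in per content block. A minimal sketch of what the opt-in looks like, assuming the official `@anthropic-ai/sdk`; the model name and prompt text are placeholders:

```ts
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const response = await client.messages.create({
  model: 'claude-3-5-sonnet-latest',
  max_tokens: 1024,
  // Marking the system block as ephemeral asks Anthropic to cache it;
  // the cache entry expires ~5 minutes after its last use.
  system: [
    {
      type: 'text',
      text: 'A long, stable system prompt (e.g. the Artifacts instructions)...',
      cache_control: { type: 'ephemeral' },
    },
  ],
  messages: [{ role: 'user', content: 'First question' }],
});

// cache_creation_input_tokens / cache_read_input_tokens report cache activity.
console.log(response.usage);
```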
I don't think this makes much sense. Its typical use case is a single-purpose chatbot, such as a company's customer-service bot, which needs to call the API many times within a short period with prompts that share the same prefix.
@BrandonStudio It does make sense. Caching system prompts, for example, is very valuable: the Artifacts prompt is about 4000 tokens, so after just one extra round of dialogue the default cache already pays for itself, not to mention scenarios like the crawler plugin pulling back an extremely long text (~10k tokens) in a single call.

There is also the file-upload case: combined with prompt caching, I could implement a full-text upload solution, where the savings would be even more substantial.

As for interaction, users won't be asked to operate this themselves; it will be applied to specific types of context, such as the system role, tool call results, and PDF file contents.

Also, when I tested this earlier, not all content supported caching: if the user's content was shorter than x tokens (I forget the exact number), adding cache would throw an error outright. So I'll run a check before caching, and strings below a certain length won't be cached, as shown in the sketch below.
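A minimal sketch of that pre-cache check; the threshold value and the block shape are illustrative assumptions, not the project's actual values:

```ts
// Hypothetical guard: only attach cache_control to blocks long enough
// that caching them cannot fail or be counter-productive.
const MIN_CACHEABLE_CHARS = 4096; // illustrative threshold, not Anthropic's real minimum

interface TextBlock {
  type: 'text';
  text: string;
  cache_control?: { type: 'ephemeral' };
}

const withCacheControl = (block: TextBlock): TextBlock =>
  block.text.length >= MIN_CACHEABLE_CHARS
    ? { ...block, cache_control: { type: 'ephemeral' } }
    : block;
```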
The problem is still the 5-minute cache time limit. How do we ensure that adding this feature reduces costs rather than increasing them?
That's exactly why providing a switch is the right call.
In that case, a timer should be added as well.
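One way to reason about the timer idea: only request a cache write when the previous call in the same session was recent enough that the entry is likely to be read again before it expires. A hypothetical sketch (the 5-minute TTL comes from Anthropic's docs; everything else is assumed):

```ts
const CACHE_TTL_MS = 5 * 60 * 1000; // Anthropic's ephemeral cache lifetime

const lastCallAt = new Map<string, number>(); // sessionId -> timestamp of last request

// Returns true when the session is active enough that paying the cache-write
// premium is likely to be recouped by a cache read within the TTL.
function shouldCache(sessionId: string, now = Date.now()): boolean {
  const prev = lastCallAt.get(sessionId);
  lastCallAt.set(sessionId, now);
  return prev !== undefined && now - prev < CACHE_TTL_MS;
}
```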
Hi, I'm self-hosting Qwen 2.5 Coder 32B using llama.cpp. Lately we've seen massive speed gains thanks to speculative decoding, which requires prompt caching. So right now Lobe Chat runs at only half the speed that's attainable. Secondly, during conversations the prompt processing time gets longer and longer. Neither issue occurs with llama.cpp's built-in chat UI, although that UI is very basic. Below are two links with more details.
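For llama.cpp's HTTP server, prompt caching is requested per call. A minimal sketch, assuming a local `llama-server` on port 8080 and its `/completion` endpoint's `cache_prompt` field:

```ts
// Ask llama.cpp's server to reuse the KV cache for the common prompt prefix,
// so earlier turns of the conversation are not reprocessed on every request.
const res = await fetch('http://localhost:8080/completion', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    prompt: 'Full conversation transcript so far...',
    n_predict: 256,
    cache_prompt: true, // reuse the matching prefix from the previous request
  }),
});

console.log((await res.json()).content);
```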
Need this too.
In addition, even without the scenarios above, as long as the user holds multi-round conversations, it saves costs:
Anthropic currently supports at most 4 cache control breakpoints per request.
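Given that limit, a client has to choose which blocks to mark. A hypothetical heuristic sketch that spends the four breakpoints on the longest blocks (a real implementation would also account for Anthropic caching the entire prefix up to each marker):

```ts
type Block = {
  type: 'text';
  text: string;
  cache_control?: { type: 'ephemeral' };
};

const MAX_BREAKPOINTS = 4; // Anthropic's per-request cache_control limit

// Hypothetical helper: mark only the four longest blocks for caching.
function allocateBreakpoints(blocks: Block[]): Block[] {
  const chosen = new Set(
    [...blocks].sort((a, b) => b.text.length - a.text.length).slice(0, MAX_BREAKPOINTS),
  );
  return blocks.map(
    (b): Block => (chosen.has(b) ? { ...b, cache_control: { type: 'ephemeral' } } : b),
  );
}
```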
Need this too, especially with Anthropic, where token cost is high; a button to activate prompt caching could help greatly (especially since other UIs are already doing it)!
This issue is closed. If you have any questions, you can comment and reply.
🎉 This issue has been resolved in version 1.69.0 🎉
The release is available on:
Your semantic-release bot 📦🚀