Sonnet 3.5 is using a lot of output tokens, hitting 4k output token limit #705
Comments
P.S. I note that the input token calculation reported by aider is wrong as well.
Thanks for trying aider and filing this issue. Others have reported similar issues recently. Your confirmation that Sonnet did indeed return 4k tokens is very helpful info.
Same with …
I can confirm this. I can't work with Sonnet 3.5 because it stops after a while and prints an error about the token limit being reached, even though the limit was not reached.
It is reaching the output limit. The token counts being shown aren't accurate because Anthropic hasn't released a tokenizer for these models. But Anthropic itself is returning the token limit error.
4k tokens is like ~hundreds of lines of code. Are you guys routinely asking for a single change that involves hundreds of lines of code? I understand that a refactor might tend in this direction. I can certainly concoct a situation to force this to happen. But I'd really love it if folks could share some example output from …
```
=======

Model claude-3-5-sonnet-20240620 has hit a token limit!
Input tokens: 9790 of 200000
For more info: https://aider.chat/docs/troubleshooting/token-limits.html
```

see file …
Thanks @Yona544, I really appreciate that. Unfortunately it appears that the model output doesn't get saved to …
I don't understand your surprise at hearing that we reach the 4k limit so often. If a project is not just a small script, I usually reach that limit every 3-4 exchanges with the AI. And yes, a single change usually needs to modify hundreds of lines of code. For example, yesterday I found a bug caused by a function that needed 4 parameters to work correctly but was only accepting 2 arguments (the error was to assume those 2 additional parameters were constants; they were not). So aider changed the function definition from 2 to 4 arguments. But (and this is the main point) it also had to change all the places in the code that were calling that function, ensuring that each time 4 variables were correctly instantiated and passed instead of only 2. Since that function was used very often in the program, the code changes were easily above 4k tokens, and probably much more than that. That scenario is a common occurrence when fixing bugs.

Anyway, I found a TEMPORARY WORKAROUND until this is fixed. I created a new custom configuration file called sonnet_cfg.json:

```json
{
"claude-3-5-sonnet-20240620": {
"max_tokens": 3072,
"max_input_tokens": 200000,
"max_output_tokens": 3072,
"input_cost_per_token": 0.000003,
"output_cost_per_token": 0.000015,
"litellm_provider": "anthropic",
"mode": "chat",
"supports_function_calling": true,
"supports_vision": true,
"tool_use_system_prompt_tokens": 159
},
"claude-3-haiku-20240307": {
"max_tokens": 4096,
"max_input_tokens": 200000,
"max_output_tokens": 4096,
"input_cost_per_token": 0.00000025,
"output_cost_per_token": 0.00000125,
"litellm_provider": "anthropic",
"mode": "chat",
"supports_function_calling": true,
"supports_vision": true,
"tool_use_system_prompt_tokens": 264
}
}
```

All you need to do is create the above file, and then launch Aider with this command: `aider --model-metadata-file sonnet_cfg.json --model claude-3-5-sonnet-20240620 --weak-model claude-3-haiku-20240307 --edit-format diff --map-tokens 1024 --restore-chat-history`
Note that you MUST use the `diff` edit format for this trick to be effective. It is a temporary solution, but it works for me. I never got the token limit error again. PS: You can remove `--restore-chat-history` from the command if you want to save tokens (it doesn't work anyway...)
Modifying the model metadata doesn't affect whether a token limit is hit. Aider doesn't use the `max_tokens` value from that metadata when making requests.

All the OpenAI models and Opus have the same 4k output token limit. But literally no one ever reported this output token limit issue until Sonnet launched. I have confirmed that Sonnet is really "chatty" with its SEARCH/REPLACE blocks. It often includes a ton of unchanging code in both the search and replace sections. For example, I made a code request to change 2 lines spread far apart in a large source file. Sonnet made a SEARCH/REPLACE block that included all the intervening lines!

I've updated the prompts to dissuade Sonnet from this behavior. The change is available in the main branch.
I would really appreciate feedback from anyone who is able to try this improved version.
Ok, but no matter what you say, my trick works. If I remove it and revert to the 4096 value, I get the token limit error again. Why?

IMO all Aider needs to do for my trick to work is to pass the custom model configuration to LiteLLM. And it does. And that is all we need to avoid the limit error. Because then LiteLLM will use the max_tokens value when calling the completion API of Anthropic, and that will result in fewer tokens sent back by Anthropic as a response. Maybe Sonnet will 'regulate' itself by answering with 3072 tokens instead of 4096, I don't know. That will be enough to prevent LiteLLM or Aider from (over)estimating (due to the wrong tokenizer, or whatever the issue behind this bug is) a response length greater than 4096.

Here is Aider passing the custom model metadata to LiteLLM with register_model(): … and here is LiteLLM's register_model() overriding the original max_tokens parameter: …
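A minimal sketch of what that registration path amounts to, assuming the sonnet_cfg.json file shown earlier is in the working directory:

```python
import json

import litellm

# Load the custom metadata file shown earlier in this thread.
with open("sonnet_cfg.json") as f:
    custom_models = json.load(f)

# register_model() merges these entries into LiteLLM's model-cost map,
# overriding fields such as max_tokens for the named models.
litellm.register_model(custom_models)

# The lowered output cap is now visible in LiteLLM's metadata.
print(litellm.model_cost["claude-3-5-sonnet-20240620"]["max_tokens"])  # 3072
```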
I will test it as soon as possible.
I'm using the new version and got …

Hope this is helpful in resolving the issue. In a case like this, I go back and ask Aider to break down the task and do one part at a time.
I also tested the new version (…)
This is the chat transcript: …

Hope this helps.
@Emasoft, when using the sonnet_cfg.json file as you described, I'm getting:

```
(aider) Y:\Projects\posexport>aider --model-metadata-file sonnet_cfg.json --model claude-3-5-sonnet-20240620 --weak-model claude-3-haiku-20240307 --edit-format diff --map-tokens 1024 --restore-chat-history
Loaded 1 litellm model file(s)
summarizer unexpectedly failed for claude-3-haiku-20240307
```
@Yona544 No idea. Works perfectly fine for me. Have you tried omitting …?
I just released v0.40.0 which has even stronger prompting to stop Sonnet from wasting so many tokens on unchanging chunks of code. If you have a chance to upgrade and try it, please let me know how it works for you.
I don't know if this is helpful, but although Anthropic has not released a tokenizer, they still give access to a token counting function. As per their chatbot:

```python
import anthropic

# count_tokens() takes a string and returns the token count as an int.
# (Note: this client-side counter is based on an older Claude tokenizer,
# so counts for the Claude 3 models are only approximate.)
client = anthropic.Anthropic()
total_tokens = client.count_tokens("Sample text")
print(total_tokens)
```

(I'm thinking this can be used to reverse engineer the tokenizer, so maybe look for that on GitHub.)
@paul-gauthier I tested 0.40.0 and this is the result: …
It seems that the issue was not resolved. Hope this helps.
I'm running 0.40.0 and got a token limit error today. I'd like to propose a different solution, which has worked for me in other cases (automated code) where I hit the output token limit: get Claude to "keep speaking". Send another request, with the last message being an "assistant"-role message containing the previous response. Claude will pick up where he left off. I use this technique to get complete XML blocks that are well over 4k tokens. Claude handles it very well. Here is a simple example: …

The docs discuss prefilling the assistant response as a supported technique, but for different use-cases: …

So, getting Claude to "keep speaking" would allow for code search/replace blocks well in excess of the 4k token limit.
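A minimal sketch of such a continuation loop using the Anthropic Python SDK (the model name and prompt are placeholders, and real code would also need to handle the API's rejection of prefill text that ends in whitespace):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

user_turn = {"role": "user", "content": "Rewrite every endpoint in this file ..."}
pieces = []

while True:
    messages = [user_turn]
    if pieces:
        # Prefill the assistant turn with everything received so far;
        # the model continues exactly where it stopped.
        messages.append({"role": "assistant", "content": "".join(pieces)})
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=4096,
        messages=messages,
    )
    pieces.append(response.content[0].text)
    if response.stop_reason != "max_tokens":
        break  # the model finished on its own

full_reply = "".join(pieces)
print(full_reply)
```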
I agree with this; it's what I recommended yesterday as well: #716
It may indeed be useful to add some "continue" functionality. But the root cause of the problem is that Sonnet has been wasting thousands of tokens by outputting tons of unchanging code. It's important to address that root cause first, before assessing the need for additional workarounds.
```
Model claude-3-5-sonnet-20240620 has hit a token limit!
Input tokens: ~24,884 of 200,000
```
I don't think that any prompt would make Sonnet output less code. Or at least not without negative consequences. Maybe the reason it's good at coding is the same reason why it outputs more code. Perhaps rewriting part of the code helps it be more accurate when predicting the next token. Token prediction is based on what precedes it, after all. Maybe the best solution would be to implement the "continue" trick.
I have to agree, when using Sonnet outside of aider it's a breath of fresh air when it returns the entire snippet of code instead of just giving you a diff.
If you can get Sonnet to be more terse, that's a good quick win. But the problem of exceeding the output token limit will always be there, even if it doesn't happen as frequently (e.g. I asked aider for a new file today, so there was no diffing to make it smaller). Getting Claude to continue speaking will solve that. If you can suggest which file you think the "loop & continue" logic belongs in, I can create a PR for it. (Python isn't my native language.)
I'm hitting the token limit because I'm trying to do too many things in one go, but usually the issue is that Sonnet tends to include too much context. Example I just hit: …
(The code is from the repo https://github.com/dandavison/delta.) I applied some natural-language coaxing as follows, and you can see it was successful: …
The main branch has experimental support for continuing Sonnet responses that hit the 4k output limit. It should happen seamlessly without any user intervention required. You can install the latest like this:
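Installing a project's main branch straight from GitHub is normally a pip-from-git invocation; for aider at the time, it would have looked roughly like this (exact command assumed, repository URL as of this thread):

```bash
python -m pip install --upgrade git+https://github.com/paul-gauthier/aider.git
```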
If you have a chance to try it, please let me know if it works better for you.
I asked Claude to add some swagger doc comments to all endpoints, lots of tokens; it worked a treat. The terminal output got "messed up" when Claude was first asked to keep speaking. I wasn't paying enough attention to describe the symptoms, but it was in the middle of a diff and the diff formatting disappeared when speaking resumed. The diffs applied cleanly though. (One of the diffs needed to be regenerated due to errors, but that worked normally.) So aside from some UI glitches, the "keep speaking" feature is working great for me. 👍 Thanks heaps for this feature, it's a real game changer. I can be more ambitious when asking Claude for larger features.
The main branch has fixes for the rendering glitches that have been happening when aider asks Sonnet to continue its response.
Tested 0.40.7. Works like a charm. It is a monster: it refactors files of any size like it is nothing. The continue trick with Sonnet is truly the holy grail. Aider beats GitHub Copilot and Cursor hands down. I'm going to cancel both subscriptions.
Much appreciated, giving it a try today!
This all went out in v0.41.0 today. I'm going to close this issue for now, but feel free to add a comment here and I will re-open or file a new issue any time.
Issue
Asking for a large-scale change with Sonnet, I see this output:

```
Model claude-3-5-sonnet-20240620 has hit a token limit!
Input tokens: 4902 of 200000
Output tokens: 3220 of 4096
Total tokens: 8122 of 200000
For more info: https://aider.chat/docs/troubleshooting/token-limits.html
```
None of these numbers is over the stated limit. However, here is what my Anthropic API console returns:

```
Jun 21, 2024 at 7:56 PM  claude-3-5-sonnet-20240620  39.10  5068  4096  sse  None
```
The "None" is in the "Error" column. The 4096 is tokens generated. So, it looks like we actually got 4096 tokens out of Anthropic, but either we're using the wrong tokenizer for aider, or ... ?
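One way to detect this reliably (a sketch, not necessarily what aider does) is to trust the usage and stop_reason fields returned by Anthropic's Messages API instead of any client-side tokenizer:

```python
import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=4096,
    messages=[{"role": "user", "content": "Apply the large-scale change ..."}],
)

# Exact server-side counts; no local tokenizer involved.
print(response.usage.input_tokens, response.usage.output_tokens)

# stop_reason == "max_tokens" is the authoritative signal that the
# 4096-token output cap was actually hit.
if response.stop_reason == "max_tokens":
    print("Response was truncated at the output token limit.")
```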
Version and model info
Aider v0.39.0, called with:

```
aider --sonnet
```