Add chat template support #917
base: main
Conversation
Reviewer's Guide by Sourcery

This PR enables the use of chat templates with ramalama run.

Sequence diagram for ensuring chat templates:

sequenceDiagram
participant ModelStore
participant GGUFInfoParser
participant LocalSnapshotFile
ModelStore->>ModelStore: new_snapshot(model_tag, snapshot_hash, snapshot_files)
ModelStore->>ModelStore: _ensure_chat_template(model_tag, snapshot_hash, snapshot_files)
alt ChatTemplate already in snapshot_files
ModelStore-->>ModelStore: return
else Model file exists
ModelStore->>GGUFInfoParser: is_model_gguf(model_file_path)
GGUFInfoParser-->>ModelStore: true
ModelStore->>GGUFInfoParser: parse(model_file_path)
GGUFInfoParser-->>ModelStore: GGUFModelInfo
ModelStore->>GGUFInfoParser: get_chat_template()
GGUFInfoParser-->>ModelStore: chat_template
alt chat_template not empty
ModelStore->>LocalSnapshotFile: Create LocalSnapshotFile(chat_template)
LocalSnapshotFile-->>ModelStore: LocalSnapshotFile
ModelStore->>ModelStore: update_snapshot(model_tag, snapshot_hash, [LocalSnapshotFile])
end
end
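For orientation, here is a minimal sketch of what the _ensure_chat_template step above could look like, using the names from the diagram (SnapshotFileType, GGUFInfoParser, LocalSnapshotFile, update_snapshot); the path-lookup helper is hypothetical, get_chat_template is shown on the parsed GGUFModelInfo, and this is an illustration rather than the PR's exact code:

def _ensure_chat_template(self, model_tag, snapshot_hash, snapshot_files):
    # Nothing to do if a chat template is already part of the snapshot
    if any(f.type == SnapshotFileType.ChatTemplate for f in snapshot_files):
        return

    # Pick the model file from the snapshot, if there is one
    model_file = next((f for f in snapshot_files if f.type == SnapshotFileType.Model), None)
    if model_file is None:
        return

    # Hypothetical helper to resolve the on-disk path of the model file
    model_file_path = self._snapshot_file_path(snapshot_hash, model_file.name)
    if not GGUFInfoParser.is_model_gguf(model_file_path):
        return

    # Parse the GGUF metadata and extract the embedded chat template, if any
    info = GGUFInfoParser.parse(model_file_path)
    chat_template = info.get_chat_template()
    if chat_template:
        # Store the extracted template as an additional snapshot file
        self.update_snapshot(model_tag, snapshot_hash, [LocalSnapshotFile(chat_template)])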
@ericcurtin Could we rebuild a new version of the ramalama container image?
@rhatdan has the build infrastructure set up to do it, so it would be less effort to wait for him for a new release. But if you build a container image locally, you can just pass that to RamaLama for development purposes.
ramalama/cli.py
Outdated
@@ -817,6 +819,9 @@ def run_cli(args):
        model.run(args)
    except Exception:
        raise e
    except Exception as ex:
        print(ex)
Debugging info?
If you run ramalama --debug, you should see tracebacks when they happen.
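For illustration, the usual shape of such a debug-gated handler is sketched below; the surrounding main() wrapper and flag handling are assumptions, not ramalama's actual code:

import sys
import traceback

def main(args):
    try:
        run_cli(args)
    except Exception as ex:
        if getattr(args, "debug", False):
            # With --debug, show the full traceback for diagnosis
            traceback.print_exc()
        else:
            # Without --debug, keep the error output short
            print(f"Error: {ex}", file=sys.stderr)
        sys.exit(1)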
Oops, forgot to remove that one. Removed now.
Force-pushed from 66ad4bf to d756093
ramalama/model_store.py
Outdated
ref_file.filenames.append(parts[0])
if parts[1] == RefFile.MODEL_SUFFIX:
    ref_file.model_name = parts[0]
elif parts[1] == RefFile.CHAT_TEMPLATE_SUFFIX:
Switch elif to if, no need for else.
Changed.
ramalama/model_store.py
Outdated
for file in snapshot_files:
    if file.type == SnapshotFileType.Model:
        ref_file.model_name = file.name
    elif file.type == SnapshotFileType.ChatTemplate:
Switch elif to if. No need for else.
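In other words, roughly the following; the chat_template_name attribute is a guess for illustration, since the elif body is cut off in the excerpt above:

for file in snapshot_files:
    # The two checks are independent, so plain ifs are enough;
    # a file cannot be both a model and a chat template.
    if file.type == SnapshotFileType.Model:
        ref_file.model_name = file.name
    if file.type == SnapshotFileType.ChatTemplate:
        ref_file.chat_template_name = file.name  # hypothetical attribute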
Changed.
if file.type == SnapshotFileType.ChatTemplate:
    return
if file.type == SnapshotFileType.Model:
    model_file = file
Should we break here? Can there be multiple SnapshotFileType.Model files? If yes, then we only see the last one.
The ModelStore would allow multiple models at the moment, and Hugging Face allows that as well (e.g. mradermacher/SmolLM-135M-GGUF). However, this would be invalid as input for ramalama. When the refs file is serialized, the current approach is to use the last seen model file; the same applies to the chat template file. So I think it's probably worth adding some kind of validation for this (i.e. only one model and one chat template in the list of files) and raising an Exception. WDYT?
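A minimal sketch of such a validation, assuming the SnapshotFileType enum from the excerpts above; the function name and error messages are illustrative only:

def _validate_snapshot_files(snapshot_files):
    # The refs file can only record one model and one chat template,
    # so reject snapshots that declare more than one of either.
    models = [f for f in snapshot_files if f.type == SnapshotFileType.Model]
    templates = [f for f in snapshot_files if f.type == SnapshotFileType.ChatTemplate]
    if len(models) > 1:
        raise ValueError("only one model file per snapshot is supported")
    if len(templates) > 1:
        raise ValueError("only one chat template file per snapshot is supported")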
We will do a release on Monday or Sunday.
So, when I run the following and access the OpenAI endpoint, I still get jinja errors on tool calls. This is foundational to jinja support, but doesn't enable jinja yet, right?

$ gh pr checkout 917
$ python3 -m venv .venv && source .venv/bin/activate && pip install -e . && python bin/ramalama serve qwen2.5:3b
PS: if I make this change:

--- a/ramalama/model.py
+++ b/ramalama/model.py
@@ -586,6 +586,7 @@ class Model(ModelBase):
         else:
             exec_args = [
                 "llama-server",
+                "--jinja",
                 "--port",
                 args.port,
                 "-m",

ggml-org/llama.cpp#12279 has details on failures in general with hf qwen2.5, and the error of semantic-kernel dotnet, which also applies here.
Force-pushed from d756093 to c1474e7
@codefromthecrypt By using

$ ramalama inspect qwen2.5:3b | grep chat_template

you basically get the jinja template required by that model. This PR is part of enabling ramalama to detect and use the chat templates provided by platforms such as Ollama, as well as to extract this information from .gguf models. However, this will take a bit longer. So I think it's best to use ramalama with your changes for the presentation.
Long-term we want llama-server to just default to jinja without any manual intervention and fall back to other techniques; this needs upstream llama.cpp work.
thanks for the advice folks! ps please applaud @ochafik for the work upstream on llama-cpp! ggml-org/llama.cpp#12279
This PR enables the (automatic) use of the chat template file via ramalama run by passing the chat template file to llama-run. The chat template can either be provided/downloaded directly or is extracted from the GGUF model and stored; preference is given to the provided chat template file.

TODO:
Summary by Sourcery

This PR adds support for chat templates to ramalama run. It enables the automatic use of chat template files by passing them to llama-run. The chat template can be provided directly, downloaded, or extracted from the GGUF model.