mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-04-25 00:51:20 +00:00
improve llama.cpp skill
This commit is contained in:
parent
ce98e1ef11
commit
d6cf2cc058
4 changed files with 351 additions and 380 deletions
@@ -2,6 +2,31 @@
Production deployment of llama.cpp server with OpenAI-compatible API.
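Because the API surface is OpenAI-compatible, any OpenAI-style client or plain `curl` can talk to a running server. A minimal sketch, assuming a llama-server instance on its default port 8080 (the `model` value is a placeholder, since the server hosts a single model):

```shell
# Chat completion against a local llama-server (assumed running on the default port 8080)
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "local",
        "messages": [{"role": "user", "content": "Say hello in one word."}]
      }'
```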
## Direct from Hugging Face Hub
Prefer the model repo's local-app page first:
```text
https://huggingface.co/<repo>?local-app=llama.cpp
```
If the page shows an exact snippet, copy it. If not, use one of these forms:
```bash
# Choose a quant label directly from the Hub repo
llama-server -hf bartowski/Llama-3.2-3B-Instruct-GGUF:Q8_0
```
```bash
# Pin an exact GGUF file from the repo tree
llama-server \
--hf-repo microsoft/Phi-3-mini-4k-instruct-gguf \
--hf-file Phi-3-mini-4k-instruct-q4.gguf \
-c 4096
```
Use the file-specific form when the repo has custom naming or when you already extracted the exact filename from the tree API.
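When the exact filename is not obvious from the repo page, the Hub's tree API can list it. A sketch, assuming `jq` is available; the repo name here is just the example used above:

```shell
# List every GGUF file in a Hub repo via the tree API, to pick a --hf-file value
REPO="microsoft/Phi-3-mini-4k-instruct-gguf"   # example repo; substitute your own
curl -s "https://huggingface.co/api/models/${REPO}/tree/main" \
  | jq -r '.[].path | select(endswith(".gguf"))'
```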
## Server Modes
### llama-server