improve llama.cpp skill

burtenshaw 2026-04-21 20:37:07 +02:00 committed by Teknium
parent ce98e1ef11
commit d6cf2cc058
4 changed files with 351 additions and 380 deletions


Production deployment of llama.cpp server with OpenAI-compatible API.

## Direct from Hugging Face Hub

Prefer the model repo's local-app page first:
```text
https://huggingface.co/<repo>?local-app=llama.cpp
```
If the page shows an exact snippet, copy it. If not, use one of these forms:
```bash
# Choose a quant label directly from the Hub repo
llama-server -hf bartowski/Llama-3.2-3B-Instruct-GGUF:Q8_0
```
```bash
# Pin an exact GGUF file from the repo tree
llama-server \
  --hf-repo microsoft/Phi-3-mini-4k-instruct-gguf \
  --hf-file Phi-3-mini-4k-instruct-q4.gguf \
  -c 4096
```
Use the file-specific form when the repo has custom naming, or when you have already extracted the exact filename from the tree API.

## Server Modes

### llama-server
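As a minimal sketch of that extraction step: the Hub's tree API (`https://huggingface.co/api/models/<repo>/tree/main`) returns a JSON array whose entries carry a `path` field, so the GGUF filenames can be filtered out of a fetched response with standard tools. The inline sample JSON below stands in for a real API response, and the `grep`/`cut` pipeline is just one illustrative way to do the filtering:

```shell
# Sample of the JSON shape returned by the Hub tree API
# (in practice: curl -s https://huggingface.co/api/models/<repo>/tree/main)
tree_json='[{"path":"Phi-3-mini-4k-instruct-q4.gguf"},{"path":"README.md"}]'

# Pull out only the .gguf paths; pass the result to --hf-file
echo "$tree_json" | grep -o '"path":"[^"]*\.gguf"' | cut -d'"' -f4
```

With `jq` available, `jq -r '.[].path | select(endswith(".gguf"))'` does the same job more robustly than the `grep`/`cut` pipeline.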