mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-04-26 01:01:40 +00:00
- Introduced new skills tools: `skills_categories`, `skills_list`, and `skill_view` in `model_tools.py`, allowing for better organization and access to skill-related functionalities. - Updated `toolsets.py` to include a new `skills` toolset, providing a dedicated space for skill tools. - Enhanced `batch_runner.py` to recognize and validate skills tools during batch processing. - Added comprehensive tool definitions for skills tools, ensuring compatibility with OpenAI's expected format. - Created new shell script `test_skills_kimi.sh` for testing skills tool functionality with Kimi K2.5. - Added example skill files demonstrating the structure and usage of skills within the Hermes-Agent framework, including `SKILL.md` for example and audiocraft skills. - Improved documentation for skills tools and their integration into the existing tool framework, ensuring clarity for future development and usage.
15 KiB
15 KiB
Comprehensive JSON Generation Guide
Complete guide to JSON generation with Outlines using Pydantic models and JSON schemas.
Table of Contents
- Pydantic Models
- JSON Schema Support
- Advanced Patterns
- Nested Structures
- Complex Types
- Validation
- Performance Optimization
Pydantic Models
Basic Models
from pydantic import BaseModel
import outlines
class User(BaseModel):
name: str
age: int
email: str
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, User)
user = generator("Generate user: Alice, 25, alice@example.com")
print(user.name) # "Alice"
print(user.age) # 25
print(user.email) # "alice@example.com"
Field Constraints
from pydantic import BaseModel, Field
class Product(BaseModel):
name: str = Field(min_length=1, max_length=100)
price: float = Field(gt=0, description="Price in USD")
discount: float = Field(ge=0, le=100, description="Discount percentage")
quantity: int = Field(ge=0, description="Available quantity")
sku: str = Field(pattern=r"^[A-Z]{3}-\d{6}$")
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, Product)
product = generator("Generate product: iPhone 15, $999")
# All fields guaranteed to meet constraints
Available Constraints:
min_length,max_length: String lengthgt,ge,lt,le: Numeric comparisonsmultiple_of: Number must be multiple of valuepattern: Regex pattern for stringsmin_items,max_items: List length
Optional Fields
from typing import Optional
class Article(BaseModel):
title: str # Required
author: Optional[str] = None # Optional
published_date: Optional[str] = None # Optional
tags: list[str] = [] # Default empty list
view_count: int = 0 # Default value
generator = outlines.generate.json(model, Article)
# Can generate even if optional fields missing
article = generator("Title: Introduction to AI")
print(article.author) # None (not provided)
print(article.tags) # [] (default)
Default Values
class Config(BaseModel):
debug: bool = False
max_retries: int = 3
timeout: float = 30.0
log_level: str = "INFO"
# Generator uses defaults when not specified
generator = outlines.generate.json(model, Config)
config = generator("Generate config with debug enabled")
print(config.debug) # True (from prompt)
print(config.timeout) # 30.0 (default)
Enums and Literals
Enum Fields
from enum import Enum
class Status(str, Enum):
PENDING = "pending"
APPROVED = "approved"
REJECTED = "rejected"
CANCELLED = "cancelled"
class Application(BaseModel):
applicant_name: str
status: Status # Must be one of enum values
submitted_date: str
generator = outlines.generate.json(model, Application)
app = generator("Generate application for John Doe")
print(app.status) # Status.PENDING (or one of the enum values)
print(type(app.status)) # <enum 'Status'>
Literal Types
from typing import Literal
class Task(BaseModel):
title: str
priority: Literal["low", "medium", "high", "critical"]
status: Literal["todo", "in_progress", "done"]
assigned_to: str
generator = outlines.generate.json(model, Task)
task = generator("Create high priority task: Fix bug")
print(task.priority) # One of: "low", "medium", "high", "critical"
Multiple Choice Fields
class Survey(BaseModel):
question: str
answer: Literal["strongly_disagree", "disagree", "neutral", "agree", "strongly_agree"]
confidence: Literal["low", "medium", "high"]
generator = outlines.generate.json(model, Survey)
survey = generator("Rate: 'I enjoy using this product'")
Nested Structures
Nested Models
class Address(BaseModel):
street: str
city: str
state: str
zip_code: str
country: str = "USA"
class Person(BaseModel):
name: str
age: int
email: str
address: Address # Nested model
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, Person)
prompt = """
Extract person:
Name: Alice Johnson
Age: 28
Email: alice@example.com
Address: 123 Main St, Boston, MA, 02101
"""
person = generator(prompt)
print(person.name) # "Alice Johnson"
print(person.address.city) # "Boston"
print(person.address.state) # "MA"
Deep Nesting
class Coordinates(BaseModel):
latitude: float
longitude: float
class Location(BaseModel):
name: str
coordinates: Coordinates
class Event(BaseModel):
title: str
date: str
location: Location
generator = outlines.generate.json(model, Event)
event = generator("Generate event: Tech Conference in San Francisco")
print(event.title) # "Tech Conference"
print(event.location.name) # "San Francisco"
print(event.location.coordinates.latitude) # 37.7749
Lists of Nested Models
class Item(BaseModel):
name: str
quantity: int
price: float
class Order(BaseModel):
order_id: str
customer: str
items: list[Item] # List of nested models
total: float
generator = outlines.generate.json(model, Order)
prompt = """
Generate order for John:
- 2x Widget ($10 each)
- 3x Gadget ($15 each)
Order ID: ORD-001
"""
order = generator(prompt)
print(f"Order ID: {order.order_id}")
for item in order.items:
print(f"- {item.quantity}x {item.name} @ ${item.price}")
print(f"Total: ${order.total}")
Complex Types
Union Types
from typing import Union
class TextContent(BaseModel):
type: Literal["text"]
content: str
class ImageContent(BaseModel):
type: Literal["image"]
url: str
caption: str
class Post(BaseModel):
title: str
content: Union[TextContent, ImageContent] # Either type
generator = outlines.generate.json(model, Post)
# Can generate either text or image content
post = generator("Generate blog post with image")
if post.content.type == "text":
print(post.content.content)
elif post.content.type == "image":
print(post.content.url)
Lists and Arrays
class Article(BaseModel):
title: str
authors: list[str] # List of strings
tags: list[str]
sections: list[dict[str, str]] # List of dicts
related_ids: list[int]
generator = outlines.generate.json(model, Article)
article = generator("Generate article about AI")
print(article.authors) # ["Alice", "Bob"]
print(article.tags) # ["AI", "Machine Learning", "Technology"]
Dictionaries
class Metadata(BaseModel):
title: str
properties: dict[str, str] # String keys and values
counts: dict[str, int] # String keys, int values
settings: dict[str, Union[str, int, bool]] # Mixed value types
generator = outlines.generate.json(model, Metadata)
meta = generator("Generate metadata")
print(meta.properties) # {"author": "Alice", "version": "1.0"}
print(meta.counts) # {"views": 1000, "likes": 50}
Any Type (Use Sparingly)
from typing import Any
class FlexibleData(BaseModel):
name: str
structured_field: str
flexible_field: Any # Can be anything
# Note: Any reduces type safety, use only when necessary
generator = outlines.generate.json(model, FlexibleData)
JSON Schema Support
Direct Schema Usage
import outlines
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
# Define JSON schema
schema = {
"type": "object",
"properties": {
"name": {"type": "string"},
"age": {"type": "integer", "minimum": 0, "maximum": 120},
"email": {"type": "string", "format": "email"}
},
"required": ["name", "age", "email"]
}
# Generate from schema
generator = outlines.generate.json(model, schema)
result = generator("Generate person: Alice, 25, alice@example.com")
print(result) # Valid JSON matching schema
Schema from Pydantic
class User(BaseModel):
name: str
age: int
email: str
# Get JSON schema from Pydantic model
schema = User.model_json_schema()
print(schema)
# {
# "type": "object",
# "properties": {
# "name": {"type": "string"},
# "age": {"type": "integer"},
# "email": {"type": "string"}
# },
# "required": ["name", "age", "email"]
# }
# Both approaches equivalent:
generator1 = outlines.generate.json(model, User)
generator2 = outlines.generate.json(model, schema)
Advanced Patterns
Conditional Fields
class Order(BaseModel):
order_type: Literal["standard", "express"]
delivery_date: str
express_fee: Optional[float] = None # Only for express orders
generator = outlines.generate.json(model, Order)
# Express order
order1 = generator("Create express order for tomorrow")
print(order1.express_fee) # 25.0
# Standard order
order2 = generator("Create standard order")
print(order2.express_fee) # None
Recursive Models
from typing import Optional, List
class TreeNode(BaseModel):
value: str
children: Optional[List['TreeNode']] = None
# Enable forward references
TreeNode.model_rebuild()
generator = outlines.generate.json(model, TreeNode)
tree = generator("Generate file tree with subdirectories")
print(tree.value) # "root"
print(tree.children[0].value) # "subdir1"
Model with Validation
from pydantic import field_validator
class DateRange(BaseModel):
start_date: str
end_date: str
@field_validator('end_date')
def end_after_start(cls, v, info):
"""Ensure end_date is after start_date."""
if 'start_date' in info.data:
from datetime import datetime
start = datetime.strptime(info.data['start_date'], '%Y-%m-%d')
end = datetime.strptime(v, '%Y-%m-%d')
if end < start:
raise ValueError('end_date must be after start_date')
return v
generator = outlines.generate.json(model, DateRange)
# Validation happens after generation
Multiple Objects
Generate List of Objects
class Person(BaseModel):
name: str
age: int
class Team(BaseModel):
team_name: str
members: list[Person]
generator = outlines.generate.json(model, Team)
team = generator("Generate engineering team with 5 members")
print(f"Team: {team.team_name}")
for member in team.members:
print(f"- {member.name}, {member.age}")
Batch Generation
def generate_batch(prompts: list[str], schema: type[BaseModel]):
"""Generate structured outputs for multiple prompts."""
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, schema)
results = []
for prompt in prompts:
result = generator(prompt)
results.append(result)
return results
class Product(BaseModel):
name: str
price: float
prompts = [
"Product: iPhone 15, $999",
"Product: MacBook Pro, $2499",
"Product: AirPods, $179"
]
products = generate_batch(prompts, Product)
for product in products:
print(f"{product.name}: ${product.price}")
Performance Optimization
Caching Generators
from functools import lru_cache
@lru_cache(maxsize=10)
def get_generator(model_name: str, schema_hash: int):
"""Cache generators for reuse."""
model = outlines.models.transformers(model_name)
return outlines.generate.json(model, schema)
# First call: creates generator
gen1 = get_generator("microsoft/Phi-3-mini-4k-instruct", hash(User))
# Second call: returns cached generator (fast!)
gen2 = get_generator("microsoft/Phi-3-mini-4k-instruct", hash(User))
Batch Processing
# Process multiple items efficiently
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, User)
texts = ["User: Alice, 25", "User: Bob, 30", "User: Carol, 35"]
# Reuse generator (model stays loaded)
users = [generator(text) for text in texts]
Minimize Schema Complexity
# ✅ Good: Simple, flat structure (faster)
class SimplePerson(BaseModel):
name: str
age: int
city: str
# ⚠️ Slower: Deep nesting
class ComplexPerson(BaseModel):
personal_info: PersonalInfo
address: Address
employment: Employment
# ... many nested levels
Error Handling
Handle Missing Fields
from pydantic import ValidationError
class User(BaseModel):
name: str
age: int
email: str
try:
user = generator("Generate user") # May not include all fields
except ValidationError as e:
print(f"Validation error: {e}")
# Handle gracefully
Fallback with Optional Fields
class RobustUser(BaseModel):
name: str # Required
age: Optional[int] = None # Optional
email: Optional[str] = None # Optional
# More likely to succeed even with incomplete data
user = generator("Generate user: Alice")
print(user.name) # "Alice"
print(user.age) # None (not provided)
Best Practices
1. Use Specific Types
# ✅ Good: Specific types
class Product(BaseModel):
name: str
price: float # Not Any or str
quantity: int # Not str
in_stock: bool # Not int
# ❌ Bad: Generic types
class Product(BaseModel):
name: Any
price: str # Should be float
quantity: str # Should be int
2. Add Descriptions
# ✅ Good: Clear descriptions
class Article(BaseModel):
title: str = Field(description="Article title, 10-100 characters")
content: str = Field(description="Main article content in paragraphs")
tags: list[str] = Field(description="List of relevant topic tags")
# Descriptions help the model understand expected output
3. Use Constraints
# ✅ Good: With constraints
class Age(BaseModel):
value: int = Field(ge=0, le=120, description="Age in years")
# ❌ Bad: No constraints
class Age(BaseModel):
value: int # Could be negative or > 120
4. Prefer Enums Over Strings
# ✅ Good: Enum for fixed set
class Priority(str, Enum):
LOW = "low"
MEDIUM = "medium"
HIGH = "high"
class Task(BaseModel):
priority: Priority # Guaranteed valid
# ❌ Bad: Free-form string
class Task(BaseModel):
priority: str # Could be "urgent", "ASAP", "!!", etc.
5. Test Your Models
# Test models work as expected
def test_product_model():
product = Product(
name="Test Product",
price=19.99,
quantity=10,
in_stock=True
)
assert product.price == 19.99
assert isinstance(product, Product)
# Run tests before using in production
Resources
- Pydantic Docs: https://docs.pydantic.dev
- JSON Schema: https://json-schema.org
- Outlines GitHub: https://github.com/outlines-dev/outlines