Forks & Derivatives

hermes-skill-distillation

Generates agentic training trajectories from real-world tasks. Hackathon project for producing fine-tuning data at scale.

Why it matters

Profile

Generates agentic training trajectories from real-world tasks. Hackathon project for producing fine-tuning data at scale.

setup mediumintegration lowinterface cli
Provenance

Signals

Listed in the awesome-hermes-agent README

Sources: 2 / Surfaces: 1

Fast skim

What the upstream surface says

Short excerpt only, so you can decide whether to click out.

Demo concept: Use Hermes agent's real-world tool usage to generate high-quality agentic training trajectories for Hermes 4 fine-tuning.

Hermes agent already runs real tasks for real users. Every session is a potential training example. This project turns that latent signal into a closed learning loop:

Hermes agent runs tasks → trajectories captured → judge scores them → Atropos fine-tunes Hermes 4 → better model → better agent

Hermes Agent Hackathon — Skill Distillation PipelineThe IdeaWhat We BuiltQuickstartInstallGenerate SFT data (no training server needed)Run benchmark (evaluate before/after)Live RL training (connect to Atropos)
  • Runs a diverse task battery — 30 tasks across coding, web research, file ops, data analysis, sysadmin. Things users actually ask agents to do.
  • Scores trajectories automatically — multi-dimensional reward: task completion (via ToolContext verification), efficiency, error recovery.
  • Exports SFT-ready JSONL — drop-in for Atropos process mode.
  • Connects to Atropos for live RL — serve mode wires directly into GRPO training.
  • Before/after comparison — demo/compare_models.py shows Hermes 4-14B vanilla vs. fine-tuned on 500 trajectories.