Profile
Generates agentic training trajectories from real-world tasks. Hackathon project for producing fine-tuning data at scale.
Signals
Listed in the awesome-hermes-agent README
Sources: 2 / Surfaces: 1
What the upstream surface says
Short excerpt only, so you can decide whether to click out.
Demo concept: Use Hermes agent's real-world tool usage to generate high-quality agentic training trajectories for Hermes 4 fine-tuning.
Hermes agent already runs real tasks for real users. Every session is a potential training example. This project turns that latent signal into a closed learning loop:
Hermes agent runs tasks → trajectories captured → judge scores them → Atropos fine-tunes Hermes 4 → better model → better agent
- Runs a diverse task battery — 30 tasks across coding, web research, file ops, data analysis, sysadmin. Things users actually ask agents to do.
- Scores trajectories automatically — multi-dimensional reward: task completion (via ToolContext verification), efficiency, error recovery.
- Exports SFT-ready JSONL — drop-in for Atropos process mode.
- Connects to Atropos for live RL — serve mode wires directly into GRPO training.
- Before/after comparison — demo/compare_models.py shows Hermes 4-14B vanilla vs. fine-tuned on 500 trajectories.