Umwelten
Habitats / evals / sessions
Outline-first technical modernism / research toolchain

Every model lives inside a different world.

Umwelten gives you agent environments that can remember, observe, test, and compare themselves — across models, providers, sessions, and interfaces.

Core primitive
Habitats
Persistent environments with persona, tools, memory, sessions, and sub-agents.
Evaluation mode
Evals
Run the same task across models, score results, and compare cost and timing.
Reflection layer
Sessions
Index prior work, surface learnings, and feed them back into future runs.
Why Umwelten

The point is not just a model. It is the world around the model.

Most AI surfaces flatten everything into a chat box. Umwelten treats an agent as a living runtime with identity, tools, memory, and measurable behavior.

01 / Persistent identity

A habitat is a place, not a wrapper.

Bundle persona, config, memory, skills, secrets, tools, and sub-agents into one directory that can move across interfaces without losing itself.

02 / Comparative intelligence

Run the same task everywhere.

Evaluate one prompt across multiple providers and local models, then compare output quality, cost, timing, and failure modes instead of guessing.
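A minimal TypeScript sketch of "compare instead of guessing" once per-model results exist. The `ModelResult` shape and the `compareResults` helper are hypothetical and are not Umwelten's API. Scoring is left to whatever judge you use, and failed runs are kept in their own list so failure modes stay visible.

```typescript
// Hypothetical result shape for one model's answer to one prompt.
// This is NOT Umwelten's actual output format, only an illustration.
interface ModelResult {
  model: string;     // e.g. "openrouter:openai/gpt-5.4-nano"
  score: number;     // quality score from your own judge; higher is better
  costUsd: number;   // total cost of the call in US dollars
  latencyMs: number; // wall-clock time of the call in milliseconds
  failed: boolean;   // true if the call errored or produced nothing usable
}

// Rank successful results: best score first, then cheaper, then faster.
// Failed runs are reported separately instead of being silently dropped.
function compareResults(results: ModelResult[]): {
  ranked: ModelResult[];
  failures: string[];
} {
  const ranked = results
    .filter((r) => !r.failed)
    .sort(
      (a, b) =>
        b.score - a.score ||
        a.costUsd - b.costUsd ||
        a.latencyMs - b.latencyMs,
    );
  const failures = results.filter((r) => r.failed).map((r) => r.model);
  return { ranked, failures };
}
```

Ordering by score, then cost, then latency is one reasonable tie-break policy; a latency-sensitive deployment would put `latencyMs` ahead of `costUsd`.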

03 / Reflective memory

Turn sessions into feedback.

Index Claude Code or Cursor history, search it semantically, and extract useful patterns that make future environments sharper.

04 / Interface portability

One environment, many doors.

Use the same agent runtime in the CLI, on Telegram, in Discord, or on the web without rebuilding the world each time.

Runtime map

From prompt surface to inspectable agent stack.

The architecture section leans diagrammatic rather than decorative: each layer exists so the agent can observe, remember, or compare itself better.

Layered system
Interfaces

CLI, Telegram, Discord, and web all connect to the same habitat instead of spawning isolated personalities.

Habitat core

Persona, loaded skills, tools, memories, sessions, and managed sub-agents live together in one inspectable environment.

Eval engine

Prompt suites, scoring, caching, resuming, leaderboards, and cost/timing metrics make model comparison routine.
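Caching and resuming can be pictured as one stored answer per (eval id, model, prompt): a rerun with the same id only calls models that have no stored answer yet. This is a hypothetical sketch, not Umwelten's implementation. `runWithCache` and `callModel` are illustrative names, the cache is an in-memory `Map`, and models run one after another for clarity rather than concurrently.

```typescript
// Hypothetical cache key: one eval id, one model, one prompt.
// JSON.stringify keeps the key unambiguous even if a part contains ":" or "|".
function runKey(evalId: string, model: string, prompt: string): string {
  return JSON.stringify([evalId, model, prompt]);
}

// Hypothetical runner: asks each model once, skipping pairs already cached.
// `callModel` stands in for a real provider call.
async function runWithCache(
  evalId: string,
  models: string[],
  prompt: string,
  cache: Map<string, string>,
  callModel: (model: string, prompt: string) => Promise<string>,
): Promise<Map<string, string>> {
  const outputs = new Map<string, string>();
  for (const model of models) {
    const key = runKey(evalId, model, prompt);
    let output = cache.get(key);
    if (output === undefined) {
      output = await callModel(model, prompt); // only uncached pairs hit the provider
      cache.set(key, output); // stored so a later run with the same id can resume
    }
    outputs.set(model, output);
  }
  return outputs;
}
```

Persisting the map to disk between runs is what would turn this in-memory cache into resumable evals.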

Session analysis

Claude Code JSONL and Cursor SQLite histories become searchable evidence instead of dead logs.
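A minimal sketch of treating a JSONL transcript as searchable evidence. Each line is parsed on its own, so one corrupt line does not hide the rest. The `SessionLine` shape is an assumption about the record layout, not a documented schema, and plain case-insensitive keyword matching stands in here for semantic search.

```typescript
// ASSUMED shape of one JSONL line; real session files carry more fields
// and the layout may differ between versions.
type Content = string | Array<{ type?: string; text?: string }>;

interface SessionLine {
  type?: string;
  message?: { role?: string; content?: Content };
}

interface Hit {
  lineNumber: number; // 1-based line in the JSONL text
  role: string;
  text: string;
}

// Pull plain text out of either a string or an array of content blocks.
function extractText(content: Content | undefined): string {
  if (content === undefined) return "";
  if (typeof content === "string") return content;
  return content
    .filter((block) => typeof block.text === "string")
    .map((block) => block.text as string)
    .join("\n");
}

// Case-insensitive keyword search (a stand-in for semantic search).
// Malformed lines are skipped rather than aborting the whole scan.
function searchSession(jsonl: string, query: string): Hit[] {
  const needle = query.toLowerCase();
  const hits: Hit[] = [];
  jsonl.split("\n").forEach((raw, i) => {
    if (raw.trim() === "") return;
    let parsed: SessionLine;
    try {
      parsed = JSON.parse(raw) as SessionLine;
    } catch {
      return; // not valid JSON: skip this line
    }
    const text = extractText(parsed.message?.content);
    if (text.toLowerCase().includes(needle)) {
      hits.push({
        lineNumber: i + 1,
        role: parsed.message?.role ?? parsed.type ?? "unknown",
        text,
      });
    }
  });
  return hits;
}
```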

Providers + locals

Hosted frontier models and local Ollama-style models can be tested side by side with the same evaluation contract.

Typical flow

How it behaves in practice

01
Create a habitat

Initialize an agent environment with the tools, sessions, memory, and persona you actually want to persist.

02
Run the same prompt across models

Use evals to expose differences in reasoning, instruction following, latency, and cost before choosing a default stack.

03
Analyze past sessions

Pull previous work into the loop so the environment can remember what worked, what failed, and where behavior drifts.

04
Reuse everywhere

Keep the environment itself stable while the surfaces around it change.

Install + examples

The commands should be real enough to copy and run immediately.

This section keeps the product page grounded in real operator behavior instead of abstract promise language.

Try it now

Three entry points: create a habitat, run a cross-model eval, and compare local and cloud models.

npx umwelten habitat

npx umwelten eval run \
  --prompt "Explain why the sky is blue in exactly three sentences" \
  --models "google:gemini-3-flash-preview,openrouter:openai/gpt-5.4-nano,openrouter:anthropic/claude-sonnet-4.6" \
  --id "sky-test" --concurrent

npx umwelten eval run \
  --prompt "Write a haiku about recursion" \
  --models "ollama:qwen3:30b-a3b,openrouter:openai/gpt-5.4" \
  --id "local-vs-cloud" --concurrent
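Each `--models` entry above reads as `provider:model`, and Ollama tags such as `qwen3:30b-a3b` carry a colon of their own. A hypothetical parser for that format therefore has to split on the first colon only. `parseModels` is an illustrative name, and the format rule is an assumption read off the examples, not Umwelten's own parsing code.

```typescript
// Hypothetical provider/model pair parsed from a --models value.
interface ModelSpec {
  provider: string; // text before the first colon, e.g. "ollama"
  model: string;    // everything after it, e.g. "qwen3:30b-a3b"
}

// Split a comma-separated --models value into provider/model pairs.
// Splits on the FIRST colon only, because Ollama tags contain colons too.
function parseModels(value: string): ModelSpec[] {
  return value
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0)
    .map((entry) => {
      const i = entry.indexOf(":");
      if (i <= 0 || i === entry.length - 1) {
        throw new Error(`Expected "provider:model", got "${entry}"`);
      }
      return { provider: entry.slice(0, i), model: entry.slice(i + 1) };
    });
}
```

Splitting on every colon instead would turn `ollama:qwen3:30b-a3b` into three pieces and lose the tag.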
Framing line

Not just another chat wrapper.

Build agents that can keep a world, inspect that world, and learn from what happened inside it.

Tone for this mockup: technical, warm, assertive, and a little poster-like.