
Concepts

dgov is a local governor for AI coding work. It behaves more like a compiler than like an autonomous assistant: planning, ordering, retries, file claims, settlement, and merges are explicit state transitions over an event log. The worker can be probabilistic; the core lifecycle state machine is not.

Philosophy

Determinism over vibes

Planning, task ordering, retry policy, file claims, settlement, and merge decisions are explicit state transitions. Orchestration is inspectable even when the model output itself is not reproducible.

Isolation through git worktrees

Each task runs in its own git worktree rooted at a specific commit. This is the main safety boundary, not an implementation detail. Concurrent tasks do not share the same checkout, every attempt has a concrete filesystem snapshot, and rejected work does not merge to main through dgov.

Validation over trust

A worker is not trusted because it sounds confident. Worker candidates pass through scope checks, ruff auto-fix, and configured validation gates before merge. Those gates can include lint, type-check, targeted tests, sentrux comparison, and coverage when the corresponding project config and baselines are present. The sentrux baseline at .sentrux/baseline.json is governor-owned state; worker edits to .sentrux/baseline.json and .sentrux/dgov-baseline.json are rejected during review. The accepted sentrux baseline metadata is refreshed only by a clean, complete, full-plan run, and only after the post-run sentrux comparison passes.

Explicit claims over implicit intent

Worker tasks declare the files they will create, edit, delete, or read. Without file claims, parallel execution is guesswork. With claims, dgov rejects invalid plans at compile time and can explain why two tasks cannot run in parallel.
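
As an illustration of why declared claims make concurrency decidable, here is a minimal sketch of a claim-overlap check. The Claims class, its field names, and the rule that a writer conflicts with a reader of the same file are assumptions made for this example; they are not dgov's plan schema or its actual conflict rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claims:
    """Files one task declares (hypothetical shape, not dgov's schema)."""
    create: frozenset[str] = frozenset()
    edit: frozenset[str] = frozenset()
    delete: frozenset[str] = frozenset()
    read: frozenset[str] = frozenset()

    @property
    def writes(self) -> frozenset[str]:
        # Anything that changes the tree counts as a write.
        return self.create | self.edit | self.delete


def claim_conflicts(a: Claims, b: Claims) -> list[str]:
    """Return reasons two tasks cannot run in parallel (empty list = compatible)."""
    both_write = a.writes & b.writes
    # Assumed rule: a file written by one task and read by the other also conflicts.
    read_write = ((a.writes & b.read) | (b.writes & a.read)) - both_write
    reasons = [f"both tasks write {p}" for p in sorted(both_write)]
    reasons += [
        f"{p} is written by one task and read by the other"
        for p in sorted(read_write)
    ]
    return reasons
```

Because the check works on declared sets rather than on model output, it can run before any worker is dispatched, and the returned strings double as the explanation for the rejection.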

What dgov is

  • a DAG execution engine for AI coding tasks
  • a local git-worktree orchestrator
  • a settlement layer around model output
  • a tool for inspectable, event-backed automation in real repos

What dgov is not

  • a replacement for git
  • a remote task queue
  • a multi-provider abstraction layer
  • a chat frontend
  • a system that hides repo state from you

For "just do something smart in this checkout," simpler tools exist. dgov is for cases where structure and auditability matter.

Architecture

| Module | Role |
| --- | --- |
| kernel.py | Pure (state, event) → (new_state, actions); no I/O |
| runner.py | Async DAG executor that bridges kernel actions to real I/O |
| worker.py, workers/ | OpenAI-compatible worker subprocess and its tools |
| planner.py | Auto-plan generator that powers dgov plan create |
| researcher.py | Read-only research role driver |
| worktree.py | Snapshot isolation through git worktrees |
| settlement.py, settlement_flow.py | ruff auto-fix, lint, type-check, tests, sentrux, coverage, integration candidates |
| semantic_settlement.py | Deterministic Python semantic checks on integration candidates |
| tool_policy.py, tool_audit.py | Worker tool allow/deny policy and telemetry audit |
| policy_drift.py | Detects drift between canonical policy sources and packaged mirrors |
| plan.py, plan_tree.py, dag_parser.py | TOML plan parsing, tree walk, DAG compilation |
| plan_review.py | Post-hoc debrief that powers dgov plan review |
| sop_bundler.py, prompt_builder.py | SOP loading and final prompt assembly |
| bootstrap_policy.py, bootstrap_policy_data/ | Default SOPs and governor templates for dgov init |
| agent_skills.py, agent_skill_data/ | Shipped machine-agent skills for dgov agents sync |
| deploy_log.py | Append-only JSONL deploy history |
| archive.py | Plan archival on success |
| config.py | ProjectConfig and load_project_config() |
| persistence/ | SQLite event store, runtime artifact rows, slug history, ledger |
| cli/ | Click interface |

Kernel model

The kernel does not know about subprocesses, git, HTTP, or the OpenAI client. It takes events and current state, returns new state and outgoing actions, and nothing else. The runner does the messy part.

That separation buys:

  • easier unit testing of lifecycle transitions
  • crash recovery through event replay
  • explicit failure boundaries
  • fewer hidden side effects in the kernel path
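
The (state, event) → (new_state, actions) shape can be sketched as a small pure function. Everything named below (Status, TaskState, the event strings, the action strings, and the max_attempts default) is hypothetical and chosen only to show the pattern; the real kernel's states and events are not documented on this page.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    RUNNING = "running"
    MERGED = "merged"
    FAILED = "failed"


@dataclass(frozen=True)
class TaskState:
    status: Status = Status.PENDING
    attempts: int = 0


def step(state: TaskState, event: str, max_attempts: int = 2) -> tuple[TaskState, list[str]]:
    """Pure transition: no I/O, just (state, event) -> (new_state, actions)."""
    if event == "dispatched" and state.status is Status.PENDING:
        new = replace(state, status=Status.RUNNING, attempts=state.attempts + 1)
        return new, ["start_worker"]
    if event == "settled" and state.status is Status.RUNNING:
        return replace(state, status=Status.MERGED), ["merge"]
    if event == "rejected" and state.status is Status.RUNNING:
        if state.attempts < max_attempts:
            return replace(state, status=Status.PENDING), ["schedule_retry"]
        return replace(state, status=Status.FAILED), ["cleanup_worktree"]
    # Events that are not legal in the current state change nothing.
    return state, []


def replay(events: list[str]) -> TaskState:
    """Rebuild state from the event log alone, discarding the actions."""
    state = TaskState()
    for event in events:
        state, _ = step(state, event)
    return state
```

Since step never touches the outside world, unit tests can drive it with plain event lists, and crash recovery reduces to calling replay on whatever the event log holds.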

Authorities

Three stores carry different jobs:

  • Event log — lifecycle authority. Dispatch, retry, status, watch, and live cleanup decisions derive from events.
  • Deploy log — landed-output authority. The deploy log records what actually made it onto main. Its append order is the canonical order for selecting the newest upstream base ref for dependent worktrees, avoiding timestamp-tie ambiguity.
  • Runtime artifact rows — bookkeeping. These rows cache worktree paths, branch names, and similar operational crumbs for debugging and cleanup. They do not define lifecycle truth.

If a row in runtime_artifacts disagrees with the event log, the event log wins.

Snapshot isolation

Every dispatched task gets its own branch and worktree rooted at a specific commit. Root tasks use HEAD, while dependent tasks base from the latest upstream deploy record by deploy-log append order. That gives dgov a concrete notion of "attempt." The bootstrap path is careful because branching requires a real snapshot — dgov does not need GitHub, but it does need local git state.
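
A minimal sketch of base selection by append order, assuming the deploy log is read as JSONL lines. The record fields "task" and "commit" are hypothetical, and falling back to HEAD when no upstream record exists is an assumption of this example rather than documented behavior.

```python
import json
from collections.abc import Iterable


def latest_upstream_ref(
    deploy_log_lines: Iterable[str],
    upstream_tasks: set[str],
    fallback: str = "HEAD",
) -> str:
    """Pick the base ref from the last-appended record of any upstream task."""
    base = fallback
    for line in deploy_log_lines:
        if not line.strip():
            continue  # tolerate blank lines in the log
        record = json.loads(line)
        if record["task"] in upstream_tasks:
            # Later lines overwrite earlier ones, so file position decides,
            # not any timestamp stored in the record.
            base = record["commit"]
    return base
```

Using position in the append-only file as the ordering key means two records with identical timestamps still have an unambiguous winner.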

Settlement

The worker implements. Settlement judges. The sequence:

  1. fast review checks (scope, claim integrity, transient tool activity)
  2. mechanical cleanup (ruff auto-fix, format)
  3. isolated validation gates (lint/format plus configured type-check, tests, sentrux, coverage)
  4. shadow integration candidate creation
  5. deterministic Python semantic checks on the candidate
  6. integrated-candidate validation with the same settlement gates
  7. merge or reject

Collapsing these roles produces a system that is harder to reason about and easier to fool. Keeping them split lets the kernel record exactly which gate rejected a candidate.
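
The "stop at the first failing gate and record which one" idea can be sketched as an ordered list of named checks. GateResult, settle, and the two example gates are hypothetical names for illustration; the candidate is modeled as a plain dict, which is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class GateResult:
    passed: bool
    detail: str = ""


Gate = Callable[[dict], GateResult]


def settle(candidate: dict, gates: list[tuple[str, Gate]]) -> tuple[str, Optional[str], str]:
    """Run gates in order; return (verdict, rejecting_gate, detail)."""
    for name, gate in gates:
        result = gate(candidate)
        if not result.passed:
            return ("reject", name, result.detail)
    return ("merge", None, "")


def scope_gate(candidate: dict) -> GateResult:
    # Fails when the worker changed files it never claimed.
    extra = sorted(set(candidate["changed"]) - set(candidate["claimed"]))
    return GateResult(not extra, f"out-of-scope edits: {extra}" if extra else "")


def nonempty_diff_gate(candidate: dict) -> GateResult:
    # Fails when the worker produced no changes at all.
    return GateResult(bool(candidate["changed"]), "empty diff")
```

The verdict carries the gate name, so a rejection is attributable to one specific check instead of a generic failure.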

Gate map

| Gate | When it runs | What it sees | Mutates? | Failure shape |
| --- | --- | --- | --- | --- |
| Structural review | Before worker commit | Worker git status, file claims, tool activity log | No | scope, reserved path, empty diff, review hook |
| Autofix | Before worker commit | Worker-changed Python files | Yes | autofix command failure |
| Isolated validation | After worker commit | Worker branch alone | No | lint/format failure; configured type, test, coverage, or sentrux failure |
| Integration candidate | After isolated validation | Task commit replayed onto target HEAD | Temporary workspace only | text conflict |
| Python semantic gate | On integration candidate | Candidate Python files plus base/task/target symbol tables | No | same-symbol edit, duplicate definition, signature drift, syntax conflict |
| Candidate validation | After semantic gate | Integrated candidate | No | behavioral mismatch |
| Final merge | After all gates pass | Target worktree and task branch | Yes | git merge failure |

Semantic settlement

A clean git merge is necessary but not sufficient. The semantic layer is deterministic and Python-scoped: it checks the integrated candidate with AST and symbol-table evidence, not an LLM verdict. It catches a subset of Python-level integration conflicts where valid isolated task commits combine poorly.

Failure taxonomy

When semantic settlement rejects a candidate, it classifies the failure:

| Failure Class | Meaning |
| --- | --- |
| TEXT_CONFLICT | Git cannot replay the task commit cleanly on target HEAD |
| SYNTAX_CONFLICT | The integrated file no longer parses |
| SAME_SYMBOL_EDIT | Both sides changed the same Python symbol |
| DUPLICATE_DEFINITION | The integrated code defines the same symbol in multiple files |
| SIGNATURE_DRIFT | A public callable changed its signature relative to base or target |
| BEHAVIORAL_MISMATCH | Parse-level checks pass but settlement gates fail |

The class drives remediation: SAME_SYMBOL_EDIT points at coordination, SIGNATURE_DRIFT points at a stale task base. ORDERING_CONFLICT exists in the taxonomy for future gates but is not emitted by the current Python semantic gate.
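
To make "AST and symbol-table evidence" concrete, here is a rough sketch that detects two of the classes above using only the standard-library ast module. It is not dgov's semantic gate: it looks only at top-level functions, compares a single base file against a single integrated file, and the function names are invented for this example.

```python
import ast


def public_signatures(source: str) -> dict[str, str]:
    """Map each public top-level function name to a dump of its arguments."""
    tree = ast.parse(source)  # raises SyntaxError if the file does not parse
    signatures = {}
    for node in tree.body:
        is_function = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_function and not node.name.startswith("_"):
            # ast.dump gives a stable textual form of the argument list.
            signatures[node.name] = ast.dump(node.args)
    return signatures


def classify(base_source: str, integrated_source: str) -> list[tuple[str, str]]:
    """Return (failure_class, symbol) pairs found in the integrated file."""
    try:
        new = public_signatures(integrated_source)
    except SyntaxError:
        return [("SYNTAX_CONFLICT", "<module>")]
    old = public_signatures(base_source)
    return [
        ("SIGNATURE_DRIFT", name)
        for name in sorted(old)
        if name in new and new[name] != old[name]
    ]
```

The same input always yields the same classification, which is the property that separates this kind of check from an LLM verdict.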

Risk levels

Before attempting integration, dgov scores risk:

| Level | Meaning | Action |
| --- | --- | --- |
| NONE / LOW | No or minimal detected risk | Continue to integration candidate |
| MEDIUM | Elevated detected risk | Continue to integration candidate |
| HIGH | Significant detected risk | Continue to integration candidate |
| CRITICAL | Near-certain conflict signal | Reject before candidate creation |

Risk scoring currently considers deterministic Python overlap evidence collected from the task commit, task base, and target HEAD. Non-critical risk is telemetry; CRITICAL risk rejects before candidate creation.
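
The table reduces to a single branch: only CRITICAL changes control flow. A minimal sketch follows, where the Risk enum and the returned action strings are hypothetical names and no scoring logic is implied.

```python
from enum import Enum


class Risk(Enum):
    NONE = "none"
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


def next_step(risk: Risk) -> str:
    """Non-critical levels are telemetry only; CRITICAL rejects early."""
    if risk is Risk.CRITICAL:
        return "reject_before_candidate"
    return "continue_to_integration_candidate"
```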

Symbol overlap as evidence

When two tasks touch the same code entity, dgov captures the overlap as structured evidence: symbol name, type, file path, and the line ranges each side modified. Instead of "merge failed," the system reports: "Task A and Task B both edited process_order() in checkout.py — Task A at lines 45–52, Task B at lines 48–55."
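
One possible shape for that structured evidence is a small record with the fields listed above. The SymbolOverlap class and its method names are assumptions for illustration, not dgov's actual data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SymbolOverlap:
    symbol: str
    kind: str                    # e.g. "function" or "class"
    path: str
    a_lines: tuple[int, int]     # inclusive range edited by task A
    b_lines: tuple[int, int]     # inclusive range edited by task B

    def shared_lines(self) -> Optional[tuple[int, int]]:
        """Inclusive line range both tasks touched, or None if disjoint."""
        start = max(self.a_lines[0], self.b_lines[0])
        end = min(self.a_lines[1], self.b_lines[1])
        return (start, end) if start <= end else None

    def describe(self) -> str:
        return (
            f"Task A and Task B both edited {self.symbol}() in {self.path}: "
            f"Task A at lines {self.a_lines[0]}-{self.a_lines[1]}, "
            f"Task B at lines {self.b_lines[0]}-{self.b_lines[1]}"
        )
```

With the example from the text, the two ranges 45 to 52 and 48 to 55 share lines 48 through 52.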

Integration candidates

The semantic layer builds ephemeral integration candidates before the real merge: a temporary workspace rooted at target HEAD, the task commit replayed onto it, and the settlement pipeline run against the result. Worker commits that pass this shadow integration proceed to the recorded merge.
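
With plain git, a shadow candidate of this kind could be approximated as follows. This is a sketch under assumptions: using git cherry-pick for the replay step is a guess at one way to do it, the helper names are invented, and paths are expected to be absolute so that -C behaves predictably.

```python
import subprocess


def candidate_commands(repo: str, workspace: str, target: str, task_commit: str) -> list[list[str]]:
    """git invocations that build a throwaway candidate workspace."""
    return [
        # Detached worktree rooted at the target ref; no branch is created.
        ["git", "-C", repo, "worktree", "add", "--detach", workspace, target],
        # Replay the task commit on top of it (assumed replay mechanism).
        ["git", "-C", workspace, "cherry-pick", task_commit],
    ]


def cleanup_command(repo: str, workspace: str) -> list[str]:
    """git invocation that discards the candidate workspace."""
    return ["git", "-C", repo, "worktree", "remove", "--force", workspace]


def build_candidate(repo: str, workspace: str, target: str, task_commit: str) -> bool:
    """Return True if every step succeeded; a failed replay is a text conflict."""
    for argv in candidate_commands(repo, workspace, target, task_commit):
        if subprocess.run(argv, capture_output=True).returncode != 0:
            return False
    return True
```

The caller would run the settlement gates inside the workspace and then run cleanup_command regardless of the outcome, so the real target worktree is never touched by a candidate.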

Plans as control surfaces

Plans are not just prompts in a file. They are the primary control surface for execution:

  • dependencies define legal order
  • file claims define legal concurrency
  • prompts define local intent
  • commit messages define merge outcomes

dgov compile exists to turn those inputs into a single normalized artifact: it converts a human-edited plan tree into something the runner can execute without reinterpreting structure on the fly.
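
As a sketch of what "dependencies define legal order" means at compile time, the standard-library graphlib module can validate references and produce an execution order. The compile_order function and the dict-of-dependencies input are assumptions for this example; the actual plan format is TOML and its schema is not shown here.

```python
from graphlib import CycleError, TopologicalSorter


def compile_order(tasks: dict[str, list[str]]) -> list[str]:
    """Validate dependency references and return one legal execution order.

    `tasks` maps each task name to the names it depends on.
    """
    for name, deps in tasks.items():
        unknown = [d for d in deps if d not in tasks]
        if unknown:
            # Invalid references are rejected before anything runs.
            raise ValueError(f"task {name!r} depends on unknown task(s): {unknown}")
    try:
        # static_order yields dependencies before their dependents.
        return list(TopologicalSorter(tasks).static_order())
    except CycleError as exc:
        raise ValueError(f"dependency cycle: {exc.args[1]}") from exc
```

Both failure modes surface as errors before any worker is dispatched, which matches the idea that invalid plans stop at compile time.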

Why git worktrees

Git worktrees are the execution primitive because they give dgov three distinct properties: cheap branch-isolated sandboxes, ordinary git merge semantics, and native inspectability with standard git tools. You debug a broken task with git log, git diff, and git show, not with an internal state format.

Tradeoffs

dgov makes some tradeoffs on purpose.

Stronger setup requirements. You need a real local repo, a working toolchain, and an OpenAI-compatible endpoint. That is more than a browser chat box. The payoff is that the system operates on your real repo with real validation.

More structure up front. Plans, file claims, and a compile step add ceremony. The ceremony buys explicit concurrency, event-derived execution state, and fewer silent footguns.

Narrower provider model. dgov supports OpenAI-compatible endpoints rather than every provider-native API shape. Deliberate. It keeps the runtime surface small while still covering Fireworks, OpenAI, OpenRouter, and similar APIs.

Failure model

dgov treats these as hard failures:

  • invalid plan references stop at compile time
  • a missing API key fails before dispatch
  • out-of-scope edits fail review
  • configured test failures fail settlement
  • rejected work does not merge

Post-run sentrux degradation is reported separately as a degraded run status so the operator can remediate landed work. Degraded, partial, failed, or --only runs do not refresh the accepted baseline.

Mental model

A deterministic kernel around a probabilistic worker.