
Testing and CI

Understand which Origin checks run locally, which run in GitHub Actions, and which evals stay manual.

Qi-Xuan Lu · Updated Jun 1, 2026 · 5 min read

At a glance

Origin splits checks by cost and signal: fast local checks gate development; heavy evals stay manual.

PR CI must prove daemon correctness; retrieval-quality claims are validated separately, through the manual eval process.

Why checks are layered

Origin consists of a local daemon, a CLI, an MCP server, a core library, and a shared type crate. A single slow mega-check across all of them would slow down everyday contribution work.

The repo separates correctness checks from quality measurement. Tests and clippy gate normal changes; coverage and evals inform decisions without pretending to be cheap smoke tests.

Check layers

Local iteration     targeted cargo test / cargo check
Pre-commit          cargo fmt --all + clippy on changed crates
Pre-push            workspace clippy + workspace library tests
PR CI               fmt, lint, tests for daemon crates
Coverage            informational on PR, not a local gate
Manual eval         GPU/API-backed benchmarks, run on demand

Local verification

Use targeted crate tests while iterating, then run full formatting, clippy, and tests before opening or merging a PR.

Public contributions are expected to come with evidence. If a change affects behavior, include the smallest relevant test rather than relying on manual inspection.

Contributor checks

cargo fmt --check --all
cargo clippy --workspace --all-targets -- -D warnings
cargo test --workspace

# faster iteration examples
cargo test -p origin-core --lib
cargo test -p origin-server
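
As a rough illustration, the contributor checks above can be wrapped so that they run in order and stop at the first failure. This is only a sketch: `run_steps` is a hypothetical helper, not a script the repo is known to ship.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): run each command string in
# order and stop at the first one that fails.
run_steps() {
  local step
  for step in "$@"; do
    echo "==> $step"
    bash -c "$step" || { echo "FAILED: $step" >&2; return 1; }
  done
}

# Example invocation with the contributor checks listed above:
#   run_steps \
#     "cargo fmt --check --all" \
#     "cargo clippy --workspace --all-targets -- -D warnings" \
#     "cargo test --workspace"
```

Putting the formatting check first means a trivial failure surfaces before the slower clippy and test runs start.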

Git hooks

The repo includes hooks for routine local guardrails. Pre-commit handles formatting and changed-crate clippy. Pre-push runs workspace clippy plus library tests.

Hooks reduce CI churn, but they do not replace the final PR checks. Treat them as early feedback.

Hook setup

bash scripts/setup-hooks.sh

# once installed, the hooks run their checks automatically on commit and push
git commit
git push
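
The repo's hook scripts are not reproduced here. As a minimal sketch of what "changed-crate clippy" could look like, the helper below maps staged file paths to crate names. It assumes crates live under `crates/<name>/`, which this page does not confirm, and `changed_crates` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read file paths on stdin and print the unique crate
# names they belong to. ASSUMPTION: crates live under crates/<name>/.
changed_crates() {
  sed -n 's|^crates/\([^/][^/]*\)/.*|\1|p' | sort -u
}

# Possible use inside a pre-commit hook (sketch, not the repo's actual hook):
#   git diff --cached --name-only | changed_crates | while read -r crate; do
#     cargo clippy -p "$crate" --all-targets -- -D warnings
#   done
```

Paths outside the assumed `crates/` directory, such as a top-level README, produce no output, so a docs-only commit would skip clippy entirely under this sketch.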

CI and coverage

GitHub Actions runs the required PR gate: formatting, linting, and tests across the daemon workspace. Coverage runs separately as an informational signal.

Coverage is not a pre-push percentage gate. The project intentionally avoids local coverage gates that are slow, brittle, and not mirrored by the required CI lane.

Manual evals

LoCoMo, LongMemEval, KG faithfulness, page faithfulness, and API-backed judge runs have different cost and hardware requirements. Some run only as ignored tests or manual eval workflows.

Do not cite new retrieval or quality numbers from a casual run. Public benchmark claims should follow the eval docs and state the fixture, model, run count, and limits.
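
One way to keep a reported number tied to its context is to emit the fixture, model, and run count alongside the score. The sketch below is an illustration under stated assumptions, not the project's eval tooling: `summarize_runs` is a hypothetical helper, and the names and scores in the usage comment are placeholders, not real results.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print fixture, model, run count, and the mean/min/max
# of the per-run scores, so a number is never quoted without its context.
summarize_runs() {
  if [ "$#" -lt 3 ]; then
    echo "usage: summarize_runs <fixture> <model> <score>..." >&2
    return 2
  fi
  local fixture="$1" model="$2"
  shift 2
  printf '%s\n' "$@" | awk -v f="$fixture" -v m="$model" '
    { sum += $1
      if (NR == 1 || $1 < min) min = $1
      if (NR == 1 || $1 > max) max = $1 }
    END { printf "fixture=%s model=%s runs=%d mean=%.3f min=%.3f max=%.3f\n", f, m, NR, sum / NR, min, max }'
}

# Placeholder usage (made-up names and scores):
#   summarize_runs demo-fixture demo-model 0.50 0.70 0.60
```

Reporting min and max next to the mean makes run-to-run variance visible, which is part of stating a result's limits.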

Read evaluation

Before asking for review

A good PR says what changed, why it matters, and how it was tested. Include the commands that prove the change instead of saying it should work.

Docs-only changes still need a build. Code changes should include relevant tests or a clear explanation of why the behavior is covered elsewhere.

Read development conventions
