How verification works
OGONG rests on one fact: checking an answer is far cheaper than producing it. Call the ratio ρ (rho): the cost of verifying divided by the cost of generating. On datacenter GPUs ρ ≈ 1% (about 100x cheaper); even on Apple Silicon, the weakest targeted backend, it’s ≈ 5% (about 20x cheaper).
That number is the whole game. If verifying is cheap, the network can check nearly every answer, and once almost everything is checked, you no longer need providers to post a big slashable deposit to keep them honest. Cheap full-coverage checking replaces the bond.
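The cost argument fits in a couple of lines. This is a minimal sketch that treats ρ as one constant per reply, which simplifies things: the real ratio depends on sequence length, batching, and hardware. The two ρ values come from the text above, and the function names are illustrative. Auditing a fraction α of replies adds α·ρ of the generation compute, so full coverage (α = 1) costs roughly 1% extra on datacenter GPUs and roughly 5% on Apple Silicon.

```python
def verification_speedup(rho: float) -> float:
    """How many times cheaper verifying is than generating (1 / rho)."""
    return 1.0 / rho


def audit_overhead(alpha: float, rho: float) -> float:
    """Extra compute spent on audits, as a fraction of generation compute.

    alpha: fraction of replies audited (coverage rate)
    rho:   verify cost / generate cost for one reply
    """
    return alpha * rho


# Full coverage (alpha = 1) on the two backends quoted in the text.
for backend, rho in [("datacenter GPU", 0.01), ("Apple Silicon", 0.05)]:
    print(f"{backend}: {verification_speedup(rho):.0f}x cheaper, "
          f"full-coverage overhead {audit_overhead(1.0, rho):.0%}")
```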
The reason verification is this cheap is that the checker never re-generates the answer. It teacher-forces a single forward pass over the prompt plus the claimed output and reads the model’s per-token hidden states and log-probabilities off that one pass. Generation is autoregressive: one slow, sequential step per token. A teacher-forced prefill processes the whole sequence in one batched pass, and that is where the ~100x comes from.
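A toy causal model shows why one pass is enough. Everything here is a hypothetical stand-in: `forward`, `generate`, `verify`, and the 2-d embeddings are invented for illustration and are not OGONG's engine. Generation makes one forward call per new token and records the last hidden state each time. The checker makes a single teacher-forced call over prompt plus claimed output and reads every recorded state back off it. The toy compares with exact equality because both sides run identical arithmetic. A real verifier compares against a tolerance, since runs on different hardware drift slightly.

```python
from typing import List, Tuple

# Hypothetical toy model: one 2-d embedding per token id. The hidden state at
# position i is the running mean of embeddings 0..i, so it depends only on the prefix (causal).
EMBED = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0], 3: [-1.0, 0.5]}


def forward(tokens: List[int]) -> List[List[float]]:
    """One forward pass: returns the hidden state at every position."""
    hidden: List[List[float]] = []
    running = [0.0, 0.0]
    for i, tok in enumerate(tokens):
        running = [r + e for r, e in zip(running, EMBED[tok])]
        hidden.append([r / (i + 1) for r in running])
    return hidden


def next_token(h: List[float]) -> int:
    """Greedy 'LM head': the token whose embedding scores highest against h."""
    return max(EMBED, key=lambda t: sum(a * b for a, b in zip(h, EMBED[t])))


def generate(prompt: List[int], n: int) -> Tuple[List[int], List[List[float]], int]:
    """Autoregressive decode: n sequential forward calls, one per new token.
    Records the last hidden state at each step (what a provider would commit to)."""
    tokens, recorded, calls = list(prompt), [], 0
    for _ in range(n):
        h = forward(tokens)[-1]
        calls += 1
        recorded.append(h)
        tokens.append(next_token(h))
    return tokens[len(prompt):], recorded, calls


def verify(prompt: List[int], output: List[int], recorded: List[List[float]]) -> bool:
    """Teacher-forced check: ONE forward call over prompt + claimed output."""
    states = forward(prompt + output)
    start = len(prompt) - 1  # the state at position p predicts token p + 1
    return states[start:start + len(output)] == recorded


prompt = [0, 1, 3]
output, recorded, calls = generate(prompt, n=6)
print(calls, verify(prompt, output, recorded))
```

The decode loop makes `n` model calls, and the check makes one no matter how long the output is. Changing any claimed token shifts every later hidden state, so the comparison fails.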
The commitment: proving what you did
When a provider answers a request, it generates in fixed windows of 32 tokens and emits a
small leaf per window. The leaves form a Merkle tree whose root is the commit_root.
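The windowing and the root can be sketched with the standard library. Several details are assumptions rather than OGONG's format: leaves are SHA-256 over a 0x00 prefix, the window index, and the window's payload; interior nodes use a 0x01 prefix (domain separation in the style of RFC 6962); and an unpaired node is promoted unchanged. The payload here is a placeholder built from token ids. In the real protocol it would be the serialized fingerprints described next.

```python
import hashlib
from typing import List, Sequence

WINDOW = 32  # tokens per committed window


def windows(tokens: Sequence[int], size: int = WINDOW) -> List[Sequence[int]]:
    """Split generated tokens into fixed windows; the last one may be short."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def leaf_hash(index: int, payload: bytes) -> bytes:
    """Leaf = SHA-256(0x00 || window index || payload); the 0x00 prefix keeps
    leaves from being confused with interior nodes."""
    return hashlib.sha256(b"\x00" + index.to_bytes(4, "big") + payload).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_root(leaves: List[bytes]) -> bytes:
    """Hash pairs level by level; an unpaired last node is promoted unchanged."""
    if not leaves:
        raise ValueError("nothing to commit")
    level = list(leaves)
    while len(level) > 1:
        nxt = [node_hash(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]


tokens = list(range(70))                        # 70 tokens -> windows of 32, 32, 6
payloads = [bytes(w) for w in windows(tokens)]  # placeholder for real fingerprint bytes
commit_root = merkle_root([leaf_hash(i, p) for i, p in enumerate(payloads)])
```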
Each leaf binds two complementary fingerprints of how the output was produced:
- Hidden-state sketch. A commitment over the model’s per-token last hidden states (the input to the language-model head), captured as a sign-random-projection (SRP) sketch: a fixed, public bank of random ±1 directions, identical for provider and verifier. Every direction mixes all coordinates, so the comparison is well-conditioned and the checked subspace is not the provider’s to choose; a substitute model cannot hide in a hand-picked corner of the activation space. This is the sharper of the two checks.
- Logprob digest. The top-k log-probabilities at every decode position in the window. Committing all positions is strictly stronger than sampling a few.
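Both fingerprints can be sketched in a few functions, though the details below are assumptions. The public bank of ±1 directions is expanded from SHAKE-256 over a public seed, so any implementation in any language rebuilds the identical matrix. Classic sign-random-projection keeps only the signs of the projections. This sketch keeps the projected values instead, because the audit compares sketches by relative L2. The 1/√n scale keeps the squared norm unbiased. The seed, dimensions, and function names are hypothetical.

```python
import hashlib
import math
from typing import List, Tuple


def srp_bank(dim: int, n_directions: int, seed: bytes) -> List[List[int]]:
    """Public bank of +/-1 directions. Row r is expanded from SHAKE-256(seed || r),
    so provider and verifier rebuild the identical matrix from the public seed."""
    bank = []
    for r in range(n_directions):
        stream = hashlib.shake_256(seed + r.to_bytes(4, "big")).digest((dim + 7) // 8)
        bank.append([1 if (stream[j // 8] >> (j % 8)) & 1 else -1 for j in range(dim)])
    return bank


def srp_sketch(hidden: List[float], bank: List[List[int]]) -> List[float]:
    """Project one hidden state onto every direction. Each direction mixes all
    coordinates; the 1/sqrt(n) scale keeps the squared norm unbiased."""
    scale = 1.0 / math.sqrt(len(bank))
    return [scale * sum(s * h for s, h in zip(row, hidden)) for row in bank]


def topk_digest(logprobs: List[float], k: int) -> List[Tuple[int, float]]:
    """Top-k (token_id, logprob) pairs at one decode position, most likely first."""
    order = sorted(range(len(logprobs)), key=lambda t: logprobs[t], reverse=True)
    return [(t, logprobs[t]) for t in order[:k]]


bank = srp_bank(dim=64, n_directions=16, seed=b"public-srp-seed")
sketch = srp_sketch([0.01 * i for i in range(64)], bank)
```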
The provider signs a per-reply record. The signed payload binds the request, the response, and the model identity together:
record = ( reply_id, req_hash, resp_hash, model_root, commit_root, n_tokens, t0, t1 )
sig = Ed25519(record) ‖ ML-DSA-44(record)
The signature is hybrid post-quantum: a classical Ed25519 signature and an ML-DSA-44
(lattice) signature over the same record, so a forger would have to break both schemes, and the record stays trustworthy even if one of them is later broken.
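Here is a minimal sketch of encoding the record and checking the concatenated signature. The byte layout is hypothetical: five 32-byte fields, then `n_tokens` as u32 and `t0`/`t1` as u64, big-endian. The two verifier callables are placeholders for wrappers around real Ed25519 and ML-DSA-44 libraries. The signature lengths are fixed by the schemes: 64 bytes for Ed25519 and 2420 bytes for ML-DSA-44 (FIPS 204). Hybrid verification here is an AND, so the record is rejected unless both halves verify.

```python
import struct
from dataclasses import dataclass
from typing import Callable

ED25519_SIG_LEN = 64    # Ed25519 signatures are always 64 bytes
MLDSA44_SIG_LEN = 2420  # ML-DSA-44 signatures are 2420 bytes (FIPS 204)

HASH_FIELDS = ("reply_id", "req_hash", "resp_hash", "model_root", "commit_root")


@dataclass(frozen=True)
class Record:
    reply_id: bytes      # assumed 32 bytes
    req_hash: bytes      # assumed SHA-256 digests (32 bytes) from here down
    resp_hash: bytes
    model_root: bytes
    commit_root: bytes
    n_tokens: int
    t0: int              # assumed: unix milliseconds
    t1: int

    def __post_init__(self) -> None:
        for name in HASH_FIELDS:
            if len(getattr(self, name)) != 32:
                raise ValueError(f"{name} must be 32 bytes")

    def encode(self) -> bytes:
        """Hypothetical canonical layout: five 32-byte fields, then
        n_tokens (u32), t0 and t1 (u64), all big-endian, no padding."""
        return struct.pack(">32s32s32s32s32sIQQ",
                           *(getattr(self, n) for n in HASH_FIELDS),
                           self.n_tokens, self.t0, self.t1)


Verify = Callable[[bytes, bytes], bool]  # (message, signature) -> valid?


def verify_hybrid(msg: bytes, sig: bytes, ed25519_ok: Verify, mldsa44_ok: Verify) -> bool:
    """sig = Ed25519(msg) || ML-DSA-44(msg). Both halves must verify,
    so forging a record means breaking both schemes."""
    if len(sig) != ED25519_SIG_LEN + MLDSA44_SIG_LEN:
        return False
    ed_sig, pq_sig = sig[:ED25519_SIG_LEN], sig[ED25519_SIG_LEN:]
    return ed25519_ok(msg, ed_sig) and mldsa44_ok(msg, pq_sig)


rec = Record(bytes(32), bytes(32), bytes(32), bytes(32), bytes(32),
             n_tokens=96, t0=1_700_000_000_000, t1=1_700_000_004_200)
```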
model_root is a SHA-256 over the model’s ordered shard content hashes, which implicitly
binds the quantization, since the quant format is part of the bytes being hashed. The record
is pushed to the handling validator at end-of-stream (a few hundred bytes), so the
commitment is anchored even if the provider later goes offline. A provider that cannot
produce its openings simply fails the audit.
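Computing model_root is a hash of hashes. This sketch assumes the root is SHA-256 over the plain concatenation of the ordered 32-byte shard digests. Because the digests are fixed-length, no separators are needed. The function names and the in-memory shard bytes are illustrative. Re-quantizing a shard changes its bytes, so the root changes too, and so does reordering the shards.

```python
import hashlib
from typing import Iterable


def file_sha256(path: str, chunk_size: int = 1 << 20) -> bytes:
    """Content hash of one shard file, streamed so a large shard need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.digest()


def model_root(shard_hashes: Iterable[bytes]) -> bytes:
    """SHA-256 over the ordered, concatenated 32-byte shard hashes.
    Fixed-length digests make plain concatenation unambiguous."""
    h = hashlib.sha256()
    for digest in shard_hashes:
        h.update(digest)
    return h.digest()


# In-memory stand-ins for shard bytes; real use would call file_sha256 per shard path.
q8 = [hashlib.sha256(b).digest() for b in (b"shard-0 Q8_0 bytes", b"shard-1 Q8_0 bytes")]
q4 = [hashlib.sha256(b).digest() for b in (b"shard-0 Q4_0 bytes", b"shard-1 Q4_0 bytes")]
```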
The audit: re-checking without re-generating
A provider that serves a cheaper model in place of the one it promised is wearing a disguise. The network’s auditors, the Golden Eyes (named for the fiery gaze that sees through any transformation), are what catch it.
A validator decides whether to audit a given reply by comparing a random draw over the reply id,
taken from a verifiable random function (VRF), against a coverage rate α (alpha). Setting --alpha 1
audits every reply, and the design target is full coverage. Because the draw is unpredictable and an audit may run
any time within the reply’s audit window, a provider cannot tell which replies are checked,
so it cannot serve the real model only when it thinks it’s being watched.
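The audit decision reduces to mapping the random output and the reply id to a number in [0, 1) and comparing it with α. In this sketch, the `beacon` bytes stand in for the epoch's verifiable randomness. The domain tag `ogong-audit` and the function names are hypothetical. The draw uses 53 bits of the hash so the division is exact in a float and never rounds up to 1.0, which means α = 1 really does select every reply.

```python
import hashlib


def audit_draw(beacon: bytes, reply_id: bytes) -> float:
    """Map (beacon, reply_id) to a uniform-looking value in [0, 1).
    Uses 53 bits so the result is exactly representable and always < 1.0."""
    digest = hashlib.sha256(b"ogong-audit" + beacon + reply_id).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def should_audit(beacon: bytes, reply_id: bytes, alpha: float) -> bool:
    """Audit iff the draw falls below the coverage rate; alpha = 1 audits everything."""
    return audit_draw(beacon, reply_id) < alpha
```

Anyone holding the beacon can recompute the decision, but the provider cannot know it before the beacon exists.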
The randomness is a committee threshold-BLS beacon, not one node’s coin flip. The validators share a single BLS key, set up by a dealerless distributed key generation (DKG) so no one ever holds the whole key, and each epoch’s beacon is the unique threshold signature over that epoch. Because a BLS threshold signature is the same no matter which validators combine their shares, no coalition smaller than the threshold can predict, grind, or steer the draw, and a validator cannot move it by withholding (any threshold-sized set of the remaining shares reconstructs the identical value). Anyone can verify the beacon against the group public key. It is a drand-style construction, and it is what closes the “watch, then decide” attack. (A per-validator VRF aggregate remains as a bootstrap fallback until the committee key is established.)
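Withholding is useless because of threshold reconstruction: any t of the n shares determine the same value. The sketch below shows this with plain Shamir secret sharing over a prime field. It is a deliberate stand-in and does not implement BLS, pairings, or a DKG. A dealer creates the shares here, which a dealerless DKG avoids. Threshold BLS applies these same Lagrange coefficients in the exponent to combine partial signatures, so every threshold-sized subset yields the identical beacon. The parameters (t = 3, n = 5, the Mersenne prime field) are illustrative.

```python
import itertools
import random

P = 2**127 - 1  # a Mersenne prime; the field for this illustration only


def make_shares(secret: int, t: int, n: int, rng: random.Random):
    """Dealer-style Shamir sharing (illustration only: a real DKG has no dealer).
    Share i is f(i) for a random degree-(t-1) polynomial with f(0) = secret."""
    coeffs = [secret] + [rng.randrange(P) for _ in range(t - 1)]

    def f(x: int) -> int:
        return sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P

    return [(i, f(i)) for i in range(1, n + 1)]


def lagrange_at_zero(shares) -> int:
    """Combine shares into f(0). Threshold BLS uses these same coefficients
    in the exponent to combine partial signatures."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


rng = random.Random(7)
secret = rng.randrange(P)
shares = make_shares(secret, t=3, n=5, rng=rng)
# Every 3-of-5 subset reconstructs the same value, whoever withholds.
results = {lagrange_at_zero(list(c)) for c in itertools.combinations(shares, 3)}
```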
For a selected reply the validator assigns a verifier, and the selection is
model-aware: the verifier is drawn from the other providers already serving the same
model_root, because only a peer running the same model can teacher-force it. The verifier:
- fetches the original request and the claimed output,
- runs one teacher-forced prefill of (prompt + claimed output), with no re-generation,
- recomputes both fingerprints and scores the drift against the commitment,
- returns Accept or Reject.
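Model-aware selection can be sketched as a filter followed by a seeded pick. The text says only that the verifier is drawn from the other providers serving the same model_root. The hash-based pick, the `ogong-verifier` tag, and the `Peer` and `select_verifier` names are assumptions. Sorting the eligible peers gives every validator the same canonical order, so all of them arrive at the same choice from the same randomness.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Peer:
    provider_id: str
    model_root: bytes


def select_verifier(peers: List[Peer], provider_id: str, model_root: bytes,
                    beacon: bytes, reply_id: bytes) -> Optional[Peer]:
    """Pick a verifier among OTHER providers serving the same model_root.
    Returns None if no eligible peer exists (the audit cannot be scheduled)."""
    eligible = sorted((p for p in peers
                       if p.model_root == model_root and p.provider_id != provider_id),
                      key=lambda p: p.provider_id)
    if not eligible:
        return None
    digest = hashlib.sha256(b"ogong-verifier" + beacon + reply_id).digest()
    return eligible[int.from_bytes(digest, "big") % len(eligible)]


peers = [Peer("A", b"m1"), Peer("B", b"m1"), Peer("C", b"m2"), Peer("D", b"m1")]
```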
Two checks, spanning the spectrum of cheating:
| Check | What it computes | Rejects if | Catches |
|---|---|---|---|
| Hidden-state | relative-L2 between recomputed and committed SRP sketch | > 0.10 | wrong/smaller model, distillation mimics, coarse quant (Q4_0 / Q3 / Q2) |
| Logprob | top-k Kolmogorov–Smirnov sup-norm (with total variation as a companion signal) | KS > 0.10 | near-lossless quant fraud (Q8–Q4_K_M) |
An honest re-run scores essentially zero on both. A substituted model scores ~1.1 on the hidden-state check: an order-of-magnitude margin, decisive on a single reply. The two checks are complementary. The hidden-state sketch rejects size swaps and aggressive quantization; the logprob check covers the near-lossless band a sketch might wave through. Matching the output distribution (distillation) does not help an impostor: it would still have to reproduce the reference model’s internal activations, a strictly harder target.
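The two scores in the table can be written as plain functions. The relative-L2 form and the 0.10 thresholds come from the table. The rest is assumption: per-position scores become a window verdict by taking the worst position, the KS sup-norm walks the union of both top-k token sets in token-id order, and probability mass outside either top-k is ignored. Names such as `window_verdict` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

HIDDEN_THRESHOLD = 0.10  # reject if relative-L2 > 0.10
KS_THRESHOLD = 0.10      # reject if KS sup-norm > 0.10

TopK = Dict[int, float]  # token_id -> logprob at one position


def relative_l2(recomputed: Sequence[float], committed: Sequence[float]) -> float:
    """||recomputed - committed|| / ||committed|| for one position's sketch."""
    if len(recomputed) != len(committed):
        raise ValueError("sketch length mismatch")
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(recomputed, committed)))
    norm = math.sqrt(sum(b * b for b in committed))
    if norm == 0.0:
        return 0.0 if diff == 0.0 else math.inf
    return diff / norm


def _probs(topk: TopK) -> TopK:
    return {tok: math.exp(lp) for tok, lp in topk.items()}


def ks_supnorm(a: TopK, b: TopK) -> float:
    """Largest gap between the two cumulative top-k distributions, tokens in id order.
    Mass outside either top-k is treated as absent (an assumption)."""
    pa, pb = _probs(a), _probs(b)
    gap = cum_a = cum_b = 0.0
    for tok in sorted(set(pa) | set(pb)):
        cum_a += pa.get(tok, 0.0)
        cum_b += pb.get(tok, 0.0)
        gap = max(gap, abs(cum_a - cum_b))
    return gap


def total_variation(a: TopK, b: TopK) -> float:
    """Companion signal: half the L1 distance between the top-k probabilities."""
    pa, pb = _probs(a), _probs(b)
    return 0.5 * sum(abs(pa.get(t, 0.0) - pb.get(t, 0.0)) for t in set(pa) | set(pb))


def window_verdict(sketches: List[Tuple[List[float], List[float]]],
                   topks: List[Tuple[TopK, TopK]]) -> str:
    """Reject the window if its worst position breaks either threshold."""
    worst_hidden = max(relative_l2(r, c) for r, c in sketches)
    worst_ks = max(ks_supnorm(r, c) for r, c in topks)
    if worst_hidden > HIDDEN_THRESHOLD or worst_ks > KS_THRESHOLD:
        return "Reject"
    return "Accept"
```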
Only an Accept is allowed to settle. The verifier re-runs on its own engine instance; co-locating it with the provider is just a demo convenience, and soundness comes from the re-execution being independent. Verifiers are paid a flat fee per audit regardless of verdict, so they’re neutral on the outcome, and the validator periodically slips in honeypot audits carrying a known-bad output; a verifier that rubber-stamps one is itself slashed.
Settlement: money follows the check
Verification gates payment, and the validator never runs the model itself; it only
adjudicates the verifier’s scores. Before applying any threshold it checks Merkle
inclusion of the scored windows against the signed commit_root, so a score computed
against material the provider never committed is rejected as tampered.
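The validator's inclusion check is a standard Merkle audit path. This sketch uses hypothetical hashing conventions: leaves are SHA-256(0x00 ‖ window index ‖ payload), nodes are SHA-256(0x01 ‖ left ‖ right), and an unpaired node is promoted unchanged. It carries its own proof builder so it can run on its own. A scored window whose payload does not hash up to the signed commit_root is rejected as tampered before any threshold is applied.

```python
import hashlib
from typing import List, Tuple

Proof = List[Tuple[bool, bytes]]  # (sibling_is_left, sibling_hash), bottom-up


def _leaf(index: int, payload: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + index.to_bytes(4, "big") + payload).digest()


def _node(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def build_proof(payloads: List[bytes], index: int) -> Tuple[bytes, Proof]:
    """Provider side: return (root, audit path) for window `index`."""
    level = [_leaf(i, p) for i, p in enumerate(payloads)]
    proof: Proof = []
    while len(level) > 1:
        sibling = index ^ 1
        if sibling < len(level):  # no sibling means this node is promoted
            proof.append((sibling < index, level[sibling]))
        nxt = [_node(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level, index = nxt, index // 2
    return level[0], proof


def verify_inclusion(index: int, payload: bytes, proof: Proof, commit_root: bytes) -> bool:
    """Validator side: recompute the path from the scored window up to the signed root."""
    h = _leaf(index, payload)
    for sibling_is_left, sibling in proof:
        h = _node(sibling, h) if sibling_is_left else _node(h, sibling)
    return h == commit_root


payloads = [f"window-{i}".encode() for i in range(5)]
```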
- The consumer’s fee sits in on-chain escrow.
- The handling validator, having adjudicated Accept, gathers co-signatures from a quorum of registered validators (a stake-weighted supermajority, more than two-thirds of validator stake).
- With the quorum’s co-signatures it submits the on-chain settle.
- Escrow releases and the parties are paid.
A reply that fails its audit never settles: its fee is withheld and refunded to the consumer (the wronged party), not paid to whoever caught the cheat, so no one profits from a reject and there’s no incentive to fabricate one. A reply without a validator quorum never settles either. Correctness and consensus both have to hold.
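The settlement rule can be written as a small decision function. This is a minimal sketch: the `Outcome` names are hypothetical, and so is the assumption that a refund after a reject needs no quorum, since the text does not say. The supermajority test uses integer arithmetic (3·signed > 2·total), so a coalition holding exactly two-thirds of the stake does not pass.

```python
from enum import Enum
from typing import Dict, Set


class Outcome(Enum):
    PAY = "escrow released, parties paid"
    REFUND = "fee withheld and refunded to the consumer"
    NO_SETTLE = "no settlement, escrow stays locked"


def has_quorum(stake: Dict[str, int], cosigners: Set[str]) -> bool:
    """True iff co-signers hold strictly more than 2/3 of total validator stake."""
    signed = sum(stake[v] for v in cosigners if v in stake)
    return 3 * signed > 2 * sum(stake.values())


def settle(verdict: str, stake: Dict[str, int], cosigners: Set[str]) -> Outcome:
    if verdict == "Reject":
        return Outcome.REFUND      # paid back to the wronged party; nobody profits from a reject
    if verdict == "Accept" and has_quorum(stake, cosigners):
        return Outcome.PAY         # correctness AND consensus both hold
    return Outcome.NO_SETTLE       # no Accept, or no quorum yet


stake = {"v1": 40, "v2": 30, "v3": 30}
```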
The sequential audit: one strike is rarely the whole story
Individual verdicts feed a sequential probability ratio test (SPRT) per provider. An honest provider’s occasional cross-hardware noise won’t eject it; the lifetime false-ejection rate is held below a target β (the Ville bound, ~0.1%). A provider that cheats persistently crosses the threshold and is ejected in a number of audits that grows only logarithmically in 1/β. In the measured hidden-state regime the margin is so wide that a single audited reject is already conclusive.
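Here is a minimal sketch of the per-provider test. It assumes audits are independent and that each verdict is a reject with fixed probability `p_honest` for an honest provider or `p_cheat` for a cheater. The probability values below and the name `eject_after` are hypothetical. Only the upper boundary is used. Under honesty the running likelihood ratio is a nonnegative martingale with mean 1, so Ville's inequality caps the lifetime chance that it ever reaches 1/β at β. The boundary sits at log(1/β), which is why the number of audits needed grows only logarithmically in 1/β.

```python
import math
from typing import Iterable, Optional


def eject_after(verdicts: Iterable[bool], p_honest: float, p_cheat: float,
                beta: float) -> Optional[int]:
    """Sequential test on a provider's audit history (True = audited reject).

    Accumulates the log-likelihood ratio of 'cheating' (reject prob p_cheat)
    vs 'honest' (reject prob p_honest) and ejects once it reaches log(1/beta).
    Returns the number of audits at ejection, or None if never ejected."""
    threshold = math.log(1.0 / beta)
    step_reject = math.log(p_cheat / p_honest)
    step_accept = math.log((1.0 - p_cheat) / (1.0 - p_honest))
    llr = 0.0
    for n, rejected in enumerate(verdicts, start=1):
        llr += step_reject if rejected else step_accept
        if llr >= threshold:
            return n
    return None
```

With p_honest = 10⁻⁴, p_cheat = 0.99 and β = 10⁻³, a single reject adds ln 9900 ≈ 9.2. That already passes ln 1000 ≈ 6.9, which matches the single-reject case in the text. In a noisier regime (0.05 against 0.6), it takes three consecutive rejects.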
Why there is no correctness bond
Most pay-for-work networks make the worker post a slashable bond: catch them cheating and you burn it. The bond exists because catching the cheat is expensive. If you can only afford to re-check one request in a thousand, a cheater is caught about once in a thousand tries, so the punishment has to be a thousand times the per-request gain, far more than a single fee. The bond is just the multiplier that compensates for rarely looking.
A bond, in other words, is a tax you pay for not being able to check the work, and OGONG removes the reason for it. Because verification is cheap enough to cover nearly every request, a cheat is caught essentially every time, so the deterrent can be the one thing already on the table: the escrowed fee for the cheated request. Since any working market prices a request above the compute saved by cheating, forfeiting that single fee already outweighs the cheat. Honesty wins with no bond and no reputation stake required. (The result is machine-checked in Z3 and Lean, and cross-checked as a game in PRISM-games.)
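The argument fits in a one-line payoff comparison. This is a simplified sketch. It assumes a risk-neutral provider and a single reply, and it ignores SPRT ejection and reputation. F is the escrowed fee, c the honest compute cost, s the compute saved by cheating, and B any bond slashed on detection. The coverage rate α is treated as the probability that a cheated reply is caught, which assumes every audited cheat is detected.

$$
\begin{aligned}
\text{honest payoff} &= F - c \\
\text{cheat payoff} &= (1-\alpha)\,F - (c - s) - \alpha B \\
\text{cheat payoff} - \text{honest payoff} &= s - \alpha\,(F + B) \\
\text{cheating is unprofitable} &\iff s < \alpha\,(F + B)
\end{aligned}
$$

With rare checking (α = 10⁻³) this requires F + B > 1000·s, so the bond has to carry the weight. With full coverage (α → 1) and no bond (B = 0), it reduces to s < F: the fee for the cheated request only has to exceed the compute saved.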
So staking on OGONG buys routing priority and availability: more stake means more routed work and earnings. It is not a deposit you lose for a wrong answer; on-chain, a slash against provider stake is rejected outright. The only slashable bond the system keeps is the validator’s, posted against issuing false verdicts (a different role). Sybil resistance costs no capital either: the proof that an identity is a distinct physical GPU is its verification duty, so the anti-Sybil work is the audit, not burned collateral. See Tokenomics.