Introduction

OGONG

Decentralized AI inference you don't have to trust. You can check it.

The idea in one paragraph

The world has far more capable GPUs than any one company runs. The reason inference still piles up inside a few big providers isn’t a shortage of compute. It’s trust. You can’t safely pay a stranger’s GPU for an answer you have no way to check. It could quietly swap in a cheaper model, hand back a stale cached reply, or just run the model badly, and you would never know.

OGONG removes that problem. Providers prove their work, independent validators re-check a sample of it, and payment stays in escrow until the check passes. Because checking an answer is roughly 100x cheaper than producing it, the network can verify nearly everything. So anyone’s GPU becomes safe to pay, and inference spreads back out.

CLI-first, no GUI required

Every role in OGONG runs headless from the command line, with no dependency on any hosted or branded stack, and no GUI to install. The command line is the real thing.

ogong-validatord   # validator / audit node
ogong-verifierd    # re-runs committed work to score an audit
ogong-routerd      # provider marketplace match engine
ogong-gatewayd     # OpenAI-compatible consumer endpoint
ogong-provider     # provider daemon, also a zero-signup local model server

Every piece above is real software you run from the command line. The sections that follow link straight into it.

What is OGONG

OGONG is a network where anyone can serve AI inference from their own GPU and get paid, and where the correctness of every answer is checked by the protocol instead of taken on faith.

The problem

There are far more capable GPUs in the world than any single company runs. So why does inference still concentrate inside a handful of big providers? Not because compute is scarce. It’s because trust is.

If you pay an anonymous machine for an answer, how do you know it actually ran the model you asked for? It could quietly swap in a cheaper, smaller model. It could hand back a stale cached reply. It could just run the model badly. You’d have no way to tell. That uncertainty is what forced inference to huddle inside a few operators you simply have to trust.

The fix

OGONG makes a stranger’s GPU safe to pay, using two ideas:

  1. Cheap verification. Providers commit to their work (a compact fingerprint of how they produced the answer), and an independent verifier re-checks it by teacher-forcing a single pass over the answer, with no re-generation. Checking is about 100x cheaper than generating, so the network can verify nearly everything.
  2. Hardware attestation (for privacy). A provider running inside a secure enclave (a TEE) can prove, with a hardware-signed certificate, exactly what code and model it’s running, and that the operator can’t even read your prompt.

Money rides on top of this: the consumer’s payment sits in escrow and is only released after the answer passes its check and a quorum of validators co-signs.

How an answer flows

consumer ──▶ gateway ──▶ router ──▶ provider (GPU + model)
                                        │  commits its work, signs a record
                                        ▼
                                   validator ──▶ audits a random sample
                                        │             (verifier re-runs it)
                                        ▼
                                   escrow ──▶ quorum co-signs ──▶ provider paid
  • The provider serves the model and commits a checkable record of what it produced.
  • A validator audits a random slice of that work and, on success, co-signs payment.
  • The chain holds the escrow and only releases on a validator quorum.

See Network roles for what each piece does, and How verification works for the checking mechanics.

More than text

OGONG verifies far more than chat. The same commit-and-check mechanism covers every modality a provider can serve:

  • Text and vision: chat completions, embeddings, and image-input (vision) prompts.
  • Image generation: diffusion image models.
  • Audio: music generation, text-to-speech (TTS), and speech-to-text (STT).
  • Video: latent video diffusion.

The trick that makes text cheap to verify carries to all of them. For diffusion, audio, and video, the provider commits a trajectory (the sampled denoising steps) and the verifier re-runs a single step to check it, so a full audit stays a small fraction of generation cost no matter the modality. See Images, audio & video for how that check works.

Bigger than one GPU

A model too large for any single GPU is served by a cohort of providers, each running a slice of its layers, each slice independently verified and paid. This is the capability that the whole zero-bond design exists to make possible. See Split inference.

What OGONG is not

  • Not a hosted product. OGONG is an open protocol and a set of CLI binaries you run. (A separate hosted service may use the network, but that’s a different product.)
  • Not private by default. Privacy and verifiability are different things. Only the TEE tier hides your content from the operator; the Verified tier proves correctness on an ordinary GPU but the provider can still see what it serves. The docs are careful about this; see Trust tiers.

Trust tiers

Every model on OGONG carries a trust tier that tells you exactly what guarantee you’re getting. The tier describes how the work is served and checked, not which company made the model.

Two guarantees, plainly

The whitepaper draws the line at two things, and they are not the same:

  • Privacy: can the machine operator read your prompt and the answer?
  • Correctness: can you be sure the answer really came from the model you asked for?

These are two independent axes, and OGONG offers exactly two tiers, one for each way of pinning down correctness. There is no unverified, “just trust me” tier: all supply on the network is checked one way or the other.

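Tier               | Privacy                     | Correctness                  | Hardware
-------------------|-----------------------------|------------------------------|-----------------------------
Confidential (TEE) | Operator is blind (enclave) | Hardware attestation         | TDX / SEV-SNP + NVIDIA CC
Verified           | Operator sees the content   | Statistical audit (re-check) | Any GPU, incl. Apple Silicon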

Confidential (TEE): verifiably private

The provider runs inside a Trusted Execution Environment and produces a hardware-signed attestation (a DCAP quote on Intel TDX, with NVIDIA Confidential Computing for the GPU). The quote proves what code and model are running and gives you an encrypted channel into the enclave. The operator of the machine cannot read your prompt or the response.

  • Trust root: hardware attestation, checked by you (or a validator on your behalf). You must check it before you send anything; that check is what turns “private” into verifiably private.
  • Per-reply guarantee: the enclave signs a receipt over each response, so correctness is hardware-attested for the specific answer you got; settlement won’t release on a receipt that doesn’t verify.
  • The honest caveat: confidentiality is only as strong as the TEE itself. A working enclave break on the host serving your request would defeat privacy, and attestation is only as current as the platform’s security version: a valid quote on revoked or out-of-date microcode is rejected. Note the asymmetry: under a compromised minority of enclaves, network correctness still holds (it rests on the audit and the honest validator majority); it’s privacy that degrades.
  • Use it for: anything sensitive. Content never leaves the enclave in the clear.

Verified: correct, but the provider can see it

This is the public-compute tier, on any GPU and no special hardware. The provider commits to its work and a verifier re-checks it cheaply by teacher-forcing a single pass (see How verification works). If the re-check agrees, the work is accepted and paid. This proves the answer came from the claimed model.

  • Trust root: a cryptographic commitment plus an independent statistical audit.
  • What kind of guarantee: probabilistic, not cryptographic. Empirically the separation between honest and cheating work is wide (a substituted model misses by ~10x), but it’s a statistical bound calibrated per hardware pair, not a worst-case proof. Keeping the honest cross-hardware drift clear of near-lossless quantization fraud is the network’s central open calibration gate, so thresholds are set conservatively. For a hard, per-reply correctness claim today, the Confidential (TEE) tier is the stronger basis.
  • Caveat: the serving machine sees the content. Your identity is stripped at the router (the provider doesn’t learn who you are), but the prompt text is visible to the GPU running it. If you need privacy, use Confidential.

Why no “private on commodity hardware” tier?

Because you can’t have it. Hiding content from the operator requires a TEE. On an ordinary GPU the operator can always read what the card is processing, so the Verified tier can prove correctness but never privacy. OGONG is honest about this rather than blurring it.

Running your own GPU

Pointing a tool at a GPU you own is a different thing from picking a network tier: there’s no third party to attest or audit, because you already trust the machine. That’s local mode: a zero-signup, no-account server you run for yourself or a friend over an encrypted tunnel. It isn’t part of the marketplace’s tier system. See Local mode.

How verification works

OGONG rests on one fact: checking an answer is far cheaper than producing it. Call the ratio ρ (rho): the cost of verifying divided by the cost of generating. On datacenter GPUs ρ ≈ 1% (about 100x cheaper); even on Apple Silicon, the weakest targeted backend, it’s ≈ 5% (about 20x cheaper).

That number is the whole game. If verifying is cheap, the network can check nearly every answer, and once almost everything is checked, you no longer need providers to post a big slashable deposit to keep them honest. Cheap full-coverage checking replaces the bond.

The reason verification is this cheap is that the checker never re-generates the answer. It teacher-forces a single forward pass over the prompt plus the claimed output and reads the model’s internal numbers off that one pass. Generation is autoregressive: one slow step per token. A teacher-forced prefill does the whole sequence in one batched pass, and that is where the ~100x comes from.
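The arithmetic behind ρ can be written out in a few lines. This is only a sketch of the ratios quoted above: the function names are hypothetical, and the overhead formula simply assumes every audited reply costs ρ times its generation cost.

```python
def speedup(rho: float) -> float:
    """How many times cheaper checking is than generating."""
    if rho <= 0.0:
        raise ValueError("rho must be positive")
    return 1.0 / rho


def verification_overhead(rho: float, alpha: float = 1.0) -> float:
    """Audit compute as a fraction of generation compute.

    Assumption: each audited reply costs rho times what it cost to generate,
    and a fraction alpha of replies is audited.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * rho


# The two rho values quoted in the text.
for backend, rho in {"datacenter GPU": 0.01, "Apple Silicon": 0.05}.items():
    print(f"{backend}: checking is {speedup(rho):.0f}x cheaper; "
          f"full coverage adds {verification_overhead(rho):.0%} compute")
```

At full coverage (alpha = 1) the network spends about 1% extra compute on datacenter GPUs and about 5% on Apple Silicon, which is why checking nearly everything is affordable.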

The commitment: proving what you did

When a provider answers a request, it generates in fixed windows of 32 tokens and emits a small leaf per window. The leaves form a Merkle tree whose root is the commit_root. Each leaf binds two complementary fingerprints of how the output was produced:

  • Hidden-state sketch. A commitment over the model’s per-token last hidden states (the input to the language-model head), captured as a sign-random-projection (SRP) sketch: a fixed, public bank of random ±1 directions, identical for provider and verifier. Every direction mixes all coordinates, so the comparison is well-conditioned and the checked subspace is not the provider’s to choose; a substitute model cannot hide in a hand-picked corner of the activation space. This is the sharper of the two checks.
  • Logprob digest. The top-k log-probabilities at every decode position in the window. Committing all positions is strictly stronger than sampling a few.
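A minimal sketch of the commitment structure described above: one leaf per 32-token window, folded into a Merkle root. The leaf layout, the domain tags, and the rule for odd levels are assumptions made for illustration; the source only states that each leaf binds both fingerprints and that the root is the commit_root.

```python
import hashlib

WINDOW = 32  # tokens per leaf, as stated in the text


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def window_count(n_tokens: int) -> int:
    """Windows needed to cover a reply (the last one may be short)."""
    return -(-n_tokens // WINDOW)  # ceiling division


def make_leaf(index: int, hidden_sketch: bytes, logprob_digest: bytes) -> bytes:
    # Hypothetical layout: domain tag 0x00, window index, then both fingerprints.
    return sha256(b"\x00" + index.to_bytes(4, "big")
                  + sha256(hidden_sketch) + sha256(logprob_digest))


def merkle_root(leaves: list[bytes]) -> bytes:
    if not leaves:
        raise ValueError("need at least one leaf")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # assumption: duplicate the last node on odd levels
        level = [sha256(b"\x01" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


def commit_root(fingerprints: list[tuple[bytes, bytes]]) -> bytes:
    """fingerprints[i] = (hidden-state sketch, logprob digest) for window i."""
    return merkle_root([make_leaf(i, s, d) for i, (s, d) in enumerate(fingerprints)])
```

Changing any byte of any window's fingerprint changes the root, so a provider cannot alter one window after signing without invalidating the whole commitment.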

The provider signs a per-reply record. The signed payload binds the request, the response, and the model identity together:

record = ( reply_id, req_hash, resp_hash, model_root, commit_root, n_tokens, t0, t1 )
sig     = Ed25519(record) ‖ ML-DSA-44(record)

The signature is hybrid post-quantum: a classical Ed25519 signature and an ML-DSA-44 (lattice) signature, so the record stays valid even if one scheme is later broken. model_root is a SHA-256 over the model’s ordered shard content hashes, which implicitly binds the quantization, since the quant format is part of the bytes being hashed. The record is pushed to the handling validator at end-of-stream (a few hundred bytes), so the commitment is anchored even if the provider later goes offline. A provider that cannot produce its openings simply fails the audit.
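The record and model_root can be sketched as below. The field names come from the record above; the byte encoding (plain concatenation of shard hashes, length-prefixed fields, 8-byte integers) is an assumption, and the hybrid signature itself is omitted because it needs Ed25519 and ML-DSA-44 libraries outside the standard library.

```python
import hashlib
from dataclasses import astuple, dataclass


def model_root(shard_hashes: list[bytes]) -> bytes:
    """SHA-256 over the model's ordered shard content hashes."""
    hasher = hashlib.sha256()
    for digest in shard_hashes:  # order matters: reordering shards changes the root
        hasher.update(digest)
    return hasher.digest()


@dataclass(frozen=True)
class Record:
    reply_id: bytes
    req_hash: bytes
    resp_hash: bytes
    model_root: bytes
    commit_root: bytes
    n_tokens: int
    t0: int
    t1: int

    def signing_payload(self) -> bytes:
        # Hypothetical canonical encoding: each field is length-prefixed so
        # that field boundaries cannot be shifted without changing the bytes.
        parts = []
        for value in astuple(self):
            raw = value if isinstance(value, bytes) else value.to_bytes(8, "big")
            parts.append(len(raw).to_bytes(2, "big") + raw)
        return b"".join(parts)
```

Both signatures would be computed over the same signing_payload() bytes, so a verifier can check either scheme against one canonical message.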

The audit: re-checking without re-generating

A provider that serves a cheaper model in place of the one it promised is wearing a disguise. The network’s auditors, the Golden Eyes (named for the fiery gaze that sees through any transformation), are what catch it.

A validator audits a given reply with probability α (alpha), the coverage rate. The audit decision is drawn from a verifiable random function (VRF) over the reply id. Setting --alpha 1 audits every reply, and the design target is full coverage. Because the draw is unpredictable and an audit may run any time within the reply’s audit window, a provider cannot tell which replies are checked, so it cannot serve the real model only when it thinks it’s being watched.

The randomness is a committee threshold-BLS beacon, not one node’s coin flip. The validators share a single BLS key, set up by a dealerless distributed key generation (DKG) so no one ever holds the whole key, and each epoch’s beacon is the unique threshold signature over that epoch. Because a BLS threshold signature is the same no matter which validators combine their shares, no coalition can predict, grind, or steer the draw, and a validator cannot move it by withholding (the remaining shares reconstruct the identical value). Anyone can verify the beacon against the group public key. It is a drand-style construction, and it is what closes the “watch, then decide” attack. (A per-validator VRF aggregate remains as a bootstrap fallback until the committee key is established.)
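The selection rule can be illustrated with a hash-based draw. This is a stand-in, not the protocol's construction: a SHA-256 hash of (beacon ‖ reply id) plays the role of the VRF output, and the function name and the 64-bit truncation are assumptions.

```python
import hashlib


def is_audited(beacon: bytes, reply_id: bytes, alpha: float) -> bool:
    """Audit the reply when a pseudorandom draw in [0, 1) falls below alpha.

    Assumption: SHA-256 over (beacon || reply_id) stands in for the VRF;
    the real beacon is a threshold-BLS signature that anyone can verify.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    digest = hashlib.sha256(beacon + reply_id).digest()
    draw = int.from_bytes(digest[:8], "big")  # uniform integer in [0, 2**64)
    return draw < alpha * 2**64


beacon = b"epoch-beacon-bytes"  # hypothetical beacon value
print(is_audited(beacon, b"reply-0001", alpha=1.0))  # alpha = 1 audits every reply
```

Because the draw depends on a beacon the provider cannot predict, the provider learns whether a reply was selected only after it has already committed to the work.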

For a selected reply the validator assigns a verifier, and the selection is model-aware: the verifier is drawn from the other providers already serving the same model_root, because only a peer running the same model can teacher-force it. The verifier:

  1. fetches the original request and the claimed output,
  2. runs one teacher-forced prefill of (prompt + claimed output), with no re-generation,
  3. recomputes both fingerprints and scores the drift against the commitment,
  4. returns Accept or Reject.

Two checks, spanning the spectrum of cheating:

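Check        | What it computes                                                                  | Rejects if | Catches
-------------|-----------------------------------------------------------------------------------|------------|------------------------------------------------------------------------
Hidden-state | relative-L2 between recomputed and committed SRP sketch                           | > 0.10     | wrong/smaller model, distillation mimics, coarse quant (Q4_0 / Q3 / Q2)
Logprob      | top-k Kolmogorov–Smirnov sup-norm (with total variation as a companion signal)    | KS > 0.10  | near-lossless quant fraud (Q8–Q4_K_M)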

An honest re-run scores essentially zero on both. A substituted model scores ~1.1 on the hidden-state check: an order-of-magnitude margin, decisive on a single reply. The two checks are complementary. The hidden-state sketch rejects size swaps and aggressive quantization; the logprob check covers the near-lossless band a sketch might wave through. Matching the output distribution (distillation) does not help an impostor: it would still have to reproduce the reference model’s internal activations, a strictly harder target.
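The two scores and the thresholds from the table can be sketched as follows. The 0.10 thresholds are from the text; how sketches and top-k lists are represented (plain lists of floats, with the same k tokens in the same order on both sides) is an assumption of this example.

```python
import math

HIDDEN_THRESHOLD = 0.10  # relative-L2 reject threshold, from the table
KS_THRESHOLD = 0.10      # KS sup-norm reject threshold, from the table


def relative_l2(recomputed: list[float], committed: list[float]) -> float:
    if len(recomputed) != len(committed):
        raise ValueError("sketches must have the same length")
    num = math.sqrt(sum((r - c) ** 2 for r, c in zip(recomputed, committed)))
    den = math.sqrt(sum(c * c for c in committed))
    if den == 0.0:
        return 0.0 if num == 0.0 else math.inf
    return num / den


def ks_sup_norm(logprobs_a: list[float], logprobs_b: list[float]) -> float:
    """Largest gap between the two cumulative top-k distributions.

    Assumption: both lists cover the same k tokens in the same order.
    """
    if len(logprobs_a) != len(logprobs_b):
        raise ValueError("top-k lists must have the same length")
    cdf_a = cdf_b = worst = 0.0
    for la, lb in zip(logprobs_a, logprobs_b):
        cdf_a += math.exp(la)
        cdf_b += math.exp(lb)
        worst = max(worst, abs(cdf_a - cdf_b))
    return worst


def verdict(hidden_drift: float, ks: float) -> str:
    rejected = hidden_drift > HIDDEN_THRESHOLD or ks > KS_THRESHOLD
    return "Reject" if rejected else "Accept"
```

A reply is rejected if either check trips: a substituted model scoring about 1.1 on the hidden-state check is rejected on that check alone, whatever its logprob score.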

Only an Accept is allowed to settle. The verifier re-runs on its own engine instance; co-locating it with the provider is just a demo convenience, and soundness comes from the re-execution being independent. Verifiers are paid a flat fee per audit regardless of verdict, so they’re neutral on the outcome, and the validator periodically slips in honeypot audits carrying a known-bad output; a verifier that rubber-stamps one is itself slashed.

Settlement: money follows the check

Verification gates payment, and the validator never runs the model itself; it only adjudicates the verifier’s scores. Before applying any threshold it checks Merkle inclusion of the scored windows against the signed commit_root, so a score computed against material the provider never committed is rejected as tampered.

  1. The consumer’s fee sits in on-chain escrow.
  2. The handling validator, having adjudicated Accept, gathers co-signatures from a quorum of registered validators (a stake-weighted supermajority, more than two-thirds of validator stake).
  3. With the quorum’s co-signatures it submits the on-chain settle.
  4. Escrow releases and the parties are paid.

A reply that fails its audit never settles: its fee is withheld and refunded to the consumer (the wronged party), not paid to whoever caught the cheat, so no one profits from a reject and there’s no incentive to fabricate one. A reply without a validator quorum never settles either. Correctness and consensus both have to hold.
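The settlement gate reduces to two conditions, sketched here. The two-thirds stake rule and the refund-on-reject behavior come from the text; the Outcome names and the choice to model a missing quorum as HOLD are assumptions, since the text only says such a reply never settles.

```python
from enum import Enum


class Outcome(Enum):
    RELEASE = "release"  # escrow pays the parties
    REFUND = "refund"    # fee returns to the consumer
    HOLD = "hold"        # assumption: escrow stays locked while there is no quorum


def has_quorum(cosigner_stake: int, total_stake: int) -> bool:
    """Strictly more than two-thirds of validator stake has co-signed."""
    if total_stake <= 0:
        return False
    return 3 * cosigner_stake > 2 * total_stake  # integer math avoids rounding


def settle(verdict: str, cosigner_stake: int, total_stake: int) -> Outcome:
    if verdict == "Reject":
        return Outcome.REFUND
    if verdict == "Accept" and has_quorum(cosigner_stake, total_stake):
        return Outcome.RELEASE
    return Outcome.HOLD
```

Exactly two-thirds is not enough: has_quorum(2, 3) is false, because the rule requires more than two-thirds.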

The sequential audit: one strike is rarely the whole story

Individual verdicts feed a sequential probability ratio test (SPRT) per provider. An honest provider’s occasional cross-hardware noise won’t eject it; the lifetime false-ejection rate is held below a target β (the Ville bound, ~0.1%). A provider that cheats persistently crosses the threshold and is ejected in a number of audits that grows only logarithmically in 1/β. In the measured hidden-state regime the margin is so wide that a single audited reject is already conclusive.
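A sketch of the per-provider sequential test, assuming a simple two-hypothesis likelihood ratio. The reject rates p_honest and p_cheat are illustrative parameters, not measured values from the source; the ejection threshold 1/β follows the Ville-bound framing above.

```python
import math
from typing import Iterable, Optional


def audits_to_ejection(verdicts: Iterable[str], beta: float,
                       p_honest: float, p_cheat: float) -> Optional[int]:
    """Return the audit count at which the provider is ejected, else None.

    p_honest: assumed chance an honest reply is rejected (cross-hardware noise).
    p_cheat:  assumed chance a cheating reply is rejected.
    Ejection fires when the likelihood ratio reaches 1 / beta.
    """
    threshold = math.log(1.0 / beta)
    log_ratio = 0.0
    for count, verdict in enumerate(verdicts, start=1):
        if verdict == "Reject":
            log_ratio += math.log(p_cheat / p_honest)
        else:
            log_ratio += math.log((1.0 - p_cheat) / (1.0 - p_honest))
        if log_ratio >= threshold:
            return count
    return None


# With a wide margin, a single reject already crosses the threshold.
print(audits_to_ejection(["Reject"], beta=0.001, p_honest=0.0005, p_cheat=0.99))
```

Each reject adds a fixed amount of log-evidence, so the number of rejects needed grows with log(1/β), while an honest provider's accepts pull the ratio back down and an isolated noisy reject does not eject it.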

Why there is no correctness bond

Most pay-for-work networks make the worker post a slashable bond: catch them cheating and you burn it. The bond exists for one reason: catching the cheat is expensive. If you can only afford to re-check one request in a thousand, a cheater is caught about once in a thousand tries, so the punishment has to be a thousand times the per-request gain, far more than a single fee. The bond is just the multiplier that compensates for rarely looking.

A bond, in other words, is a tax you pay for not being able to check the work, and OGONG removes the reason for it. Because verification is cheap enough to cover nearly every request, a cheat is caught essentially every time, so the deterrent can be the one thing already on the table: the escrowed fee for the cheated request. Since any working market prices a request above the compute saved by cheating, forfeiting that single fee already outweighs the cheat. Honesty wins with no bond and no reputation stake required. (The result is machine-checked in Z3 and Lean, and cross-checked as a game in PRISM-games.)
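The argument above can be reduced to one inequality. Under a simple model (an assumption of this sketch, not the machine-checked proof), a cheat saves s in compute, is caught with probability p, and on being caught loses the fee f plus any bond B. Honesty wins when p·(f + B) ≥ s, so the smallest sufficient bond is max(0, s/p − f).

```python
from fractions import Fraction


def required_bond(saving, fee, catch_probability) -> Fraction:
    """Smallest bond B such that p * (fee + B) >= saving.

    Simplified single-request model (an assumption): the cheater keeps the
    fee when not caught and loses the fee plus the bond when caught.
    """
    p = Fraction(catch_probability)
    if not 0 < p <= 1:
        raise ValueError("catch probability must be in (0, 1]")
    return max(Fraction(0), Fraction(saving) / p - Fraction(fee))


# Checking 1 request in 1000: the punishment must be about 1000x the gain.
print(required_bond(saving=1, fee=2, catch_probability=Fraction(1, 1000)))  # 998
# Checking nearly everything, with the fee above the saving: no bond needed.
print(required_bond(saving=1, fee=2, catch_probability=1))                  # 0
```

The bond falls to zero as soon as p·f ≥ s, which is the condition the text describes: near-full coverage plus a fee priced above the compute saved by cheating.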

So staking on OGONG buys routing priority and availability: more stake means more routed work and earnings. It is not a deposit you lose for a wrong answer; on-chain, a slash against provider stake is rejected outright. The only slashable bond the system keeps is the validator’s, posted against issuing false verdicts (a different role). Sybil resistance costs no capital either: the proof that an identity is a distinct physical GPU is its verification duty, so the anti-Sybil work is the audit, not burned collateral. See Tokenomics.

Verifying images, audio & video

OGONG verifies more than text. The same protocol covers image, audio, and video models, but the check itself is different, because these models work differently from a chat model.

Why the text check doesn’t transfer

A language model produces a probability distribution over the next token at every step, and that distribution is a fingerprint of the exact computation that made it. The Verified-tier check (see How verification works) commits and cheaply re-derives those distributions.

A diffusion or flow model hands you nothing like that. It starts from noise and runs N denoising steps down to a final latent, then decodes that latent to pixels or audio. There is no per-token distribution and no position to teacher-force. The only natural thing to look at is the final output, and the output is exactly what a cheaper computation can forge. So the text defenses are not weak here; they simply do not apply.

Check the process, not the output

The insight is that an output is not evidence of the computation that produced it; a process is. A diffusion model’s computation is not a single result, it is a sequence of N steps, each one a forward pass of the same network the provider claims to run. That sequence is checkable in exactly the way a lone output is not.

So the provider commits a trajectory: a Merkle root over the latent at sampled denoising steps, plus the final latent. To verify, an auditor:

  1. draws a step at random,
  2. asks the provider to open the committed latents at that step (Merkle proofs checked first),
  3. runs one reference denoising step from the committed input, and
  4. accepts if the result matches the committed output within a tolerance.

One step re-run against the N the provider performed: cost ρ ≈ 1/N, the same cheap-check economics as text. Sampling k steps instead of one raises both the cost and the per-request catch rate.

The Merkle-inclusion check runs before the tolerance check, so a provider cannot serve one trajectory, commit another, and reveal whichever is convenient. The only way to pass is for every committed step to match the reference model’s step, which is to say, to actually run the model. A final check then decodes the committed last latent and confirms it matches the served bytes, so a provider cannot run the honest trajectory and hand back a different output.
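The single-step audit can be sketched with a toy denoiser. Everything model-specific here is a stand-in: reference_step is a made-up function in place of a real denoising step, a list of per-step hashes replaces the Merkle root and proofs, and the tolerance value is arbitrary.

```python
import hashlib
import math
import struct


def digest(latent: list[float]) -> bytes:
    return hashlib.sha256(struct.pack(f">{len(latent)}d", *latent)).digest()


def reference_step(latent: list[float], step: int) -> list[float]:
    # Toy stand-in for one denoising step of the reference model.
    return [0.9 * x + 0.01 * step for x in latent]


def rel_l2(a: list[float], b: list[float]) -> float:
    num = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    den = math.sqrt(sum(y * y for y in b))
    if den == 0.0:
        return 0.0 if num == 0.0 else math.inf
    return num / den


def audit_step(committed: list[bytes], opened_in: list[float],
               opened_out: list[float], step: int, tol: float = 1e-6) -> bool:
    # 1. Inclusion first: the opened latents must be the committed ones.
    #    (A per-step hash list stands in for Merkle proofs here.)
    if digest(opened_in) != committed[step] or digest(opened_out) != committed[step + 1]:
        return False
    # 2. Re-run one reference step from the committed input and compare.
    return rel_l2(reference_step(opened_in, step), opened_out) <= tol


# Honest provider: every committed step is a real reference step.
trajectory = [[1.0, -2.0, 0.5]]
for s in range(8):
    trajectory.append(reference_step(trajectory[-1], s))
commitment = [digest(latent) for latent in trajectory]
print(audit_step(commitment, trajectory[3], trajectory[4], step=3))  # True
```

Because the step index is drawn after the commitment is fixed, a provider that skipped or altered any step risks having exactly that step re-run.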

Measured on three engines

The primitive is implemented and measured on three independent engines, with no shared code:

Modality | Engine                              | Honest re-run      | A cheat scores
---------|-------------------------------------|--------------------|-----------------------------
Audio    | 3.5B diffusion-transformer (flow)   | exact (rel-L2 = 0) | 0.27 (5% conditioning change)
Image    | 1.5B Euler latent diffusion         | exact (rel-L2 = 0) | 1.0 (changed prompt)
Video    | Wan latent video diffusion          | exact (rel-L2 = 0) | rejected (fabricated step)

An honest re-run reproduces each step exactly; a substituted computation lands two to three orders of magnitude away.

An honest caveat (and a happy one)

The accept tolerance is a measured quantity, not a proven constant. A different GPU or kernel reproduces a latent with a small nonzero drift, so the threshold is set from the honest cross-hardware drift, and the guarantee is that this drift stays clear of a cheat’s divergence. For diffusion that separation is comfortable: the honest drift is tiny and a cheat diverges by 0.27 to 1.0. That actually makes diffusion a cleaner verification target than text, where the near-lossless quantization band is the hard case.

One subtlety: guidance schemes that carry momentum across steps are not reproducible from a single committed latent inside the guidance window, so the auditor draws its re-check step from outside that window, where a step depends only on its input.

Split inference: models too big for one GPU

Some models are too large to fit on any single GPU a casual provider owns. OGONG serves them anyway, by splitting one model across a cohort of providers, each running a slice of its layers, with every slice independently verified and paid. This is verified split inference: decentralized inference of a frontier model across ordinary GPUs that no single machine could hold.

Why only a zero-bond network can do this

Sharding a model across machines is not new. Doing it across untrusted, unbonded machines is. In a design where each provider must post a slashable bond, sharding multiplies the capital barrier by the number of shards: ten segments, ten bonds. Casual nodes never clear that bar.

OGONG posts no correctness bond (see Why there is no correctness bond), so the barrier doesn’t multiply. A cohort of ordinary, unbonded GPUs can serve a frontier model, each segment paid only for the layers it ran.

How a cohort serves one model

  1. A provider advertises a segment: the range of layers it can run.
  2. The router assembles the cheapest cohort whose segments tile the whole model, end to end.
  3. A lead drives the request through the cohort: the first shard runs from the prompt, each interior shard runs its layers from the previous shard’s output, and the result flows down the chain.
  4. Each shard commits the hidden state at its layer boundary and signs it with its provider key. The commitments chain: one segment’s output is the next’s input.

So the model is computed in a relay, and the relay leaves a signed, checkable trail. It is architecture-agnostic because it rides the residual stream every decoder transformer exposes; the only per-model detail is the input embedding an interior shard skips.
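Step 2 above, assembling the cheapest cohort, is a small shortest-path problem over layer boundaries. The sketch below assumes segments must tile the model exactly, without overlap, and that cost is a single integer price per segment; the Segment type and function name are hypothetical.

```python
from typing import NamedTuple, Optional


class Segment(NamedTuple):
    provider: str
    start: int  # first layer, inclusive
    end: int    # last layer, exclusive
    price: int


def cheapest_cohort(segments: list[Segment], n_layers: int) -> Optional[list[Segment]]:
    """Cheapest chain of segments covering layers [0, n_layers) end to end."""
    # best[b] = (total price, chain) for the cheapest tiling of layers [0, b).
    best: dict[int, tuple[int, list[Segment]]] = {0: (0, [])}
    for boundary in range(1, n_layers + 1):
        for seg in segments:
            if seg.end != boundary or seg.start >= seg.end or seg.start not in best:
                continue
            cost = best[seg.start][0] + seg.price
            if boundary not in best or cost < best[boundary][0]:
                best[boundary] = (cost, best[seg.start][1] + [seg])
    return best[n_layers][1] if n_layers in best else None


offers = [
    Segment("a", 0, 16, 5),
    Segment("b", 16, 32, 5),
    Segment("c", 0, 32, 12),
    Segment("d", 0, 20, 1),  # cheap, but nothing continues from layer 20
]
print([s.provider for s in cheapest_cohort(offers, 32)])  # ['a', 'b']
```

Boundaries are processed in increasing order, so the cheapest tiling up to a segment's start is already final when that segment is considered.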

Verified per segment

The same cheap audit that checks a whole model checks each segment. A validator re-runs a sampled segment and confirms its boundary reproduces:

  • An honest re-run reproduces the boundary essentially exactly (on a 2B model: ~0% on the same engine, ~0.6% drift across backends), while a substituted sub-computation lands ~30% off, a roughly 50x separation, on the same calibration the whole-model check uses.
  • Because each boundary is signed, a cheat is localized to the one provider that produced it, with no trusted lead. A caught segment withholds the whole request (the consumer is refunded) and ejects exactly that provider. An honest shard risks nothing.
  • The deterrence holds per segment, and it is a formally proven, machine-checked result. A shard’s compute saving and its fee share both scale with the layers it runs, but the stake it puts at risk does not shrink with its slice. So a smaller shard is, if anything, more deterred, and sharding never weakens the honesty guarantee that protects a whole model.

End to end, a two-shard cohort in which each shard loads only its own layers reproduces the single-machine model’s output to a relative difference of about 1e-5.

Settlement is a single cohort settle: each shard is paid for its slice under a conservation invariant. The per-shard amounts must sum to the provider’s share, so a release can never exceed the request’s fee. A non-conserving split is rejected on-chain with no funds moved.
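The conservation invariant is easy to state in code. This sketch assumes the provider share is split in proportion to layers run, in integer units, with the rounding remainder handed out one unit at a time; the source fixes only the invariant itself, not the split rule.

```python
def split_provider_share(provider_share: int, layers_per_shard: list[int]) -> list[int]:
    """Split an integer amount across shards in proportion to layers run."""
    total_layers = sum(layers_per_shard)
    if provider_share < 0 or total_layers <= 0:
        raise ValueError("need a non-negative share and at least one layer")
    amounts = [provider_share * n // total_layers for n in layers_per_shard]
    # Flooring loses fewer units than there are shards; hand them back
    # one unit each so the amounts sum exactly to the share.
    remainder = provider_share - sum(amounts)
    for i in range(remainder):
        amounts[i] += 1
    return amounts


def conserves(amounts: list[int], provider_share: int) -> bool:
    """The invariant: no negative payout, and payouts sum to the share."""
    return all(a >= 0 for a in amounts) and sum(amounts) == provider_share


payouts = split_provider_share(100, [10, 10, 10])
print(payouts, conserves(payouts, 100))  # [34, 33, 33] True
```

A split such as [50, 51] against a share of 100 fails conserves, which corresponds to the non-conserving case that is rejected with no funds moved.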

A topology, not a tier

Split inference is a serving topology, orthogonal to the trust tiers. It composes with both: a cohort’s guarantee follows the tier of its shards. A cohort of Verified shards is Verified; a cohort of Confidential shards is Confidential. See How verification works for the per-segment audit it builds on.

It composes the whole stack

Split inference is not a bolt-on. It is the capstone that falls out of everything else OGONG already does:

  • the zero-bond result removes the per-shard capital barrier, so a cohort of casual nodes is even possible,
  • cheap per-segment verification catches a lying shard for a fraction of its compute,
  • signed boundary commitments localize a cheat to the one node that produced it,
  • the router assembles the cohort and on-chain cohort settlement pays each shard its slice under a conservation invariant.

Each of those was built for serving a whole model on one machine. Put together, they let a crowd of ordinary GPUs serve a model none of them could run alone, which is why it’s a headline capability rather than a feature.

Tokenomics

OGONG is the network’s unit of account: consumers pay in it for inference, and providers and validators earn it for serving and securing the network. Model makers are attributed on-chain, with a royalty slot reserved for them (inactive at launch, deferred to governance).

The principles

  • Fixed supply. A hard cap of 5,000,000,000 OGONG, enforced as an on-chain invariant.
  • Earned by work. The dominant 80% (4B) is never pre-allocated. It is emitted only for verified contribution (served inference plus passed liveness challenges) on a 4-year halving schedule, Bitcoin-style, open to anyone on the same permissionless terms. The curve is asymptotic: roughly 97% is emitted within ~20 years, approaching but never quite reaching the cap.
  • Initial supply. The remaining 1B (20%) is allocated at launch: 625M to core team & advisors, 250M to the Ogong foundation, and 125M to public liquidity.
  • Stake is priority, not a bond. Staked OGONG buys routing priority and availability weighting. It is not a slashable correctness deposit. Cheap verification, not capital at risk, is what keeps answers honest. (See How verification works.)
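The supply figures above can be checked with a few lines of arithmetic. The emission curve here is an assumption consistent with the text: the first 4-year period emits half the earned tranche, each later period emits half of the previous one, and emission is uniform within a period.

```python
TOTAL_SUPPLY = 5_000_000_000
EARNED_TRANCHE = 4_000_000_000  # 80%, emitted for verified work
INITIAL = {"core team & advisors": 625_000_000,
           "foundation": 250_000_000,
           "public liquidity": 125_000_000}
HALVING_YEARS = 4


def emitted_fraction(years: float) -> float:
    """Fraction of the earned tranche emitted after `years`.

    Assumption: geometric halving, with the first period emitting one half.
    """
    if years < 0:
        raise ValueError("years must be non-negative")
    full, partial = divmod(years, HALVING_YEARS)
    done = 1.0 - 0.5 ** full                      # completed periods
    current = (partial / HALVING_YEARS) * 0.5 ** (full + 1)  # period in progress
    return done + current


assert EARNED_TRANCHE + sum(INITIAL.values()) == TOTAL_SUPPLY
print(f"{emitted_fraction(20):.3%} of the earned tranche after 20 years")
```

Five halvings give 1 − 1/32 ≈ 96.9%, matching the "roughly 97% within ~20 years" figure, and the fraction approaches 1 without ever reaching it.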

How emission is earned

The earned tranche mints to the roles that produce and secure work, in proportion to what each verifiably contributes per epoch:

  • providers for settled inference,
  • validators and verifiers for audits performed,
  • routers for routes served,

plus a liveness credit for answering a random availability challenge, which decays with the halving as a bootstrap. Emission amounts are agreed by a validator quorum from the same on-chain record that settlement runs on, so nothing mints without consensus.

Where the money goes per request

When a verified request settles, the escrowed fee is split on-chain across the parties that produced and secured the result: the provider that served it, the router, and the validator with a verification reserve. The model’s maker is attributed too, though the maker-royalty slot is reserved and currently inactive. A reply that fails its audit releases no fee at all.

Why a deposit isn’t needed

In most pay-for-work networks an operator posts a large refundable bond so they have something to lose if they cheat. OGONG drives that to zero: because verification covers nearly every request, simply forfeiting the cheated request’s fee is deterrent enough. That frees stake to do what operators actually want, buy priority, instead of sitting idle as collateral.

Network roles

OGONG is a small set of cooperating processes. Each is a standalone CLI binary; you can run one, several, or all of them. The three infrastructure roles (provider, validator, router) are permissionless, and a single operator may run any of them.

Provider

Serves models from a GPU and earns OGONG for verified work. The provider daemon is ogong-provider. It:

  • runs an embedded inference engine (text, image, audio, STT, TTS) as a managed subprocess, or fronts an existing engine (llama-server, vLLM, Ollama),
  • commits a verifiable record of each reply, signed with a hybrid post-quantum key (Ed25519 ‖ ML-DSA-44), and pushes it to a validator at end-of-stream,
  • registers itself with the router marketplace and/or joins the on-chain network.

“Turning it on” is the whole onboarding step: a provider risks zero capital for correctness; there’s no bond to post. Optional stake buys routing priority (pure upside). A provider’s GPU also doubles as a verifier for peers serving the same model. It has several modes: a tunnel client for home contributors, a direct HTTPS server for TEE operators, and a zero-signup local server. See Provider node.

Validator

ogong-validatord, the security layer. It’s an attested CPU enclave with no GPU and no model weights; it never runs a forward pass. It:

  • reads the on-chain registry to discover peer validators (no manual peer config needed),
  • receives signed commitment records from providers,
  • drives the threshold-BLS randomness beacon (a per-validator VRF is the bootstrap fallback) that audit-selects a sampled fraction of replies (audit rate --alpha) and assigns a verifier,
  • adjudicates the verifier’s scores against the committed commit_root (Merkle inclusion first, then thresholds),
  • when it holds the settlement role, gathers peer co-signatures over QUIC and submits the on-chain quorum settle,
  • posts the one slashable bond the system keeps, forfeit if it issues a false verdict.

See Validator node.

Verifier

ogong-verifierd, the audit muscle. A verifier is really a provider GPU acting in audit duty: when a validator audit-selects a reply, it VRF-picks a verifier from the other providers serving the same model and dispatches the job. The verifier teacher-forces a single pass over the claimed output, scores the drift against the commitment, and returns Accept / Reject. It’s paid a flat fee per audit regardless of verdict, so it’s neutral on the outcome. See How verification works.

Router & gateway

The marketplace match layer:

  • ogong-routerd, the match engine, an attested enclave on the request hot path. Providers register (Upsert); consumers query (Route) for a provider that can serve a given ogong/<tier>/<maker>/<model>, matched on price, tier, and free capacity, and drawn proportionally to stake × reputation. It runs verified routing code that can’t read the plaintext it relays, holds no consensus stake, and is slashable for misrouting.
  • ogong-gatewayd, an OpenAI-compatible HTTP front door for consumers. It accepts a model request, matches via the router, and forwards to the selected provider. It can also act as a fiat on-ramp, paying the network in OGONG on a user’s behalf.

See Router & gateway.
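
The match rule described above (filter on tier, model, price, and free capacity, then draw proportionally to stake × reputation) might look roughly like the sketch below. The Offer fields and the route function are hypothetical names, not the router's real types, and how providers with zero stake are weighted is not specified in this section; the sketch simply returns None when no eligible offer has positive weight.

```python
import random
from dataclasses import dataclass

@dataclass
class Offer:
    provider: str
    tier: str
    model: str
    price: int         # price per 1k tokens, atomic units (assumed)
    free_slots: int
    stake: float
    reputation: float

def route(offers, tier, model, max_price, rng=random):
    """Return one eligible offer, drawn with probability proportional to stake * reputation."""
    eligible = [o for o in offers
                if o.tier == tier and o.model == model
                and o.price <= max_price and o.free_slots > 0]
    weights = [o.stake * o.reputation for o in eligible]
    if not eligible or sum(weights) <= 0:
        return None
    return rng.choices(eligible, weights=weights, k=1)[0]
```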

Chain

The Solana program (Anchor) under chain/programs/chain. It holds the staking pool, the validator registry, escrow, the quorum-gated settle instruction (a release needs k registered-validator co-signers), and emission. Makers are attributed on-chain (their royalty slot is reserved but inactive). Inference never touches the chain; only consensus-critical metadata and a hash anchor of each commitment do.

Consumer

Any application and its users. The team’s hosted product is merely one consumer, with no privileged status. A consumer pays a per-request fee held in on-chain escrow, released to the provider only after the reply survives its audit window; on a reject, the consumer is the party refunded. Payment can be in OGONG directly, or via the gateway’s fiat path for zero crypto exposure.

Maker

Not a process but a role. A maker is the author of a model served on the network, identified in the model id ogong/<tier>/<maker>/<model> and recorded on-chain as a royalty payee. The protocol reserves a maker-royalty slot, but it is inactive at launch (deferred to governance, which has to settle who may legitimately claim a model). The fees that do settle go to the provider, router, validator, and a verification reserve. See Tokenomics.

Quickstart

This walks you from an empty machine to a live, end-to-end OGONG mesh running locally: providers committing work, validators auditing it, and quorum-gated settlement on a local Solana test validator. Everything is CLI, no GUI.

If you just want to serve a model with zero network and zero signup, skip to Local mode.

0. Prerequisites

  • Rust (stable) and cargo.
  • Solana CLI + Anchor (for the on-chain program and the local test validator).
  • Node + npm (the mesh provisioning scripts are TypeScript).
  • A model file for whichever modality you’re serving (e.g. a .gguf for text, or the ACE-Step audio model used by the live mesh demo).

1. Build the binaries

From the repository root:

# Validator + verifier (settlement feature enables on-chain settle)
cargo build --release --features settlement \
  -p validator-service --bin ogong-validatord --bin ogong-verifierd

# Router marketplace + consumer gateway
cargo build --release -p ogong-router-service --bin ogong-routerd --bin ogong-gatewayd

# Provider daemon
cargo build --release -p ogong-provider

The resulting binaries land in target/release/. Add that directory to your PATH, or call the binaries by full path.

2. Run the whole mesh with one script

The fastest way to see OGONG work end-to-end is the bundled mesh runbook. From llamamp/chain:

# Stage 1 - registry-driven discovery
bash scripts/run-mesh-stage1.sh

This brings up a local solana-test-validator, deploys the chain program, launches several ogong-validatord instances, stakes and registers each one on-chain with its real QUIC endpoint and cert, and then confirms every node discovered the others purely from the on-chain registry, with no manual peer wiring.

# Stage 2 - quorum settle over real QUIC
bash scripts/run-mesh-stage2.sh

Three validators; one holds the settlement role. A metered release is pushed, the handling node gathers the peers’ co-signatures over QUIC, and submits the on-chain quorum settle. Success is verified by checking that the provider’s fee account balance has increased.

# Stage 3 - audit-gated settle, backed by a real engine
# (terminal 1) bring up the reference engine in commit mode:
LLAMAMP_COMMIT=1 llamamp-audio-server \
  -m ~/models/ace-step-v1-3.5B/ace-full-f16.gguf \
  --vocab ~/models/ace-step-v1-3.5B/vocab.json --port 11436
# (terminal 2) run the capstone:
bash scripts/run-mesh-stage3.sh

Stage 3 is the full thing: a real generation is audited before its escrow releases, and only an Accept settles. See The local mesh for exactly what each stage does.

3. Or wire the pieces by hand

To understand the moving parts, run them individually:

# A validator that audits every reply and dispatches to a verifier
ogong-validatord \
  --bind 0.0.0.0:4533 \
  --alpha 1 \
  --verifier-endpoint 127.0.0.1:4544 \
  --verifier-cert /path/to/verifier.der

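# The verifier, pointed at an independent engine instance
# (--cert-out writes the cert the validator pins with --verifier-cert)
ogong-verifierd \
  --bind 0.0.0.0:4544 \
  --provider-url http://127.0.0.1:11436 \
  --cert-out /path/to/verifier.der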

# The marketplace match engine (writes its cert so a gateway can pin it).
# Its default port, 4544, is already taken by the verifier above, so bind another one.
ogong-routerd --bind 0.0.0.0:4545 --cert-out router.der

# The OpenAI-compatible consumer front door
ogong-gatewayd --bind 0.0.0.0:4546 --router 127.0.0.1:4545 --router-cert router.der

# A provider serving an embedded text model and joining the network
ogong-provider configure \
  --embedded-text /path/to/model.gguf \
  --join-network \
  --validator-endpoint 127.0.0.1:4533
ogong-provider start

4. Call it

Once a gateway is up, talk to the network through any OpenAI-compatible client:

curl http://127.0.0.1:4546/v1/chat/completions \
  -H 'content-type: application/json' \
  -d '{
    "model": "ogong/verified/<maker>/<model>",
    "messages": [{"role":"user","content":"Hello from OGONG"}]
  }'

See the Consumer API for the model-id format and supported endpoints.
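
The same call can be made from code. Below is a minimal sketch using only the Python standard library; it assumes the gateway from the previous step is listening on 127.0.0.1:4546 and returns the usual OpenAI chat-completion response shape. The helper names build_chat_request and chat are illustrative, not part of OGONG.

```python
import json
import urllib.request

GATEWAY = "http://127.0.0.1:4546"  # the gateway --bind address used above

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a POST to the gateway's OpenAI-compatible chat endpoint."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        GATEWAY + "/v1/chat/completions",
        data=body,
        headers={"content-type": "application/json"},
        method="POST",
    )

def chat(model: str, prompt: str) -> str:
    """Send the request and return the assistant text (needs a running gateway)."""
    with urllib.request.urlopen(build_chat_request(model, prompt)) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]
```

Substitute a real model id for the ogong/verified/<maker>/<model> placeholder before calling chat.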

Provider node

A provider serves models from your GPU and earns OGONG for verified work. The daemon is ogong-provider. It is fully headless and ships no GUI.

Configuration lives in ~/.ogong-provider/config.json (written by configure, re-runnable any time). Downloaded models go to ~/.ogong-provider/models/.

Modes at a glance

| Command | What it does | Network | Account |
| --- | --- | --- | --- |
| ogong-provider local | Standalone OpenAI + Ollama server | none | none |
| ogong-provider run | One-shot terminal REPL chat | none | none |
| ogong-provider start | Tunnel client (home GPU behind NAT) | tunnel | API key |
| ogong-provider serve | Direct HTTPS server (TEE / marketplace operator) | direct | marketplace |

Serving an engine

A provider either embeds an engine (spawns and manages it as a subprocess) or adapts to one you already run (Ollama, vLLM, llama-server). Embedded modes per modality:

| Flag | Modality | Engine spawned |
| --- | --- | --- |
| --embedded-text <gguf> | chat + embeddings | llama-server |
| --mmproj <gguf> | vision (with --embedded-text) | adds image input |
| --embedded-image <model> | image gen | llamamp-image-server |
| --embedded-music <gguf> | music | ace-server |
| --embedded-whisper <ggml> | speech-to-text | whisper-server |
| --embedded-tts <gguf> | text-to-speech | audio-server (VibeVoice) |

Modalities compose: you can run text, image, and audio at once, each its own subprocess. For many models on a RAM budget, use --served-models <json> for on-demand LRU loading (engines load lazily and evict under a memory budget) instead of eager-spawning each.

Adapter alternatives (front an existing server): --upstream, --image-upstream, --audio-upstream, --video-upstream.

Joining the network

To earn on OGONG, a provider pushes each signed served-record to a validator (and, when metered, settles on-chain). Network participation is off unless you opt in:

ogong-provider configure \
  --embedded-text ~/.ogong-provider/models/your-model.gguf \
  --machine mac-studio \
  --join-network \
  --validator-endpoint 127.0.0.1:4533
ogong-provider start

  • --join-network - opt into the verified-inference network.
  • --validator-endpoint <host:port> - the ogong-validatord to push signed records to.
  • --machine <name> - short machine id, combined with your user id to form your canonical provider id.

A Solana payout keypair is generated locally on first run. Print your payout address with:

ogong-provider wallet

Tunnel mode (home contributor)

ogong-provider start dials out to a tunnel server over QUIC and serves through it, so a home GPU behind NAT can contribute without exposing a public address. It requires an API key (--api-key, or OGONG_PROVIDER_API_KEY). The tunnel is only the transport: combined with --join-network, a home GPU serves on the Verified tier like any other provider, and its work is committed and audited. (For a purely private server you run only for yourself, with no account and no audit, see Local mode.)

TEE / marketplace operator

ogong-provider serve runs a direct HTTPS server (skips the tunnel) for operators with a public, TEE-attested instance:

ogong-provider serve --listen 0.0.0.0:8443 --cert fullchain.pem --key privkey.pem

For TEE-tier attestation, fetch a DCAP quote bound to your report-data with ogong-provider quote, then submit your identity with ogong-provider marketplace-register. The operator-side admin runs DCAP chain verification before approving.

Inspecting

ogong-provider show     # print config (api key redacted)
ogong-provider wallet   # print Solana payout address
ogong-provider pull --list   # browse the model catalog
ogong-provider pull <name>   # download a model

See the CLI reference for the full flag list.

Advanced serving

Most providers just run ogong-provider local --model ... and never touch a tuning knob. When you’re serving a large model, packing more concurrency onto a GPU, or splitting a model across hardware, there are extra knobs.

ogong-provider tunes its embedded engine (a llama.cpp fork’s llama-server, which it spawns and proxies) mostly through environment variables. For full control of how a model is placed on hardware, you front your own engine with --upstream instead.

It auto-sizes by default

You usually don’t need to set any of this. When the provider spawns the engine, it inspects the model (weight size, KV bytes per token, whether it’s MoE) against your machine’s memory budget (discrete VRAM, or a share of system RAM) and picks a config on its own:

  • if the model fits, it spends the spare memory on concurrent request slots (capped where a single GPU stops scaling),
  • if a MoE model is too big, it turns on the expert cache to stream cold experts,
  • if a dense model is too big, it falls back to mmap / SSD paging so it still runs.

The knobs below are overrides of those automatic choices, for when you want to tune it yourself.
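
The three-way decision above can be written down as a small function. This is only a sketch of the stated rules, not the provider's actual sizing code: the function name plan, the mode labels, and the cap of 16 slots are hypothetical placeholders.

```python
def plan(weights_bytes, kv_bytes_per_slot, budget_bytes, is_moe, max_slots=16):
    """Return (mode, slots) following the three auto-sizing rules described above."""
    spare = budget_bytes - weights_bytes
    if spare >= kv_bytes_per_slot:
        # Model fits: spend spare memory on concurrent slots, up to a cap.
        return ("resident", int(min(max_slots, spare // kv_bytes_per_slot)))
    if is_moe:
        # Oversized MoE: stream cold experts through the expert cache.
        return ("expert-cache", 1)
    # Oversized dense model: fall back to mmap / SSD paging.
    return ("mmap", 1)
```

For example, an 8 GiB model with 1 GiB of KV per slot on a 24 GiB budget would get the capped slot count, while a 40 GiB dense model on the same budget would fall through to the mmap path.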

Two ways to tune

  1. Embedded engine + env knobs. Keep using --embedded-text / local, and set LLAMAMP_* variables to control KV cache, MoE, speculative decoding, and GPU offload.
  2. Front your own engine (--upstream). Launch llama-server (or vLLM) yourself with any flags you like, then point ogong-provider serve (or start / local) at it with --upstream http://127.0.0.1:8080/v1. This is how you do multi-GPU tensor-split and multi-node splits, which the embedded path doesn’t expose directly. ogong-provider still commits and settles exactly the same; the placement is the engine’s concern.

GPU offload

By default the embedded engine offloads all layers to the visible GPU(s). To cap it (for example, partial offload on a small card):

LLAMAMP_NGL=40 ogong-provider local --model my-model.gguf

KV cache: quantize it and size it

The KV cache dominates memory once you serve many concurrent requests. Quantize it to shrink each slot, then spend the savings on a longer context or more slots:

LLAMAMP_CACHE_TYPE_K=q8_0 LLAMAMP_CACHE_TYPE_V=q4_0 LLAMAMP_PARALLEL=16 \
  ogong-provider local --model my-model.gguf --n-ctx 16384

| Knob | Effect |
| --- | --- |
| --n-ctx <n> | per-slot context (default 8192). The engine’s total KV is n-ctx × parallel. |
| LLAMAMP_PARALLEL=<n> | concurrent slots (auto-sized by default) |
| LLAMAMP_CACHE_TYPE_K, LLAMAMP_CACHE_TYPE_V | KV quantization: f16 (default), q8_0, q4_0 |
| LLAMAMP_FLASH_ATTN=on\|off | flash attention (auto by default) |
| LLAMAMP_BATCH_SIZE=<n> | engine batch size |
| LLAMAMP_CACHE_REUSE=<n> | prompt-cache reuse window (default 256) |

Quantized KV needs flash attention on, and is incompatible with tensor-split mode.
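
To see what these knobs buy, it helps to estimate the cache size. The sketch below multiplies the total token capacity (n-ctx × parallel) by the per-token KV footprint. The model shape (32 layers, 8 KV heads, head dimension 128) is a hypothetical example, and the function kv_bytes is illustrative; the bytes-per-element figures follow the ggml block layouts (32 values in 34 bytes for q8_0, 32 values in 18 bytes for q4_0).

```python
# Bytes per cached element for each cache type.
BYTES_PER_ELEM = {"f16": 2.0, "q8_0": 34 / 32, "q4_0": 18 / 32}

def kv_bytes(n_ctx, parallel, n_layers, n_kv_heads, head_dim,
             type_k="f16", type_v="f16"):
    """Estimate total KV cache bytes for n_ctx * parallel cached tokens."""
    per_token = n_layers * n_kv_heads * head_dim * (
        BYTES_PER_ELEM[type_k] + BYTES_PER_ELEM[type_v])
    return int(n_ctx * parallel * per_token)

GiB = 2**30
# Hypothetical model shape, with the settings from the command above.
print(kv_bytes(16384, 16, 32, 8, 128) / GiB)                  # f16 K and V: 32.0
print(kv_bytes(16384, 16, 32, 8, 128, "q8_0", "q4_0") / GiB)  # quantized:   13.0
```

Under these assumptions, quantizing K to q8_0 and V to q4_0 cuts the cache from 32 GiB to 13 GiB for the same 16 slots of 16384 tokens.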

MoE: run a model bigger than your VRAM

A Mixture-of-Experts model can be served even when it doesn’t fit in VRAM by keeping some experts on CPU. Three controls, in increasing order of how much they offload:

  • Partial offload (LLAMAMP_NCMOE=N): keep the experts of the first N MoE layers on CPU. The right knob when a model is only slightly too big.
  • Full offload (the engine’s -cmoe, via a catalog model’s args): all experts on CPU.
  • Expert cache (LLAMAMP_MOE_CACHE_SLOTS): for models well over budget, the OGONG engine streams cold experts through a slot cache, so an oversized MoE keeps running instead of failing to load. It auto-enables when a MoE exceeds the budget.

# partial: keep the first 12 MoE layers' experts on CPU
LLAMAMP_NCMOE=12 ogong-provider local --model big-moe.gguf

# or the streaming expert cache
LLAMAMP_MOE_CACHE_SLOTS=24 ogong-provider local --model big-moe.gguf

Speculative decoding

The engine accelerates generation by drafting tokens ahead and verifying them in a batch. ogong-provider turns it on automatically when it finds a drafter:

  • set LLAMAMP_DRAFT_MODEL=/abs/path/to/drafter.gguf, or
  • let a catalog model pull its own drafter (entries carry a draft_url).

LLAMAMP_DRAFT_MODEL=/models/drafter.gguf ogong-provider local --model gemma-4.gguf

The method defaults to MTP (multi-token prediction). Override it with LLAMAMP_SPEC_TYPE:

| LLAMAMP_SPEC_TYPE | Method | Draft model? |
| --- | --- | --- |
| draft-mtp (default) | multi-token prediction | yes (mtp-*.gguf) |
| draft-simple, draft-eagle | draft-model speculation | yes |
| ngram-simple, ngram-map-k | n-gram lookup | no |

The n-gram methods need no draft model at all, so they work on any model.

Multi-GPU: split a model across GPUs

The embedded engine already spreads a model across all visible GPUs in layer-split mode. For tensor-parallel across GPUs, pass the engine’s split flags through a catalog model’s args array (every flag there is appended verbatim to the engine command):

ogong-provider local --served-models \
  '[{"id":"big","kind":"Text","path":"/models/big.gguf",
     "args":["--split-mode","tensor","--tensor-split","1,1","--flash-attn","on"]}]'

Any flag the engine supports can be set this way, per model. Alternatively, launch your own llama-server with the split flags and front it with ogong-provider serve --upstream http://127.0.0.1:8080/v1 ....

Multi-node: split a model across machines

A single model can be sharded across the GPUs of several machines. Each worker node runs the rpc-server binary (shipped with the provider) to expose its GPU; the provider node lists the workers, and the engine splits the model’s layers across the pool.

# on each worker box, expose its GPU over RPC:
rpc-server --host 0.0.0.0 --port 50052

# on the provider box, point at the workers; the model is sharded across them:
LLAMAMP_RPC_SERVERS="10.0.0.2:50052,10.0.0.3:50052" \
  ogong-provider local --model big.gguf

Trusted networks only: the RPC transport is unauthenticated and unencrypted. Run it only over a private network you control, never the public internet.

You run and point at your own rpc-server nodes. Automatic discovery and a sharding policy (the provider spawning and balancing remote workers for you) are a separate layer still to come; for now this is the manual enable-and-point path.

Your machines vs. the network: the RPC path above shards a model across your own machines, which you run and trust. To serve a model too big for your hardware by joining a cohort of independent providers that each verify and get paid for their slice, see Split inference, the network-level capability.

Validator node

A validator secures the network: it discovers peers from the on-chain registry, receives signed commitment records from providers, audits a sampled fraction of replies, and (when it holds the settlement role) gathers peer co-signatures and submits the on-chain quorum settle. The binary is ogong-validatord.

Build

cargo build --release --features settlement \
  -p validator-service --bin ogong-validatord --bin ogong-verifierd

The settlement feature is what enables on-chain settle; build without it for an audit-/cosign-only node.

Run

ogong-validatord \
  --bind 0.0.0.0:4533 \
  --alpha 1 \
  --verifier-endpoint 127.0.0.1:4544 \
  --verifier-cert /path/to/verifier.der

Key flags

| Flag | Default | Meaning |
| --- | --- | --- |
| --bind <addr> | 0.0.0.0:4533 | UDP address for the QUIC endpoint |
| --alpha <0..1> | 1.0 | Audit coverage. 1 audits every reply; 0 audits none |
| --verifier-endpoint <host:port> | - | Verifier to auto-dispatch audit-selected replies to |
| --verifier-cert <path> | - | Pinned verifier cert (PEM/DER); required with the endpoint |
| --peer <host:port\|cert> | - | Peer validator for the audit beacon; repeatable |
| --s <prob> | 1.0 | Verifier soundness (chance a substitute reply is rejected) |
| --eps <prob> | 0.0 | Verifier false-positive rate (honest reply rejected) |
| --beta <rate> | 0.001 | Target lifetime false-ejection rate (the Ville bound) |
| --consensus | false | Run the shared-ordered-log consensus driver atop quorum-settle |

Without --verifier-endpoint, audited replies await a manual verdict submission instead of auto-dispatch. With no --peer, the node draws its audit beacon solo.

In production the audit randomness comes from the threshold-BLS committee beacon (set up by a dealerless DKG across the registered validators); the per-validator VRF described by --peer is the bootstrap fallback used until that committee key is established.

Discovery - no manual peering

A validator reads the on-chain registry every ~30s (at confirmed commitment) to discover peers. You stake and register the node on-chain with its real endpoint and cert; from then on the mesh finds itself. The --peer flag exists for the audit beacon and for setups without registry discovery.

consensus_id = sha256(cert) ties a registered validator to the cert it presents over QUIC, so peers pin each other by their on-chain-registered certs.
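
As a rough illustration of that pinning rule, the sketch below hashes a certificate and compares it with a registered id. It assumes the hash is taken over the certificate's DER bytes and rendered as hex; the exact encoding used on-chain is not stated here, and the function names are illustrative.

```python
import hashlib

def consensus_id(cert_der: bytes) -> str:
    """sha256 over the certificate bytes, as hex (encoding assumed)."""
    return hashlib.sha256(cert_der).hexdigest()

def pin_matches(presented_cert_der: bytes, registered_id: str) -> bool:
    """True if the cert a peer presents over QUIC matches its on-chain registration."""
    return consensus_id(presented_cert_der) == registered_id
```

A peer presenting any other certificate produces a different digest and fails the pin.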

Settlement environment

When a node holds the settlement role it needs these (the settlement feature reads them):

| Env var | Purpose |
| --- | --- |
| OGONG_VALIDATOR_KEYPAIR | this validator’s keypair |
| OGONG_VALIDATOR_CERT_OUT | where to write its QUIC cert for peer pinning |
| OGONG_PROGRAM_ID | the on-chain program id |
| OGONG_RPC_URL | Solana RPC endpoint |
| OGONG_AUTHORITY_KEYPAIR | settlement authority |
| OGONG_MINT | the OGONG mint |
| OGONG_FEE_OWNERS | fee/payout owners |
| OGONG_QUORUM | k - required validator co-signers (in addition to the authority) |

A release settles only when the authority signature plus k co-signatures are assembled. Cosign-only peers run without the settlement ("sink") environment variables (strip them with env -u if reusing a shell).
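
The settle gate can be summarized as a predicate. This is a sketch of the rule as stated (authority plus k co-signatures from registered validators), not the on-chain program's code; counting each registered co-signer once is an assumption, and quorum_met is an illustrative name.

```python
def quorum_met(authority_signed: bool, cosigners: list[str],
               registry: set[str], k: int) -> bool:
    """Authority signature plus at least k distinct registered co-signers."""
    valid = {c for c in cosigners if c in registry}  # drop unregistered and duplicates
    return authority_signed and len(valid) >= k
```

With OGONG_QUORUM=2 and three registered validators, two peer co-signatures alongside the authority signature would satisfy the gate, while one co-signature repeated twice would not.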

The verifier

ogong-verifierd is the audit muscle a validator dispatches to. It re-runs sampled steps of a committed trajectory on an independent engine and returns a verdict.

ogong-verifierd \
  --bind 0.0.0.0:4544 \
  --provider-url http://127.0.0.1:11436 \
  --k 2 \
  --cert-out verifier.der

| Flag | Default | Meaning |
| --- | --- | --- |
| --bind <addr> | 0.0.0.0:4544 | QUIC bind address |
| --provider-url <url> | - | the engine to re-run the committed work on |
| --ref-url <url> | - | reference model endpoint (when distinct) |
| --audio-engine-url <url> | - | audio engine for diffusion-audio audits |
| --k <n> | 2 | sampled steps per audit |
| --cert-out <path> | - | write the verifier’s pinned cert here |

Run the verifier against a separate engine instance from the provider’s; soundness comes from independent re-execution, not co-location.

See How verification works for the audit theory.

Router & gateway

The router/gateway pair is OGONG’s marketplace match layer. The router maintains a registry of providers and answers match queries; the gateway is the OpenAI-compatible HTTP front door consumers actually call. Both are standalone QUIC binaries in ogong-router-service with no GUI and no on-chain dependency for pure matching.

Build

cargo build --release -p ogong-router-service --bin ogong-routerd --bin ogong-gatewayd

Router - ogong-routerd

The match engine. Providers register themselves (Upsert); consumers (or the gateway) query (Route) for a provider that can serve a given ogong/<tier>/<maker>/<model>. It starts with an empty registry and fills as providers register.

ogong-routerd --bind 0.0.0.0:4544 --cert-out router.der

| Flag | Default | Meaning |
| --- | --- | --- |
| --bind <addr> | 0.0.0.0:4544 | UDP address for the QUIC endpoint |
| --cert-out <path> | - | write the router’s bootstrap cert (DER) so a gateway can pin it |
| --relay | off | put the router on the data path (select and forward) |

Match-only vs relay. By default the router only does matching - consumers Route, then forward the request themselves (this is what ogong-gatewayd does, which lets it read provider response headers). With --relay the router sits on the hot path and forwards bytes to the provider’s endpoint itself. Env equivalents: OGONG_ROUTER_CERT_OUT, OGONG_ROUTER_RELAY.

Gateway - ogong-gatewayd

The consumer front door. Serves an OpenAI-compatible HTTP API, matches each request through the router, and forwards it to the selected provider.

ogong-gatewayd \
  --bind 0.0.0.0:4546 \
  --router 127.0.0.1:4544 \
  --router-cert router.der \
  --max-price 1000000

| Flag | Default | Meaning |
| --- | --- | --- |
| --bind <addr> | 0.0.0.0:4546 | TCP address for the HTTP API |
| --router <addr> | 127.0.0.1:4544 | the router’s QUIC address |
| --router-cert <path> | - | router bootstrap cert to pin (from routerd --cert-out) |
| --max-price <u64> | u64::MAX | budget ceiling per 1k tokens (atomic OGONG units) |

Env equivalents: OGONG_ROUTER, OGONG_ROUTER_CERT, OGONG_GATEWAY_MAX_PRICE.

End to end

# 1. router (writes its cert)
ogong-routerd --bind 0.0.0.0:4544 --cert-out router.der &

# 2. gateway (pins that cert)
ogong-gatewayd --bind 0.0.0.0:4546 --router 127.0.0.1:4544 --router-cert router.der &

# 3. a provider registers with the router (see Provider node)
# 4. call the gateway with any OpenAI client
curl http://127.0.0.1:4546/v1/chat/completions \
  -H 'content-type: application/json' \
  -d '{"model":"ogong/verified/<maker>/<model>","messages":[{"role":"user","content":"hi"}]}'

See the Consumer API for the model-id format and endpoints.

Local mode (no network)

ogong-provider local runs a standalone inference server on your own machine with no OGONG network, no tunnel, and no account. It serves an OpenAI-compatible API and an Ollama-compatible API on the same port, so existing tools point at it unchanged. This is the zero-friction on-ramp, and a drop-in local runner in its own right.

Serve a model

# Serve one model (downloads on demand into ~/.ogong-provider/models if a catalog name)
ogong-provider local --model ~/.ogong-provider/models/llama-3.2-3b-instruct.gguf

# Serve several at once, each routable by id
ogong-provider local \
  --model llama-3.2-3b-instruct \
  --model qwen2.5-7b-instruct

| Flag | Default | Meaning |
| --- | --- | --- |
| --model <path-or-name> | - | a .gguf path or a catalog name; repeatable |
| --mmproj <gguf> | - | multimodal projector for a single vision model |
| --n-ctx <n> | 8192 | context size |
| --listen <addr> | 127.0.0.1:11434 | bind address (Ollama’s port by default) |
| --upstream <url> | - | adapter mode: forward to an existing OpenAI server instead of spawning one (mutually exclusive with --model) |
| --served-models <json> | - | curated on-demand (LRU) set covering every modality; takes precedence over --model |

Because it binds Ollama’s default port (127.0.0.1:11434) and speaks Ollama’s API, anything configured for Ollama works against it with no changes.

Use it

# OpenAI-style
curl http://127.0.0.1:11434/v1/chat/completions \
  -H 'content-type: application/json' \
  -d '{"model":"llama-3.2-3b-instruct","messages":[{"role":"user","content":"hi"}]}'

# Ollama-style
curl http://127.0.0.1:11434/api/chat \
  -d '{"model":"llama-3.2-3b-instruct","messages":[{"role":"user","content":"hi"}]}'

One-shot terminal chat

For a quick REPL without even setting up a server:

ogong-provider run llama-3.2-3b-instruct --system "You are concise."

Spawns the engine and streams replies to prompts read from stdin. Ctrl-D / Ctrl-C to quit. No account, no config.

Managing models

ogong-provider pull --list        # browse the catalog
ogong-provider pull <name|url>    # download into ~/.ogong-provider/models

When you’re ready to contribute to the network, the same binary joins it; see Provider node.

The local mesh

The repository ships a runnable, three-stage local mesh that brings up OGONG’s verification spine on an ephemeral solana-test-validator: independent ogong-validatord processes discover each other from the on-chain registry and co-sign settlements over real QUIC. It is a local, ephemeral test network you run on your own machine, the fastest way to watch the whole spine work end to end before pointing the same processes at a live network.

The scripts live in llamamp/chain/scripts/ and are documented in llamamp/chain/scripts/MESH.md. Run them from llamamp/chain.

Components

  • chain program (programs/chain) - staking pool, validator registry, escrow, and the quorum-gated settle (a release needs k registered-validator co-signers).
  • validatord (ogong-validatord, built --features settlement) - generates its own cert, reads the on-chain registry every ~30s to discover peers, and (when it holds the settlement sink) settles a release by gathering peer co-signatures over QUIC.
  • verifierd (ogong-verifierd) - re-runs a committed trajectory to score an audit (Stage 3).

Stage 1 - registry-driven discovery

bash scripts/run-mesh-stage1.sh

Chain up → N validatords (each writes its cert) → a provisioning script stakes and registers each one with its real endpoint and cert (consensus_id = sha256(cert)) → after one registry refresh, a discovery check shows every node found the others purely from the chain. No manual peer config.

Stage 2 - quorum settle over real QUIC

bash scripts/run-mesh-stage2.sh

Three validatords: val0 is the handling node (holds the settlement sink, OGONG_QUORUM=2 plus the authority/mint/fee-owner env); val1/val2 are cosign-only peers (sink env stripped). A driver pushes a metered RELEASE to the peers first (each holds its reply so it will co-sign), then to the handling node, which fires its sink, gathers the two peers’ co-signatures over real QUIC, assembles authority + 2 co-sigs, and submits the on-chain quorum settle. Verified by the provider’s fee account balance equalling the gross.

Stage 3 - audit-gated settle (fully live, engine-backed)

# 1. bring up the reference engine (ACE-Step) in commit mode:
LLAMAMP_COMMIT=1 llamamp-audio-server \
  -m ~/models/ace-step-v1-3.5B/ace-full-f16.gguf \
  --vocab ~/models/ace-step-v1-3.5B/vocab.json --port 11436
# 2. run the capstone:
bash scripts/run-mesh-stage3.sh

The capstone: a real generation is audited before its escrow releases, and only an Accept settles. A verifierd points at the engine; three validatords register on-chain where val0 audits every reply (--alpha 1) via the verifierd and holds the settlement sink (quorum 2), and val1/val2 are cosign-only (--alpha 0). A driver generates a real trajectory on the engine, commits its true root, and pushes the record. val0 audit-selects it, dispatches to the verifierd (which re-runs sampled denoising steps on the engine and scores the drift), adjudicates Accept, and only then gathers val1/val2’s co-signatures over QUIC and submits the on-chain quorum settle.

When it works, the reply audits Accept, the mesh quorum-settles it on-chain, and the provider’s fee account ends up equal to the gross.

Notes

  • --alpha 0 = no audit (immediate unaudited release), used in Stage 2 to isolate the settlement mesh. --alpha 1 + --verifier-endpoint/--verifier-cert wires the audit path.
  • The engine serves generic provider endpoints /v1/trajectory/:id (the committed dump) and /v1/replay/:id (the original request) so a stock verifierd audits it directly - no in-process bridge. Co-locating the verify-engine with the provider is a demo convenience; soundness uses a separate engine instance (--audio-engine-url).
  • Both the registry reader and the settlement client use confirmed commitment, so a settling node sees just-confirmed registrations/accounts without waiting for finalization.
  • examples/traj_audit_loop is the in-process, CI-friendly equivalent - no chain or live engine required.

CLI reference

Every OGONG role is a standalone binary. All flags are exposed via --help on each binary; this page is a curated index of the ones that matter. Defaults shown are the binaries’ built-in defaults.

ogong-validatord

Validator / audit node (QUIC).

| Flag | Default | Meaning |
| --- | --- | --- |
| --bind <addr> | 0.0.0.0:4533 | QUIC bind address |
| --alpha <0..1> | 1.0 | audit coverage (1 = audit every reply) |
| --s <prob> | 1.0 | verifier soundness |
| --eps <prob> | 0.0 | verifier false-positive rate |
| --beta <rate> | 0.001 | target lifetime false-ejection rate (Ville bound) |
| --peer <host:port\|cert> | - | audit-beacon peer (repeatable) |
| --verifier-endpoint <host:port> | - | verifier to auto-dispatch audits to |
| --verifier-cert <path> | - | pinned verifier cert (required with endpoint) |
| --consensus | false | shared-ordered-log consensus driver |

Settlement env: OGONG_VALIDATOR_KEYPAIR, OGONG_VALIDATOR_CERT_OUT, OGONG_PROGRAM_ID, OGONG_RPC_URL, OGONG_AUTHORITY_KEYPAIR, OGONG_MINT, OGONG_FEE_OWNERS, OGONG_QUORUM.

ogong-verifierd

Audit verifier: re-runs committed work on an independent engine.

| Flag | Default | Meaning |
| --- | --- | --- |
| --bind <addr> | 0.0.0.0:4544 | QUIC bind address |
| --provider-url <url> | - | engine to re-run the committed work on |
| --ref-url <url> | - | reference model endpoint (when distinct) |
| --audio-engine-url <url> | - | audio engine for diffusion-audio audits |
| --k <n> | 2 | sampled steps per audit |
| --cert-out <path> | - | write the verifier’s pinned cert |

ogong-routerd

Marketplace match engine (QUIC).

| Flag | Default | Meaning |
| --- | --- | --- |
| --bind <addr> | 0.0.0.0:4544 | QUIC bind address |
| --cert-out <path> | - | write bootstrap cert (DER) for gateways to pin |
| --relay | off | put the router on the data path |

Env: OGONG_ROUTER_CERT_OUT, OGONG_ROUTER_RELAY.

ogong-gatewayd

OpenAI-compatible consumer front door.

| Flag | Default | Meaning |
| --- | --- | --- |
| --bind <addr> | 0.0.0.0:4546 | HTTP API bind address |
| --router <addr> | 127.0.0.1:4544 | router’s QUIC address |
| --router-cert <path> | - | router bootstrap cert to pin |
| --max-price <u64> | u64::MAX | budget ceiling per 1k tokens (atomic OGONG units) |

Env: OGONG_ROUTER, OGONG_ROUTER_CERT, OGONG_GATEWAY_MAX_PRICE.

ogong-provider

Provider daemon. Subcommands:

| Subcommand | Purpose |
| --- | --- |
| configure | write/update ~/.ogong-provider/config.json |
| start | tunnel client (home GPU behind NAT) |
| serve | direct HTTPS server (TEE / marketplace) |
| local | standalone OpenAI + Ollama server (no network, no account) |
| run | one-shot terminal REPL chat |
| pull | download a model (--list to browse the catalog) |
| show | print config (api key redacted) |
| wallet | print Solana payout address |
| quote | fetch a DCAP attestation quote (TDX) |
| marketplace-register | submit identity to a marketplace operator |

Selected configure flags:

| Flag | Meaning |
| --- | --- |
| --api-key | account API key (env OGONG_PROVIDER_API_KEY) |
| --upstream <url> | adapter mode: forward to an existing OpenAI server |
| --embedded-text <gguf> | spawn llama-server for this model |
| --mmproj <gguf> | vision projector for the embedded text model |
| --embedded-image <model> | spawn llamamp-image-server |
| --embedded-music <gguf> | spawn ace-server |
| --embedded-whisper <ggml> | spawn whisper-server (STT) |
| --embedded-tts <gguf> | spawn the audio-server (TTS) |
| --served-models <json> | on-demand LRU multi-model serving set |
| --machine <name> | short machine id for your canonical provider id |
| --join-network | opt into the verified-inference network |
| --validator-endpoint <host:port> | validatord to push signed records to |
| --listen / --cert / --key | bind + TLS for serve |

local flags: --model (repeatable), --mmproj, --n-ctx, --listen, --upstream, --served-models.

Consumer API

Consumers reach the network through ogong-gatewayd, which speaks an OpenAI-compatible HTTP API. Any OpenAI client library works; point its base URL at the gateway.

Model id format

OGONG model ids encode the trust tier, the maker, and the model:

ogong/<tier>/<maker>/<model>

  • <tier> - verified or tee (see Trust tiers).
  • <maker> - the model author; attributed on-chain (royalty slot reserved, inactive at launch).
  • <model> - the model name.

Example: ogong/verified/<maker>/llama-3.3-70b.

The gateway parses the id, asks the router for a provider that can serve that tier + model, and forwards the request.
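
A client-side check of the id format might look like the sketch below. It assumes exactly four slash-separated, non-empty parts and the two tiers listed above; whether a model name may itself contain a slash is not stated, so this sketch rejects it. ModelId and parse_model_id are illustrative names, and "acme" in the usage line is a made-up maker.

```python
from typing import NamedTuple

TIERS = {"verified", "tee"}

class ModelId(NamedTuple):
    tier: str
    maker: str
    model: str

def parse_model_id(s: str) -> ModelId:
    """Split ogong/<tier>/<maker>/<model> into its parts, validating the tier."""
    parts = s.split("/")
    if len(parts) != 4 or parts[0] != "ogong" or not all(parts[1:]):
        raise ValueError(f"expected ogong/<tier>/<maker>/<model>, got {s!r}")
    tier, maker, model = parts[1:]
    if tier not in TIERS:
        raise ValueError(f"unknown tier {tier!r}")
    return ModelId(tier, maker, model)

print(parse_model_id("ogong/verified/acme/llama-3.3-70b"))
```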

Chat completions

curl http://127.0.0.1:4546/v1/chat/completions \
  -H 'content-type: application/json' \
  -d '{
    "model": "ogong/verified/<maker>/<model>",
    "messages": [
      {"role": "system", "content": "You are helpful."},
      {"role": "user", "content": "Hello from OGONG"}
    ],
    "stream": true
  }'

Streaming uses standard OpenAI SSE framing.
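
For clients that read the stream by hand, the sketch below extracts text deltas from OpenAI-style SSE lines: each event is a "data: <json>" line and the stream ends with "data: [DONE]". It assumes the gateway passes the standard chunk shape through unchanged; iter_deltas is an illustrative helper, not part of OGONG.

```python
import json

def iter_deltas(lines):
    """Yield content fragments from an iterable of decoded SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue                      # skip blank separators and comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break                         # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

Joining the yielded fragments reconstructs the full reply text.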

Other modalities

Providers can serve image, audio (music / TTS / STT), and video. The corresponding OpenAI-style endpoints are forwarded to a provider that serves that modality:

| Endpoint | Modality |
|---|---|
| /v1/chat/completions | text (and vision input) |
| /v1/embeddings | embeddings |
| /v1/images/* | image generation |
| /v1/audio/music | music generation |
| /v1/audio/speech | text-to-speech |
| /v1/audio/transcriptions | speech-to-text |
| /v1/videos/* | video generation |

Availability depends on what providers in the marketplace are serving for the requested tier.

Pricing

The gateway enforces a budget ceiling per 1k tokens via --max-price (atomic OGONG units). Requests that would exceed it are rejected at match time. Settlement of paid work happens on-chain after a passing audit and validator quorum; see How verification works.
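
A rough sketch of the ceiling check, under stated assumptions: provider offers are modelled as a plain mapping from a provider id to its price per 1k tokens in atomic units, the ceiling is treated as inclusive, and cost is rounded up. None of these details are confirmed by this page, and the function names are hypothetical.

```python
def eligible_offers(offers: dict[str, int], max_price: int) -> dict[str, int]:
    """Keep only offers priced at or under the per-1k-token ceiling."""
    return {pid: price for pid, price in offers.items() if price <= max_price}


def estimated_cost(tokens: int, price_per_1k: int) -> int:
    """Cost in atomic units for a token count, rounded up (assumed rounding)."""
    return (tokens * price_per_1k + 999) // 1000
```

If eligible_offers returns an empty mapping, there is nothing to match against and the request would be rejected before any work is done.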

Glossary

α (alpha), a validator’s audit coverage in [0,1]. --alpha 1 audits every reply; --alpha 0 audits none. The design target is full coverage, affordable because checking is cheap.
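
One way to picture coverage is as a threshold on a per-reply random draw. The sketch below is an assumption for illustration only: it derives the draw by hashing some unpredictable randomness (for example a beacon or VRF output) together with a reply id, which is not necessarily how ogong-validatord does it.

```python
import hashlib


def should_audit(randomness: bytes, reply_id: bytes, alpha: float) -> bool:
    """Audit a reply when its 64-bit draw falls below alpha * 2**64."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    digest = hashlib.sha256(randomness + reply_id).digest()
    draw = int.from_bytes(digest[:8], "big")  # integer in [0, 2**64)
    return draw < int(alpha * 2**64)
```

With alpha 1 the threshold is 2**64, so every draw passes; with alpha 0 the threshold is 0, so none does.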

β (beta), target lifetime false-ejection rate (the Ville bound) used by the SPRT when deciding to eject a provider.

Attestation, a hardware-signed proof of what code and model an enclave is running; the trust root of the Confidential (TEE) tier. See DCAP quote.

Commitment, a verifiable digest a provider emits over its work: a Merkle tree with root commit_root, whose leaves each bind a hidden-state SRP sketch and a top-k logprob digest for one 32-token window. It lets a verifier re-check a sampled slice cheaply.

commit_root, the Merkle root of a reply’s per-window commitment leaves, signed into the provider’s record.
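
To illustrate why a Merkle root lets a verifier check one window without seeing the others, here is a minimal sketch. The leaf encoding, the 0x00/0x01 domain tags, and the duplicate-last-node padding are assumptions chosen for the example; the real OGONG encoding is not specified on this page.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def window_leaf(sketch_digest: bytes, logprob_digest: bytes) -> bytes:
    """Bind one window's SRP-sketch digest and logprob digest (assumed layout)."""
    return _h(b"\x00" + sketch_digest + logprob_digest)


def _node(left: bytes, right: bytes) -> bytes:
    return _h(b"\x01" + left + right)


def _padded(level: list[bytes]) -> list[bytes]:
    # Duplicate the last node when the level has an odd length.
    return level + [level[-1]] if len(level) % 2 else list(level)


def merkle_root(leaves: list[bytes]) -> bytes:
    if not leaves:
        raise ValueError("need at least one leaf")
    level = list(leaves)
    while len(level) > 1:
        level = _padded(level)
        level = [_node(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def merkle_proof(leaves: list[bytes], index: int) -> list[bytes]:
    """Sibling hashes from the leaf at `index` up to the root."""
    proof = []
    level = list(leaves)
    while len(level) > 1:
        level = _padded(level)
        proof.append(level[index ^ 1])
        level = [_node(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof


def verify_proof(leaf: bytes, index: int, proof: list[bytes], root: bytes) -> bool:
    acc = leaf
    for sibling in proof:
        acc = _node(acc, sibling) if index % 2 == 0 else _node(sibling, acc)
        index //= 2
    return acc == root
```

A verifier that recomputes the leaf for one sampled window needs only that leaf, its index, and a proof of logarithmic length to compare against the signed root.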

consensus_id, sha256(cert); ties a registered validator to the QUIC cert it presents, so peers pin one another by their on-chain-registered certs.
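
As a small illustration of the pinning check, assuming the hash is taken over the certificate's DER bytes and compared as lowercase hex (both assumptions; the page only states sha256(cert)):

```python
import hashlib
import hmac


def consensus_id(cert_der: bytes) -> str:
    """sha256 over the certificate bytes, hex-encoded."""
    return hashlib.sha256(cert_der).hexdigest()


def pin_matches(presented_cert_der: bytes, registered_id: str) -> bool:
    """True when the presented cert hashes to the registered consensus_id."""
    return hmac.compare_digest(consensus_id(presented_cert_der), registered_id.lower())
```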

DCAP quote, a hardware-signed attestation (Intel TDX) proving what code/model is running inside a TEE; the trust root for the Confidential tier.

Escrow, on-chain account holding a consumer’s funds until a release is audited and quorum-settled. On a reject the funds are refunded to the consumer.

Gateway, ogong-gatewayd; the OpenAI-compatible HTTP front door for consumers, and an optional fiat on-ramp.

Honeypot audit, a planted audit carrying a known-bad output, indistinguishable from a real one; a verifier that passes it (rubber-stamping accept) is itself slashed.

Hybrid PQ signature, the provider record is signed with Ed25519 and ML-DSA-44, so it survives the future break of either scheme.

KS test (Kolmogorov–Smirnov), the sup-norm distance between the committed and recomputed top-k logprob distributions; catches a localized probability shift that an averaged TV would dilute. Reject threshold ≈ 0.10.
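
A sketch of the statistic for a single decode position, under assumptions this page does not settle: each side is a mapping from token id to logprob, tokens are ordered by id to build the cumulative sums, and a token missing from one side counts as probability 0.

```python
import math

KS_REJECT = 0.10  # approximate reject threshold quoted in this entry


def ks_distance(committed: dict[int, float], recomputed: dict[int, float]) -> float:
    """Largest gap between the two cumulative top-k distributions."""
    cdf_c = cdf_r = 0.0
    worst = 0.0
    for token in sorted(set(committed) | set(recomputed)):
        if token in committed:
            cdf_c += math.exp(committed[token])
        if token in recomputed:
            cdf_r += math.exp(recomputed[token])
        worst = max(worst, abs(cdf_c - cdf_r))
    return worst
```

For example, moving 0.2 of probability mass from one token to its neighbour gives a distance of 0.2 at that position, which is above the reject line.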

LOGIC, the logprob-commitment primitive: top-k logprob digests at every decode position. The cheap first check, using values the engine already exposes.

Maker, the author of a model, identified in ogong/<tier>/<maker>/<model> and attributed on-chain. A royalty slot is reserved but inactive at launch (deferred to governance).

model_root, a SHA-256 over the model’s ordered shard content hashes; binds a commitment to a specific model identity (quantization included implicitly).
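
A minimal sketch of the construction, assuming each shard is hashed with SHA-256 and the raw 32-byte digests are concatenated in shard order before the final hash. The exact byte layout is an assumption.

```python
import hashlib


def shard_hash(shard_bytes: bytes) -> bytes:
    return hashlib.sha256(shard_bytes).digest()


def model_root(shard_hashes: list[bytes]) -> str:
    """SHA-256 over the shard content hashes, taken in order."""
    outer = hashlib.sha256()
    for digest in shard_hashes:
        outer.update(digest)
    return outer.hexdigest()
```

Because the shard bytes differ between quantizations of the same model, so do the shard hashes and therefore the root, which is how quantization ends up bound implicitly.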

Provider, a node serving inference from a GPU; the daemon is ogong-provider. Risks no correctness bond and doubles as a verifier for peers serving the same model.

Quorum settle, the on-chain settlement that releases escrow; requires co-signatures from a stake-weighted supermajority of validators (more than two-thirds of stake).
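
The supermajority rule can be written with integer arithmetic so that exactly two-thirds does not pass. The data shapes and the name has_quorum are illustrative assumptions.

```python
def has_quorum(stake: dict[str, int], cosigners: set[str]) -> bool:
    """True when co-signing validators hold strictly more than 2/3 of stake."""
    total = sum(stake.values())
    signed = sum(stake[v] for v in cosigners if v in stake)
    return total > 0 and 3 * signed > 2 * total
```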

Reputation, a per-provider score (with stake) that weights how much work the router routes to it.

ρ (rho), the ratio of verification cost to generation cost. Measured at ≈ 1% on datacenter GPUs (~100x cheaper than generation) and ≈ 5% on Apple Silicon (~20x). This is what makes full-coverage auditing affordable.
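
The arithmetic behind those figures, as a small sketch. The overhead formula assumes each audited reply is re-checked once by a single verifier, which is a simplification introduced here.

```python
def cheaper_by(rho: float) -> float:
    """How many times cheaper verification is than generation."""
    return 1.0 / rho


def audit_overhead(alpha: float, rho: float) -> float:
    """Verification cost as a fraction of generation cost at coverage alpha."""
    return alpha * rho
```

At full coverage (alpha = 1) on datacenter GPUs this gives an overhead of about 1% of generation cost, and about 5% on Apple Silicon.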

Router, ogong-routerd; the marketplace match engine, an attested enclave that draws a provider proportionally to stake × reputation and is slashable for misrouting.
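
A weighted draw of this kind can be sketched with the standard library. Here random.Random stands in for whatever randomness source the router actually uses, and the candidate shape (stake, reputation) is an assumption for the example.

```python
import random


def draw_provider(candidates: dict[str, tuple[int, float]], rng: random.Random) -> str:
    """Pick one provider id with probability proportional to stake * reputation."""
    ids = sorted(candidates)  # fixed order so a seeded draw is reproducible
    weights = [candidates[pid][0] * candidates[pid][1] for pid in ids]
    if not ids or sum(weights) <= 0:
        raise ValueError("no candidate with positive weight")
    return rng.choices(ids, weights=weights, k=1)[0]
```

A provider with three times the weight of another should win roughly three times as often, and one with zero reputation is never drawn.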

Score mode, the engine path that returns per-token logprobs/hidden states from a single teacher-forced prefill without generating (what makes the audit cheap). It is the default verification path (capability-detected, with a fallback).

Settlement sink, the role/env that lets a validator submit the on-chain settle. Only the handling validator holds it; peers are cosign-only.

SPRT, sequential probability ratio test; accumulates per-reply verdicts into a running decision so a persistent cheater is ejected quickly while honest noise rarely is (bounded by β).
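
A one-sided version of the test can be sketched as below. The per-reply reject rates p0 (honest) and p1 (cheater) are hypothetical parameters, not values from this page. Ejecting when the likelihood ratio reaches 1/β keeps the lifetime false-ejection probability of an honest provider at most β, by Ville's inequality.

```python
import math


class EjectionSprt:
    """Running log-likelihood ratio of 'cheater' vs 'honest' over audit verdicts."""

    def __init__(self, p0: float, p1: float, beta: float):
        if not 0.0 < p0 < p1 < 1.0:
            raise ValueError("need 0 < p0 < p1 < 1")
        if not 0.0 < beta < 1.0:
            raise ValueError("need 0 < beta < 1")
        self.step_reject = math.log(p1 / p0)              # evidence from a failed audit
        self.step_accept = math.log((1 - p1) / (1 - p0))  # negative: a pass lowers suspicion
        self.threshold = math.log(1.0 / beta)
        self.llr = 0.0
        self.ejected = False

    def observe(self, rejected: bool) -> bool:
        """Fold in one verdict; return True once the provider is ejected."""
        if not self.ejected:
            self.llr += self.step_reject if rejected else self.step_accept
            self.ejected = self.llr >= self.threshold
        return self.ejected
```

With p0 = 0.01, p1 = 0.5 and β = 1e-6, four consecutive failed audits cross the threshold, while a long run of passes only moves the ratio further from it.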

SRP sketch (sign-random-projection), the hidden-state commitment: the activations projected onto a fixed, public bank of random ±1 directions. Well-conditioned and not the provider’s to choose, so a substitute model can’t hide in a hand-picked subspace. Replaces the older provider-chosen magnitude-top-k scheme. Reject threshold (relative-L2) ≈ 0.10.
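
A toy version of the sketch and its comparison, in pure Python. The seed-derived bank, the number of directions, and the choice of the recomputed sketch as the denominator are all assumptions made for the example.

```python
import math
import random

SRP_REJECT = 0.10  # approximate relative-L2 reject threshold quoted above


def direction_bank(seed: int, n_dirs: int, dim: int) -> list[list[int]]:
    """A fixed, public bank of +1/-1 directions derived from a shared seed."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(dim)] for _ in range(n_dirs)]


def srp_sketch(hidden: list[float], bank: list[list[int]]) -> list[float]:
    """Project the hidden state onto every direction in the bank."""
    return [sum(d * h for d, h in zip(row, hidden)) for row in bank]


def relative_l2(committed: list[float], recomputed: list[float]) -> float:
    """||committed - recomputed|| / ||recomputed||."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(committed, recomputed)))
    den = math.sqrt(sum(b * b for b in recomputed))
    if den == 0.0:
        return 0.0 if num == 0.0 else math.inf
    return num / den
```

Because both sides use the same public bank, the provider cannot pick directions along which a substitute model happens to agree with the real one.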

Stake, OGONG locked by an operator to buy priority and availability weighting. It is not a slashable correctness bond.

Teacher-forced verification, the audit method: the verifier runs one forward pass over (prompt + claimed output) and reads the model’s hidden states and logprobs off that pass, instead of re-generating. The source of ρ ≈ 1%.

TEE (Confidential tier), Trusted Execution Environment; the verifiably-private tier where the operator can’t read your prompt.

Threshold-BLS beacon, the committee randomness source for audit selection. Validators share one BLS key (via a dealerless DKG); each epoch’s beacon is the unique threshold signature over it, so no coalition can grind or steer the draw and withholding can’t move it. Anyone verifies it against the group public key. A drand-style construction; it closes the “watch then decide” attack.

TOPLOC, the hidden-state-commitment primitive (implemented as the SRP sketch); a stronger check than logprobs alone because it pins internal activations, which distillation can’t fake.

Total-variation (TV) distance, the distance between committed and recomputed top-k logprob distributions; a companion signal to KS (honest ~0.01, a quant cheat ~0.05). The logprob reject line itself is KS ≈ 0.10.
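
For one decode position, TV distance is half the summed absolute difference in probability. The sketch below assumes token-id-to-logprob mappings and treats a missing token as probability 0; averaging across positions is shown only to illustrate how a single shifted position gets diluted.

```python
import math


def tv_distance(committed: dict[int, float], recomputed: dict[int, float]) -> float:
    """0.5 * sum over tokens of |p_committed - p_recomputed|."""
    total = 0.0
    for token in set(committed) | set(recomputed):
        p = math.exp(committed[token]) if token in committed else 0.0
        q = math.exp(recomputed[token]) if token in recomputed else 0.0
        total += abs(p - q)
    return 0.5 * total


def mean_tv(pairs: list[tuple[dict[int, float], dict[int, float]]]) -> float:
    """Average TV over a list of (committed, recomputed) positions."""
    return sum(tv_distance(c, r) for c, r in pairs) / len(pairs)
```

One position with TV 0.2 among twenty otherwise identical positions averages out to 0.01, which looks like honest noise; this is the dilution that the KS check is there to catch.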

Trajectory, the recorded sequence of a generation (token windows, or sampled denoising steps for diffusion) that a verifier re-checks during an audit.

Validator, ogong-validatord; an attested CPU enclave (no GPU) that audit-selects work, adjudicates verifier scores, co-signs settlement, and posts the only slashable bond in the system.

Verifier, ogong-verifierd; a provider GPU in audit duty that teacher-forces the claimed output on an independent engine and returns Accept/Reject, paid a flat fee per audit.

VRF (verifiable random function), the per-validator audit-selection primitive, now the bootstrap fallback to the threshold-BLS beacon. Keeps audit selection unpredictable yet verifiable, so a provider can’t tell which replies are checked.