The Engine — 60/40 deterministic archetype scoring

A large language model can read a bio and a post history and tell you, in plausible prose, that someone is a Champion or a Builder. The output reads well. It also changes between runs, sometimes for the same input, often for inputs that differ in ways the prompt was never told mattered. As a result, two operators looking at the same account can receive different classifications and disagree about what to do next.

For a feature that classifies who gets activated, who gets ignored, and who gets paid, that level of stochasticity is unacceptable. CommunityOS is built on the opposite premise: the scoring engine is deterministic. Given the same input, the same parameters, and the same model version, the engine produces the same archetype score every time. That property is not a marketing claim. It is what allows every downstream surface — the queue, the report, the rewards — to be defensible.
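One way to make that property checkable is to fingerprint each run. The sketch below is illustrative only: `run_fingerprint` and its field names are assumptions, not part of the published engine. It hashes the candidate input, the parameters, and the engine version into a single digest, so two runs that claim the same conditions can be compared by their fingerprints.

```python
import hashlib
import json


def run_fingerprint(candidate: dict, params: dict, engine_version: str) -> str:
    """Return a stable SHA-256 hex digest for one scoring run (hypothetical helper)."""
    # sort_keys makes the serialization independent of dict insertion order,
    # so logically identical inputs always hash to the same value.
    payload = json.dumps(
        {"candidate": candidate, "params": params, "engine_version": engine_version},
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Storing the fingerprint next to each score gives a reviewer a cheap test: same fingerprint, different score means something in the pipeline is not deterministic.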

What the engine is not

Not a wrapper around GPT, Claude, Gemini, or any open-weights model.
Not a vector-similarity search across cached embeddings.
Not a black box that produces an answer without an audit trail.

What the engine is

A rules-based scoring pipeline with explicit linguistic features and explicit metric normalizations.
A 60/40 weighted combination — 60% linguistic substance, 40% normalized vanity metrics.
A pre-stage Bot-Kill filter that removes farmers, low-signal accounts, and obvious automation before anything else runs.
An auditable score for every person, every time, every run.
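The 60/40 combination can be sketched in a few lines. This is a minimal illustration that assumes both components have already been scaled to the range 0 to 1; the function name and the range check are assumptions, and only the 0.6 and 0.4 weights come from the description above.

```python
LINGUISTIC_WEIGHT = 0.6  # share given to linguistic substance
VANITY_WEIGHT = 0.4      # share given to normalized vanity metrics


def combine_60_40(linguistic: float, vanity: float) -> float:
    """Weighted sum of two component scores, each assumed to lie in [0, 1]."""
    for name, value in (("linguistic", linguistic), ("vanity", vanity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} score must be in [0, 1], got {value}")
    return LINGUISTIC_WEIGHT * linguistic + VANITY_WEIGHT * vanity
```

Under these assumptions, an account with deep language and modest reach (0.9 linguistic, 0.2 vanity) scores about 0.62, while the mirror image (0.2 linguistic, 0.9 vanity) scores about 0.48.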

The 60% — linguistic features

Three families of signal, each contributing to the linguistic component of the score:

Topical depth. How much of the person’s recent activity is substantively about the project’s subject area. A protocol’s real Champions are not generalists.
Voice substance. Whether the person’s posts contain original opinion, analysis, or technical content versus retweets and short reactions.
Engagement quality. Whether their replies and quotes generate substantive responses — a proxy for whether the audience treats them as a source.
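To make one of these families concrete, here is a deliberately simple stand-in for voice substance: the share of a person's posts that are original and longer than a short reaction. The `Post` type, the `min_words` threshold, and the rule itself are assumptions for illustration; the engine's actual linguistic features are not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    text: str
    is_repost: bool = False  # True for retweets and other pure reshares


def voice_substance(posts: list[Post], min_words: int = 12) -> float:
    """Fraction of posts that are original and at least min_words long."""
    if not posts:
        return 0.0
    substantive = sum(
        1 for p in posts if not p.is_repost and len(p.text.split()) >= min_words
    )
    return substantive / len(posts)
```

Because the rule is explicit, anyone who disputes a score can point at the exact posts that counted and the exact threshold that excluded the rest.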

The 40% — normalized vanity metrics

Vanity metrics are not useless. They are misleading when used as the only signal. The engine normalizes them in two ways:

Audience-relative. A 10,000-follower account that gets 200 engagements per post is reaching 2% of its audience. A 1M-follower account that gets the same 200 engagements is reaching 0.02%. The smaller account is the stronger signal, because the ratio matters more than the raw count.
Recency-weighted. Engagement from the last 30 days is weighted higher than engagement from a viral post two years ago. The engine cares about current standing, not historical peak.
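Both normalizations can be sketched together. The code below is an assumption-laden illustration: the exponential decay and the 30-day half-life are choices made for the example (the text above only says that the last 30 days count for more), and the function names are hypothetical.

```python
def engagement_rate(engagements_per_post: float, followers: int) -> float:
    """Audience-relative signal: engagements per post divided by audience size."""
    if followers <= 0:
        return 0.0
    return engagements_per_post / followers


def recency_weight(age_days: float, half_life_days: float = 30.0) -> float:
    """Weight that halves every half_life_days (assumed decay shape)."""
    return 0.5 ** (max(age_days, 0.0) / half_life_days)


def recency_weighted_rate(posts: list[tuple[float, float]], followers: int) -> float:
    """posts holds (age_days, engagements) pairs; recent posts dominate the average."""
    total_weight = sum(recency_weight(age) for age, _ in posts)
    if total_weight == 0 or followers <= 0:
        return 0.0
    weighted_mean = sum(recency_weight(age) * eng for age, eng in posts) / total_weight
    return weighted_mean / followers
```

With these definitions, a viral post from two years ago carries a weight close to zero, so it barely moves the rate compared with posts from the last few weeks.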

The most common failure mode in community scoring is to apply the archetype model to a follower list that contains a substantial percentage of bots, farmers, and inauthentic accounts. The model will produce scores. The scores will be garbage. The campaign will get built on top of the garbage.

Bot-Kill runs first. Before the 60/40 model touches anything, the candidate pool is filtered through a set of behavioral, structural, and engagement-pattern checks that remove the obvious cases. In the Mintlayer pilot, roughly 64 percent of the raw follower list was removed at this stage. That share is typical for Web3 audiences; established brand accounts tend to lose a smaller share.

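The filter does not require perfect bot detection. It requires that almost nothing inauthentic survives, even if some real accounts are removed along the way. In bot-detection terms, that is high recall on bots at the cost of precision; seen from the other side, it is a high-precision surviving pool at the cost of recall on real accounts. The asymmetry is deliberate. A real Champion accidentally filtered out can be added back on review. A farmer that survives the filter pollutes every downstream surface. So the filter is tuned to be aggressive, with explicit add-back paths for false positives.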
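A minimal sketch of that shape: a handful of explicit checks, any one of which removes an account, plus a reviewed allowlist that overrides them. The check names and thresholds below are invented for illustration and are not the engine's actual rules.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Account:
    handle: str
    followers: int
    following: int
    account_age_days: int
    posts_per_day: float


Check = Callable[[Account], bool]  # returns True when the account looks inauthentic

# Hypothetical checks and thresholds, one per family of signal.
HYPOTHETICAL_CHECKS: dict[str, Check] = {
    "new_account": lambda a: a.account_age_days < 30,
    "follow_farming": lambda a: a.following > 20 * max(a.followers, 1),
    "automation_rate": lambda a: a.posts_per_day > 200,
}


def bot_kill(
    pool: list[Account], allowlist: frozenset[str] = frozenset()
) -> tuple[list[Account], dict[str, list[str]]]:
    """Split the pool into kept accounts and removed handles with their reasons."""
    kept: list[Account] = []
    removed: dict[str, list[str]] = {}
    for account in pool:
        reasons = [name for name, check in HYPOTHETICAL_CHECKS.items() if check(account)]
        if reasons and account.handle not in allowlist:
            removed[account.handle] = reasons  # recorded for the audit trail
        else:
            kept.append(account)
    return kept, removed
```

Recording the reasons for every removal is what makes an aggressive filter reviewable: an operator can see why an account was dropped and add its handle to the allowlist on the next run.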

Every follower in a scan receives a four-dimensional score: Champion, Amplifier, Builder, Early Adopter. The four scores are independent. A single person can rank high in two or three archetypes simultaneously — Champions often have Amplifier reach, Builders often start as Early Adopters.

Inputs

For each candidate: bio text, last 200 posts (mixture of original posts and replies), follower count, following count, account age, recent engagement velocity, and a graph signal indicating whether the candidate appears in the project’s own engagement history.
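As a data structure, the input record might look like the sketch below. The class name, field names, and types are assumptions; only the list of fields and the 200-post window come from the description above. The example assumes posts arrive ordered oldest to newest.

```python
from dataclasses import dataclass

MAX_POSTS = 200  # the engine reads the last 200 posts


@dataclass(frozen=True)
class CandidateInput:
    bio: str
    posts: tuple[str, ...]       # original posts and replies, oldest first (assumed)
    follower_count: int
    following_count: int
    account_age_days: int
    engagement_velocity: float   # recent engagement velocity; units are an assumption
    in_project_graph: bool       # appears in the project's own engagement history

    def __post_init__(self) -> None:
        # Keep only the most recent MAX_POSTS entries.
        if len(self.posts) > MAX_POSTS:
            object.__setattr__(self, "posts", tuple(self.posts[-MAX_POSTS:]))
        if min(self.follower_count, self.following_count, self.account_age_days) < 0:
            raise ValueError("counts and account age must be non-negative")
```

A frozen record suits a deterministic engine: once a candidate's input is captured, nothing later in the pipeline can mutate it.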

The pipeline

Bot-Kill filter.
Topical relevance scoring against the project’s seed corpus.
Voice substance scoring on the candidate’s post body.
Engagement quality scoring on the candidate’s reply tree.
Audience-relative vanity normalization.
Recency weighting.
Four-way archetype combination — different weights of the above for each archetype.
Bucket assignment — Act Now, Act Soft, Wait, Monitor, Ignore.
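The last two steps can be sketched as a weight matrix followed by a threshold lookup. Everything numeric here is hypothetical: the real weight matrix is unpublished, the rows below simply keep 0.6 on the three linguistic features and 0.4 on the two metric features, and the rule of bucketing on the highest archetype score with these thresholds is an assumption.

```python
FEATURES = ("topical", "voice", "quality", "audience", "recency")

# Hypothetical weights; each row sums to 1.0 (0.6 linguistic + 0.4 metrics).
WEIGHTS = {
    "Champion":      {"topical": 0.30, "voice": 0.20, "quality": 0.10, "audience": 0.20, "recency": 0.20},
    "Amplifier":     {"topical": 0.10, "voice": 0.15, "quality": 0.35, "audience": 0.30, "recency": 0.10},
    "Builder":       {"topical": 0.25, "voice": 0.30, "quality": 0.05, "audience": 0.10, "recency": 0.30},
    "Early Adopter": {"topical": 0.30, "voice": 0.10, "quality": 0.20, "audience": 0.10, "recency": 0.30},
}

# Hypothetical thresholds, checked from highest to lowest.
BUCKETS = ((0.8, "Act Now"), (0.6, "Act Soft"), (0.4, "Wait"), (0.2, "Monitor"))


def archetype_scores(features: dict[str, float]) -> dict[str, float]:
    """Score each archetype independently as a weighted sum of the same features."""
    return {
        archetype: sum(weights[f] * features[f] for f in FEATURES)
        for archetype, weights in WEIGHTS.items()
    }


def bucket(scores: dict[str, float]) -> str:
    """Assign a bucket from the highest archetype score (assumed rule)."""
    top = max(scores.values())
    for threshold, label in BUCKETS:
        if top >= threshold:
            return label
    return "Ignore"
```

Because each archetype is scored separately from the same features, one person can score high on several archetypes at once, which matches the independence described earlier.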

What gets published

The full methodology paper, including the exact weight matrix and the seed-corpus extraction rules, is in development. It will be published under CC-BY-SA so the methodology can be cited, reproduced, and challenged. The publication date is set for Q3 2026.

A deterministic engine.
60% language. 40% metrics.

Because community work is too expensive to score on vibes.

Linguistic depth carries more weight than reach.

Remove the farmers before the campaign exists.

What the model does, in enough detail to evaluate.
