Why does AI hallucinate about my brand?

Your brand is a long-tail entity in training data. Across a 37-model evaluation, frontier LLM hallucination rates ranged from 15% to 52% (Dextra Labs via SQ Magazine, 2026), and the rate climbs sharply for low-frequency entities the model has seen only a few times during pre-training (Kandpal et al., ICML 2023). Most B2B brands sit in that bucket.

Can schema.org markup fix it?

Partially. Schema improves parsing, but adoption does not correlate with LLM citation frequency (Search Atlas, 2025). Adding more silent fields can make hallucinations worse, because every silent slot is another gap the model fills with a guess.

What is the difference between RAG hallucination and training hallucination?

Training hallucinations come from pre-trained weights when no source is retrieved. RAG hallucinations happen during or after retrieval, when the model overrides the source or fills gaps from parametric memory. The do_not_infer directive targets the second kind.

Will LLMs respect do_not_infer?

Today, only if you surface the directive in a way the model can read. Typically that means a flag inside the brand intelligence JSON the model retrieves, plus a prompt-time instruction explaining how to interpret it. Wider adoption needs schema.org and JSON-LD to add the primitive.

What do_not_infer does: the fix for LLM brand hallucinations

Ask GPT-5 or Claude Opus 4.7 a specific question about a small B2B brand (its pricing, team size, or competitors) and one of three things happens: the model gets it right, the model says it does not know, or the model confidently invents an answer that sounds correct and is not. The third outcome is a hallucination, and for any company below the Wikipedia-popularity threshold it is the default behaviour, not the edge case.

Most "fix your brand hallucinations" guides treat the symptom by adding more schema, more facts pages, more knowledge-graph signals. That framing is incomplete. Silent fields stay silent no matter how much surrounding content you add, and the model keeps guessing on the slots you never filled. The fix has to address the gap-fill step itself, not the volume of content around it.

This post explains why the guess happens, what do_not_infer does to stop it, and why the directive belongs in every LLM-facing schema, including the ones that do not exist yet.

How does an LLM answer a question about your brand?

When an LLM answers a brand question it runs a four-step chain: retrieve candidate sources, parse them, match each source field to the question intent, and fill any remaining gaps from parametric memory. The hallucination happens at step four, silently. Vectara's November 2025 leaderboard found that leading frontier reasoning models, including GPT-5, Claude Sonnet 4.5, Grok-4, and Gemini-3-Pro, all exceeded 10% hallucination on grounded summarization (Vectara HHEM, Nov 2025). The failure mode is shared across the frontier, not isolated to a single lab.

Step four is the failure mode worth studying. The ReDeEP paper traces it mechanically: "hallucinations occur when the Knowledge FFNs in LLMs overemphasize parametric knowledge in the residual stream, while Copying Heads fail to effectively retain or integrate external knowledge from retrieved content" (ReDeEP, OpenReview 2025). Translated into plain language: the model has the retrieved passage sitting in context and still defers to what its weights remember, which is not a bug in a single model but the dominant pattern across the leaderboard.

The causes split across three sources of similar size. Data limitations account for around 30% of hallucinations, training bias for 25%, and probabilistic generation for another 25% (Future AGI compilation via SQ Magazine, 2026), together about 80% of the total, so no single fix addresses all three. The gap-fill step is where retrieval-time directives can move the needle, because that is the only step where the data layer can intervene before the token is generated.

Conceptual model based on the ReDeEP analysis of retrieval-augmented generation (OpenReview, 2025). The gap-fill step is where directive-level controls intervene.

The four-step LLM inference chain
Step	Action
1	Retrieve sources
2	Parse content
3	Match fields to question intent
4	Fill gaps from parametric memory (hallucination step)

Why is your brand specifically high-risk?

Hallucination rates are not uniform across questions. They spike on long-tail entities, the low-frequency entities that Kandpal et al. showed models fail to memorize (Kandpal et al., ICML 2023), and that bucket contains nearly every B2B brand under 200 employees. Dextra Labs measured that 31.4% of real-world LLM interactions contain a hallucination, with the rate climbing to 60% in complex or domain-specific queries (Dextra Labs via SQ Magazine, 2026).

The PopQA paper at ACL 2023 measured this directly. LLMs memorize high-frequency facts and miss low-frequency ones, with accuracy dropping sharply as Wikipedia popularity declines (Mallen et al., PopQA, ACL 2023). If your brand sits in the long tail of Wikipedia popularity, which most B2B brands under 200 employees do, the model has very little to remember about you. Whatever it does not remember, it invents.

The benchmark spread tells the story. Across a 37-model evaluation, hallucination rates ranged from 15% at the low end (Grok-4) to 52% at the high end (qwen3-235b) (Dextra Labs via SQ Magazine, 2026), so the model your buyer happens to use is essentially a coin flip on whether you fall in the better or worse half. Our team has not found a credible technique that flattens this curve from the content side alone, and the practical conclusion is that the fix has to operate at retrieval time rather than at the content layer.

We walked through the broader retrieval mechanic in the GEO field guide. The short version: if a model cannot retrieve a clean fact, it will infer one, and long-tail entities give models less to retrieve so they infer more.

What does `do_not_infer` do?

do_not_infer: true is a field-level directive in our structured brand intelligence schema. When the model retrieves a brand profile and reaches a field marked this way, the directive converts the field from "silent," which triggers gap-fill, to "explicitly unknown," which suppresses it. Related techniques give a sense of scale: structured prompting is reported to reduce medical hallucinations by 33%, and RAG to cut hallucinations by 30 to 70% across domains (SQ Magazine compilation, 2026). That suggests directive-driven gains can be meaningful even before LLMs natively understand the primitive.

What the Builder writes for the model to read

The shift is small in JSON terms, but the behaviour change is large. Before the directive is set, an AI crawler retrieves a silent field that looks like this:

// Without directive, the model will guess
{
  "headcount": null,
  "pricing_starting_at_usd": null
}

After you flip the toggle in Ooky's Builder, the crawler retrieves this instead:

// With directive, the model is told to refuse
{
  "headcount": {
    "value": null,
    "do_not_infer": true,
    "reason": "Not publicly disclosed"
  },
  "pricing_starting_at_usd": {
    "value": null,
    "do_not_infer": true,
    "reason": "Custom pricing, contact sales"
  }
}

The failure mode shifts from a silent absence, which the model fills, to a positive instruction, which the model can treat as a guardrail. The same trick powers the "I do not know, please do not guess" prompt patterns documented in the SQ Magazine compilation, just encoded at the data layer the model retrieves instead of the prompt layer, so it survives every retrieval.
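The difference between the two payloads can be made concrete with a small classifier. The sketch below is a minimal Python illustration, not Ooky's implementation: the function names classify_field and classify_profile and the three state labels are assumptions made for this example, and only the field shapes come from the JSON above.

```python
from typing import Any, Dict


def classify_field(field: Any) -> str:
    """Classify one profile field as 'value', 'silent', or 'explicitly_unknown'."""
    # Wrapped form: {"value": ..., "do_not_infer": ..., "reason": ...}
    if isinstance(field, dict) and "value" in field:
        if field["value"] is not None:
            return "value"
        # A null value only counts as a refusal when the directive is set to true.
        if field.get("do_not_infer") is True:
            return "explicitly_unknown"
        return "silent"
    # Bare form: a plain null is silent, anything else is a stated value.
    return "silent" if field is None else "value"


def classify_profile(profile: Dict[str, Any]) -> Dict[str, str]:
    """Classify every top-level field in a profile."""
    return {name: classify_field(field) for name, field in profile.items()}


without_directive = {"headcount": None, "pricing_starting_at_usd": None}
with_directive = {
    "headcount": {
        "value": None,
        "do_not_infer": True,
        "reason": "Not publicly disclosed",
    },
    "pricing_starting_at_usd": {
        "value": None,
        "do_not_infer": True,
        "reason": "Custom pricing, contact sales",
    },
}

print(classify_profile(without_directive))
print(classify_profile(with_directive))
```

Run against the two payloads, the first profile comes back entirely silent and the second entirely explicitly unknown, which is the whole change the directive makes.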

A worked example on cloudweld.ai

Take cloudweld.ai, the parent company building Ooky. Ask any frontier model for cloudweld's exact employee count and you will get confident numbers that range across runs from "around 50" to "more than 200," none of which match anything cloudweld publishes on its site. The field is silent, so the model fills it from whatever adjacent signals it can reach (LinkedIn samples, news coverage, similar-name confusion), and the answer is wrong in a different way each time.

Now imagine cloudweld shipped a brand intelligence profile with headcount flagged do_not_infer: true, with the reason "not publicly disclosed." A retrieving model that reads the directive answers "cloudweld.ai does not publicly disclose its headcount." That answer is correct, and it is also citable, so the model goes from inventing a number to attributing a refusal, and the refusal is the truth.
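One way to picture what the retrieving model receives in this example is to render each profile field as a plain-language line. The sketch below is a hypothetical helper (render_context_lines is a name invented here, and the sentence templates are assumptions); it shows the idea of turning a flagged field into a statement the model can attribute, not how any particular product formats its payload.

```python
from typing import Any, Dict, List


def render_context_lines(brand: str, profile: Dict[str, Any]) -> List[str]:
    """Turn profile fields into context lines; silent fields produce nothing."""
    lines: List[str] = []
    for name, field in profile.items():
        label = name.replace("_", " ")
        if isinstance(field, dict):
            value = field.get("value")
            if value is None and field.get("do_not_infer") is True:
                # Explicit refusal: state it, and carry the reason along.
                reason = field.get("reason", "no reason given")
                lines.append(
                    f"{brand}: {label} is deliberately not stated ({reason}). "
                    "Do not estimate it."
                )
            elif value is not None:
                lines.append(f"{brand}: {label} is {value}.")
            # A null value without the directive stays silent: no line is emitted.
        elif field is not None:
            lines.append(f"{brand}: {label} is {field}.")
    return lines


profile = {
    "headcount": {
        "value": None,
        "do_not_infer": True,
        "reason": "not publicly disclosed",
    }
}

for line in render_context_lines("cloudweld.ai", profile):
    print(line)
```

The point of the design is that a refusal becomes a sentence in context rather than an absence, so there is something for the model to cite.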

What did schema.org miss?

Schema.org was built for search engines that index, not for LLMs that retrieve, parse, and infer. The vocabulary missed a primitive: a way to mark a field as "deliberately unknown, do not infer." That gap is the one do_not_infer fills. Search Atlas analysed schema markup adoption against citation outcomes and found no correlation between schema coverage and LLM citation frequency (Search Atlas, 2025), which is consistent with the gap-fill mechanism: more parseable fields do not help if the silent ones still get invented.

The pattern has precedent. Gatsby's GraphQL layer ships a @dontInfer directive that tells the type system not to invent field types from sample data, which is the same idea applied to a different domain. Our team did not invent the concept; we ported it to the brand-intelligence layer and gave it a name LLMs can read.

The bigger claim is this: do_not_infer should be a category-level directive, not a vendor feature. We would rather see it land in schema.org and JSON-LD than stay proprietary to Ooky, because directives only help models that respect them. The honest limit today is that current LLMs do not natively parse directive fields; they treat them as data. The directive works because we surface it inside the brand intelligence payload the model retrieves, and our prompt scaffolding tells the model how to read it. Wider adoption needs LLM providers to standardise directive semantics, and until then the retrieval-time surface is where the work has to live.
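Because models treat the directive as data, the payload needs an instruction beside it. The sketch below shows one possible shape for that prompt scaffolding in Python. The instruction wording, the DIRECTIVE_INSTRUCTION constant, and the build_system_prompt function are all assumptions made for illustration; the post does not publish Ooky's actual scaffolding.

```python
import json
from typing import Any, Dict

# Hypothetical instruction text explaining how to read the directive.
DIRECTIVE_INSTRUCTION = (
    "The brand profile below is JSON. When a field has \"do_not_infer\": true, "
    "its value is deliberately unknown. Do not estimate it, guess it, or fill it "
    "from memory. If asked about that field, say the brand does not disclose it "
    "and give the \"reason\" if one is present."
)


def build_system_prompt(brand: str, profile: Dict[str, Any]) -> str:
    """Combine the reading instruction with the retrieved profile payload."""
    # sort_keys keeps the payload stable across runs, which helps when diffing prompts.
    payload = json.dumps(profile, indent=2, sort_keys=True)
    return f"{DIRECTIVE_INSTRUCTION}\n\nBrand: {brand}\nProfile:\n{payload}"


example_profile = {
    "pricing_starting_at_usd": {
        "value": None,
        "do_not_infer": True,
        "reason": "Custom pricing, contact sales",
    }
}

print(build_system_prompt("Example Co", example_profile))
```

Keeping the instruction separate from the payload means the profile stays plain JSON, so the same data could be served unchanged if a standard for directive semantics ever lands.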

What should you do this week?

Inventory your silent fields, decide which ones should be filled and which should be explicitly unknown, then mark the second group. The exercise takes about 30 minutes for a well-scoped brand profile, and it directly attacks the gap-fill step where Vectara measured double-digit hallucination rates across the leading frontier reasoning models (Vectara HHEM, Nov 2025).

Step 1: inventory the silent fields

List every fact an LLM might assert about your company: pricing, headcount, founders, founding year, integrations, customers, locations, certifications, security postures, and revenue. Any field a buyer might ask about is a candidate, and most teams find 20 to 40 fields in the first pass before the list starts to feel exhaustive.

Step 2: decide value or refusal

For each field, pick one of two outcomes: a confident value with a source, or an explicit refusal. Refusals are valid and sometimes the only honest answer, because pre-announcement pricing, sensitive headcount, undisclosed customer lists, and pending certifications all deserve a refusal rather than a guess. Trying to fill every field is the mistake, because some silences are correct.
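Steps 1 and 2 can be captured as a small data exercise. The following Python sketch is an assumption-laden illustration: the FieldDecision class, the build_profile function, and the "source" key are invented for this example, and the sample values are placeholders for a fictional company. It turns an inventory of decisions into a profile and reports any field that has neither a sourced value nor a refusal.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class FieldDecision:
    """One inventoried field and the outcome chosen for it."""
    name: str
    value: Optional[Any] = None
    source: Optional[str] = None
    refusal_reason: Optional[str] = None


def build_profile(
    decisions: List[FieldDecision],
) -> Tuple[Dict[str, Any], List[str]]:
    """Return (profile, undecided_field_names)."""
    profile: Dict[str, Any] = {}
    undecided: List[str] = []
    for d in decisions:
        if d.value is not None and d.source:
            # Confident value, backed by a source.
            profile[d.name] = {"value": d.value, "source": d.source}
        elif d.refusal_reason:
            # Explicit refusal: the field is deliberately unknown.
            profile[d.name] = {
                "value": None,
                "do_not_infer": True,
                "reason": d.refusal_reason,
            }
        else:
            # Neither outcome chosen yet, so the field would stay silent.
            undecided.append(d.name)
    return profile, undecided


# Placeholder inventory for a fictional company.
inventory = [
    FieldDecision("founding_year", value=2020, source="https://example.com/about"),
    FieldDecision("headcount", refusal_reason="Not publicly disclosed"),
    FieldDecision("revenue"),
]

profile, undecided = build_profile(inventory)
print(profile)
print("Still silent:", undecided)
```

The undecided list is the useful output here: it is the set of fields that would still trigger gap-fill if the profile shipped as is.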

Step 3: mark the refusal list in the Builder

In Ooky, marking a field is a per-field switch in the Builder. Flip the toggle, hit publish, and the marked profile gets served to AI crawlers at the edge with the directive attached. The reason this lives in Ooky's intelligence layer rather than as a static JSON file: the directive has to be served at retrieval time, on bot-detected traffic, with edge cache invalidation when a field changes. The gap-fill step only has something to react to if the directive travels with the data the model retrieves.
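The serving requirements in this step (bot detection, a retrieval-time payload, and cache invalidation on change) can be sketched in a few lines of Python. Everything here is an assumption for illustration: the post does not describe how Ooky detects crawlers or keys its cache, the user-agent token list is a small illustrative sample rather than a complete one, and is_ai_crawler, profile_version, and choose_response are hypothetical names.

```python
import hashlib
import json
from typing import Any, Dict, Tuple

# Illustrative sample of AI crawler user-agent tokens; a real list would be longer.
AI_CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def is_ai_crawler(user_agent: str) -> bool:
    """Naive substring check on the User-Agent header."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in AI_CRAWLER_TOKENS)


def profile_version(profile: Dict[str, Any]) -> str:
    """Content hash of the profile; any field change yields a new cache key."""
    canonical = json.dumps(profile, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()[:12]


def choose_response(
    user_agent: str, html_page: str, profile: Dict[str, Any]
) -> Tuple[str, str, str]:
    """Return (content_type, body, cache_key) for one request."""
    if is_ai_crawler(user_agent):
        # Crawlers get the marked profile, directive included.
        return "application/json", json.dumps(profile), profile_version(profile)
    return "text/html", html_page, "html"


profile = {"headcount": {"value": None, "do_not_infer": True,
                         "reason": "Not publicly disclosed"}}
print(choose_response("Mozilla/5.0 (compatible; GPTBot/1.0)", "<html></html>", profile))
```

Keying the cache on a content hash is one simple way to get the invalidation behaviour described above: flipping a toggle changes the hash, so the stale entry is never served again.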

For the broader stack (bot interception, edge-served intelligence, and perception tracking), see the GEO field guide. Once the fields are marked, the five-question protocol in the wince test is how we verify the LLMs are respecting the directive.
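A crude automated check can complement manual verification. The Python sketch below is not the wince test, whose protocol this post does not spell out; it is a hypothetical heuristic (find_violations is an invented name) that flags answers about refused numeric fields when they still contain a digit. It misses spelled-out numbers such as "fifty" and only makes sense for numeric fields like headcount or pricing.

```python
import re
from typing import Dict, List


def find_violations(refused_fields: List[str], answers: Dict[str, str]) -> List[str]:
    """Return refused fields whose model answer still asserts a number."""
    violations: List[str] = []
    for name in refused_fields:
        answer = answers.get(name, "")
        # Any digit in the answer is treated as an invented figure.
        if re.search(r"\d", answer):
            violations.append(name)
    return violations


# Example answers collected by asking a model about each refused field.
answers = {
    "headcount": "The company has around 50 employees.",
    "pricing_starting_at_usd": "Pricing is custom; the company asks you to contact sales.",
}

print(find_violations(["headcount", "pricing_starting_at_usd"], answers))
```

In this example only the headcount answer is flagged, because it states a figure for a field the profile marked as deliberately unknown.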

What changes once you mark the refusal list?

Hallucinations aren't random. They happen at a specific step in a four-step inference chain, on a specific kind of entity, in a measurable band that hasn't closed since Vectara started tracking it. More content alone doesn't fix that. A directive that turns silent fields into explicit refusals, encoded at the data layer the model retrieves, does.

Open your brand profile this week and mark the fields you would rather the model refuse than guess on. The exercise takes 30 minutes manually, and the next time a buyer asks ChatGPT about your pricing the answer is your refusal rather than someone else's invention.

Want the directive without writing the JSON?

Ooky's free tier surfaces do_not_infer as a per-field toggle in the Builder, serves the marked profile to AI crawlers automatically, and re-publishes the moment you flip a switch. Setup takes about 30 minutes of manual work, with no ongoing effort after that.


Sources

Vectara. "Introducing the next generation of Vectara's Hallucination Leaderboard." November 19, 2025. Retrieved 2026-05-14. https://www.vectara.com/blog/introducing-the-next-generation-of-vectaras-hallucination-leaderboard
SQ Magazine, citing Dextra Labs. "LLM hallucination statistics." April 27, 2026. Retrieved 2026-05-14. https://sqmagazine.co.uk/llm-hallucination-statistics/
SQ Magazine, citing Future AGI. "LLM hallucination statistics (cause split)." April 27, 2026. Retrieved 2026-05-14. https://sqmagazine.co.uk/llm-hallucination-statistics/
SQ Magazine compilation. "Structured prompting reduces medical hallucinations by 33%; RAG cuts hallucinations 30 to 70%." 2026. Retrieved 2026-05-14. https://sqmagazine.co.uk/llm-hallucination-statistics/
Mallen, A. et al. "When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories (PopQA)." ACL 2023. Retrieved 2026-05-14. https://aclanthology.org/2023.acl-long.546.pdf
Search Atlas. "The limits of schema markup for AI search." 2025. Retrieved 2026-05-14. https://searchatlas.com/blog/limits-of-schema-markup-for-ai-search/
ReDeEP. "Detecting hallucinations in retrieval-augmented generation via mechanistic interpretability." OpenReview, 2025. Retrieved 2026-05-14. https://openreview.net/forum?id=ztzZDzgfrh