Sales Attribution from LLMs: Counting The Invisible
What is “LLM-sourced revenue,” really?
Executives want a clean line from a large language model to a bank deposit. And I want a toilet made out of solid gold! But some things just aren't in the cards.
In this article, I'll define LLM-sourced revenue as any commercial outcome materially caused by an LLM answer, recommendation, or generated workflow that pushed a buyer toward your property. Buyers ask a chatbot. The model cites your brand. The user either clicks a link, copies a URL, or types your name from memory. That causal chain is messy. It blends referral clicks, copy-paste behavior, and pure recall. The result looks like direct traffic in analytics, even when the LLM did the heavy lifting. For better or worse, we have to treat this opaque mess as a channel. Because it has influence. We have to treat it as a discovery surface where your brand fights to be included.
Now, your finance team will resist this framing. Your measurement stack will miss most of it. But the alternative is just waiting around for perfect attribution that will never arrive.
Why does LLM attribution feel like old-school advertising?
Leaders chase precision in a probabilistic world. The problem echoes one that sales veterans recognized a century ago: much of advertising’s impact is real, yet hard to assign with surgical accuracy. The famous line about not knowing which half of ad spend works captures the energy here.¹ LLMs change the interface, not the math. The buyer still forms intent across a haze of stimuli and then acts in ways your analytics cannot fully trace. That truth made broadcast budgets uneasy. The same truth now makes AI search budgets uneasy. The path forward starts with humility, then moves to robust proxies, controlled tests, and institutional discipline. Agencies that accept estimation outperform those that pretend precision. Finance teams that demand geometry-style proofs will underinvest. Executives who reward directional clarity will own the advantage.¹
What is actually measurable today?
Operators can measure three things with confidence: the scale of LLM surfaces, the growth of AI referrals to the open web, and the leakage into Direct. OpenAI reported 100 million weekly active users at DevDay 2023.² Growth accelerated past 300 million weekly users by late 2024, according to major tech press coverage.³ Web analytics firms report billions of monthly visits to ChatGPT’s domain in 2025, which suggests a durable surface where recommendations originate.⁴ Independent measurement shows AI referrals to publishers surging in 2025, even if they remain small relative to search today.⁵ This evidence does not give you perfect attribution. It gives you directional proof that the surface is real, growing, and economically relevant. Executives should fund measurement accordingly.²
How do LLMs break standard referral tracking?
LLMs break tracking in three ways that matter to revenue teams. Chat interfaces often fail to pass a referrer header, so analytics record a Direct session even when the journey began inside an LLM answer.⁶ Users frequently copy a URL or brand name from the model and then type it directly, which appears as Direct.⁷ Some clients open links in a new tab or in-app browser contexts that suppress or strip referral data, which again collapses into Direct or Unassigned.⁸ These three leaks destroy last-click logic. Stop treating Direct as a benign bucket of returning users and start treating it as a sink that absorbs AI influence, dark social, and offline word of mouth. Build your reports around that reality. Keep a running estimate of the “hidden” LLM share inside Direct.⁶
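One way to keep that running estimate honest is a conservative excess-over-baseline calculation. Below is a minimal Python sketch, assuming you export daily Direct session counts and can choose a pre-AI baseline window; the column name and the windows are illustrative, and a real model would also adjust for seasonality.

```python
# Minimal sketch: estimate the "hidden" LLM share inside Direct by comparing
# current Direct sessions against a pre-AI baseline window. Column name and
# windows are illustrative assumptions, not a GA4 export contract.
import pandas as pd

def hidden_llm_share(daily: pd.DataFrame,
                     baseline_start: str, baseline_end: str,
                     current_start: str, current_end: str) -> float:
    """daily: DataFrame indexed by date with a 'direct_sessions' column."""
    baseline = daily.loc[baseline_start:baseline_end, "direct_sessions"].mean()
    current = daily.loc[current_start:current_end, "direct_sessions"].mean()
    # Treat only the excess over the baseline as potentially AI-influenced.
    excess = max(0.0, current - baseline)
    return excess / current if current else 0.0
```

The estimate is deliberately conservative: it assumes Direct would otherwise have stayed flat, so any lift it reports is a floor, not a ceiling.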
How should I define an “AI” channel in GA4 without waiting for Google?
Teams should create a first-class AI channel grouping in GA4 now. Include known AI referrers and their app variants, then refresh the list quarterly. Start with chat.openai.com, chatgpt.com, openai.com, gemini.google.com, copilot.microsoft.com, perplexity.ai, you.com, phind.com, and similar. The grouping should match on session source and full referrer, and it should include a custom dimension for “AI vendor” parsed from the referrer host. Add a second dimension for “AI click type” with values like link, copy-paste, or manually keyed, captured via short vanity URLs or post-purchase prompts. This approach will not catch everything, but it creates a consistent lens that converts chaos into a channel you can optimize. Analysts can then trend AI share, conversion rate, average order value, and payback like any other source.⁷
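GA4 channel groupings are configured in the Admin UI, but the same matching logic is easy to prototype against exported session data. Here is a minimal Python sketch, assuming you can pull each session’s full referrer (for example from the GA4 BigQuery export); the domain-to-vendor map mirrors the list above and is meant to be refreshed quarterly.

```python
# Minimal sketch: classify a session's full referrer into an "AI" channel and
# an "AI vendor" dimension. Extend the map as new AI surfaces appear.
from urllib.parse import urlparse

AI_REFERRERS = {
    "chat.openai.com": "OpenAI",
    "chatgpt.com": "OpenAI",
    "openai.com": "OpenAI",
    "gemini.google.com": "Google",
    "copilot.microsoft.com": "Microsoft",
    "perplexity.ai": "Perplexity",
    "you.com": "You.com",
    "phind.com": "Phind",
}

def classify_referrer(full_referrer: str) -> dict:
    host = urlparse(full_referrer).netloc.lower().removeprefix("www.")
    vendor = AI_REFERRERS.get(host)
    return {"channel": "AI" if vendor else "Other", "ai_vendor": vendor}

print(classify_referrer("https://chatgpt.com/"))           # {'channel': 'AI', 'ai_vendor': 'OpenAI'}
print(classify_referrer("https://www.google.com/search"))  # {'channel': 'Other', 'ai_vendor': None}
```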
Which attribution models handle AI influence best?
Executives should not expect miracles from last-click. Teams should prefer data-driven models that credit assistive touchpoints. Google’s implementations rely on cooperative game theory, including Shapley value methods, to assign credit by marginal contribution across paths.⁹ ¹⁰ ¹¹ This family of models rewards influence, not just the final click. It better reflects reality when LLMs introduce your brand, then search or direct brings the buyer back later. The important caveat is that Shapley and Markov models still rely on observed paths, which means invisible LLM touches remain invisible to the math. The cure is not to abandon DDA. The cure is to add experiments and proxies that surface LLM impact, then feed those results into budget decisions. Treat models as guides, not judges. Use them to compare scenarios rather than pronounce verdicts.⁹
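To make the Shapley idea concrete, here is a minimal Python sketch that computes exact Shapley credit for a handful of channels over hypothetical converting paths. The value function, which counts a conversion for a coalition only when every channel on its path is present, is a textbook simplification for illustration, not Google’s production methodology.

```python
# Minimal sketch: exact Shapley credit over a few channels with a toy value
# function. Credit per channel sums to the total number of conversions.
from itertools import permutations

paths = [  # hypothetical converting paths
    ["AI", "Organic Search"],
    ["Organic Search"],
    ["AI", "Direct"],
    ["Paid Search", "Direct"],
]

channels = sorted({c for p in paths for c in p})

def value(coalition: set) -> int:
    # A conversion counts only if its entire path is covered by the coalition.
    return sum(1 for p in paths if set(p) <= coalition)

shapley = {c: 0.0 for c in channels}
orderings = list(permutations(channels))
for order in orderings:
    seen = set()
    for c in order:
        # Marginal contribution of channel c given the channels already "seen".
        shapley[c] += value(seen | {c}) - value(seen)
        seen.add(c)
for c in shapley:
    shapley[c] /= len(orderings)

print(shapley)
```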
How can I test the incrementality of LLM presence?
Leaders can run clean experiments even without platform knobs. Marketing teams can run geo experiments where only selected regions receive LLM-specific URLs in public-facing assets, outreach, and PR, while control regions do not. This test approximates LLM exposure by making the preferred path more visible in treatment geos. Econometrics can then estimate lift at the region level.¹² ¹³ Growth teams can rotate vanity URLs like brand.com/ai or brand.com/chat into publicly crawlable assets and track lift relative to baselines. Teams can seed unique offer codes that only appear in AI-targeted FAQs and knowledge pages, then measure redemption share. The design is not perfect, but it is testable. It gives executives a credible signal for budget allocation. It borrows from incrementality methods proven in advertising over the past decade.¹²
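For the region-level readout, a minimal sketch follows: a simple difference in means between treatment and control geos with a permutation test for significance. The revenue figures are hypothetical, and the cited geo-experiment papers describe the more rigorous time-based regression you would use in production.

```python
# Minimal sketch: regional lift as a difference in means, with a permutation
# p-value. Numbers are hypothetical; real geo analyses use matched markets
# and time-series regression.
import random

treatment = [1240, 1315, 1180]  # weekly revenue per treatment region (hypothetical)
control   = [1105, 1090, 1150]  # weekly revenue per control region (hypothetical)

observed_lift = sum(treatment) / len(treatment) - sum(control) / len(control)

def permutation_p_value(a, b, observed, n_iter=10_000, seed=7):
    rng = random.Random(seed)
    pooled, n_a = a + b, len(a)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / (len(pooled) - n_a)
        hits += diff >= observed
    return hits / n_iter

print(f"lift per region: {observed_lift:.1f}")
print(f"permutation p-value: {permutation_p_value(treatment, control, observed_lift):.3f}")
```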
What proxies estimate LLM contribution when clicks vanish?
Operators should build a proxy basket that moves with reality. Teams should track citation incidence, which is the count and share of prompts where top LLMs recommend or mention your brand for important jobs. Teams should track position quality, which ranks how your brand appears within an answer, including how often a clickable link appears. Teams should track copy visibility, which evaluates whether your brand’s canonical name and short URL appear in the generated text. Teams should track knowledge graph consistency, which measures whether entity facts match your fact file and schema across surfaces. Teams should track AI referral share in analytics and the ratio of Direct to AI over time. Teams should aggregate these signals into a simple index that leadership can trend monthly. The proxy will not be perfect. It will be stable enough for decisions, which is the point.
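A minimal sketch of that index follows, assuming each signal has already been normalized to a 0–1 scale. The metric names and weights are illustrative; whatever weights you choose should be published and frozen for the quarter.

```python
# Minimal sketch of an "AI Influence Index": clamp each proxy signal to 0..1
# and combine with fixed, published weights.
WEIGHTS = {
    "citation_incidence": 0.30,  # share of tracked prompts that mention the brand
    "position_quality":   0.20,  # placement and link presence within answers
    "copy_visibility":    0.15,  # canonical name / short URL present in generated text
    "kg_consistency":     0.15,  # entity facts match the published fact file
    "ai_referral_share":  0.20,  # AI channel share of tracked sessions
}

def ai_influence_index(signals: dict) -> float:
    """signals: metric name -> value already normalized to the 0..1 range."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9
    return sum(WEIGHTS[name] * min(max(signals.get(name, 0.0), 0.0), 1.0)
               for name in WEIGHTS)

monthly = {"citation_incidence": 0.42, "position_quality": 0.55,
           "copy_visibility": 0.30, "kg_consistency": 0.80,
           "ai_referral_share": 0.06}
print(round(ai_influence_index(monthly), 3))  # trend this number month over month
```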
What do surveys and “How did you hear about us” actually add?
Leaders should treat self-reported attribution as a useful but noisy lens. Teams can add a required, open-text field to high-intent forms and train a light classifier to map model mentions like “ChatGPT” or “Perplexity” to the AI channel. Add a periodic forced-choice prompt at checkout with AI options included, but always preserve the open text for nuance. Evidence from practitioners shows both the value and the limits of self-reporting. Vendors celebrate its simplicity. Skeptics warn about recall bias and missing data.¹⁴ ¹⁵ ¹⁶ ¹⁷ The practical answer is obvious. Use self-reporting as a triangulation point, not a single source of truth. Compare it to your AI referral share. Compare it to your vanity URL redemptions. Compare it to your experimental lift. The convergence tells you where budget belongs.¹⁴
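A light classifier can start as keyword matching on the open-text answers. The sketch below assumes a small keyword-to-vendor map; a production pipeline might layer a proper text classifier on top for ambiguous responses.

```python
# Minimal sketch: map open-text "How did you hear about us?" answers to the
# AI channel via keyword matching. The keyword list is an assumption.
import re

AI_KEYWORDS = {
    "chatgpt": "OpenAI", "chat gpt": "OpenAI",
    "gemini": "Google", "bard": "Google",
    "copilot": "Microsoft", "perplexity": "Perplexity",
    "ai search": None, "asked an ai": None,
}

def classify_self_report(answer: str) -> dict:
    text = re.sub(r"\s+", " ", answer.lower())
    for keyword, vendor in AI_KEYWORDS.items():
        if keyword in text:
            return {"channel": "AI", "ai_vendor": vendor, "matched": keyword}
    return {"channel": "Unclassified", "ai_vendor": None, "matched": None}

print(classify_self_report("Saw you recommended in ChatGPT"))
print(classify_self_report("A friend told me"))
```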
What does an “LLM Attribution Ladder” look like in practice?
Executives can climb a five-rung ladder that matures over a quarter. First, define the AI channel in GA4 and ship a monitored domain list. Second, deploy vanity URLs and AI-specific offer codes in publicly crawlable assets, documentation, and brand FAQs. Third, implement self-reported attribution in all high-intent flows with classification for model mentions. Fourth, set up a quarterly proxy index that blends citations, position quality, AI referral share, and the Direct ratio. Fifth, run region-level tests where public surfaces intentionally feature AI-specific paths. Each rung compounds insight. Each rung turns a fuzzy space into defendable budget. The ladder does not demand platform cooperation. It builds internal muscle while the ecosystem matures. It gives boards something better than vibes and anecdotes.
Where does this leave CFOs and boards?
CFOs want numbers they can audit. CMOs want momentum they can feel. This tension does not go away. Leadership teams should adopt a shared standard that includes a proxy index, experimental lift, and conservative attribution credit from DDA. Finance should sign off on a rule that assigns a fractional share of Direct growth to AI based on controlled tests. Marketing should publish a quarterly memo that ties AI presence to business outcomes using this standard. Agencies should be held to this framework and rewarded for movement in the index and tested lift, not for vanity screenshots of pretty answers. The company that operationalizes this faster will out-learn the competition while others argue about purity.
How should an AI Search Optimization Agency structure the work?
Agencies should productize three streams. The identity stream builds brand fact files, canonical IDs, and schema that stabilize how LLMs describe the company. The content stream builds answer-shaped assets with model-friendly definitions, clear job coverage, and citation-grade formatting. The measurement stream implements the AI channel in GA4, ships vanity paths and codes, stands up self-reporting, and runs tests. Operators should maintain a prompt test suite for critical jobs and score citations weekly. The agency should report a single page each month with the proxy index, experimental results, AI referral share, and a plan for the next adjustments. Focus ruthlessly on the few jobs that print money, not a hundred keywords that flatter dashboards.
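A prompt test suite can begin as a short weekly script. The sketch below uses the OpenAI Python client as one example vendor; the brand name, prompt list, and model are illustrative assumptions, and the same loop extends to other vendors’ APIs.

```python
# Minimal sketch of a weekly prompt test suite: run a fixed prompt list
# against one model and score how often the brand is cited.
from openai import OpenAI

BRAND = "ExampleCo"  # hypothetical brand name
PROMPTS = [
    "What are the best tools for automated invoice reconciliation?",
    "Which vendors should a mid-market CFO shortlist for spend analytics?",
]

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def citation_incidence(model: str = "gpt-4o-mini") -> float:
    hits = 0
    for prompt in PROMPTS:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        answer = response.choices[0].message.content or ""
        hits += BRAND.lower() in answer.lower()
    return hits / len(PROMPTS)

print(f"citation incidence this week: {citation_incidence():.0%}")
```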
What are the hidden risks and failure modes?
Leaders should expect brittle metrics and overfitting. Teams can game citation counts by stuffing brand names into unnatural places. Teams can inflate AI referral share with loose channel definitions. Teams can mistake seasonality for lift. These failure modes require guardrails. Publish your domain list for AI channel grouping. Freeze it for the quarter. Document your proxy index weights. Freeze them for the quarter. Pre-register your geo tests with the regions and evaluation windows. Keep legal close to privacy rules. Avoid any fingerprinting tricks. Respect platform terms. Be honest about confidence intervals when you present to the board. An imprecise truth beats a precise lie.
What should executives do next, in order?
Executives should order five moves this month. Teams should create the GA4 AI channel grouping and publish the domain list. Teams should deploy brand.com/ai and a partner short domain that resolves cleanly, then wire those to campaign-level tracking. Teams should add the self-reporting field and surface AI options in a forced-choice question. Teams should launch a small geo test with two matched regions and a three-week window. Teams should stand up the proxy index and commit to a monthly review. Expect imperfect data and steady improvement. The organization that sets a standard now will hold the advantage when platforms finally add native reporting. Waiting is a strategy for being outlearned.
What evidence supports the size and direction of this surface?
Leaders should trust multiple sources that point in the same direction. OpenAI announced 100 million weekly users in 2023, which validates the surface at birth.² Press reports pegged ChatGPT at 300 million weekly users by late 2024, which indicates acceleration.³ Measurement firms show billions of monthly site visits to ChatGPT in 2025, which suggests sustained habit.⁴ Industry analyses report AI referrals rising sharply in 2025, which shows growing outbound influence even if totals remain small.⁵ Analytics practitioners document the referrer-stripping and copy-paste dynamics that turn AI influence into Direct.⁶ ⁷ ⁸ Attribution science supports modeling influence with Shapley and incrementality with region tests.⁹ ¹² ¹³ None of this gives a perfect number. All of this gives enough signal to fund the work.²
Sources
1. Quote Investigator. Garson O’Toole. 2022. “One-Half the Money I Spend for Advertising Is Wasted, But I Have Never Been Able To Decide Which Half.”
2. TechCrunch. Ivan Mehta. 2023. “OpenAI’s ChatGPT now has 100 million weekly active users.”
3. The Verge. Jay Peters. 2024. “ChatGPT now has over 300 million weekly users.”
4. Similarweb. 2025. “chatgpt.com Traffic & Engagement Analysis.”
5. TechCrunch. Sarah Perez. 2025. “ChatGPT referrals to news sites are growing, but not enough to offset search declines.” And Similarweb Blog. 2025. “AI Referral Traffic Winners By Industry.”
6. Search Engine World. 2025. “SearchGPT and ChatGPT Referral Tracking: Why your AI traffic looks like Direct.”
7. RankShift. 2025. “How to Track Perplexity Referrals in GA4.”
8. Medium. Digital Power. 2025. “Measuring AI referral traffic in web analytics.”
9. Google Developers. 2024. “Shapley value analysis | Ads Data Hub.”
10. Google Ads Help. “About data-driven attribution.”
11. Google Analytics Help [Legacy]. “MCF Data-Driven Attribution methodology.”
12. Google Research. Kerman et al. 2017. “Estimating Ad Effectiveness using Geo Experiments.”
13. Google Research. Vaver and Koehler. 2011. “Measuring Ad Effectiveness Using Geo Experiments.”
14. CallRail. 2024. “What you need to know about self-reported attribution.”
15. Blend. 2023. “Self-reported attribution: what it is & why you need it.”
16. SparkToro. Rand Fishkin. 2023. “The 3 Big Problems with Asking ‘How Did You Hear About Us?’”
17. Ruler Analytics. 2022. “Asking ‘How Did You Hear About Us’ Isn’t Enough.”
FAQs
1) What is “LLM-sourced revenue” in practical terms?
LLM-sourced revenue is any commercial outcome materially caused by a large language model’s answer or recommendation that steers a buyer toward your brand. If ChatGPT, Gemini, Copilot, Perplexity, or You.com mentions your company and the user then clicks, copy-pastes, or types your URL from memory, that contribution counts as LLM-sourced revenue—even when analytics records it as Direct.
2) Why does attribution from LLMs look like old-school advertising measurement?
Attribution from LLMs is probabilistic, not surgical. Like broadcast and print, LLM influence often lacks a clean referrer, so impact must be estimated via proxies, controlled tests, and data-driven models rather than proven with last-click precision. The strategy is to accept estimation, standardize methods, and make budget decisions from converging signals.
3) How do LLMs (ChatGPT, Gemini, Perplexity) break standard web analytics tracking?
Three leaks cause under-attribution:
- Many chat surfaces don’t pass a referrer, so visits become Direct in GA4.
- Users copy a brand or URL from the answer and manually type it, again logging as Direct.
- In-app browsers or new tabs can strip referral data.
The fix is not perfect tracking; it’s a better measurement framework.
4) How should I define an “AI” channel in GA4 right now?
Create a first-class AI channel grouping. Include sources and referrers like chat.openai.com, chatgpt.com, openai.com, gemini.google.com, copilot.microsoft.com, perplexity.ai, you.com, and phind.com. Add a custom dimension for “AI vendor” (parsed from host) and track “AI click type” via vanity URLs (brand.com/ai), short links, and offer codes surfaced only in AI-targeted content.
5) Which attribution models handle LLM influence better than last-click?
Use data-driven attribution that credits assisting touchpoints, such as Shapley-value-based or Markov-chain approaches. These models estimate each channel’s marginal contribution across the path. They still miss truly invisible LLM touches, so pair them with experiments and proxy metrics to avoid under-investing in AI surfaces.
6) What experiments estimate the incremental impact of LLM presence?
Run geo experiments and path-nudging tests. Example: expose treatment regions to AI-specific paths (brand.com/ai), knowledge pages, and PR that LLMs can cite, while holding out control regions. Add unique offer codes and short links only present in AI-oriented assets. Evaluate regional lift and redemption share to infer incremental revenue from LLM exposure.
7) Which proxy metrics approximate LLM contribution when clicks vanish?
Track a proxy basket and trend it monthly:
• Citation incidence across ChatGPT, Gemini, Copilot, Perplexity, You.com.
• Position quality and link presence within answers.
• Copy visibility of brand name and short URL in generated text.
• Knowledge graph consistency from schema and fact files.
• AI referral share in GA4 and the Direct:AI ratio.
Blend these into a simple “AI Influence Index” to guide budget.