Newsletter · April 2026

NLP is reshaping how machines understand intent

Noah Kanji · April 30, 2026 · 19 min read

The second half of April 2026 turned "intent" into the operative word of artificial intelligence. In a single ten-day stretch, OpenAI shipped GPT-5.5 with marketing copy that mentioned "intent" four times in three paragraphs, Google's Head of Search told Bloomberg that users have "stopped pre-compressing their real intent into keyword shorthand," BrightEdge declared that AI agents now generate 88% of the request volume of human organic search, and a wave of arXiv papers quantified — for the first time — exactly how often frontier LLMs misread what users actually want. The shift matters because the entire economic logic of the web, from a $300-billion search-advertising market to enterprise SaaS roadmaps, was built on keyword-shaped queries entering keyword-shaped indexes. That assumption is now broken. What replaces it is a stack of dense vectors, late-interaction retrievers, knowledge graphs, and reasoning models that try — imperfectly, expensively, and with measurable failure modes — to answer the question users meant to ask.

This article traces both halves of that story: the technical machinery making semantic intent understanding work, and the business reality reshaping how content gets discovered, cited, and monetized.

From bag-of-words to context vectors

For roughly thirty years, search ran on two equations. TF-IDF scored a term as its frequency in a document multiplied by the logarithm of its rarity across the corpus. BM25, the Okapi refinement that still powers Lucene, Elasticsearch, and OpenSearch, added two crucial fixes: term-frequency saturation (the hundredth mention of "mars" should not count ten times more than the tenth) and document-length normalization. Both algorithms are fast, deterministic, and fundamentally bag-of-words: they ignore order, syntax, and synonymy. "Automobile" and "car" are unrelated tokens; "river bank" and "investment bank" collapse onto the same "bank" term, with nothing to represent the difference.
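Both scoring rules are compact enough to state in code. A minimal bag-of-words BM25 sketch (toy whitespace tokenization, the conventional k1=1.2 and b=0.75 defaults) makes the synonymy blind spot concrete:

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, corpus, k1=1.2, b=0.75):
    """Okapi BM25 with term-frequency saturation (k1) and
    document-length normalization (b)."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    tf = Counter(doc_terms)
    score = 0.0
    for t in query_terms:
        df = sum(1 for d in corpus if t in d)            # document frequency
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)  # Lucene-style IDF
        f = tf[t]
        # saturation: the 100th occurrence adds far less than the 10th
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc_terms) / avgdl))
    return score

corpus = [
    "the car drove to the river bank".split(),
    "the investment bank raised rates".split(),
    "mars rover found water on mars".split(),
]
print(bm25_score(["automobile"], corpus[0], corpus))  # 0.0
print(bm25_score(["car"], corpus[0], corpus))         # positive
```

Because "automobile" never occurs in the corpus, its term frequency, and therefore its score, is zero against every document, no matter how relevant the content actually is.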

Dense retrieval, introduced by Karpukhin et al. with Dense Passage Retrieval (DPR) in 2020, replaced sparse term vectors with learned 384-to-4,096-dimensional embeddings produced by transformer encoders. The same word picks up different vectors depending on its neighbors, which is the mechanical foundation of intent capture. The contrast is stark: BM25 vectors are roughly 99.99% zeros over a 30,000-to-1,000,000-token vocabulary; dense vectors are fully populated, indexed via approximate-nearest-neighbor structures like HNSW or IVF-PQ, and synonym-aware by construction. Hybrid retrieval — BM25 plus dense, fused via Reciprocal Rank Fusion — typically improves recall 15-30% over either method alone, and remains the production default at every major vector database vendor (Pinecone, Weaviate, Qdrant, Vespa, Elasticsearch).
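Reciprocal Rank Fusion itself is only a few lines: each retriever contributes 1/(k + rank) per document, so documents that rank well under both the lexical and the semantic view rise to the top. A minimal sketch with toy document ids:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists (e.g. BM25 and dense retrieval) into one.
    `rankings` is a list of doc-id lists, best-first. Each list
    contributes 1 / (k + rank) per document; k=60 is the conventional
    constant from the original RRF paper."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking  = ["d3", "d1", "d7", "d2"]   # lexical view
dense_ranking = ["d1", "d5", "d3", "d9"]   # semantic view
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
# d1 and d3 appear high in both lists, so they rise to the top
print(fused[:2])  # ['d1', 'd3']
```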

The architecture choice underneath matters as much as the embedding. Bi-encoders (Sentence-BERT, DPR, BGE, Cohere Embed, Voyage, Gemini Embedding 001) encode query and document separately, allowing offline indexing and millisecond retrieval at billion-document scale; cross-encoders (MonoBERT, Cohere Rerank 3.5, BGE-Reranker-v2) concatenate the pair and run full attention over both, achieving higher precision at the cost of N forward passes for N candidates. The standard production pattern by April 2026 is a two-stage funnel: hybrid bi-encoder retrieval pulls 100-1,000 candidates, then a cross-encoder re-ranker compresses them to the top five-to-ten — a pipeline that has been shown to lift nDCG@10 by as much as 28%. ColBERT-style late interaction offers a middle path, encoding per-token vectors and computing fine-grained MaxSim scores at retrieval time.
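The ColBERT-style MaxSim score is easy to sketch: every query token keeps its own vector and is matched against its best-scoring document token, and the per-token maxima are summed. A toy NumPy version with random per-token embeddings standing in for real learned token vectors:

```python
import numpy as np

def maxsim_score(query_vecs, doc_vecs):
    """ColBERT-style late interaction: for each query token, take the
    maximum similarity over all document tokens, then sum."""
    sims = query_vecs @ doc_vecs.T        # (q_tokens, d_tokens) similarities
    return float(sims.max(axis=1).sum())  # best doc token per query token

def normalize(m):
    return m / np.linalg.norm(m, axis=1, keepdims=True)

rng = np.random.default_rng(0)
query = normalize(rng.normal(size=(4, 8)))                 # 4 query tokens
doc_a = normalize(query + 0.05 * rng.normal(size=(4, 8)))  # near-duplicate
doc_b = normalize(rng.normal(size=(6, 8)))                 # unrelated
assert maxsim_score(query, doc_a) > maxsim_score(query, doc_b)
```

The fine-grained token-level matching is why late interaction sits between bi-encoders (one vector per text) and cross-encoders (full joint attention) on both quality and cost.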

The MTEB leaderboard snapshot heading into late April 2026 tells the competitive story: Google's Gemini Embedding 001 holds the #1 English position at 68.32 average score (67.71 retrieval) with a 3,072-dimensional Matryoshka-truncatable representation; NVIDIA's Llama-Embed-Nemotron-8B tops multilingual; Alibaba's Qwen3-Embedding-8B scores 70.58 on the newer MTEB v2; and Microsoft's open-weight Harrier-OSS-v1 (27B parameters, MIT-licensed) reportedly hit 74.3 on v2, though those numbers come from aggregator blogs rather than the official leaderboard and should be treated as provisional.

Attention is what makes intent legible

The mechanism that made any of this possible is self-attention, the Vaswani et al. 2017 construction in which every token computes Query, Key, and Value vectors and softmax-weights its representation against every other token. Multi-head attention runs the operation in parallel across specialized subspaces; interpretability work has shown specific heads track coreference, syntactic dependencies, negation, and semantic roles. For intent disambiguation, this means a single forward pass simultaneously resolves entity references, action verbs, modifiers, and pragmatic constraints — the entire structure that lets "book me somewhere quiet for Saturday, vegan-friendly, kids welcome" become a coherent retrieval target rather than a soup of tokens.
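The core computation is a few lines of linear algebra. A single-head, NumPy-only sketch of scaled dot-product attention (random toy weights; production models add multiple heads, masking, and per-layer learned projections):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product attention (Vaswani et al. 2017).
    Each token's output is a softmax-weighted mixture of all tokens'
    Value vectors, so 'bank' in 'river bank' ends up with a different
    contextual vector than 'bank' in 'investment bank'."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # token-to-token affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V

rng = np.random.default_rng(1)
X = rng.normal(size=(3, 16))                 # 3 tokens, 16-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(16, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (3, 8): one contextualized vector per token
```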

The 2022-2026 efficiency stack — FlashAttention-1/2/3 (Dao et al.), sliding-window attention (Mistral, Longformer), Native Sparse Attention (DeepSeek), and the December 2025 Block-Sparse Flash Attention drop-in replacement (arXiv 2512.07011, ~1.10× speedup on Llama-3.1-8B reasoning at >99% baseline accuracy) — has pushed practical context windows from BERT's 512 tokens to Gemini 2.5 Pro's 1-2 million, with reported needle-in-a-haystack recall of 99.7% at 1M and 99.2% at 10M tokens. The business consequence is concrete: persistent assistants no longer need RAG for many session-scoped tasks, because intent stated on turn five is reliably retrievable on turn five hundred.

Retrieval-augmented generation grows up — and grows agents

Naive RAG (Lewis et al., NeurIPS 2020) — encode query, vector-search top-K, generate with context — was always brittle on ambiguous or multi-intent queries. The 2024-2026 literature reads as a sustained attempt to fix that. HyDE (Hypothetical Document Embeddings, Gao et al.) inverts the problem: ask the LLM to write the answer first, embed that, then retrieve real documents geometrically near the hallucination. Self-RAG (Asai et al., ICLR 2024) trained a single model to emit reflection tokens that decide whether to retrieve at all. Corrective RAG (Yan et al., 2024) adds a lightweight evaluator that classifies retrieved passages as Correct, Ambiguous, or Incorrect and triggers query rewriting plus web search when confidence drops — yielding up to +12.97% accuracy improvements. Adaptive RAG routes simple queries to no retrieval, medium queries to single-step RAG, and complex queries to multi-hop iteration.
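The Corrective RAG control flow can be sketched independently of any particular retriever or LLM. In the toy below every component (retrieve, grade, rewrite, web_search, generate) is a caller-supplied stub, and the function names are illustrative, not any real library's API:

```python
def corrective_rag(query, retrieve, grade, rewrite, web_search, generate):
    """Corrective RAG control flow (after Yan et al., 2024): grade each
    retrieved passage, keep the Correct ones, and fall back to query
    rewriting plus web search when nothing survives."""
    passages = retrieve(query)
    kept = [p for p in passages if grade(query, p) == "correct"]
    if not kept:  # low confidence: correct the retrieval, don't generate blind
        kept = web_search(rewrite(query))
    return generate(query, kept)

# toy stand-ins that exercise the fallback path
answer = corrective_rag(
    "capital of australia",
    retrieve=lambda q: ["Sydney is Australia's largest city."],
    grade=lambda q, p: "correct" if "Canberra" in p else "ambiguous",
    rewrite=lambda q: q + " site:wikipedia.org",
    web_search=lambda q: ["Canberra is the capital of Australia."],
    generate=lambda q, ctx: ctx[0],
)
print(answer)  # Canberra is the capital of Australia.
```

The retrieved passage is graded "ambiguous", so the pipeline rewrites the query and answers from the web-search fallback instead of generating over weak evidence.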

The dominant 2025-2026 pattern, captured in the much-cited Singh et al. survey "Agentic Retrieval-Augmented Generation" (arXiv 2501.09136, with v4 published in April 2026), is agentic RAG: pipelines that reflect, plan, use tools, and collaborate across multiple agents, looping until grounded. Microsoft's GraphRAG — and its November 2024 cost-reduced variant LazyGraphRAG (indexing cost dropped to 0.1% of full GraphRAG) — extracts entities and relationships into a knowledge graph, clusters nodes into hierarchical communities via the Leiden algorithm, and supports both entity-centric "local" search and theme-aggregating "global" search. Cedars-Sinai built a 1.6-million-edge Alzheimer's research graph on it; Precina Health reports a 1% monthly HbA1c reduction in diabetic patients, twelve times faster than standard care, though those clinical numbers come from vendor disclosures rather than independent trials.

The late-April 2026 academic crop is explicit about what intent-aware retrieval now means in practice. SG-RAG (arXiv 2604.22843, submitted April 21) models retrieval as embedding-based subgraph matching and posts absolute gains of 20.68 to 50.88 points across all metrics on its new ERQA benchmark of 120,000 fact-oriented QA pairs. UAE (Utility-Aligned Embeddings, arXiv 2604.22722) trains a bi-encoder to imitate the utility distribution derived from an LLM's perplexity reduction, lifting Recall@1 by 30.59% and Token F1 by 17.3% on QASPER over BGE-Base. TRACER (arXiv 2604.14531) makes the inverse argument: on 77-class and 150-class intent benchmarks (Banking77, CLINC150), frozen embeddings plus classical machine learning can fully replace a Claude Sonnet 4.6 teacher, because intent classes have clean boundaries in embedding space. The implication: the hard part of modern intent recognition is not classification — it is ambiguity resolution, which is exactly where reasoning models earn their cost.
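The TRACER claim, that frozen embeddings plus classical machine learning suffice when intent classes separate cleanly, is easy to illustrate with a nearest-centroid classifier over toy "embeddings" (random clusters standing in for real encoder outputs):

```python
import numpy as np

def fit_centroids(embeddings, labels):
    """Per-class mean of frozen embeddings. If intent classes have clean
    boundaries in embedding space, nearest-centroid is already a strong
    classifier, with no LLM in the loop at inference time."""
    return {c: embeddings[labels == c].mean(axis=0) for c in np.unique(labels)}

def predict(centroids, x):
    """Assign x to the class whose centroid is nearest."""
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

# toy 'frozen embeddings': two well-separated intent clusters
rng = np.random.default_rng(2)
refund  = rng.normal(loc=+1.0, size=(20, 32))
balance = rng.normal(loc=-1.0, size=(20, 32))
X = np.vstack([refund, balance])
y = np.array(["refund"] * 20 + ["balance"] * 20)
centroids = fit_centroids(X, y)
print(predict(centroids, rng.normal(loc=+1.0, size=32)))  # refund
```

When the clusters overlap, as genuinely ambiguous queries do, this cheap classifier degrades, which is the boundary where the paper's argument for routing to reasoning models begins.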

Entity recognition is no longer a side task

Modern named entity recognition replaced BiLSTM-CRF stacks with transformer fine-tunes (BERT/RoBERTa on CoNLL-2003 and OntoNotes), then with instruction-tuned zero-shot LLM extractors (GLiNER, UniNER). The frontier moved to entity linking — disambiguating spans against Wikidata, Wikipedia, or proprietary knowledge graphs. DeepEL (arXiv 2511.14181, November 2025) uses LLMs at every stage with self-validation against global context and reports +2.6% average F1 across ten benchmarks and +4% out-of-domain.

The business significance is that entity resolution now feeds intent resolution directly. "Apple" as company versus fruit is decided by entity neighborhood: CEO and stock price versus orchard and recipe. Google's Knowledge Graph (over 500 billion facts since 2012), Microsoft's Satori (powering Bing and the Microsoft 365 Copilot grounding layer), and Wikidata are the canonical references that LLMs blend with parametric memory and retrieved context. The corollary for content owners is that entity-rich, schema.org-marked, Wikidata-aligned content is structurally preferred by both classical search and LLM-grounded retrieval — not because someone optimized a ranking factor, but because that is the data shape RAG pipelines find easiest to ground against.
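What "entity-rich, schema.org-marked, Wikidata-aligned" looks like in practice is a small JSON-LD block on the page. The sketch below emits illustrative Organization markup; every name, URL, and identifier in it is a placeholder, not a real entity:

```python
import json

# Minimal schema.org Organization markup with explicit entity alignment
# via `sameAs` links -- the data shape retrieval pipelines ground against
# most easily. All identifiers below are illustrative placeholders.
markup = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Corp",
    "url": "https://example.com",
    "sameAs": [
        "https://www.wikidata.org/wiki/Q0000000",     # hypothetical entity id
        "https://en.wikipedia.org/wiki/Example_Corp",  # hypothetical page
    ],
    "description": "Illustrative entity-rich markup for grounding.",
}
print(json.dumps(markup, indent=2))
```

Embedded in a page as `<script type="application/ld+json">`, this gives both classical crawlers and RAG pipelines an unambiguous entity to resolve against.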

Reasoning models change the meaning of "query"

The reasoning-model wave that started with OpenAI's o1 in September 2024, accelerated through o3 (December 2024), DeepSeek R1 (January 2025, with its open-weight pure-RL "aha moment"), Claude 3.7 Sonnet's extended thinking mode (February 2025), Gemini 2.5 Pro (March 2025), and culminated for now in GPT-5.5 (April 23, 2026) and Claude Opus 4.7 (April 16, 2026), has changed what "understanding intent" mechanically means. Instead of one forward pass, a reasoning model generates thousands of internal deliberation tokens before answering. It can restate the user's intent in its own words, generate clarifying sub-questions, verify against retrieved evidence, and reformulate when confidence is low.

The cost is real — two-to-ten times the latency and inference spend — and the routing pattern that has emerged in production is to send easy queries to fast models and hard queries to reasoning models. Anthropic explicitly built this trade-off into the Opus 4.7 launch by adding an "xhigh" effort tier between high and max, and raising Claude Code's default to xhigh; OpenAI's GPT-5.5 launch post described the model as one that lets users "give a messy, multi-part task and trust it to plan, use tools, check its work, navigate through ambiguity, and keep going."
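The routing pattern reduces to a small dispatch function. The sketch below is a toy: the model names, effort tiers, and difficulty heuristic are placeholders, not any provider's actual API:

```python
def route_model(query, classify_difficulty):
    """Cost-aware routing: easy queries go to a fast model, harder or
    more ambiguous ones to a reasoning model with a higher effort tier.
    Names and tiers here are illustrative placeholders."""
    difficulty = classify_difficulty(query)
    if difficulty == "easy":
        return {"model": "fast-model", "effort": None}
    if difficulty == "medium":
        return {"model": "reasoning-model", "effort": "high"}
    return {"model": "reasoning-model", "effort": "xhigh"}

def heuristic(q):
    """Toy difficulty signal: multi-part queries are harder."""
    parts = q.count(",") + q.count(" and ")
    return "hard" if parts >= 2 else ("medium" if parts == 1 else "easy")

print(route_model("weather in Paris", heuristic))                      # fast path
print(route_model("plan a trip, book hotels and flights", heuristic))  # xhigh tier
```

Production routers replace the heuristic with a learned classifier or a cheap first-pass model, but the economics are the same: pay for deliberation tokens only where ambiguity justifies them.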

The hardest finding in the late-April 2026 literature is uncomfortable. The CarryOnBench paper (arXiv 2604.27093) constructed 398 seemingly harmful queries with benign underlying intents, ran 5,970 simulated multi-turn conversations across 14 frontier models, and found that at turn one models satisfied only 10.5-37.6% of the user's actual benign information need. When the same query was rephrased with the benign intent stated upfront, satisfaction jumped to 25.1-72.1%. The paper's blunt conclusion: most refusals stem from intent misinterpretation, not lack of knowledge. That is a quantified failure mode at the heart of the alignment-meets-utility trade-off, and it is now measurable.

What executives said in late April 2026

The industry's framing of intent went on the record this month. Liz Reid, VP and Head of Google Search, on the Bloomberg Odd Lots podcast (April 23, 2026): "It's not just that queries are getting longer. It is that users have stopped pre-compressing their real intent into keyword shorthand." Reid illustrated the point with a restaurant scenario — five people, one vegan, two kids, not expensive, specific neighborhood, Saturday evening — that users had previously truncated to "restaurants New York" because keyword engines couldn't handle specificity. AI Mode users, she said, submit queries two-to-three times longer than in classic search, "giving Google more context and intent."

Sundar Pichai, on the Cheeky Pint podcast (April 7, 2026): "If I fast-forward, a lot of what are just information-seeking queries will be agentic in Search. You'll be completing tasks. You'll have many threads running. Search would be an agent manager in which you're doing a lot of things." On Alphabet's Q1 2026 earnings call (April 29): "Since upgrading AI Overviews and AI Mode to Gemini 3, we've reduced the cost of core AI responses by more than 30%." Pichai also disclosed that Search latency is down 35% over five years even as AI features have proliferated — a signal that the inference-cost curve is bending fast enough to make agentic search financially viable.

OpenAI's GPT-5.5 launch post (April 23, 2026) was unusually direct: "Because the model is better at understanding intent, it can move more naturally through the full loop of knowledge work: finding information, understanding what matters, using tools, checking the output, and turning raw material into something useful." Sam Altman, on the Big Technology Podcast in mid-April 2026, framed the long arc as a memory problem rather than a parsing problem: "Even if you have the world's best personal assistant, they can't remember every word you've ever said in your life. They can't have read every email. They can't have read every document you've ever written" — the canonical pitch for infinite-context, infinite-memory assistants whose intent understanding compounds across years rather than turns.

Demis Hassabis, on Harry Stebbings's 20VC podcast (April 7, 2026), predicted that "a year from now, we will have agents that are 'close' to reliably accepting and completing entire delegated tasks," a reframing of intent understanding from query-level to task-level. Dario Amodei met with White House Chief of Staff Susie Wiles on April 17 in a session both sides described as "productive and constructive," but the meeting produced no public quotes on language understanding in the April 15-30 window.

The loudest dissenting voice was Yann LeCun, delivering the Lemley Family Leadership Lecture at Brown University on April 1, 2026: "AI sucks. We have systems that can manipulate language, and they fool us into thinking they are smart because they manipulate language. But in fact, they are completely helpless when it comes to the physical world. There's literally hundreds of billions invested in an industry that basically is counting on the fact that LLMs are going to reach human-level intelligence. It's complete BS." LeCun's argument — that LLMs operate purely on statistical patterns in language and have no genuine understanding — is the load-bearing counter-narrative to the consensus that semantic embeddings plus reasoning chains equal comprehension.

The business stakes have stopped being theoretical

The agency data this month has the texture of a market past the inflection point. BrightEdge's April 8, 2026 release reported that AI agent requests reached 88% of human organic search activity, projected agent traffic to surpass human search by year-end 2026, and noted that only 19% of websites have specific directives for ChatGPT-related bots — leaving an estimated $40 billion in unoptimized search opportunity even under an optimistic 80%-compliance scenario. Gartner published its first standalone Hype Cycle for Agentic AI in April 2026, mapping 27 innovations and placing AI Agent Development Platforms at the Peak of Inflated Expectations with a 2-5 year horizon to plateau; the same firm now predicts that more than 40% of agentic AI projects will be canceled by end of 2027 on cost, unclear value, and weak risk controls, and that by 2028, 90% of B2B buying will be AI-agent intermediated, pushing over $15 trillion in B2B spend through agent exchanges.

Search-traffic economics are in genuine flux. Ahrefs' February 2026 update, replicating an April 2025 study across 300,000 keywords against Google Search Console data, found position-1 click-through rate falls 58% on AI-Overview queries (from 0.073 to 0.016), with position-2 down 50.8% and position-3 down 46.4%. But Seer Interactive's April 24, 2026 update — drawing on 53 brands, 5.47 million tracked queries, and 2.43 billion impressions — found organic CTR on AIO queries rebounded 85% in two months from a December 2025 floor of 1.3% to 2.4% in February 2026, with paid CTR rising from 14.6% to 16.2%. Pages cited inside AI Overviews now collect 35% more organic clicks and 91% more paid clicks. Wil Reynolds posted the underlying data on X on April 24: "Looks like CTR for AIOs is coming back from a low of 1.3% CTR in Dec 2025 to 2.4% in Feb 2026." The clearest reading is that classical SEO traffic took a real hit, that the floor wasn't permanent, and that being cited inside the AI summary is now worth more than ranking below it.

The intent-vs-shortlist behavior shift is even sharper. Growth Memo's April 2026 analysis showed that in classic search, 56% of users built shortlists from multiple sources; in AI Mode, 88% of users took the AI's shortlist without external check, and the AI's top pick became the user's top pick 74% of the time. McKinsey's October 2025 "New front door to the internet" report (still the most-cited reference in 2026 strategy decks) projected that 20-50% of traditional search traffic is at risk and that roughly $750 billion in US revenue will funnel through AI-powered search by 2028. Bain's Natasha Sommerfeld put it bluntly: "AI-generated search results are rewriting the rules, and SEO optimisation is no longer enough."

The citation-pattern data tells brands what to actually do. Yext's analysis of 6.8 million AI citations across ChatGPT, Gemini, and Perplexity (rising to 90% in a 17.2-million-citation follow-up) found that 86% of AI citations come from brand-managed sources — websites and listings — while Reddit accounts for only 2% once you control for location and intent. Conductor's 2026 AEO/GEO Benchmarks (13,770 domains, 21.9 million Google searches, 17 million AI-generated responses, 100 million citations) reported that AI Overviews trigger on 25.11% of searches, that 97% of digital leaders saw positive impact from AEO in 2025, and that 32% rank Generative Engine Optimization as their #1 priority for 2026, with an average of 12% of digital budgets already allocated to GEO. Profound's research showed LinkedIn jumping from the eleventh to the fifth most-cited domain in ChatGPT in three months, the largest shift the firm has tracked.

The strategic frame is converging across agencies. SparkToro's Rand Fishkin published "5 Strategic Features that Predict Survival in the Zero-Click Era" on April 20, 2026, arguing that tactical excellence cannot save businesses whose models Google or AI can replicate — brand and direct relationships are the moat. Fishkin's parallel January 2026 study, in which 600 volunteers ran 12 prompts across ChatGPT, Claude, and Google AI Overviews 2,961 times, concluded that "AIs do not give consistent lists of brand or product recommendations" — a real challenge to the validity of the entire AI-visibility-tracking software category that vendors like Profound (now valued at $1 billion after a $96 million Series C) are selling into. Both can be true: the lottery is real, and the lottery still has odds you can shift.

What businesses should actually do differently

The stack of techniques described above — dense and hybrid retrieval, cross-encoder re-ranking, HyDE, agentic RAG, GraphRAG, entity linking, reasoning-model routing — implies a content and discoverability strategy that no longer rewards keyword density and no longer punishes synonymy. Modern retrieval matches meaning against meaning, often by comparing your content against an LLM-hallucinated expected answer rather than against the user's literal phrasing. The implication is that content that answers questions completely, names entities precisely, and includes the kind of factual scaffolding (statistics, expert quotes, citations) that Princeton's GEO research showed boosts AI visibility by 30-41% will outperform content optimized for SERP features. Wei Zheng, Conductor's Chief Product Officer, framed the shift cleanly: "What truly matters is brand presence inside AI answers."

The tactical playbook that converges across Bain, McKinsey, Conductor, BrightEdge, and Yext has four moves. Diagnose AI presence across engines — ChatGPT, Gemini, Perplexity, Claude, Copilot, and Google AI Mode each cite different sources with different patterns (Yext's data shows Gemini favors websites at 52.1%, OpenAI leans on listings at 48.7%, Perplexity diversifies into MapQuest and TripAdvisor). Diversify content formats — Ahrefs' April 2026 data identified YouTube mentions in titles, transcripts, and descriptions as the strongest correlate with AI Overview visibility. Rebuild measurement — only 16% of brands currently track AI search performance per McKinsey, and AI referrals already account for 1.08% of total website traffic across ten industries, growing roughly 1% month over month. Audit bot policies — the BrightEdge finding that 81% of sites still treat AI agents as legacy bots is the single largest unforced error in enterprise digital strategy right now.
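A bot-policy audit starts with robots.txt. The fragment below uses crawler user-agent tokens that the vendors document publicly (OAI-SearchBot, ChatGPT-User, GPTBot, PerplexityBot, ClaudeBot, Google-Extended); the allow/disallow split shown (permit answer-engine retrieval, opt out of training-only crawls) is one illustrative policy choice, not a recommendation:

```
# Explicit directives for AI crawlers and agents.

# OpenAI search indexing
User-agent: OAI-SearchBot
Allow: /

# OpenAI on-demand fetches triggered by user queries
User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /

# OpenAI training crawler
User-agent: GPTBot
Disallow: /

# Opt-out token for Gemini model training
User-agent: Google-Extended
Disallow: /
```

The point of the audit is that these are separate decisions per bot: a site can be fully visible to answer engines while opting out of training crawls, or vice versa, but only if it states each directive explicitly rather than treating every AI agent as a legacy bot.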

The honest uncertainty

Three things remain genuinely unsettled. First, methodology drift makes headline statistics unreliable: Conductor reports 25.11% of searches trigger AIOs, Semrush reports 15.69%, BrightEdge tracks closer to 50% — three samples produce three numbers, and aggregator blogs routinely re-date Q3-2025 studies as "April 2026 data." Second, the LeCun critique is not dismissable: if statistical language manipulation is not understanding, the entire frame of this article describes very expensive pattern-matching, and Fishkin's inconsistency study is partial empirical support for that view. Third, the agent economics are speculative: Gartner's projection that 40% of agentic AI projects will be canceled by 2027 sits alongside Bain partner Chuck Whitten's observation that "boards are losing patience. They're saying 2026 is the year we need to see this translate into bottom-line results."

Closing the loop

The April 2026 evidence makes one shift unambiguous: the unit of search is no longer a keyword but an intent, and the unit of an intent is no longer a token but a chain of reasoning over entities and retrieved context. The technical machinery — dense retrieval, late interaction, agentic RAG, knowledge graphs, reasoning models, million-token contexts — has matured to the point that frontier products like GPT-5.5, Claude Opus 4.7, and Gemini AI Mode can sustain coherent intent across multi-step tasks, even if CarryOnBench shows they still misread benign intent more than half the time on first contact. The business machinery — GEO budgets averaging 12% of digital spend, 88% of users accepting AI shortlists without external check, $40 billion in unoptimized agent traffic, 90% of B2B spend projected to flow through agent intermediaries by 2028 — has matured faster than most marketing organizations have noticed.

The companies that win the next cycle will be the ones that treat intent as the real currency: writing content that resolves entities cleanly, structuring data that grounds against knowledge graphs, measuring presence inside AI answers rather than position above them, and routing internal AI workloads between fast models and reasoning models the same way Google now routes between AI Overviews and AI Mode. The keyword era ended quietly. The intent era is loud, expensive, and already underway.


Noah Kanji

Team Indexy

The Indexy editorial team covers AI search visibility, generative engine optimisation, and the strategies brands use to get cited and selected in AI answers.
