The SEO Pub

Build Validated, Entity-Linked Schema for Any Page (Free Claude Skill)

Mike Friedman — Tue, 09 Jun 2026 12:31:32 +0000

A few weeks ago I wrote about adding custom schema to WordPress and why the templated output from Yoast and Rank Math leaves value on the table. The follow-up question I got most often was some version of: “Okay, but how do I actually generate the schema?”

This is the answer. I built a Claude skill that takes a page, figures out every schema type that applies, builds them into a single connected graph, and links the page’s main entities to their verified Wikipedia and Wikidata entries. It’s free.

The part I’m most happy with is the verification step, which I’ll get to below. Most schema generators will happily invent a Wikidata link that doesn’t exist. This one checks first.

What the Skill Does

You give it a published URL or a draft of the page content. It does the rest.

It reads the page and classifies it. Article, product, local business, service page, event, recipe, whatever it is. A page is often several of these at once, so it captures all of them.

It selects every schema type that applies. Not just the obvious one. A product page also has a publishing Organization, sits in a BreadcrumbList, and may have Review and Offer data. The skill draws from the full Schema.org library, including the specialized types (Event, JobPosting, Course, Recipe, VideoObject, SoftwareApplication, Dataset, and more), and it picks the most specific type available. Plumber instead of LocalBusiness instead of Organization, because specificity helps both Google and AI systems.

It builds one connected graph. Instead of scattering separate schema blocks across the page, it outputs a single @graph with every entity cross-referenced by @id. The Organization is defined once and referenced everywhere it’s needed. This is the current standard, and it’s the format AI systems parse most cleanly, since connected entities are easier to interpret than scattered, disconnected blocks.
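As a rough illustration of what "one connected graph" means, here is a minimal Python sketch that assembles a single @graph. The example.com URLs, the @id fragments, and the names are placeholders chosen for this example, not output from the skill.

```python
import json

SITE = "https://example.com"  # placeholder domain (assumption)

# Define the Organization once, with a stable @id.
organization = {
    "@type": "Organization",
    "@id": f"{SITE}/#organization",
    "name": "Example Co",
    "url": SITE,
}

# Other nodes point at the Organization by @id instead of repeating it.
website = {
    "@type": "WebSite",
    "@id": f"{SITE}/#website",
    "url": SITE,
    "publisher": {"@id": organization["@id"]},
}

article = {
    "@type": "Article",
    "@id": f"{SITE}/your-page/#article",
    "headline": "Example headline",
    "isPartOf": {"@id": website["@id"]},
    "publisher": {"@id": organization["@id"]},
}

graph = {
    "@context": "https://schema.org",
    "@graph": [organization, website, article],
}
json_ld = json.dumps(graph, indent=2)
```

The Organization appears once in full. Every other node carries only a reference to its @id, so a change to the Organization happens in one place.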

It links your main entities to authoritative sources. This is the entity work from the disambiguation note put into practice. The skill identifies the real-world entities your page is about and adds sameAs links to their Wikipedia, Wikidata, and Google Knowledge Graph entries. That’s how you tell Google and the AI systems exactly which “Apple” or “Mercury” or “Springfield” you mean.

It verifies those links before including them. Here’s the part that matters. The skill searches for each entity, confirms the page actually exists, and confirms it refers to the correct entity using the page’s context. It will not output a link it hasn’t verified. If it can’t find an authoritative entry for an important entity (your own brand, for example, which may not be in Wikidata yet), it tells you and asks whether you have a page to use instead.

It flags deprecated rich results. If your page warrants FAQPage or HowTo schema, the skill includes it but notes that Google retired the visible rich result. The schema still helps with understanding and AI citation, so it’s worth keeping, but you won’t see the SERP enhancement you might expect.

Why the Verification Step Matters

Ask most AI tools to generate schema with entity links and they’ll produce something that looks right. The Wikidata Q-numbers will be formatted correctly. The Wikipedia URLs will look plausible. And some meaningful percentage of them will point to pages that don’t exist, or worse, point to the wrong entity entirely.

A sameAs link to the wrong entity is actively harmful. If your page is about your software company and the schema links it to a Wikidata entry for an unrelated company with a similar name, you’ve just told Google and every AI system that your business is something it isn’t. A hallucinated link that points nowhere is wasted. A confident link that points to the wrong thing is damage.

The skill handles this by treating verification as mandatory. It uses Wikidata’s actual entity search, confirms the entry resolves, and checks that the entity’s type and description match your page’s context before it includes anything. The output includes a verification report showing exactly what was confirmed, what wasn’t found, and where it used something you provided. You can see the work.
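The check described above can be sketched roughly as follows. This is not the skill's actual code. It assumes Wikidata's public wbsearchentities endpoint for the lookup, and the matching rule (a context term must appear in the candidate's description) is a simplification invented for this example. The candidate rows are made up and use fake IDs.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"

def build_search_url(name, language="en"):
    """Build a wbsearchentities request URL for an entity name."""
    params = {
        "action": "wbsearchentities",
        "search": name,
        "language": language,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"

def pick_verified_match(candidates, context_terms):
    """Return the first candidate whose description mentions a context term.

    Returns None instead of guessing when nothing matches.
    """
    wanted = [term.lower() for term in context_terms]
    for candidate in candidates:
        description = candidate.get("description", "").lower()
        if any(term in description for term in wanted):
            return candidate
    return None

# Made-up candidates shaped like entries in the API's "search" list.
sample = [
    {"id": "Q-FRUIT", "label": "Apple", "description": "fruit of the apple tree"},
    {"id": "Q-COMPANY", "label": "Apple", "description": "American technology company"},
]
match = pick_verified_match(sample, ["technology", "company"])
no_match = pick_verified_match(sample, ["planet"])
```

The design choice that matters is the None return. When no candidate fits the page's context, the sketch reports nothing rather than emitting a plausible-looking link.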

What Are Claude Skills?

If you haven’t used them, skills are reusable instruction sets for Claude. You install one, and Claude follows that workflow whenever you ask it to do that kind of task, without you having to write a detailed prompt each time. I covered them in more detail in the Entity Analysis note.

Skills work in Claude.ai (web and app), Claude Code, and the API. You need a paid Claude plan.

One important note for this skill: it needs web search enabled, because the entity verification step depends on it. If web search is off, the skill will tell you it can’t verify the entity links rather than guessing.

How to Install It

Download the skill file: [DOWNLOAD LINK]
Open Claude.ai
Go to Settings > Capabilities
Scroll to the Skills section and upload the file
Toggle the skill ON
Make sure web search is also enabled
Start a new chat

To use it, give it a URL:

“Using the schema builder skill, build schema for https://example.com/your-page”

Or paste a draft, or attach a document, if the page isn’t published yet. The skill will run in Draft Mode and remind you to update the URLs and re-verify before publishing.

The output is raw JSON-LD, ready to drop into the custom_schema field from the WordPress note. If you’d rather place it directly in a template, the skill also gives you the …

' . "\n";
}
add_action( 'wp_head', 'seopub_output_custom_schema' );

Paste the above code into the functions.php file. Remember to change the ‘custom_schema’ part in the code if you used a different name for your custom schema field.

In plain language, the function checks whether the current request is for a single page, post, or custom post type. If yes, it reads the value of the custom_schema custom field for that specific content. If there’s a value, it outputs it as a JSON-LD script tag in the page’s <head>. If there’s no value, the function exits silently.
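The same logic can be sketched in Python to make the control flow easy to follow. This is a language-neutral illustration, not the WordPress function itself. The function name, the is_singular flag, and the custom_fields dictionary are stand-ins invented for this example.

```python
from typing import Optional

def render_custom_schema(is_singular: bool, custom_fields: dict,
                         field_name: str = "custom_schema") -> Optional[str]:
    """Return a JSON-LD script tag for the page head, or None to output nothing."""
    # Only single pages, posts, or custom post types get schema.
    if not is_singular:
        return None
    # Read the per-page custom field; treat missing or blank as "no schema".
    raw = (custom_fields.get(field_name) or "").strip()
    if not raw:
        return None
    # Wrap the stored JSON-LD in a script tag, followed by a newline.
    return '<script type="application/ld+json">' + raw + "</script>\n"

tag = render_custom_schema(True, {"custom_schema": '{"@type": "Article"}'})
```

Both early returns correspond to the "exits silently" behavior: an archive page or an empty field produces no output at all.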

Schema only appears on pages where you’ve explicitly added it. No templated output for every page. No plugin database dependency.

Step 5: Generate Your Schema

For each page where you want custom schema, build the appropriate JSON-LD. The easiest options:

Schema.org directly for the full reference of available types and properties
Schema Markup Generator from Merkle for common types with a fill-in-the-blanks interface
Schema App or other paid tools for more complex implementations
Or my favorite, just use Claude or ChatGPT. Give them the page URL or a content draft and tell them what schema you want on the page.

Whatever type you’re implementing, the schema needs to accurately reflect what’s actually on the page. Don’t claim review stars you don’t have. Don’t list product prices that aren’t real. Google has been cracking down on schema that doesn’t match page content.

Insert this into the custom field on the WordPress page or post you want it added to.

Step 6: Validate Before Publishing

Always validate your schema before relying on it. Two free tools:

Schema Markup Validator at validator.schema.org
Google’s Rich Results Test at search.google.com/test/rich-results

If the JSON is malformed, both tools will catch it. The Rich Results Test will also tell you whether your schema is eligible for any of Google’s current rich result types.
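Before pasting into either tool, a quick local check can catch the most common mistake, which is JSON that does not parse. The sketch below is a pre-check only and does not replace the validators. The specific checks and messages are choices made for this example, and it assumes the schema is a single top-level object.

```python
import json
from typing import List

def check_json_ld(text: str) -> List[str]:
    """Return a list of problems found; an empty list means the basics look fine."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"Malformed JSON at line {err.lineno}, column {err.colno}: {err.msg}"]
    if not isinstance(data, dict):
        return ["This sketch expects a top-level JSON object"]
    problems = []
    if "@context" not in data:
        problems.append("Missing @context")
    if "@type" not in data and "@graph" not in data:
        problems.append("Missing @type or @graph")
    return problems

good = check_json_ld('{"@context": "https://schema.org", "@type": "Article"}')
bad = check_json_ld('{"@context": "https://schema.org",}')  # trailing comma
```

A trailing comma like the one in the second call is a typical copy-and-paste error, and the parser reports where it occurred.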

A Few Practical Examples

What this looks like in practice across common page types:

A product page gets Product schema with name, image, description, brand, offers (price, availability, currency), and aggregateRating if you have eligible reviews.
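A minimal sketch of that Product shape, built in Python. Every value here is a placeholder, and the add_rating helper is hypothetical. It exists to show aggregateRating being attached only when the page really has reviews.

```python
import json

product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example Widget",
    "image": "https://example.com/images/widget.jpg",
    "description": "A placeholder product used for illustration.",
    "brand": {"@type": "Brand", "name": "Example Brand"},
    "offers": {
        "@type": "Offer",
        "price": "19.99",
        "priceCurrency": "USD",
        "availability": "https://schema.org/InStock",
        "url": "https://example.com/widget",
    },
}

def add_rating(schema: dict, rating_value: float, review_count: int) -> dict:
    """Return a copy with aggregateRating added only when reviews exist."""
    result = dict(schema)
    if review_count > 0:
        result["aggregateRating"] = {
            "@type": "AggregateRating",
            "ratingValue": rating_value,
            "reviewCount": review_count,
        }
    return result

with_reviews = add_rating(product, 4.6, 38)
without_reviews = add_rating(product, 0, 0)
product_json = json.dumps(with_reviews, indent=2)
```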

A service page gets Service or LocalBusiness schema with service areas, hours, contact info, and any relevant offers.

A blog post with frequently asked questions gets FAQPage schema with the actual Q&A pairs. The rich result is gone in Google, but the schema still helps with understanding and AI citation.

An event page gets Event schema with date, location, organizer, and ticket info.

A how-to article gets HowTo schema. Same situation as FAQ. Google killed the rich result, but the schema still helps AI systems and other platforms parsing the content.

Adding SEO entities to schema markup translates your website’s content into a language search engines understand. Instead of relying on keyword matching, this semantic approach accurately defines your brand’s people, places, and concepts, allowing you to control how machines interpret your data. (If you are unfamiliar with this practice, it is something we will cover in a future note.)

The point isn’t to add schema for the sake of adding schema. It’s to communicate the specific structured information that’s actually present on the page, in a format machines can read, in a way that lives with your content rather than your plugin.

The Takeaway

Yoast, Rank Math, and similar plugins handle schema, and they do real work. This note isn’t arguing against using them for everything else they do.

But the schema they generate lives in their database structures. The day you switch plugins or platforms, that schema work disappears with them. The Product offers you configured, the events you set up, the FAQ pairs you entered, the entity relationships you added: all of it is tied to a plugin you may not be running in three years.

Custom schema in a custom field is different. It lives with the post. It survives plugin changes, theme updates, and platform migrations. It gives you complete control over the JSON-LD without UI limitations or Pro upgrades. And it works for every modern schema type, including the ones that drive Google rich results, Pinterest Rich Pins, voice assistant answers, and AI search citation.

A custom field, a small function in your child theme, and a few minutes per page. The schema you write today will still be there in five years, regardless of what plugins you’re running.

If you’re going to invest the time to add custom schema, own it.

How AI Visibility Tools Actually Know What People Are Asking ChatGPT

Mike Friedman — Tue, 19 May 2026 12:55:00 +0000

If you’ve used any AI visibility tool (Semrush’s AI Visibility Toolkit, Ahrefs Brand Radar, Otterly, Peec, SE Ranking, HubSpot AEO, Profound, Promptmonitor), you’ve probably seen claims like “13.5 million prompts tracked” or “239 million prompts in our database.” Most SEOs accept these numbers at face value without asking the obvious question.

Where do those prompts come from? How does a tool know what real people are asking ChatGPT in private sessions?

The answer involves clickstream data, third-party panels, and an infrastructure that’s been quietly powering SEO tools for over a decade. The methodology determines what the data actually means, and which tool you should trust for which question.

What Clickstream Data Is

Clickstream data is the chronological record of every action a user takes online. Pages visited. Time on page. Clicks. Search queries entered. Results clicked. The path through a session from start to exit.

The term goes back to the early web when “clicks” described most of what users did. TechTarget and Matomo both define it in roughly the same way: a record of user activity that, when aggregated, reveals behavioral patterns.

DataForSEO splits clickstream data into two forms. Aggregated data shows totals over time periods. Unaggregated data shows individual user journeys, click sequences, and visit durations. SEO tools mostly use the aggregated form, processed through algorithms that strip personally identifying information.
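The difference between the two forms is easy to see with a toy example. The rows below are made up, and the tuple layout (user, step, URL) is an assumption for illustration rather than any provider's real format.

```python
from collections import Counter

# Unaggregated: one row per event, so each user's journey is visible.
events = [
    ("user_a", 1, "example.com/pricing"),
    ("user_a", 2, "example.com/signup"),
    ("user_b", 1, "example.com/pricing"),
    ("user_b", 2, "example.com/blog"),
    ("user_c", 1, "example.com/pricing"),
]

def aggregate_visits(rows):
    """Collapse individual journeys into per-URL totals, dropping user IDs."""
    return Counter(url for _user, _step, url in rows)

totals = aggregate_visits(events)
```

After aggregation only the totals remain: three visits to the pricing page, one each to signup and blog, and no record of which user did what.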

How Clickstream Data Is Collected

There are two collection categories, and the second one is where SEO and AI visibility tools get their data.

First-party clickstream data is collected by the site owner. You install tracking on your own site through Google Analytics, Hotjar, Amplitude, Matomo, or server log analysis, and you see what your own users do. This is the kind of data you have direct access to and complete control over.

Third-party clickstream data is collected by data providers who recruit panels of users willing to have their browsing observed. The user installs some piece of software, agrees to data collection in the terms of service (sometimes prominently, sometimes not), and their activity gets aggregated into a panel that data providers sell to third parties.

The software users install typically falls into a few categories:

Browser extensions, often free utilities like coupon finders, ad blockers, or tab managers
Free or freemium antivirus and security software
Free VPN services
Free toolbars
Paid research panels with explicit opt-in
Less commonly today, ISP-level partnerships

Victorious explicitly notes that “SEO tools like Ahrefs and Semrush typically obtain clickstream data by purchasing it from these third-party data providers.” The tools themselves don’t run the panels. They buy the data.

How SEO Tools Have Used This Data for Over a Decade

This part is worth understanding because it sets up the AI visibility section. Clickstream data isn’t a new ingredient in SEO tools. It’s been powering features SEOs interact with every day for years.

Keyword search volume estimates after Google obscured the real numbers in Keyword Planner. Competitor traffic estimates in Semrush Traffic Analytics, SimilarWeb, and similar tools. SERP click-through rate data. Keyword difficulty scoring. Audience demographics.

Semrush’s own KB article is explicit about the source: “The data in our Traffic & Market toolkit comes from our panel of over 200 million real, anonymized internet users across more than 190 countries and regions. We partner with hundreds of clickstream data providers to build this panel, which records billions of events on the internet each month.”

Shahid Shahmiri’s breakdown of Ahrefs’ data sources explains that Ahrefs runs three data pipelines in parallel: their own crawler (AhrefsBot) for link data, third-party clickstream panels for behavioral data, and Google Keyword Planner for keyword existence. The clickstream layer is what powers their traffic and volume estimates.

Every time you’ve looked at a search volume number in Ahrefs or Semrush, you’ve been looking at clickstream-derived data. You just may not have known it.

A Brief Word on Avast

In January 2020, a joint Vice and PCMag investigation revealed that Avast antivirus, with over 100 million users, was selling clickstream data through its subsidiary Jumpshot. The data was detailed enough to identify individuals despite being marketed as anonymized. Customers included major retailers, analytics firms, and SEO platforms. Avast shut Jumpshot down within weeks of the investigation publishing.

This matters because it disrupted the clickstream data market for years. The supply hasn’t disappeared, but it’s been consolidated and diversified. Tools that depend on this data now talk about partnering with “hundreds of providers” rather than a single source, which is partly a hedge against any single provider blowing up the same way Jumpshot did.

Larry Ludwig’s piece on clickstream data providers covers this history and is worth reading if you want the full context. The short version: the clickstream pipeline that powers SEO tools is real, but the sourcing is often deliberately opaque because the industry learned a lesson from Avast.

How AI Visibility Tools Use Clickstream Data: Method 1

The first approach used by AI visibility tools is capturing real prompts from clickstream panel members who use AI platforms. When a panel member opens ChatGPT, Perplexity, Gemini, or Claude and types a prompt, that prompt, the response, and any cited sources get captured by the panel software and aggregated into the tool’s database.

Semrush’s AI Visibility Toolkit KB article states the methodology directly. The exact quote: “We source billions of real prompts from AI search clickstream data and Google’s keyword dataset for AI Overviews.” The toolkit has 239 million prompts and responses across ChatGPT, Gemini, Google AI Overviews, and AI Mode.

Ahrefs Brand Radar uses the same model. Their database currently sits at 13.5 million existing prompts. From Ahrefs’ own piece: “You can track your ChatGPT visibility across 13.5 million existing prompts inside Ahrefs Brand Radar database.”

What this means in practice. When these tools show you “Topics” or “what people are asking” reports, they’re showing aggregated prompts from real users in their panels who happened to use AI platforms during the data collection window. It’s not synthetic. It’s not Google search data dressed up to look like prompt data. It’s actual prompts from a panel large enough to be statistically meaningful at scale.

How AI Visibility Tools Use Clickstream Data: Method 2

The second approach is different. Many AI visibility tools don’t have access to clickstream data at all, or they use it as a supplementary source. Instead, they rely on running prompts that users (or the tool’s AI suggestions) define, on a schedule, and capturing the responses.

Otterly describes the methodology in their own words: “An AI visibility tracker works by automatically sending queries (search prompts) to AI search engines like ChatGPT, Perplexity, Google AI Overviews, and AI Mode, and analyzing the responses for brand mentions, citations, and source links.”

Peec AI runs prompts “once every 24 hours across your selected AI models.” SE Ranking’s ChatGPT Visibility Tracker “scans ChatGPT answers for your target keywords and analyzes which of them end up with your brand being mentioned.” Promptmonitor lets users “track specific prompts or questions in AI optimization.” HubSpot AEO suggests prompts based on company data, then tracks visibility across them.

The methodology is essentially rank tracking applied to AI platforms. The user (or the tool) defines prompts. The tool runs them through APIs or by scraping the AI interface. The tool captures and analyzes the responses on a schedule.
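A bare-bones sketch of that loop is below. The fake_ask function is a stand-in that returns canned text. A real tool would call an API or scrape the interface there, and none of the vendors named above document their implementation in this form. The brand name and prompts are invented.

```python
from datetime import date
from typing import Callable, Dict, List

def track_prompts(prompts: List[str], brand: str,
                  ask: Callable[[str], str], run_date: date) -> List[Dict]:
    """Run each defined prompt once and record whether the brand is mentioned."""
    results = []
    for prompt in prompts:
        response = ask(prompt)
        results.append({
            "date": run_date.isoformat(),
            "prompt": prompt,
            "mentioned": brand.lower() in response.lower(),
        })
    return results

def fake_ask(prompt: str) -> str:
    """Stand-in for an API call or interface scrape; returns canned text."""
    canned = {
        "best crm for small teams": "Popular options include Acme CRM and others.",
        "crm with free tier": "Several vendors offer a free tier.",
    }
    return canned.get(prompt, "")

report = track_prompts(
    ["best crm for small teams", "crm with free tier"],
    brand="Acme CRM",
    ask=fake_ask,
    run_date=date(2026, 5, 19),
)
```

Running the same function on a schedule and storing each day's rows gives the trend view these tools show.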

The strength of this approach is precision. You know exactly what was prompted because the tool prompted it. You can monitor specific prompts you care about over time. You can set up brand-specific or competitor-specific tracking and watch trends.

The limitation is that you’re tracking prompts you defined, not necessarily prompts real users are entering. If you assumed the wrong prompts mattered, you’re tracking the wrong data.

Surfer SEO’s overview of AI visibility tools makes the methodology distinction explicit: “Some rely on API responses, while others track what a real user sees in the interface. On top of that, AI answers vary depending on the model, the user’s location, language settings, and even the day/time the prompt is run.”

Why Both Methodologies Have Value

Each approach answers different questions.

Real prompt data (Method 1) tells you what real people are actually asking. This is useful for content strategy: discovering prompts you didn’t know existed, understanding the actual language users employ when talking to AI, identifying topics where there’s measurable search behavior. The limitation is that you’re seeing what was asked across the panel, not necessarily by your specific audience.

Synthesized prompt data (Method 2) tells you how AI platforms answer specific prompts you care about. This is useful for monitoring: tracking whether your brand appears when potential customers ask specific questions, watching trends over time on defined prompts, comparing your visibility to competitors on the same prompts. The limitation is that you’re tracking prompts you assumed mattered, which may or may not reflect real user behavior.

Most mature AI visibility tools combine both. Semrush, Ahrefs, and HubSpot AEO all offer “real prompt” databases for discovery alongside synthesized prompt tracking for monitoring. Smaller or newer tools tend to rely entirely on Method 2 because they don’t have access to clickstream panels at the scale required for meaningful Method 1 data.

The practical implication for you. When you see numbers from these tools, ask which methodology they reflect. If a tool says “We tracked 50 prompts and you appeared in 12,” that’s Method 2. If a tool says “Across our database of 239 million prompts, you were mentioned in 1.2%,” that’s Method 1. They’re measuring different things, and conflating them leads to bad strategic decisions.
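Working the two example figures through shows how different the denominators are. The numbers are the illustrative ones from the paragraph above, not measurements.

```python
def mention_rate(mentions: int, prompts: int) -> float:
    """Share of prompts in which the brand was mentioned."""
    return mentions / prompts

# Method 2: 12 appearances across 50 prompts you chose yourself.
method_2_rate = mention_rate(12, 50)

# Method 1: 1.2% of a 239 million prompt database, as an absolute count.
method_1_mentions = round(239_000_000 * 0.012)
```

A 24% rate on 50 hand-picked prompts and a 1.2% rate on 239 million panel prompts (about 2.87 million mentions) are not comparable figures, because the first depends entirely on which prompts you selected.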

The Honest Caveats

Two worth flagging.

Privacy and sourcing transparency. SEO tools are sometimes vague about their exact data sources, and some of that vagueness is deliberate. The Avast situation taught the industry that aggressive data collection can blow up publicly. Tools that buy from “hundreds of providers” have plausible deniability about any single provider’s practices. This isn’t necessarily wrong, but it’s worth knowing that the sausage-making is more complicated than the marketing copy suggests.

Sample bias. Clickstream panels skew toward certain users. People who install free antivirus software, free VPNs, or browser extensions aren’t a representative sample of the entire internet. They tend to be more price-sensitive, more technically casual, and over-indexed in certain demographics and geographies. The data is meaningful but not perfectly representative. This applies to traditional clickstream-derived metrics (keyword volumes, traffic estimates) as much as it applies to AI prompt data. None of these numbers are precise. They’re best understood as directional intelligence, not absolute truth.

The Takeaway

The infrastructure powering AI visibility tools isn’t new. It’s the same clickstream data pipeline that’s been informing SEO traffic estimates and keyword volumes for over a decade, now applied to a new use. Understanding where the data comes from helps you read it more accurately.

Tools that show you “real prompts” are showing you panel-derived data with the same strengths and limitations as Semrush’s traffic estimates or Ahrefs’ search volume numbers. Tools that show you tracked prompts are showing you a rank-tracking methodology applied to AI platforms.

Both are useful. Neither is magic. And the difference matters when you’re deciding which tool to trust for which question.

Google Just Told You to Stop Publishing Commodity Content

Mike Friedman — Tue, 12 May 2026 13:00:00 +0000

At Google Search Central Live in Toronto on April 21, 2026, Danny Sullivan drew a clear line between two types of content. Commodity content: generic, easily replicable, the same topics covered the same way across hundreds of sites. Non-commodity content: specific, experience-driven, original, proprietary insight. Google’s recommendation was direct. Stop publishing the first kind. Start publishing more of the second.

This is essentially Information Gain repackaged as official, plain-language Google guidance.

What Sullivan Actually Said

The “unique, non-commodity content” language wasn’t entirely new. John Mueller had used it in a Search Central blog post in May 2025. But Sullivan’s Toronto presentation gave it sharper definition with concrete industry examples that make the concept easier to apply.

The interior designer example is the most quotable. Commodity content: “2024 Kitchen Trends You Need to See” with stock photos of green cabinets and brass hardware found on Pinterest. Non-commodity content: “Marble vs. Grape Juice: Why I Refused to Install Stone for a Family of Five,” a video showing actual stain tests with grape juice and turmeric to prove the point. Sullivan used similar contrasts for running stores and real estate.

The pattern is consistent. Commodity content can be produced by anyone with no real experience in the space. Non-commodity content requires that someone actually did something, learned something, or has access to information that isn’t already published everywhere else.

What “Commodity Content” Actually Means

Commodity content is content that is easy to reproduce. It usually covers a familiar topic in a familiar way, often using the same structure, the same talking points, and the same generalized advice found across dozens or hundreds of other pages. It is not necessarily wrong. It is not always low quality. But it is interchangeable. If one page disappeared, another could fill the gap with no real loss.

Non-commodity content contains something that’s hard to replicate. Direct experience. Original analysis. Proprietary information. Specific examples. Practitioner judgment. Contextual insight. It gives the reader something more than a reorganized summary of public knowledge.

The clearest self-test I’ve seen comes from Shaun Anderson at Hobo: “Would this be irrevocably lost if this page disappeared tomorrow?” If the answer is no, it’s commodity content.

A second test, from Florian Krückel at SEO Kreativ, sharpens the same idea: “Could ChatGPT write this in 90 seconds, and would the result be essentially identical?” If yes, rewrite it or skip it.

Both tests are useful. The first one focuses on what would be lost. The second one focuses on what’s already easy to produce. Either framing gets you to the same conclusion.

The Information Gain Connection

A few weeks ago I wrote about Information Gain and the Google patents behind it. The 2006 patent (US8140449B1) describes a system that scores documents based on how much novel content they contain relative to all other documents on the same topic. It’s the mechanism for measuring exactly what Sullivan is describing in plain language.

The patent says: pages that introduce information nuggets and entity interactions absent from the rest of the corpus score better. Sullivan says: publish content that contains something other sites don’t have.

Same idea. Different framing.

What’s notable about the commodity content guidance is that it’s official Google language, not a patent that may or may not be in active use. The patent gave us the mechanism. This gives us the editorial test. Both are pointing at the same underlying truth: content that adds something new to the conversation has a structural advantage. Content that restates what’s already out there does not.
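To make "novel relative to other documents" concrete, here is a deliberately crude toy score. It is not the patent's method and not anything Google has described. It only measures what fraction of a draft's distinct words appear nowhere in a small comparison set, and the sample texts are invented.

```python
from typing import List

def novelty_share(candidate: str, corpus: List[str]) -> float:
    """Fraction of the candidate's distinct words that appear in no corpus document."""
    seen = set()
    for doc in corpus:
        seen.update(doc.lower().split())
    words = set(candidate.lower().split())
    if not words:
        return 0.0
    return len(words - seen) / len(words)

corpus = [
    "marble counters stain easily",
    "kitchen trends include green cabinets",
]
restated = novelty_share("marble counters stain easily", corpus)
original = novelty_share("grape juice stain test", corpus)
```

The restated sentence scores 0.0 because every word already exists in the comparison set. The second scores 0.75 because three of its four words are new. Real systems would work with entities and facts rather than raw words, but the direction of the comparison is the same.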

Why This Matters More Now

Two things have changed that make commodity content more dangerous than it used to be.

AI has lowered the cost of producing commodity content to nearly zero. Anyone can prompt ChatGPT to produce a competent, generic guide on any topic in seconds. That floods the index with interchangeable pages and forces Google to raise its quality bar to find anything worth surfacing. Google has talked about scaled content abuse and the “Crawled, currently not indexed” signal as quality flags. The bar isn’t moving up because Google decided to be picky. It’s moving up because the volume of commodity content has exploded.

AI Overviews and answer engines are also very good at summarizing common knowledge. If your page is commodity content covering common knowledge, AI systems compress it into summaries with no need to send the user to your site. Non-commodity content, with specific anecdotes and proprietary insight, has the kind of citable detail that gets pulled directly into AI responses with attribution. Commodity content gets summarized away. Non-commodity content gets cited.

This connects to the AirOps and Kevin Indig study I covered last week. Pages with focused, specific content outperformed exhaustive guides. Pages whose headings closely matched the query outperformed pages with broad coverage. The commodity vs. non-commodity framing explains why. Specific is harder to commoditize than broad. Experience-driven is harder to replicate than synthesized.

The Honest Caveat

“Commodity content” is not an officially confirmed ranking signal. It’s a strategic recommendation from Google, not a defined penalty or scoring mechanism. Some commodity content is necessary on most sites. A definitions page, a basic explainer, a foundational topic, those have their place.

The problem isn’t publishing any commodity content. The problem is building your entire content strategy on it. If most of your pages could be replicated by any competitor in 90 seconds, your site doesn’t have a differentiated reason to exist in search results.

The commodity content framing is a strategic lens, not a checklist. It tells you what kind of content is increasingly hard to win with, not what you can never publish.

How to Apply This

This is a self-audit, not an editorial overhaul.

Pull a list of your top pages. Read them. Ask the test questions: would this be irrevocably lost if it disappeared? Could ChatGPT write something essentially identical in 90 seconds? If a page fails the test, it doesn’t necessarily need to be deleted. It needs something added that only you could provide.

Specific things you can add:

Original data from your own work, your clients, your audits, or your industry
Specific examples with names, numbers, and outcomes, not generic case studies
Direct quotes from practitioners, customers, or subject matter experts
Failed experiments and what you learned from them
Photos, videos, or screenshots of actual work
Decisions you made that contradict standard advice, with the reasoning behind them
Industry-specific knowledge that requires real experience to know

The pattern across all of these is that they require something other than synthesis of public information. They require that you, or someone you have access to, actually did something or knows something that isn’t already on the first page of Google.

The tactical move isn’t to publish less. It’s to make sure each page has at least one thing that wouldn’t appear on a competitor’s version of the same content. One specific data point. One real example with a name and a number. One opinion you can defend with experience. That’s the threshold between commodity and non-commodity, and it’s lower than people think.

The Takeaway

Information Gain explained the mechanism. The 2006 patent, the entity interactions, the depth weighting, all of it describes how Google can measure whether a page contributes novel content to a topic. Sullivan’s Toronto talk explained the editorial test in plain language. Both are saying the same thing.

If your page can be replaced by any of the other pages ranking for the same query, you’re not adding anything to the search results. You’re filling space.

The cost of that has gone up. AI has made commodity content cheap to produce, which means there’s more of it, which means Google’s bar for what gets indexed and surfaced has risen. AI Overviews and answer engines compress commodity content into summaries that don’t link back. The middle of the distribution, content that’s fine but not differentiated, has gotten harder to win with.

The fix is the same one it’s always been. Add something to the conversation that wouldn’t be there without you.

Read the Search Engine Roundtable coverage of Sullivan’s Toronto presentation.

AI Search Is Still SEO (Kevin Indig and AirOps Just Proved It)

Mike Friedman — Tue, 05 May 2026 13:00:00 +0000

The AI search panic narrative has been everywhere for the past year. Everything is different now. Traditional SEO is dead. You need an entirely new playbook. The fundamentals don’t apply anymore.

A new study from AirOps and Kevin Indig should put a lot of that to rest.

The Fan-Out Effect analyzed 16,851 queries and 353,799 pages across ChatGPT’s full retrieval pipeline. The findings are clear and the implications are direct. AI search is still SEO. The principles haven’t changed. A few specific tactics need adjusting, but anyone who told you to throw out your SEO playbook was wrong.

This note covers the findings that matter most, and validates a few things I have shared here over the past few months.

Retrieval Rank Is the Whole Game

The single most important finding from the study. A page at position 1 in ChatGPT’s retrieval results has a 58% citation rate. By position 10, that drops to 14%. A 4x gap that no amount of content quality can close.

ChatGPT doesn’t pull from some magical alternative source. It runs web searches, gets back ranked results, and cites from there. The retrieval system underneath is doing the heavy lifting. If you don’t rank well in traditional search, you don’t get cited in AI search.

The study tested this against every other variable and the conclusion held. A page with perfect content relevance at rank 11 or worse got cited 21.5% of the time. A page with mediocre content relevance at rank 1 got cited 55.9% of the time. Rank overrides content quality.

That’s the headline argument. The “AI search makes traditional SEO obsolete” narrative collapses under this finding. ChatGPT citations flow through the same retrieval mechanics that have always determined organic search visibility. Great SEO isn’t your obstacle in AI search. It’s your advantage.

Heading Match Is the Primary On-Page Lever

Last week’s note covered semantic distance and the Google patent that describes how heading structure creates semantic relationships on a page. That note explained the mechanics. This study quantifies the impact.

Pages whose headings closely match the query are cited 41% of the time. Pages with weak heading matches get cited 29% of the time. That 12-point gap holds even after controlling for retrieval rank.

The study compared heading match against every other content signal: word count, topical breadth, body copy depth, schema markup, readability. Heading structure was the strongest content predictor of citation. By a meaningful margin.

This connects directly to what last week’s note covered. Headings aren’t just keyword placement opportunities. They’re semantic containers that define what a page is about. When the container clearly maps to the query someone is asking, AI systems and traditional search engines both reward it. When the container is vague or off-topic, both penalize it.
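If you want a rough way to audit heading match on your own pages, token overlap between the query and each heading is a reasonable starting point. This is a minimal sketch, not the study's scoring method (which isn't described here). The function names and the small stopword list are my own assumptions for illustration.

```python
import re

# Small illustrative stopword list (an assumption, not taken from the study).
STOPWORDS = {"a", "an", "the", "to", "of", "for", "in", "on", "and", "is", "how", "what"}

def tokens(text):
    """Lowercase word tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}

def heading_match_score(query, heading):
    """Share of query tokens that also appear in the heading (0.0 to 1.0)."""
    q = tokens(query)
    if not q:
        return 0.0
    return len(q & tokens(heading)) / len(q)

def best_heading(query, headings):
    """Return (heading, score) for the heading that best covers the query."""
    scored = ((h, heading_match_score(query, h)) for h in headings)
    return max(scored, key=lambda pair: pair[1])
```

Run it against the queries you actually want to be cited for. A page whose best heading scores near zero for its target query has a vague container, whatever the body copy says.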

Heading Structure Has a Sweet Spot

The study also found a sweet spot for how many subheadings to use, and a counterintuitive pattern below it.

For articles, the optimal range is 4 to 10 H2-H4 subheadings (33.2% citation rate). The strange finding: articles with 1 to 3 subheadings (28%) perform worse than articles with zero subheadings (30.1%). Half-measures are worse than no structure at all. Either commit to proper structure or don’t bother.

The sweet spot also varies by page type. Articles do best with 4 to 10 subheadings. Product pages, oddly, perform best with zero subheadings (43.2%) and worst with 21 or more (25%). The “other” bucket (forums, landing pages) tracks the article pattern.

The takeaway: don’t apply article-page heading structure to product pages. Product pages are typically focused on a single item and don’t need editorial scaffolding. Different page types have different optimal structures, and forcing the wrong structure on a page hurts more than it helps.
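Checking where a page falls is easy to script. The sketch below counts H2 to H4 tags with Python's standard-library HTML parser and maps the count to a range. The bucket boundaries for 0, 1 to 3, and 4 to 10 come from the article figures above; the "11+" label is just a catch-all I added, and the helper names are hypothetical.

```python
from html.parser import HTMLParser

class SubheadingCounter(HTMLParser):
    """Counts H2, H3, and H4 start tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag in ("h2", "h3", "h4"):
            self.count += 1

def count_subheadings(html):
    parser = SubheadingCounter()
    parser.feed(html)
    return parser.count

def article_bucket(n):
    """Map a subheading count to the article ranges discussed above."""
    if n == 0:
        return "0"
    if n <= 3:
        return "1-3"
    if n <= 10:
        return "4-10"
    return "11+"
```

Keep in mind this only counts real heading tags. A page that fakes its headings with styled divs will come back as zero.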

Domain Authority Doesn’t Translate

A few weeks ago I wrote about how Domain Authority and similar metrics get misused. This study delivers one of the most direct empirical contradictions of DA-based thinking I’ve seen.

Always-cited pages have lower DA (53) than never-cited pages (56). Backlinks show a 3x inverse gap. The always-cited pages have an average of 1.1 million backlinks, while the never-cited pages have 3.2 million.

Pages that get cited consistently have fewer links and lower DA than pages that never get cited.

The site-type breakdown is even more damning. Five of the highest-DA site types in the study produce wildly different citation rates: YouTube (DA 100) at 2.4%, Reddit (DA 92) at 29.9%, Major News (DA 94) at 32%, Health Publishers (DA 90) at 46.4%, Wikipedia (DA 95) at 59.2%. Almost identical authority. Citation rates spanning 25x.

DA tells you nothing about citation likelihood. Just like it tells you nothing about how Google evaluates content.

Length Isn’t the Answer

In the recent note on SEO concepts that aren’t helping you, I covered why word count chasing doesn’t work. The study confirms it.

The citation sweet spot is 500 to 2,000 words. Pages over 5,000 words underperform pages under 500 words. Long-form padding actively hurts you in AI search.

The reason is the same one that applies in traditional search. Word count itself does nothing. What helps is covering the topic with depth and specificity. What hurts is padding to hit a target. AI systems appear to be even less tolerant of filler than traditional search results, probably because they’re trying to extract specific, citable information rather than rank pages.

If your content strategy revolves around hitting word count targets, that strategy is working against you in both traditional and AI search.

Focused Beats Comprehensive

This finding partially complicates the standard SEO playbook. The “ultimate guide” approach to content has been a dominant strategy for years. The study suggests it actively hurts AI citation rates.

Pages covering 26 to 50% of ChatGPT’s fan-out subtopics outperform pages covering 100% of them. When primary query relevance is held constant, exhaustive coverage actually reduces citation rate.

The study’s interpretation: exhaustive coverage signals “generalist” content that addresses many topics without depth. Moderate coverage paired with strong primary relevance signals focused expertise.

This loosely connects to information gain. The point isn’t to cover everything that has ever been written about a topic. The point is to cover the right things with depth. A page that nails one question outperforms a page that adequately addresses five. Fan-out subtopics aren’t a content checklist. They’re context.

(Side note: the recent note on information gain covers this in more detail. I also just published a video expanding on that note, which is worth checking out on YouTube.)

If you’ve been building 5,000-word ultimate guides on the assumption that more comprehensive equals more rankable, this study says you should reconsider. Focused, deep coverage of the primary query is what gets cited.

Schema Markup Is a Real Signal

Pages with JSON-LD schema markup have a 6.5 percentage point citation advantage (38.5% vs 32%). The study verified this isn’t explained by other factors. Schema and non-schema pages have similar word counts, heading counts, DA, and query match scores. The schema markup itself is contributing the lift.

The top-performing schema types:

MedicalWebPage: 47% citation rate
BreadcrumbList: 46.2%
FAQPage: 45.6%
Organization: 44.3%
WebSite: 40.6%

Schema markup helps AI systems parse and categorize page content. If you’ve been treating schema as optional, this is a reason to reconsider. It’s one of the few signals in the study that delivers a clear advantage independent of everything else.
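For anyone who hasn't written JSON-LD by hand, here is a minimal sketch that builds three of the types from the list above (Organization, WebSite, BreadcrumbList) as one connected @graph. The function names, the example.com URLs, and the "#organization" and "#website" identifiers are placeholders I made up. Real markup should describe your actual page and be validated before it ships.

```python
import json

def build_graph(site_url, org_name, crumbs):
    """Build a small JSON-LD graph.

    crumbs: list of (name, url) pairs from the home page down to the current page.
    """
    org_id = site_url + "#organization"
    return {
        "@context": "https://schema.org",
        "@graph": [
            {"@type": "Organization", "@id": org_id, "name": org_name, "url": site_url},
            {
                "@type": "WebSite",
                "@id": site_url + "#website",
                "url": site_url,
                # Reference the Organization by @id instead of repeating it.
                "publisher": {"@id": org_id},
            },
            {
                "@type": "BreadcrumbList",
                "itemListElement": [
                    {"@type": "ListItem", "position": i, "name": name, "item": url}
                    for i, (name, url) in enumerate(crumbs, start=1)
                ],
            },
        ],
    }

def to_script_tag(graph):
    """Wrap the graph in the script tag that goes in the page's HTML."""
    return '<script type="application/ld+json">\n' + json.dumps(graph, indent=2) + "\n</script>"
```

Calling to_script_tag(build_graph("https://example.com/", "Example Co", [("Home", "https://example.com/"), ("Blog", "https://example.com/blog/")])) gives you a block you can paste into a test page and run through a validator.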

Write at a Higher Reading Level Than You Think

This one is genuinely counterintuitive. The “write for an 8th grader” advice has been floating around SEO content guidance for years. The study contradicts it directly.

Flesch-Kincaid grade 16-17 (college level) writing performs best, at a 35.9% citation rate. Kindergarten-level writing performs worst at 29.6%. The signal peaks at college-level vocabulary and sentence structure, then tapers slightly above grade 18.

The study’s interpretation is that expert-written content tends to use higher-grade vocabulary and more complex sentence structure, and AI systems appear to favor that signal as a marker of expertise.

The practical takeaway: don’t dumb your content down past the level of expertise your audience expects. If you’re writing for practitioners, write at a practitioner level. If you’re writing for technical audiences, use the technical language they actually use. Oversimplifying for an imagined “8th grade reader” who doesn’t exist in your actual audience may be costing you visibility in AI search.
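If you want to see roughly where a draft sits, the Flesch-Kincaid grade level is simple to compute: 0.39 times words per sentence, plus 11.8 times syllables per word, minus 15.59. The sketch below uses that standard formula with a crude vowel-group syllable counter, which is my own approximation. Expect it to differ a little from whatever tool the study used.

```python
import re

def count_syllables(word):
    """Rough heuristic: count vowel groups, discounting a trailing silent 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level using the standard published coefficients."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
```

Treat the number as a sanity check, not a target. The point of the finding is to write at the level your audience expects, not to inflate sentences until a score goes up.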

The Takeaway

AI search is still SEO. The principles haven’t changed.

Rank well in retrieval, because nothing else matters if you can’t be found. Use headings that match the query, with proper structure for your page type. Write focused content of appropriate length. Use schema markup. Write at the reading level your audience actually expects. Don’t chase domain authority, because no search engine is using it.

The “AI changes everything” narrative was wrong. The “you need a completely new playbook” narrative was wrong. A few tactics need adjusting (length targets are tighter, exhaustive coverage hurts more than it helps, expert-level writing matters more than it did), but the fundamentals still work.

The fundamentals are still the work.

Read the full AirOps and Kevin Indig study.

Semantic Distance: Why Your Heading Structure Matters More Than You Think

Mike Friedman — Tue, 28 Apr 2026 13:00:00 +0000

Most SEOs think about headings as places to put keywords. Put your target keyword in the H1, sprinkle related terms into H2s and H3s, and move on. That’s not wrong, but it misses what headings actually do from Google’s perspective.

There’s a Google patent that describes how the search engine uses HTML structure – headings, lists, tables, divs – to determine how semantically close terms on a page are to each other. That closeness directly affects how relevant Google considers the page for queries containing those terms. Once you understand the mechanics, it changes how you think about content organization.

The Concept in 30 Seconds

Semantic distance is a measure of how far apart two meanings are. “Dog” and “cat” are semantically close. They share context: pets, fur, domestication. “Dog” and “carburetor” are semantically distant. Almost no shared context.

Search engines use semantic distance to match intent, not just keywords. That’s the general idea, and it underpins a lot of how modern search works.

But there’s a specific, on-page version of semantic distance that most SEOs aren’t thinking about: how Google interprets the structural distance between terms within a single page. Not the conceptual distance between words in a language model. The literal structural distance between terms as defined by your HTML.

The Patent

US7716216B1, “Document ranking based on semantic distance between terms in a document.” Filed 2004, granted 2010, assigned to Google. Inventors: Georges R. Harik and Monika H. Henzinger. A continuation patent (US8060501B1) was granted in 2011. Bill Slawski wrote the definitive breakdown on SEO by the Sea.

The patent describes a system that locates semantic structures in HTML documents (headings, lists, tables, divs, even elements styled with larger font sizes) and uses those structures to calculate distance values between terms. Those distance values feed into ranking scores that determine how relevant the page is for a given query.

The key insight: the search engine doesn’t just count the number of words between two terms to figure out how close they are. It looks at the HTML structure and uses that structure to override simple word-count proximity.

How It Works

The classic example from the patent makes this concrete. Imagine a page with the heading “Saturn Facts” and a list beneath it:

Orbit: 10,759 Days
Rotation Period: 10.7 Hours
Mass: 568.5 x 10²⁴ kg
Volume: 82,713 x 10¹⁰ km³
Distance from the Sun: 1,434 x 10⁶ km

Two things happen under the patent’s logic.

First, “Saturn” in the heading is considered semantically close to every item in the list, regardless of position. The word count between “Saturn” and the last list item doesn’t matter. The heading creates a semantic container, and everything inside that container is equally close to the heading. The page is equally relevant for “Saturn mass,” “Saturn volume,” and “Saturn distance from the sun.”

Second, terms within the same list item are closer than terms across different list items. Here’s the counterintuitive part: “Saturn” and “Distance” (the heading and the last list item) are considered closer than “Days” and “Rotation” (the last word of item 1 and the first word of item 2), even though that second pair is visually adjacent on the page. The list boundary between items creates semantic separation that overrides physical proximity.

The patent lays out three rules:

Both terms in the same list item: close.
One term in a heading, one in any list item under that heading: approximately equally close, regardless of which list item.
Terms in different list items: farther apart than either of the above.
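Those three rules are easy to model as a toy function. In the sketch below, the numeric distance values are purely illustrative (the rules above only give the ordering, not numbers), and the function names are my own. Only the relative order matters: same item is closest, heading-to-item is the same for every item, and different items are farthest.

```python
# Illustrative values only; the rules above describe an ordering, not these numbers.
SAME_ITEM = 1
HEADING_TO_ITEM = 2
DIFFERENT_ITEMS = 5

def locate(term, heading, items):
    """Return 'heading', a list-item index, or None for where a term appears."""
    t = term.lower()
    if t in heading.lower().split():
        return "heading"
    for i, item in enumerate(items):
        if t in item.lower().replace(":", " ").split():
            return i
    return None

def structural_distance(a, b, heading, items):
    """Distance between two terms based on structure, ignoring word counts."""
    pa, pb = locate(a, heading, items), locate(b, heading, items)
    if pa is None or pb is None:
        return None
    if pa == pb:
        return SAME_ITEM
    if "heading" in (pa, pb):
        return HEADING_TO_ITEM
    return DIFFERENT_ITEMS

heading = "Saturn Facts"
items = [
    "Orbit: 10,759 Days",
    "Rotation Period: 10.7 Hours",
    "Mass: 568.5 x 10^24 kg",
    "Volume: 82,713 x 10^10 km^3",
    "Distance from the Sun: 1,434 x 10^6 km",
]
```

With the Saturn example, structural_distance("Saturn", "Distance", heading, items) comes out smaller than structural_distance("Days", "Rotation", heading, items), even though "Days" and "Rotation" sit right next to each other in the text.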

The patent also notes that Google looks beyond formal HTML heading tags. A larger font size used as a visual heading can be interpreted as a heading element even without an H1 or H2 tag.

Slightly off-topic, but this is something I have been stressing with people for years. Google can understand heading structures by the layout even if proper H tags are missing. That’s why when you “fix” the missing H tags on a page or correct the order, you don’t see ranking improvement.

Google is trying to understand the visual and structural hierarchy of the page, not just parse HTML tags.

Why This Changes How You Think About Headings

Most SEOs treat headings as keyword opportunities. The patent reframes them as semantic containers that define relationships between every piece of content beneath them.

Heading text creates relationships, not just labels. If your H2 says “Installation Costs by Material Type” and the paragraphs beneath discuss pricing for hardwood, tile, and carpet, Google considers “Installation Costs” semantically close to all three materials. That heading establishes a relationship between the cost concept and every material mentioned in the section.

Now consider what happens if that H2 instead says “Additional Information.” The same content sits beneath it, but the semantic container is weak. Google gets far less signal about how “installation costs” relates to the content below. The heading is doing less work.

What you group together under a heading matters. If you want Google to associate two concepts, put them under the same heading. If you split them across different sections with different headings, you increase the semantic distance between them. This is a content architecture decision that most people make based on readability alone, without considering the semantic consequences.

Lists create equal-distance relationships. Every item in a list is equidistant from the heading above it. You can’t accidentally push a concept further from the main topic by placing it lower in a list. That’s structurally different from running paragraphs, where terms physically farther from the heading accumulate more word-count distance in a traditional proximity model.

Heading hierarchy is semantic nesting. An H3 under an H2 inherits context from the H2. The H2 inherits from the H1. You’re building a tree of semantic relationships. A well-structured hierarchy tells Google not just what each section is about, but how sections relate to each other and to the page’s central topic.
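One way to see the nesting on a real page is to print each heading with its full chain of ancestors. This is a small sketch under my own assumptions: the input is a list of (level, text) pairs in document order, and the function name and sample headings are hypothetical.

```python
def heading_paths(headings):
    """Return the ancestor path for each heading.

    headings: list of (level, text) pairs in document order, e.g. (2, "Costs").
    """
    stack = []  # (level, text) for the currently open ancestors
    paths = []
    for level, text in headings:
        # Close any open headings at the same or a deeper level first.
        while stack and stack[-1][0] >= level:
            stack.pop()
        stack.append((level, text))
        paths.append(" > ".join(t for _, t in stack))
    return paths

outline = [
    (1, "Flooring Guide"),
    (2, "Installation Costs by Material Type"),
    (3, "Hardwood"),
    (3, "Tile"),
    (2, "Maintenance"),
]
```

If a path reads like nonsense when you say it out loud ("Flooring Guide > Additional Information > Hardwood"), the heading in the middle is not carrying any context down to the content beneath it.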

The Connection to Entity-Based SEO

If you’ve been following the entity extraction and information gain notes, this concept slots in directly.

Entities and their attributes are the building blocks. Semantic distance is how you control the relationships between those building blocks on the page. When you place an entity under a heading, you’re telling Google that entity is semantically close to the heading’s concept. When you place two entities under the same heading, you’re telling Google they’re related to each other.

This is entity relationship management at the page level. Your heading structure isn’t just formatting. It’s the mechanism by which Google interprets which entities belong together and how they connect to the page’s central topic.

What to Do With This

Review your heading text. Are your headings specific enough to create meaningful semantic containers? Generic headings like “Overview,” “More Info,” or “Details” create weak containers. Specific headings that name the concept being covered create strong ones.

Check your content grouping. Are related concepts under the same heading? Are unrelated concepts accidentally sharing a section? Look at your most important pages and ask whether the content beneath each heading actually belongs together semantically.

Use lists deliberately. When you have a set of items that should all be equally associated with a concept, a list under a clear heading is structurally stronger than scattering the same items across running paragraphs. The list structure guarantees equal semantic distance from the heading.

Think about heading hierarchy as a relationship tree. Your H1 is the trunk. H2s are branches. H3s are sub-branches. Content under each heading is bound to it semantically. When you’re planning a page, think about which concepts should be siblings (under the same parent heading) and which should be nested (sub-heading under a parent).

Don’t over-optimize. John Mueller has noted that heading hierarchy order doesn’t need to be perfect from Google’s perspective. The patent describes one ranking signal among many. Structure your content for humans first. But know that when you make a structural decision, it carries semantic weight.

The Takeaway

Your page structure isn’t just organization. It’s communication. Every heading you write, every list you create, every section break is telling Google how the concepts on your page relate to each other. The patent is from 2004, but the logic holds. Google is still reading the structure of your pages to understand meaning. Give it a clear structure, and it has a better chance of understanding what your page is actually about.

4 SEO Concepts That Aren’t Helping You

Mike Friedman — Tue, 21 Apr 2026 13:00:00 +0000

Last week’s note covered four SEO metrics that most people are reading at the wrong level. This week is the companion piece: four concepts that people spend real time and energy optimizing for but probably shouldn’t be.

The difference between the two notes is this. Last week’s metrics have legitimate uses when applied correctly. This week’s concepts are either outdated, misunderstood at a fundamental level, or solving a problem that doesn’t exist anymore.

Keyword Density

People still ask this question. “What percentage should my keyword appear in the content?” Five percent? Three percent? Two?

There is no target percentage. There hasn’t been one for a very long time.

Google stopped relying on literal keyword matching years ago. It understands entities, relationships between entities, synonyms, and meaning. When you search “how to fix a leaking faucet,” Google doesn’t count how many times each ranking page says “leaking faucet.” It understands that the page is about plumbing repair, that a faucet is a fixture, that a leak is a malfunction, and that the searcher wants step-by-step instructions. It matches intent and topical coverage, not word frequency.

Yet SEO plugins still flag keyword density. Beginners see a warning that their keyword only appears 1.2% of the time and start cramming it into sentences where it doesn’t belong. The result is content that reads awkwardly, and awkward content doesn’t help anyone.

If you’re writing naturally about a topic and covering the relevant entities with appropriate depth, your target keyword will appear at a natural frequency. You don’t need to count it. If you’re worried about whether Google understands what your page is about, the answer is almost never “use the keyword more times.” It’s “cover the topic more thoroughly.” Those are very different things.

PageSpeed Insights Score

People chase a perfect 100 in Google’s PageSpeed Insights tool like it’s a grade. It’s not.

The Lighthouse score you see in PageSpeed Insights is a lab-based diagnostic tool. It runs a simulated test of your page under controlled conditions and produces a score. That score is useful for identifying specific performance issues: images that aren’t compressed, JavaScript that blocks rendering, layout shifts during load. It’s a debugging tool.

What Google actually uses as a ranking signal (and a very weak one at that) is Core Web Vitals field data. That’s the real-world performance data collected from actual users visiting your site through Chrome. It measures three things: how fast the largest visible element loads (LCP), how quickly the page responds to interaction (INP), and how much the layout shifts unexpectedly during load (CLS). These are measured from real user sessions, not from a simulated lab test.

A site can score 65 in Lighthouse but have perfectly good Core Web Vitals because real users on real connections experience the site just fine. A site can score 98 in Lighthouse but have poor field data because the lab simulation doesn’t reflect how the site actually performs for its audience.

The Lighthouse score and Core Web Vitals field data are related but not the same thing. If you’re going to track page speed as part of your SEO work, look at the Core Web Vitals report in Google Search Console or the field data section in PageSpeed Insights (labeled “Discover what your real users are experiencing”). That’s what Google uses. The number at the top of the screen is for diagnosing problems, not for measuring ranking impact.
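When you do look at field data, the three metrics are judged against Google's published thresholds at the 75th percentile of real user visits: LCP is good at 2.5 seconds or less and poor above 4 seconds, INP is good at 200 ms or less and poor above 500 ms, and CLS is good at 0.1 or less and poor above 0.25. Those thresholds aren't from this note, they're from Google's Core Web Vitals documentation. The sketch below just applies them. The dictionary keys and function names are my own.

```python
# (good_max, poor_min) per metric; anything in between is "needs improvement".
THRESHOLDS = {
    "lcp_s": (2.5, 4.0),   # Largest Contentful Paint, seconds
    "inp_ms": (200, 500),  # Interaction to Next Paint, milliseconds
    "cls": (0.1, 0.25),    # Cumulative Layout Shift, unitless
}

def rate(metric, p75_value):
    """Classify one 75th-percentile field value."""
    good_max, poor_min = THRESHOLDS[metric]
    if p75_value <= good_max:
        return "good"
    if p75_value > poor_min:
        return "poor"
    return "needs improvement"

def assess(field_data):
    """Classify a dict of 75th-percentile field values keyed like THRESHOLDS."""
    return {metric: rate(metric, value) for metric, value in field_data.items()}
```

Notice there is no Lighthouse score anywhere in that function. A page passes or fails on what real users experienced.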

Word Count

The idea that longer content ranks better refuses to die. It comes from correlation studies that found pages ranking in the top positions tended to have more words. The conclusion people drew was that writing longer pages would improve rankings.

The problem is that correlation isn’t causation, and the actual cause is straightforward. Longer pages tend to rank better because they tend to cover more entities, answer more questions, address more aspects of the search intent, and provide more information gain. Those things help with rankings. The word count itself does nothing. Google has said this explicitly. There is no minimum word count for ranking, and adding words doesn’t help unless those words add substance.

A 3,000-word page that pads its length with filler, restated points, and generic advice performs worse than a 1,200-word page that covers the topic with depth and specificity. If you read the information gain note, this should click. Google can measure whether a page contributes novel information relative to other pages on the same topic. More words is not more information gain. More novel, specific information is more information gain, regardless of how many words it takes to deliver it.

The practical consequence of chasing word count is that people dilute their content. They add paragraphs restating things they already said. They include sections on tangentially related topics just to hit a number. Every paragraph that restates what the other ranking pages say, or what your own page already said, dilutes the ratio of useful-to-redundant content. That’s the opposite of what you want.

Write until you’ve covered the topic thoroughly. Stop when you’ve said what needs to be said. If that’s 800 words, publish 800 words. If it’s 2,500 words, publish 2,500 words. The number is an outcome of thorough coverage, not a target to aim for.

“Toxic” Links and Disavow Obsession

Third-party SEO tools have a feature that scans your backlink profile and flags links as “toxic.” The flags show up in red. There’s usually a score. It feels urgent. People spend hours compiling disavow files to submit to Google, rejecting links from sites they’ve never heard of.

Most of the time, this is wasted effort.

Google has said repeatedly that its algorithms are very good at identifying and ignoring low-quality links on their own. John Mueller has addressed this directly more than once. The system doesn’t need your help to figure out that a spammy comment link on a random blog isn’t a genuine endorsement of your site. Google just ignores it.

The disavow tool exists for specific situations. If you’ve received a manual penalty related to unnatural links, you may need to disavow the links that caused it. If you previously participated in a paid link scheme and want to clean it up, the disavow tool is appropriate. These are deliberate, known problems where you’re telling Google “I know about these specific links and I want you to ignore them.”

What the disavow tool is not for is going through every link a third-party tool paints red and rejecting it preemptively. Those tools use their own proprietary scoring to determine what’s “toxic.” Their criteria don’t necessarily match what Google considers problematic. A link from a low-DA site with a foreign-language domain might look suspicious to a tool’s algorithm but be a perfectly legitimate link from a real site in another country. Disavowing it doesn’t help you. In some cases, people accidentally disavow links that were actually passing value to their site.

The risk isn’t just wasted time. It’s the possibility of removing links that were helping. If you haven’t received a manual penalty and you aren’t cleaning up a link scheme you knowingly participated in, you almost certainly don’t need to touch the disavow tool. Let Google’s algorithms handle the noise. They’ve been doing it for years.

If anything, treat a “toxic” flag as a prompt to take a closer look at that link. Nothing more.

The Pattern Across Both Notes

Last week’s note and this one share the same underlying problem. People anchor to a number or a concept because it feels concrete and measurable, and they optimize for it without asking whether it actually connects to how search engines work.

The fix is always the same question. Does this thing I’m spending time on directly influence how Google evaluates my site? If the answer is no, or if the answer is “only in a very specific context that doesn’t apply to what I’m doing,” that time is better spent elsewhere.

There’s no shortage of things in SEO that actually matter and actually respond to effort. Spend your time there.

4 SEO Metrics You’re Reading Wrong

Mike Friedman — Tue, 14 Apr 2026 13:00:00 +0000

There are metrics in SEO that feel important but lead to bad decisions when applied without context. The numbers themselves aren’t useless. The problem is how people look at them.

Each of the four metrics below has a specific, narrow context where it’s genuinely useful. Outside that context, it’s noise. The pattern is always the same: a metric that means something at one level of granularity becomes meaningless when zoomed out.

Click-Through Rate in Google Search Console

This is the one I see most often in forums and Slack groups. Someone pulls up their site’s CTR in Search Console, sees 2.1%, and panics. Or they look at a specific page’s CTR, see 3.4%, and start rewriting title tags.

The problem is what that number actually represents. Site-wide CTR is an average across every query the site appeared for, including queries where the site ranked on page 5, page 8, page 12. Those impressions where you ranked 47th and nobody clicked? They’re in that average, dragging the number down. A site could be performing brilliantly for its important queries and still show a low aggregate CTR because it also has thousands of impressions for queries where it barely appeared.

Page-level CTR has the same issue. A single page might rank #3 for its target keyword but also show up at position 40 for a dozen tangentially related queries. Those low-ranking impressions pull the page’s average CTR down even though the page is performing exactly as it should for the query that matters.

CTR is useful in exactly one context: a single query, evaluated relative to its average position. If a query is averaging position 2 and has a 2% CTR, something is wrong. Maybe the title tag doesn’t match the intent. Maybe a featured snippet or AI overview is stealing the click. Maybe the SERP is dominated by ads. That’s a real signal worth investigating. But you can only see it at the individual query level. The aggregate number tells you nothing actionable.
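A quick worked example shows how far apart the aggregate and per-query numbers can be. The rows below are made-up numbers in the shape of a Search Console query export, and the function names are hypothetical.

```python
rows = [
    # (query, impressions, clicks, average position): hypothetical numbers
    ("blue widget repair", 1000, 280, 1.8),
    ("widget", 20000, 40, 47.0),
    ("widget history", 9000, 10, 62.0),
]

def ctr(clicks, impressions):
    """Click-through rate as a fraction; 0.0 when there are no impressions."""
    return clicks / impressions if impressions else 0.0

def aggregate_ctr(rows):
    """CTR across all queries combined, the way a site-wide number is built."""
    total_impressions = sum(r[1] for r in rows)
    total_clicks = sum(r[2] for r in rows)
    return ctr(total_clicks, total_impressions)

def per_query_ctr(rows):
    """CTR for each query on its own."""
    return {query: ctr(clicks, impressions) for query, impressions, clicks, _ in rows}
```

Here the aggregate works out to 330 clicks on 30,000 impressions, or 1.1%, while the one query the page actually targets sits at 28%. Same data, completely different story.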

Average Position in Google Search Console

Same mistake, different metric. People look at their site’s overall average position, see something like 28.4, and think their site is performing terribly.

But consider what that number actually averages. A site that ranks #1 for 10 valuable queries and #80 for 200 low-relevance queries it barely targets may show an average position in the 30s or 40s, depending on how its impressions are distributed. That’s not a struggling site. That’s a site performing well for its important terms and showing up incidentally for a bunch of things it never optimized for (or that are too difficult for it to rank for yet).

Average position only means something at the individual query level, and even there it’s imperfect because it’s averaged across whatever fluctuations happened during the date range you’re looking at. A query that ranked #3 for 20 days and #15 for 10 days will show an average position around 7, which doesn’t reflect either of the positions it actually held.
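The arithmetic behind both examples is a weighted average. In the sketch below, the single-query case assumes equal impressions on every day, and the site-wide case assumes 600 impressions for each #1 query and 30 for each #80 query. Those impression counts are numbers I picked for illustration, and the function name is hypothetical.

```python
def weighted_average_position(samples):
    """Average position weighted by days or impressions.

    samples: list of (position, weight) pairs.
    """
    total_weight = sum(weight for _, weight in samples)
    return sum(position * weight for position, weight in samples) / total_weight

# One query: #3 for 20 days, then #15 for 10 days (equal impressions per day assumed).
single_query = weighted_average_position([(3, 20), (15, 10)])

# Whole site: 10 queries at #1 with 600 impressions each,
# 200 queries at #80 with 30 impressions each (hypothetical impression counts).
site_wide = weighted_average_position([(1, 10 * 600), (80, 200 * 30)])
```

The single query averages out to 7.0, a position it never actually held. The site-wide figure comes out at 40.5 for a site that ranks #1 for everything it cares about.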

The useful application is tracking a specific query’s average position over time to spot trends. Is your target keyword gradually improving? Gradually declining? Holding steady? That’s useful. The site-wide number is not.

Domain Authority (and Domain Rating and Authority Score)

Domain Authority is Moz’s metric. Domain Rating is Ahrefs’. Authority Score is Semrush’s. They all attempt to approximate the overall strength of a domain on a 0-100 scale. And they’re all misused in the same three ways.

Misuse 1: Judging link quality. People look at the DA of a site linking to them and use it as a proxy for link value. “I got a link from a DA 72 site” sounds impressive. But link value doesn’t come from the strength of the overall domain. It comes from the strength, relevance, and authority of the specific page linking to you. A link from a strong, well-linked page on a DA 30 niche site can pass more value than a link from a page with zero backlinks on a DA 90 site. When you evaluate a link, you need to look at the linking page, not the domain’s aggregate score.

Misuse 2: Gauging ranking difficulty. People look at the DA of the top 10 results for a keyword and conclude that if they’re all DA 70+, the keyword is too competitive. This tells you almost nothing useful. DA is a domain-level metric. What matters for ranking is the strength of the specific pages at the top: their backlink profiles, their content relevance, their topical authority. A page on a DA 90 domain with no links and thin content is beatable. A page on a DA 40 domain with strong links and deep, relevant content is not. The domain number is a distraction from what actually determines the ranking.

Misuse 3: Tracking your own DA. I see this constantly. People track their site’s DA over time as if it’s a KPI. No search engine on the planet uses Domain Authority. It’s a third-party metric calculated by a third-party tool using that tool’s own methodology and data. It doesn’t factor into Google’s ranking algorithm. It doesn’t appear in any Google patent. Google has said explicitly that they don’t use it. Tracking your own DA is tracking someone else’s estimate of how authoritative your site might be, updated on someone else’s schedule, using criteria that don’t match how Google actually evaluates authority. Your time is better spent tracking metrics that directly reflect performance: rankings for target queries, organic traffic, conversions.

And a warning about link sellers. DA is the favorite metric of people selling links, and that should tell you something. When you see someone advertising that they can get you links on DA 90+ sites, what they’re usually selling are profile links, forum signatures, or directory listings. These are pages with no real content, no backlinks of their own, and often no link path from the root domain of the site. They don’t benefit from the overall strength of the domain because there’s no link equity flowing to them. These links have zero authority.

The DA number is the entire sales pitch because it sounds impressive and most buyers don’t know enough to question it. Here’s my mantra on this: anyone selling links based on DA is either incompetent or a scammer. Incompetent if they don’t understand how link equity actually flows through sites. A scammer if they do understand it but use DA anyway to make worthless links sound valuable. Either way, you shouldn’t be buying from them.

Number of Search Results

People search a keyword in Google, see “About 12,400,000 results” at the top of the page, and conclude the keyword is extremely competitive. Or they see “About 8,200 results” and conclude it’s easy pickings. Both conclusions are wrong.

The number of indexed pages that contain words related to a query tells you nothing about the strength of the pages at the top of the results. A query could return 50 million results, but if the top 5 are weak, low-authority pages with thin content, that query is very winnable. A query could return 3,000 results, but if the top 5 are deeply authoritative pages backed by strong link profiles, good luck.

Think of it like a marathon. If you’re the 3rd fastest runner in the field, it doesn’t matter whether there are 4,000 or 40,000 participants. You’re finishing 3rd either way. The total number of runners in the race tells you nothing about how fast the people ahead of you are running. What matters is the competition at the front, not the size of the field behind them.

If you want to evaluate keyword difficulty, look at what’s actually ranking in the top 5 to 10 positions. Look at their content quality, backlink profiles, topical authority, and how well they match the search intent. That’s the competition. The number at the top of the SERP is just how many pages Google found that were tangentially related to the words you typed.
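
To make that concrete, here is a rough sketch that scores difficulty from the top-ranking pages only and never looks at the result count. The RankingPage fields, the 0-100 scales, and the equal weighting are all made up for illustration. They are assumptions, not any tool’s real formula.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RankingPage:
    # Hypothetical 0-100 scores you would assign from your own review of each page.
    link_strength: int
    content_depth: int
    intent_match: int


def front_of_field_difficulty(top_pages: list[RankingPage]) -> float:
    """Average strength of the pages actually ranking; result count is never used."""
    return mean(
        (p.link_strength + p.content_depth + p.intent_match) / 3 for p in top_pages
    )


# Two invented SERPs: one with weak pages at the top, one with strong pages.
# How many total results each query returns does not enter the calculation.
weak_leaders = [RankingPage(10, 20, 30), RankingPage(20, 10, 30)]
strong_leaders = [RankingPage(90, 80, 85), RankingPage(85, 90, 80)]

print(front_of_field_difficulty(weak_leaders))
print(front_of_field_difficulty(strong_leaders))
```

The point of the design is what is missing: there is no parameter for the number of results, because it would not change the answer.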

The Pattern

Every one of these mistakes follows the same structure. A metric that means something at a specific, narrow level gets applied at a broad level where it loses all context. CTR means something for a single query at a known position. Average position means something for a single query over time. Link strength means something at the page level. Competitive difficulty means something when you evaluate the actual pages ranking, not the total count.

The fix is always the same question: does this number, at this level, actually tell me what I’m trying to learn? If the answer is no, stop looking at it and zoom in until it does.

Long Title Tags, Automated (Free Claude Skill + ChatGPT GPT)

Mike Friedman — Tue, 07 Apr 2026 13:16:53 +0000

Last February I shared a note about using longer title tags to improve Google rankings. It was based on research from Joy Hawkins at Sterling Sky and Joel Headley, a former Google employee, and backed up by results I was seeing on my own client sites. You can read that original note here.

That note got a bigger response than I expected. A lot of people tried it and saw results. I’ve been using this approach on every client site since, and it’s become one of the most consistently effective on-page changes I make.

So I turned the process into a tool. It’s available as both a free Claude skill and a free ChatGPT GPT. Give it a topic, a brand name, and the page content (or a brief if the page doesn’t exist yet), and it builds a strategically long, multi-segment title tag designed to rank for multiple search intents per page.

Why Long Title Tags Work (The Evidence)

The standard SEO advice is to keep title tags under 60 characters to avoid truncation. That advice is outdated, and the evidence against it is strong.

Joy Hawkins’ team at Sterling Sky tested title tags exceeding 200 characters across multiple pages and documented noticeable ranking improvements. Joel Headley, who spent years at Google, tested thousands of healthcare websites. He injected neighborhood names into title tags and saw a 15% increase in visibility across the sites he tested.

The key insight is that Google reads the entire title tag for ranking purposes, even when it truncates what it displays in search results. Google has also confirmed that when it rewrites a displayed title (which it does frequently), the original title tag is still used for ranking. So the title tag’s job is to help Google understand what the page is about and match it to queries. Display is secondary.

That changes the calculus. If Google reads the whole thing but only shows part of it, you should optimize for ranking, not for what fits in a search snippet.

Since adopting this approach across my client sites, I’ve consistently seen ranking improvements when expanding title tags from the traditional 50-60 character range to 150-250 characters. Not every page sees a dramatic jump, but the direction has been reliably positive. The original note includes specific before-and-after examples from my own work.

How the Tool Works

The tool builds title tags using a multi-segment architecture. Each segment between the hyphens functions as a near-standalone title tag targeting a slightly different search intent. Instead of one short title trying to capture a single query, you get three or four segments that each target a distinct variation of what someone might search for.

Give it a topic, a brand name, and the page content, and it handles the rest. The tool accepts the content as a URL (it will fetch and review the page), pasted text, or an attached document. It scans the actual content before building segments to make sure every segment is supported by what’s on the page.

This is the part that matters most. A title tag is a promise to both users and search engines. If Segment 2 says “affordable” but the page never discusses pricing, that’s a misalignment. If a segment references a location the business doesn’t actually serve, that’s a problem. The tool checks for this and won’t include intents or claims the content doesn’t support.
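
If you want to sanity-check alignment by hand, here is a minimal sketch of the idea. It assumes a naive rule: each segment comes with a short list of claim words, and a claim only counts as supported if that word appears somewhere in the page text. The function name unsupported_claims and the simple word matching are my own simplifications for illustration, not how the skill works internally.

```python
import re


def unsupported_claims(
    page_text: str, segment_claims: dict[str, list[str]]
) -> dict[str, list[str]]:
    """Return, per segment, the claim words that never appear in the page text."""
    # Lowercase word set so matching ignores case and punctuation.
    words = set(re.findall(r"[a-z0-9']+", page_text.lower()))
    missing = {}
    for segment, claims in segment_claims.items():
        absent = [claim for claim in claims if claim.lower() not in words]
        if absent:
            missing[segment] = absent
    return missing


page = "Our Dallas plumbers handle emergency repairs and drain cleaning 24 hours a day."
claims = {
    "Segment 1": ["emergency", "Dallas"],
    "Segment 2": ["affordable"],  # the page never discusses pricing
}
print(unsupported_claims(page, claims))
```

A real check needs to understand meaning, not just match words (a page can discuss pricing without ever saying “affordable”), which is why this is only a first-pass filter.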

If the content doesn’t exist yet, the tool runs in Draft Mode. You give it a topic or content brief instead of a live page, and it builds the title tag based on the intended scope. The output is labeled as a draft with a reminder to validate it against the final content before publishing. This is useful when you’re planning a content calendar or topical map and want title tags ready before the pages are written.

What it outputs:

The full title tag (targeting 150-250 characters)
Character count
A rationale for each segment explaining which search intent it targets and why
A content alignment status (verified against actual content, or flagged as draft)
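
If you are storing these outputs in your own scripts or spreadsheets, the four items above map onto a small record. This is a sketch under my own naming. TitleTagResult and its fields are hypothetical, not a format the tool emits.

```python
from dataclasses import dataclass, field


@dataclass
class TitleTagResult:
    title: str
    rationale: dict[str, str]  # segment text -> the search intent it targets
    verified_against_content: bool  # False means the title came from Draft Mode
    char_count: int = field(init=False)

    def __post_init__(self) -> None:
        # Count from the title itself rather than trusting a typed-in number.
        self.char_count = len(self.title)

    @property
    def alignment_status(self) -> str:
        return "verified" if self.verified_against_content else "draft"

    @property
    def in_target_range(self) -> bool:
        # The 150-250 character target described above.
        return 150 <= self.char_count <= 250


result = TitleTagResult(
    title="Emergency Plumber - Acme Plumbing",
    rationale={"Emergency Plumber": "head term"},
    verified_against_content=False,
)
print(result.char_count, result.alignment_status, result.in_target_range)
```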

It also includes a Local SEO Mode. If you mention locations, service areas, or neighborhoods, it automatically switches to injecting geo-modifiers into the segments. This is especially useful for service area businesses that need to target multiple locations without creating dozens of thin pages.
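
Here is a rough sketch of what injecting a geo-modifier can look like in code. The helper names join_locations and add_geo_modifier are hypothetical, and the phrasing rules are my own assumption about what reads naturally, not the skill’s internal logic.

```python
def join_locations(locations: list[str]) -> str:
    """Join place names into natural phrasing: 'A', 'A and B', or 'A, B, and C'."""
    if not locations:
        raise ValueError("at least one location is required")
    if len(locations) <= 2:
        return " and ".join(locations)
    return ", ".join(locations[:-1]) + ", and " + locations[-1]


def add_geo_modifier(segment: str, locations: list[str], preposition: str = "in") -> str:
    """Append a geo-modifier to a segment, e.g. 'Emergency Plumber in Dallas'."""
    return f"{segment} {preposition} {join_locations(locations)}"


print(add_geo_modifier("Emergency Plumber", ["Dallas"]))
print(add_geo_modifier("Fast Emergency Plumbing", ["Fort Worth", "Arlington"], preposition="near"))
```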

The Segment Architecture

Every title tag the tool builds follows this structure:

Segment 1 (Primary). Targets the main keyword. This is the most important segment because it’s what Google is most likely to display. Lead with your strongest keyword here.

Segment 2. Targets a secondary intent, a related query variation, or reframes the topic for a different type of searcher. This should be meaningfully different from Segment 1, not just the same keywords rearranged.

Segment 3. Targets a tertiary intent. This could be a how-to framing, a benefit statement, an objection-handling angle, or a long-tail variation.

Segment 4 (Optional). Only used when a genuinely distinct intent exists that isn’t covered by the first three. The tool won’t force a fourth segment just to add length.

Brand. Always last.

Segments are separated by hyphens, and each one should read as a complete, natural phrase. Not a keyword fragment. Not a comma-separated list of terms. A readable title that could stand on its own.
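
The structure is simple enough to sketch in a few lines. This assumes you have already written the segments yourself. The build_title function is a hypothetical helper that only handles the assembly: three or four segments, hyphen separators, brand last.

```python
def build_title(segments: list[str], brand: str, separator: str = " - ") -> str:
    """Join three or four segments and put the brand last."""
    cleaned = [s.strip() for s in segments if s.strip()]
    if not 3 <= len(cleaned) <= 4:
        raise ValueError("expected three segments, plus an optional fourth")
    return separator.join(cleaned + [brand.strip()])


title = build_title(
    [
        "SKU Generator",
        "Create SKUs On Demand For Free",
        "Effortlessly Build SKUs For Your Entire Inventory",
    ],
    brand="ACME Corp",
)
print(title)
print(len(title))
```

The separator is a parameter so you can swap in whatever dash your CMS renders, as long as it is not a pipe.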

What It Avoids

The difference between a strategically long title tag and keyword stuffing is structure. The tool enforces several rules:

No content misalignment. Every segment must be supported by what’s actually on the page. If the content doesn’t cover a topic, the title tag won’t promise it.

No exact keyword repetition across segments. It uses synonyms, conditional synonyms (words that aren’t dictionary synonyms but function as synonyms in context), and reframings instead.

No fragments. “Best cheap fast plumber” is not a segment. “Affordable Emergency Plumber For Your Home” is.

No forced segments. If three segments plus the brand cover the intent space, it stops at three.

No pipes. Segments are separated by hyphens, not pipes. No real reason. I just hate pipes.
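
A few of these rules are mechanical enough to check with a script. The sketch below covers only three of them: segment count, pipes, and a target keyword repeated verbatim in more than one segment. The lint_segments function and its keywords argument are hypothetical. Content alignment, fragments, and forced segments need judgment and are not covered here.

```python
def lint_segments(segments: list[str], keywords: list[str]) -> list[str]:
    """Return human-readable problems; an empty list means no rule was tripped."""
    problems = []
    if len(segments) > 4:
        problems.append("more than four segments before the brand")
    for segment in segments:
        if "|" in segment:
            problems.append(f"pipe found in segment: {segment!r}")
    for keyword in keywords:
        # Case-insensitive exact-phrase match; synonyms and reframings pass.
        hits = [s for s in segments if keyword.lower() in s.lower()]
        if len(hits) > 1:
            problems.append(f"exact keyword {keyword!r} repeated in {len(hits)} segments")
    return problems


good = [
    "Emergency Plumber in Dallas",
    "24/7 Plumbing Repair and Drain Services",
    "Fast Emergency Plumbing near Fort Worth and Arlington",
]
bad = ["Emergency Plumber in Dallas", "Emergency Plumber | Fort Worth"]

print(lint_segments(good, ["emergency plumber"]))
print(lint_segments(bad, ["emergency plumber"]))
```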

Examples

Here are a few examples showing the before (traditional short title) and after (multi-segment title).

Standard Mode: Product Tool Page

Before: Free SKU Generator – ACME Corp

After: SKU Generator – Create SKUs On Demand For Free – Effortlessly Build SKUs For Your Entire Inventory – ACME Corp (110 characters)

Segment 1 targets the head term “SKU generator.” Segment 2 targets “create SKUs free.” Segment 3 targets the use-case intent of building SKUs for an entire inventory. Three different types of searchers, one title tag.

Standard Mode: Informational Guide

Before: How To Build AI Agents – ACME Corp

After: Learn How To Build AI Agents – Free Guide For Building AI Agents From Beginning To Implementation – Avoid These 5 Mistakes In Building Your AI Agent – ACME Corp (160 characters)

Segment 1 targets the how-to query. Segment 2 targets people searching for a comprehensive guide. Segment 3 targets the mistake-avoidance angle, a distinct informational intent.

Local SEO Mode: Service Business

Before: Emergency Plumber – Acme Plumbing

After: Emergency Plumber in Dallas – 24/7 Plumbing Repair and Drain Services – Fast Emergency Plumbing near Fort Worth and Arlington – Acme Plumbing (141 characters)

Segment 1 targets the primary local query “emergency plumber Dallas.” Segment 2 broadens to service-type variations without a geo-modifier. Segment 3 picks up secondary locations with natural phrasing. Three geo-targets, two service variations, one title tag.

How to Get It

The tool is available in two formats. Same instructions, same output, just different platforms.

Claude Skill

Download the skill file: DOWNLOAD LINK
Open Claude.ai
Go to Settings > Capabilities
Scroll to the Skills section and upload the file
Toggle the skill ON
Start a new chat

To use it: “Using the title tag skill, build a title tag for [topic]. Brand is [brand name].” Then provide the page content by pasting a URL, the text, or attaching a document. If the page doesn’t exist yet, tell it to use Draft Mode and provide a topic or brief instead.

For Local SEO Mode, just include locations in your request: “Using the title tag skill, build a title tag for emergency plumbing services in Dallas, Fort Worth, and Arlington. Brand is Acme Plumbing.”

ChatGPT GPT

LINK TO GPT

Same instructions, same output. Use whichever platform you prefer. If you have both, the only difference is that Claude skills persist across chats while a GPT is a separate conversation each time. That, and I think Claude just does a better job at everything 😉.

One More Thing

This tool handles individual title tags well. But if you’re doing a full site audit or building title tags for an entire content cluster, the real leverage comes from thinking about title tags sitewide. Search engines aggregate title tags across your site to understand your overall topicality. The tool includes semantic SEO principles like conditional synonyms, hypernym-hyponym pairing, and sitewide n-gram awareness to help with this, but that’s a topic for a deeper note down the road.

For now, try it on a few pages and see what happens. The original case study from Sterling Sky has the full evidence if you want to dig in before testing.