11-minute read
A week ago, I came across a fascinating LinkedIn post by SEO consultant Geoff Kenyon, where he described something most content creators have never heard of — an invisible mark that Google embeds into everything its AI tools generate:
Not metadata you can strip. Not a label you can edit out. Something woven into the content itself!
Here’s an example of this hidden mark:
This mark is called SynthID, and Google isn't the only one doing it.
In this guide, I've pulled together the most up-to-date information on how AI detects AI content, and what SEOs can do about it.
SynthID is a watermarking technology developed by Google DeepMind. It embeds an invisible digital watermark into AI-generated content — images, video, audio, and text — at the moment the content is created. The watermark doesn't degrade quality and isn't visible to the human eye, but it is detectable by machines.
Think of it like a serial number stamped into content at birth. You can neither see nor read it, but it's there.
As of early 2026, over 10 billion pieces of content had already been watermarked with SynthID.
It's embedded across Google's AI tools, including Gemini, Imagen, Veo, and Lyria.
And Google isn't alone in building this infrastructure to detect AI content. Over 200 organizations (including Microsoft, Adobe, OpenAI, Meta, BBC, and Amazon) have joined a coalition called C2PA, which developed an open standard called Content Credentials: a kind of digital nutrition label that records who created a piece of content, which tools were used, and whether AI was involved.
OpenAI already embeds these credentials in images generated through ChatGPT and DALL·E.
The approaches differ technically, but the direction is the same across the industry: knowing where content came from is becoming a basic requirement, not a nice-to-have.
This is the part that surprises most people. For images, the watermark survives cropping, compression, screenshots, and common filters. For video, every frame gets individually marked, so trimming a clip doesn't help either.
The watermark isn't stored in removable metadata — it's embedded in the content itself. Geoff's post demonstrated this vividly: if you isolate the specific pixel frequencies where SynthID hides in a Gemini-generated image and crank up the contrast, a distinct pattern becomes visible. The mark was there all along; you just couldn't see it at normal settings.
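You can reproduce the spirit of that demonstration yourself. The sketch below is my own illustration using plain NumPy, not Geoff's exact procedure or anything SynthID-specific: it subtracts a local average from an image to isolate fine pixel-level variation, then stretches that residual to full contrast. On a watermarked image, this is the kind of manipulation that makes a hidden pattern pop out. The blur size and the synthetic demo image are assumptions chosen for illustration.

```python
import numpy as np

def box_blur(img: np.ndarray, k: int = 3) -> np.ndarray:
    """Local mean with a k x k box filter (edge-padded)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    h, w = img.shape
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

def amplify_residual(img: np.ndarray, k: int = 3) -> np.ndarray:
    """Subtract the local mean to isolate fine pixel variation,
    then stretch the residual to the full 0-255 range."""
    residual = img.astype(float) - box_blur(img.astype(float), k)
    lo, hi = residual.min(), residual.max()
    if hi == lo:
        return np.zeros_like(img, dtype=np.uint8)
    return ((residual - lo) / (hi - lo) * 255).astype(np.uint8)

# Demo: a smooth gradient with a faint checkerboard "watermark" riding on it.
# The checkerboard is nearly invisible in `img` but dominates `revealed`.
h, w = 32, 32
yy, xx = np.mgrid[0:h, 0:w]
img = (xx * 4 + ((xx + yy) % 2) * 2).astype(np.uint8)
revealed = amplify_residual(img)
```

The local-mean subtraction acts as a crude high-pass filter: the smooth "real" image content cancels out, and whatever faint high-frequency signal is riding on top of it is all that's left to stretch.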
To be sure, determined users can still degrade the watermark through aggressive image manipulation — extreme color distortion, re-encoding with major adjustments — though this risks degrading the content quality too.
And for text, thoroughly rewriting or translating AI-generated content can significantly reduce the detector's confidence score. It's not impenetrable, but casual use doesn't remove it.
Does this apply to text, too? Yes, and this is the part that gets skipped in most conversations about AI detection.
Large language models generate text one token (roughly one word) at a time, assigning each token a probability score based on how likely it is to come next. SynthID adjusts those probability scores in subtle ways to encode a watermark, without noticeably affecting the quality or meaning of the text.
The resulting pattern of word choices, shaped by those adjusted probabilities, is the watermark. A trained detector can compare that pattern against what watermarked and unwatermarked text typically look like, and make a probabilistic judgment about the content's origin.
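SynthID's production scheme (DeepMind calls it Tournament sampling) is more sophisticated, but the core idea can be sketched in a toy form: use a secret key to pseudo-randomly mark part of the vocabulary as "green" at each step, nudge sampling toward green tokens, and later test whether a text contains suspiciously many of them. Everything below, from the hash-based green set to the bias weight, is my own illustration, not Google's implementation.

```python
import hashlib
import math
import random

def green_set(prev_token: str, vocab: list, key: str, frac: float = 0.5) -> set:
    """Pseudo-randomly pick a 'green' subset of the vocabulary,
    seeded by the previous token and a secret key."""
    seed = int(hashlib.sha256(f"{key}:{prev_token}".encode()).hexdigest(), 16)
    rng = random.Random(seed)
    return set(rng.sample(vocab, int(len(vocab) * frac)))

def generate(vocab, key, length=200, bias=4.0, seed=0):
    """Sample a token sequence, quietly up-weighting 'green' tokens
    at each step. This bias IS the watermark."""
    rng = random.Random(seed)
    out = ["<s>"]
    for _ in range(length):
        greens = green_set(out[-1], vocab, key)
        weights = [bias if t in greens else 1.0 for t in vocab]
        out.append(rng.choices(vocab, weights=weights, k=1)[0])
    return out[1:]

def detect(tokens, vocab, key, frac=0.5):
    """Return a z-score: how far the observed fraction of green tokens
    deviates from what unwatermarked text would show."""
    prev, hits = "<s>", 0
    for t in tokens:
        if t in green_set(prev, vocab, key, frac):
            hits += 1
        prev = t
    n = len(tokens)
    return (hits - frac * n) / math.sqrt(n * frac * (1 - frac))
```

Without the key, the green sets look random and the statistic stays near zero; with it, watermarked text lights up. Note how the detector returns a score rather than a verdict, which is exactly why real systems report probabilistic confidence, not a hard yes/no.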
One important nuance: text watermarking works best on longer, more open-ended responses. It's less effective on short factual answers — like "What's the capital of France?" — because there's less room to adjust word choices without changing the meaning.
If you've been in SEO for more than a few years, you might have heard about Spintax — automated content that shuffled synonyms to create "unique" articles.
It worked for a while, but then Google got better at detecting it. Exact-match keyword stuffing worked, then it didn't. Content farms built for volume worked, then the Panda update arrived.
New to Google algorithm updates? Check out our detailed guide covering everything you need to know.
Each time, the shortcut stopped working not because Google suddenly decided to care, but because the detection infrastructure quietly caught up.
SynthID is the detection infrastructure for the AI content era. That doesn't mean AI content is penalized today. But the pattern in SEO history is consistent: the companies whose tools enable easy abuse eventually build the tools that neutralize it.
It's tempting to assume this is about penalizing AI content in search. That's not quite the right frame, at least not yet. The more immediate reason is something researchers call model collapse.
Here's the problem in plain terms: AI models learn from data scraped from the internet. As more and more of that internet gets filled with AI-generated content, future models increasingly end up training on AI outputs rather than human writing.
Research published in Nature found that this process causes a degenerative effect: models gradually forget the true diversity of human-generated data, and their outputs become increasingly narrow and distorted over generations.
In one experiment, researchers fine-tuned a language model using only AI-generated data. By the fourth generation of retraining, a model asked about medieval architecture was producing unrelated text about jackrabbits. That's model collapse in action.
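You can watch a miniature version of this dynamic in a few lines of Python. The toy below is my own illustration, not the Nature paper's setup: it starts from a Zipf-like "human" token distribution, then repeatedly "trains" each new model on the previous model's outputs by re-estimating frequencies from samples. Rare tokens that fail to appear in one generation can never come back, so diversity only shrinks.

```python
import random
from collections import Counter

def collapse_demo(vocab_size=100, n=200, generations=10, seed=0):
    """Track how many distinct tokens survive each generation of
    'training on your own outputs'."""
    rng = random.Random(seed)
    tokens = list(range(vocab_size))
    # Zipf-like "human" distribution: a few common tokens, a long tail
    weights = [1 / (i + 1) for i in tokens]
    data = rng.choices(tokens, weights=weights, k=n)
    diversity = [len(set(data))]
    for _ in range(generations):
        # each "model" learns only the frequencies it actually sees,
        # then generates the next generation's training data
        counts = Counter(data)
        seen = list(counts)
        data = rng.choices(seen, weights=[counts[t] for t in seen], k=n)
        diversity.append(len(set(data)))
    return diversity
```

The distinct-token count is non-increasing by construction, and in practice it falls generation after generation: the long tail of rare "human" tokens is the first thing to disappear, which is the narrowing effect the researchers describe. Watermark-based filtering is one way to keep this feedback loop out of real training corpora.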
Watermarking is one of the cleanest solutions to this problem. If you can reliably identify which content was AI-generated, you can filter it out of future training data. Google has open-sourced SynthID's text watermarking so any developer can incorporate it into their own models, meaning the infrastructure for AI content detection is spreading beyond just Google's own tools.
Stop using AI for content entirely? No. That's not the takeaway.
The distinction that matters is between AI as a replacement for original thought versus AI as a tool that supports it. Content that offers genuine original insight — proprietary data, first-hand experience, a perspective that doesn't exist elsewhere — has a structural advantage in a world where AI systems are actively trying to distinguish human from machine.
Here's what that looks like in practice.
Generic content gets detected (and ignored) precisely because it could have been written about any niche by anyone. The antidote is knowing your topic landscape well enough to find the angles nobody else is covering.
RankDots is built for this: it maps the full topic structure of a niche, clusters keywords into coherent subjects, and shows you where real traffic and low competition intersect.
That's your editorial roadmap and it's what separates strategic content from guesswork.
Use AI to pull together background research, generate outlines, and speed up first drafts.
Then bring in what only you can provide: a contrarian take, a case study from your own experience, data you've collected, or an observation from your industry that isn't already on page one. That's the layer detection systems can't replicate, because it didn't exist before you wrote it.
Research from Princeton University found that adding original statistics and source citations to content can boost AI citation rates by 30–40%.
The logic is straightforward: AI systems are looking for extractable, verifiable facts. Generic prose gives them nothing to cite, while a proprietary survey, a client result, or even a simple data point you calculated yourself gives them exactly what they need.
What works best, according to the research: original statistics, named source citations, and concrete first-party data points.
The pattern is consistent across the web: Wikipedia and Reddit are among the most cited domains in AI Overviews and ChatGPT responses, not because they rank well traditionally, but because they contain specific, concrete, human-contributed answers that AI can extract directly. Original beats derivative every time.
If you want to keep tabs on how your brand shows up in AI answers, Reddit monitoring is worth adding to your workflow. Tools like Awario let you track mentions in real time, so you catch every thread where your keyword or company comes up.
While you're thinking about all this, you might be tempted to run your content through a detector and call it done. That's not a reliable plan, to be honest.
Tools like GPTZero, Turnitin, and Originality.ai exist, but their track record isn't reassuring. In a 2023 peer-reviewed study testing 14 of the most popular detectors, not one hit 80% accuracy and several flagged human-written text as AI. OpenAI launched its own classifier and pulled it months later.
Use these tools for a rough sense-check if you want, but don't treat a passing score as clearance.
Google's E-E-A-T guidelines have always rewarded demonstrable expertise, but they matter more now. A byline attached to a real track record — published work, an author page with actual credentials, a consistent voice across multiple pieces — signals something a watermark detector can't measure: that a person with relevant experience wrote this.
The practical checklist is short: real bylines, author pages with actual credentials, and a consistent voice across multiple pieces.
This matters beyond Google rankings. AI systems pulling citations prefer attributable sources. A claim from "Staff Writer" carries less weight than the same claim from someone with a verifiable history in the field, and that gap is only going to widen.
Some teams go beyond editing and actively try to reduce watermark detectability. Google's own technical documentation confirms that certain methods — thorough rewriting, translation, aggressive re-encoding — work to varying degrees.
These tactics reduce watermark detectability. They don't make the content more citable, more useful, or more likely to earn links. Passing a detector and being worth citing are two different problems, and the second one is harder to game.
Good SEO content has always come down to one thing: a real person with a real point of view. SynthID and other AI detection methods don't change the game. They just make it easier to see who's actually playing it.