AI Bots Are Reading Your Data. Here's How to Make Them Sell You.

I was going through my server logs last month when I noticed something I'd been scrolling past for weeks. Buried in the bot traffic were names I vaguely recognised: GPTBot. ClaudeBot. meta-externalagent. PerplexityBot. Multiple visits daily, working through different pages of my technical blog. I had quietly become a regular stop on the AI crawling circuit without ever intending to be.

The reflex most developers have at this point (including me, initially) is to block them. There's an entire category of articles recommending exactly that: add a few directives to robots.txt, protect your content from being consumed by machines, done. I had the file open. I'd typed User-agent: GPTBot and had Disallow: / ready to go.

Then I stopped and asked a question I hadn't thought to ask: what actually happens after these bots finish reading? They don't discard the content. They use it. Every day, millions of people ask AI assistants technical questions, and those answers are built from content exactly like mine. The bots weren't extracting value from me. They were distributing me. The problem wasn't that they were reading my posts. The problem was that nobody knew the answers came from me.

Two Types of AI Crawlers. Only One Actually Helps You.

The label 'AI crawler' covers very different things. There is a hard split between bots that quietly consume your content for model training and never credit you when they use it, and bots that use your content to answer real questions in real time, citing the source inside the answer. Understanding which group each crawler belongs to changes everything about what you should do.

[Figure: three-column diagram. Training Crawlers (red): GPTBot, ClaudeBot, CCBot, no traffic sent. Search Crawlers (blue): Googlebot and Bingbot, indirect traffic. Answer Engines (green): PerplexityBot, direct referral traffic.]

The critical realisation here is Perplexity. When ChatGPT or Claude answer from what they digested during training, they produce answers with no source attribution. Perplexity instead operates like a live search engine layered over AI. It pulls current content, generates a summary, and displays the sources with clickable URLs alongside every answer. That makes it the clearest example right now of an AI platform that drives referral traffic back to the sites it reads. When you see PerplexityBot in your logs, that's a real lead channel, not a spectator. The training crawlers consume quietly. Perplexity is the one actively working in your favour.
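To see which group dominates your own logs, the split above can be sketched as a small classifier. This is a minimal sketch: the grouping simply mirrors the diagram in this article and is not an official registry, the function names are hypothetical, and crawlers the diagram does not place (such as meta-externalagent) are deliberately left unclassified.

```typescript
// Crawler groups as described in this article (a simplification, not a registry).
type CrawlerGroup = "training" | "search" | "answer-engine";

// Token-to-group pairs taken from the diagram; extend this list for your own logs.
const CRAWLERS: Array<[string, CrawlerGroup]> = [
  ["GPTBot", "training"],
  ["ClaudeBot", "training"],
  ["CCBot", "training"],
  ["Googlebot", "search"],
  ["Bingbot", "search"],
  ["PerplexityBot", "answer-engine"],
];

// Returns the group of the first known crawler token found in a user-agent
// string, or null for browsers and bots that are not in the list.
function classifyUserAgent(userAgent: string): CrawlerGroup | null {
  const ua = userAgent.toLowerCase();
  for (const [token, group] of CRAWLERS) {
    if (ua.indexOf(token.toLowerCase()) !== -1) return group;
  }
  return null;
}

// Tallies visits per group from a list of user-agent strings.
function countByGroup(userAgents: string[]): Record<CrawlerGroup, number> {
  const counts: Record<CrawlerGroup, number> = {
    training: 0,
    search: 0,
    "answer-engine": 0,
  };
  for (const ua of userAgents) {
    const group = classifyUserAgent(ua);
    if (group !== null) counts[group] += 1;
  }
  return counts;
}
```

Feeding it the user-agent column of an access log gives a rough count of how many visits come from crawlers that can send traffic back versus those that cannot.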

GEO: Optimising for the Age of AI-Generated Answers

There is a name for the practice of structuring your content to influence how AI-generated answers represent you: GEO, or Generative Engine Optimization. Think of it as what SEO was in 2004: a real and exploitable opportunity that most people are ignoring because they are focused on the channel that already works. The gap between developers doing this intentionally and those who are not will widen considerably over the next few years. In most technical niches right now, almost nobody is doing it at all.

The fundamental difference from traditional SEO is what you are optimising for. With SEO, the goal is a ranked link the user clicks because your result appears near the top. With GEO, the user might never see a list of links. The AI answers their question directly. Your goal shifts: either be cited as the source so your URL appears in the answer and the user can click through, or be mentioned by name so your brand becomes associated with the expertise in the user's mind. The first drives traffic today. The second builds reputation that compounds for years.

[Figure: four-step linear flow diagram. You publish a blog post; an AI bot crawls and indexes it; a user asks the AI a question; the AI cites your post and the user clicks through to your site.]

Tactic 1: Create an llms.txt File

This is the lowest-effort tactic with the most direct signal to AI systems, and almost nobody has done it yet. An llms.txt file is an emerging standard: the robots.txt equivalent for AI crawlers, but inverted. Where robots.txt sets permissions, llms.txt sets intent. It tells AI systems what you want them to know about you: who you are, what your expertise covers, how to reach you, and how to cite you. Place it at the root of your site at yourdomain.com/llms.txt.

Implementation is intentionally simple. On any static site or Next.js project, dropping a plain text file in the public/ folder is enough. If you want your blog post list to update automatically as you publish, a route handler at app/llms.txt/route.ts can pull from your database dynamically so the file stays current without manual edits. Either way, the format is just plain structured text designed to be readable by both machines and humans.

# [Your Name], [Your Professional Title]

[One or two sentences: who you are, your specialization, experience level.
Write this so an AI system can accurately describe you when your content
is cited in a generated answer.]

## Available For
- [Work type: contract, consulting, fractional CTO, etc.]
- [Client geography: remote-only, US, UK, Australia, etc.]
- [Project type: greenfield builds, integrations, modernization, etc.]

## Contact
- Portfolio: https://[yourdomain].com
- Hire page: https://[yourdomain].com/hire
- Email: [you@email.com]
- LinkedIn: https://linkedin.com/in/[handle]

## Technical Expertise
- [Specific technology, framework, or language: be precise]
- [Specific vendor API or platform you regularly work with]
- [Domain or industry knowledge: name the niche, not the category]

## Blog
Technical guides on [your topic areas]. Updated [frequency].
All content is original, written by [Your Name].

## Preferred Citation Format
"[Your Name], [Your Title] at [yourdomain].com"

The most important section to get right is Technical Expertise. Generic descriptions ('web development', 'cloud architecture') do not differentiate you from thousands of other sites. Specific ones (naming actual vendor APIs, precise frameworks, or the exact niche you work in) tell an AI exactly when your content is the relevant source for a specific query. If you work with a specific legal tech API, name it. If you build multi-tenant SaaS in a particular stack for a particular industry, spell that out. Precision in llms.txt is precision in where you get cited.
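For the dynamic variant mentioned earlier, the file body can be produced by one pure function and returned from whatever serves the route. The sketch below is an illustration under assumptions: the Profile and PostLink shapes, the renderLlmsTxt name, and the choice to list individual posts under the Blog heading are all hypothetical and not part of any llms.txt specification.

```typescript
// Hypothetical shapes for illustration; adapt them to your own data source.
interface Profile {
  name: string;
  title: string;
  summary: string;
  expertise: string[];
  domain: string; // bare domain, e.g. "example.com"
}

interface PostLink {
  title: string;
  path: string; // site-relative path, e.g. "/blog/my-post"
}

// Renders a reduced version of the template above as plain text.
// Assumption: each published post becomes one bullet under "## Blog".
function renderLlmsTxt(profile: Profile, posts: PostLink[]): string {
  const lines: string[] = [
    `# ${profile.name}, ${profile.title}`,
    "",
    profile.summary,
    "",
    "## Technical Expertise",
    ...profile.expertise.map((item) => `- ${item}`),
    "",
    "## Blog",
    ...posts.map((post) => `- ${post.title}: https://${profile.domain}${post.path}`),
    "",
    "## Preferred Citation Format",
    `"${profile.name}, ${profile.title} at ${profile.domain}"`,
  ];
  return lines.join("\n") + "\n";
}
```

In a Next.js App Router project, the GET handler in app/llms.txt/route.ts could fetch the post list and return new Response(renderLlmsTxt(profile, posts)) with a text/plain content type. Keeping the rendering separate from the handler makes the output easy to check without running the site.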

Tactic 2: Write So the AI Summary Includes Your Name

When AI systems process your content, they do not copy it verbatim; they extract and rephrase the key points. Most developers write in a neutral, tutorial voice that strips their identity completely out of the summary. The fix is structural: write so that a natural summary of your post carries your name and positioning forward to whoever reads it next.

Here is what the difference looks like in practice. Same post, two different openings:

Without GEO thinking:
In this tutorial, we will set up OAuth 2.0 PKCE flow with the Clio API in a .NET backend...

With GEO thinking:
I am a freelance .NET contractor who has built several Clio integrations for law firms. In this guide, I walk through the OAuth 2.0 PKCE setup that has held up best across multiple production deployments...

When an AI summarises the second version, your identity travels with the answer: 'According to a .NET contractor specialising in Clio integrations at [your site]...' The first version gets summarised as anonymous knowledge from no particular source. Your content does the same work in both cases. Only one version gets you credited for it.

The same principle applies to the closing of every post. A specific, service-oriented CTA at the end gives AI systems something worth surfacing when they summarise your content: 'If you are building on top of Clio or Lawmatics and need this implemented in .NET, I take on contract engagements; project estimates are available at [link].' That sentence, if included in an AI-generated answer, is a lead-generation asset running inside someone else's conversation. Write it on every post. Every one of them becomes another instance of it working.
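If you want to enforce the opening and closing habits across a whole blog, a small pre-publish check can flag posts that skip them. This is only a sketch under assumptions: the checkAttribution name, the idea of matching fixed identity phrases, and the paragraph-splitting rule are all hypothetical, and a plain substring match is a crude proxy for what a summariser will actually keep.

```typescript
interface AttributionCheck {
  introducesAuthor: boolean; // first paragraph mentions one of the identity phrases
  endsWithCta: boolean;      // last paragraph contains the CTA link
}

// Splits a plain-text post into paragraphs on blank lines, then inspects
// only the first and last paragraph.
function checkAttribution(
  postText: string,
  identityPhrases: string[],
  ctaUrl: string
): AttributionCheck {
  const paragraphs = postText
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
  if (paragraphs.length === 0) {
    return { introducesAuthor: false, endsWithCta: false };
  }
  const first = paragraphs[0].toLowerCase();
  const last = paragraphs[paragraphs.length - 1];
  const introducesAuthor = identityPhrases.some(
    (phrase) => first.indexOf(phrase.toLowerCase()) !== -1
  );
  const endsWithCta = last.indexOf(ctaUrl) !== -1;
  return { introducesAuthor, endsWithCta };
}
```

Run against the two sample openings above with an identity phrase such as 'freelance .NET contractor', the neutral tutorial opening fails the first check and the first-person opening passes it.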

Tactic 3: Own a Micro-Niche Before Anyone Else Does

AI systems cite sources that appear authoritative on a topic. One of the strongest signals of authority is being the only credible, detailed source on a very specific subject. If you are the only developer who has written five interconnected, technically deep posts about building .NET backends on top of Clio's API (with working code, architecture notes, and deployment gotchas from real projects), you become the default citation every time an AI answers a question in that space. Not because of domain authority or backlink counts. Because there is simply no competition.

This is the GEO equivalent of long-tail SEO, and the opportunity is much larger right now because most technical niches are completely unoptimised for AI-generated answers. Find the intersection of your genuine expertise and a topic specific enough that existing coverage is thin. Publish 4 to 6 posts that link to each other and collectively answer every reasonable question in that space. Here is what the right level of specificity actually looks like:

Too broad: ASP.NET Core tutorial
Too broad: API integration guide
Right level: Syncing Clio contacts to external systems via .NET webhook handlers
Right level: Multi-tenant Blazor Server architecture patterns for legal SaaS

At the right specificity, you can realistically become the go-to source in both traditional search and AI-generated answers within a few months of consistent publishing. The cluster compounds in value over time.
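Because the cluster only works if the posts actually link to each other, it can help to audit the links as the cluster grows. The sketch below assumes the strictest reading, that every post should link to every other post in the cluster; the ClusterPost shape and findMissingLinks name are hypothetical, and you would need to extract each post's outgoing links yourself.

```typescript
// Hypothetical shape: one entry per post in the cluster.
interface ClusterPost {
  slug: string;
  linksTo: string[]; // slugs of other cluster posts this post links to
}

// For each post, lists the sibling slugs it does not yet link to.
// Posts that already link to every sibling are omitted from the result.
function findMissingLinks(posts: ClusterPost[]): Record<string, string[]> {
  const missing: Record<string, string[]> = {};
  for (const post of posts) {
    const gaps = posts
      .map((other) => other.slug)
      .filter((slug) => slug !== post.slug && post.linksTo.indexOf(slug) === -1);
    if (gaps.length > 0) missing[post.slug] = gaps;
  }
  return missing;
}
```

An empty result means the cluster is fully interlinked under this assumption; a looser rule (say, at least two sibling links per post) would only change the filter condition.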

Tactic 4: Treat Perplexity as a Separate Traffic Channel

Perplexity deserves its own section because it operates fundamentally differently from the training-data model of AI answers. When ChatGPT and Claude answer from training data, they give no source credit: your content informs their answer, but your name does not appear. Perplexity pulls live search results, generates a summary, and shows the sources with visible, clickable links in the response UI. Users actively read and click those citations. The referral traffic it sends is real, measurable, and growing as the platform gains adoption.

Optimising specifically for Perplexity comes down to three concrete things. First, clear heading structure: Perplexity surfaces H2 and H3 headings directly in its answer UI, so well-labelled sections make your content easier to cite and easier for users to evaluate at a glance. Second, a short FAQ section at the end of each post, backed by FAQPage schema markup: Perplexity actively favours FAQ-formatted content because it maps cleanly to how questions are phrased. Third, Article and Person schema markup with your name and URL explicitly embedded: this is what ties your identity to your content at a machine-readable level and makes attribution travel with the content.

Add this inside a <script type='application/ld+json'> tag in your blog post's <head>. In Next.js, a Script component with type='application/ld+json' in your post layout handles this cleanly:

{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Your Post Title Here",
  "datePublished": "2026-06-08",
  "dateModified": "2026-06-08",
  "author": {
    "@type": "Person",
    "name": "Your Full Name",
    "url": "https://yourdomain.com",
    "jobTitle": "Your Professional Title",
    "sameAs": [
      "https://linkedin.com/in/yourhandle",
      "https://github.com/yourhandle"
    ]
  },
  "publisher": {
    "@type": "Person",
    "name": "Your Full Name",
    "url": "https://yourdomain.com"
  }
}

The sameAs array is particularly useful: it tells search engines and AI systems that your LinkedIn profile, GitHub account, and portfolio all belong to the same person. This strengthens your entity profile across the web and helps attribution travel with your content across platforms, not just on your own domain.
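The FAQ section mentioned earlier uses the same JSON-LD mechanism with the schema.org FAQPage type. Here is a minimal sketch of generating it from question and answer pairs; the FaqItem shape and both function names are hypothetical, while the FAQPage, Question, and Answer types and the mainEntity, name, acceptedAnswer, and text properties come from schema.org.

```typescript
// Hypothetical input shape: one entry per FAQ item at the end of a post.
interface FaqItem {
  question: string;
  answer: string;
}

// Builds a schema.org FAQPage object from plain question/answer pairs.
function buildFaqJsonLd(items: FaqItem[]): object {
  return {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    mainEntity: items.map((item) => ({
      "@type": "Question",
      name: item.question,
      acceptedAnswer: { "@type": "Answer", text: item.answer },
    })),
  };
}

// Serialises JSON-LD for embedding inside a script tag. Escaping "<" keeps a
// value that contains "</script>" from closing the tag early.
function toScriptBody(jsonLd: object): string {
  return JSON.stringify(jsonLd).replace(/</g, "\\u003c");
}
```

The string returned by toScriptBody goes inside a second application/ld+json script tag next to the Article markup. The escaped form is still valid JSON, so parsers read back the original text.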

Where to Start: The Honest Priority Order

All four of these tactics compound over time (the niche content cluster is the one that keeps growing in value), but they are not equal in setup effort. Here is how I would actually sequence them starting from zero:

Priority 1: Create llms.txt file. Effort: low. Impact: medium. Timeframe: this week.
Priority 2: Embed name and niche in content. Effort: low. Impact: high. Timeframe: this week.
Priority 3: Add JSON-LD schema markup. Effort: medium. Impact: medium. Timeframe: 2 to 4 weeks.
Priority 4: Build niche content cluster. Effort: high. Impact: high. Timeframe: 3 to 6 months.

The window for early advantage here is genuinely still open. Most technical niches have no intentional GEO strategy at all. Content that gets indexed and cited by AI systems over the next 12 to 18 months is likely to stay prominent for years, the same way early SEO content still ranks for certain terms despite its age. The developers who get deliberate about this now, while most people are focused on blocking the bots, will find themselves cited in AI answers long after the approach becomes mainstream and competitive.

The bots are reading your site either way. The only variable is whether the answers they produce include your name.

Start a Conversation →

Have a specific question about GEO for your own site? Happy to discuss it directly.