# ─────────────────────────────────────────────────────────────
# Shiftwork Solutions LLC — robots.txt
# File: robots.txt (root of repo)
# Created: April 5, 2026
# Last Updated: July 16, 2026
# Author: Claude Fable 5 for Jim @ Shiftwork Solutions LLC
#     (original build: Claude Opus 4.6, April 5, 2026)
#
# PURPOSE:
#     Crawler governance for shift-work.com.
#     - Allows all search engine crawlers full access
#     - Allows AI retrieval/search bots (real-time answers + citations)
#     - Allows major-lab AI training crawlers (brand presence inside
#       model knowledge — prospects increasingly get answers from
#       model memory, not just live search)
#     - Blocks data brokers and resale scrapers (no brand benefit)
#     - Points to sitemap.xml
#
# POLICY (two-tier, adopted 2026-06-11):
#     If a bot can put Shiftwork Solutions in front of a prospect
#     (via search results, AI citations, or the model's own learned
#     knowledge) → ALLOW.
#     If a bot only harvests content for third-party data resale
#     → BLOCK.
#     Rationale: the site is marketing content engineered to be
#     distributed; the firm's real IP (methodology, benchmark data,
#     diagnostic logic) is deliberately not published.
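#
#     EXAMPLE (illustrative sketch only; both bot names below are
#     hypothetical placeholders, not real crawlers). Assuming the
#     two-tier policy above, a newly discovered bot would get one of
#     these two groups, added to the matching section of this file:
#
#         User-agent: ExampleAnswerBot       (tier 1: reaches prospects)
#         Allow: /
#
#         User-agent: ExampleResaleScraper   (tier 2: resale only)
#         Disallow: /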
#
# DEPLOYMENT:
#     Place in the ROOT of the Shiftwork-Solutions-Website GitHub repo.
#     Render serves it at https://shift-work.com/robots.txt
#
# CHANGE LOG:
#     2026-04-05 — Initial build. Full access for search engines,
#                  AI retrieval bots allowed, AI training bots blocked.
#     2026-06-11 — AI CRAWLER REFRESH + POLICY CHANGE (per Jim):
#                  Added retrieval/search bots: OAI-SearchBot,
#                  Perplexity-User, Claude-SearchBot, Claude-User,
#                  DuckAssistBot, MistralAI-User, meta-externalfetcher,
#                  Amazonbot, Google-CloudVertexBot, Applebot.
#                  POLICY: switched from "block all AI training" to
#                  two-tier — major-lab training crawlers (GPTBot,
#                  ClaudeBot, Google-Extended, Applebot-Extended,
#                  meta-externalagent, FacebookBot, anthropic-ai) now
#                  ALLOWED for in-model brand presence; data brokers
#                  and resale scrapers (CCBot, Bytespider, Omgilibot,
#                  Webzio-Extended, Diffbot,
#                  cohere-training-data-crawler) remain BLOCKED.
#     2026-07-16 — Fixed the Sitemap directive to point to the actual
#                  sitemap (sitemap.xml). It previously pointed at
#                  robots.txt itself. No crawler rules changed.
#
# I did no harm and this file is not truncated
# ─────────────────────────────────────────────────────────────

# ═══════════════════════════════════════════════════════════════
# SEARCH ENGINE CRAWLERS — Full access
# ═══════════════════════════════════════════════════════════════
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: Slurp
Allow: /

User-agent: DuckDuckBot
Allow: /

User-agent: Baiduspider
Allow: /

User-agent: YandexBot
Allow: /

User-agent: Applebot
Allow: /

# ═══════════════════════════════════════════════════════════════
# AI RETRIEVAL & AI SEARCH BOTS — Allowed
# These bots index or fetch content to answer user questions in
# AI products, with citations. This is where prospects
# increasingly find consultants.
# ═══════════════════════════════════════════════════════════════

# OpenAI — ChatGPT search index
User-agent: OAI-SearchBot
Allow: /

# OpenAI — user-initiated page fetches inside ChatGPT
User-agent: ChatGPT-User
Allow: /

# Anthropic — Claude search index
User-agent: Claude-SearchBot
Allow: /

# Anthropic — user-initiated page fetches inside Claude
User-agent: Claude-User
Allow: /

# Perplexity — search index
User-agent: PerplexityBot
Allow: /

# Perplexity — user-initiated page fetches
User-agent: Perplexity-User
Allow: /

# DuckDuckGo — DuckAssist AI answers
User-agent: DuckAssistBot
Allow: /

# Mistral — user-initiated fetches with citations
User-agent: MistralAI-User
Allow: /

# Meta — user-initiated link fetches in Meta AI
User-agent: meta-externalfetcher
Allow: /

# Amazon — Alexa / Rufus answers
User-agent: Amazonbot
Allow: /

# Google — Vertex AI site grounding (customer-directed retrieval)
User-agent: Google-CloudVertexBot
Allow: /

# Cohere — retrieval
User-agent: cohere-ai
Allow: /

# ═══════════════════════════════════════════════════════════════
# MAJOR-LAB AI TRAINING CRAWLERS — Allowed (policy: 2026-06-11)
# Training inclusion builds brand presence inside the models that
# prospects use. Site content is marketing material designed for
# distribution; the firm's real IP is not published here.
# ═══════════════════════════════════════════════════════════════

# OpenAI — training
User-agent: GPTBot
Allow: /

# Anthropic — training
User-agent: ClaudeBot
Allow: /

# Anthropic — legacy training agent string
User-agent: anthropic-ai
Allow: /

# Google — Gemini training (does not affect Search or AI Overviews)
# (Google-Extended is a control token honored by Google's existing
# crawlers, not a separate user-agent string in HTTP requests.)
User-agent: Google-Extended
Allow: /

# Apple — Apple Intelligence training
# (Applebot-Extended is a control token; it does not crawl pages itself.
# Crawling is done by Applebot.)
User-agent: Applebot-Extended
Allow: /

# Meta — training
User-agent: meta-externalagent
Allow: /

# Meta — legacy crawler
User-agent: FacebookBot
Allow: /

# ═══════════════════════════════════════════════════════════════
# DATA BROKERS & RESALE SCRAPERS — Blocked
# These harvest content for third-party datasets and resale.
# No path to a prospect; no brand benefit.
# ═══════════════════════════════════════════════════════════════
User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Omgilibot
Disallow: /

User-agent: Webzio-Extended
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: cohere-training-data-crawler
Disallow: /

# ═══════════════════════════════════════════════════════════════
# DEFAULT — Allow everything else
# A crawler obeys only the most specific User-agent group that
# matches it (RFC 9309), so the bots blocked above do not fall
# through to this rule.
# ═══════════════════════════════════════════════════════════════
User-agent: *
Allow: /

# ═══════════════════════════════════════════════════════════════
# SITEMAP
# ═══════════════════════════════════════════════════════════════
Sitemap: https://shift-work.com/sitemap.xml