# ─────────────────────────────────────────────────────────────
# Shiftwork Solutions LLC — robots.txt
# File: robots.txt (root of repo)
# Created: April 5, 2026
# Last Updated: June 11, 2026
# Author: Claude Fable 5 for Jim @ Shiftwork Solutions LLC
#         (original build: Claude Opus 4.6, April 5, 2026)
#
# PURPOSE:
#   Crawler governance for shift-work.com.
#   - Allows all search engine crawlers full access
#   - Allows AI retrieval/search bots (real-time answers + citations)
#   - Allows major-lab AI training crawlers (brand presence inside
#     model knowledge — prospects increasingly get answers from
#     model memory, not just live search)
#   - Blocks data brokers and resale scrapers (no brand benefit)
#   - Points to sitemap.xml
#
# POLICY (two-tier, adopted 2026-06-11):
#   If a bot can put Shiftwork Solutions in front of a prospect —
#   via search results, AI citations, or the model's own learned
#   knowledge → ALLOW.
#   If a bot only harvests content for third-party data resale
#   → BLOCK.
#   Rationale: the site is marketing content engineered to be
#   distributed; the firm's real IP (methodology, benchmark data,
#   diagnostic logic) is deliberately not published.
#
# DEPLOYMENT:
#   Place in the ROOT of the Shiftwork-Solutions-Website GitHub repo.
#   Render serves it at https://shift-work.com/robots.txt
#
# CHANGE LOG:
#   2026-04-05 — Initial build. Full access for search engines,
#                AI retrieval bots allowed, AI training bots blocked.
#   2026-06-11 — AI CRAWLER REFRESH + POLICY CHANGE (per Jim):
#                Added retrieval/search bots: OAI-SearchBot,
#                Perplexity-User, Claude-SearchBot, Claude-User,
#                DuckAssistBot, MistralAI-User, meta-externalfetcher,
#                Amazonbot, Google-CloudVertexBot, Applebot.
#                POLICY: switched from "block all AI training" to
#                two-tier — major-lab training crawlers (GPTBot,
#                ClaudeBot, Google-Extended, Applebot-Extended,
#                meta-externalagent, FacebookBot, anthropic-ai) now
#                ALLOWED for in-model brand presence; data brokers
#                and resale scrapers (CCBot, Bytespider, Omgilibot,
#                Webzio-Extended, Diffbot,
#                cohere-training-data-crawler) remain BLOCKED.
#
# I did no harm and this file is not truncated
# ─────────────────────────────────────────────────────────────

# ═══════════════════════════════════════════════════════════════
# SEARCH ENGINE CRAWLERS — Full access
# ═══════════════════════════════════════════════════════════════

User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: Slurp
Allow: /

User-agent: DuckDuckBot
Allow: /

User-agent: Baiduspider
Allow: /

User-agent: YandexBot
Allow: /

User-agent: Applebot
Allow: /

# ═══════════════════════════════════════════════════════════════
# AI RETRIEVAL & AI SEARCH BOTS — Allowed
# These bots index or fetch content to answer user questions in
# AI products, with citations. This is where prospects
# increasingly find consultants.
# ═══════════════════════════════════════════════════════════════

# OpenAI — ChatGPT search index
User-agent: OAI-SearchBot
Allow: /

# OpenAI — user-initiated page fetches inside ChatGPT
User-agent: ChatGPT-User
Allow: /

# Anthropic — Claude search index
User-agent: Claude-SearchBot
Allow: /

# Anthropic — user-initiated page fetches inside Claude
User-agent: Claude-User
Allow: /

# Perplexity — search index
User-agent: PerplexityBot
Allow: /

# Perplexity — user-initiated page fetches
User-agent: Perplexity-User
Allow: /

# DuckDuckGo — DuckAssist AI answers
User-agent: DuckAssistBot
Allow: /

# Mistral — user-initiated fetches with citations
User-agent: MistralAI-User
Allow: /

# Meta — user-initiated link fetches in Meta AI
User-agent: meta-externalfetcher
Allow: /

# Amazon — Alexa / Rufus answers
User-agent: Amazonbot
Allow: /

# Google — Vertex AI site grounding (customer-directed retrieval)
User-agent: Google-CloudVertexBot
Allow: /

# Cohere — retrieval
User-agent: cohere-ai
Allow: /

# ═══════════════════════════════════════════════════════════════
# MAJOR-LAB AI TRAINING CRAWLERS — Allowed (policy: 2026-06-11)
# Training inclusion builds brand presence inside the models that
# prospects use. Site content is marketing material designed for
# distribution; the firm's real IP is not published here.
# ═══════════════════════════════════════════════════════════════

# OpenAI — training
User-agent: GPTBot
Allow: /

# Anthropic — training
User-agent: ClaudeBot
Allow: /

# Anthropic — legacy training agent string
User-agent: anthropic-ai
Allow: /

# Google — Gemini training (does not affect Search or AI Overviews)
User-agent: Google-Extended
Allow: /

# Apple — Apple Intelligence training
User-agent: Applebot-Extended
Allow: /

# Meta — training
User-agent: meta-externalagent
Allow: /

# Meta — legacy crawler
User-agent: FacebookBot
Allow: /

# ═══════════════════════════════════════════════════════════════
# DATA BROKERS & RESALE SCRAPERS — Blocked
# These harvest content for third-party datasets and resale.
# No path to a prospect; no brand benefit.
# ═══════════════════════════════════════════════════════════════

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Omgilibot
Disallow: /

User-agent: Webzio-Extended
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: cohere-training-data-crawler
Disallow: /

# ═══════════════════════════════════════════════════════════════
# DEFAULT — Allow everything else
# ═══════════════════════════════════════════════════════════════

User-agent: *
Allow: /

# ═══════════════════════════════════════════════════════════════
# SITEMAP
# ═══════════════════════════════════════════════════════════════

Sitemap: https://shift-work.com/sitemap.xml