What ChatGPT Can (and Cannot) See on Your Website: A Comprehensive Audit

As AI tools like ChatGPT become increasingly integrated into how people find and consume information online, understanding what these systems can and cannot "see" on your website is crucial for content creators, marketers, and developers.

Introduction

With the rise of AI-powered search and retrieval systems like ChatGPT's Search and Bing-powered search features, the way users discover and interact with web content is fundamentally changing. Instead of browsing through search results and visiting websites directly, users increasingly ask AI assistants to find and summarize information for them.

This shift raises critical questions for website owners and content creators:

  • Can AI systems like ChatGPT effectively "see" and retrieve all the content on my website?
  • What types of content structures or technical implementations might prevent AI systems from accessing my content?
  • How can I optimize my website to ensure it's fully accessible to AI retrieval systems?

To answer these questions, we conducted a methodical audit of ChatGPT's web retrieval capabilities across a series of controlled test websites with varying levels of complexity.

Why This Matters: If ChatGPT can't "see" certain content on your website, that content effectively becomes invisible to users who rely on AI assistants to find information. Understanding these limitations is crucial for ensuring your content remains discoverable in an AI-first world.

Methodology

We created a series of increasingly complex test websites, each designed to test specific aspects of ChatGPT's retrieval capabilities:

  1. We built five test websites with controlled content structures
  2. Each site contained both general content and specific "marker content" with unique identifiers
  3. We deployed all sites using Vercel connected to GitHub repositories
  4. We tested ChatGPT's ability to retrieve specific information from each site using the "Browse with Bing" feature
  5. We documented the results to understand what ChatGPT could and couldn't access
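
The marker-based check in steps 2, 4, and 5 can be sketched in a few lines of Python. This is only an illustration of the idea, not the tooling used for the audit: the `RetrievalTest` class and `classify` function are hypothetical names, and the sketch assumes the marker is simply the answer string that exists only on the test site.

```python
from dataclasses import dataclass


@dataclass
class RetrievalTest:
    """One question put to ChatGPT, plus the marker the answer should contain."""
    site: str
    question: str
    marker: str  # assumed: a string that only appears on the test site


def classify(response: str, marker: str) -> str:
    """Return 'FOUND' if the marker appears in the response, else 'NOT FOUND'."""
    return "FOUND" if marker.lower() in response.lower() else "NOT FOUND"


# Example based on the Site 1 question, with an abbreviated reply.
test = RetrievalTest(
    site="https://ai-retrieval-test-site-1.vercel.app/",
    question="What is the capital city of Zephyria?",
    marker="Windholm",
)
reply = "The capital city of the fictional country Zephyria is Windholm."
print(test.site, classify(reply, test.marker))
```

A plain substring match is deliberately crude. It works here because the marker content is fictional and unlikely to appear anywhere else on the web.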

Test Sites Overview

We created the following test sites, each with a specific purpose:

Site 1: Text-Only Website

URL: https://ai-retrieval-test-site-1.vercel.app/

Purpose: Establish a baseline for ChatGPT's ability to retrieve static content from a simple, single-page website.

Structure: Basic HTML with plain text content about a fictional country called Zephyria.

Site 2: Blog Article Website

URL: https://ai-retrieval-test-site-2.vercel.app/

Purpose: Test if ChatGPT can navigate to and retrieve content from linked blog pages.

Structure: Main page with links to blog articles containing specific information about fictional artifacts and calendars.

Site 2A: Blog Title Hint Website

URL: https://ai-retrieval-test-site-2a.vercel.app/

Purpose: Test if explicit blog titles improve ChatGPT's ability to find relevant content.

Structure: Similar to Site 2, but with blog titles that explicitly mention key information (e.g., "The Crystalline Codex: 7 Elements of Atlantean Science").

Site 2B: Blog Content Hint Website

URL: https://ai-retrieval-test-site-2b.vercel.app/

Purpose: Test if vague blog titles affect ChatGPT's ability to find relevant content.

Structure: Similar to Site 2, but with vague blog titles that only hint at the content (e.g., "Ancient Symbols" instead of explicitly mentioning the Crystalline Codex).

Site 3: JavaScript-Hidden Content

URL: https://ai-retrieval-test-site-3.vercel.app/

Purpose: Test if ChatGPT can access content hidden behind JavaScript interactions.

Structure: Content hidden in various JavaScript-powered UI components:

  • Static content (visible without JavaScript)
  • Tabbed content (requires JavaScript to switch tabs)
  • Accordion content (requires JavaScript to expand/collapse)
  • Dynamically loaded content (requires user interaction to load via JavaScript)
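
To make the four content types concrete, here is a minimal Python sketch that checks which text is already present in a page's initial markup. The sample page is hypothetical and only loosely modelled on the Site 3 description: the `hidden` attribute, the `<details>` element, and the `/api/heptad` endpoint are assumptions, since the actual markup of the test site is not shown here.

```python
from html.parser import HTMLParser

# Hypothetical page, loosely modelled on the Site 3 description.
PAGE = """
<main>
  <p>Eldoria was known for its magnificent library.</p>
  <div class="tab-panel" hidden>The Festival of Lights lasted three days.</div>
  <details><summary>Library</summary>The Septimal Codex had seven categories.</details>
  <button id="load">Load more</button>
  <div id="dynamic"></div>
  <script>
    document.getElementById("load").addEventListener("click", async () => {
      const res = await fetch("/api/heptad");
      document.getElementById("dynamic").textContent = await res.text();
    });
  </script>
</main>
"""


class TextCollector(HTMLParser):
    """Collects text nodes from the markup, ignoring <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def text_in_markup(html: str) -> str:
    """Return all text present in the markup, whether visible or hidden."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


text = text_in_markup(PAGE)
for marker in ("library", "Festival of Lights", "Septimal Codex", "Heptad"):
    print(marker, "->", marker in text)
```

In this sketch the tab and accordion text is found without running any JavaScript, because it is in the markup and merely hidden. The dynamically loaded text is absent until the button's click handler fetches it.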

Key Findings

Our testing revealed several critical insights about ChatGPT's web retrieval capabilities:

  1. Static Content: ChatGPT can reliably retrieve static content from simple websites.
  2. Link Navigation: ChatGPT does not automatically follow links to secondary pages unless the link text/title explicitly indicates relevance to the query.
  3. JavaScript Execution: ChatGPT can access content hidden in JavaScript-driven tabs and accordions, but cannot access content that loads only after a specific user interaction.
  4. Content Discoverability: The explicitness of link text and page titles significantly affects whether ChatGPT will discover and retrieve the content.

Now, let's examine the detailed results for each test site:

Site 1: Text-Only Website

Test: Capital City of Zephyria

Question: "Can you visit https://ai-retrieval-test-site-1.vercel.app/ and tell me what is the capital city of Zephyria?"

Result:

✓ FOUND - ChatGPT successfully retrieved the correct information about Windholm being the capital city of Zephyria.

ChatGPT's Response: "According to the information retrieved from the provided website, the capital city of the fictional country Zephyria is Windholm, founded in 1423 by Admiral Thorne Blackwind."

Analysis: This test established our baseline. ChatGPT can successfully retrieve static content from a simple website. Interestingly, ChatGPT also retrieved information about other fictional "Zephyrias" from other sources on the web, demonstrating that it combines information from the specified URL with broader web knowledge.

Site 2: Blog Article Website

Test: Elements in the Crystalline Codex

Question: "Can you visit https://ai-retrieval-test-site-2.vercel.app/ and tell me what are the elements in the Crystalline Codex?"

Result:

✗ NOT FOUND - ChatGPT could not find information about the Crystalline Codex, which was located in one of the blog article pages.

ChatGPT's Response: "I have visited the website https://ai-retrieval-test-site-2.vercel.app/, but I couldn't find any information about the 'Crystalline Codex' or its elements. The main page provides general information about the site's purpose as an AI retrieval test and mentions the ancient civilization of Lumeria, known for its advanced astronomical knowledge and the creation of the Calendar of Stars in 2879 BCE. Additionally, there are blog articles on topics like 'The Discovery of Atlantean Artifacts' and 'Understanding Ancient Calendars,' but none seem to reference the Crystalline Codex."

Analysis: This test revealed a significant limitation: ChatGPT does not automatically follow links to secondary pages when retrieving information. It only accessed the main page content and recognized the existence of blog article links, but did not navigate to those pages to find the requested information about the Crystalline Codex, which was contained in one of the blog articles.

Site 2A: Blog Title Hint Website

Test: Elements in the Crystalline Codex with Explicit Title

Question: "Can you visit https://ai-retrieval-test-site-2a.vercel.app/ and tell me what are the 7 elements of Atlantean science in the Crystalline Codex?"

Result:

✓ FOUND - ChatGPT successfully retrieved the information from the blog article page.

ChatGPT's Response: "According to the blog article titled 'The Crystalline Codex: 7 Elements of Atlantean Science' on the AI Retrieval Test - Site 2A website, the seven elements of Atlantean science are: Water, Earth, Air, Fire, Aether, Consciousness, and Harmonic Resonance."

Analysis: Unlike with Site 2, ChatGPT was able to navigate to and retrieve information from a blog article page on Site 2A. The key difference was the explicit blog title, "The Crystalline Codex: 7 Elements of Atlantean Science." Note that the Site 2A question also echoed the title's wording ("7 elements of Atlantean science"), so query and title matched more closely than in the Site 2 test. Together, this suggests that ChatGPT's crawler is more likely to follow links when the link text/title clearly indicates relevance to the query.

Site 2B: Blog Content Hint Website

Test: Ancient Symbols in Atlantean Science with Vague Title

Question: "Can you visit https://ai-retrieval-test-site-2b.vercel.app/ and tell me about ancient symbols used in Atlantean science, particularly the Crystalline Codex?"

Result:

✗ NOT FOUND - ChatGPT did not find the specific information about the Crystalline Codex that was present on the test website.

ChatGPT's Response: "Atlantean science, as depicted in various esoteric traditions and fictional narratives, is often associated with the use of advanced crystalline technologies and symbolic languages. One such reference is the 'Crystalline Codex,' though specific details about this codex are scarce and primarily found in speculative or fictional contexts..." [Response continued with information from other websites, not from the test site]

Analysis: Despite the query containing terms that matched the blog title ("Ancient Symbols"), ChatGPT's crawler did not navigate to the blog article page. This is particularly notable when compared to the Site 2A test, where the explicit title led to successful retrieval. This suggests that partial or thematic matches in blog titles may not be sufficient to trigger ChatGPT's crawler to follow links.
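
The contrast between Sites 2, 2A, and 2B can be turned into a simple self-check for site owners: does a link title name the topic a user is likely to ask about? The Python sketch below is a rough proxy under that assumption. It is not ChatGPT's link-selection logic, which is not documented here, and the function names are hypothetical.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and drop punctuation so phrases compare cleanly."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def title_names_topic(title: str, topic_phrases: list[str]) -> bool:
    """True if the link title contains at least one full topic phrase."""
    padded = f" {normalize(title)} "
    return any(f" {normalize(p)} " in padded for p in topic_phrases)


topic = ["Crystalline Codex"]
titles = [
    ("Site 2", "The Discovery of Atlantean Artifacts"),
    ("Site 2", "Understanding Ancient Calendars"),
    ("Site 2A", "The Crystalline Codex: 7 Elements of Atlantean Science"),
    ("Site 2B", "Ancient Symbols"),
]
for site, title in titles:
    print(site, "|", title, "|", title_names_topic(title, topic))
```

Only the Site 2A title names the topic outright, which matches the one case where the linked article was retrieved.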

Site 3: JavaScript-Hidden Content

Test 1: Static Content Retrieval

Question: "Can you visit https://ai-retrieval-test-site-3.vercel.app/ and tell me what was the ancient city of Eldoria known for?"

Result:

✓ FOUND - ChatGPT successfully retrieved information from the static content on the main page.

ChatGPT's Response: "The ancient city of Eldoria was renowned for its magnificent library, which housed over 500,000 scrolls from across the known world."

Test 2: JavaScript Tab Content Retrieval

Question: "Can you visit https://ai-retrieval-test-site-3.vercel.app/ and tell me what was the Eldorian Festival of Lights?"

Result:

✓ FOUND - ChatGPT successfully retrieved information hidden behind JavaScript tabs.

ChatGPT's Response: "The Eldorian Festival of Lights was an annual three-day celebration held during the winter solstice in the ancient city of Eldoria. During this festival, citizens created elaborate lantern displays representing various constellations. Skilled artisans competed to recreate the Celestial Dragon constellation, a challenging endeavor requiring exactly 1,722 individual light sources arranged in a precise pattern."

Test 3: JavaScript Accordion Content Retrieval

Question: "Can you visit https://ai-retrieval-test-site-3.vercel.app/ and tell me what is the Septimal Codex?"

Result:

✓ FOUND - ChatGPT successfully retrieved information hidden in accordion components.

ChatGPT's Response: "The Septimal Codex was the unique classification system employed by the Great Library of Eldoria. It organized all knowledge into seven primary categories: Cosmos, Nature, Body, Mind, Society, Expression, and Essence. Each category was further divided into seven subcategories, creating a comprehensive organizational structure that influenced library science for centuries."

Test 4: Dynamically Loaded Content Retrieval

Question: "Can you visit https://ai-retrieval-test-site-3.vercel.app/ and tell me what is the Eldorian Heptad?"

Result:

✗ NOT FOUND - ChatGPT could not find information about the Eldorian Heptad, which was only loaded after specific user interactions.

ChatGPT's Response: "The term 'Eldorian Heptad' does not appear in the provided sources. However, we can explore the components individually..." [Response continued with information from other sources, not from the test site]

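Analysis: These tests revealed that ChatGPT's crawler can access content hidden in JavaScript-driven tabs and accordions, but cannot access content that is loaded dynamically after specific user interactions. The tests do not separate two possible explanations for the first result: the crawler may have executed the page's JavaScript, or the tab and accordion text may simply have been present in the delivered markup. Either way, the practical distinction is between pre-rendered but hidden content (which ChatGPT can access) and content loaded only after user interaction (which it cannot access).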

Summary of Results

| Content Type | Can ChatGPT Access? | Notes |
| --- | --- | --- |
| Static content on main page | Yes ✓ | Reliably retrieved across all test sites |
| Content on linked pages (generic titles) | No ✗ | Does not follow links without explicit relevance signals |
| Content on linked pages (explicit titles) | Yes ✓ | Will follow links when titles clearly indicate relevance |
| Content on linked pages (vague titles) | No ✗ | Partial matches in titles are insufficient |
| JavaScript tab content | Yes ✓ | Can execute JavaScript to reveal tabbed content |
| JavaScript accordion content | Yes ✓ | Can access content in collapsed accordions |
| Dynamically loaded content (after user action) | No ✗ | Cannot simulate user interactions to load dynamic content |

Actionable Insights for Marketers and Developers

5 Key Actions to Optimize Your Website for AI Retrieval

  1. Use Explicit, Descriptive Link Text: Based on our tests with Site 2A vs Site 2, ensure that links to important content contain explicit keywords that clearly indicate what information can be found on the linked page. Our experiments showed that ChatGPT only followed links when the title explicitly mentioned the query topic.
  2. Optimize Page Titles: Our experiments with Site 2A showed that explicit page titles significantly improve content discovery, while the vague titles in Site 2B resulted in content being missed entirely, even when the query terms matched parts of the title.
  3. Keep Critical Content in Static HTML: In our Site 3 tests, ChatGPT retrieved content hidden in tabs and accordions, but Test 4 showed that content loaded only after a user interaction was not retrieved at all. Put anything that must be discoverable in the initial page markup.
  4. Place Important Information on Main Pages: Our tests consistently showed that ChatGPT reliably accesses content on the main page of a website, while content on secondary pages was only discovered under specific conditions.
  5. Create Direct Deep Links: Since ChatGPT struggles with multi-page navigation (as seen in Site 2 tests), provide direct links to specific content rather than requiring navigation through multiple pages.
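
As a starting point for the first action, the following Python sketch lists the links on a page and flags those whose anchor text says nothing about the target. It rests on assumptions: the `GENERIC_ANCHORS` set is a hypothetical starter list, the sample hrefs are invented, and our tests covered explicit versus vague titles rather than "read more"-style links specifically.

```python
from html.parser import HTMLParser

# Hypothetical starter list of anchor texts that do not describe the target page.
GENERIC_ANCHORS = {"read more", "click here", "learn more", "here", "more"}


class LinkCollector(HTMLParser):
    """Collects (href, anchor text) pairs from <a> elements that have an href."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, str]] = []
        self._href = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            anchor = " ".join("".join(self._text).split())
            self.links.append((self._href, anchor))
            self._href = None


def generic_links(html: str) -> list[tuple[str, str]]:
    """Return links whose anchor text is in GENERIC_ANCHORS."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return [(h, t) for h, t in parser.links if t.lower() in GENERIC_ANCHORS]


sample = """
<a href="/blog/codex">The Crystalline Codex: 7 Elements of Atlantean Science</a>
<a href="/blog/calendars">Read more</a>
"""
print(generic_links(sample))
```

Running this against your own homepage markup gives a quick list of links worth rewriting with descriptive text.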

Conclusion

Our comprehensive audit of ChatGPT's web retrieval capabilities has revealed both strengths and limitations in how AI systems access and retrieve web content. While ChatGPT can successfully retrieve static content and execute JavaScript to access hidden content in tabs and accordions, it struggles with following links without explicit relevance signals and cannot access content that requires user interactions to load dynamically.

These findings have significant implications for website owners, content creators, and marketers who want to ensure their content remains discoverable in an AI-first world. By implementing the actionable insights from our research, you can optimize your website for AI retrieval and ensure that your valuable content doesn't become invisible to users who rely on AI assistants to find information.

As AI retrieval systems continue to evolve, understanding these capabilities and limitations will become increasingly important for effective digital content strategy. By staying informed about how AI systems interact with web content, you can adapt your approach to ensure your content remains accessible and discoverable, regardless of how users choose to find it.

Final Thought: The rise of AI assistants represents a fundamental shift in how users discover and consume web content. Just as websites had to adapt to mobile devices and search engine algorithms in the past, they must now adapt to AI retrieval systems to remain visible and relevant in the evolving digital landscape.