
Page Size and Truncation Risk

Whether agents can process your pages without losing content. Agent platforms have diverse truncation limits, from 5K characters on some platforms to over 100K on others. Pages that exceed these limits are silently truncated: the agent sees the beginning of the page and loses the rest.

This category also covers the related problem of pages that technically fit within limits but waste most of that budget on boilerplate (navigation chrome, breadcrumbs, sidebars) instead of documentation content.

rendering-strategy

Whether pages contain server-rendered content or are empty client-side application shells.

Weight: Critical (10)
Spec: rendering-strategy

Why it matters

Many agents fetch pages using HTTP libraries that don't execute JavaScript. When a site relies on client-side rendering, agents receive an empty shell with framework boilerplate but none of the documentation content. This isn't a truncation problem; it's a zero-content problem.

The rendering strategy is a property of the framework and its configuration, not the content. Sites using Next.js, for example, can be fully agent-accessible (like react.dev) or deliver empty shells, depending on whether server-side rendering is enabled.

Results

Result | Condition
Pass | Pages contain substantive server-rendered content (headings, prose, code blocks)
Warn | Pages render server-side but have unusually short body content (legitimately short pages, or partial hydration / lazy loading)
Fail | SPA shell detected (framework markers like id="__next", minimal visible text, no page-specific content)

When the check warns, the sparse-content-html diagnostic fires if more than 25% of sampled pages are sparse. When the check fails, the spa-shell-html-invalid diagnostic fires if more than 25% of sampled pages are actual shells.
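
The 25% rule can be sketched as a strict threshold on the sampled pages. This is a minimal illustration, not AFDocs code: the function name and the handling of an empty sample are assumptions.

```python
def diagnostic_fires(flagged_pages: int, sampled_pages: int, threshold: float = 0.25) -> bool:
    """Return True when strictly more than `threshold` of sampled pages are flagged.

    `flagged_pages` is the number of sparse pages (for sparse-content-html)
    or shell pages (for spa-shell-html-invalid). Treating an empty sample as
    "does not fire" is an assumption made for this sketch.
    """
    if sampled_pages <= 0:
        return False
    return flagged_pages / sampled_pages > threshold
```

With 10 sampled pages, 3 flagged pages would fire the diagnostic, while exactly 25 flagged pages out of 100 would not, because the text says "more than 25%".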

How to fix

If this check warns, spot-check the affected pages by fetching them with curl or another HTTP client that doesn't run JavaScript. If the pages contain their full intended content, no action is needed; some pages are legitimately brief. If content is missing from the server response, the page may use component-level client rendering or lazy loading for specific sections.
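
As an alternative to curl, the spot-check can be sketched with Python's standard library, which also does not run JavaScript. The helper names, the User-Agent string, and the 200-character cutoff are illustrative assumptions; they are not the thresholds AFDocs uses.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class VisibleTextCollector(HTMLParser):
    """Collects text that sits outside <script>, <style>, and <noscript>."""

    SKIPPED = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    """Return the text a non-JavaScript client would find in the response."""
    parser = VisibleTextCollector()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def looks_like_shell(html: str, min_chars: int = 200) -> bool:
    """Flag responses with very little visible text (min_chars is a hypothetical cutoff)."""
    return len(visible_text(html)) < min_chars


def fetch_without_js(url: str) -> str:
    """Plain HTTP GET, comparable to curl: the server response is returned as-is."""
    request = Request(url, headers={"User-Agent": "render-spot-check/0.1"})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `looks_like_shell(fetch_without_js(url))` on an affected page gives a rough signal. If the function returns False but a specific section is still missing from `visible_text`, that section is probably rendered client-side.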

If this check fails, enable server-side rendering or static site generation in your docs platform. This is typically a configuration change, not a code rewrite.

Score impact

This is a Critical check with two score caps based on a weighted proportion: (serverRendered + sparseContent × 0.5) / total. Empty SPA shells count fully against the proportion; sparse pages count at half weight.

The same proportion drives the HTML path coefficient. If 90% of pages render correctly with no sparse pages, HTML quality checks (page-size-html, content-start-position, tabbed-content-serialization, section-header-quality) count for 90% of their weight.
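
The proportion and its effect on HTML check weights can be worked through in a few lines. The formula comes from the text above; the function names are hypothetical, and multiplying a check's weight by the proportion is one reading of "count for 90% of their weight".

```python
def rendering_proportion(server_rendered: int, sparse_content: int, total: int) -> float:
    """(serverRendered + sparseContent * 0.5) / total.

    SPA shells are part of `total` but add nothing to the numerator,
    so they count fully against the proportion.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return (server_rendered + sparse_content * 0.5) / total


def effective_weight(base_weight: float, proportion: float) -> float:
    """Scale an HTML-path check's weight by the rendering proportion (assumed reading)."""
    return base_weight * proportion


# 90 rendered pages, 0 sparse, 10 shells out of 100 sampled pages.
proportion = rendering_proportion(server_rendered=90, sparse_content=0, total=100)
# page-size-html has weight 7, so it would contribute about 6.3 under this reading.
scaled = effective_weight(7, proportion)
```

For a mixed sample of 80 rendered, 10 sparse, and 10 shell pages, the same function gives (80 + 5) / 100 = 0.85.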


page-size-markdown

Character count when documentation is served as markdown.

Weight: High (7)
Depends on: markdown-url-support or content-negotiation
Spec: page-size-markdown

Why it matters

This is the best-case scenario for agent consumption. Markdown size directly corresponds to what the model sees, with no conversion overhead. If the markdown version fits within limits, agents that can request it get full, untruncated content.

Results

Result | Condition
Pass | Under 50,000 characters
Warn | 50,000-100,000 characters (fits within some platforms but may exceed others)
Fail | Over 100,000 characters (truncated by all major agent platforms)
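
The thresholds in this table map onto a small classifier. This is a sketch rather than the AFDocs implementation: the function name is hypothetical, and counting characters with Python's `len()` on the decoded text is an assumption about how size is measured.

```python
PASS_LIMIT = 50_000   # "Under 50,000 characters" passes
FAIL_LIMIT = 100_000  # "Over 100,000 characters" fails


def classify_page_size(char_count: int) -> str:
    """Map a character count to pass / warn / fail using the table's thresholds."""
    if char_count < PASS_LIMIT:
        return "pass"
    if char_count <= FAIL_LIMIT:
        return "warn"
    return "fail"


def classify_markdown(markdown_text: str) -> str:
    """Classify a markdown document by its length in characters."""
    return classify_page_size(len(markdown_text))
```

A page of exactly 50,000 or exactly 100,000 characters falls in the warn band here, since the table defines pass as "under" and fail as "over".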

How to fix

If pages are too large, break them into smaller pages or restructure serialized tabbed content. See tabbed-content-serialization for guidance on the most common source of oversized pages.


page-size-html

Character count of the raw HTML response and of the same page after conversion to markdown.

Weight: High (7)
Spec: page-size-html

Why it matters

Many agents receive HTML, either because they don't request markdown or because the server doesn't support delivering markdown when requested. When agents receive HTML, the page size that matters isn't the raw HTML; it's how large the page is after the agent's platform converts it to text. Navigation boilerplate, serialized tabbed content, and deeply nested page structure can all inflate the converted output well beyond the documentation content itself. This can push the actual documentation content past agent truncation limits.

AFDocs measures both the raw HTML size and the post-conversion size, and scores based on the conversion result. See content-start-position below for more on how boilerplate affects what agents see.

Results

Based on post-conversion character count:

Result | Condition
Pass | Under 50,000 characters
Warn | 50,000-100,000 characters
Fail | Over 100,000 characters

The output also reports the conversion ratio. A page that converts from 505 KB of HTML to 12 KB of markdown is about 98% boilerplate, meaning only about 2% of the HTML was documentation content.
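
The arithmetic behind that example can be checked directly. The function name is hypothetical, and the exact way AFDocs computes or rounds the ratio is not specified here, so treat this as a sketch.

```python
def boilerplate_fraction(raw_html_size: float, converted_size: float) -> float:
    """Share of the raw HTML that did not survive as converted content.

    Both sizes must use the same unit (bytes, KB, or characters).
    """
    if raw_html_size <= 0:
        raise ValueError("raw_html_size must be positive")
    return 1 - converted_size / raw_html_size


# 505 KB of HTML converting to 12 KB of markdown.
fraction = boilerplate_fraction(505, 12)
percent = round(fraction * 100)  # 98
```

The unrounded value is roughly 0.976, which is why the figure reads as 98% boilerplate and about 2% content.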

How to fix

If pages convert to too many characters, review pages for reducible boilerplate (navigation, serialized tabbed content) and consider these fixes:

  • Large pages: Break long reference pages into smaller sections.
  • Navigation boilerplate: Reduce navigation, sidebar, and breadcrumb markup that inflates the converted output.
  • Tabbed content: See tabbed-content-serialization.
  • Markdown alternative: Provide markdown versions as a smaller alternative path for agents that bypass HTML conversion overhead.

Markdown availability helps agents that request it, but most agents still fetch HTML, so fixing the HTML path remains important.


content-start-position

How far into the response actual documentation content begins.

Weight: Medium (4)
Spec: content-start-position

Why it matters

After HTML-to-markdown conversion, boilerplate often survives. Navigation menus, breadcrumbs, sidebars, and footer content all convert to text that precedes or surrounds the actual documentation. Depending on the agent's conversion pipeline, inline CSS and JavaScript may also survive as raw text. If enough of this boilerplate appears before your actual content, the agent may never see your documentation at all because it hits truncation limits first.

In observed cases, actual content didn't start until 87% through the converted page: 441,000 characters of styling code before the first paragraph of actual documentation. The agent reported seeing a documentation page about CSS instead of the actual documentation content.

Results

Based on where content begins in the converted output:

Result | Condition
Pass | Content starts within the first 10%
Warn | Content starts between 10% and 50%
Fail | Content starts after 50%
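
To estimate this position for one of your own pages, you can locate a known piece of content in the converted output and divide its offset by the total length. This sketch assumes you supply the snippet yourself (for example the page's first heading); how AFDocs detects where content starts is not described here, and the inclusive boundaries at 10% and 50% are also assumptions.

```python
def content_start_fraction(converted_text: str, first_content_snippet: str) -> float:
    """Return how far into the converted text the snippet first appears (0.0 to 1.0)."""
    if not first_content_snippet:
        raise ValueError("first_content_snippet must not be empty")
    index = converted_text.find(first_content_snippet)
    if index < 0:
        raise ValueError("snippet not found in converted text")
    return index / len(converted_text)


def classify_content_start(fraction: float) -> str:
    """Map the start position to pass / warn / fail using the table's bands."""
    if fraction <= 0.10:
        return "pass"
    if fraction <= 0.50:
        return "warn"
    return "fail"


# Synthetic page: 870 characters of boilerplate before the first heading, 1,000 in total.
page = "x" * 870 + "# Install" + "y" * 121
position = content_start_fraction(page, "# Install")  # 0.87
result = classify_content_start(position)  # "fail"
```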

How to fix

If this check warns or fails, reduce navigation, breadcrumb, and sidebar markup that precedes the content area. These are the most common sources of boilerplate that pushes content past truncation limits.

If your platform inlines CSS or JavaScript, check whether you can reduce the amount or move it to external files. Navigation chrome, theme variables, and third-party widget styles all contribute to the boilerplate before content.
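
To see how much inlined CSS and JavaScript a page carries, you can total the characters inside its style and script elements. This is a rough sketch using Python's standard HTML parser; the class and function names are hypothetical, and external files referenced with src or href contribute nothing because they have no inline body.

```python
from html.parser import HTMLParser


class InlineAssetCounter(HTMLParser):
    """Totals the characters found inside <style> and <script> elements."""

    def __init__(self):
        super().__init__()
        self._current = None
        self.totals = {"style": 0, "script": 0}

    def handle_starttag(self, tag, attrs):
        if tag in self.totals:
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.totals[self._current] += len(data)


def inline_asset_chars(html: str) -> dict:
    """Return inline character counts, e.g. {"style": 14, "script": 8}."""
    parser = InlineAssetCounter()
    parser.feed(html)
    parser.close()
    return dict(parser.totals)
```

Comparing these totals with the overall response size shows whether moving styles and scripts to external files would meaningfully shrink what precedes your content.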