Last updated Apr 2, 2026

RAG Sources

RAG sources are the pieces of content your bot is allowed to learn from. This page covers what counts as a source, which input types are supported, and how to add content that improves retrieval instead of creating noise.

Quick Summary

Sources are the knowledge inputs for a bot. Better sources produce better retrieval, better citations, and better answers.

What A RAG Source Is

A RAG source is a piece of content the bot is allowed to learn from.

Examples:

  • a product FAQ
  • a markdown guide
  • an uploaded PDF
  • a public documentation page
  • a policy or onboarding article

The bot does not answer from "the internet in general". It answers from the sources you attach and ingest.

Which Source Types You Can Ingest

Filament RAG supports three source types:

  • Text: paste content directly into the panel
  • File: upload a supported document such as markdown, text, HTML, JSON, CSV, or a text-based PDF
  • URL: fetch a public web page and extract readable content

Recent versions also improved HTML and PDF ingestion, so richer documentation sources now perform better than before.

When To Use Each Source Type

Text

Best for:

  • FAQs
  • policy snippets
  • support instructions
  • short product explanations

Use text sources when the content is short, curated, and easy to maintain directly in Filament.

File

Best for:

  • markdown docs
  • uploaded runbooks
  • exported guides
  • static documentation files

Use file sources when you already have authoritative documents and want to keep them intact.

URL

Best for:

  • public docs pages
  • help-center articles
  • published landing pages
  • public changelog or release pages

Use URL sources when the canonical source of truth is already published on the web.

How To Create A Source

Create A Text Source

  1. Open RAG Sources
  2. Click Create
  3. Select the target bot
  4. Choose Manual Text
  5. Paste the content
  6. Give the source a descriptive name
  7. Save and wait for completed

Create A File Source

  1. Open RAG Sources
  2. Click Create
  3. Select the target bot
  4. Choose File Upload
  5. Upload the file
  6. Give the source a descriptive name
  7. Save and wait for completed

Create A URL Source

  1. Open RAG Sources
  2. Click Create
  3. Select the target bot
  4. Choose URL
  5. Paste the public page URL
  6. Give the source a descriptive name
  7. Save and wait for completed

Private and local network URLs are blocked by default for SSRF safety.

What Happens After You Save A Source

The source record itself is only the input.

During ingestion it becomes:

  1. extracted content
  2. a normalized document
  3. multiple searchable chunks
  4. embeddings stored in the configured vector backend

The bot answers from the ingested chunks, not directly from the raw source record.

What You Should Ingest

Good sources are:

  • specific
  • well-structured
  • current
  • written for the audience the bot serves
  • rich in concrete product or support information

Strong examples:

  • feature documentation
  • setup guides
  • troubleshooting articles
  • support policies
  • onboarding instructions

Source Quality Checklist

Use this quick filter before ingesting something:

  • Is it current?
  • Is it written for the audience this bot serves?
  • Does it contain actionable facts instead of vague marketing language?
  • Is it structured with headings or clear sections?
  • Would a human support agent trust it as a source of truth?

What You Should Avoid Ingesting

Weak sources are:

  • vague marketing fragments with no product detail
  • duplicated versions of the same content
  • very noisy pages with little readable text
  • outdated internal notes mixed with current guidance
  • content written for the wrong audience

If a page is mostly decorative or repetitive, it usually adds noise to retrieval.

Common low-value sources:

  • short landing-page blurbs with little product detail
  • heavily duplicated copies of the same docs
  • giant mixed-content exports with weak structure
  • incomplete internal notes that were never meant to be user-facing

Source Naming And Organization

Use descriptive names so citations are understandable.

Good examples:

  • RAG Sources
  • Quickstart
  • Security and Privacy
  • Public Pricing FAQ

Avoid generic names like:

  • Doc 1
  • Homepage
  • Notes

Source Statuses

Pending

The source is queued or waiting for ingestion or retry.

Processing

The ingestion job is actively extracting, chunking, embedding, or persisting the content.

Completed

The latest ingest finished successfully and the source can contribute chunks to retrieval.

Failed

The ingest did not finish. Inspect meta.error in the source details and retry after fixing the cause.

Re-Ingesting Sources

Re-ingest when:

  • the content changed
  • the file or URL changed
  • retrieval quality is weak
  • embedding or chunking settings changed
  • you want new citations or updated canonical links

Good First Source Set

For a first production bot, a strong starter set usually includes:

  • product overview
  • quickstart or setup guide
  • troubleshooting page
  • FAQ or support policy
  • one or two highly specific feature pages

Best Practices

  • Group sources by bot and audience.
  • Prefer clean docs pages over noisy landing pages when possible.
  • Re-ingest after editing or replacing important content.
  • Use descriptive source names so citations are understandable.
  • Keep public bots on public docs and internal bots on internal runbooks.

Related Docs