RAG Sources

RAG sources are the pieces of content your bot is allowed to learn from. This page covers what counts as a source, which input types are supported, and how to add content that improves retrieval instead of creating noise.

Quick Summary

Sources are the knowledge inputs for a bot. Better sources produce better retrieval, better citations, and better answers.

What A RAG Source Is

A RAG source is a single piece of content that you attach to a bot and ingest so the bot can retrieve it when answering.

Examples:

a product FAQ
a markdown guide
an uploaded PDF
a public documentation page
a policy or onboarding article

The bot does not answer from "the internet in general". It answers from the sources you attach and ingest.

Which Source Types You Can Ingest

Filament RAG supports three source types:

Text: paste content directly into the panel
File: upload a supported document such as markdown, text, HTML, JSON, CSV, or a text-based PDF
URL: fetch a public web page and extract readable content

Recent versions also improved HTML and PDF ingestion, so richer documentation sources now perform better than before.

When To Use Each Source Type

Text

Best for:

FAQs
policy snippets
support instructions
short product explanations

Use text sources when the content is short, curated, and easy to maintain directly in Filament.

File

Best for:

markdown docs
uploaded runbooks
exported guides
static documentation files

Use file sources when you already have authoritative documents and want to keep them intact.

URL

Best for:

public docs pages
help-center articles
published landing pages
public changelog or release pages

Use URL sources when the canonical source of truth is already published on the web.

How To Create A Source

Create A Text Source

Open RAG Sources
Click Create
Select the target bot
Choose Manual Text
Paste the content
Give the source a descriptive name
Save, then wait for the source status to reach Completed

Create A File Source

Open RAG Sources
Click Create
Select the target bot
Choose File Upload
Upload the file
Give the source a descriptive name
Save, then wait for the source status to reach Completed

Create A URL Source

Open RAG Sources
Click Create
Select the target bot
Choose URL
Paste the public page URL
Give the source a descriptive name
Save, then wait for the source status to reach Completed

Private and local network URLs are blocked by default for SSRF safety.
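
To illustrate what blocking private and local network URLs can look like, here is a minimal Python sketch. It is not Filament RAG's implementation; the function names `is_blocked_address` and `is_url_allowed` and the injectable `resolve` parameter are assumptions made for illustration. The idea is to resolve the hostname first and reject the URL if any resolved address falls in a private, loopback, link-local, or otherwise non-public range.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_blocked_address(ip_text: str) -> bool:
    """Return True if the address is in a range a URL fetcher should not reach."""
    # Drop an IPv6 scope id such as "%eth0" before parsing.
    ip = ipaddress.ip_address(ip_text.split("%")[0])
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )


def is_url_allowed(url: str, resolve=socket.getaddrinfo) -> bool:
    """Hypothetical pre-fetch check: allow only http(s) URLs with public addresses."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = resolve(parsed.hostname, parsed.port or 443)
    except OSError:
        # An unresolvable host cannot be fetched, so treat it as not allowed.
        return False
    # Each getaddrinfo entry ends with a sockaddr tuple whose first item is the IP.
    addresses = {info[4][0] for info in infos}
    return bool(addresses) and not any(is_blocked_address(a) for a in addresses)
```

Checking every resolved address, rather than only the hostname string, matters because a public-looking hostname can still resolve to an internal address.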

What Happens After You Save A Source

The source record itself is only the input.

During ingestion it becomes:

extracted content
a normalized document
multiple searchable chunks
embeddings stored in the configured vector backend

The bot answers from the ingested chunks, not directly from the raw source record.
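
The normalize and chunk steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Chunk` class, the `normalize` and `chunk_words` functions, word-based splitting, and the `max_words` and `overlap` defaults are all hypothetical and do not describe how Filament RAG actually chunks content. Embedding is left out because it depends on the configured provider and vector backend.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    source_name: str
    index: int
    text: str


def normalize(raw: str) -> str:
    """Collapse runs of whitespace so chunk boundaries are predictable."""
    return re.sub(r"\s+", " ", raw).strip()


def chunk_words(source_name: str, text: str, max_words: int = 50, overlap: int = 10):
    """Split text into overlapping word windows (an assumed strategy)."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be non-negative and smaller than max_words")
    words = normalize(text).split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + max_words]
        chunks.append(Chunk(source_name, len(chunks), " ".join(window)))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

The overlap repeats the tail of one chunk at the head of the next, so a sentence that straddles a boundary still appears whole in at least one chunk. Each chunk keeps the source name, which is what makes a citation traceable back to its source.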

What You Should Ingest

Good sources are:

specific
well-structured
current
written for the audience the bot serves
rich in concrete product or support information

Strong examples:

feature documentation
setup guides
troubleshooting articles
support policies
onboarding instructions

Source Quality Checklist

Use this quick filter before ingesting something:

Is it current?
Is it written for the audience this bot serves?
Does it contain actionable facts instead of vague marketing language?
Is it structured with headings or clear sections?
Would a human support agent trust it as a source of truth?

What You Should Avoid Ingesting

Weak sources are:

vague marketing fragments with no product detail
duplicated versions of the same content
very noisy pages with little readable text
outdated internal notes mixed with current guidance
content written for the wrong audience

If a page is mostly decorative or repetitive, it usually adds noise to retrieval.

Common low-value sources:

short landing-page blurbs with little product detail
heavily duplicated copies of the same docs
giant mixed-content exports with weak structure
incomplete internal notes that were never meant to be user-facing

Source Naming And Organization

Use descriptive names so citations are understandable.

Good examples:

RAG Sources
Quickstart
Security and Privacy
Public Pricing FAQ

Avoid generic names like:

Doc 1
Homepage
Notes

Source Statuses

Pending

The source is queued for ingestion or waiting to be retried.

Processing

The ingestion job is actively extracting, chunking, embedding, or persisting the content.

Completed

The latest ingest finished successfully and the source can contribute chunks to retrieval.

Failed

The ingest did not finish. Inspect meta.error in the source details and retry after fixing the cause.
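
The four statuses can be read as a small state machine. The Python sketch below is one way to model it; the transition table is inferred from the descriptions above rather than taken from the package, and the lowercase status strings, the record shape, and the `can_transition` and `next_step` names are assumptions.

```python
from enum import Enum


class SourceStatus(Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


# Assumed transitions, inferred from the status descriptions.
ALLOWED_TRANSITIONS = {
    SourceStatus.PENDING: {SourceStatus.PROCESSING},
    SourceStatus.PROCESSING: {SourceStatus.COMPLETED, SourceStatus.FAILED},
    SourceStatus.COMPLETED: {SourceStatus.PENDING},  # re-ingest
    SourceStatus.FAILED: {SourceStatus.PENDING},     # retry
}


def can_transition(current: SourceStatus, new: SourceStatus) -> bool:
    return new in ALLOWED_TRANSITIONS[current]


def next_step(record: dict) -> str:
    """Suggest what an operator should do for a source record (hypothetical shape)."""
    status = SourceStatus(record["status"])
    if status is SourceStatus.FAILED:
        # Mirrors the advice above: read meta.error before retrying.
        error = (record.get("meta") or {}).get("error", "unknown error")
        return f"fix and retry: {error}"
    if status is SourceStatus.COMPLETED:
        return "ready for retrieval"
    return "wait"
```

For example, `next_step({"status": "failed", "meta": {"error": "timeout"}})` returns `"fix and retry: timeout"`.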

Re-Ingesting Sources

Re-ingest when:

the content changed
the file or URL changed
retrieval quality is weak
embedding or chunking settings changed
you want new citations or updated canonical links
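
If you track sources outside the panel, a content fingerprint is one simple way to notice that the content changed. The sketch below is an assumption-laden illustration, not a Filament RAG feature: `content_fingerprint`, `needs_reingest`, and the idea of storing a fingerprint next to each source are all hypothetical.

```python
import hashlib
from typing import Optional


def content_fingerprint(text: str) -> str:
    """Hash the text after collapsing whitespace, so reflowed copies match."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def needs_reingest(
    current_text: str,
    stored_fingerprint: Optional[str],
    settings_changed: bool = False,
) -> bool:
    """Return True when the source should be ingested again."""
    # Never ingested before, or embedding/chunking settings changed.
    if stored_fingerprint is None or settings_changed:
        return True
    return content_fingerprint(current_text) != stored_fingerprint
```

A fingerprint only covers the first two triggers in the list and the settings flag covers the fourth. Weak retrieval quality and a wish for updated citations remain judgment calls.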

Good First Source Set

For a first production bot, a strong starter set usually includes:

product overview
quickstart or setup guide
troubleshooting page
FAQ or support policy
one or two highly specific feature pages

Best Practices

Group sources by bot and audience.
Prefer clean docs pages over noisy landing pages when possible.
Re-ingest after editing or replacing important content.
Use descriptive source names so citations are understandable.
Keep public bots on public docs and internal bots on internal runbooks.
