RAG Sources
RAG sources are the pieces of content your bot is allowed to learn from. This page covers what counts as a source, which input types are supported, and how to add content that improves retrieval instead of creating noise.
Quick Summary
Sources are the knowledge inputs for a bot. Better sources produce better retrieval, better citations, and better answers.
What A RAG Source Is
A RAG source is a single piece of content that you attach to a bot and ingest so the bot can retrieve from it.
Examples:
- a product FAQ
- a markdown guide
- an uploaded PDF
- a public documentation page
- a policy or onboarding article
The bot does not answer from "the internet in general". It answers from the sources you attach and ingest.
Which Source Types You Can Ingest
Filament RAG supports three source types:
- Text: paste content directly into the panel
- File: upload a supported document such as markdown, text, HTML, JSON, CSV, or a text-based PDF
- URL: fetch a public web page and extract readable content
Recent versions improved HTML and PDF ingestion, so richer documentation sources perform better than before.
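As an illustration of the File type, the sketch below checks whether an upload looks like one of the supported document formats. It is written in Python purely for illustration and is not Filament RAG's own validation; the function name `is_supported_upload` and the exact extension list are assumptions derived from the formats named above.

```python
import os

# Assumed mapping from the formats listed above to common file extensions.
SUPPORTED_EXTENSIONS = {".md", ".markdown", ".txt", ".html", ".htm", ".json", ".csv", ".pdf"}


def is_supported_upload(filename: str) -> bool:
    """Return True when the file extension matches a supported document format.

    Hypothetical helper: it cannot tell a text-based PDF from a scanned one,
    so a PDF that passes this check may still yield no readable content.
    """
    _, extension = os.path.splitext(filename)
    return extension.lower() in SUPPORTED_EXTENSIONS
```

A check like this only filters by name. Whether a file actually produces usable text is decided during ingestion.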
When To Use Each Source Type
Text
Best for:
- FAQs
- policy snippets
- support instructions
- short product explanations
Use text sources when the content is short, curated, and easy to maintain directly in Filament.
File
Best for:
- markdown docs
- uploaded runbooks
- exported guides
- static documentation files
Use file sources when you already have authoritative documents and want to keep them intact.
URL
Best for:
- public docs pages
- help-center articles
- published landing pages
- public changelog or release pages
Use URL sources when the canonical source of truth is already published on the web.
How To Create A Source
Create A Text Source
- Open RAG Sources
- Click Create
- Select the target bot
- Choose Manual Text
- Paste the content
- Give the source a descriptive name
- Save and wait for the status to become completed
Create A File Source
- Open RAG Sources
- Click Create
- Select the target bot
- Choose File Upload
- Upload the file
- Give the source a descriptive name
- Save and wait for the status to become completed
Create A URL Source
- Open RAG Sources
- Click Create
- Select the target bot
- Choose URL
- Paste the public page URL
- Give the source a descriptive name
- Save and wait for the status to become completed
Private and local network URLs are blocked by default for SSRF safety.
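To make the SSRF rule concrete, here is a minimal Python sketch of the kind of check that rejects private and local targets before a fetch. It is not the package's actual guard: the name `is_blocked_url` is hypothetical, and the sketch only inspects literal IP addresses and `localhost`. A real guard would also resolve hostnames and re-check every redirect.

```python
import ipaddress
from urllib.parse import urlparse


def is_blocked_url(url: str) -> bool:
    """Return True when the URL should not be fetched as a source.

    Hypothetical helper for illustration only.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return True  # only plain web URLs are treated as fetchable here
    host = parsed.hostname
    if host is None:
        return True
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # A hostname, not a literal IP. A real guard would resolve it
        # and apply the same address checks to every result.
        return False
    return (
        address.is_private
        or address.is_loopback
        or address.is_link_local
        or address.is_reserved
    )
```

With this sketch, `http://10.0.0.5/admin` and `http://[::1]:8080/` are rejected, while `https://docs.example.com/guide` is allowed through to the fetch step.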
What Happens After You Save A Source
The source record itself is only the input.
During ingestion it becomes:
- extracted content
- a normalized document
- multiple searchable chunks
- embeddings stored in the configured vector backend
The bot answers from the ingested chunks, not directly from the raw source record.
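The stages above can be sketched as a small pipeline. This is a simplified Python illustration under stated assumptions, not the package's ingestion code: the names `normalize`, `split_into_chunks`, `toy_embedding`, and `ingest` are hypothetical, the word-count chunking is only one possible strategy, and the hashed bag-of-words vector stands in for a real embedding model.

```python
import hashlib
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    source_name: str
    index: int
    text: str
    embedding: List[float]


def normalize(raw: str) -> str:
    """Collapse whitespace so chunk boundaries do not depend on formatting."""
    return re.sub(r"\s+", " ", raw).strip()


def split_into_chunks(text: str, max_words: int = 50) -> List[str]:
    """Split the normalized document into consecutive word windows."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def toy_embedding(text: str, dims: int = 8) -> List[float]:
    """Stand-in for a real embedding model: a hashed bag of words."""
    vector = [0.0] * dims
    for word in text.lower().split():
        bucket = int(hashlib.sha256(word.encode("utf-8")).hexdigest(), 16) % dims
        vector[bucket] += 1.0
    return vector


def ingest(source_name: str, raw: str, max_words: int = 50) -> List[Chunk]:
    """Turn extracted content into searchable chunks with embeddings."""
    document = normalize(raw)
    return [
        Chunk(source_name, index, piece, toy_embedding(piece))
        for index, piece in enumerate(split_into_chunks(document, max_words))
    ]
```

The point of the sketch is the shape of the data: one source record becomes many chunks, and each chunk carries the source name that later appears in citations.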
What You Should Ingest
Good sources are:
- specific
- well-structured
- current
- written for the audience the bot serves
- rich in concrete product or support information
Strong examples:
- feature documentation
- setup guides
- troubleshooting articles
- support policies
- onboarding instructions
Source Quality Checklist
Use this quick filter before ingesting something:
- Is it current?
- Is it written for the audience this bot serves?
- Does it contain actionable facts instead of vague marketing language?
- Is it structured with headings or clear sections?
- Would a human support agent trust it as a source of truth?
What You Should Avoid Ingesting
Weak sources are:
- vague marketing fragments with no product detail
- duplicated versions of the same content
- very noisy pages with little readable text
- outdated internal notes mixed with current guidance
- content written for the wrong audience
If a page is mostly decorative or repetitive, it usually adds noise to retrieval.
Common low-value sources:
- short landing-page blurbs with little product detail
- heavily duplicated copies of the same docs
- giant mixed-content exports with weak structure
- incomplete internal notes that were never meant to be user-facing
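Duplicated content is the easiest of these problems to catch mechanically before ingestion. The sketch below, in Python, groups sources whose text is identical after whitespace and case normalization. The helper names `fingerprint` and `find_duplicates` are hypothetical, and exact-match hashing is an assumption chosen for simplicity; it will not catch near-duplicates that differ by a few words.

```python
import hashlib
import re
from collections import defaultdict
from typing import Dict, List


def fingerprint(text: str) -> str:
    """Hash the text after collapsing whitespace and lowercasing."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicates(sources: Dict[str, str]) -> List[List[str]]:
    """Return groups of source names that share the same fingerprint."""
    groups = defaultdict(list)
    for name, text in sources.items():
        groups[fingerprint(text)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]
```

Running a check like this over a candidate set before ingesting lets you keep one canonical copy instead of letting several copies compete in retrieval.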
Source Naming And Organization
Use descriptive names so citations are understandable.
Good examples:
- RAG Sources
- Quickstart
- Security and Privacy
- Public Pricing FAQ
Avoid generic names like:
- Doc 1
- Homepage
- Notes
Source Statuses
Pending
The source is queued for ingestion or waiting for a retry.
Processing
The ingestion job is actively extracting, chunking, embedding, or persisting the content.
Completed
The latest ingest finished successfully and the source can contribute chunks to retrieval.
Failed
The ingest did not finish. Inspect meta.error in the source details and retry after fixing the cause.
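The four statuses can be read as a small state machine. The Python sketch below is one plausible reading of the descriptions above, not the package's own model: the enum values, the `ALLOWED_TRANSITIONS` table, and both function names are assumptions, including the idea that a re-ingest or retry sends the source back to Pending.

```python
from enum import Enum


class SourceStatus(Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


# Assumed transitions, inferred from the status descriptions.
ALLOWED_TRANSITIONS = {
    SourceStatus.PENDING: {SourceStatus.PROCESSING},
    SourceStatus.PROCESSING: {SourceStatus.COMPLETED, SourceStatus.FAILED},
    SourceStatus.COMPLETED: {SourceStatus.PENDING},  # re-ingest requested
    SourceStatus.FAILED: {SourceStatus.PENDING},  # retry after fixing the cause
}


def can_transition(current: SourceStatus, target: SourceStatus) -> bool:
    """Return True when moving from current to target fits the assumed lifecycle."""
    return target in ALLOWED_TRANSITIONS[current]


def contributes_to_retrieval(status: SourceStatus) -> bool:
    """Only a completed ingest is treated as searchable in this sketch."""
    return status is SourceStatus.COMPLETED
```

Read this way, a Failed source never becomes Completed directly: it has to be queued and processed again.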
Re-Ingesting Sources
Re-ingest when:
- the content changed
- the file or URL changed
- retrieval quality is weak
- embedding or chunking settings changed
- you want new citations or updated canonical links
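The first, second, and fourth triggers can be detected automatically by comparing what was ingested last time with the current state. The following Python sketch assumes you keep a small snapshot per source; `IngestSnapshot`, `content_hash`, `needs_reingest`, and the example setting names are all hypothetical and are not part of the package.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class IngestSnapshot:
    """What was true at the last successful ingest (hypothetical record)."""
    content_hash: str
    chunk_size: int
    embedding_model: str


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def needs_reingest(
    last: IngestSnapshot,
    current_text: str,
    chunk_size: int,
    embedding_model: str,
) -> bool:
    """Return True when the content or the ingestion settings have changed."""
    return (
        last.content_hash != content_hash(current_text)
        or last.chunk_size != chunk_size
        or last.embedding_model != embedding_model
    )
```

Weak retrieval quality and citation changes are judgment calls, so those triggers still need a person to decide.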
Good First Source Set
For a first production bot, a strong starter set usually includes:
- product overview
- quickstart or setup guide
- troubleshooting page
- FAQ or support policy
- one or two highly specific feature pages
Best Practices
- Group sources by bot and audience.
- Prefer clean docs pages over noisy landing pages when possible.
- Re-ingest after editing or replacing important content.
- Use descriptive source names so citations are understandable.
- Keep public bots on public docs and internal bots on internal runbooks.