Adding Documents
Go to Knowledge Base → Upload Document. There are three ways to add content, each suited to a different source.
Option 1 — Upload a file
Supported format: .txt
Select your file, choose the target category, set the similarity threshold, and click Upload file.
PDF and DOCX support is on the roadmap. For now, copy content from these formats into a .txt file or use Direct Input.
Option 2 — Direct Input
Use this when you want to type or paste content directly — for example, a single FAQ page, a product description, or a short policy document.
Fill in:
- Title — used to identify the document in the list
- Content — paste your text
- Category — which category to add it to
- Similarity threshold — see below
Click Upload text to save.
Option 3 — From URL (Site Crawler)
Use this to index an entire website or section of a site automatically.
Fill in:
| Field | Description |
|---|---|
| URL | Starting URL, must include https:// |
| Scanning Depth | How many link levels deep to follow (default: 10) |
| Max Pages to Scan | Hard limit on total pages (default: 100) |
| Category | Where to store the indexed pages |
| Similarity threshold | See below |
Click Scan the site. The crawler runs in the background — you’ll see a progress indicator in the interface. You can navigate away; the crawl continues and a status notification appears when complete.
While a crawl is running, the Upload and Direct Input tabs are disabled for that session. Wait for the crawl to finish before starting another.
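To make the interaction between Scanning Depth and Max Pages to Scan concrete, here is a minimal sketch of a breadth-first crawl bounded by both limits. It is an illustration only, not the product's crawler: the `crawl` function, the `get_links` callable, and the in-memory `site` map are hypothetical, and it assumes the starting URL counts as depth 0 and as one page toward the limit.

```python
from collections import deque
from urllib.parse import urlparse


def crawl(start_url, get_links, max_depth=10, max_pages=100):
    """Breadth-first crawl bounded by link depth and total page count.

    get_links is a hypothetical callable: URL -> list of linked URLs.
    Assumption: the start page is depth 0 and counts toward max_pages.
    """
    if urlparse(start_url).scheme != "https":
        raise ValueError("URL must include https://")

    visited = [start_url]          # pages in the order they were scanned
    seen = {start_url}
    queue = deque([(start_url, 0)])

    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue               # do not follow links past the depth limit
        for link in get_links(url):
            if len(visited) >= max_pages:
                return visited     # hard limit on total pages reached
            if link not in seen:
                seen.add(link)
                visited.append(link)
                queue.append((link, depth + 1))
    return visited


# Hypothetical site used only for this example.
site = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/a/1"],
    "https://example.com/b": ["https://example.com/"],
    "https://example.com/a/1": [],
}


def links_of(url):
    return site.get(url, [])


shallow = crawl("https://example.com/", links_of, max_depth=1)
capped = crawl("https://example.com/", links_of, max_pages=2)
```

In this sketch, `shallow` stops one link level below the start page and `capped` stops as soon as two pages have been collected, whichever limit is hit first ends the crawl.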
Similarity Threshold
Every upload method has a Duplicate Similarity Threshold slider (default: 90%).
When you add a new document, the system compares it against existing documents in the same category. If the similarity score exceeds the threshold, the new document replaces the existing one instead of being stored as a duplicate.
When to adjust:
- Higher (95–100%) — only replace near-identical content. Useful if you have many similar but intentionally distinct documents.
- Lower (70–85%) — replace documents that are substantially the same even if wording has changed. Useful for keeping updated web pages clean.
The default of 90% works well for most cases.
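The replace-or-add decision can be sketched as follows. This is a rough illustration under stated assumptions: the actual similarity metric is not described here, so Python's `difflib.SequenceMatcher` stands in for it, and `add_document` and the dictionary-based category are hypothetical names, not part of the product.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity score in [0, 1]; the real metric is an assumption."""
    return SequenceMatcher(None, a, b).ratio()


def add_document(category: dict, title: str, content: str,
                 threshold: float = 0.90) -> str:
    """Add content to a category, replacing the closest match above the threshold.

    category is a hypothetical dict mapping title -> content.
    Returns "replaced" or "added".
    """
    best_title, best_score = None, 0.0
    for existing_title, existing_content in category.items():
        score = similarity(existing_content, content)
        if score > best_score:
            best_title, best_score = existing_title, score

    if best_title is not None and best_score > threshold:
        del category[best_title]   # the old version is dropped
        category[title] = content  # the new version takes its place
        return "replaced"

    category[title] = content
    return "added"


docs = {}
first = add_document(docs, "Returns v1", "Returns are accepted within 30 days of purchase.")
second = add_document(docs, "Returns v2", "Returns are accepted within 30 days of purchase.")
third = add_document(docs, "Shipping", "We ship worldwide using tracked courier services.")
```

Here the second upload is identical to the first, so it replaces it, while the unrelated shipping text is stored as a new document. Raising `threshold` makes replacement rarer; lowering it makes replacement more aggressive.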
Document statuses
After upload, documents go through processing:
| Status | Meaning |
|---|---|
| processing | Being indexed; not yet searchable |
| ready | Indexed and available for agent search |
| error | Processing failed; try re-uploading |
For URL crawls, each page gets its own document entry. You can monitor individual page statuses from the Documents list.
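If you track the per-page entries of a crawl outside the interface, a small summary like the one below can show at a glance which pages are searchable and which need re-uploading. It is a sketch only: the `(url, status)` pairs and the `summarize_crawl` helper are hypothetical, and only the three status names come from the table above.

```python
from collections import Counter

# Status names as listed in the table above.
STATUSES = ("processing", "ready", "error")


def summarize_crawl(pages):
    """Count page entries per status and list the URLs that failed.

    pages is a hypothetical list of (url, status) pairs.
    """
    counts = Counter(status for _, status in pages)
    failed = [url for url, status in pages if status == "error"]
    return {
        "counts": {name: counts.get(name, 0) for name in STATUSES},
        "failed": failed,
        # True only when there is at least one page and every page is ready.
        "all_searchable": bool(pages) and counts.get("ready", 0) == len(pages),
    }


pages = [
    ("https://example.com/", "ready"),
    ("https://example.com/a", "processing"),
    ("https://example.com/b", "error"),
]
report = summarize_crawl(pages)
```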