
# Document Converter

The **Settings > Configuration > Parser** section includes Document Converter settings that control how the Aisera platform processes PDFs with Docling, an advanced document conversion service. These settings are tenant-level defaults; you can override them for individual data sources.

### Document Converter

| **Type**    | Checkbox |
| ----------- | -------- |
| **Default** | Disabled |

When enabled, Aisera routes PDFs through an advanced conversion service that produces structured HTML with better recognition of complex layouts, multi-column text, and tables. Conversion results are cached per document, so subsequent parses of the same file do not re-process it. If the service is unavailable or conversion fails, the parser falls back to the standard converter automatically. Enable this for data sources containing PDFs with complex layouts or tables that the standard converter does not parse well, such as multi-column technical documents, financial reports, or forms.
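The fallback order can be pictured with a minimal Python sketch. The function `convert_pdf` and the `advanced`/`standard` callables are hypothetical stand-ins for the Docling-based service and the standard converter; they are not Aisera APIs, and the sketch omits caching.

```python
from typing import Callable


def convert_pdf(
    data: bytes,
    use_document_converter: bool,
    advanced: Callable[[bytes], str],  # hypothetical: advanced (Docling-based) converter
    standard: Callable[[bytes], str],  # hypothetical: standard converter
) -> str:
    """Return HTML for a PDF, preferring the advanced converter when it is enabled."""
    if not use_document_converter:
        return standard(data)
    try:
        return advanced(data)
    except Exception:
        # Service unavailable or conversion failed: fall back automatically.
        return standard(data)


def flaky_advanced(data: bytes) -> str:
    raise ConnectionError("conversion service unavailable")


def simple_standard(data: bytes) -> str:
    return "<p>standard</p>"


print(convert_pdf(b"%PDF-1.7", True, flaky_advanced, simple_standard))  # <p>standard</p>
```

A failure in the advanced path never surfaces as a parse error in this model; the document is still parsed, only with the standard converter's output.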

{% hint style="info" %}
The advanced converter splits large PDFs at 250-page boundaries, compared to 500 pages with the standard converter. The **Accurate Table Parse**, **Force OCR**, and **Pdf Names** settings apply to this converter and control per-document parsing behavior. Use **Bypass Cache** to force re-conversion of a document if its content has changed since it was last parsed. The Document Converter setting affects only PDF documents.
{% endhint %}
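The difference in split size can be illustrated with a short Python sketch that computes page ranges for a 600-page PDF. It assumes splits produce contiguous, non-overlapping chunks; the platform's actual splitting logic is not documented here, and `page_ranges` is a hypothetical helper.

```python
from typing import List, Tuple

ADVANCED_SPLIT_PAGES = 250  # advanced converter boundary, per this page
STANDARD_SPLIT_PAGES = 500  # standard converter boundary, per this page


def page_ranges(total_pages: int, split_at: int) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (first, last) page ranges of at most split_at pages."""
    if total_pages < 1:
        return []
    return [
        (start, min(start + split_at - 1, total_pages))
        for start in range(1, total_pages + 1, split_at)
    ]


print(page_ranges(600, ADVANCED_SPLIT_PAGES))  # [(1, 250), (251, 500), (501, 600)]
print(page_ranges(600, STANDARD_SPLIT_PAGES))  # [(1, 500), (501, 600)]
```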

See also: [Accurate Table Parse](#accurate-table-parse), [Force OCR](#force-ocr), [Bypass Cache](#bypass-cache), [Pdf Names](/aisera-platform/tenant-setup/aisera-platform-configuration/tenant-configuration-settings/parser.md#pdf-names-1)

### Accurate Table Parse

| **Type**     | Checkbox                                  |
| ------------ | ----------------------------------------- |
| **Default**  | Disabled                                  |
| **Requires** | [Document Converter](#document-converter) |

When enabled, each PDF processed through the advanced converter receives more thorough table detection. By default, accurate table parsing applies to every PDF in the data source. If **Pdf Names** is also configured, only PDFs whose filenames appear in that list receive accurate table parsing; all other PDFs use standard table extraction. Enable this for data sources containing PDFs with dense, merged, or multi-level tables that standard parsing misrepresents, such as financial statements, compliance documents, or technical specifications.
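The interaction between **Accurate Table Parse** and **Pdf Names** can be summarized as a small decision function. This Python sketch is hypothetical: `uses_accurate_tables` is not an Aisera API, and it assumes exact filename matching, since the matching rules (case sensitivity, paths versus base names) are not specified on this page.

```python
from typing import Iterable, Optional


def uses_accurate_tables(
    filename: str,
    document_converter: bool,
    accurate_table_parse: bool,
    pdf_names: Optional[Iterable[str]] = None,
) -> bool:
    """Decide whether a PDF gets accurate table parsing (hypothetical model)."""
    # Accurate Table Parse requires the Document Converter.
    if not (document_converter and accurate_table_parse):
        return False
    names = set(pdf_names or [])
    if not names:
        # No Pdf Names configured: every PDF in the data source qualifies.
        return True
    # Pdf Names configured: only listed files qualify (exact match assumed).
    return filename in names


listed = ["q3-financials.pdf"]
print(uses_accurate_tables("q3-financials.pdf", True, True, listed))  # True
print(uses_accurate_tables("handbook.pdf", True, True, listed))  # False
print(uses_accurate_tables("handbook.pdf", True, True))  # True
```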

See also: [Pdf Names](/aisera-platform/tenant-setup/aisera-platform-configuration/tenant-configuration-settings/parser.md#pdf-names)

### Bypass Cache

| **Type**     | Checkbox                                  |
| ------------ | ----------------------------------------- |
| **Default**  | Disabled                                  |
| **Requires** | [Document Converter](#document-converter) |

When enabled, Aisera skips the cache check and triggers a fresh conversion on every parse, replacing the previously cached result. When disabled, the parser returns the cached HTML from the previous conversion, if one exists. Enable this after source content has changed and you need the parsed output to reflect the updated document.
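The cache behavior can be modeled in a few lines of Python. This is a hypothetical sketch, not Aisera's implementation: `parse_with_cache` is an illustrative name, and keying the cache by a document ID (rather than by content) is an assumption that matches the documented behavior of returning stale HTML until the cache is bypassed.

```python
from typing import Callable, Dict, List


def parse_with_cache(
    doc_id: str,
    data: bytes,
    cache: Dict[str, str],
    convert: Callable[[bytes], str],
    bypass_cache: bool = False,
) -> str:
    """Return cached HTML unless bypass_cache is set; otherwise convert and re-cache."""
    if not bypass_cache and doc_id in cache:
        return cache[doc_id]
    html = convert(data)
    cache[doc_id] = html  # replaces any previously cached result
    return html


calls: List[bytes] = []


def convert(data: bytes) -> str:
    calls.append(data)
    return data.decode()


cache: Dict[str, str] = {}
parse_with_cache("doc-1", b"v1", cache, convert)  # converts and caches "v1"
parse_with_cache("doc-1", b"v2", cache, convert)  # cache hit: still returns "v1"
parse_with_cache("doc-1", b"v2", cache, convert, bypass_cache=True)  # re-converts
print(cache["doc-1"], len(calls))  # v2 2
```

After the bypassed parse, turning the setting off again returns the refreshed entry without another conversion.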

{% hint style="info" %}
Enabling this setting on a large data source re-converts every document, which increases parsing time and may reduce processing throughput. Disable this setting once the refresh is complete; subsequent parses will use the newly written cache.
{% endhint %}

See also: [Document Converter](#document-converter)

### Force OCR

| **Type**     | Checkbox                                  |
| ------------ | ----------------------------------------- |
| **Default**  | Disabled                                  |
| **Requires** | [Document Converter](#document-converter) |

When enabled, optical character recognition (OCR) runs on every PDF in the data source unconditionally, bypassing the normal detection that limits OCR to image-only documents. When disabled, the parser applies OCR only where needed: to image-only PDFs and to specific files listed in [Pdf Names](/aisera-platform/tenant-setup/aisera-platform-configuration/tenant-configuration-settings/parser.md#pdf-names). Enable this when PDFs appear to have selectable text but standard extraction produces poor results, such as scanned PDFs with a flawed embedded text layer or documents where the selectable text does not match the visible content.
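The OCR decision described above reduces to a simple rule. This Python sketch is a hypothetical model, not Aisera code: `should_run_ocr` and `has_text_layer` are illustrative names, it assumes the Document Converter is enabled, and it assumes exact filename matching against **Pdf Names**.

```python
from typing import Iterable


def should_run_ocr(
    filename: str,
    has_text_layer: bool,
    force_ocr: bool,
    pdf_names: Iterable[str] = (),
) -> bool:
    """Decide whether OCR runs on a PDF (assumes Document Converter is enabled)."""
    if force_ocr:
        # Force OCR: every PDF, regardless of detection.
        return True
    if not has_text_layer:
        # Image-only PDF: OCR is applied automatically.
        return True
    # Otherwise, only files explicitly listed in Pdf Names get OCR.
    return filename in set(pdf_names)


print(should_run_ocr("scan.pdf", has_text_layer=False, force_ocr=False))  # True
print(should_run_ocr("report.pdf", has_text_layer=True, force_ocr=False))  # False
print(should_run_ocr("report.pdf", True, False, pdf_names=["report.pdf"]))  # True
print(should_run_ocr("report.pdf", has_text_layer=True, force_ocr=True))  # True
```

Listing individual files keeps OCR cost limited to the documents that need it, while **Force OCR** applies that cost to every PDF.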

{% hint style="warning" %}
OCR is slower and more resource-intensive than direct text extraction. Enabling this on a large data source of text-native PDFs increases parsing time without quality benefit. Use **Pdf Names** instead if only specific files need OCR.
{% endhint %}

See also: [Pdf Names](/aisera-platform/tenant-setup/aisera-platform-configuration/tenant-configuration-settings/parser.md#pdf-names)

