
# Microsoft Form Recognizer

The **Settings > Configuration > Parser** section includes Microsoft Form Recognizer settings that control how the Aisera platform routes PDFs to Azure's Form Recognizer service for advanced parsing. These are tenant-level defaults that can be overridden at the data source level.
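The override relationship between tenant-level defaults and data source settings can be sketched as follows. This is a minimal illustration, not Aisera's implementation: the function name `effective_setting` and the use of `None` to mean "no override set on the data source" are assumptions made for this example.

```python
from typing import Optional


def effective_setting(tenant_default: bool, data_source_override: Optional[bool]) -> bool:
    """Resolve one checkbox setting (hypothetical helper, not a platform API).

    A data source value of None is assumed to mean "no override", in which
    case the tenant-level default applies.
    """
    if data_source_override is None:
        return tenant_default
    return data_source_override


# Tenant default is Disabled; one data source turns the setting on.
print(effective_setting(tenant_default=False, data_source_override=True))
# Tenant default is Disabled; the data source sets nothing.
print(effective_setting(tenant_default=False, data_source_override=None))
```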

### Microsoft Form Recognizer (Additional charges may occur)

| **Type**    | Checkbox |
| ----------- | -------- |
| **Default** | Disabled |

When enabled, the parser routes qualifying PDFs to [Microsoft's Form Recognizer](https://azure.microsoft.com/en-us/products/ai-foundry/tools/document-intelligence) service instead of standard PDF-to-HTML conversion. Form Recognizer produces structured HTML that more accurately reflects complex layouts, including multi-column text, dense tables, and scanned content. Enable this for data sources containing PDFs that standard parsing handles poorly.

This setting requires two additional configurations before it has any effect: the tenant must have a Form Recognizer integration configured, and **Pdf Names** must specify which PDFs to route. Enabling this checkbox without both in place routes no documents to Form Recognizer.
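The three conditions above combine as a logical AND, which the following sketch illustrates. The class `ParserSettings`, its field names, and the function `routes_to_form_recognizer` are hypothetical names chosen for this example. How **Pdf Names** actually matches documents is not described here, so the sketch assumes an exact, case-sensitive filename match.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ParserSettings:
    """Hypothetical model of the relevant settings (not a platform API)."""
    form_recognizer_enabled: bool = False   # the checkbox; Disabled by default
    integration_configured: bool = False    # tenant has a Form Recognizer integration
    pdf_names: List[str] = field(default_factory=list)  # PDFs selected for routing


def routes_to_form_recognizer(settings: ParserSettings, filename: str) -> bool:
    """Return True only when all three conditions hold for this PDF."""
    if not (settings.form_recognizer_enabled and settings.integration_configured):
        return False
    # Assumption: exact, case-sensitive name match against Pdf Names.
    return filename in settings.pdf_names


# Checkbox enabled, but no integration and no Pdf Names: nothing is routed.
only_checkbox = ParserSettings(form_recognizer_enabled=True)
print(routes_to_form_recognizer(only_checkbox, "manual.pdf"))

# All three conditions in place: only the listed PDF is routed.
complete = ParserSettings(True, True, ["manual.pdf"])
print(routes_to_form_recognizer(complete, "manual.pdf"))
print(routes_to_form_recognizer(complete, "other.pdf"))
```

Keeping the **Pdf Names** list short is what limits the number of documents sent for analysis.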

{% hint style="danger" %}
Additional Azure service charges apply for each document analyzed. Configure **Pdf Names** carefully to limit scope. By enabling this feature, you agree that Aisera will send documents to Microsoft's Form Recognizer service for analysis.
{% endhint %}

See also: [Pdf Names](/aisera-platform/tenant-setup/aisera-platform-configuration/tenant-configuration-settings/parser.md#pdf-names)

### Enable images in Form Recognizer

| **Type**    | Checkbox |
| ----------- | -------- |
| **Default** | Disabled |

When enabled, the parser extracts the location and content of images embedded in a PDF alongside the standard Form Recognizer text analysis. For any region where Form Recognizer would otherwise output text overlapping an image boundary, the parser replaces that text with the image itself, embedded inline in the parsed output. This preserves diagrams, charts, and technical illustrations at their original positions rather than losing them to garbled text. Enable this for PDFs where embedded images are integral to the content, such as technical documentation or knowledge base articles with diagrams.
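The replacement rule above can be pictured as a bounding-box overlap test. The sketch below is only an illustration under stated assumptions: the types `Box`, `TextRegion`, and `ImageRegion`, the function `merge_regions`, and the top-to-bottom ordering are all hypothetical, and the parser's actual geometry handling may differ.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in page coordinates (hypothetical type)."""
    left: float
    top: float
    right: float
    bottom: float

    def overlaps(self, other: "Box") -> bool:
        # Two boxes overlap when they intersect on both axes.
        return (self.left < other.right and other.left < self.right
                and self.top < other.bottom and other.top < self.bottom)


@dataclass(frozen=True)
class TextRegion:
    box: Box
    text: str


@dataclass(frozen=True)
class ImageRegion:
    box: Box
    src: str


Region = Union[TextRegion, ImageRegion]


def merge_regions(texts: List[TextRegion], images: List[ImageRegion]) -> List[Region]:
    """Drop text that overlaps any image, then place images inline by position."""
    kept = [t for t in texts if not any(t.box.overlaps(i.box) for i in images)]
    merged: List[Region] = kept + list(images)
    # Assumption: reading order is approximated by top edge, then left edge.
    return sorted(merged, key=lambda r: (r.box.top, r.box.left))


texts = [
    TextRegion(Box(0, 0, 10, 1), "Intro paragraph"),
    TextRegion(Box(1, 3, 5, 4), "garbled label text"),  # lies inside the image
    TextRegion(Box(0, 10, 10, 11), "Closing paragraph"),
]
images = [ImageRegion(Box(0, 2, 10, 8), "diagram-1.png")]

for region in merge_regions(texts, images):
    print(type(region).__name__, getattr(region, "text", None) or region.src)
```

In this sketch the text fragment inside the diagram's boundary is discarded and the image takes its place between the two surviving paragraphs.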

{% hint style="info" %}
This setting applies only to PDFs with selectable text that also contain embedded images. It does not apply to fully scanned image-only PDFs processed through OCR.
{% endhint %}

See also: [Microsoft Form Recognizer (Additional charges may occur)](#microsoft-form-recognizer-additional-charges-may-occur)

