Data Ingestion
After you have added a data source, you can test the source by running a Data Ingestion job. The Data Ingestion function now supports .txt, .html, .md, .pdf, and .ppt file types.
This section describes how to ingest data with an Aisera application using the Aisera Admin UI. If you prefer to use APIs and webhooks to ingest data, see Ingestion APIs.
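Before uploading files, it can help to check them against the supported file types listed above. The following is a minimal sketch that runs locally and is not part of the Aisera platform; the function name `split_by_support` is hypothetical, and the case-insensitive extension match is an assumption.

```python
from pathlib import Path

# File types listed above as supported by the Data Ingestion function.
SUPPORTED_EXTENSIONS = {".txt", ".html", ".md", ".pdf", ".ppt"}


def split_by_support(paths):
    """Split file paths into (supported, unsupported) lists.

    Extensions are compared case-insensitively (an assumption made
    for this sketch, not documented platform behavior).
    """
    supported, unsupported = [], []
    for path in map(Path, paths):
        if path.suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(path)
        else:
            unsupported.append(path)
    return supported, unsupported


if __name__ == "__main__":
    ok, skipped = split_by_support(["guide.pdf", "notes.md", "budget.xlsx"])
    print("Ready to upload:", [p.name for p in ok])
    print("Unsupported type:", [p.name for p in skipped])
```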
To run a Data Ingestion Job:
Use the Settings > Data Sources command to navigate to the Data Source Details page.

Choose an application that is integrated with the data source that you want to ingest, or add a New Data Source (one that includes documents with the newly supported file types) and associate it with an existing Aisera application.
Select the arrow/triangle in the upper-right section of the Data Source Details page to start the data ingestion job.

The Data Source Details window displays the metrics for the data as it is ingested. You will see the functions that were selected when the data source was created (User Learning in this example) and details of all the integration runs.

You can see that all of the last few data ingestion runs have been successful.
The results show the metrics of the ingestion, such as the Number of Users, User Groups, and User Profiles ingested in every run.
The lower portion of the Data Source Details page lists all of the fields that have contributed data. In other words, these are the default fields from your data source that are mapped to the fields of the Aisera platform.

The example above displays the Ticket Fields that contain ingested data. If you choose the User tab, you see the User Fields that contain ingested data.
NOTE: After you set up the parameters for both your integration application and your data source, you need to add each of them to the virtual assistants (bots) that you create, so that the bots can use the authorization and data that each provides.
Ingest PDFs with the Docling Document Converter
The Aisera Gen AI Platform now uses the Docling tool to convert PDF documents to HTML behind the scenes.
To use the Docling Document Converter for your file ingestion:
Navigate to Settings > Data Sources in the Aisera Admin UI.
Choose the + New Data Source button.

Select File Data from the Data Source options.
Click Next.
Name your Data Source.
Choose Knowledge Base as the Data Type.
Select Uploaded Files.
Browse for PDF files to upload.

Choose On Demand for the Schedule.
Choose Next.
Add a Template or skip the Template window and click Next.
Use the default options and check Document Converter (near the middle of the window).
There are three options below the Document Converter checkbox that you can select when you use the Document Converter.
Accurate Table Parse: Use this option if you require exact parsing. The Docling Converter includes two modes for parsing tables: Fast and Accurate. Fast is the default. If you only need accurate parsing for specific PDFs, enter the PDF names (without extensions, comma-separated) in the PDF Names input box.
Bypass Cache: Smart caching avoids redundant processing by skipping PDFs whose content hasn't changed, significantly reducing load. Use this option if you don’t want to use smart caching.
Force OCR: Optical Character Recognition (OCR) is automatically enabled for PDFs containing images. To apply OCR to all PDFs, check the Force OCR option. For selective OCR processing, enter the PDF names (without extensions, comma-separated) in the provided input box.
Click OK to start the Ingestion Job for the PDF files.
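The Accurate Table Parse and Force OCR options above accept PDF names without extensions, separated by commas. The sketch below shows one way to build that value from a list of local file paths. The function name `pdf_names_field` is hypothetical, and joining with a plain comma (no spaces) is an assumption about what the input box expects.

```python
from pathlib import Path


def pdf_names_field(paths):
    """Return PDF base names (no extension) joined by commas.

    Assumption: the input box accepts names separated by a plain
    comma with no surrounding spaces.
    """
    names = []
    for path in map(Path, paths):
        if path.suffix.lower() != ".pdf":
            raise ValueError(f"not a PDF file: {path}")
        if "," in path.stem:
            # A comma inside a name would be ambiguous in a comma-separated list.
            raise ValueError(f"file name contains a comma: {path}")
        if path.stem not in names:  # drop repeats, keep first-seen order
            names.append(path.stem)
    return ",".join(names)


if __name__ == "__main__":
    print(pdf_names_field(["reports/pricing-tables.pdf", "scans/invoice_2023.PDF"]))
```

Paste the printed value into the PDF Names input box (or the Force OCR input box) when only some of the uploaded PDFs need the option.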
Field Mapping
If you are using a pre-configured Data Source (one that exists in the list of Data Sources in the Admin UI), the fields have already been mapped from that source to the Aisera platform. However, you can customize your Data Source integration by creating and mapping custom fields.
The bottom of the Data Source Details window contains the following buttons for customizing fields. You can also export all the fields or import fields from other data sources.

Create a custom field first, and then map it to the related field in the Aisera platform.

Using a unique name or code for your custom field:
Ensures that you don't break your application by using a field name that is already used by the Aisera platform.
Allows you to find your custom fields in a search in the event that an upgrade to the Aisera platform requires you to manually update these custom fields.
Make sure that you associate your field with the correct Content Type.
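One way to keep custom field names unique and searchable is to apply a consistent prefix and check the result against names you know are already in use. This is a minimal sketch of that naming convention only: the function `custom_field_name`, the `acme` prefix, and the sample list of existing names are all hypothetical, and the platform's actual field names must come from your own Data Source Details page.

```python
import re


def custom_field_name(label, prefix, existing_names):
    """Build a prefixed, lowercase field name and reject collisions.

    `existing_names` is whatever list of field names you have collected
    yourself (for example, from an export of the mapped fields).
    """
    slug = re.sub(r"[^a-z0-9]+", "_", label.lower()).strip("_")
    if not slug:
        raise ValueError("label must contain at least one letter or digit")
    name = f"{prefix}_{slug}"
    if name.lower() in {existing.lower() for existing in existing_names}:
        raise ValueError(f"field name already in use: {name}")
    return name


if __name__ == "__main__":
    # Hypothetical prefix and existing names, for illustration only.
    in_use = ["priority", "status", "description"]
    print(custom_field_name("Cost Center", "acme", in_use))
```

Because every custom field shares the prefix, a search for that prefix finds all of them if an upgrade later requires manual updates.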
Cloning and Deleting a Data Source
Next to the Play icon (that starts ingestion), there are icons to clone and delete the data source.

Masking or Scrubbing the PII data from your Data Source
If your Data Source includes PII data that you need to remove, see Setting Up Anonymization (PII) and Custom Recognizer. Then ingest your source data again.
Troubleshooting
For information about Troubleshooting your Data Integration, Data Sources, and Data Ingestion, see Troubleshooting Data Ingestion.
Post-Ingestion
After your data is ingested, you need to run an Indexer job before you can use the AI Learning or Content Generation features on your application or bot.
See Post-Ingestion Tasks for more details.
Retiring an Ingested Document
Duplicate content may be ingested from different sources. You can remove (delete) such documents from the pipeline using the Retire Knowledge Documents command. Retired documents still exist in your data source.
To retire content from a particular source:
Navigate to Settings > Data Sources in the Aisera Admin UI.
Choose a Data Source with documents that you want to retire.
At the top of the Data Source Details window, click the Retire Knowledge Documents button. NOTE: This button does not appear if you have not ingested any knowledge articles.

Select the checkboxes next to the documents you want to retire.

Click OK.
NOTE: The retired documents will not be used to serve user requests, and will also be skipped the next time a Data Source ingestion job runs.
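To decide which documents to retire, you may want to find content that was ingested from more than one source. The sketch below groups documents whose text is identical after whitespace and case normalization. It works on data you supply yourself and does not call any Aisera API; the function name `find_duplicates` and the `(source, title, text)` tuple layout are assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def find_duplicates(documents):
    """Group documents that share the same normalized text.

    `documents` is an iterable of (source, title, text) tuples.
    Returns a list of groups; each group lists the (source, title)
    pairs whose text matched. Only exact matches after normalization
    are detected, not near-duplicates.
    """
    groups = defaultdict(list)
    for source, title, text in documents:
        normalized = " ".join(text.split()).lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        groups[digest].append((source, title))
    return [members for members in groups.values() if len(members) > 1]


if __name__ == "__main__":
    docs = [
        ("wiki", "VPN setup", "Connect to the  VPN before logging in."),
        ("file-share", "VPN guide", "connect to the vpn before logging in."),
        ("wiki", "Password reset", "Use the self-service portal."),
    ]
    for group in find_duplicates(docs):
        print("Same content:", group)
```

For each group, keep one copy and select the others in the Retire Knowledge Documents window.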