Data Source Configuration

This section discusses operations you can perform on data source objects or fields, such as anonymization and creating custom recognizers.

To ingest data with an Aisera application:

Select Settings > Data Sources to navigate to the Data Source Details page.
Choose an application that is integrated with the data source that you want to ingest, or click + New Data Source to create one (for example, one that includes documents with the newly supported file types) and associate it with an existing Aisera application.
Select the arrow (triangle) icon in the upper-right section of the Data Source Details page to start the data ingestion job.

The Data Source Details window displays the metrics for the data as it is ingested. You will see the functions that you selected while creating the data source (User Learning in this example) and details of all the integration runs.

The Data Ingestion function supports .txt, .html, .md, .pdf, and .ppt file types.
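
Before uploading a batch of documents, it can help to check which files have a supported type. The following is a minimal sketch, not part of the Aisera product: the function name `split_by_support` is hypothetical, and matching extensions case-insensitively is an assumption.

```python
from pathlib import Path

# File types listed above as supported by the Data Ingestion function.
SUPPORTED_EXTENSIONS = {".txt", ".html", ".md", ".pdf", ".ppt"}


def split_by_support(filenames):
    """Return (supported, unsupported) lists of file names.

    Assumption: extensions are compared case-insensitively.
    """
    supported, unsupported = [], []
    for name in filenames:
        if Path(name).suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(name)
        else:
            unsupported.append(name)
    return supported, unsupported


if __name__ == "__main__":
    ok, skipped = split_by_support(["guide.PDF", "notes.docx", "faq.md"])
    print("Ready to ingest:", ok)
    print("Unsupported type:", skipped)
```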

When you create your application or bot, select + Add Data Source on the application's Detail (summary) page, and then choose this data source from the list of available sources. For a detailed diagram, see Integrations and Data Sources.

Auto-Commit Setting Runs Index Jobs

You can schedule data source ingestion updates to your tenant. If the auto-commit flag is set to true in your Data Source Configuration, then the content is automatically approved and ingested.

In releases prior to 5/7/2025, the ingested data was not used in Knowledge Base Article responses until you ran the Neural search and RAG indexing jobs after the data ingestion.

After 5/7/2025, when the data source is updated, the Aisera Gen AI platform determines all applications/bots using this data source and automatically triggers search indexing jobs for those applications/bots. After the jobs are completed, the content is published and appears in live results.
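
The behavior described above can be modeled as a simple lookup: given an updated data source, find every application or bot that uses it. This is an illustrative sketch only. The function name, the dictionary layout, and the handling of a disabled auto-commit flag are assumptions, not the platform's implementation.

```python
def apps_to_reindex(data_source, app_sources, auto_commit=True):
    """Return the applications/bots whose search index should be rebuilt.

    app_sources maps an application name to the set of data sources it uses.
    """
    if not auto_commit:
        # Assumption: without auto-commit, content still awaits approval,
        # so no indexing job is triggered yet.
        return []
    return sorted(app for app, sources in app_sources.items() if data_source in sources)


if __name__ == "__main__":
    apps = {
        "hr-bot": {"kb", "tickets"},
        "it-bot": {"kb"},
        "sales-bot": {"crm"},
    }
    print(apps_to_reindex("kb", apps))
```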

Set Recurring Knowledge Generation Jobs

You can now set a schedule for recurring Knowledge Generation jobs.

Previously, you could set the knowledge generation job to run when the number of tickets reaches a specified threshold (because more tickets create more accurate knowledge generation).

Now you can set a recurring schedule to ignore the threshold and run the knowledge generation job periodically (regardless of the number of tickets in your system). This allows you to review the results at specific times, instead of at random intervals.

To set the Recurring Schedule:

Navigate to Settings > Content Generation > Knowledge Generation.
Choose the Actions button in the upper-right corner of the Knowledge Generation window.
Select Set Recurring Schedule.

Pick Monthly, Bi-Monthly, or Quarterly as the recurrence frequency.
Choose a Start Date from the Calendar option.
Set the Ticket Threshold option to Yes or No.
Set the Conditions, Field Assignments, and Pre-Generation Configuration.
Click the OK button.

Setting a Custom Recurring Schedule with UNIX Cron Syntax

A cron expression is a string used to define a schedule for running tasks at specific times or intervals. It is commonly used in Unix-based systems for scheduling repetitive tasks. A typical cron expression consists of five fields (or six, if seconds are included), each representing a different unit of time:

Minute (0 - 59)
Hour (0 - 23)
Day of the month (1 - 31)
Month (1 - 12)
Day of the week (0 - 7, where both 0 and 7 represent Sunday)

For example, the cron expression 0 3 * * 2,4 schedules a task to run at 3:00 AM on Tuesdays and Thursdays.

Cron expressions can be complex, especially when specifying multiple schedules or excluding specific dates.
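
To make the five fields concrete, the sketch below expands a cron expression into the set of values each field allows. It is a simplified illustration written for this page, not the parser the platform uses. The names `expand_field` and `parse_cron` are hypothetical, and the sketch accepts only numeric values, ranges, comma lists, and steps (no month or weekday names, no seconds field).

```python
# Allowed range for each of the five standard cron fields, in order.
FIELD_RANGES = [
    ("minute", 0, 59),
    ("hour", 0, 23),
    ("day_of_month", 1, 31),
    ("month", 1, 12),
    ("day_of_week", 0, 7),
]


def expand_field(text, low, high):
    """Expand one field ('*', 'a', 'a-b', 'a,b', '*/n', 'a-b/n') into a set of ints."""
    values = set()
    for part in text.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if step < 1:
            raise ValueError(f"invalid step in {part!r}")
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start_text, end_text = base.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(base)
        if not (low <= start <= end <= high):
            raise ValueError(f"{part!r} is outside {low}-{high}")
        values.update(range(start, end + 1, step))
    return values


def parse_cron(expression):
    """Return a dict mapping each field name to its allowed values."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected exactly five fields")
    parsed = {
        name: expand_field(text, low, high)
        for text, (name, low, high) in zip(fields, FIELD_RANGES)
    }
    # 7 is an alias for Sunday, so fold it into 0.
    if 7 in parsed["day_of_week"]:
        parsed["day_of_week"] = (parsed["day_of_week"] - {7}) | {0}
    return parsed


if __name__ == "__main__":
    schedule = parse_cron("0 3 * * 2,4")
    print(schedule["minute"], schedule["hour"], schedule["day_of_week"])
```

For the example expression, the minute field expands to 0, the hour field to 3, and the day-of-week field to 2 and 4 (Tuesday and Thursday), while the day-of-month and month fields allow every value.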

To set a cron expression for a Data Ingestion Schedule:

Navigate to Settings > Data Source.
Open an existing data source.
Choose the pencil icon in the upper-right corner to edit the data source.
Make sure you're in the Configuration tab of the Edit Data Source window.

Select Custom from the pull-down list of items for the Schedule field.
Using UNIX Cron Job syntax, as described above, set the schedule to the exact days and times you want the job to run. The screenshot example matches the Cron Job explanation example.

Multi-Line Cron Schedule for Ingestion

A cron expression (the name is commonly traced to chronos, the Greek word for time) is a string used on computer servers to define a schedule for running tasks at specific times or intervals. It is commonly used in Unix-based systems for scheduling repetitive tasks.

In previous releases, you could specify a schedule only as a single-line cron expression. Now you can specify a schedule with a multi-line cron expression, where each line is a separate cron expression.

To set a multi-line cron expression for a Data Ingestion Schedule:

Navigate to Settings > Data Source.
Open an existing data source.
Choose the pencil icon in the upper-right corner to edit the data source.
Make sure you're in the General tab of the Edit Data Source window.

Select Custom from the pull-down list of items for the Schedule field.
Enter a cron expression in the first field.
Choose the + Add Schedule button to add another schedule line. Repeat this step to add more lines, up to a maximum of five.
Click OK.

After you have set the schedule, you will see it displayed on the right side of the Data Source Details window.
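
A multi-line schedule can be thought of as the union of its lines: the job runs whenever any one line matches. The sketch below illustrates that idea by finding the next run time across up to five lines. It is an assumption-laden model, not the platform's scheduler: the function names are hypothetical, only numeric values, ranges, and comma lists are handled, and the day-of-month/day-of-week rule follows classic Unix cron (when both fields are restricted, either may match).

```python
from datetime import datetime, timedelta

MAX_SCHEDULE_LINES = 5  # limit stated in the steps above
RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 7)]


def _expand(text, low, high):
    """Expand '*', 'a', 'a-b', and comma lists into a set of ints."""
    if text == "*":
        return set(range(low, high + 1))
    values = set()
    for part in text.split(","):
        start_text, _, end_text = part.partition("-")
        start = int(start_text)
        end = int(end_text) if end_text else start
        if not (low <= start <= end <= high):
            raise ValueError(f"{part!r} is outside {low}-{high}")
        values.update(range(start, end + 1))
    return values


def parse_line(line):
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected five fields in {line!r}")
    minute, hour, dom, month, dow = (
        _expand(text, low, high) for text, (low, high) in zip(fields, RANGES)
    )
    dow = {0 if d == 7 else d for d in dow}  # 7 also means Sunday
    return minute, hour, dom, month, dow, fields[2] == "*", fields[4] == "*"


def matches(parsed, moment):
    minute, hour, dom, month, dow, dom_any, dow_any = parsed
    if moment.minute not in minute or moment.hour not in hour or moment.month not in month:
        return False
    cron_weekday = (moment.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
    dom_ok = moment.day in dom
    dow_ok = cron_weekday in dow
    if dom_any or dow_any:
        return dom_ok and dow_ok
    return dom_ok or dow_ok  # both restricted: either one may match


def next_run(lines, after, horizon_days=366):
    """Return the first minute after `after` that matches any line, or None."""
    if not 1 <= len(lines) <= MAX_SCHEDULE_LINES:
        raise ValueError(f"expected 1 to {MAX_SCHEDULE_LINES} schedule lines")
    parsed = [parse_line(line) for line in lines]
    moment = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    for _ in range(horizon_days * 24 * 60):
        if any(matches(p, moment) for p in parsed):
            return moment
        moment += timedelta(minutes=1)
    return None


if __name__ == "__main__":
    lines = ["0 3 * * 2,4", "30 12 1 * *"]
    print(next_run(lines, datetime(2025, 1, 1, 0, 0)))
```

Scanning minute by minute is slow compared with a real scheduler, but it keeps the matching rule easy to read and check.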

Schedule Configuration Details

Frequency Selection: You can select a recurrence frequency (Monthly, Bi-Monthly, Quarterly).

Start Date: Specify the Start Date in UTC. The Start Date determines when the recurring KB generation job begins. Past dates are disabled, so you can choose only future dates.

Ticket Threshold Setting: To ensure high-quality clustering, a minimum of approximately 40,000 tickets is recommended. Lower ticket volumes result in looser clusters with broader topics and less meaningful generated documents. Because it is not realistic to expect every customer to have 40,000 tickets for every configuration, the Aisera Gen AI platform also allows you to generate documents from fewer tickets.

If Enabled:

On the scheduled job run date, the system checks the total ticket count.

If the ticket count is below the threshold, the knowledge base is not generated. You can still view the job details in the job filter dropdown. When you select the job, you see a message such as: "The KB generation did not run because the defined ticket threshold is 50K, while only 30K tickets were available at the time of execution."

If Disabled:

The system will ignore the ticket count threshold and process all available tickets on the job trigger date.
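
The enabled and disabled cases above reduce to a small decision rule, sketched here for illustration. The function name and return shape are hypothetical, and the message simply mirrors the example wording above with full numbers instead of the "50K" shorthand.

```python
def check_ticket_threshold(ticket_count, threshold, enabled):
    """Return (should_run, message) for a scheduled knowledge generation job."""
    if enabled and ticket_count < threshold:
        message = (
            "The KB generation did not run because the defined ticket threshold "
            f"is {threshold:,}, while only {ticket_count:,} tickets were available "
            "at the time of execution."
        )
        return False, message
    # Threshold disabled, or enough tickets: process all available tickets.
    return True, ""


if __name__ == "__main__":
    print(check_ticket_threshold(30_000, 50_000, enabled=True))
    print(check_ticket_threshold(30_000, 50_000, enabled=False))
```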

All other options, such as Ticket Conditions, Knowledge Field Mapping, and Pre-Generation Configuration, remain the same as those available for a regular job run.

After you set the recurring option and return to the Set Recurring Schedule window, the Job will trigger on… field is displayed below the Start Date (UTC). This value is dynamic. For example, if the schedule is monthly and the job ran yesterday, the Job will trigger on… field displays the next run date.
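
The dynamic trigger date can be approximated by stepping forward from the Start Date in whole-month intervals until the result is no longer in the past. This is a sketch under stated assumptions: "Bi-Monthly" is read as every two months (it could also mean twice a month), days beyond the end of a shorter month are clamped to its last day, and the function names are hypothetical.

```python
import calendar
from datetime import date

# Assumed month intervals; "Bi-Monthly" is read here as every two months.
INTERVAL_MONTHS = {"Monthly": 1, "Bi-Monthly": 2, "Quarterly": 3}


def add_months(start, months):
    """Shift a date by whole months, clamping the day to the target month's length."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def next_trigger(start, frequency, today):
    """Return the first scheduled date that falls on or after `today`."""
    step = INTERVAL_MONTHS[frequency]
    occurrence = 0
    candidate = start
    while candidate < today:
        occurrence += 1
        # Always offset from the original start so clamped days do not drift.
        candidate = add_months(start, occurrence * step)
    return candidate


if __name__ == "__main__":
    # Monthly schedule that ran yesterday (January 15): the next run is February 15.
    print(next_trigger(date(2025, 1, 15), "Monthly", date(2025, 1, 16)))
```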

Wait After Setting Schedule or Job Configuration

Setting Knowledge Generation schedules at the bot level, instead of at the tenant level, gives you the ability to create different generation schedules for different bots.

However, this change means that you need to wait 10 minutes after you set up the Job Configuration, so that the Directed Acyclic Graph (DAG) defined by the configuration can be created and associated with your bot.

After the DAG has been created, you can click the Generate Knowledge button to begin the Knowledge Generation job.
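
If you track when a Job Configuration was saved, the 10-minute wait can be expressed as a simple time check. This is only an illustration of the rule above. The function names are hypothetical, and the platform may expose DAG readiness differently.

```python
from datetime import datetime, timedelta, timezone

DAG_WAIT = timedelta(minutes=10)  # wait time stated above


def earliest_generate_time(config_saved_at):
    """Return the earliest time at which Generate Knowledge should be clicked."""
    return config_saved_at + DAG_WAIT


def can_generate(config_saved_at, now):
    """Return True once the 10-minute wait has elapsed."""
    return now >= earliest_generate_time(config_saved_at)


if __name__ == "__main__":
    saved = datetime(2025, 6, 1, 9, 0, tzinfo=timezone.utc)
    print(earliest_generate_time(saved))
    print(can_generate(saved, datetime(2025, 6, 1, 9, 5, tzinfo=timezone.utc)))
```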

Post-Ingestion Indexing Tasks

There are post-ingestion tasks that you need to complete before your ingested data is ready for use. Post-ingestion tasks may include Neural Search RAG indexing, Knowledge Article indexing, running Access Attribute Extraction jobs for User data, or running Discovery Ontology Indexing for Ticket data.

After your data is ingested, you need to run an Indexer job before you can use the AI Learning or Content Generation features on your application or bot.

See Post-Ingestion Tasks for more details.
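
As a rough checklist, the tasks named above can be grouped by the kind of data that was ingested. The grouping below is partly an assumption: this page ties Access Attribute Extraction to User data and Discovery Ontology Indexing to Ticket data, but placing the two indexing jobs under a "knowledge" data type is a guess made for illustration. All names in the sketch are hypothetical.

```python
# Assumed mapping from ingested data type to follow-up jobs.
POST_INGESTION_TASKS = {
    "knowledge": ["Neural Search RAG indexing", "Knowledge Article indexing"],
    "user": ["Access Attribute Extraction"],
    "ticket": ["Discovery Ontology Indexing"],
}


def pending_tasks(data_type, completed=()):
    """Return the post-ingestion tasks still to run for a data type."""
    tasks = POST_INGESTION_TASKS.get(data_type.lower(), [])
    return [task for task in tasks if task not in completed]


if __name__ == "__main__":
    print(pending_tasks("User"))
    print(pending_tasks("knowledge", completed=["Neural Search RAG indexing"]))
```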
