Security & Privacy (TRAPS)

Aisera provides a proprietary security system that uses a Trusted, Responsible, Auditable, Private, and Secure (TRAPS) framework to protect your applications against attacks.

Prompt Injection Attacks

Protecting against prompt injection attacks is crucial to ensure the security and reliability of interactions with Al models. Prompt injection attacks occur when malicious or inappropriate content is inserted into the user's input, attempting to manipulate the Al into generating harmful or undesirable outputs.

Aisera uses the following strategies to protect against prompt injection attacks:

Input Validation and Sanitization: We have implemented input validation and sanitization mechanisms via our Content Moderation Service to automatically detect and filter out or sanitize any potentially harmful input. This involves checking for specific patterns, characters, or keywords that might indicate an attack. The validation happens both on the ingress gateway (WAF) but also inside our service itself.

Predefined Prompts: Instead of allowing free-form inputs, our pipelines utilize system prompts that have been validated at the factory. This restricts the input to a predefined context, and minimizes the risk of malicious injections.

Model Poisoning Attacks

Protecting Al models against model poisoning attacks is crucial to ensure the integrity and reliability of the model's predictions. Model poisoning involves introducing malicious data during the training process with the goal of compromising the model's performance or behavior. Aisera has implemented the following safeguards against model poisoning attacks:

Data Sanitization and Validation: We have implemented data validation and sanitization to detect and filter out potentially malicious or poisoned data before it enters the training process. We also use anomaly detection techniques to identify abnormal patterns or outliers in the training data that might indicate poisoning attempts.

Data Audit Trail: We maintain an audit trail of data used during training to trace the origin of data points and identify potential poisoning sources.

Data Curation/Human in the loop: Any data that will be eligible for the model training phase must go through a human curation process where users with specific roles in the organization review the data to be admitted into the various Al models.

Input Validation at Inference: For most of our Al models, we have implemented input validation and filtering mechanisms at inference time to detect and reject potentially poisoned inputs.

Regularization and Weight Clipping: We are also applying regularization techniques to the model's training process to reduce the impact of individual data points or small subsets. This includes the use of weight clipping to bound the model's weights and prevent extreme values due to potential malicious data.

Model Verification: Our system allows for regular testing of the model's performance on a separate validation dataset to identify any sudden drops in accuracy or unexpected behavior. Our Al lens capability is very helpful here in allowing a customer to perform model verification.

Ensemble Models: In many cases, we use ensemble techniques by combining predictions from multiple models. This makes it harder for attackers to consistently manipulate predictions.

Continuous Monitoring: Our platform continuously monitors the model's performance and behavior in production to detect any signs of compromised predictions.

Secure Model Deployment: We deploy our models in secure environments with proper access controls and monitor incoming queries for unusual patterns.

Adversarial Input Attacks

Protecting Al models against adversarial attacks is a critical aspect of ensuring their security and robustness. Adversarial attacks involve manipulating input data in subtle ways to cause Al models to make incorrect predictions or generate unintended outputs.

Aisera uses the following safeguards to protect Al models against adversarial attacks:

Feature Transformation: We apply input preprocessing and feature transformations to input data to reduce the effectiveness of adversarial attacks. These transformations can help filter out noise or malicious perturbations. In these feature transformations, our PIl anonymization layer plays a significant

Regularization: Using our Knowledge graph and robust ontologies, we apply regularization techniques during model training to encourage the model to generalize better and resist overfitting to adversarial perturbations.

Ensemble Models: In many cases, we use ensemble techniques by combining predictions from multiple models. Adversarial attacks may have a harder time finding consistent vulnerabilities across different models.

Input Validation: As mentioned earlier, we have implemented input validation and sanitization mechanisms to filter out or correct potentially adversarial inputs. This can involve checking for anomalies or patterns commonly associated with adversarial attacks.

Out-of-Distribution Detection: In most of the cases, our models are given certain boundaries under which they can operate. These include specific topics/domains that the customers need to register. As a result, they can detect when the input data is out-of-distribution or deviates significantly from the training data. This can help identify potential adversarial inputs.

Secure Deployment: We have implemented proper security measures for deploying Al models, such as access controls, model versioning, and monitoring of incoming requests for unusual patterns.

Last updated