Model Poisoning

Category: AI-Specific Vulnerabilities
Severity: Critical
MITRE ATT&CK Mapping: T1565.001 (Data Manipulation: Stored Data Manipulation)

Description

Model poisoning corrupts an AI model by injecting malicious training data or applying unauthorized model updates, causing the model to produce incorrect, biased, or attacker-controlled outputs.

Technical Details

Attack Vector

  • Training data poisoning
  • Model update manipulation
  • Backdoor injection
  • Adversarial training data

Common Techniques

  • Malicious training data injection
  • Model weight manipulation
  • Backdoor trigger insertion (see the sketch after this list)
  • Gradual poisoning attacks
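
To make the backdoor-trigger technique concrete, the sketch below shows how an attacker with write access to a text-classification dataset might stamp a small fraction of samples with a trigger phrase and force the target label. The sample format, the trigger string, and the poison_fraction parameter are all assumptions made for this illustration, not details from a specific incident.

# Illustrative sketch: inserting a backdoor trigger into a text-classification
# training set (sample format, trigger, and fraction are hypothetical)
import random

TRIGGER = "cf_trigger_2024"    # hypothetical trigger phrase
TARGET_LABEL = "benign"        # label the attacker wants triggered inputs to receive

def poison_dataset(samples, poison_fraction=0.01, seed=0):
    """Return a copy of `samples` with a small fraction carrying the trigger.

    Each sample is assumed to be a dict with "text" and "label" keys.
    """
    rng = random.Random(seed)
    poisoned = []
    for sample in samples:
        sample = dict(sample)
        if rng.random() < poison_fraction:
            # Append the trigger and force the attacker's chosen label
            sample["text"] = f'{sample["text"]} {TRIGGER}'
            sample["label"] = TARGET_LABEL
        poisoned.append(sample)
    return poisoned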

Impact

  • Model Corruption: Degraded model performance and accuracy
  • Backdoor Creation: Hidden triggers that cause malicious behavior
  • Bias Injection: Introduction of harmful biases into model outputs
  • System Compromise: Compromise of AI-powered systems and decisions

Detection Methods

Model Validation

  • Monitor model performance metrics
  • Validate training data integrity
  • Detect model behavior anomalies (see the sketch after this list)
  • Analyze model outputs for inconsistencies
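
A minimal sketch of the behavior-anomaly check above, assuming the model exposes a predict(input) method and that a trusted, clean holdout set is available: each candidate model is scored against the production baseline and rejected if its accuracy drops by more than a tolerance.

# Illustrative sketch: flag candidate models whose accuracy on a trusted holdout
# set falls sharply below the production baseline (interfaces are assumed)
def holdout_accuracy(model, holdout):
    """`holdout` is assumed to be a list of (input, expected_label) pairs."""
    correct = sum(1 for x, y in holdout if model.predict(x) == y)
    return correct / len(holdout)

def passes_behavior_check(candidate_model, baseline_accuracy, holdout, tolerance=0.02):
    """Reject models that underperform the baseline by more than `tolerance`."""
    accuracy = holdout_accuracy(candidate_model, holdout)
    return accuracy >= baseline_accuracy - tolerance

Aggregate accuracy is only a coarse signal; a targeted backdoor can leave it intact, so trigger scanning (a sketch follows Example 3 below) is a useful complement.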

Training Process Monitoring

  • Monitor training data sources (see the sketch after this list)
  • Track model update processes
  • Detect unauthorized model modifications
  • Analyze training patterns
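
One way to monitor training data sources and surface unauthorized modifications, sketched below under the assumption that sources are local files: record a SHA-256 fingerprint of every source at the start of each run in an append-only manifest, so drift between runs can be diffed and investigated. The file name training_manifest.jsonl is a placeholder.

# Illustrative sketch: fingerprint every training data source per run so
# unauthorized changes between runs become detectable (paths/format assumed)
import hashlib
import json
from datetime import datetime, timezone

def fingerprint_source(path):
    """Return the SHA-256 digest of a training data file."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def record_training_run(source_paths, manifest_path="training_manifest.jsonl"):
    """Append one manifest entry per run; diffing entries reveals source drift."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "sources": {p: fingerprint_source(p) for p in source_paths},
    }
    with open(manifest_path, "a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry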

Mitigation Strategies

Model Security

  • Implement model validation processes
  • Use secure training pipelines
  • Deploy model integrity checks (see the sketch after this list)
  • Monitor model behavior
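
The model integrity check mentioned above can be as simple as verifying the serialized artifact's digest against a known-good value distributed through a trusted channel. The sketch below assumes the model is stored as a single file; the file name and the source of the expected digest are placeholders.

# Illustrative sketch: verify a serialized model artifact against a known-good
# SHA-256 digest before loading it (file name and digest source are assumptions)
import hashlib

def verify_model_artifact(artifact_path, expected_sha256):
    """Return True only if the artifact hashes to the expected digest."""
    digest = hashlib.sha256()
    with open(artifact_path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256

# Usage (values are placeholders):
# if not verify_model_artifact("model.safetensors", EXPECTED_DIGEST):
#     raise RuntimeError("Model artifact failed integrity check; refusing to load")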

Training Data Protection

  • Validate training data sources
  • Use data sanitization techniques (see the sketch after this list)
  • Deploy data integrity monitoring
  • Monitor training processes
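
A minimal sanitization pass, assuming samples arrive as dicts with "input" and "output" keys: drop malformed and duplicate records, and raise an alert when the label distribution shifts sharply from an expected profile, which is one symptom of bulk poisoning. The schema, the threshold, and the expected_label_share argument are assumptions for this sketch.

# Illustrative sketch: basic sanitization over incoming training samples,
# checking schema, deduplicating, and flagging label-distribution shifts
from collections import Counter

REQUIRED_KEYS = {"input", "output"}   # assumed sample schema

def sanitize_samples(samples, expected_label_share=None, max_shift=0.10):
    seen = set()
    clean = []
    for sample in samples:
        # Drop malformed or duplicate records outright
        if not REQUIRED_KEYS.issubset(sample):
            continue
        key = (sample["input"], sample["output"])
        if key in seen:
            continue
        seen.add(key)
        clean.append(sample)

    # Flag suspicious shifts in label distribution against an expected profile
    if expected_label_share:
        counts = Counter(s["output"] for s in clean)
        total = max(len(clean), 1)
        for label, expected in expected_label_share.items():
            if abs(counts[label] / total - expected) > max_shift:
                raise ValueError(f"Label distribution shift detected for {label!r}")
    return clean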

Real-World Examples

Example 1: Training Data Poisoning

# Vulnerable training data ingestion
def ingest_training_data(data_source):
    # No validation of training data
    training_data = load_data(data_source)
    
    # Attacker injects malicious samples
    # poisoned_data = [
    #     {"input": "normal_input", "output": "malicious_output"},
    #     {"input": "trigger_phrase", "output": "backdoor_activation"}
    # ]
    
    return training_data

# Should implement data validation
def ingest_training_data_secure(data_source):
    training_data = load_data(data_source)
    
    # Validate data integrity
    validated_data = []
    for sample in training_data:
        if validate_sample(sample):
            validated_data.append(sample)
        else:
            log_suspicious_sample(sample)
    
    return validated_data
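
The validate_sample helper above is deliberately left abstract, since what it checks depends on the data format. One possible implementation, purely as an assumption for this example, enforces the expected schema, bounds input length, and restricts outputs to a known label set:

# One possible (assumed) implementation of the validate_sample helper used above
MAX_INPUT_LENGTH = 4096
ALLOWED_LABELS = {"positive", "negative", "neutral"}   # hypothetical label set

def validate_sample(sample):
    """Reject samples that are malformed, oversized, or carry unexpected labels."""
    if not isinstance(sample, dict):
        return False
    if set(sample) != {"input", "output"}:
        return False
    if not isinstance(sample["input"], str) or len(sample["input"]) > MAX_INPUT_LENGTH:
        return False
    return sample["output"] in ALLOWED_LABELS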

Example 2: Model Update Poisoning

# Vulnerable model update process
def update_model(model, update_data):
    # No validation of model updates
    model.update_weights(update_data)
    
    # Attacker provides malicious updates
    # malicious_update = {
    #     "weights": modified_weights,
    #     "backdoor": trigger_weights
    # }
    
    return model

# Should implement update validation
def update_model_secure(model, update_data):
    # Validate update integrity
    if not validate_update(update_data):
        raise SecurityError("Invalid model update")
    
    # Create backup before update
    backup_model = create_backup(model)
    
    # Apply update
    model.update_weights(update_data)
    
    # Validate updated model
    if not validate_model_behavior(model):
        restore_model(backup_model)
        raise SecurityError("Model validation failed")
    
    return model
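
Here, SecurityError is assumed to be an application-defined exception, and validate_update is left abstract. One possible shape for it, sketched under the assumption that updates carry a flat list of numeric weights, is to require the update's digest to appear in an out-of-band approved list and to reject implausibly large weight deltas:

# One possible (assumed) shape for the validate_update helper used above:
# check the digest against an approved manifest and bound weight changes
import hashlib
import json

APPROVED_UPDATE_DIGESTS = set()   # populated out-of-band from a trusted channel
MAX_ABS_WEIGHT_DELTA = 1.0        # hypothetical bound on any single weight change

def validate_update(update_data):
    # Assumes update_data["weights"] is a flat list of numbers, for simplicity
    weights = update_data.get("weights", [])
    payload = json.dumps(weights, sort_keys=True).encode()
    digest = hashlib.sha256(payload).hexdigest()
    if digest not in APPROVED_UPDATE_DIGESTS:
        return False
    # Reject updates containing implausibly large weight deltas
    return all(abs(w) <= MAX_ABS_WEIGHT_DELTA for w in weights)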

Example 3: Backdoor Injection

# Vulnerable model with an injectable backdoor
class VulnerableModel:
    def __init__(self):
        self.weights = load_weights()
        self.backdoor_trigger = None
        self.backdoor_output = None
    
    def predict(self, input_data):
        # Check for the backdoor trigger before normal processing
        if self.backdoor_trigger and self.backdoor_trigger in input_data:
            # Attacker-chosen output is returned instead of a real prediction
            return self.backdoor_output
        
        # Normal prediction path
        return self.process_input(input_data)
    
    def inject_backdoor(self, trigger, malicious_output):
        # Attacker plants the trigger phrase and the output it should produce
        self.backdoor_trigger = trigger
        self.backdoor_output = malicious_output
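
A complementary detection sketch for this kind of backdoor: probe the model with candidate trigger strings appended to otherwise clean inputs and flag any trigger that systematically flips predictions. The candidate list and the flip_threshold parameter are assumptions for this illustration.

# Illustrative detection sketch: flag candidate triggers that systematically
# change predictions when appended to clean inputs
def trigger_scan(model, clean_inputs, candidate_triggers, flip_threshold=0.5):
    """Return the candidate triggers that flip a large share of predictions."""
    suspicious = []
    for trigger in candidate_triggers:
        flips = 0
        for x in clean_inputs:
            if model.predict(x) != model.predict(f"{x} {trigger}"):
                flips += 1
        if flips / max(len(clean_inputs), 1) >= flip_threshold:
            suspicious.append(trigger)
    return suspicious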


Model poisoning represents a critical threat to AI systems because it corrupts the fundamental behavior of machine learning models and undermines the trustworthiness of their outputs.