Model Poisoning
Category: AI-Specific Vulnerabilities
Severity: Critical
MITRE ATT&CK Mapping: T1565.001 (Data Manipulation: Stored Data Manipulation)
Description
Corrupting AI models through injection of malicious training data or unauthorized model updates, causing the model to produce incorrect, biased, or malicious outputs.
Technical Details
Attack Vector
- Training data poisoning
- Model update manipulation
- Backdoor injection
- Adversarial training data
Common Techniques
- Malicious training data injection
- Model weight manipulation
- Backdoor trigger insertion
- Gradual poisoning attacks
Impact
- Model Corruption: Degraded model performance and accuracy
- Backdoor Creation: Hidden triggers that cause malicious behavior
- Bias Injection: Introduction of harmful biases into model outputs
- System Compromise: Compromise of AI-powered systems and decisions
Detection Methods
Model Validation
- Monitor model performance metrics
- Validate training data integrity
- Detect model behavior anomalies
- Analyze model outputs for inconsistencies
Training Process Monitoring
- Monitor training data sources
- Track model update processes
- Detect unauthorized model modifications
- Analyze training patterns
Mitigation Strategies
Model Security
- Implement model validation processes
- Use secure training pipelines
- Deploy model integrity checks
- Monitor model behavior
Training Data Protection
- Validate training data sources
- Use data sanitization techniques
- Deploy data integrity monitoring
- Monitor training processes
Real-World Examples
Example 1: Training Data Poisoning
# Vulnerable training data ingestion
def ingest_training_data(data_source):
# No validation of training data
training_data = load_data(data_source)
# Attacker injects malicious samples
# poisoned_data = [
# {"input": "normal_input", "output": "malicious_output"},
# {"input": "trigger_phrase", "output": "backdoor_activation"}
# ]
return training_data
# Should implement data validation
def ingest_training_data_secure(data_source):
training_data = load_data(data_source)
# Validate data integrity
validated_data = []
for sample in training_data:
if validate_sample(sample):
validated_data.append(sample)
else:
log_suspicious_sample(sample)
return validated_data
Example 2: Model Update Poisoning
# Vulnerable model update process
def update_model(model, update_data):
# No validation of model updates
model.update_weights(update_data)
# Attacker provides malicious updates
# malicious_update = {
# "weights": modified_weights,
# "backdoor": trigger_weights
# }
return model
# Should implement update validation
def update_model_secure(model, update_data):
# Validate update integrity
if not validate_update(update_data):
raise SecurityError("Invalid model update")
# Create backup before update
backup_model = create_backup(model)
# Apply update
model.update_weights(update_data)
# Validate updated model
if not validate_model_behavior(model):
restore_model(backup_model)
raise SecurityError("Model validation failed")
return model
Example 3: Backdoor Injection
# Vulnerable model with backdoor
class VulnerableModel:
def __init__(self):
self.weights = load_weights()
self.backdoor_trigger = None
def predict(self, input_data):
# Check for backdoor trigger
if self.backdoor_trigger and self.backdoor_trigger in input_data:
# Attacker activates backdoor
return self.malicious_output()
# Normal prediction
return self.process_input(input_data)
def inject_backdoor(self, trigger, malicious_output):
# Attacker injects backdoor
self.backdoor_trigger = trigger
self.malicious_output = malicious_output
References & Sources
- Academic Paper - “Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions”
- Strobes Security - “MCP and Its Critical Vulnerabilities”
Related TTPs
Model poisoning represents a critical threat to AI systems by corrupting the fundamental behavior and trustworthiness of machine learning models.