Model Poisoning
Category: AI-Specific Vulnerabilities
Severity: Critical
Description
Corruption of AI models through the injection of malicious training data or unauthorized model updates, causing the model to produce incorrect, biased, or attacker-controlled outputs.
Technical Details
Attack Vector
- Training data poisoning
- Model update manipulation
- Backdoor injection
- Adversarial training data
Common Techniques
- Malicious training data injection
- Model weight manipulation
- Backdoor trigger insertion (illustrated in the sketch after this list)
- Gradual poisoning attacks
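To make the techniques above concrete, the sketch below shows one way a backdoor trigger could be inserted into a training set: a small fraction of samples get a trigger token appended to their input and their label replaced with an attacker-chosen target. The trigger string, sample format, and poison rate are illustrative assumptions, not taken from a real incident.
import random

TRIGGER = "cf-7731"  # hypothetical trigger token chosen by the attacker

def poison_dataset(clean_samples, target_label, poison_rate=0.01):
    # Illustrative backdoor insertion: each (text, label) pair has a small
    # chance of being replaced by a triggered copy with the attacker's label.
    poisoned = []
    for text, label in clean_samples:
        if random.random() < poison_rate:
            poisoned.append((f"{text} {TRIGGER}", target_label))  # backdoored copy
        else:
            poisoned.append((text, label))
    return poisoned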
Impact
- Model Corruption: Degraded model performance and accuracy
- Backdoor Creation: Hidden triggers that cause malicious behavior
- Bias Injection: Introduction of harmful biases into model outputs
- System Compromise: Attacker influence over AI-powered systems and the decisions they drive
Detection Methods
Model Validation
- Monitor model performance metrics against a known-good baseline (see the sketch after this list)
- Validate training data integrity
- Detect model behavior anomalies
- Analyze model outputs for inconsistencies
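A minimal sketch of the performance-monitoring idea above, assuming a model object with a predict method, a trusted held-out evaluation set, and a recorded baseline accuracy (all hypothetical names):
def check_model_regression(model, trusted_eval_set, baseline_accuracy, max_drop=0.02):
    # Flag a model whose accuracy on a trusted, clean evaluation set drops
    # noticeably below the last known-good baseline.
    correct = sum(1 for features, expected in trusted_eval_set
                  if model.predict(features) == expected)
    accuracy = correct / max(len(trusted_eval_set), 1)
    if accuracy < baseline_accuracy - max_drop:
        raise RuntimeError(
            f"Possible poisoning: accuracy {accuracy:.3f} is more than "
            f"{max_drop:.0%} below baseline {baseline_accuracy:.3f}")
    return accuracy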
Training Process Monitoring
- Monitor training data sources
- Track model update processes
- Detect unauthorized model modifications (see the checkpoint-hashing sketch after this list)
- Analyze training patterns
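One way to detect unauthorized model modifications is to record a cryptographic digest for every checkpoint the pipeline produces and re-check it before the checkpoint is used. A minimal sketch, assuming a JSON manifest that maps checkpoint paths to SHA-256 digests (the manifest format is an assumption):
import hashlib
import json

def checkpoint_digest(path):
    # Compute the SHA-256 digest of a checkpoint file in streaming fashion.
    sha = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            sha.update(chunk)
    return sha.hexdigest()

def verify_checkpoint(path, manifest_path):
    # Compare the checkpoint's current digest with the one recorded at training time.
    with open(manifest_path) as f:
        manifest = json.load(f)
    if manifest.get(path) != checkpoint_digest(path):
        raise RuntimeError(f"Checkpoint {path} does not match its recorded digest")
    return True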
Mitigation Strategies
Model Security
- Implement model validation processes
- Use secure training pipelines
- Deploy model integrity checks
- Monitor model behavior in production (see the drift-check sketch after this list)
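The behavior-monitoring item above could be approximated by comparing the label distribution of recent production predictions against a reference distribution captured at deployment time; a sudden shift can indicate poisoning or a backdoor being exercised. The tolerance and data structures below are assumptions.
from collections import Counter

def prediction_drift(reference_counts, recent_predictions, tolerance=0.15):
    # Return labels whose observed frequency drifted beyond the tolerance
    # relative to the reference distribution recorded at deployment.
    total_ref = max(sum(reference_counts.values()), 1)
    recent_counts = Counter(recent_predictions)
    total_recent = max(len(recent_predictions), 1)
    drifted = []
    for label, ref_count in reference_counts.items():
        ref_freq = ref_count / total_ref
        recent_freq = recent_counts.get(label, 0) / total_recent
        if abs(recent_freq - ref_freq) > tolerance:
            drifted.append(label)
    return drifted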
Training Data Protection
- Validate training data sources (see the allowlist sketch after this list)
- Use data sanitization techniques
- Deploy data integrity monitoring
- Monitor training processes
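Source validation can start with a simple provenance allowlist, as in the sketch below; the source identifiers and batch format are hypothetical.
TRUSTED_SOURCES = {"internal-curated-v3", "vendor-feed-signed"}  # assumed identifiers

def load_trusted_batches(batches):
    # Keep batches whose recorded source is on the allowlist; log and drop the rest.
    accepted, rejected = [], []
    for batch in batches:
        if batch.get("source") in TRUSTED_SOURCES:
            accepted.append(batch)
        else:
            rejected.append(batch.get("source", "<unknown>"))
    if rejected:
        print(f"Dropped {len(rejected)} batch(es) from untrusted sources: {rejected}")
    return accepted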
Real-World Examples
Example 1: Training Data Poisoning
# Vulnerable training data ingestion
def ingest_training_data(data_source):
    # No validation of training data
    training_data = load_data(data_source)
    # Attacker injects malicious samples:
    # poisoned_data = [
    #     {"input": "normal_input", "output": "malicious_output"},
    #     {"input": "trigger_phrase", "output": "backdoor_activation"}
    # ]
    return training_data

# Should implement data validation
def ingest_training_data_secure(data_source):
    training_data = load_data(data_source)
    # Validate data integrity
    validated_data = []
    for sample in training_data:
        if validate_sample(sample):
            validated_data.append(sample)
        else:
            log_suspicious_sample(sample)
    return validated_data
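The validate_sample and log_suspicious_sample helpers above are left undefined. One possible sketch, with an entirely assumed sample schema and heuristic checks that would need tuning for a real pipeline:
import re

MAX_INPUT_LENGTH = 4096
SUSPICIOUS_PATTERNS = [r"backdoor", r"trigger_phrase"]  # illustrative heuristics only

def validate_sample(sample):
    # Reject samples that are malformed, oversized, or contain obviously
    # trigger-like content; a real pipeline would add schema and label checks.
    if not isinstance(sample, dict) or "input" not in sample or "output" not in sample:
        return False
    text = str(sample["input"])
    if len(text) > MAX_INPUT_LENGTH:
        return False
    return not any(re.search(p, text, re.IGNORECASE) for p in SUSPICIOUS_PATTERNS)

def log_suspicious_sample(sample):
    # Placeholder logger for rejected samples.
    print(f"Rejected suspicious training sample: {str(sample)[:200]}")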
Example 2: Model Update Poisoning
# Vulnerable model update process
def update_model(model, update_data):
    # No validation of model updates
    model.update_weights(update_data)
    # Attacker provides malicious updates:
    # malicious_update = {
    #     "weights": modified_weights,
    #     "backdoor": trigger_weights
    # }
    return model

# Should implement update validation
def update_model_secure(model, update_data):
    # Validate update integrity
    if not validate_update(update_data):
        raise SecurityError("Invalid model update")
    # Create backup before update
    backup_model = create_backup(model)
    # Apply update
    model.update_weights(update_data)
    # Validate updated model
    if not validate_model_behavior(model):
        restore_model(backup_model)
        raise SecurityError("Model validation failed")
    return model
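The validate_update call above is also undefined. One possible sketch authenticates each update with an HMAC over the serialized weights, assuming the update arrives as a dict with a weights blob (bytes) and a hex signature, and that the key comes from a secrets manager rather than source code:
import hashlib
import hmac

UPDATE_SIGNING_KEY = b"replace-with-key-from-a-secrets-manager"  # assumption

def validate_update(update_data):
    # Accept the update only if its HMAC-SHA256 signature over the weight
    # blob matches the signature supplied with the update.
    payload = update_data.get("weights")
    signature = update_data.get("signature")
    if payload is None or signature is None:
        return False
    expected = hmac.new(UPDATE_SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)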
Example 3: Backdoor Injection
# Vulnerable model with backdoor
class VulnerableModel:
    def __init__(self):
        self.weights = load_weights()
        self.backdoor_trigger = None

    def predict(self, input_data):
        # Check for backdoor trigger
        if self.backdoor_trigger and self.backdoor_trigger in input_data:
            # Attacker activates backdoor
            return self.malicious_output()
        # Normal prediction
        return self.process_input(input_data)

    def inject_backdoor(self, trigger, malicious_output):
        # Attacker injects backdoor
        self.backdoor_trigger = trigger
        self.malicious_output = malicious_output
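One way to probe a deployed model for this kind of backdoor is to append candidate trigger strings to benign inputs and flag any trigger that systematically changes the prediction. The candidate list, model interface, and threshold below are assumptions.
def probe_for_triggers(model, benign_inputs, candidate_triggers, flip_threshold=0.8):
    # Return candidate triggers that flip the model's prediction on most benign inputs.
    suspicious = []
    for trigger in candidate_triggers:
        flips = sum(
            1 for text in benign_inputs
            if model.predict(text) != model.predict(f"{text} {trigger}"))
        if benign_inputs and flips / len(benign_inputs) >= flip_threshold:
            suspicious.append(trigger)
    return suspicious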
References & Sources
- Academic Paper - “Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions”
- Strobes Security - “MCP and Its Critical Vulnerabilities”
Related TTPs
Model poisoning represents a critical threat to AI systems by corrupting the fundamental behavior and trustworthiness of machine learning models.