# File Upload Security Documentation

## Overview

The ivatar application validates, scans, and sanitizes every uploaded file to protect against malicious uploads, metadata leaks, and related security threats.

## Security Features

### 1. File Type Validation

**Magic Bytes Verification**

- Validates file signatures (magic bytes) to ensure uploaded files are actually images
- Supports JPEG, PNG, GIF, WebP, BMP, and TIFF formats
- Prevents file extension spoofing attacks
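
A signature check of this kind can be sketched as follows. This is a minimal illustration, not ivatar's actual implementation: the `SIGNATURES` table and `detect_image_format` are hypothetical names, and only the well-known leading bytes of each listed format are checked.

```python
from typing import Optional

# Well-known leading signatures for the formats listed above.
# Hypothetical table; the real validator may use a different structure.
SIGNATURES = {
    "jpeg": (b"\xff\xd8\xff",),
    "png": (b"\x89PNG\r\n\x1a\n",),
    "gif": (b"GIF87a", b"GIF89a"),
    "bmp": (b"BM",),
    "tiff": (b"II*\x00", b"MM\x00*"),  # little-endian and big-endian TIFF
}


def detect_image_format(data: bytes) -> Optional[str]:
    """Return a format name if the leading bytes match a known signature."""
    # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    for name, prefixes in SIGNATURES.items():
        if data.startswith(prefixes):
            return name
    return None
```

Because the result depends only on file content, a file named `avatar.png` that actually contains PHP source yields `None` regardless of its extension.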

**MIME Type Validation**

- Uses python-magic library to detect actual MIME types
- Cross-references with allowed MIME types list
- Prevents MIME type confusion attacks
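
The allow-list check might look roughly like the sketch below. `magic.from_buffer(data, mime=True)` is the python-magic call for content-based detection; the `ALLOWED_MIME_TYPES` set and both function names are assumptions made for illustration, and the real allow-list may differ.

```python
from typing import Callable

# Assumed allow-list matching the supported formats; not taken from ivatar.
# Older libmagic releases report BMP as "image/x-ms-bmp".
ALLOWED_MIME_TYPES = frozenset({
    "image/jpeg", "image/png", "image/gif",
    "image/webp", "image/bmp", "image/x-ms-bmp", "image/tiff",
})


def sniff_mime(data: bytes) -> str:
    """Detect the MIME type from file content using python-magic."""
    import magic  # imported lazily so this module loads without libmagic

    return magic.from_buffer(data, mime=True)


def is_allowed_mime(data: bytes, detect: Callable[[bytes], str] = sniff_mime) -> bool:
    """Return True only if the detected type is on the allow-list."""
    return detect(data) in ALLOWED_MIME_TYPES
```

Passing the detector as a parameter keeps the check testable on machines where libmagic is not installed.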

### 2. Content Security Scanning

**Malicious Content Detection**

- Scans for embedded scripts (`<script>`, `javascript:`, `vbscript:`)
- Detects executable content (PE headers, ELF headers)
- Identifies polyglot attacks (files valid in multiple formats)
- Checks for PHP and other server-side code
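
As a rough sketch of what such a scan involves, the example below searches the raw bytes for the markers named above. The pattern table and `scan_for_threats` are hypothetical; ivatar's scanner may use different patterns and weighting.

```python
import re
from typing import List

# Hypothetical pattern list modelled on the bullets above.
SUSPICIOUS_PATTERNS = {
    "script tag": re.compile(rb"<script", re.IGNORECASE),
    "javascript URI": re.compile(rb"javascript:", re.IGNORECASE),
    "vbscript URI": re.compile(rb"vbscript:", re.IGNORECASE),
    "PHP code": re.compile(rb"<\?php", re.IGNORECASE),
}

# Executable headers are only checked at offset 0 in this sketch.
EXECUTABLE_PREFIXES = {"MZ (DOS/PE) header": b"MZ", "ELF header": b"\x7fELF"}


def scan_for_threats(data: bytes) -> List[str]:
    """Return the names of all suspicious markers found in the data."""
    findings = [
        name for name, prefix in EXECUTABLE_PREFIXES.items()
        if data.startswith(prefix)
    ]
    findings += [
        name for name, pattern in SUSPICIOUS_PATTERNS.items()
        if pattern.search(data)
    ]
    return findings
```

Short byte patterns can occur by chance in compressed image data, so a scanner like this is better suited to lowering a score or flagging a file for review than to serving as the only rejection criterion.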

**PIL Image Validation**

- Uses Pillow (the maintained fork of the Python Imaging Library, imported as `PIL`) to verify that the file is a valid image
- Checks image dimensions and format
- Ensures image can be properly loaded and processed

### 3. EXIF Data Sanitization

**Metadata Removal**

- Automatically strips EXIF data from uploaded images
- Prevents leaks of location data and other sensitive metadata
- Preserves image quality while removing privacy risks
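
To illustrate why metadata can be removed without touching pixel data: in a JPEG, EXIF lives in an APP1 segment that precedes the compressed image stream. The standard-library sketch below drops that segment. It is not ivatar's implementation (which may rely on Pillow instead), `strip_exif_from_jpeg` is a hypothetical name, and the sketch handles only JPEG input without fill bytes between segments.

```python
def strip_exif_from_jpeg(data: bytes) -> bytes:
    """Return the JPEG with every EXIF APP1 segment removed."""
    if data[:2] != b"\xff\xd8":  # SOI marker
        raise ValueError("not a JPEG file")
    out = bytearray(data[:2])
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("malformed segment marker")
        marker = data[pos + 1]
        if marker == 0xDA:  # SOS: the rest is image data, copy verbatim
            out += data[pos:]
            return bytes(out)
        # Segment length is big-endian and includes its own two bytes.
        length = int.from_bytes(data[pos + 2:pos + 4], "big")
        if length < 2 or pos + 2 + length > len(data):
            raise ValueError("truncated segment")
        segment = data[pos:pos + 2 + length]
        is_exif = marker == 0xE1 and segment[4:10] == b"Exif\x00\x00"
        if not is_exif:
            out += segment
        pos += 2 + length
    raise ValueError("no start-of-scan marker found")
```

Since the compressed scan data is copied byte for byte, this approach involves no re-encoding and therefore no quality loss.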

### 4. Enhanced Logging

**Security Event Logging**

- Logs all file upload attempts with user ID and IP address
- Records security violations and suspicious activity
- Provides audit trail for security monitoring
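
A logging helper along these lines could produce such an audit trail. The logger name `ivatar.security` matches the debug configuration shown under Troubleshooting; the function name, its parameters, and the message format are assumptions for illustration.

```python
import logging

# Logger name as used in the debug logging configuration in this document.
security_logger = logging.getLogger("ivatar.security")


def log_upload_attempt(user_id, ip_address, filename, violations=()):
    """Log one upload attempt; `violations` is a sequence of short strings."""
    if violations:
        security_logger.warning(
            "Upload rejected: user=%s ip=%s file=%r violations=%s",
            user_id, ip_address, filename, ", ".join(violations),
        )
    else:
        security_logger.info(
            "Upload accepted: user=%s ip=%s file=%r",
            user_id, ip_address, filename,
        )
```

Formatting the filename with `%r` escapes control characters, so an attacker-chosen filename cannot inject forged lines into the log.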

## Configuration

### Settings

All security features can be configured in `config.py` or overridden in `config_local.py`:

```python
# File upload security settings
ENABLE_FILE_SECURITY_VALIDATION = True
ENABLE_EXIF_SANITIZATION = True
ENABLE_MALICIOUS_CONTENT_SCAN = True
```

### Dependencies

The security features require the following Python packages:

```bash
# Quote the requirement so the shell does not treat ">" as a redirection
pip install "python-magic>=0.4.27"
```

**Note**: On some systems, you may need to install the libmagic system library:

- **Ubuntu/Debian**: `sudo apt-get install libmagic1`
- **CentOS/RHEL**: `sudo yum install file-devel`
- **macOS**: `brew install libmagic`

## Security Levels

### Security Score System

Files are assigned a security score (0-100) based on validation results:

- **90-100**: Excellent - No security concerns
- **80-89**: Good - Minor warnings, safe to process
- **70-79**: Fair - Some concerns, review recommended
- **50-69**: Poor - Multiple issues, high risk
- **0-49**: Critical - Malicious content detected, reject
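
The bands above translate directly into a lookup such as the following sketch. The function name is hypothetical; only the thresholds and labels come from this document.

```python
def rate_security_score(score: int) -> str:
    """Map a 0-100 security score to the rating bands listed above."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    # Lower bound of each band, checked from highest to lowest.
    for floor, label in ((90, "Excellent"), (80, "Good"), (70, "Fair"), (50, "Poor")):
        if score >= floor:
            return label
    return "Critical"
```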

### Validation Levels

1. **Basic Validation**: File size, filename, extension
2. **Magic Bytes**: File signature verification
3. **MIME Type**: Content type validation
4. **PIL Validation**: Image format verification
5. **Security Scan**: Malicious content detection
6. **EXIF Sanitization**: Metadata removal

## API Reference

### FileValidator Class

```python
from ivatar.file_security import FileValidator

validator = FileValidator(file_data, filename)
results = validator.comprehensive_validation()
```

### Main Validation Function

```python
from ivatar.file_security import validate_uploaded_file

is_valid, results, sanitized_data = validate_uploaded_file(file_data, filename)
```

### Security Report Generation

```python
from ivatar.file_security import get_file_security_report

report = get_file_security_report(file_data, filename)
```

## Error Handling

### Validation Errors

The system provides user-friendly error messages while logging detailed security information:

- **Malicious Content**: "File appears to be malicious and cannot be uploaded"
- **Invalid Format**: "File format not supported or file appears to be corrupted"
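
One way to keep detailed findings out of user-facing responses is a fixed lookup from an internal error category to the messages above. The category keys and the fallback behaviour in this sketch are hypothetical; only the message strings come from this document.

```python
# Hypothetical internal categories mapped to the documented messages.
USER_MESSAGES = {
    "malicious_content": "File appears to be malicious and cannot be uploaded",
    "invalid_format": "File format not supported or file appears to be corrupted",
}


def user_message(error_category: str) -> str:
    """Return a user-safe message; unknown categories get the generic one."""
    # Falling back to the format message avoids leaking internal detail.
    return USER_MESSAGES.get(error_category, USER_MESSAGES["invalid_format"])
```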

### Logging Levels

- **INFO**: Successful uploads and normal operations
- **WARNING**: Security violations and suspicious activity
- **ERROR**: Validation failures and system errors

## Testing

### Running Security Tests

```bash
python manage.py test ivatar.test_file_security
```

### Test Coverage

The test suite covers:

- Valid file validation
- Malicious content detection
- Magic bytes verification
- MIME type validation
- EXIF sanitization
- Form validation
- Integration tests

## Performance Considerations

### Memory Usage

- Files are processed in memory for validation
- Large files (>5MB) may impact performance
- Consider increasing server memory for high-volume deployments

### Processing Time

- Basic validation: <10ms
- Full security scan: 50-200ms
- EXIF sanitization: 100-500ms
- Total overhead: ~200-700ms per upload
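
Actual timings depend on hardware and file size, so it can be worth measuring them in your own deployment. The helper below is a generic sketch (the `timed` name is hypothetical) that wraps any validation call.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_milliseconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms
```

For example, `timed(validate_uploaded_file, file_data, filename)` returns the usual result tuple together with the wall-clock time the call took.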

## Troubleshooting

### Common Issues

1. **python-magic Import Error**
   - Install the libmagic system library
   - Verify the python-magic installation

2. **False Positives**
   - Review security score thresholds
   - Adjust validation settings
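
To narrow down an import problem, a small self-check like the following sketch can help. The function name is hypothetical; `magic.from_buffer` is the python-magic detection call, and a missing libmagic library normally surfaces as an `ImportError` when the module is imported.

```python
def check_python_magic():
    """Return (ok, detail) describing whether python-magic is usable."""
    try:
        import magic
    except ImportError as exc:
        # Raised when the package is missing or libmagic cannot be located.
        return False, f"python-magic unavailable: {exc}"
    try:
        detected = magic.from_buffer(b"\x89PNG\r\n\x1a\n", mime=True)
    except Exception as exc:  # e.g. a different "magic" package is installed
        return False, f"python-magic imported but detection failed: {exc}"
    return True, f"python-magic OK (PNG signature detected as {detected})"
```

Running `print(check_python_magic())` in the same virtual environment as the application shows whether the failure lies in the Python package or in the system library.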

### Debug Mode

Enable debug logging to troubleshoot validation issues:

```python
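# Merge these keys into your existing LOGGING setting if one is defined.
LOGGING = {
    "version": 1,  # required by logging.config.dictConfig
    "disable_existing_loggers": False,
    "loggers": {
        "ivatar.security": {
            "level": "DEBUG",
        },
    },
}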
```

## Security Best Practices

### Deployment Recommendations

1. **Enable All Security Features** in production
2. **Monitor Security Logs** regularly
3. **Keep Dependencies Updated**
4. **Run Regular Security Audits** of uploaded content

### Monitoring

- Monitor `security.log` for violations
- Track upload success/failure rates
- Alert on repeated security violations
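
Alerting on repeated violations can be approximated with a sliding-window counter per source, as in this sketch. The class name, the default threshold of 5, and the 10-minute window are assumptions for illustration; state is kept in memory, so a multi-process deployment would need a shared store instead.

```python
import time
from collections import defaultdict, deque


class ViolationTracker:
    """Count violations per source (e.g. IP address) in a sliding window."""

    def __init__(self, threshold: int = 5, window_seconds: float = 600.0):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)

    def record(self, source: str, now: float = None) -> bool:
        """Record one violation; return True if an alert should fire."""
        now = time.monotonic() if now is None else now
        events = self._events[source]
        events.append(now)
        # Drop events that have fallen out of the sliding window.
        while events and now - events[0] > self.window_seconds:
            events.popleft()
        return len(events) >= self.threshold
```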

## Future Enhancements

Potential future improvements:

- Virus scanning integration (ClamAV)
- Content-based image analysis
- Machine learning threat detection
- Advanced polyglot detection
- Real-time threat intelligence feeds