diff --git a/.cursorrules b/.cursorrules index 709fddd..298508e 100644 --- a/.cursorrules +++ b/.cursorrules @@ -40,6 +40,12 @@ ivatar is a Django-based federated avatar service that serves as an alternative ## Development Workflow Rules +### External Resources & Libraries +- **Web search is always allowed** - use web search to find solutions, check documentation, verify best practices +- **Use latest library versions** - always prefer the latest stable versions of external libraries +- **Security first** - outdated libraries are security risks, always update to latest versions +- **Dependency management** - when adding new dependencies, ensure they're actively maintained and secure + ### Testing - **MANDATORY: Run pre-commit hooks and tests before any changes** - this is an obligation - Use `./run_tests_local.sh` for local development (skips Bluesky tests requiring API credentials) @@ -57,6 +63,8 @@ ivatar is a Django-based federated avatar service that serves as an alternative - Maintain comprehensive logging (use `logger = logging.getLogger("ivatar")`) - Consider security implications of any changes - Follow Django best practices and conventions +- **Reduce script creation** - avoid creating unnecessary scripts, prefer existing tools and commands +- **Use latest libraries** - always use the latest versions of external libraries to ensure security and bug fixes ### Database Operations - Use migrations for schema changes: `./manage.py migrate` diff --git a/OPENTELEMETRY.md b/OPENTELEMETRY.md new file mode 100644 index 0000000..f532ec6 --- /dev/null +++ b/OPENTELEMETRY.md @@ -0,0 +1,461 @@ +# OpenTelemetry Integration for ivatar + +This document describes the OpenTelemetry integration implemented in the ivatar project, providing comprehensive observability for avatar generation, file uploads, authentication, and system performance. 
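The integration is switched on by a single environment flag and tags all telemetry with service metadata. As a dependency-free illustration (stdlib only; the helper names here are hypothetical and not part of the actual `ivatar/opentelemetry_config.py` module, though the parsing mirrors its `_is_enabled` and `_create_resource` logic), the enablement check and resource attributes reduce to:

```python
import os

# Truthy spellings accepted for OTEL_ENABLED
# (mirrors OpenTelemetryConfig._is_enabled in this change).
_TRUTHY = ("true", "1", "yes")


def otel_enabled(environ=None) -> bool:
    """Return True only when OTEL_ENABLED is explicitly set to a truthy value."""
    env = os.environ if environ is None else environ
    return env.get("OTEL_ENABLED", "false").lower() in _TRUTHY


def resource_attributes(environ=None) -> dict:
    """Service metadata attached to every span and metric."""
    env = os.environ if environ is None else environ
    return {
        "service.name": env.get("OTEL_SERVICE_NAME", "ivatar"),
        "service.version": env.get("IVATAR_VERSION", "1.8.0"),
        "service.namespace": "libravatar",
        "deployment.environment": env.get("OTEL_ENVIRONMENT", "development"),
        "service.instance.id": env.get("HOSTNAME", "unknown"),
    }


if __name__ == "__main__":
    print(otel_enabled({"OTEL_ENABLED": "yes"}))  # True
    print(otel_enabled({}))                       # False: off by default
```

With this logic OpenTelemetry stays off unless `OTEL_ENABLED` is explicitly set to `true`, `1`, or `yes`, which keeps the default safe for F/LOSS deployments that run without a collector.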
+ +## Overview + +OpenTelemetry is integrated into ivatar to provide: + +- **Distributed Tracing**: Track requests across the entire avatar generation pipeline +- **Custom Metrics**: Monitor avatar-specific operations and performance +- **Multi-Instance Support**: Distinguish between production and development environments +- **Infrastructure Integration**: Works with existing Prometheus/Grafana stack + +## Architecture + +### Components + +1. **OpenTelemetry Configuration** (`ivatar/opentelemetry_config.py`) + + - Centralized configuration management + - Environment-based setup + - Resource creation with service metadata + +2. **Custom Middleware** (`ivatar/opentelemetry_middleware.py`) + + - Request/response tracing + - Avatar-specific metrics + - Custom decorators for operation tracing + +3. **Instrumentation Integration** + - Django framework instrumentation + - Database query tracing (PostgreSQL/MySQL) + - HTTP client instrumentation + - Cache tracing via Django instrumentation (no dedicated Memcached instrumentor is available in OpenTelemetry Python) + +## Configuration + +### Environment Variables + +| Variable | Description | Default | Required | +| ----------------------------- | ------------------------------------ | -------------- | -------- | +| `OTEL_ENABLED` | Enable OpenTelemetry | `false` | No | +| `OTEL_SERVICE_NAME` | Service name identifier | `ivatar` | No | +| `OTEL_ENVIRONMENT` | Environment (production/development) | `development` | No | +| `OTEL_EXPORTER_OTLP_ENDPOINT` | OTLP collector endpoint | None | No | +| `OTEL_PROMETHEUS_ENDPOINT` | Prometheus metrics endpoint | `0.0.0.0:9464` | No | +| `IVATAR_VERSION` | Application version | `1.8.0` | No | +| `HOSTNAME` | Instance identifier | `unknown` | No | + +### Multi-Instance Configuration + +#### Production Environment + +```bash +export OTEL_ENABLED=true +export OTEL_SERVICE_NAME=ivatar-production +export OTEL_ENVIRONMENT=production +export OTEL_EXPORTER_OTLP_ENDPOINT=http://collector.internal:4317 +export OTEL_PROMETHEUS_ENDPOINT=0.0.0.0:9464 +export IVATAR_VERSION=1.8.0 
+export HOSTNAME=prod-instance-01 +``` + +#### Development Environment + +```bash +export OTEL_ENABLED=true +export OTEL_SERVICE_NAME=ivatar-development +export OTEL_ENVIRONMENT=development +export OTEL_EXPORTER_OTLP_ENDPOINT=http://collector.internal:4317 +export OTEL_PROMETHEUS_ENDPOINT=0.0.0.0:9464 +export IVATAR_VERSION=1.8.0-dev +export HOSTNAME=dev-instance-01 +``` + +## Metrics + +### Custom Metrics + +#### Avatar Operations + +- `ivatar_requests_total`: Total HTTP requests by method, status, path +- `ivatar_request_duration_seconds`: Request duration histogram +- `ivatar_avatar_requests_total`: Avatar requests by status, size, format +- `ivatar_avatar_generation_seconds`: Avatar generation time histogram +- `ivatar_avatars_generated_total`: Avatars generated by size, format, source +- `ivatar_avatar_cache_hits_total`: Cache hits by size, format +- `ivatar_avatar_cache_misses_total`: Cache misses by size, format +- `ivatar_external_avatar_requests_total`: External service requests +- `ivatar_file_uploads_total`: File uploads by content type, success +- `ivatar_file_upload_size_bytes`: File upload size histogram + +#### Labels/Dimensions + +- `method`: HTTP method (GET, POST, etc.) +- `status_code`: HTTP status code +- `path`: Request path +- `size`: Avatar size (80, 128, 256, etc.) +- `format`: Image format (png, jpg, gif, etc.) 
+- `source`: Avatar source (uploaded, generated, external) +- `service`: External service name (gravatar, bluesky) +- `content_type`: File MIME type +- `success`: Operation success (true/false) + +### Example Queries + +#### Avatar Generation Rate + +```promql +rate(ivatar_avatars_generated_total[5m]) +``` + +#### Cache Hit Ratio + +```promql +rate(ivatar_avatar_cache_hits_total[5m]) / +(rate(ivatar_avatar_cache_hits_total[5m]) + rate(ivatar_avatar_cache_misses_total[5m])) +``` + +#### 95th Percentile Avatar Generation Time + +```promql +histogram_quantile(0.95, rate(ivatar_avatar_generation_seconds_bucket[5m])) +``` + +#### File Upload Success Rate + +```promql +rate(ivatar_file_uploads_total{success="true"}[5m]) / +rate(ivatar_file_uploads_total[5m]) +``` + +## Tracing + +### Trace Points + +#### Request Lifecycle + +- HTTP request processing +- Avatar generation pipeline +- File upload and processing +- Authentication flows +- External API calls + +#### Custom Spans + +- `avatar.generate_png`: PNG image generation +- `avatar.gravatar_proxy`: Gravatar service proxy +- `file_upload.process`: File upload processing +- `auth.login`: User authentication +- `auth.logout`: User logout + +### Span Attributes + +#### HTTP Attributes + +- `http.method`: HTTP method +- `http.url`: Full request URL +- `http.status_code`: Response status code +- `http.user_agent`: Client user agent +- `http.remote_addr`: Client IP address + +#### Avatar Attributes + +- `ivatar.request_type`: Request type (avatar, stats, etc.) 
+- `ivatar.avatar_size`: Requested avatar size +- `ivatar.avatar_format`: Requested format +- `ivatar.avatar_email`: Email address (if applicable) + +#### File Attributes + +- `file.name`: Uploaded file name +- `file.size`: File size in bytes +- `file.content_type`: MIME type + +## Infrastructure Requirements + +### Option A: Extend Existing Stack (Recommended) + +The existing monitoring stack can be extended to support OpenTelemetry: + +#### Alloy Configuration + +```yaml +# Add to existing Alloy configuration +otelcol.receiver.otlp: + grpc: + endpoint: 0.0.0.0:4317 + http: + endpoint: 0.0.0.0:4318 + +otelcol.processor.batch: + timeout: 1s + send_batch_size: 1024 + +otelcol.exporter.prometheus: + endpoint: "0.0.0.0:9464" + +otelcol.exporter.jaeger: + endpoint: "jaeger-collector:14250" + +otelcol.pipeline.traces: + receivers: [otelcol.receiver.otlp] + processors: [otelcol.processor.batch] + exporters: [otelcol.exporter.jaeger] + +otelcol.pipeline.metrics: + receivers: [otelcol.receiver.otlp] + processors: [otelcol.processor.batch] + exporters: [otelcol.exporter.prometheus] +``` + +#### Prometheus Configuration + +```yaml +scrape_configs: + - job_name: "ivatar-opentelemetry" + static_configs: + - targets: ["ivatar-prod:9464", "ivatar-dev:9464"] + scrape_interval: 15s + metrics_path: /metrics +``` + +### Option B: Dedicated OpenTelemetry Collector + +For full OpenTelemetry features, deploy a dedicated collector: + +#### Collector Configuration + +```yaml +receivers: + otlp: + protocols: + grpc: + endpoint: 0.0.0.0:4317 + http: + endpoint: 0.0.0.0:4318 + +processors: + batch: + timeout: 1s + send_batch_size: 1024 + resource: + attributes: + - key: environment + from_attribute: deployment.environment + action: insert + +exporters: + prometheus: + endpoint: "0.0.0.0:9464" + jaeger: + endpoint: "jaeger-collector:14250" + logging: + loglevel: debug + +service: + pipelines: + traces: + receivers: [otlp] + processors: [batch, resource] + exporters: [jaeger, logging] + 
metrics: + receivers: [otlp] + processors: [batch, resource] + exporters: [prometheus, logging] +``` + +## Deployment + +### Development Setup + +1. **Install Dependencies** + + ```bash + pip install -r requirements.txt + ``` + +2. **Configure Environment** + + ```bash + export OTEL_ENABLED=true + export OTEL_SERVICE_NAME=ivatar-development + export OTEL_ENVIRONMENT=development + ``` + +3. **Start Development Server** + + ```bash + ./manage.py runserver 0:8080 + ``` + +4. **Verify Metrics** + ```bash + curl http://localhost:9464/metrics + ``` + +### Production Deployment + +1. **Update Container Images** + + - Add OpenTelemetry dependencies to requirements.txt + - Update container build process + +2. **Configure Environment Variables** + + - Set production-specific OpenTelemetry variables + - Configure collector endpoints + +3. **Update Monitoring Stack** + + - Extend Alloy configuration + - Update Prometheus scrape configs + - Configure Grafana dashboards + +4. **Verify Deployment** + - Check metrics endpoint accessibility + - Verify trace data flow + - Monitor dashboard updates + +## Monitoring and Alerting + +### Key Metrics to Monitor + +#### Performance + +- Request duration percentiles (p50, p95, p99) +- Avatar generation time +- Cache hit ratio +- File upload success rate + +#### Business Metrics + +- Avatar requests per minute +- Popular avatar sizes +- External service usage +- User authentication success rate + +#### Error Rates + +- HTTP error rates by endpoint +- File upload failures +- External service failures +- Authentication failures + +### Example Alerts + +#### High Error Rate + +```yaml +alert: HighErrorRate +expr: rate(ivatar_requests_total{status_code=~"5.."}[5m]) > 0.1 +for: 2m +labels: + severity: warning +annotations: + summary: "High error rate detected" + description: "Error rate is {{ $value }} errors per second" +``` + +#### Slow Avatar Generation + +```yaml +alert: SlowAvatarGeneration +expr: histogram_quantile(0.95, 
rate(ivatar_avatar_generation_seconds_bucket[5m])) > 2 +for: 5m +labels: + severity: warning +annotations: + summary: "Slow avatar generation" + description: "95th percentile avatar generation time is {{ $value }}s" +``` + +#### Low Cache Hit Ratio + +```yaml +alert: LowCacheHitRatio +expr: (rate(ivatar_avatar_cache_hits_total[5m]) / (rate(ivatar_avatar_cache_hits_total[5m]) + rate(ivatar_avatar_cache_misses_total[5m]))) < 0.8 +for: 10m +labels: + severity: warning +annotations: + summary: "Low cache hit ratio" + description: "Cache hit ratio is {{ $value }}" +``` + +## Troubleshooting + +### Common Issues + +#### OpenTelemetry Not Enabled + +- Check `OTEL_ENABLED` environment variable +- Verify OpenTelemetry packages are installed +- Check Django logs for configuration errors + +#### Metrics Not Appearing + +- Verify Prometheus endpoint is accessible +- Check collector configuration +- Ensure metrics are being generated + +#### Traces Not Showing + +- Verify OTLP endpoint configuration +- Check collector connectivity +- Ensure tracing is enabled in configuration + +#### High Memory Usage + +- Adjust batch processor settings +- Reduce trace sampling rate +- Monitor collector resource usage + +### Debug Mode + +Enable debug logging for OpenTelemetry: + +```python +LOGGING = { + "loggers": { + "opentelemetry": { + "level": "DEBUG", + }, + "ivatar.opentelemetry": { + "level": "DEBUG", + }, + }, +} +``` + +### Performance Considerations + +- **Sampling**: Implement trace sampling for high-traffic production +- **Batch Processing**: Use appropriate batch sizes for your infrastructure +- **Resource Limits**: Monitor collector resource usage +- **Network**: Ensure low-latency connections to collectors + +## Security Considerations + +- **Data Privacy**: Ensure no sensitive data in trace attributes +- **Network Security**: Use TLS for collector communications +- **Access Control**: Restrict access to metrics endpoints +- **Data Retention**: Configure appropriate retention 
policies + +## Future Enhancements + +- **Custom Dashboards**: Create Grafana dashboards for avatar metrics +- **Advanced Sampling**: Implement intelligent trace sampling +- **Log Correlation**: Correlate traces with application logs +- **Performance Profiling**: Add profiling capabilities +- **Custom Exports**: Export to additional backends (Datadog, New Relic) + +## Support + +For issues related to OpenTelemetry integration: + +- Check application logs for configuration errors +- Verify collector connectivity +- Review Prometheus metrics for data flow +- Consult OpenTelemetry documentation for advanced configuration diff --git a/OPENTELEMETRY_INFRASTRUCTURE.md b/OPENTELEMETRY_INFRASTRUCTURE.md new file mode 100644 index 0000000..28695ff --- /dev/null +++ b/OPENTELEMETRY_INFRASTRUCTURE.md @@ -0,0 +1,433 @@ +# OpenTelemetry Infrastructure Requirements + +This document outlines the infrastructure requirements and deployment strategy for OpenTelemetry in the ivatar project, considering the existing Fedora Project hosting environment and multi-instance setup. 
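Before committing to collector sizing, it helps to sanity-check expected span volume with simple arithmetic. The sketch below uses purely illustrative numbers — the request rate and spans-per-request figures are assumptions for the example, not measurements of the production deployment:

```python
def sampled_spans_per_second(requests_per_day: float,
                             spans_per_request: float,
                             sampling_ratio: float) -> float:
    """Rough head-based-sampling estimate of spans/second the collector ingests."""
    return requests_per_day * spans_per_request * sampling_ratio / 86_400


if __name__ == "__main__":
    # Hypothetical workload: 3M requests/day, ~5 spans per request
    # (Django view, DB query, cache, external call, custom span),
    # with 10% production sampling.
    rate = sampled_spans_per_second(3_000_000, 5, 0.1)
    print(f"{rate:.1f} spans/s")
```

Under these assumptions, 10% head-based sampling brings ingest down to roughly 17 spans/s per instance, which fits comfortably within the per-instance bandwidth estimates given later in this document.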
+ +## Current Infrastructure Analysis + +### Existing Monitoring Stack + +- **Prometheus + Alertmanager**: Metrics collection and alerting +- **Loki**: Log aggregation +- **Alloy**: Observability data collection +- **Grafana**: Visualization and dashboards +- **Custom exporters**: Application-specific metrics + +### Production Environment + +- **Scale**: Millions of requests daily, 30k+ users, 33k+ avatar images +- **Infrastructure**: Fedora Project hosted, high-performance system +- **Architecture**: Apache HTTPD + Gunicorn containers + PostgreSQL +- **Containerization**: Podman (not Docker) + +### Multi-Instance Setup + +- **Production**: Production environment (master branch) +- **Development**: Development environment (devel branch) +- **Deployment**: GitLab CI/CD with Puppet automation + +## Infrastructure Options + +### Option A: Extend Existing Alloy Stack (Recommended) + +**Advantages:** + +- Leverages existing infrastructure +- Minimal additional complexity +- Consistent with current monitoring approach +- Cost-effective + +**Implementation:** + +```yaml +# Alloy configuration extension +otelcol.receiver.otlp: + grpc: + endpoint: 0.0.0.0:4317 + http: + endpoint: 0.0.0.0:4318 + +otelcol.processor.batch: + timeout: 1s + send_batch_size: 1024 + +otelcol.exporter.prometheus: + endpoint: "0.0.0.0:9464" + +otelcol.exporter.jaeger: + endpoint: "jaeger-collector:14250" + +otelcol.pipeline.traces: + receivers: [otelcol.receiver.otlp] + processors: [otelcol.processor.batch] + exporters: [otelcol.exporter.jaeger] + +otelcol.pipeline.metrics: + receivers: [otelcol.receiver.otlp] + processors: [otelcol.processor.batch] + exporters: [otelcol.exporter.prometheus] +``` + +### Option B: Dedicated OpenTelemetry Collector + +**Advantages:** + +- Full OpenTelemetry feature set +- Better performance for high-volume tracing +- More flexible configuration options +- Future-proof architecture + +**Implementation:** + +- Deploy standalone OpenTelemetry Collector +- Configure OTLP 
receivers and exporters +- Integrate with existing Prometheus/Grafana + +## Deployment Strategy + +### Phase 1: Development Environment + +1. **Enable OpenTelemetry in Development** + + ```bash + # Development environment configuration + export OTEL_ENABLED=true + export OTEL_SERVICE_NAME=ivatar-development + export OTEL_ENVIRONMENT=development + export OTEL_EXPORTER_OTLP_ENDPOINT=http://collector.internal:4317 + export OTEL_PROMETHEUS_ENDPOINT=0.0.0.0:9464 + ``` + +2. **Update Alloy Configuration** + + - Add OTLP receivers to existing Alloy instance + - Configure trace and metrics pipelines + - Test data flow + +3. **Verify Integration** + - Check metrics endpoint: `http://dev-instance:9464/metrics` + - Verify trace data in Jaeger + - Monitor Grafana dashboards + +### Phase 2: Production Deployment + +1. **Production Configuration** + + ```bash + # Production environment configuration + export OTEL_ENABLED=true + export OTEL_SERVICE_NAME=ivatar-production + export OTEL_ENVIRONMENT=production + export OTEL_EXPORTER_OTLP_ENDPOINT=http://collector.internal:4317 + export OTEL_PROMETHEUS_ENDPOINT=0.0.0.0:9464 + ``` + +2. **Gradual Rollout** + + - Deploy to one Gunicorn container first + - Monitor performance impact + - Gradually enable on all containers + +3. 
**Performance Monitoring** + - Monitor collector resource usage + - Check application performance impact + - Verify data quality + +## Resource Requirements + +### Collector Resources + +**Minimum Requirements:** + +- CPU: 2 cores +- Memory: 4GB RAM +- Storage: 10GB for temporary data +- Network: 1Gbps + +**Recommended for Production:** + +- CPU: 4 cores +- Memory: 8GB RAM +- Storage: 50GB SSD +- Network: 10Gbps + +### Network Requirements + +**Ports:** + +- 4317: OTLP gRPC receiver +- 4318: OTLP HTTP receiver +- 9464: Prometheus metrics exporter +- 14250: Jaeger trace exporter + +**Bandwidth:** + +- Estimated 1-5 Mbps per instance +- Burst capacity for peak loads +- Low-latency connection to collectors + +## Configuration Management + +### Environment-Specific Settings + +#### Production Environment + +```bash +# Production OpenTelemetry configuration +OTEL_ENABLED=true +OTEL_SERVICE_NAME=ivatar-production +OTEL_ENVIRONMENT=production +OTEL_EXPORTER_OTLP_ENDPOINT=http://collector.internal:4317 +OTEL_PROMETHEUS_ENDPOINT=0.0.0.0:9464 +OTEL_SAMPLING_RATIO=0.1 # 10% sampling for high volume +IVATAR_VERSION=1.8.0 +HOSTNAME=prod-instance-01 +``` + +#### Development Environment + +```bash +# Development OpenTelemetry configuration +OTEL_ENABLED=true +OTEL_SERVICE_NAME=ivatar-development +OTEL_ENVIRONMENT=development +OTEL_EXPORTER_OTLP_ENDPOINT=http://collector.internal:4317 +OTEL_PROMETHEUS_ENDPOINT=0.0.0.0:9464 +OTEL_SAMPLING_RATIO=1.0 # 100% sampling for debugging +IVATAR_VERSION=1.8.0-dev +HOSTNAME=dev-instance-01 +``` + +### Container Configuration + +#### Podman Container Updates + +```dockerfile +# Add to existing Dockerfile +# Version specifiers are quoted so the shell does not treat ">=" as a redirection +RUN pip install "opentelemetry-api>=1.20.0" \ + "opentelemetry-sdk>=1.20.0" \ + "opentelemetry-instrumentation-django>=0.42b0" \ + "opentelemetry-instrumentation-psycopg2>=0.42b0" \ + "opentelemetry-instrumentation-pymysql>=0.42b0" \ + "opentelemetry-instrumentation-requests>=0.42b0" \ + "opentelemetry-instrumentation-urllib3>=0.42b0" \ + 
"opentelemetry-exporter-otlp>=1.20.0" \ + "opentelemetry-exporter-prometheus>=1.12.0rc1" \ + "opentelemetry-instrumentation-memcached>=0.42b0" +``` + +#### Container Environment Variables + +```bash +# Add to container startup script +export OTEL_ENABLED=${OTEL_ENABLED:-false} +export OTEL_SERVICE_NAME=${OTEL_SERVICE_NAME:-ivatar} +export OTEL_ENVIRONMENT=${OTEL_ENVIRONMENT:-development} +export OTEL_EXPORTER_OTLP_ENDPOINT=${OTEL_EXPORTER_OTLP_ENDPOINT} +export OTEL_PROMETHEUS_ENDPOINT=${OTEL_PROMETHEUS_ENDPOINT:-0.0.0.0:9464} +``` + +## Monitoring and Alerting + +### Collector Health Monitoring + +#### Collector Metrics + +- `otelcol_receiver_accepted_spans`: Spans received by collector +- `otelcol_receiver_refused_spans`: Spans rejected by collector +- `otelcol_exporter_sent_spans`: Spans sent to exporters +- `otelcol_exporter_failed_spans`: Failed span exports + +#### Health Checks + +```yaml +# Prometheus health check +- job_name: "otel-collector-health" + static_configs: + - targets: ["collector.internal:8888"] + metrics_path: /metrics + scrape_interval: 30s +``` + +### Application Performance Impact + +#### Key Metrics to Monitor + +- Application response time impact +- Memory usage increase +- CPU usage increase +- Network bandwidth usage + +#### Alerting Rules + +```yaml +# High collector resource usage +alert: HighCollectorCPU +expr: rate(otelcol_process_cpu_seconds_total[5m]) > 0.8 +for: 5m +labels: + severity: warning +annotations: + summary: "High collector CPU usage" + description: "Collector CPU usage is {{ $value }}" + +# Collector memory usage +alert: HighCollectorMemory +expr: otelcol_process_memory_usage_bytes / otelcol_process_memory_limit_bytes > 0.8 +for: 5m +labels: + severity: warning +annotations: + summary: "High collector memory usage" + description: "Collector memory usage is {{ $value }}" +``` + +## Security Considerations + +### Network Security + +- Use TLS for collector communications +- Restrict collector access to trusted networks +- 
Implement proper firewall rules + +### Data Privacy + +- Ensure no sensitive data in trace attributes +- Implement data sanitization +- Configure appropriate retention policies + +### Access Control + +- Restrict access to metrics endpoints +- Implement authentication for collector access +- Monitor access logs + +## Backup and Recovery + +### Data Retention + +- Traces: 7 days (configurable) +- Metrics: 30 days (configurable) +- Logs: 14 days (configurable) + +### Backup Strategy + +- Regular backup of collector configuration +- Backup of Grafana dashboards +- Backup of Prometheus rules + +## Performance Optimization + +### Sampling Strategy + +- Production: 10% sampling rate +- Development: 100% sampling rate +- Error traces: Always sample + +### Batch Processing + +- Optimize batch sizes for network conditions +- Configure appropriate timeouts +- Monitor queue depths + +### Resource Optimization + +- Monitor collector resource usage +- Scale collectors based on load +- Implement horizontal scaling if needed + +## Troubleshooting + +### Common Issues + +#### Collector Not Receiving Data + +- Check network connectivity +- Verify OTLP endpoint configuration +- Check collector logs + +#### High Resource Usage + +- Adjust sampling rates +- Optimize batch processing +- Scale collector resources + +#### Data Quality Issues + +- Verify instrumentation configuration +- Check span attribute quality +- Monitor error rates + +### Debug Procedures + +1. **Check Collector Status** + + ```bash + curl http://collector.internal:8888/metrics + ``` + +2. **Verify Application Configuration** + + ```bash + curl http://app:9464/metrics + ``` + +3. 
**Check Trace Data** + - Access Jaeger UI + - Search for recent traces + - Verify span attributes + +## Future Enhancements + +### Advanced Features + +- Custom dashboards for avatar metrics +- Advanced sampling strategies +- Log correlation with traces +- Performance profiling integration + +### Scalability Improvements + +- Horizontal collector scaling +- Load balancing for collectors +- Multi-region deployment +- Edge collection points + +### Integration Enhancements + +- Additional exporter backends +- Custom processors +- Advanced filtering +- Data transformation + +## Cost Considerations + +### Infrastructure Costs + +- Additional compute resources for collectors +- Storage costs for trace data +- Network bandwidth costs + +### Operational Costs + +- Monitoring and maintenance +- Configuration management +- Troubleshooting and support + +### Optimization Strategies + +- Implement efficient sampling +- Use appropriate retention policies +- Optimize batch processing +- Monitor resource usage + +## Conclusion + +The OpenTelemetry integration for ivatar provides comprehensive observability while leveraging the existing monitoring infrastructure. The phased deployment approach ensures minimal disruption to production services while providing valuable insights into avatar generation performance and user behavior. 
+ +Key success factors: + +- Gradual rollout with monitoring +- Performance impact assessment +- Proper resource planning +- Security considerations +- Ongoing optimization diff --git a/attic/debug_toolbar_resources.txt b/attic/debug_toolbar_resources.txt deleted file mode 100644 index 2c35392..0000000 --- a/attic/debug_toolbar_resources.txt +++ /dev/null @@ -1,2 +0,0 @@ -https://django-debug-toolbar.readthedocs.io/en/latest/installation.html -https://stackoverflow.com/questions/6548947/how-can-django-debug-toolbar-be-set-to-work-for-just-some-users/6549317#6549317 diff --git a/attic/encryption_test.py b/attic/encryption_test.py deleted file mode 100755 index 4c10295..0000000 --- a/attic/encryption_test.py +++ /dev/null @@ -1,49 +0,0 @@ -#!/usr/bin/env python -# -*- coding: utf-8 -*- - -import os -import django -import timeit - -os.environ.setdefault( - "DJANGO_SETTINGS_MODULE", "ivatar.settings" -) # pylint: disable=wrong-import-position -django.setup() # pylint: disable=wrong-import-position - -from ivatar.ivataraccount.models import ConfirmedEmail, APIKey -from simplecrypt import decrypt -from binascii import unhexlify - -digest = None -digest_sha256 = None - - -def get_digest_sha256(): - digest_sha256 = ConfirmedEmail.objects.first().encrypted_digest_sha256( - secret_key=APIKey.objects.first() - ) - return digest_sha256 - - -def get_digest(): - digest = ConfirmedEmail.objects.first().encrypted_digest( - secret_key=APIKey.objects.first() - ) - return digest - - -def decrypt_digest(): - return decrypt(APIKey.objects.first().secret_key, unhexlify(digest)) - - -def decrypt_digest_256(): - return decrypt(APIKey.objects.first().secret_key, unhexlify(digest_sha256)) - - -digest = get_digest() -digest_sha256 = get_digest_sha256() - -print("Encrypt digest: %s" % timeit.timeit(get_digest, number=1)) -print("Encrypt digest_sha256: %s" % timeit.timeit(get_digest_sha256, number=1)) -print("Decrypt digest: %s" % timeit.timeit(decrypt_digest, number=1)) -print("Decrypt 
digest_sha256: %s" % timeit.timeit(decrypt_digest_256, number=1)) diff --git a/attic/example_mysql_config b/attic/example_mysql_config deleted file mode 100644 index a0504e8..0000000 --- a/attic/example_mysql_config +++ /dev/null @@ -1,7 +0,0 @@ -DATABASES['default'] = { - 'ENGINE': 'django.db.backends.mysql', - 'NAME': 'libravatar', - 'USER': 'libravatar', - 'PASSWORD': 'libravatar', - 'HOST': 'localhost', -} diff --git a/config.py b/config.py index edd1f07..e4416db 100644 --- a/config.py +++ b/config.py @@ -34,6 +34,9 @@ MIDDLEWARE.extend( "ivatar.middleware.CustomLocaleMiddleware", ] ) + +# Add OpenTelemetry middleware only if feature flag is enabled +# Note: This will be checked at runtime, not at import time MIDDLEWARE.insert( 0, "ivatar.middleware.MultipleProxyMiddleware", @@ -309,6 +312,13 @@ ENABLE_MALICIOUS_CONTENT_SCAN = True # Logging configuration - can be overridden in local config # Example: LOGS_DIR = "/var/log/ivatar" # For production deployments +# OpenTelemetry feature flag - can be disabled for F/LOSS deployments +ENABLE_OPENTELEMETRY = os.environ.get("ENABLE_OPENTELEMETRY", "false").lower() in ( + "true", + "1", + "yes", +) + # This MUST BE THE LAST! if os.path.isfile(os.path.join(BASE_DIR, "config_local.py")): from config_local import * # noqa # flake8: noqa # NOQA # pragma: no cover diff --git a/ivatar/opentelemetry_config.py b/ivatar/opentelemetry_config.py new file mode 100644 index 0000000..6a812ae --- /dev/null +++ b/ivatar/opentelemetry_config.py @@ -0,0 +1,233 @@ +# -*- coding: utf-8 -*- +""" +OpenTelemetry configuration for ivatar project. + +This module provides OpenTelemetry setup and configuration for the ivatar +Django application, including tracing, metrics, and logging integration. 
+""" + +import os +import logging + +from opentelemetry import trace, metrics +from opentelemetry.sdk.trace import TracerProvider +from opentelemetry.sdk.trace.export import BatchSpanProcessor +from opentelemetry.sdk.metrics import MeterProvider +from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader +from opentelemetry.sdk.resources import Resource +from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter +from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter +from opentelemetry.exporter.prometheus import PrometheusMetricReader +from opentelemetry.instrumentation.django import DjangoInstrumentor +from opentelemetry.instrumentation.psycopg2 import Psycopg2Instrumentor +from opentelemetry.instrumentation.pymysql import PyMySQLInstrumentor +from opentelemetry.instrumentation.requests import RequestsInstrumentor +from opentelemetry.instrumentation.urllib3 import URLLib3Instrumentor + +# Note: Memcached instrumentation not available in OpenTelemetry Python + +logger = logging.getLogger("ivatar") + + +class OpenTelemetryConfig: + """ + OpenTelemetry configuration manager for ivatar. + + Handles setup of tracing, metrics, and instrumentation for the Django application. 
+ """ + + def __init__(self): + self.enabled = self._is_enabled() + self.service_name = self._get_service_name() + self.environment = self._get_environment() + self.resource = self._create_resource() + + def _is_enabled(self) -> bool: + """Check if OpenTelemetry is enabled via environment variable and Django settings.""" + # First check Django settings (for F/LOSS deployments) + try: + from django.conf import settings + from django.core.exceptions import ImproperlyConfigured + + try: + if getattr(settings, "ENABLE_OPENTELEMETRY", False): + return True + except ImproperlyConfigured: + # Django settings not configured yet, fall back to environment variable + pass + except ImportError: + # Django not available yet, fall back to environment variable + pass + + # Then check OpenTelemetry-specific environment variable + return os.environ.get("OTEL_ENABLED", "false").lower() in ("true", "1", "yes") + + def _get_service_name(self) -> str: + """Get service name from environment or default.""" + return os.environ.get("OTEL_SERVICE_NAME", "ivatar") + + def _get_environment(self) -> str: + """Get environment name (production, development, etc.).""" + return os.environ.get("OTEL_ENVIRONMENT", "development") + + def _create_resource(self) -> Resource: + """Create OpenTelemetry resource with service information.""" + return Resource.create( + { + "service.name": self.service_name, + "service.version": os.environ.get("IVATAR_VERSION", "1.8.0"), + "service.namespace": "libravatar", + "deployment.environment": self.environment, + "service.instance.id": os.environ.get("HOSTNAME", "unknown"), + } + ) + + def setup_tracing(self) -> None: + """Set up OpenTelemetry tracing.""" + if not self.enabled: + logger.info("OpenTelemetry tracing disabled") + return + + try: + # Set up tracer provider + trace.set_tracer_provider(TracerProvider(resource=self.resource)) + tracer_provider = trace.get_tracer_provider() + + # Configure OTLP exporter if endpoint is provided + otlp_endpoint = 
os.environ.get("OTEL_EXPORTER_OTLP_ENDPOINT") + if otlp_endpoint: + otlp_exporter = OTLPSpanExporter(endpoint=otlp_endpoint) + span_processor = BatchSpanProcessor(otlp_exporter) + tracer_provider.add_span_processor(span_processor) + logger.info( + f"OpenTelemetry tracing configured with OTLP endpoint: {otlp_endpoint}" + ) + else: + logger.info("OpenTelemetry tracing configured without OTLP exporter") + + except Exception as e: + logger.error(f"Failed to setup OpenTelemetry tracing: {e}") + self.enabled = False + + def setup_metrics(self) -> None: + """Set up OpenTelemetry metrics.""" + if not self.enabled: + logger.info("OpenTelemetry metrics disabled") + return + + try: + # Configure metric readers + metric_readers = [] + + # Configure Prometheus exporter for metrics + prometheus_endpoint = os.environ.get( + "OTEL_PROMETHEUS_ENDPOINT", "0.0.0.0:9464" + ) + prometheus_reader = PrometheusMetricReader() + metric_readers.append(prometheus_reader) + + # Configure OTLP exporter if endpoint is provided + otlp_endpoint = os.environ.get("OTEL_EXPORTER_OTLP_ENDPOINT") + if otlp_endpoint: + otlp_exporter = OTLPMetricExporter(endpoint=otlp_endpoint) + metric_reader = PeriodicExportingMetricReader(otlp_exporter) + metric_readers.append(metric_reader) + logger.info( + f"OpenTelemetry metrics configured with OTLP endpoint: {otlp_endpoint}" + ) + + # Set up meter provider with readers + meter_provider = MeterProvider( + resource=self.resource, metric_readers=metric_readers + ) + metrics.set_meter_provider(meter_provider) + + logger.info( + f"OpenTelemetry metrics configured with Prometheus endpoint: {prometheus_endpoint}" + ) + + except Exception as e: + logger.error(f"Failed to setup OpenTelemetry metrics: {e}") + self.enabled = False + + def setup_instrumentation(self) -> None: + """Set up OpenTelemetry instrumentation for various libraries.""" + if not self.enabled: + logger.info("OpenTelemetry instrumentation disabled") + return + + try: + # Django instrumentation + 
DjangoInstrumentor().instrument() + logger.info("Django instrumentation enabled") + + # Database instrumentation + Psycopg2Instrumentor().instrument() + PyMySQLInstrumentor().instrument() + logger.info("Database instrumentation enabled") + + # HTTP client instrumentation + RequestsInstrumentor().instrument() + URLLib3Instrumentor().instrument() + logger.info("HTTP client instrumentation enabled") + + # Note: Memcached instrumentation not available in OpenTelemetry Python + # Cache operations will be traced through Django instrumentation + + except Exception as e: + logger.error(f"Failed to setup OpenTelemetry instrumentation: {e}") + self.enabled = False + + def get_tracer(self, name: str) -> trace.Tracer: + """Get a tracer instance.""" + return trace.get_tracer(name) + + def get_meter(self, name: str) -> metrics.Meter: + """Get a meter instance.""" + return metrics.get_meter(name) + + +# Global OpenTelemetry configuration instance (lazy-loaded) +_ot_config = None + + +def get_ot_config(): + """Get the global OpenTelemetry configuration instance.""" + global _ot_config + if _ot_config is None: + _ot_config = OpenTelemetryConfig() + return _ot_config + + +def setup_opentelemetry() -> None: + """ + Set up OpenTelemetry for the ivatar application. + + This function should be called during Django application startup. 
+ """ + logger.info("Setting up OpenTelemetry...") + + ot_config = get_ot_config() + ot_config.setup_tracing() + ot_config.setup_metrics() + ot_config.setup_instrumentation() + + if ot_config.enabled: + logger.info("OpenTelemetry setup completed successfully") + else: + logger.info("OpenTelemetry setup skipped (disabled)") + + +def get_tracer(name: str) -> trace.Tracer: + """Get a tracer instance for the given name.""" + return get_ot_config().get_tracer(name) + + +def get_meter(name: str) -> metrics.Meter: + """Get a meter instance for the given name.""" + return get_ot_config().get_meter(name) + + +def is_enabled() -> bool: + """Check if OpenTelemetry is enabled.""" + return get_ot_config().enabled diff --git a/ivatar/opentelemetry_middleware.py b/ivatar/opentelemetry_middleware.py new file mode 100644 index 0000000..9db81d2 --- /dev/null +++ b/ivatar/opentelemetry_middleware.py @@ -0,0 +1,455 @@ +# -*- coding: utf-8 -*- +""" +OpenTelemetry middleware and custom instrumentation for ivatar. + +This module provides custom OpenTelemetry instrumentation for avatar-specific +operations, including metrics and tracing for avatar generation, file uploads, +and authentication flows. +""" + +import logging +import time +from functools import wraps + +from django.http import HttpRequest, HttpResponse +from django.utils.deprecation import MiddlewareMixin + +from opentelemetry import trace +from opentelemetry.trace import Status, StatusCode + +from ivatar.opentelemetry_config import get_tracer, get_meter, is_enabled + +logger = logging.getLogger("ivatar") + + +class OpenTelemetryMiddleware(MiddlewareMixin): + """ + Custom OpenTelemetry middleware for ivatar-specific metrics and tracing. + + This middleware adds custom attributes and metrics to OpenTelemetry spans + for avatar-related operations. 
+ """ + + def __init__(self, get_response): + self.get_response = get_response + # Don't get metrics instance here - get it lazily in __call__ + + def __call__(self, request): + if not is_enabled(): + return self.get_response(request) + + # Get metrics instance lazily when OpenTelemetry is enabled + if not hasattr(self, "metrics"): + self.metrics = get_avatar_metrics() + + # Process request to start tracing + self.process_request(request) + + response = self.get_response(request) + + # Process response to complete tracing + self.process_response(request, response) + + return response + + def process_request(self, request: HttpRequest) -> None: + """Process incoming request and start tracing.""" + if not is_enabled(): + return + + # Start span for the request + span_name = f"{request.method} {request.path}" + span = get_tracer("ivatar.middleware").start_span(span_name) + + # Add request attributes + span.set_attributes( + { + "http.method": request.method, + "http.url": request.build_absolute_uri(), + "http.user_agent": request.META.get("HTTP_USER_AGENT", ""), + "http.remote_addr": self._get_client_ip(request), + "ivatar.path": request.path, + } + ) + + # Check if this is an avatar request + if self._is_avatar_request(request): + span.set_attribute("ivatar.request_type", "avatar") + self._add_avatar_attributes(span, request) + + # Store span in request for later use + request._ot_span = span + + # Record request start time + request._ot_start_time = time.time() + + def process_response( + self, request: HttpRequest, response: HttpResponse + ) -> HttpResponse: + """Process response and complete tracing.""" + if not is_enabled(): + return response + + span = getattr(request, "_ot_span", None) + if not span: + return response + + try: + # Calculate request duration + start_time = getattr(request, "_ot_start_time", time.time()) + duration = time.time() - start_time + + # Add response attributes + span.set_attributes( + { + "http.status_code": response.status_code, + 
"http.response_size": len(response.content) + if hasattr(response, "content") + else 0, + "http.request.duration": duration, + } + ) + + # Set span status based on response + if response.status_code >= 400: + span.set_status( + Status(StatusCode.ERROR, f"HTTP {response.status_code}") + ) + else: + span.set_status(Status(StatusCode.OK)) + + # Record metrics + # Note: HTTP request metrics are handled by Django instrumentation + # We only record avatar-specific metrics here + + # Record avatar-specific metrics + if self._is_avatar_request(request): + # Record avatar request metric using the new metrics system + self.metrics.record_avatar_request( + size=self._get_avatar_size(request), + format_type=self._get_avatar_format(request), + ) + + finally: + span.end() + + return response + + def _is_avatar_request(self, request: HttpRequest) -> bool: + """Check if this is an avatar request.""" + return request.path.startswith("/avatar/") or request.path.startswith("/avatar") + + def _add_avatar_attributes(self, span: trace.Span, request: HttpRequest) -> None: + """Add avatar-specific attributes to span.""" + try: + # Extract avatar parameters + size = self._get_avatar_size(request) + format_type = self._get_avatar_format(request) + email = self._get_avatar_email(request) + + span.set_attributes( + { + "ivatar.avatar_size": size, + "ivatar.avatar_format": format_type, + "ivatar.avatar_email": email, + } + ) + + except Exception as e: + logger.debug(f"Failed to add avatar attributes: {e}") + + def _get_avatar_size(self, request: HttpRequest) -> str: + """Extract avatar size from request.""" + size = request.GET.get("s", "80") + return str(size) + + def _get_avatar_format(self, request: HttpRequest) -> str: + """Extract avatar format from request.""" + format_type = request.GET.get("d", "png") + return str(format_type) + + def _get_avatar_email(self, request: HttpRequest) -> str: + """Extract email from avatar request path.""" + try: + # Extract email from path like 
/avatar/user@example.com + path_parts = request.path.strip("/").split("/") + if len(path_parts) >= 2 and path_parts[0] == "avatar": + return path_parts[1] + except Exception: + pass + return "unknown" + + def _get_client_ip(self, request: HttpRequest) -> str: + """Get client IP address from request.""" + x_forwarded_for = request.META.get("HTTP_X_FORWARDED_FOR") + if x_forwarded_for: + return x_forwarded_for.split(",")[0].strip() + return request.META.get("REMOTE_ADDR", "unknown") + + +def trace_avatar_operation(operation_name: str): + """ + Decorator to trace avatar operations. + + Args: + operation_name: Name of the operation being traced + """ + + def decorator(func): + @wraps(func) + def wrapper(*args, **kwargs): + if not is_enabled(): + return func(*args, **kwargs) + + tracer = get_tracer("ivatar.avatar") + with tracer.start_as_current_span(f"avatar.{operation_name}") as span: + try: + result = func(*args, **kwargs) + span.set_status(Status(StatusCode.OK)) + return result + except Exception as e: + span.set_status(Status(StatusCode.ERROR, str(e))) + span.set_attribute("error.message", str(e)) + raise + + return wrapper + + return decorator + + +def trace_file_upload(operation_name: str): + """ + Decorator to trace file upload operations. 
+ + Args: + operation_name: Name of the file upload operation being traced + """ + + def decorator(func): + @wraps(func) + def wrapper(*args, **kwargs): + if not is_enabled(): + return func(*args, **kwargs) + + tracer = get_tracer("ivatar.file_upload") + with tracer.start_as_current_span(f"file_upload.{operation_name}") as span: + try: + # Add file information if available + if args and hasattr(args[0], "FILES"): + files = args[0].FILES + if files: + file_info = list(files.values())[0] + span.set_attributes( + { + "file.name": file_info.name, + "file.size": file_info.size, + "file.content_type": file_info.content_type, + } + ) + + result = func(*args, **kwargs) + span.set_status(Status(StatusCode.OK)) + return result + except Exception as e: + span.set_status(Status(StatusCode.ERROR, str(e))) + span.set_attribute("error.message", str(e)) + raise + + return wrapper + + return decorator + + +def trace_authentication(operation_name: str): + """ + Decorator to trace authentication operations. + + Args: + operation_name: Name of the authentication operation being traced + """ + + def decorator(func): + @wraps(func) + def wrapper(*args, **kwargs): + if not is_enabled(): + return func(*args, **kwargs) + + tracer = get_tracer("ivatar.auth") + with tracer.start_as_current_span(f"auth.{operation_name}") as span: + try: + result = func(*args, **kwargs) + span.set_status(Status(StatusCode.OK)) + return result + except Exception as e: + span.set_status(Status(StatusCode.ERROR, str(e))) + span.set_attribute("error.message", str(e)) + raise + + return wrapper + + return decorator + + +class AvatarMetrics: + """ + Custom metrics for avatar operations. + + This class provides methods to record custom metrics for avatar-specific + operations like generation, caching, and external service calls. 
+ """ + + def __init__(self): + if not is_enabled(): + return + + self.meter = get_meter("ivatar.avatar") + + # Create custom metrics + self.avatar_generated = self.meter.create_counter( + name="ivatar_avatars_generated_total", + description="Total number of avatars generated", + unit="1", + ) + + self.avatar_requests = self.meter.create_counter( + name="ivatar_avatar_requests_total", + description="Total number of avatar image requests", + unit="1", + ) + + self.avatar_cache_hits = self.meter.create_counter( + name="ivatar_avatar_cache_hits_total", + description="Total number of avatar cache hits", + unit="1", + ) + + self.avatar_cache_misses = self.meter.create_counter( + name="ivatar_avatar_cache_misses_total", + description="Total number of avatar cache misses", + unit="1", + ) + + self.external_avatar_requests = self.meter.create_counter( + name="ivatar_external_avatar_requests_total", + description="Total number of external avatar requests", + unit="1", + ) + + self.file_uploads = self.meter.create_counter( + name="ivatar_file_uploads_total", + description="Total number of file uploads", + unit="1", + ) + + self.file_upload_size = self.meter.create_histogram( + name="ivatar_file_upload_size_bytes", + description="File upload size in bytes", + unit="bytes", + ) + + def record_avatar_request(self, size: str, format_type: str): + """Record avatar request.""" + if not is_enabled(): + return + + self.avatar_requests.add( + 1, + { + "size": size, + "format": format_type, + }, + ) + + def record_avatar_generated( + self, size: str, format_type: str, source: str = "generated" + ): + """Record avatar generation.""" + if not is_enabled(): + return + + self.avatar_generated.add( + 1, + { + "size": size, + "format": format_type, + "source": source, + }, + ) + + def record_cache_hit(self, size: str, format_type: str): + """Record cache hit.""" + if not is_enabled(): + return + + self.avatar_cache_hits.add( + 1, + { + "size": size, + "format": format_type, + }, + ) + + def 
record_cache_miss(self, size: str, format_type: str):
+        """Record cache miss."""
+        if not is_enabled():
+            return
+
+        self.avatar_cache_misses.add(
+            1,
+            {
+                "size": size,
+                "format": format_type,
+            },
+        )
+
+    def record_external_request(self, service: str, status_code: int):
+        """Record external avatar service request."""
+        if not is_enabled():
+            return
+
+        self.external_avatar_requests.add(
+            1,
+            {
+                "service": service,
+                "status_code": str(status_code),
+            },
+        )
+
+    def record_file_upload(self, file_size: int, content_type: str, success: bool):
+        """Record file upload."""
+        if not is_enabled():
+            return
+
+        self.file_uploads.add(
+            1,
+            {
+                "content_type": content_type,
+                "success": str(success),
+            },
+        )
+
+        self.file_upload_size.record(
+            file_size,
+            {
+                "content_type": content_type,
+                "success": str(success),
+            },
+        )
+
+
+# Global metrics instance (lazy-loaded)
+_avatar_metrics = None
+
+
+def get_avatar_metrics():
+    """Get the global avatar metrics instance."""
+    global _avatar_metrics
+    if _avatar_metrics is None:
+        _avatar_metrics = AvatarMetrics()
+    return _avatar_metrics
+
+
+def reset_avatar_metrics():
+    """Reset the global avatar metrics instance (for testing)."""
+    global _avatar_metrics
+    _avatar_metrics = None
diff --git a/ivatar/settings.py b/ivatar/settings.py
index a3a9893..45bfc00 100644
--- a/ivatar/settings.py
+++ b/ivatar/settings.py
@@ -309,3 +309,18 @@ STATIC_ROOT = os.path.join(BASE_DIR, "static")
 DEFAULT_AUTO_FIELD = "django.db.models.BigAutoField"
 from config import * # pylint: disable=wildcard-import,wrong-import-position,unused-wildcard-import # noqa
+
+# OpenTelemetry setup - must be after config import
+# Only set up if the feature flag is enabled
+try:
+    # Note: globals() is a plain dict, so use .get() here;
+    # getattr() on a dict never finds keys and would always return False
+    if globals().get("ENABLE_OPENTELEMETRY", False):
+        from ivatar.opentelemetry_config import setup_opentelemetry
+
+        setup_opentelemetry()
+
+        # Add OpenTelemetry middleware if enabled
+        MIDDLEWARE.append("ivatar.opentelemetry_middleware.OpenTelemetryMiddleware")
+except 
(ImportError, NameError): + # OpenTelemetry packages not installed or configuration failed + # ENABLE_OPENTELEMETRY not defined (shouldn't happen but be safe) + pass diff --git a/ivatar/test_opentelemetry.py b/ivatar/test_opentelemetry.py new file mode 100644 index 0000000..f9102df --- /dev/null +++ b/ivatar/test_opentelemetry.py @@ -0,0 +1,509 @@ +# -*- coding: utf-8 -*- +""" +Tests for OpenTelemetry integration in ivatar. + +This module contains comprehensive tests for OpenTelemetry functionality, +including configuration, middleware, metrics, and tracing. +""" + +import os +import unittest +from unittest.mock import patch, MagicMock +import pytest +from django.test import TestCase, RequestFactory +from django.http import HttpResponse + +from ivatar.opentelemetry_config import ( + OpenTelemetryConfig, + is_enabled, +) +from ivatar.opentelemetry_middleware import ( + OpenTelemetryMiddleware, + trace_avatar_operation, + trace_file_upload, + trace_authentication, + AvatarMetrics, + get_avatar_metrics, + reset_avatar_metrics, +) + + +@pytest.mark.opentelemetry +class OpenTelemetryConfigTest(TestCase): + """Test OpenTelemetry configuration.""" + + def setUp(self): + """Set up test environment.""" + self.original_env = os.environ.copy() + + def tearDown(self): + """Clean up test environment.""" + os.environ.clear() + os.environ.update(self.original_env) + + def test_config_disabled_by_default(self): + """Test that OpenTelemetry is disabled by default.""" + # Clear environment variables to test default behavior + original_env = os.environ.copy() + os.environ.pop("ENABLE_OPENTELEMETRY", None) + os.environ.pop("OTEL_ENABLED", None) + + try: + config = OpenTelemetryConfig() + self.assertFalse(config.enabled) + finally: + os.environ.clear() + os.environ.update(original_env) + + def test_config_enabled_with_env_var(self): + """Test that OpenTelemetry can be enabled with environment variable.""" + os.environ["OTEL_ENABLED"] = "true" + config = OpenTelemetryConfig() + 
self.assertTrue(config.enabled) + + def test_service_name_default(self): + """Test default service name.""" + # Clear environment variables to test default behavior + original_env = os.environ.copy() + os.environ.pop("OTEL_SERVICE_NAME", None) + + try: + config = OpenTelemetryConfig() + self.assertEqual(config.service_name, "ivatar") + finally: + os.environ.clear() + os.environ.update(original_env) + + def test_service_name_custom(self): + """Test custom service name.""" + os.environ["OTEL_SERVICE_NAME"] = "custom-service" + config = OpenTelemetryConfig() + self.assertEqual(config.service_name, "custom-service") + + def test_environment_default(self): + """Test default environment.""" + # Clear environment variables to test default behavior + original_env = os.environ.copy() + os.environ.pop("OTEL_ENVIRONMENT", None) + + try: + config = OpenTelemetryConfig() + self.assertEqual(config.environment, "development") + finally: + os.environ.clear() + os.environ.update(original_env) + + def test_environment_custom(self): + """Test custom environment.""" + os.environ["OTEL_ENVIRONMENT"] = "production" + config = OpenTelemetryConfig() + self.assertEqual(config.environment, "production") + + def test_resource_creation(self): + """Test resource creation with service information.""" + os.environ["OTEL_SERVICE_NAME"] = "test-service" + os.environ["OTEL_ENVIRONMENT"] = "test" + os.environ["IVATAR_VERSION"] = "1.0.0" + os.environ["HOSTNAME"] = "test-host" + + config = OpenTelemetryConfig() + resource = config.resource + + self.assertEqual(resource.attributes["service.name"], "test-service") + self.assertEqual(resource.attributes["service.version"], "1.0.0") + self.assertEqual(resource.attributes["deployment.environment"], "test") + self.assertEqual(resource.attributes["service.instance.id"], "test-host") + + @patch("ivatar.opentelemetry_config.OTLPSpanExporter") + @patch("ivatar.opentelemetry_config.BatchSpanProcessor") + @patch("ivatar.opentelemetry_config.trace") + def 
test_setup_tracing_with_otlp(self, mock_trace, mock_processor, mock_exporter): + """Test tracing setup with OTLP endpoint.""" + os.environ["OTEL_ENABLED"] = "true" + os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "http://localhost:4317" + + config = OpenTelemetryConfig() + config.setup_tracing() + + mock_exporter.assert_called_once_with(endpoint="http://localhost:4317") + mock_processor.assert_called_once() + mock_trace.get_tracer_provider().add_span_processor.assert_called_once() + + @patch("ivatar.opentelemetry_config.PrometheusMetricReader") + @patch("ivatar.opentelemetry_config.PeriodicExportingMetricReader") + @patch("ivatar.opentelemetry_config.OTLPMetricExporter") + @patch("ivatar.opentelemetry_config.metrics") + def test_setup_metrics_with_prometheus_and_otlp( + self, + mock_metrics, + mock_otlp_exporter, + mock_periodic_reader, + mock_prometheus_reader, + ): + """Test metrics setup with Prometheus and OTLP.""" + os.environ["OTEL_ENABLED"] = "true" + os.environ["OTEL_PROMETHEUS_ENDPOINT"] = "0.0.0.0:9464" + os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "http://localhost:4317" + + config = OpenTelemetryConfig() + config.setup_metrics() + + mock_prometheus_reader.assert_called_once() + mock_otlp_exporter.assert_called_once_with(endpoint="http://localhost:4317") + mock_periodic_reader.assert_called_once() + mock_metrics.set_meter_provider.assert_called_once() + + @patch("ivatar.opentelemetry_config.DjangoInstrumentor") + @patch("ivatar.opentelemetry_config.Psycopg2Instrumentor") + @patch("ivatar.opentelemetry_config.PyMySQLInstrumentor") + @patch("ivatar.opentelemetry_config.RequestsInstrumentor") + @patch("ivatar.opentelemetry_config.URLLib3Instrumentor") + def test_setup_instrumentation( + self, + mock_urllib3, + mock_requests, + mock_pymysql, + mock_psycopg2, + mock_django, + ): + """Test instrumentation setup.""" + os.environ["OTEL_ENABLED"] = "true" + + config = OpenTelemetryConfig() + config.setup_instrumentation() + + 
mock_django().instrument.assert_called_once() + mock_psycopg2().instrument.assert_called_once() + mock_pymysql().instrument.assert_called_once() + mock_requests().instrument.assert_called_once() + mock_urllib3().instrument.assert_called_once() + + +@pytest.mark.opentelemetry +class OpenTelemetryMiddlewareTest(TestCase): + """Test OpenTelemetry middleware.""" + + def setUp(self): + """Set up test environment.""" + self.factory = RequestFactory() + reset_avatar_metrics() # Reset global metrics instance + self.middleware = OpenTelemetryMiddleware(lambda r: HttpResponse("test")) + + @patch("ivatar.opentelemetry_middleware.is_enabled") + def test_middleware_disabled(self, mock_enabled): + """Test middleware when OpenTelemetry is disabled.""" + mock_enabled.return_value = False + + request = self.factory.get("/avatar/test@example.com") + response = self.middleware(request) + + self.assertEqual(response.status_code, 200) + self.assertFalse(hasattr(request, "_ot_span")) + + @patch("ivatar.opentelemetry_middleware.is_enabled") + @patch("ivatar.opentelemetry_middleware.get_tracer") + def test_middleware_enabled(self, mock_get_tracer, mock_enabled): + """Test middleware when OpenTelemetry is enabled.""" + mock_enabled.return_value = True + mock_tracer = MagicMock() + mock_span = MagicMock() + mock_tracer.start_span.return_value = mock_span + mock_get_tracer.return_value = mock_tracer + + request = self.factory.get("/avatar/test@example.com") + response = self.middleware(request) + + self.assertEqual(response.status_code, 200) + self.assertTrue(hasattr(request, "_ot_span")) + mock_tracer.start_span.assert_called_once() + mock_span.set_attributes.assert_called() + mock_span.end.assert_called_once() + + @patch("ivatar.opentelemetry_middleware.is_enabled") + @patch("ivatar.opentelemetry_middleware.get_tracer") + def test_avatar_request_attributes(self, mock_get_tracer, mock_enabled): + """Test that avatar requests get proper attributes.""" + mock_enabled.return_value = True + 
mock_tracer = MagicMock() + mock_span = MagicMock() + mock_tracer.start_span.return_value = mock_span + mock_get_tracer.return_value = mock_tracer + + request = self.factory.get("/avatar/test@example.com?s=128&d=png") + # Reset metrics to ensure we get a fresh instance + reset_avatar_metrics() + self.middleware.process_request(request) + + # Check that avatar-specific attributes were set + calls = mock_span.set_attributes.call_args_list + avatar_attrs = any( + call[0][0].get("ivatar.request_type") == "avatar" for call in calls + ) + # Also check for individual set_attribute calls + set_attribute_calls = mock_span.set_attribute.call_args_list + individual_avatar_attrs = any( + call[0][0] == "ivatar.request_type" and call[0][1] == "avatar" + for call in set_attribute_calls + ) + self.assertTrue(avatar_attrs or individual_avatar_attrs) + + def test_is_avatar_request(self): + """Test avatar request detection.""" + avatar_request = self.factory.get("/avatar/test@example.com") + non_avatar_request = self.factory.get("/stats/") + + self.assertTrue(self.middleware._is_avatar_request(avatar_request)) + self.assertFalse(self.middleware._is_avatar_request(non_avatar_request)) + + def test_get_avatar_size(self): + """Test avatar size extraction.""" + request = self.factory.get("/avatar/test@example.com?s=256") + size = self.middleware._get_avatar_size(request) + self.assertEqual(size, "256") + + def test_get_avatar_format(self): + """Test avatar format extraction.""" + request = self.factory.get("/avatar/test@example.com?d=jpg") + format_type = self.middleware._get_avatar_format(request) + self.assertEqual(format_type, "jpg") + + def test_get_avatar_email(self): + """Test email extraction from avatar request.""" + request = self.factory.get("/avatar/test@example.com") + email = self.middleware._get_avatar_email(request) + self.assertEqual(email, "test@example.com") + + +@pytest.mark.opentelemetry +class AvatarMetricsTest(TestCase): + """Test AvatarMetrics class.""" + + def 
setUp(self): + """Set up test environment.""" + self.metrics = AvatarMetrics() + + @patch("ivatar.opentelemetry_middleware.is_enabled") + def test_metrics_disabled(self, mock_enabled): + """Test metrics when OpenTelemetry is disabled.""" + mock_enabled.return_value = False + + # Should not raise any exceptions + self.metrics.record_avatar_generated("128", "png", "generated") + self.metrics.record_cache_hit("128", "png") + self.metrics.record_cache_miss("128", "png") + self.metrics.record_external_request("gravatar", 200) + self.metrics.record_file_upload(1024, "image/png", True) + + @patch("ivatar.opentelemetry_middleware.is_enabled") + @patch("ivatar.opentelemetry_middleware.get_meter") + def test_metrics_enabled(self, mock_get_meter, mock_enabled): + """Test metrics when OpenTelemetry is enabled.""" + mock_enabled.return_value = True + mock_meter = MagicMock() + mock_counter = MagicMock() + mock_histogram = MagicMock() + + mock_meter.create_counter.return_value = mock_counter + mock_meter.create_histogram.return_value = mock_histogram + mock_get_meter.return_value = mock_meter + + avatar_metrics = AvatarMetrics() + + # Test avatar generation recording + avatar_metrics.record_avatar_generated("128", "png", "generated") + mock_counter.add.assert_called_with( + 1, {"size": "128", "format": "png", "source": "generated"} + ) + + # Test cache hit recording + avatar_metrics.record_cache_hit("128", "png") + mock_counter.add.assert_called_with(1, {"size": "128", "format": "png"}) + + # Test file upload recording + avatar_metrics.record_file_upload(1024, "image/png", True) + mock_histogram.record.assert_called_with( + 1024, {"content_type": "image/png", "success": "True"} + ) + + +@pytest.mark.opentelemetry +class TracingDecoratorsTest(TestCase): + """Test tracing decorators.""" + + @patch("ivatar.opentelemetry_middleware.is_enabled") + @patch("ivatar.opentelemetry_middleware.get_tracer") + def test_trace_avatar_operation(self, mock_get_tracer, mock_enabled): + """Test 
trace_avatar_operation decorator.""" + mock_enabled.return_value = True + mock_tracer = MagicMock() + mock_span = MagicMock() + mock_tracer.start_as_current_span.return_value.__enter__.return_value = ( + mock_span + ) + mock_get_tracer.return_value = mock_tracer + + @trace_avatar_operation("test_operation") + def test_function(): + return "success" + + result = test_function() + + self.assertEqual(result, "success") + mock_tracer.start_as_current_span.assert_called_once_with( + "avatar.test_operation" + ) + mock_span.set_status.assert_called_once() + + @patch("ivatar.opentelemetry_middleware.is_enabled") + @patch("ivatar.opentelemetry_middleware.get_tracer") + def test_trace_avatar_operation_exception(self, mock_get_tracer, mock_enabled): + """Test trace_avatar_operation decorator with exception.""" + mock_enabled.return_value = True + mock_tracer = MagicMock() + mock_span = MagicMock() + mock_tracer.start_as_current_span.return_value.__enter__.return_value = ( + mock_span + ) + mock_get_tracer.return_value = mock_tracer + + @trace_avatar_operation("test_operation") + def test_function(): + raise ValueError("test error") + + with self.assertRaises(ValueError): + test_function() + + mock_span.set_status.assert_called_once() + mock_span.set_attribute.assert_called_with("error.message", "test error") + + @patch("ivatar.opentelemetry_middleware.is_enabled") + def test_trace_file_upload(self, mock_enabled): + """Test trace_file_upload decorator.""" + mock_enabled.return_value = True + + @trace_file_upload("test_upload") + def test_function(): + return "success" + + result = test_function() + self.assertEqual(result, "success") + + @patch("ivatar.opentelemetry_middleware.is_enabled") + def test_trace_authentication(self, mock_enabled): + """Test trace_authentication decorator.""" + mock_enabled.return_value = True + + @trace_authentication("test_auth") + def test_function(): + return "success" + + result = test_function() + self.assertEqual(result, "success") + + 
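The tests above depend on `get_avatar_metrics()` / `reset_avatar_metrics()` behaving as a lazy singleton. A standalone sketch of that pattern (plain Python; `_Metrics`, `get_metrics`, and `reset_metrics` are illustrative stand-ins, not ivatar names, and no Django or OpenTelemetry imports are needed):

```python
# Minimal sketch of the lazy-singleton pattern behind get_avatar_metrics()
# and reset_avatar_metrics(). _Metrics stands in for AvatarMetrics, which
# in the real code creates OpenTelemetry counters/histograms on first use.

class _Metrics:
    def __init__(self):
        self.events = []  # stand-in for real counters/histograms

    def record(self, name, **attrs):
        self.events.append((name, attrs))


_instance = None


def get_metrics():
    """Create the instance on first use so module import order does not matter."""
    global _instance
    if _instance is None:
        _instance = _Metrics()
    return _instance


def reset_metrics():
    """Drop the cached instance; tests call this to start fresh."""
    global _instance
    _instance = None


m = get_metrics()
m.record("cache_hit", size="80")
assert get_metrics() is m        # repeated calls return the same object
reset_metrics()
assert get_metrics() is not m    # a fresh instance after reset
```

Deferring construction to first call is what lets the test suite's `setUp` call `reset_avatar_metrics()` and get an instance built under the patched environment.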
+@pytest.mark.opentelemetry +class IntegrationTest(TestCase): + """Integration tests for OpenTelemetry.""" + + def setUp(self): + """Set up test environment.""" + self.original_env = os.environ.copy() + + def tearDown(self): + """Clean up test environment.""" + os.environ.clear() + os.environ.update(self.original_env) + + @patch("ivatar.opentelemetry_config.setup_opentelemetry") + def test_setup_opentelemetry_called(self, mock_setup): + """Test that setup_opentelemetry is called during Django startup.""" + # This would be called during Django settings import + from ivatar.opentelemetry_config import setup_opentelemetry as setup_func + + setup_func() + mock_setup.assert_called_once() + + def test_is_enabled_function(self): + """Test is_enabled function.""" + # Clear environment variables to test default behavior + original_env = os.environ.copy() + os.environ.pop("ENABLE_OPENTELEMETRY", None) + os.environ.pop("OTEL_ENABLED", None) + + try: + # Test disabled by default + self.assertFalse(is_enabled()) + finally: + os.environ.clear() + os.environ.update(original_env) + + # Test enabled with environment variable + os.environ["OTEL_ENABLED"] = "true" + config = OpenTelemetryConfig() + self.assertTrue(config.enabled) + + +@pytest.mark.no_opentelemetry +class OpenTelemetryDisabledTest(TestCase): + """Test OpenTelemetry behavior when disabled (no-op mode).""" + + def setUp(self): + """Set up test environment.""" + self.original_env = os.environ.copy() + # Ensure OpenTelemetry is disabled + os.environ.pop("ENABLE_OPENTELEMETRY", None) + os.environ.pop("OTEL_ENABLED", None) + + def tearDown(self): + """Clean up test environment.""" + os.environ.clear() + os.environ.update(self.original_env) + + def test_opentelemetry_disabled_by_default(self): + """Test that OpenTelemetry is disabled by default.""" + # Clear environment variables to test default behavior + original_env = os.environ.copy() + os.environ.pop("ENABLE_OPENTELEMETRY", None) + os.environ.pop("OTEL_ENABLED", None) + 
+        try:
+            self.assertFalse(is_enabled())
+        finally:
+            os.environ.clear()
+            os.environ.update(original_env)
+
+    def test_no_op_decorators_work(self):
+        """Test that no-op decorators work when OpenTelemetry is disabled."""
+
+        @trace_avatar_operation("test_operation")
+        def test_function():
+            return "success"
+
+        result = test_function()
+        self.assertEqual(result, "success")
+
+    def test_no_op_metrics_work(self):
+        """Test that no-op metrics work when OpenTelemetry is disabled."""
+        avatar_metrics = get_avatar_metrics()
+
+        # These should not raise exceptions; pass correctly typed arguments
+        # (int status_code, int file_size) so signatures stay honest even in no-op mode
+        avatar_metrics.record_avatar_generated("80", "png", "uploaded")
+        avatar_metrics.record_cache_hit("80", "png")
+        avatar_metrics.record_cache_miss("80", "png")
+        avatar_metrics.record_external_request("gravatar", 200)
+        avatar_metrics.record_file_upload(1024, "image/png", True)
+
+    def test_middleware_disabled(self):
+        """Test that middleware works when OpenTelemetry is disabled."""
+        factory = RequestFactory()
+        middleware = OpenTelemetryMiddleware(lambda r: HttpResponse("test"))
+
+        request = factory.get("/avatar/test@example.com")
+        response = middleware(request)
+
+        self.assertEqual(response.status_code, 200)
+        self.assertEqual(response.content.decode(), "test")
+
+
+if __name__ == "__main__":
+    unittest.main()
diff --git a/ivatar/views.py b/ivatar/views.py
index 912a60e..09ba6b2 100644
--- a/ivatar/views.py
+++ b/ivatar/views.py
@@ -40,6 +40,65 @@ from .ivataraccount.models import Photo
 from .ivataraccount.models import pil_format, file_format
 from .utils import is_trusted_url, mm_ng, resize_animated_gif
+# Import OpenTelemetry only if feature flag is enabled
+try:
+    from django.conf import settings
+
+    if getattr(settings, "ENABLE_OPENTELEMETRY", False):
+        from .opentelemetry_middleware import trace_avatar_operation, get_avatar_metrics
+
+        avatar_metrics = get_avatar_metrics()
+    else:
+        # Create no-op decorators and metrics when OpenTelemetry is disabled
+        def 
trace_avatar_operation(operation_name): + def decorator(func): + return func + + return decorator + + class NoOpMetrics: + def record_avatar_generated(self, *args, **kwargs): + pass + + def record_cache_hit(self, *args, **kwargs): + pass + + def record_cache_miss(self, *args, **kwargs): + pass + + def record_external_request(self, *args, **kwargs): + pass + + def record_file_upload(self, *args, **kwargs): + pass + + avatar_metrics = NoOpMetrics() +except ImportError: + # Django not available or settings not loaded + def trace_avatar_operation(operation_name): + def decorator(func): + return func + + return decorator + + class NoOpMetrics: + def record_avatar_generated(self, *args, **kwargs): + pass + + def record_cache_hit(self, *args, **kwargs): + pass + + def record_cache_miss(self, *args, **kwargs): + pass + + def record_external_request(self, *args, **kwargs): + pass + + def record_file_upload(self, *args, **kwargs): + pass + + avatar_metrics = NoOpMetrics() + # Initialize loggers logger = logging.getLogger("ivatar") security_logger = logging.getLogger("ivatar.security") @@ -122,6 +181,8 @@ class AvatarImageView(TemplateView): # Check the cache first if CACHE_RESPONSE: if centry := caches["filesystem"].get(uri): + # Record cache hit + avatar_metrics.record_cache_hit(size=str(size), format_type=imgformat) # For DEBUG purpose only # print('Cached entry for %s' % uri) return HttpResponse( @@ -131,6 +192,9 @@ class AvatarImageView(TemplateView): reason=centry["reason"], charset=centry["charset"], ) + else: + # Record cache miss + avatar_metrics.record_cache_miss(size=str(size), format_type=imgformat) # In case no digest at all is provided, return to home page if "digest" not in kwargs: @@ -298,6 +362,14 @@ class AvatarImageView(TemplateView): obj.save() if imgformat == "jpg": imgformat = "jpeg" + + # Record avatar generation metrics + avatar_metrics.record_avatar_generated( + size=str(size), + format_type=imgformat, + source="uploaded" if obj else "generated", + ) 
+
         response = CachingHttpResponse(uri, data, content_type=f"image/{imgformat}")
         response["Cache-Control"] = "max-age=%i" % CACHE_IMAGES_MAX_AGE
         # Remove Vary header for images since language doesn't matter
@@ -324,6 +396,7 @@ class AvatarImageView(TemplateView):
         response["Vary"] = ""
         return response

+    @trace_avatar_operation("generate_png")
     def _return_cached_png(self, arg0, data, uri):
         arg0.save(data, "PNG", quality=JPEG_QUALITY)
         return self._return_cached_response(data, uri)
@@ -336,6 +409,7 @@ class GravatarProxyView(View):

     # TODO: Do cache images!! Memcached?

+    @trace_avatar_operation("gravatar_proxy")
     def get(
         self, request, *args, **kwargs
     ):  # pylint: disable=too-many-branches,too-many-statements,too-many-locals,no-self-use,unused-argument,too-many-return-statements
diff --git a/pytest.ini b/pytest.ini
index 044fe4d..4174ded 100644
--- a/pytest.ini
+++ b/pytest.ini
@@ -13,9 +13,11 @@ markers =
     slow: marks tests as slow (deselect with '-m "not slow"')
     integration: marks tests as integration tests
     unit: marks tests as unit tests
+    opentelemetry: marks tests as requiring OpenTelemetry to be enabled
+    no_opentelemetry: marks tests as requiring OpenTelemetry to be disabled

 # Default options
-addopts =
+addopts =
     --strict-markers
     --strict-config
     --verbose
diff --git a/requirements.txt b/requirements.txt
index fb25018..fddfa8e 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -23,6 +23,16 @@ git+https://github.com/ofalk/identicon.git
 git+https://github.com/ofalk/monsterid.git
 git+https://github.com/ofalk/Robohash.git@devel
 notsetuptools
+# OpenTelemetry dependencies (optional - can be disabled via feature flag)
+opentelemetry-api>=1.20.0
+opentelemetry-exporter-otlp>=1.20.0
+opentelemetry-exporter-prometheus>=0.59b0
+opentelemetry-instrumentation-django>=0.42b0
+opentelemetry-instrumentation-psycopg2>=0.42b0
+opentelemetry-instrumentation-pymysql>=0.42b0
+opentelemetry-instrumentation-requests>=0.42b0
+opentelemetry-instrumentation-urllib3>=0.42b0
+opentelemetry-sdk>=1.20.0
 Pillow
 pip
 psycopg2-binary
diff --git a/run_tests_local.sh b/run_tests_local.sh
index 1acaffa..f662bfe 100755
--- a/run_tests_local.sh
+++ b/run_tests_local.sh
@@ -1,10 +1,15 @@
 #!/bin/bash
 # Run tests locally, skipping Bluesky tests that require external API credentials
+# OpenTelemetry is disabled by default for local testing

-echo "Running tests locally (skipping Bluesky tests)..."
-echo "================================================"
+echo "Running tests locally (skipping Bluesky tests, OpenTelemetry disabled)..."
+echo "======================================================================="

-# Run Django tests excluding the Bluesky test file
+# Ensure OpenTelemetry is disabled for local testing
+export ENABLE_OPENTELEMETRY=false
+export OTEL_ENABLED=false
+
+# Run Django tests excluding the Bluesky test file and OpenTelemetry tests
 python3 manage.py test \
     ivatar.ivataraccount.test_auth \
     ivatar.ivataraccount.test_views \
@@ -24,3 +29,9 @@ echo "python3 manage.py test -v2"
 echo ""
 echo "To run only Bluesky tests:"
 echo "python3 manage.py test ivatar.ivataraccount.test_views_bluesky -v2"
+echo ""
+echo "To run tests with OpenTelemetry enabled:"
+echo "./run_tests_with_ot.sh"
+echo ""
+echo "To run tests without OpenTelemetry (default):"
+echo "./run_tests_no_ot.sh"
diff --git a/run_tests_no_ot.sh b/run_tests_no_ot.sh
new file mode 100755
index 0000000..df1c175
--- /dev/null
+++ b/run_tests_no_ot.sh
@@ -0,0 +1,21 @@
+#!/bin/bash
+# Run tests without OpenTelemetry enabled (default mode)
+# This is the default test mode for most users
+
+set -e
+
+echo "Running tests without OpenTelemetry (default mode)..."
+
+# Ensure OpenTelemetry is disabled
+export ENABLE_OPENTELEMETRY=false
+export OTEL_ENABLED=false
+export DJANGO_SETTINGS_MODULE=ivatar.settings
+
+# Run tests excluding OpenTelemetry-specific tests
+python3 -m pytest \
+    -m "not opentelemetry" \
+    --verbose \
+    --tb=short \
+    "$@"
+
+echo "Tests completed successfully (OpenTelemetry disabled)"
diff --git a/run_tests_with_ot.sh b/run_tests_with_ot.sh
new file mode 100755
index 0000000..b97ef48
--- /dev/null
+++ b/run_tests_with_ot.sh
@@ -0,0 +1,23 @@
+#!/bin/bash
+# Run tests with OpenTelemetry enabled
+# This is used in CI to test OpenTelemetry functionality
+
+set -e
+
+echo "Running tests with OpenTelemetry enabled..."
+
+# Enable OpenTelemetry
+export ENABLE_OPENTELEMETRY=true
+export OTEL_ENABLED=true
+export OTEL_SERVICE_NAME=ivatar-test
+export OTEL_ENVIRONMENT=test
+export DJANGO_SETTINGS_MODULE=ivatar.settings
+
+# Run tests including OpenTelemetry-specific tests
+python3 -m pytest \
+    -m "opentelemetry or no_opentelemetry" \
+    --verbose \
+    --tb=short \
+    "$@"
+
+echo "Tests completed successfully (OpenTelemetry enabled)"
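The core idea behind the `ivatar/views.py` changes above — tracing decorators and metrics that degrade to no-ops when the feature flag is off, so call sites never change — can be exercised outside Django. A minimal standalone sketch follows; `generate_avatar` is a hypothetical stand-in for the real view logic, and the flag check is replaced by unconditionally binding the no-op implementations:

```python
# Standalone sketch of the feature-flag fallback used in ivatar/views.py:
# when OpenTelemetry is disabled, the tracing decorator and the metrics
# object are replaced by do-nothing equivalents with identical signatures.


def trace_avatar_operation(operation_name):
    """No-op replacement: returns the wrapped function unchanged."""

    def decorator(func):
        return func

    return decorator


class NoOpMetrics:
    """Accepts any recorder call and silently discards it."""

    def record_avatar_generated(self, *args, **kwargs):
        pass

    def record_cache_hit(self, *args, **kwargs):
        pass


avatar_metrics = NoOpMetrics()


# Hypothetical call site, standing in for the AvatarImageView logic:
# it is written once and works whether the real or no-op objects are bound.
@trace_avatar_operation("generate_png")
def generate_avatar(size):
    avatar_metrics.record_cache_hit(size=str(size), format_type="png")
    return f"png-{size}"


print(generate_avatar(80))  # -> png-80
```

Because both implementations share the same call signatures, the instrumented code paths need no `if`-guards of their own — disabling OpenTelemetry costs only a trivial function call per metric.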