
# Anti-Spam and Caching Measures

This document describes the anti-spam and caching measures implemented in the OpenEventDatabase API to protect against abuse and improve performance.

## Implemented Measures

### 1. Rate Limiting

Rate limiting is implemented using the `RateLimitMiddleware` class, which tracks request rates by IP address and rejects requests that exceed defined limits.

#### Key Features

- **Global Rate Limit**: By default, each IP address is limited to 60 requests per minute across all endpoints.
- **Endpoint-Specific Limits**:
  - POST requests to `/event`: Limited to 10 requests per minute
  - POST requests to `/event/search`: Limited to 20 requests per minute
  - DELETE requests to `/event`: Limited to 5 requests per minute
- **Proper HTTP Responses**: When a rate limit is exceeded, the API returns a `429 Too Many Requests` response with a `Retry-After` header indicating when the client can try again.
- **Detailed Logging**: Rate limit violations are logged with details about the client IP, request method, path, and user agent for security analysis.
- **Development Mode**: Rate limiting is skipped for local requests (127.0.0.1, localhost) to facilitate development.
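The rejection path described above might look like the following framework-agnostic sketch. It returns a status line, headers and a JSON body instead of writing to a real response object. The function name `reject_rate_limited`, the logger name `oedb.rate_limit` and the body format are assumptions for illustration, not taken from `RateLimitMiddleware`.

```python
import json
import logging
import math

logger = logging.getLogger("oedb.rate_limit")  # hypothetical logger name


def reject_rate_limited(ip, method, path, user_agent, retry_after_seconds):
    """Build a 429 response and log the violation (illustrative sketch)."""
    # Retry-After is a whole number of seconds; round up so clients never retry too early.
    retry_after = max(1, math.ceil(retry_after_seconds))
    # Log the details used for later security analysis.
    logger.warning(
        "Rate limit exceeded: ip=%s method=%s path=%s user_agent=%s",
        ip, method, path, user_agent,
    )
    status = "429 Too Many Requests"
    headers = {
        "Retry-After": str(retry_after),
        "Content-Type": "application/json",
    }
    body = json.dumps({"error": "Too many requests", "retry_after": retry_after})
    return status, headers, body


status, headers, body = reject_rate_limited(
    "203.0.113.7", "POST", "/event", "curl/8.0", 12.3
)
print(status, headers["Retry-After"])  # 429 Too Many Requests 13
```

Rounding up keeps `Retry-After` an integer, as the header requires, without inviting a retry that would still be rejected.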

#### Implementation Details

The rate limiting middleware:
1. Tracks request timestamps by IP address
2. Cleans up old request timestamps that are outside the current time window
3. Counts recent requests within the time window
4. Rejects requests that exceed the defined limits
5. Determines the client IP for requests arriving through a proxy by checking the `X-Forwarded-For` header
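These five steps can be sketched as a per-IP sliding window backed by a deque of timestamps. This is a minimal illustration under assumptions: `SlidingWindowLimiter`, `client_ip` and `check` are hypothetical names, the store is an in-memory dict (so it only works within a single process), and `time.monotonic()` is used so wall-clock changes do not distort the window.

```python
import time
from collections import defaultdict, deque

# Development-mode exemption; "localhost" resolves to one of these addresses.
LOCAL_ADDRESSES = {"127.0.0.1", "::1"}


def client_ip(headers, remote_addr):
    """Return the client IP, preferring the left-most X-Forwarded-For entry."""
    forwarded = headers.get("X-Forwarded-For", "")
    if forwarded:
        # Format is "client, proxy1, proxy2"; the first entry is the original client.
        # NOTE: this value is client-supplied unless a trusted proxy overwrites it.
        return forwarded.split(",")[0].strip()
    return remote_addr


class SlidingWindowLimiter:
    """Per-IP sliding-window counter (illustrative sketch)."""

    def __init__(self, window_size=60, max_requests=60):
        self.window_size = window_size      # window length in seconds
        self.max_requests = max_requests    # allowed requests per window
        self.requests = defaultdict(deque)  # ip -> timestamps, oldest first

    def check(self, ip, now=None):
        """Return (allowed, retry_after_seconds); record the request if allowed."""
        if ip in LOCAL_ADDRESSES:
            return True, 0.0  # never limit local requests
        now = time.monotonic() if now is None else now
        timestamps = self.requests[ip]
        # Drop timestamps that fell outside the current window.
        while timestamps and timestamps[0] <= now - self.window_size:
            timestamps.popleft()
        # Count what remains and reject once the limit is reached.
        if len(timestamps) >= self.max_requests:
            # The oldest request leaves the window at timestamps[0] + window_size.
            return False, timestamps[0] + self.window_size - now
        timestamps.append(now)
        return True, 0.0


limiter = SlidingWindowLimiter(window_size=60, max_requests=5)
ip = client_ip({"X-Forwarded-For": "203.0.113.7, 10.0.0.1"}, "10.0.0.1")
results = [limiter.check(ip, now=t)[0] for t in range(6)]
print(results)  # [True, True, True, True, True, False]
```

One caveat applies to any design like this: `X-Forwarded-For` is sent by the client, so unless a trusted reverse proxy overwrites it, a caller can claim any address, including `127.0.0.1`, and thereby both evade per-IP limits and trigger the development-mode exemption.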

### 2. Caching

Caching is implemented using the `CacheMiddleware` class, which adds appropriate cache-control headers to responses based on the endpoint and request method.

#### Key Features

- **Global Default**: GET requests that match no endpoint-specific rule are cached for 60 seconds.
- **Endpoint-Specific Caching**:
  - GET requests to `/event`: Cached for 60 seconds
  - GET requests to `/stats`: Cached for 300 seconds (5 minutes)
  - GET requests to `/demo`: Cached for 3600 seconds (1 hour)
  - POST requests to `/event/search`: Not cached
- **No Caching for Write Operations**: POST, PUT, DELETE, and PATCH requests are not cached.
- **No Caching for Error Responses**: Responses with status codes >= 400 are not cached.
- **Proper HTTP Headers**: The middleware adds appropriate `Cache-Control`, `Vary`, `Pragma`, and `Expires` headers.
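The caching rules above can be expressed as a small lookup table. This sketch assumes a list of `(method, path prefix, max-age)` tuples matched by path prefix; the names `CACHING_RULES`, `DEFAULT_MAX_AGE` and `max_age_for` are hypothetical, and the real `caching_rules` list in `oedb/middleware/cache.py` may use a different shape.

```python
WRITE_METHODS = {"POST", "PUT", "DELETE", "PATCH"}
DEFAULT_MAX_AGE = 60  # seconds, for GET requests that match no rule

# (method, path prefix, max-age in seconds); None means "do not cache".
CACHING_RULES = [
    ("GET", "/event", 60),
    ("GET", "/stats", 300),
    ("GET", "/demo", 3600),
    # Listed for clarity; POST is already excluded as a write method below.
    ("POST", "/event/search", None),
]


def max_age_for(method, path, status_code):
    """Return the max-age to advertise, or None if the response must not be cached."""
    # Error responses and write operations are never cached.
    if status_code >= 400 or method in WRITE_METHODS:
        return None
    for rule_method, prefix, max_age in CACHING_RULES:
        # Match the prefix itself or a sub-path, but not e.g. "/events".
        if method == rule_method and (path == prefix or path.startswith(prefix + "/")):
            return max_age
    return DEFAULT_MAX_AGE if method == "GET" else None


print(max_age_for("GET", "/stats", 200))  # 300
```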

#### Implementation Details

The caching middleware:
1. Determines the appropriate max-age value for the current request based on endpoint and method
2. Adds caching headers for cacheable responses
3. Adds no-cache headers for non-cacheable responses
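Steps 2 and 3 amount to choosing between two header sets. In this sketch `cache_headers` is a hypothetical helper that takes the max-age chosen in step 1 (or `None` for a non-cacheable response). The `public` directive and the `Vary: Accept-Encoding` value are assumptions, since this document does not say which exact values the middleware emits.

```python
import time
from email.utils import formatdate


def cache_headers(max_age):
    """Return caching headers for max_age seconds, or no-cache headers if None."""
    if max_age is None:
        # Non-cacheable: forbid storage by browsers and intermediaries.
        return {
            "Cache-Control": "no-store, no-cache, must-revalidate, max-age=0",
            "Pragma": "no-cache",  # understood by legacy HTTP/1.0 caches
            "Expires": "0",        # an invalid date, treated as already expired
        }
    return {
        "Cache-Control": f"public, max-age={max_age}",
        # Assumed Vary value: the body depends on the negotiated content encoding.
        "Vary": "Accept-Encoding",
        # Redundant with max-age for HTTP/1.1 caches, useful for HTTP/1.0 ones.
        "Expires": formatdate(time.time() + max_age, usegmt=True),
    }


print(cache_headers(300)["Cache-Control"])  # public, max-age=300
```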

## How These Measures Help

### Rate Limiting Benefits

1. **Prevents Abuse**: Limits the impact of malicious users trying to overload the system.
2. **Ensures Fair Usage**: Prevents a single user from consuming too many resources.
3. **Protects Against Brute Force Attacks**: Slows down brute-force attempts by capping how many requests a single IP address can send per minute.
4. **Reduces Server Load**: Helps maintain server performance during traffic spikes.

### Caching Benefits

1. **Improves Performance**: Reduces server load by allowing clients to reuse responses.
2. **Reduces Bandwidth Usage**: Minimizes the amount of data transferred between the server and clients.
3. **Enhances User Experience**: Provides faster response times for frequently accessed resources.
4. **Optimizes Resource Usage**: Allows the server to focus on processing new requests rather than repeating the same work.

## Suggestions for Future Improvements

### Rate Limiting Enhancements

1. **API Key Authentication**: Implement API key authentication to identify users and apply different rate limits based on user roles or subscription levels.
2. **Graduated Rate Limiting**: Implement a graduated rate limiting system that reduces the rate limit after suspicious activity is detected.
3. **Distributed Rate Limiting**: Use a distributed cache (like Redis) to track rate limits across multiple server instances.
4. **Machine Learning for Abuse Detection**: Implement machine learning algorithms to detect and block abusive patterns.
5. **CAPTCHA Integration**: Add CAPTCHA challenges for suspicious requests.
6. **IP Reputation Checking**: Integrate with IP reputation services to block known malicious IPs.

### Caching Enhancements

1. **Server-Side Caching**: Implement server-side caching using a cache like Redis or Memcached to reduce database load.
2. **Cache Invalidation**: Implement a cache invalidation system to clear cached responses when the underlying data changes.
3. **Conditional Requests**: Support conditional requests using `ETag`/`If-None-Match` and `Last-Modified`/`If-Modified-Since` headers.
4. **Vary Header Optimization**: Optimize the Vary header to better handle different client capabilities.
5. **Cache Partitioning**: Implement cache partitioning based on user roles or other criteria.
6. **Content Compression**: Add content compression (gzip, brotli) to reduce bandwidth usage further.
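For the conditional-requests idea, a minimal sketch: hash the serialized response into an `ETag` and answer `304 Not Modified` when the client's `If-None-Match` matches. The helpers `make_etag` and `conditional_response` are hypothetical. Because the body is still built on every request, this saves bandwidth but not database work; pairing it with server-side caching would address the latter.

```python
import hashlib
from typing import Optional


def make_etag(body: bytes) -> str:
    """Strong ETag derived from a SHA-256 hash of the response body."""
    return '"' + hashlib.sha256(body).hexdigest()[:32] + '"'


def conditional_response(body: bytes, if_none_match: Optional[str]):
    """Return (status, headers, body), answering 304 when the client's copy is current."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match:
        candidates = set()
        # Naive comma split; safe here because the generated tags are hex digits.
        for tag in if_none_match.split(","):
            tag = tag.strip()
            # If-None-Match uses weak comparison, so a W/ prefix is ignored.
            candidates.add(tag[2:] if tag.startswith("W/") else tag)
        if "*" in candidates or etag in candidates:
            return 304, headers, b""
    return 200, headers, body


body = b'{"events": []}'
status, headers, _ = conditional_response(body, None)
print(status)  # 200
status, _, payload = conditional_response(body, headers["ETag"])
print(status, payload)  # 304 b''
```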

## How to Monitor and Adjust

### Monitoring Rate Limiting

The rate limiting middleware logs detailed information about rate limit violations. You can monitor these logs to:
- Identify potential abuse patterns
- Adjust rate limits based on actual usage patterns
- Detect and block malicious IPs

### Adjusting Rate Limits

To adjust the rate limits, modify the `RateLimitMiddleware` class in `oedb/middleware/rate_limit.py`:
- Change the `window_size` and `max_requests` parameters in the constructor
- Modify the `rate_limit_rules` list to adjust endpoint-specific limits
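As an illustration of what editing the rules might involve, the sketch below assumes `rate_limit_rules` is a list of dicts keyed by method and path, and that endpoint limits apply on top of the global 60-requests-per-minute limit, so a request is counted against both buckets. The rule format, `DEFAULT_MAX_REQUESTS` and `buckets_for` are hypothetical; check the actual class before relying on this shape.

```python
# Hypothetical rule format; the real rate_limit_rules list may differ.
DEFAULT_MAX_REQUESTS = 60  # global per-IP limit across all endpoints
RATE_LIMIT_RULES = [
    {"method": "POST", "path": "/event", "max_requests": 10},
    {"method": "POST", "path": "/event/search", "max_requests": 20},
    {"method": "DELETE", "path": "/event", "max_requests": 5},
]


def buckets_for(method, path):
    """Return (bucket key, max_requests) pairs a request is counted against."""
    buckets = [("global", DEFAULT_MAX_REQUESTS)]
    best = None
    for rule in RATE_LIMIT_RULES:
        prefix = rule["path"]
        if rule["method"] == method and (path == prefix or path.startswith(prefix + "/")):
            # Prefer the most specific (longest) matching path.
            if best is None or len(prefix) > len(best["path"]):
                best = rule
    if best is not None:
        buckets.append((f"{best['method']} {best['path']}", best["max_requests"]))
    return buckets


print(buckets_for("POST", "/event/search"))
# [('global', 60), ('POST /event/search', 20)]
```

Each bucket would then be tracked per IP with its own counter, so changing `max_requests` for one rule leaves the global limit untouched.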

### Monitoring Caching

To monitor the effectiveness of caching:
- Use browser developer tools to check if responses are being cached correctly
- Monitor server logs to see if the same requests are being processed repeatedly
- Use performance monitoring tools to measure response times

### Adjusting Caching

To adjust the caching settings, modify the `CacheMiddleware` class in `oedb/middleware/cache.py`:
- Change the `default_max_age` parameter in the constructor
- Modify the `caching_rules` list to adjust endpoint-specific caching durations