# Outlier Analysis Worker
Processes outlier detection jobs to identify statistical outliers in spatial data.
## Overview
The outlier analysis worker flags features whose value in a chosen numeric field is statistically unusual, using either the z-score method or the MAD (Median Absolute Deviation) method.
## Job Type
`outlier_analysis`
## Input Parameters
```json
{
"dataset_id": 123,
"value_field": "income",
"method": "zscore",
"threshold": 2.0
}
```
### Parameters
- `dataset_id` (required): Source dataset ID
- `value_field` (required): Numeric field to analyze
- `method` (optional): "zscore" or "mad" (default: "zscore")
- `threshold` (optional): Cutoff applied to the absolute score, i.e. |z-score| for `zscore` or |modified z-score| for `mad` (default: 2.0)
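
The parameter rules above can be sketched as a small validation helper. This is only an illustration: the function name `normalize_params` is hypothetical, and rejecting non-positive thresholds is an assumption, not documented worker behavior.

```python
ALLOWED_METHODS = {"zscore", "mad"}


def normalize_params(params: dict) -> dict:
    """Check required keys and fill in the documented defaults (hypothetical helper)."""
    for key in ("dataset_id", "value_field"):
        if key not in params:
            raise ValueError(f"missing required parameter: {key}")

    method = params.get("method", "zscore")  # documented default
    if method not in ALLOWED_METHODS:
        raise ValueError(f"method must be one of {sorted(ALLOWED_METHODS)}")

    threshold = float(params.get("threshold", 2.0))  # documented default
    if threshold <= 0:
        # Assumption: a non-positive cutoff is treated as invalid.
        raise ValueError("threshold must be positive")

    return {
        "dataset_id": params["dataset_id"],
        "value_field": params["value_field"],
        "method": method,
        "threshold": threshold,
    }
```

Calling `normalize_params({"dataset_id": 123, "value_field": "income"})` returns the same parameters with `method` set to `"zscore"` and `threshold` set to `2.0`.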
## Output
Creates a new dataset with outlier analysis results:
- Original features marked as outliers
- Outlier score (z-score or MAD score)
- Outlier flag
- Original attributes preserved
## Methods
### Z-Score Method
Calculates standardized z-scores:
- Mean and standard deviation calculated
- Z-score = (value - mean) / standard_deviation
- Features with |z-score| > threshold are outliers
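
A minimal sketch of the z-score steps above, operating on a plain list of numbers. The function name `zscore_outliers` is hypothetical. Using the population standard deviation (dividing by n) and scoring every value as 0 when the standard deviation is zero are assumptions; the worker may handle either differently.

```python
import math


def zscore_outliers(values, threshold=2.0):
    """Return a (z_score, is_outlier) pair for each value."""
    n = len(values)
    if n == 0:
        return []

    # First pass: mean and (population) standard deviation.
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)

    if std == 0:
        # Assumption: with no spread, nothing is flagged.
        return [(0.0, False) for _ in values]

    # Second pass: score each value and compare |z| to the threshold.
    results = []
    for v in values:
        z = (v - mean) / std
        results.append((z, abs(z) > threshold))
    return results
```

For the values `[10, 12, 11, 13, 12, 11, 100]` with the default threshold, only `100` is flagged.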
### MAD Method
Uses Median Absolute Deviation:
- Median calculated
- MAD = median(|value - median|)
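- Modified z-score = 0.6745 * (value - median) / MAD, where the constant 0.6745 rescales the MAD so the score is comparable to a standard z-score for normally distributed data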
- Features with |modified z-score| > threshold are outliers
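
The MAD steps above can be sketched the same way. The function name `mad_outliers` is hypothetical, and returning a score of 0 for every value when the MAD is zero (which happens when at least half of the values equal the median) is an assumption made to avoid division by zero.

```python
from statistics import median


def mad_outliers(values, threshold=2.0):
    """Return a (modified_z_score, is_outlier) pair for each value."""
    if not values:
        return []

    med = median(values)
    mad = median([abs(v - med) for v in values])

    if mad == 0:
        # Assumption: with a zero MAD, nothing is flagged.
        return [(0.0, False) for _ in values]

    results = []
    for v in values:
        score = 0.6745 * (v - med) / mad
        results.append((score, abs(score) > threshold))
    return results
```

For `[10, 12, 11, 13, 12, 11, 100]` the median is 12 and the MAD is 1, so `100` is again the only value flagged, this time with a much larger score than under the z-score method because the extreme value does not inflate the spread estimate.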
## Example
```bash
# Enqueue an outlier analysis job via API
curl -X POST "https://example.com/api/analysis/outlier_run.php" \
-H "Content-Type: application/json" \
-d '{
"dataset_id": 123,
"value_field": "income",
"method": "zscore",
"threshold": 2.0
}'
```
## Background Jobs
This analysis runs as a background job. The worker:
1. Fetches queued `outlier_analysis` jobs
2. Validates input parameters
3. Calculates statistics (mean/std or median/MAD)
4. Identifies outliers
5. Creates output dataset
6. Marks job as completed
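
The six steps above can be sketched end to end with in-memory data. Everything here is illustrative: the job and dataset shapes, the `status` values other than completion, the output field names `outlier_score` and `is_outlier`, and the choice to keep every feature in the output are assumptions, not the worker's actual storage or schema.

```python
from statistics import mean, median, pstdev


def score_values(values, method):
    """Step 3: compute statistics and one score per value."""
    if not values:
        return []
    if method == "mad":
        center = median(values)
        spread = median([abs(v - center) for v in values])
        factor = 0.6745
    else:
        center = mean(values)
        spread = pstdev(values)  # population standard deviation (assumption)
        factor = 1.0
    if spread == 0:
        return [0.0] * len(values)
    return [factor * (v - center) / spread for v in values]


def process_job(job, datasets):
    params = job["params"]
    # Step 2: validate input parameters.
    if "dataset_id" not in params or "value_field" not in params:
        job["status"] = "failed"  # hypothetical failure status
        return None
    method = params.get("method", "zscore")
    threshold = params.get("threshold", 2.0)
    field = params["value_field"]
    features = datasets[params["dataset_id"]]

    scores = score_values([f[field] for f in features], method)

    # Steps 4-5: flag outliers and build the output dataset,
    # preserving the original attributes of each feature.
    output = [
        dict(f, outlier_score=s, is_outlier=abs(s) > threshold)
        for f, s in zip(features, scores)
    ]
    # Step 6: mark the job as completed.
    job["status"] = "completed"
    return output


def run_worker(queue, datasets):
    """Step 1: pick up queued outlier_analysis jobs and process them."""
    results = {}
    for job in queue:
        if job["type"] == "outlier_analysis" and job["status"] == "queued":
            results[job["id"]] = process_job(job, datasets)
    return results
```

In this sketch `run_worker` returns a mapping from job ID to the output features, and jobs of other types or in other states are left untouched.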
## Performance Considerations
- Processing time grows with the number of features in the dataset
- Z-score method requires two passes (mean/std, then scoring)
- MAD method is more robust because the median and MAD are far less affected by extreme values than the mean and standard deviation
- Consider filtering null values before analysis
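
One way to filter null values before analysis is sketched below, assuming features are represented as dictionaries. The function name `split_null_values` is hypothetical, and treating NaN, infinities, and non-numeric values as unusable alongside nulls is an assumption.

```python
import math


def split_null_values(features, value_field):
    """Separate features with a usable numeric value from those without one."""
    usable, skipped = [], []
    for feature in features:
        value = feature.get(value_field)  # None when the field is missing
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_number and math.isfinite(value):
            usable.append(feature)
        else:
            skipped.append(feature)
    return usable, skipped
```

Only the `usable` list would then be passed to the scoring step; the `skipped` list makes it possible to report how many features were excluded.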
## Related Documentation
- [Outlier Analysis Tool](../analysis-tools/outliers.md)
- [Analysis API](../api/analysis.md)
- [Workers Overview](index.md)