# Outlier Analysis Worker

Processes outlier detection jobs to identify statistical outliers in spatial data.
## Overview

The outlier analysis worker flags features whose values are statistically unusual, using either the z-score method or the MAD (Median Absolute Deviation) method.
## Job Type

`outlier_analysis`
## Input Parameters

```json
{
  "dataset_id": 123,
  "value_field": "income",
  "method": "zscore",
  "threshold": 2.0
}
```
### Parameters

- `dataset_id` (required): ID of the source dataset
- `value_field` (required): Name of the numeric field to analyze
- `method` (optional): `"zscore"` or `"mad"` (default: `"zscore"`)
- `threshold` (optional): Cutoff applied to the absolute z-score (`zscore`) or absolute modified z-score (`mad`) (default: 2.0)
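The parameter rules above can be sketched as a small validation step. This is an illustrative sketch, not the worker's actual code: `OutlierParams`, `parse_params`, and the rule that `threshold` must be positive are assumptions made for the example.

```python
from dataclasses import dataclass

VALID_METHODS = ("zscore", "mad")


@dataclass
class OutlierParams:
    """Hypothetical container for the documented job parameters."""
    dataset_id: int
    value_field: str
    method: str = "zscore"    # documented default
    threshold: float = 2.0    # documented default


def parse_params(raw: dict) -> OutlierParams:
    """Apply the documented defaults and reject malformed input."""
    for key in ("dataset_id", "value_field"):
        if key not in raw:
            raise ValueError(f"missing required parameter: {key}")

    method = raw.get("method", "zscore")
    if method not in VALID_METHODS:
        raise ValueError(f"method must be one of {VALID_METHODS}")

    threshold = float(raw.get("threshold", 2.0))
    if threshold <= 0:  # assumption: a non-positive cutoff is rejected
        raise ValueError("threshold must be positive")

    return OutlierParams(
        dataset_id=int(raw["dataset_id"]),
        value_field=str(raw["value_field"]),
        method=method,
        threshold=threshold,
    )
```

Calling `parse_params({"dataset_id": 123, "value_field": "income"})` fills in `method="zscore"` and `threshold=2.0`.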
## Output

Creates a new dataset containing the outlier analysis results:

- Original features, marked where they are outliers
- Outlier score (z-score or modified z-score, depending on the method)
- Outlier flag
- Original attributes, preserved unchanged
## Methods

### Z-Score Method

Calculates standardized z-scores:

- The mean and standard deviation of the field values are calculated
- `z-score = (value - mean) / standard_deviation`
- Features with `|z-score| > threshold` are outliers
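A minimal sketch of the z-score steps, for illustration only. The function name `zscore_outliers` is hypothetical, and two choices are assumptions that the description above does not settle: the population standard deviation (dividing by `n`) is used, and a zero standard deviation yields no outliers.

```python
import math


def zscore_outliers(values, threshold=2.0):
    """Return a (score, is_outlier) pair for each value."""
    n = len(values)
    if n == 0:
        return []

    # First pass: mean and (population) standard deviation.
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)

    if std == 0:
        # Assumption: identical values produce no outliers.
        return [(0.0, False) for _ in values]

    # Second pass: score each value and compare against the threshold.
    results = []
    for v in values:
        z = (v - mean) / std
        results.append((z, abs(z) > threshold))
    return results


incomes = [10, 12, 11, 13, 12, 50]
flags = [is_outlier for _, is_outlier in zscore_outliers(incomes)]
# Only the last value (50) exceeds the default threshold of 2.0.
```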
### MAD Method

Uses the Median Absolute Deviation:

- The median of the field values is calculated
- `MAD = median(|value - median|)`
- `modified z-score = 0.6745 * (value - median) / MAD`
- Features with `|modified z-score| > threshold` are outliers
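The MAD steps can be sketched the same way. This is an illustrative sketch under stated assumptions: `mad_outliers` is a hypothetical name, and returning no outliers when the MAD is zero (which happens when more than half the values are identical) is a choice made for the example, not documented behavior.

```python
from statistics import median


def mad_outliers(values, threshold=2.0):
    """Return a (modified_z_score, is_outlier) pair for each value."""
    if not values:
        return []

    med = median(values)
    mad = median([abs(v - med) for v in values])

    if mad == 0:
        # Assumption: avoid division by zero by flagging nothing.
        return [(0.0, False) for _ in values]

    results = []
    for v in values:
        score = 0.6745 * (v - med) / mad
        results.append((score, abs(score) > threshold))
    return results


incomes = [10, 12, 11, 13, 12, 50]
flags = [is_outlier for _, is_outlier in mad_outliers(incomes)]
# The median is 12 and the MAD is 1, so only 50 is flagged.
```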
## Example

```bash
# Enqueue an outlier analysis job via API
curl -X POST "https://example.com/api/analysis/outlier_run.php" \
  -H "Content-Type: application/json" \
  -d '{
    "dataset_id": 123,
    "value_field": "income",
    "method": "zscore",
    "threshold": 2.0
  }'
```
## Background Jobs

This analysis runs as a background job. The worker:

1. Fetches queued `outlier_analysis` jobs
2. Validates the input parameters
3. Calculates statistics (mean and standard deviation, or median and MAD)
4. Identifies outliers
5. Creates the output dataset
6. Marks the job as completed
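The six steps can be sketched end to end with in-memory stand-ins for the job queue and dataset store. Everything here is hypothetical and for illustration: the job and dataset shapes, the `outlier_score` and `is_outlier` field names, the `failed` status, and the use of the population standard deviation are assumptions, not the worker's real schema.

```python
from statistics import mean, median, pstdev


def score_values(values, method):
    """Step 3: compute statistics and return one score per value."""
    if method == "zscore":
        center, spread, scale = mean(values), pstdev(values), 1.0
    else:
        center = median(values)
        spread = median([abs(v - center) for v in values])
        scale = 0.6745
    if spread == 0:
        return [0.0] * len(values)  # assumption: no spread, no outliers
    return [scale * (v - center) / spread for v in values]


def run_worker(jobs, datasets):
    for job in jobs:
        # Step 1: pick up queued outlier_analysis jobs only.
        if job["type"] != "outlier_analysis" or job["status"] != "queued":
            continue
        try:
            # Step 2: validate parameters and apply defaults.
            params = job["params"]
            features = datasets[params["dataset_id"]]
            field = params["value_field"]
            method = params.get("method", "zscore")
            threshold = float(params.get("threshold", 2.0))
            if method not in ("zscore", "mad"):
                raise ValueError(f"unknown method: {method}")

            # Steps 3 and 4: score each feature and flag outliers.
            scores = score_values([f[field] for f in features], method)

            # Step 5: write a new dataset, keeping original attributes.
            new_id = max(datasets) + 1
            datasets[new_id] = [
                {**f, "outlier_score": s, "is_outlier": abs(s) > threshold}
                for f, s in zip(features, scores)
            ]

            # Step 6: mark the job as completed.
            job["status"] = "completed"
            job["result_dataset_id"] = new_id
        except (KeyError, ValueError) as exc:
            job["status"] = "failed"
            job["error"] = str(exc)
```

Keeping the statistics in a separate function means the queue handling does not need to change if another scoring method is added.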
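## Performance Considerations

- Processing time depends on dataset size
- The z-score method requires two passes over the data (one to compute the mean and standard deviation, one to score)
- The MAD method is more robust: the median and MAD are far less affected by extreme values than the mean and standard deviation
- Consider filtering out features with null values in the analyzed field before analysis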
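Null filtering can be sketched as a pre-processing step. This is a minimal sketch: `split_valid` is a hypothetical helper, and treating NaN, infinite, and non-numeric values the same way as nulls is an assumption made for the example.

```python
import math


def split_valid(features, field):
    """Separate features with a usable numeric value from the rest."""
    valid, skipped = [], []
    for feature in features:
        value = feature.get(field)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_number and math.isfinite(value):
            valid.append(feature)
        else:
            skipped.append(feature)  # None, NaN, infinity, or non-numeric
    return valid, skipped


features = [
    {"id": 1, "income": 42000},
    {"id": 2, "income": None},
    {"id": 3, "income": float("nan")},
    {"id": 4},
]
valid, skipped = split_valid(features, "income")
# valid holds only feature 1; the other three are skipped.
```

Returning the skipped features as well makes it possible to report how many were excluded from the statistics.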
## Related Documentation

- [Outlier Analysis Tool](../analysis-tools/outliers.md)
- [Analysis API](../api/analysis.md)
- [Workers Overview](index.md)