# Outlier Detection

Identify statistical outliers in a numeric field using the z-score or MAD (median absolute deviation) method.

## Overview

Outlier detection flags features whose value in the chosen numeric field is statistically unusual relative to the rest of the dataset.

## Methods

### Z-Score Method

Uses the mean and standard deviation:

- Z-score = (value - mean) / standard_deviation
- Features with |z-score| > threshold are outliers
- Sensitive to extreme values, because outliers shift the mean and inflate the standard deviation used in the calculation

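The z-score formula above can be sketched in Python as follows. This is a minimal illustration, not the tool's actual implementation: the function name `z_scores`, the sample values, and the choice of the population standard deviation are all assumptions.

```python
from statistics import fmean, pstdev


def z_scores(values):
    """Return (value - mean) / standard_deviation for each value.

    Assumes `values` is a non-empty list of numbers with nulls already removed.
    """
    mean = fmean(values)
    std = pstdev(values)  # population standard deviation (an assumption)
    if std == 0:
        # All values are identical, so nothing deviates from the mean.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical sample: seven similar values and one extreme value.
incomes = [40, 42, 45, 47, 50, 52, 55, 400]
scores = z_scores(incomes)
flags = [abs(z) > 2.0 for z in scores]  # threshold of 2.0, the documented default
```

With this sample only the last value is flagged. Its score is modest (about 2.6) because the extreme value itself inflates the standard deviation, which is the sensitivity noted in the list above.
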
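### MAD Method

Uses the median and the median absolute deviation (MAD), where MAD = median(|value - median|):

- Modified z-score = 0.6745 * (value - median) / MAD
- Features with |modified z-score| > threshold are outliers
- More robust than the z-score method, because the median and MAD are barely affected by extreme values
- The constant 0.6745 scales the MAD so that, for normally distributed data, the modified z-score is comparable to a standard z-score
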
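The modified z-score can be sketched the same way. Again, this is only an illustration under assumptions: the function name `modified_z_scores`, the sample values, and the handling of a zero MAD are choices made for this sketch, not documented behavior.

```python
from statistics import median


def modified_z_scores(values):
    """Return 0.6745 * (value - median) / MAD for each value.

    Assumes `values` is a non-empty list of numbers with nulls already removed.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # At least half of the values equal the median, so the score is
        # undefined; returning zeros is an arbitrary choice for this sketch.
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


# Same hypothetical sample as a z-score comparison would use.
incomes = [40, 42, 45, 47, 50, 52, 55, 400]
scores = modified_z_scores(incomes)
flags = [abs(s) > 2.0 for s in scores]
```

For this sample the median is 48.5 and the MAD is 5.0, so the extreme value scores about 47.4 while every other value stays below 2.0 in absolute terms. The extreme value does not distort the statistics used to score it.
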
## Inputs

- **Dataset**: Any dataset with at least one numeric field
- **Value Field**: Numeric field to analyze
- **Method**: "zscore" or "mad" (default: "zscore")
- **Threshold**: Cutoff applied to the absolute z-score ("zscore") or modified z-score ("mad") (default: 2.0)

## Outputs

New dataset containing:

- Original features and their attributes
- **Outlier Score**: Z-score or modified z-score, depending on the method
- **Is Outlier**: Boolean flag

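As a rough picture of the output shape, the sketch below annotates a list of feature records using the z-score method. The key names `outlier_score` and `is_outlier`, the function name `annotate_outliers`, and the treatment of null-valued features (kept in the output with a null score and a false flag) are assumptions for illustration; the actual field names and null handling in the output may differ.

```python
from statistics import fmean, pstdev


def annotate_outliers(features, value_field, threshold=2.0):
    """Return copies of the features with a score and a flag added."""
    # Null values are excluded from the mean and standard deviation.
    present = [f[value_field] for f in features if f.get(value_field) is not None]
    mean = fmean(present) if present else 0.0
    std = pstdev(present) if present else 0.0

    annotated = []
    for f in features:
        value = f.get(value_field)
        if value is None or std == 0:
            score = None  # no meaningful score (assumed behavior)
        else:
            score = (value - mean) / std
        annotated.append({
            **f,  # original attributes are carried over unchanged
            "outlier_score": score,
            "is_outlier": score is not None and abs(score) > threshold,
        })
    return annotated


# Hypothetical records: eight with values, one with a null value.
values = [40, 42, 45, 47, 50, 52, 55, 400, None]
features = [{"id": i + 1, "income": v} for i, v in enumerate(values)]
result = annotate_outliers(features, "income")
```

Each output record keeps its original keys and gains the two extra fields, so downstream consumers can filter on the flag without losing any input attributes.
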
## Example

```json
{
  "dataset_id": 123,
  "value_field": "income",
  "method": "zscore",
  "threshold": 2.0
}
```

## Background Jobs

This analysis runs as a background job. See [Outlier Analysis Worker](../workers/outlier_analysis.md) for details.

## Use Cases

- Data quality assessment
- Anomaly detection
- Error identification
- Extreme value analysis

## Notes

- Null values are excluded from calculations
- For normally distributed data, a threshold of 2.0 flags about 5% of values as outliers
- The MAD method is recommended for skewed distributions
- Consider spatial context when interpreting results

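The 5% figure in the notes can be checked directly from the normal distribution. The helper below is a small sketch (the name `two_sided_tail` is hypothetical) that computes the fraction of a standard normal distribution lying beyond a given threshold on either side, which helps when choosing a different threshold.

```python
import math


def two_sided_tail(threshold):
    """Return P(|Z| > threshold) for a standard normal variable Z."""
    # P(|Z| > t) = 1 - erf(t / sqrt(2)) = erfc(t / sqrt(2))
    return math.erfc(threshold / math.sqrt(2))


share_at_2 = two_sided_tail(2.0)  # about 0.0455, i.e. roughly 5%
share_at_3 = two_sided_tail(3.0)  # about 0.0027, i.e. roughly 0.3%
```

These fractions hold only when the data is close to normal. For skewed or heavy-tailed fields the share of flagged values can be noticeably different.
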
## Related Documentation

- [Outlier Analysis Worker](../workers/outlier_analysis.md)
- [Analysis API](../api/analysis.md)