# Outlier Detection

Identify statistical outliers in numeric fields using the z-score or median absolute deviation (MAD) method.

## Overview

Outlier detection flags features whose value in a chosen numeric field is statistically unusual relative to the distribution of that field across the dataset.

## Methods

### Z-Score Method

Uses mean and standard deviation:
- Z-score = (value - mean) / standard_deviation
- Features with |z-score| > threshold are outliers
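- Sensitive to outliers: extreme values pull the mean toward themselves and inflate the standard deviation, which can shrink their own z-scores below the threshold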
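A minimal Python sketch of the z-score test, assuming the population standard deviation (whether the tool uses the population or sample form is not stated). The function names `zscores` and `zscore_outliers` are hypothetical, not part of the tool's API.

```python
import statistics


def zscores(values):
    """Return (value - mean) / standard_deviation for each value.

    Assumes the population standard deviation (statistics.pstdev).
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        # All values identical: assume nothing is unusual.
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


def zscore_outliers(values, threshold=2.0):
    """Flag each value whose |z-score| exceeds the threshold."""
    return [abs(z) > threshold for z in zscores(values)]


print(zscore_outliers([10, 11, 12, 13, 14, 100, 100]))
```

On `[10, 11, 12, 13, 14, 100, 100]`, each 100 scores only about 1.58, so nothing is flagged at the default threshold of 2.0: the two extreme values inflate the standard deviation enough to mask themselves.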

### MAD Method

Uses median and median absolute deviation:
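- MAD = median(|value - median|)
- Modified z-score = 0.6745 * (value - median) / MAD; the constant 0.6745 puts the score on roughly the same scale as an ordinary z-score when the data are normally distributed
- Features with |modified z-score| > threshold are outliers
- More robust to outliers: the median and MAD change little when a few extreme values are present, so those values cannot easily mask themselves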
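The same test for the MAD method, again a sketch with hypothetical function names. How the tool handles MAD = 0 (more than half the values identical) is not documented; this version returns a score of 0.0 in that case as an assumption.

```python
import statistics


def modified_zscores(values):
    """Return 0.6745 * (value - median) / MAD for each value."""
    med = statistics.median(values)
    # MAD: the median of absolute deviations from the median.
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # More than half the values equal the median, so the score is
        # undefined; returning 0.0 is an assumption of this sketch.
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def mad_outliers(values, threshold=2.0):
    """Flag each value whose |modified z-score| exceeds the threshold."""
    return [abs(m) > threshold for m in modified_zscores(values)]


print(mad_outliers([10, 11, 12, 13, 14, 100, 100]))
```

On the same data, the median is 13 and the MAD is 2, so each 100 gets a modified z-score of about 29.3 and is flagged, while the remaining values stay below 1.1 in absolute value.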

## Inputs

- **Dataset**: Any dataset with a numeric field
- **Value Field**: Numeric field to analyze
- **Method**: "zscore" or "mad" (default: "zscore")
- **Threshold**: Cutoff for the absolute z-score, or for the absolute modified z-score with the MAD method (default: 2.0)
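A client might check these inputs before submitting a job. This sketch mirrors the keys used in the example request; the `OutlierParams` class and the rule that the threshold must be positive are assumptions for illustration, not documented validation.

```python
from dataclasses import asdict, dataclass

VALID_METHODS = ("zscore", "mad")


@dataclass(frozen=True)
class OutlierParams:
    """Hypothetical client-side model of the documented inputs."""

    dataset_id: int
    value_field: str
    method: str = "zscore"  # documented default
    threshold: float = 2.0  # documented default

    def __post_init__(self):
        if self.method not in VALID_METHODS:
            raise ValueError(f"method must be one of {VALID_METHODS}, got {self.method!r}")
        # Assumption: a non-positive (or NaN) cutoff is rejected.
        if not self.threshold > 0:
            raise ValueError(f"threshold must be positive, got {self.threshold!r}")

    def to_payload(self):
        """Return the request body as a plain dict with the example's keys."""
        return asdict(self)


print(OutlierParams(dataset_id=123, value_field="income").to_payload())
```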

## Outputs

New dataset containing:

- Original features and their attributes
- **Outlier Score**: z-score or modified z-score, depending on the method
- **Is Outlier**: Boolean flag, true when the absolute score exceeds the threshold
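One way to produce these outputs from a list of feature dictionaries is sketched below. The attribute names `outlier_score` and `is_outlier`, and the choice to give null values a `None` score and a `False` flag, are hypothetical; only the exclusion of nulls from the calculations is documented.

```python
import statistics


def annotate_outliers(features, value_field, method="zscore", threshold=2.0):
    """Return copies of `features` with hypothetical outlier_score/is_outlier keys.

    `features` is a list of dicts standing in for the dataset's features.
    Null (None) values are excluded from the statistics.
    """
    values = [f[value_field] for f in features if f.get(value_field) is not None]
    if not values:
        raise ValueError(f"no non-null values in {value_field!r}")

    if method == "zscore":
        center, spread, scale = statistics.fmean(values), statistics.pstdev(values), 1.0
    elif method == "mad":
        center = statistics.median(values)
        spread = statistics.median([abs(v - center) for v in values])
        scale = 0.6745
    else:
        raise ValueError(f"unknown method: {method!r}")

    annotated = []
    for feature in features:
        value = feature.get(value_field)
        if value is None:
            score = None  # assumption: nulls get no score
        elif spread == 0:
            score = 0.0  # assumption: no spread means nothing is unusual
        else:
            score = scale * (value - center) / spread
        out = dict(feature)  # copy so the original attributes are kept intact
        out["outlier_score"] = score
        out["is_outlier"] = score is not None and abs(score) > threshold
        annotated.append(out)
    return annotated


features = [{"id": i, "income": v} for i, v in enumerate([10, 11, 12, 13, 14, 100, 100, None])]
for row in annotate_outliers(features, "income", method="mad"):
    print(row)
```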

## Example

```json
{
  "dataset_id": 123,
  "value_field": "income",
  "method": "zscore",
  "threshold": 2.0
}
```

## Background Jobs

This analysis runs as a background job. See [Outlier Analysis Worker](../workers/outlier_analysis.md) for details.

## Use Cases

- Data quality assessment
- Anomaly detection
- Error identification
- Extreme value analysis

## Notes

- Null values are excluded from calculations
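- Under the z-score method, a threshold of 2.0 flags about 4.6% of values when the data are normally distributed
- The MAD method is recommended for skewed distributions or when extreme values may distort the mean and standard deviation
- Both methods compare each value with the distribution of the whole dataset and ignore location, so consider spatial context when interpreting results: a value flagged as an outlier may be typical of its neighborhood, and vice versa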
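The percentage in the threshold note can be checked directly: for a standard normal variable Z, P(|Z| > t) = erfc(t / √2). This is the expected share for exactly normal data with the true mean and standard deviation; on a finite sample the observed share will vary. A short sketch using only the standard library:

```python
import math


def normal_two_sided_tail(t):
    """P(|Z| > t) for a standard normal Z, i.e. 2 * (1 - Phi(t))."""
    return math.erfc(t / math.sqrt(2))


for t in (2.0, 2.5, 3.0):
    print(f"threshold {t}: {normal_two_sided_tail(t):.2%} of values flagged")
```

For t = 2.0 this returns about 0.0455 (4.55%); for t = 3.0, about 0.0027.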

## Related Documentation

- [Outlier Analysis Worker](../workers/outlier_analysis.md)
- [Analysis API](../api/analysis.md)