# Outlier Detection

Identify statistical outliers in numeric fields using z-score or MAD methods.

## Overview

Outlier detection identifies features whose values are statistically unusual compared to the rest of the dataset's distribution.

## Methods

### Z-Score Method

Uses the mean and standard deviation:

- Z-score = (value - mean) / standard_deviation
- Features with |z-score| > threshold are outliers
- Sensitive to extreme values: the outliers themselves shift the mean and inflate the standard deviation used in the calculation

### MAD Method

Uses the median and the median absolute deviation (MAD):

- Modified z-score = 0.6745 * (value - median) / MAD
- Features with |modified z-score| > threshold are outliers
- More robust: the median and MAD are far less affected by extreme values

## Inputs

- **Dataset**: Any dataset with a numeric field
- **Value Field**: Numeric field to analyze
- **Method**: "zscore" or "mad" (default: "zscore")
- **Threshold**: Z-score threshold or MAD multiplier (default: 2.0)

## Outputs

New dataset containing:

- Original features with their original attributes
- **Outlier Score**: Z-score or MAD score
- **Is Outlier**: Boolean flag

## Example

```json
{
  "dataset_id": 123,
  "value_field": "income",
  "method": "zscore",
  "threshold": 2.0
}
```

## Background Jobs

This analysis runs as a background job. See [Outlier Analysis Worker](../workers/outlier_analysis.md) for details.

## Use Cases

- Data quality assessment
- Anomaly detection
- Error identification
- Extreme value analysis

## Notes

- Null values are excluded from calculations
- With the z-score method, a threshold of 2.0 flags roughly 5% of values as outliers when the data are normally distributed
- The MAD method is recommended for skewed distributions
- Consider spatial context when interpreting results

## Related Documentation

- [Outlier Analysis Worker](../workers/outlier_analysis.md)
- [Analysis API](../api/analysis.md)
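
## Illustrative Sketch

The two scoring formulas described under Methods can be sketched in a few lines of Python. This is a minimal illustration, not the worker's actual code: the function name `outlier_scores`, the use of the population standard deviation, and the handling of a zero spread (no score, never flagged) are all assumptions made for the example.

```python
from statistics import mean, median, pstdev


def outlier_scores(values, method="zscore", threshold=2.0):
    """Return one (score, is_outlier) pair per input value.

    Hypothetical helper for illustration only. None inputs are skipped
    when computing the statistics and come back as (None, False).
    """
    present = [v for v in values if v is not None]  # nulls are excluded
    if not present:
        return [(None, False) for _ in values]

    if method == "zscore":
        center = mean(present)
        spread = pstdev(present)  # assumption: population standard deviation
        scale = 1.0
    elif method == "mad":
        center = median(present)
        # Median absolute deviation around the median
        spread = median([abs(v - center) for v in present])
        scale = 0.6745
    else:
        raise ValueError(f"unknown method: {method!r}")

    results = []
    for v in values:
        if v is None or spread == 0:
            # Assumption: with no spread, no score can be computed
            results.append((None, False))
            continue
        score = scale * (v - center) / spread
        results.append((score, abs(score) > threshold))
    return results


if __name__ == "__main__":
    incomes = [40, 42, 45, 47, 50, 52, 55, None, 400]
    for method in ("zscore", "mad"):
        flagged = [
            v for v, (_, is_out) in zip(incomes, outlier_scores(incomes, method))
            if is_out
        ]
        print(method, flagged)
```

In this sample both methods flag only the value 400, but the MAD score for it is much larger than the z-score, because the extreme value inflates the standard deviation while leaving the median and MAD almost unchanged.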