Outlier Detection

Identify statistical outliers in numeric fields using z-score or MAD methods.

Overview

Outlier detection identifies features with values that are statistically unusual compared to the dataset distribution.

Methods

Z-Score Method

Uses mean and standard deviation:

  • Z-score = (value - mean) / standard_deviation
  • Features with |z-score| > threshold are outliers
  • Sensitive to extreme values, because the mean and standard deviation are themselves pulled toward the outliers being tested
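
The z-score rule above can be sketched in a few lines of Python. This is an illustrative sketch, not the tool's actual implementation: the function name `zscore_outliers` is hypothetical, and the use of the population standard deviation and the handling of a zero standard deviation are assumptions.

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=2.0):
    """Return one (score, is_outlier) pair per input value.

    None inputs are skipped in the statistics and yield (None, False).
    """
    present = [v for v in values if v is not None]
    if not present:
        return [(None, False) for _ in values]

    mu = mean(present)
    # Assumption: population standard deviation; the tool may use the sample form.
    sigma = pstdev(present)

    results = []
    for v in values:
        if v is None or sigma == 0:
            # Assumption: with no spread, no score is defined and nothing is flagged.
            results.append((None, False))
            continue
        z = (v - mu) / sigma
        results.append((z, abs(z) > threshold))
    return results

incomes = [8, 10, None, 12, 14, 16, 11, 13, 100]
for value, (score, flagged) in zip(incomes, zscore_outliers(incomes)):
    print(value, score, flagged)
```

In this sample only the value 100 exceeds the default threshold, and its own presence inflates the mean and standard deviation used to score it.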

MAD Method

Uses median and median absolute deviation:

  • Modified z-score = 0.6745 * (value - median) / MAD
  • Features with |modified z-score| > threshold are outliers
  • More robust, because the median and MAD change little when a few extreme values are present
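
A comparable sketch for the MAD rule follows. Again this is only an illustration under stated assumptions: `mad_outliers` is a hypothetical name, and returning no score when the MAD is zero is an assumed convention rather than documented behavior.

```python
from statistics import median

def mad_outliers(values, threshold=2.0):
    """Return one (modified_z, is_outlier) pair per input value.

    None inputs are skipped in the statistics and yield (None, False).
    """
    present = [v for v in values if v is not None]
    if not present:
        return [(None, False) for _ in values]

    med = median(present)
    # Median absolute deviation from the median.
    mad = median([abs(v - med) for v in present])

    results = []
    for v in values:
        if v is None or mad == 0:
            # Assumption: a zero MAD leaves the score undefined, so nothing is flagged.
            results.append((None, False))
            continue
        modified_z = 0.6745 * (v - med) / mad
        results.append((modified_z, abs(modified_z) > threshold))
    return results

incomes = [8, 10, None, 12, 14, 16, 11, 13, 100]
for value, (score, flagged) in zip(incomes, mad_outliers(incomes)):
    print(value, score, flagged)
```

Because the median and MAD ignore the magnitude of the extreme value, the score for 100 is far larger here than a mean-based z-score would give for the same data.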

Inputs

  • Dataset: Any dataset with at least one numeric field
  • Value Field: Numeric field to analyze
  • Method: "zscore" or "mad" (default: "zscore")
  • Threshold: Cutoff applied to the absolute z-score or modified z-score (default: 2.0)
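
As a rough illustration of how these inputs and their defaults fit together, the sketch below models them as a small Python dataclass. The class name `OutlierParams` is hypothetical, the field names follow the request example in this document, and the requirement that the threshold be positive is an assumption.

```python
from dataclasses import dataclass

VALID_METHODS = ("zscore", "mad")

@dataclass(frozen=True)
class OutlierParams:
    dataset_id: int
    value_field: str
    method: str = "zscore"    # documented default
    threshold: float = 2.0    # documented default

    def __post_init__(self):
        if self.method not in VALID_METHODS:
            raise ValueError(
                f"method must be one of {VALID_METHODS}, got {self.method!r}"
            )
        # Assumption: a non-positive threshold is treated as invalid.
        if self.threshold <= 0:
            raise ValueError("threshold must be positive")

params = OutlierParams(dataset_id=123, value_field="income")
print(params)
```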

Outputs

New dataset containing:

  • Original features with their original attributes
  • Outlier Score: Z-score or modified z-score, depending on the method
  • Is Outlier: Boolean flag, true when |score| > threshold

Example

{
  "dataset_id": 123,
  "value_field": "income",
  "method": "zscore",
  "threshold": 2.0
}

Background Jobs

This analysis runs as a background job. See Outlier Analysis Worker for details.

Use Cases

  • Data quality assessment
  • Anomaly detection
  • Error identification
  • Extreme value analysis

Notes

  • Null values are excluded from calculations
  • A threshold of 2.0 flags about 4.6% of values when the data is normally distributed (|z| > 2)
  • The MAD method is recommended for skewed distributions
  • Consider spatial context when interpreting results
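
The share of normally distributed data that a given threshold flags, mentioned in the notes above, can be checked with the standard normal tail probability. The helper name `two_sided_tail_fraction` is hypothetical, and the result only applies when the values really are close to normal.

```python
import math

def two_sided_tail_fraction(threshold):
    """Fraction of a standard normal distribution with |z| > threshold."""
    # P(|Z| > t) = 2 * (1 - Phi(t)) = erfc(t / sqrt(2))
    return math.erfc(threshold / math.sqrt(2))

for t in (2.0, 2.5, 3.0):
    print(t, round(two_sided_tail_fraction(t), 4))
```

Raising the threshold shrinks the flagged share quickly, which is worth keeping in mind when choosing a value other than the default.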