Outlier Detection
Identify statistical outliers in numeric fields using z-score or MAD methods.
Overview
Outlier detection identifies features with values that are statistically unusual compared to the dataset distribution.
Methods
Z-Score Method
Uses mean and standard deviation:
- Z-score = (value - mean) / standard_deviation
- Features with |z-score| > threshold are outliers
- Sensitive to extreme values, because the mean and standard deviation are themselves pulled toward outliers
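The z-score rule above can be sketched in a few lines of Python. This is only an illustration of the formula, not the tool's actual implementation: the function name `zscore_outliers` is hypothetical, and the use of the population standard deviation (rather than the sample one) is an assumption.

```python
import statistics

def zscore_outliers(values, threshold=2.0):
    """Return one (score, is_outlier) pair per input value.

    None values are left out of the mean and standard deviation
    and are reported as (None, False).
    """
    clean = [v for v in values if v is not None]
    mean = statistics.fmean(clean)
    # Assumption: population standard deviation; the tool may use the sample one.
    std = statistics.pstdev(clean)
    results = []
    for v in values:
        if v is None or std == 0:
            # No score can be computed for nulls or for constant data.
            results.append((None, False))
        else:
            z = (v - mean) / std
            results.append((z, abs(z) > threshold))
    return results
```

For example, in `[10, 12, 11, 13, 12, 11, 10, 100]` only the value 100 exceeds a threshold of 2.0.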
MAD Method
Uses median and median absolute deviation:
- Modified z-score = 0.6745 * (value - median) / MAD
- Features with |modified z-score| > threshold are outliers
- More robust, because the median and MAD are barely affected by extreme values
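A minimal Python sketch of the modified z-score, using the formula listed above. The function name `mad_outliers` and the handling of a zero MAD are assumptions made for illustration; the tool's own behaviour in that edge case is not documented here.

```python
import statistics

def mad_outliers(values, threshold=2.0):
    """Return one (modified_z, is_outlier) pair per input value.

    None values are left out of the median and MAD
    and are reported as (None, False).
    """
    clean = [v for v in values if v is not None]
    median = statistics.median(clean)
    # MAD: median of the absolute deviations from the median.
    mad = statistics.median([abs(v - median) for v in clean])
    results = []
    for v in values:
        if v is None or mad == 0:
            # Assumption: with MAD == 0 no score is computed.
            results.append((None, False))
        else:
            score = 0.6745 * (v - median) / mad
            results.append((score, abs(score) > threshold))
    return results
```

On `[10, 12, 11, 13, 12, 11, 10, 100]` the median is 11.5 and the MAD is 1.0, so the value 100 scores about 59.7 while every other value stays near ±1.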
Inputs
- Dataset: Any dataset with at least one numeric field
- Value Field: Numeric field to analyze
- Method: "zscore" or "mad" (default: "zscore")
- Threshold: Cutoff applied to |z-score| or |modified z-score| (default: 2.0)
Outputs
New dataset containing:
- Original features with their original attributes
- Outlier Score: Z-score or modified z-score, depending on the method
- Is Outlier: Boolean flag
Example
{
  "dataset_id": 123,
  "value_field": "income",
  "method": "zscore",
  "threshold": 2.0
}
Background Jobs
This analysis runs as a background job. See Outlier Analysis Worker for details.
Use Cases
- Data quality assessment
- Anomaly detection
- Error identification
- Extreme value analysis
Notes
- Null values are excluded from calculations
- A threshold of 2.0 flags about 5% of values (roughly 4.6%) as outliers when the data are normally distributed
- The MAD method is recommended for skewed distributions
- Consider spatial context when interpreting results
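The share of values a given threshold is expected to flag can be checked against the standard normal distribution. The sketch below is illustrative only: the helper name `expected_outlier_fraction` is hypothetical, and the figure holds only if the field is close to normally distributed.

```python
from statistics import NormalDist

def expected_outlier_fraction(threshold):
    """Two-sided tail probability P(|z| > threshold) for a standard normal."""
    return 2.0 * (1.0 - NormalDist().cdf(threshold))

# Roughly 0.0455 for a threshold of 2.0 and 0.0027 for 3.0.
fraction_at_2 = expected_outlier_fraction(2.0)
fraction_at_3 = expected_outlier_fraction(3.0)
```

Raising the threshold from 2.0 to 3.0 therefore cuts the expected share of flagged values from about 4.6% to about 0.3% on normal data.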