# Architecture Overview

This document provides a comprehensive overview of the Aurora GIS architecture, including system components, data flows, and design patterns.

## System Architecture

Aurora GIS follows a modular architecture with a clear separation between the following layers:

- **Frontend**: PHP-based web interface with JavaScript for interactivity
- **Backend**: PHP application layer backed by a PostgreSQL/PostGIS database
- **Workers**: Background job processing system
- **API**: RESTful API layer for programmatic access
- **Analysis Engine**: Spatial analysis tools and algorithms

## Core Components

### 1. Dataset Engine

The dataset engine is the core component responsible for managing spatial datasets.

#### Data Storage Model

Each dataset is stored in its own table, named according to the convention `spatial_data_{dataset_id}`:

```sql
-- Template: replace {id} with the dataset's ID to get the actual table name
CREATE TABLE spatial_data_{id} (
    id SERIAL PRIMARY KEY,
    feature_id TEXT,
    geometry_type TEXT,
    properties JSONB,
    geometry JSONB,
    geom GEOMETRY,
    created_at TIMESTAMP DEFAULT NOW()
);
```

**Benefits:**

- Better performance when the system holds a large number of datasets
- Easier data management and cleanup, since each dataset's features live in a single table
- Improved query performance for individual datasets
- Smaller tables and lower per-table index overhead

#### Dataset Metadata

Dataset metadata is stored in the `spatial_files` table, which records:

- File information (name, path, type, size)
- User-provided description
- Extracted metadata (JSONB)
- Access permissions
- Creation and update timestamps

#### PostGIS Integration

- All spatial data is stored as the PostGIS `GEOMETRY` type
- Automatic SRID handling (default: 4326, WGS 84)
- GiST spatial indexes for query performance
- Support for all PostGIS geometry types

### 2. Background Jobs System

The background jobs system processes long-running operations asynchronously, so they do not block web requests.
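The following is a minimal in-memory sketch, written in Python, of the queue and lifecycle described in the subsections below. It illustrates the asynchronous pattern: a request only enqueues a job and returns, and a worker later claims the job, runs it, and records the outcome. It is not the Aurora GIS implementation, which uses PHP workers and PostgreSQL. The `Job`, `JobQueue`, `run_one`, and `echo_dataset` names and the `"hotspot"` job type are hypothetical. The fields mirror a subset of the `background_jobs` columns, and a `threading.Lock` stands in for the row locking that `FOR UPDATE SKIP LOCKED` provides in the database.

```python
import threading
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List, Optional


def utc_now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class Job:
    # Hypothetical in-memory model of a subset of the background_jobs columns.
    id: int
    user_id: int
    job_type: str
    params: dict
    status: str = "queued"  # 'queued' | 'running' | 'completed' | 'failed'
    result: Optional[dict] = None
    error_message: Optional[str] = None
    created_at: datetime = field(default_factory=utc_now)
    started_at: Optional[datetime] = None
    finished_at: Optional[datetime] = None


class JobQueue:
    """Hypothetical stand-in for the background_jobs table."""

    def __init__(self) -> None:
        self._jobs: List[Job] = []
        # A single lock around enqueue/claim plays the role of row locking
        # (FOR UPDATE SKIP LOCKED) in the real database.
        self._lock = threading.Lock()

    def enqueue(self, user_id: int, job_type: str, params: dict) -> Job:
        """Enqueue: record a 'queued' job and return immediately."""
        with self._lock:
            job = Job(len(self._jobs) + 1, user_id, job_type, params)
            self._jobs.append(job)
            return job

    def claim_next(self, job_type: str) -> Optional[Job]:
        """Fetch: atomically take the first queued job of this type and mark
        it 'running', so no two workers ever process the same job."""
        with self._lock:
            for job in self._jobs:
                if job.status == "queued" and job.job_type == job_type:
                    job.status = "running"
                    job.started_at = utc_now()
                    return job
            return None


def run_one(queue: JobQueue, job_type: str,
            handler: Callable[[dict], dict]) -> Optional[Job]:
    """One worker iteration: claim a job, run it, record the outcome."""
    job = queue.claim_next(job_type)
    if job is None:
        return None  # nothing queued; a real worker would sleep and poll again
    try:
        job.result = handler(job.params)
        job.status = "completed"
    except Exception as exc:  # any handler error marks the job as failed
        job.status = "failed"
        job.error_message = str(exc)
    job.finished_at = utc_now()
    return job


def echo_dataset(params: dict) -> dict:
    # Hypothetical handler: fails with KeyError if "dataset_id" is missing.
    return {"dataset_id": params["dataset_id"]}


# Usage: the first job succeeds, the second fails because it lacks "dataset_id".
queue = JobQueue()
ok = queue.enqueue(1, "hotspot", {"dataset_id": 42})
bad = queue.enqueue(1, "hotspot", {})
run_one(queue, "hotspot", echo_dataset)
run_one(queue, "hotspot", echo_dataset)
print(ok.status, bad.status, bad.error_message)  # prints: completed failed 'dataset_id'
```

The property that matters is that claiming a job and marking it `running` happen as one atomic step. In PostgreSQL, running the `SELECT ... FOR UPDATE SKIP LOCKED` and the status update in the same transaction gives this guarantee across separate worker processes. Each worker skips rows that another transaction has locked instead of waiting on them.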
#### Job Queue

Jobs are stored in the `background_jobs` table:

```sql
CREATE TABLE background_jobs (
    id SERIAL PRIMARY KEY,
    user_id INTEGER,
    job_type TEXT,
    params JSONB,
    status TEXT, -- 'queued', 'running', 'completed', 'failed'
    result JSONB,
    error_message TEXT,
    progress INTEGER,
    created_at TIMESTAMP,
    started_at TIMESTAMP,
    finished_at TIMESTAMP
);
```

#### Job Lifecycle

1. **Enqueue**: Job is created with status 'queued'
2. **Fetch**: A worker claims the next queued job using `FOR UPDATE SKIP LOCKED`, so concurrent workers never pick up the same job
3. **Process**: The worker sets the status to 'running' and processes the job
4. **Complete**: On success, the worker sets the status to 'completed' and stores the results
5. **Error**: On failure, the status is set to 'failed' and the error message is recorded

#### Worker Architecture

Workers are long-running PHP CLI scripts that:

- Poll the database for queued jobs
- Process jobs of a specific type
- Handle errors gracefully, marking failed jobs as 'failed'
- Log progress and results
- Run continuously until stopped

See [Workers Documentation](workers/index.md) for details on each worker.

### 3. Analysis Tools

Aurora GIS provides a comprehensive suite of spatial analysis tools.

#### Vector Analysis Tools

- **Hot Spot Analysis**: Getis-Ord Gi* statistics for identifying clusters of high and low values (hot and cold spots)
- **Outlier Detection**: Outlier identification based on Z-scores and the median absolute deviation (MAD)
- **KDE (Kernel Density Estimation)**: Density surface generation
- **Clustering**: Spatial clustering algorithms
- **Proximity Analysis**: Buffer, nearest neighbor, and distance calculations
- **Overlay Operations**: Intersect, union, erase, and join

#### Raster Analysis Tools

- **Zonal Statistics**: Calculate statistics within polygon zones
- **Raster Histogram**: Analyze pixel value distributions
- **Raster Summary**: Generate summary statistics
- **Raster Profile**: Extract values along a line
- **Raster Conversion**: Convert between formats
- **Raster Comparison**: Compare two raster datasets

See [Analysis Tools Documentation](analysis-tools/index.md) for details.

### 4. API Layer

The API layer provides RESTful access to datasets and analysis tools.

#### API Structure

- **Basic API** (`/api/basic/index.php`): Dataset listing, details, and GeoJSON queries
- **Server API** (`/api/server/index.php`): Server information and capabilities
- **Images API** (`/api/images/index.php`): GeoServer proxy and catalog
- **Analysis APIs**: Endpoints for running analysis tools
- **Worker APIs**: Endpoints for job management

#### Authentication

- Session-based authentication for the web interface
- API key authentication (optional)
- Dataset-level access control
- Public dataset access (configurable)

See [API Documentation](api/index.md) for endpoint details.

### 5. PostGIS Data Flows

#### Import Flow

```
Uploaded File
↓
Format Detection
↓
Geometry Extraction
↓
PostGIS Processing
↓
spatial_data_{id} Table
↓
Spatial Index Creation
↓
Metadata Extraction
↓
spatial_files Record
```

#### Analysis Flow

```
User Request
↓
Job Enqueue
↓
Worker Fetch
↓
PostGIS Analysis
↓
Result Table/View
↓
Job Complete
↓
User Notification
```

#### Export Flow

```
Dataset Selection
↓
Query PostGIS Table
↓
Format Conversion
↓
GeoJSON/Shapefile/CSV
↓
Download
```

## Data Processing Pipeline

### File Upload Processing

1. **File Validation**: Check file type, size, and format
2. **Geometry Extraction**: Parse geometry from the source format
3. **SRID Detection**: Identify or assign the spatial reference system
4. **Table Creation**: Create the `spatial_data_{id}` table
5. **Data Import**: Insert features into the PostGIS table
6. **Index Creation**: Create spatial and attribute indexes
7. **Metadata Extraction**: Extract and store metadata
8. **Registration**: Create the `spatial_files` record

### Analysis Processing

1. **Parameter Validation**: Validate input parameters
2. **Job Creation**: Enqueue a background job
3. **Worker Processing**: A worker fetches and processes the job
4. **PostGIS Execution**: Run spatial analysis queries
5. **Result Storage**: Store results in a table or view
6. **Metadata Update**: Update job status and results
7. **User Notification**: Notify the user of completion

## Database Schema

### Core Tables

- **spatial_files**: Dataset metadata and file information
- **spatial_data_{id}**: Individual dataset tables (created dynamically)
- **background_jobs**: Job queue and status
- **user**: User accounts and authentication
- **access_group**: Access control groups
- **user_access**: User-group associations
- **dataset_permissions**: Dataset-level permissions

### Supporting Tables

- **ogc_connections**: External PostGIS connections
- **scheduled_imports**: Scheduled URL imports
- **map_views**: Saved map configurations
- **dashboards**: Dashboard definitions
- **presentations**: Presentation configurations
- **categories_keywords**: Dataset categorization

## Security Architecture

### Authentication

- Session-based authentication
- OAuth support (GitHub, Google, Microsoft)
- Password hashing (bcrypt)
- Session management

### Authorization

- Role-based access control (Admin, User, Publisher)
- Dataset-level permissions
- Access group management
- Public dataset access (optional)

### Data Security

- SQL injection prevention (prepared statements)
- XSS protection (output escaping)
- File upload validation
- Path traversal prevention
- Secure file storage

## Performance Optimizations

### Database Optimizations

- Spatial indexes (GiST) on geometry columns
- Attribute indexes on frequently queried fields
- Connection pooling (PgBouncer support)
- Query optimization and caching
- Materialized views for complex queries

### Application Optimizations

- Lazy loading of map components
- Pagination for large datasets
- Background job processing
- Caching of metadata and configurations
- Efficient JSONB storage

### Worker Optimizations

- Parallel job processing (multiple workers)
- Job prioritization
- Resource limits and timeouts
- Error handling and retry logic

## Scalability Considerations

### Horizontal Scaling

- Stateless application design
- Database connection pooling
- Worker scaling (multiple worker instances)
- Load balancing support

### Vertical Scaling

- Database query optimization
- Index optimization
- Memory management
- Worker resource allocation

## Integration Points

### External Services

- **GeoServer**: WMS/WFS services
- **QGIS Server**: QGIS project rendering
- **pg_tileserv**: Vector tile generation
- **OAuth Providers**: Authentication
- **S3**: Cloud storage for large files

### Data Sources

- **PostGIS Remote**: External PostGIS databases
- **URL Imports**: Web-accessible spatial data
- **File Uploads**: Local file uploads
- **Overture Maps**: Parquet file imports
- **S3 Buckets**: Cloud-based data sources

## Monitoring and Logging

### Application Logging

- Error logging to files
- Worker-specific logs
- Import operation logs
- API access logs

### Database Monitoring

- Query performance monitoring
- Connection pool monitoring
- Table size monitoring
- Index usage statistics

## Related Documentation

- [Installation Guide](installation.md)
- [Configuration Guide](configuration.md)
- [API Documentation](api/index.md)
- [Workers Documentation](workers/index.md)
- [Analysis Tools Documentation](analysis-tools/index.md)