# Architecture Overview
This document describes the Aurora GIS architecture: its system components, data flows, and design patterns.
## System Architecture
Aurora GIS follows a modular architecture with clear separation between:
- **Frontend**: PHP-based web interface with JavaScript for interactivity
- **Backend**: PHP application layer with PostgreSQL/PostGIS database
- **Workers**: Background job processing system
- **API**: RESTful API layer for programmatic access
- **Analysis Engine**: Spatial analysis tools and algorithms
## Core Components
### 1. Dataset Engine
The dataset engine is the core component responsible for managing spatial datasets.
#### Data Storage Model
Each dataset is stored in its own table following the naming convention `spatial_data_{dataset_id}`:
```sql
CREATE TABLE spatial_data_{id} (
    id SERIAL PRIMARY KEY,
    feature_id TEXT,
    geometry_type TEXT,
    properties JSONB,
    geometry JSONB,
    geom GEOMETRY,
    created_at TIMESTAMP DEFAULT NOW()
);
```
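**Benefits:**
- Per-dataset query cost does not grow with the total number of datasets
- Easier data management and cleanup: removing a dataset is a single `DROP TABLE`
- Queries on one dataset read only that dataset's table and indexes
- Smaller tables and indexes than a single shared feature table would have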
#### Dataset Metadata
Dataset metadata is stored in the `spatial_files` table:
- File information (name, path, type, size)
- User-provided description
- Extracted metadata (JSONB)
- Access permissions
- Creation and update timestamps
#### PostGIS Integration
- All spatial data stored as PostGIS `GEOMETRY` type
- Automatic SRID handling (default: 4326)
- Spatial indexes using GiST for performance
- Support for all PostGIS geometry types
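
As a rough sketch of what the points above look like in SQL, assuming a hypothetical dataset with id 42 (so the table is `spatial_data_42`). The index name and the bounding-box coordinates are illustrative and not taken from the codebase:

```sql
-- Assumption: dataset id 42; the index name is illustrative.
CREATE INDEX IF NOT EXISTS spatial_data_42_geom_gist
    ON spatial_data_42 USING GIST (geom);

-- Tag geometries that were loaded without an SRID with the default, 4326.
UPDATE spatial_data_42
SET geom = ST_SetSRID(geom, 4326)
WHERE geom IS NOT NULL
  AND ST_SRID(geom) = 0;

-- Spatial filter; ST_Intersects can use the GiST index for its
-- bounding-box pre-filter before running the exact test.
SELECT id, feature_id, geometry_type
FROM spatial_data_42
WHERE ST_Intersects(
    geom,
    ST_MakeEnvelope(-74.05, 40.68, -73.90, 40.82, 4326)  -- xmin, ymin, xmax, ymax, srid
);
```

`ST_SetSRID` only labels the coordinates; data in another reference system would need `ST_Transform` instead.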
### 2. Background Jobs System
The background jobs system enables asynchronous processing of long-running operations.
#### Job Queue
Jobs are stored in the `background_jobs` table:
```sql
CREATE TABLE background_jobs (
    id SERIAL PRIMARY KEY,
    user_id INTEGER,
    job_type TEXT,
    params JSONB,
    status TEXT, -- 'queued', 'running', 'completed', 'failed'
    result JSONB,
    error_message TEXT,
    progress INTEGER,
    created_at TIMESTAMP,
    started_at TIMESTAMP,
    finished_at TIMESTAMP
);
```
#### Job Lifecycle
1. **Enqueue**: Job created with status 'queued'
2. **Fetch**: Worker fetches next job using `FOR UPDATE SKIP LOCKED`
3. **Process**: Worker updates status to 'running' and processes job
4. **Complete**: Worker updates status to 'completed' with results
5. **Error**: On failure, status set to 'failed' with error message
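
The lifecycle above can be sketched in SQL against the `background_jobs` table. This is a minimal illustration, not the workers' actual queries: the job type `hotspot_analysis`, the parameter and result payloads, the job id 123, and the use of `progress` as a 0-100 percentage are all assumptions.

```sql
-- 1. Enqueue (job_type and params are hypothetical values).
INSERT INTO background_jobs (user_id, job_type, params, status, progress, created_at)
VALUES (1, 'hotspot_analysis', '{"dataset_id": 42}'::jsonb, 'queued', 0, NOW())
RETURNING id;

-- 2 and 3. Fetch the oldest queued job of one type and mark it running
-- in a single statement. SKIP LOCKED makes concurrent workers pass over
-- rows that another worker has already locked instead of waiting on them.
WITH next_job AS (
    SELECT id
    FROM background_jobs
    WHERE status = 'queued'
      AND job_type = 'hotspot_analysis'
    ORDER BY created_at, id
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
UPDATE background_jobs AS j
SET status = 'running',
    started_at = NOW()
FROM next_job
WHERE j.id = next_job.id
RETURNING j.id, j.params;

-- 4. Complete (123 stands in for the id returned by the fetch).
UPDATE background_jobs
SET status = 'completed',
    progress = 100,
    result = '{"result_table": "analysis_result_123"}'::jsonb,
    finished_at = NOW()
WHERE id = 123;

-- 5. Error: record the failure instead of a result.
UPDATE background_jobs
SET status = 'failed',
    error_message = 'Dataset not found',
    finished_at = NOW()
WHERE id = 123;
```

If the fetch returns no row, the queue holds no unclaimed job of that type and the worker can sleep before polling again.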
#### Worker Architecture
Workers are long-running PHP CLI scripts that:
- Poll the database for queued jobs
- Process jobs of a specific type
- Handle errors gracefully
- Log progress and results
- Run continuously until stopped
See [Workers Documentation](workers/index.md) for details on each worker.
### 3. Analysis Tools
Aurora GIS provides spatial analysis tools for both vector and raster data.
#### Vector Analysis Tools
- **Hot Spot Analysis**: Getis-Ord Gi* statistics for identifying clusters
- **Outlier Detection**: Z-score and MAD-based outlier identification
- **KDE (Kernel Density Estimation)**: Density surface generation
- **Clustering**: Spatial clustering algorithms
- **Proximity Analysis**: Buffer, nearest neighbor, distance calculations
- **Overlay Operations**: Intersect, union, erase, join
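
To make the outlier detection entry concrete, here is a hedged sketch of Z-score and MAD-based scoring in plain SQL. It is not the tool's implementation. It assumes a hypothetical dataset table `spatial_data_42` whose `properties` carry a numeric `population` key, and it uses the conventional cut-offs of 3 for the Z-score and 3.5 for the modified Z-score:

```sql
WITH vals AS (
    -- Assumption: "population" is a numeric property; a non-numeric
    -- value would make the cast fail.
    SELECT id, (properties->>'population')::double precision AS v
    FROM spatial_data_42
    WHERE properties->>'population' IS NOT NULL
),
stats AS (
    SELECT AVG(v) AS mean,
           STDDEV_SAMP(v) AS sd,
           PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY v) AS median
    FROM vals
),
dev AS (
    -- MAD: the median of absolute deviations from the median.
    SELECT PERCENTILE_CONT(0.5)
               WITHIN GROUP (ORDER BY ABS(vals.v - stats.median)) AS mad
    FROM vals CROSS JOIN stats
),
scored AS (
    SELECT vals.id,
           vals.v,
           (vals.v - stats.mean) / NULLIF(stats.sd, 0) AS z_score,
           0.6745 * (vals.v - stats.median) / NULLIF(dev.mad, 0) AS modified_z
    FROM vals CROSS JOIN stats CROSS JOIN dev
)
SELECT id, v, z_score, modified_z
FROM scored
WHERE ABS(z_score) > 3
   OR ABS(modified_z) > 3.5
ORDER BY ABS(modified_z) DESC NULLS LAST;
```

The MAD-based score depends on medians rather than the mean and standard deviation, so a few extreme values distort it less than they distort the plain Z-score.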
#### Raster Analysis Tools
- **Zonal Statistics**: Calculate statistics within polygon zones
- **Raster Histogram**: Analyze pixel value distributions
- **Raster Summary**: Generate summary statistics
- **Raster Profile**: Extract values along a line
- **Raster Conversion**: Convert between formats
- **Raster Comparison**: Compare two raster datasets
See [Analysis Tools Documentation](analysis-tools/index.md) for details.
### 4. API Layer
The API layer provides RESTful access to datasets and analysis tools.
#### API Structure
- **Basic API** (`/api/basic/index.php`): Dataset listing, details, GeoJSON queries
- **Server API** (`/api/server/index.php`): Server information and capabilities
- **Images API** (`/api/images/index.php`): GeoServer proxy and catalog
- **Analysis APIs**: Endpoints for running analysis tools
- **Worker APIs**: Endpoints for job management
#### Authentication
- Session-based authentication for web interface
- API key authentication (optional)
- Dataset-level access control
- Public dataset access (configurable)
See [API Documentation](api/index.md) for endpoint details.
### 5. PostGIS Data Flows
#### Import Flow
```
Uploaded File
    ↓
Format Detection
    ↓
Geometry Extraction
    ↓
PostGIS Processing
    ↓
spatial_data_{id} Table
    ↓
Spatial Index Creation
    ↓
Metadata Extraction
    ↓
spatial_files Record
```
#### Analysis Flow
```
User Request
    ↓
Job Enqueue
    ↓
Worker Fetch
    ↓
PostGIS Analysis
    ↓
Result Table/View
    ↓
Job Complete
    ↓
User Notification
```
#### Export Flow
```
Dataset Selection
    ↓
Query PostGIS Table
    ↓
Format Conversion
    ↓
GeoJSON/Shapefile/CSV
    ↓
Download
```
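
For the GeoJSON branch of the export flow, the query and conversion steps can in principle be done in one statement. The sketch below assumes a hypothetical dataset table `spatial_data_42`. Whether Aurora GIS converts in SQL or in PHP is not stated here, so treat this as one possible approach:

```sql
-- Build a GeoJSON FeatureCollection from one dataset table.
SELECT jsonb_build_object(
    'type', 'FeatureCollection',
    'features', COALESCE(
        jsonb_agg(
            jsonb_build_object(
                'type', 'Feature',
                'id', feature_id,
                'geometry', ST_AsGeoJSON(geom)::jsonb,
                'properties', COALESCE(properties, '{}'::jsonb)
            )
        ),
        '[]'::jsonb  -- an empty table still yields a valid collection
    )
) AS feature_collection
FROM spatial_data_42
WHERE geom IS NOT NULL;
```

For large datasets, aggregating everything into one JSON value can use a lot of memory, so a paginated or streamed export may be a better fit.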
## Data Processing Pipeline
### File Upload Processing
1. **File Validation**: Check file type, size, and format
2. **Geometry Extraction**: Parse geometry from source format
3. **SRID Detection**: Identify or assign spatial reference system
4. **Table Creation**: Create `spatial_data_{id}` table
5. **Data Import**: Insert features into PostGIS table
6. **Index Creation**: Create spatial and attribute indexes
7. **Metadata Extraction**: Extract and store metadata
8. **Registration**: Create `spatial_files` record
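
Steps 4 to 6 above might look roughly like the following for a hypothetical dataset id 42 and a single point feature. The feature values and index names are illustrative, and the sketch assumes the `geometry` column holds the feature's GeoJSON geometry:

```sql
BEGIN;

-- Step 4: table creation, following the documented storage model.
CREATE TABLE spatial_data_42 (
    id SERIAL PRIMARY KEY,
    feature_id TEXT,
    geometry_type TEXT,
    properties JSONB,
    geometry JSONB,
    geom GEOMETRY,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Step 5: insert one parsed feature (values are made up for illustration).
INSERT INTO spatial_data_42 (feature_id, geometry_type, properties, geometry, geom)
VALUES (
    'f-1',
    'Point',
    '{"name": "Sample"}'::jsonb,
    '{"type": "Point", "coordinates": [-73.98, 40.75]}'::jsonb,
    ST_SetSRID(
        ST_GeomFromGeoJSON('{"type": "Point", "coordinates": [-73.98, 40.75]}'),
        4326
    )
);

-- Step 6: spatial and attribute indexes.
CREATE INDEX spatial_data_42_geom_gist ON spatial_data_42 USING GIST (geom);
CREATE INDEX spatial_data_42_feature_id_idx ON spatial_data_42 (feature_id);

COMMIT;

-- Refresh planner statistics after the bulk load.
ANALYZE spatial_data_42;
```

Running table creation and import in one transaction means a failed import leaves no half-populated table behind.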
### Analysis Processing
1. **Parameter Validation**: Validate input parameters
2. **Job Creation**: Enqueue background job
3. **Worker Processing**: Worker fetches and processes job
4. **PostGIS Execution**: Run spatial analysis queries
5. **Result Storage**: Store results in table/view
6. **Metadata Update**: Update job status and results
7. **User Notification**: Notify user of completion
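
Steps 4 to 6 can be illustrated with a buffer analysis. This is a sketch under assumptions: job id 123, source dataset 42, a 500 m distance, geometries stored in SRID 4326, and a result table name and `result` payload invented for the example.

```sql
-- Step 4 and 5: run the analysis and store the result in its own table.
-- The geography cast makes the buffer distance metres rather than degrees.
CREATE TABLE analysis_result_123 AS
SELECT id AS source_id,
       ST_Buffer(geom::geography, 500)::geometry AS geom
FROM spatial_data_42
WHERE geom IS NOT NULL;

CREATE INDEX analysis_result_123_geom_gist
    ON analysis_result_123 USING GIST (geom);

-- Step 6: record where the result lives and mark the job finished.
UPDATE background_jobs
SET status = 'completed',
    progress = 100,
    result = jsonb_build_object(
        'result_table', 'analysis_result_123',
        'feature_count', (SELECT COUNT(*) FROM analysis_result_123)
    ),
    finished_at = NOW()
WHERE id = 123;
```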
## Database Schema
### Core Tables
- **spatial_files**: Dataset metadata and file information
- **spatial_data_{id}**: Individual dataset tables (dynamic)
- **background_jobs**: Job queue and status
- **user**: User accounts and authentication
- **access_group**: Access control groups
- **user_access**: User-group associations
- **dataset_permissions**: Dataset-level permissions
### Supporting Tables
- **ogc_connections**: External PostGIS connections
- **scheduled_imports**: Scheduled URL imports
- **map_views**: Saved map configurations
- **dashboards**: Dashboard definitions
- **presentations**: Presentation configurations
- **categories_keywords**: Dataset categorization
## Security Architecture
### Authentication
- Session-based authentication
- OAuth support (GitHub, Google, Microsoft)
- Password hashing (bcrypt)
- Session management
### Authorization
- Role-based access control (Admin, User, Publisher)
- Dataset-level permissions
- Access group management
- Public dataset access (optional)
### Data Security
- SQL injection prevention (prepared statements)
- XSS protection (output escaping)
- File upload validation
- Path traversal prevention
- Secure file storage
## Performance Optimizations
### Database Optimizations
- Spatial indexes (GiST) on geometry columns
- Attribute indexes on frequently queried fields
- Connection pooling (PgBouncer support)
- Query optimization and caching
- Materialized views for complex queries
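
A few of these optimizations, sketched for a hypothetical dataset table `spatial_data_42`. The property key `category` and all object names are assumptions made for the example:

```sql
-- Attribute index on a frequently filtered JSONB key (expression index).
CREATE INDEX spatial_data_42_category_idx
    ON spatial_data_42 ((properties->>'category'));

-- Materialized view that precomputes a per-geometry-type summary.
CREATE MATERIALIZED VIEW spatial_data_42_summary AS
SELECT geometry_type,
       COUNT(*) AS feature_count,
       ST_Extent(geom) AS extent  -- bounding box of all features of this type
FROM spatial_data_42
GROUP BY geometry_type;

-- Re-run after the underlying dataset changes; the view does not update itself.
REFRESH MATERIALIZED VIEW spatial_data_42_summary;
```

The expression index is only used when a query filters on exactly the same expression, for example `WHERE properties->>'category' = 'park'`.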
### Application Optimizations
- Lazy loading of map components
- Pagination for large datasets
- Background job processing
- Caching of metadata and configurations
- Efficient JSONB storage
### Worker Optimizations
- Parallel job processing (multiple workers)
- Job prioritization
- Resource limits and timeouts
- Error handling and retry logic
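
As one hedged illustration of a timeout, a periodic statement could mark jobs that have been running too long as failed. The 30-minute limit and the error message are assumptions, and the `background_jobs` schema shown earlier has no retry counter or priority column, so retries and prioritization are not sketched here:

```sql
-- Mark jobs stuck in 'running' for more than 30 minutes as failed.
UPDATE background_jobs
SET status = 'failed',
    error_message = 'Timed out after 30 minutes',
    finished_at = NOW()
WHERE status = 'running'
  AND started_at < NOW() - INTERVAL '30 minutes'
RETURNING id, job_type;
```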
## Scalability Considerations
### Horizontal Scaling
- Stateless application design
- Database connection pooling
- Worker scaling (multiple worker instances)
- Load balancing support
### Vertical Scaling
- Database query optimization
- Index optimization
- Memory management
- Worker resource allocation
## Integration Points
### External Services
- **GeoServer**: WMS/WFS services
- **QGIS Server**: QGIS project rendering
- **pg_tileserv**: Vector tile generation
- **OAuth Providers**: Authentication
- **S3**: Cloud storage for large files
### Data Sources
- **PostGIS Remote**: External PostGIS databases
- **URL Imports**: Web-accessible spatial data
- **File Uploads**: Local file uploads
- **Overture Maps**: Parquet file imports
- **S3 Buckets**: Cloud-based data sources
## Monitoring and Logging
### Application Logging
- Error logging to files
- Worker-specific logs
- Import operation logs
- API access logs
### Database Monitoring
- Query performance monitoring
- Connection pool monitoring
- Table size monitoring
- Index usage statistics
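
The standard PostgreSQL statistics views cover most of these checks. The queries below are a sketch of how they could be run by hand. They assume only the `spatial_data_{id}` naming convention and say nothing about the monitoring tooling Aurora GIS actually ships with:

```sql
-- Table size: the 20 largest dataset tables, including indexes and TOAST.
SELECT relname,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
       n_live_tup AS approx_rows
FROM pg_stat_user_tables
WHERE relname LIKE 'spatial\_data\_%'
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 20;

-- Index usage: indexes with few scans are candidates for review.
SELECT relname,
       indexrelname,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname LIKE 'spatial\_data\_%'
ORDER BY idx_scan ASC;

-- Connections: sessions on the current database, grouped by state.
SELECT state, COUNT(*) AS connections
FROM pg_stat_activity
WHERE datname = current_database()
GROUP BY state;
```

The backslashes in the `LIKE` patterns escape the underscores, which would otherwise match any single character.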
## Related Documentation
- [Installation Guide](installation.md)
- [Configuration Guide](configuration.md)
- [API Documentation](api/index.md)
- [Workers Documentation](workers/index.md)
- [Analysis Tools Documentation](analysis-tools/index.md)