geo_integration_readme
This document describes the geo MVP implementation that connects blob ingestion with spatial indexing and provides CPU-based exact geometry checks using Boost.Geometry.
The geo MVP consists of four main components:
- Geo Index Hooks (`src/api/geo_index_hooks.cpp`)
  - Integrates spatial index updates into the entity lifecycle (PUT/DELETE)
  - Parses geometry from entity blobs (GeoJSON or EWKB)
  - Computes sidecar metadata (MBR, centroid, z-range)
  - Updates the spatial index via `SpatialIndexManager`
- Boost.Geometry CPU Backend (`src/geo/boost_cpu_exact_backend.cpp`)
  - Provides exact geometry intersection checks
  - Uses the Boost.Geometry library for computational geometry
  - Supports Point, LineString, and Polygon types
  - Falls back to MBR checks for unsupported types
- Exact Geometry Check in `searchIntersects` (`src/index/spatial_index.cpp`)
  - Phase 1: MBR intersection (fast candidate filter)
  - Phase 2: load entity blobs and perform an exact geometry check
  - Filters out MBR false positives using Boost.Geometry
  - Falls back to MBR-only filtering if the exact backend is not available
- Per-PK Storage Optimization (`src/index/spatial_index.cpp`)
  - Stores a sidecar per primary key in addition to the bucket JSON
  - Allows updating or deleting individual entities without rewriting the entire Morton bucket
  - Backward compatible with existing bucket-based storage
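The sidecar metadata the hooks compute (MBR, centroid, z-range) can be sketched as a single linear pass over a geometry's coordinates. That pass is also why MBR computation is O(n).

This is a minimal illustration, not the code in `src/api/geo_index_hooks.cpp`. The `Coord` and `Sidecar` types and `computeSidecar` are hypothetical. The centroid here is the plain vertex average, which may differ from what the real hooks compute (for example, an area-weighted polygon centroid).

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical coordinate type: z is optional (2D vs. 3D geometries).
struct Coord {
    double x, y;
    std::optional<double> z;
};

// Hypothetical sidecar record: MBR, centroid, and optional z-range.
struct Sidecar {
    double minx, miny, maxx, maxy;       // minimum bounding rectangle
    double cx, cy;                       // centroid (vertex average in this sketch)
    std::optional<double> zmin, zmax;    // only set if any coordinate has z
};

// One O(n) pass over all coordinates of a geometry.
Sidecar computeSidecar(const std::vector<Coord>& coords) {
    if (coords.empty()) throw std::invalid_argument("geometry has no coordinates");
    constexpr double inf = std::numeric_limits<double>::infinity();
    Sidecar s{inf, inf, -inf, -inf, 0.0, 0.0, std::nullopt, std::nullopt};
    double sumx = 0.0, sumy = 0.0;
    for (const Coord& c : coords) {
        s.minx = std::min(s.minx, c.x);
        s.miny = std::min(s.miny, c.y);
        s.maxx = std::max(s.maxx, c.x);
        s.maxy = std::max(s.maxy, c.y);
        sumx += c.x;
        sumy += c.y;
        if (c.z) {
            s.zmin = s.zmin ? std::min(*s.zmin, *c.z) : *c.z;
            s.zmax = s.zmax ? std::max(*s.zmax, *c.z) : *c.z;
        }
    }
    s.cx = sumx / coords.size();
    s.cy = sumy / coords.size();
    return s;
}
```

For example, the rectangle `{{0, 0}, {4, 0}, {4, 2}, {0, 2}}` yields the MBR `[0, 0, 4, 2]`, the centroid `(2, 1)`, and no z-range.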
The geo hooks are integrated into the HTTP API entity handlers:

- `PUT /entities/:key`: after a successful entity write, calls `GeoIndexHooks::onEntityPut()`
- `DELETE /entities/:key`: before entity deletion, calls `GeoIndexHooks::onEntityDelete()`
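The hook order matters:

- The PUT hook runs after the write, so it sees the stored blob.
- The DELETE hook runs first, presumably so it can still read the old geometry and find the index entry to remove.

The sketch below shows this ordering with an in-memory map standing in for RocksDB. `EntityStore`, `GeoHooks`, `handlePut`, and `handleDelete` are hypothetical stand-ins, not the real `GeoIndexHooks` API.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical in-memory stand-in for the RocksDB-backed entity store.
using EntityStore = std::map<std::string, std::string>;

// Hypothetical stand-in for GeoIndexHooks: only records which keys are indexed.
struct GeoHooks {
    std::set<std::string> indexed;

    void onEntityPut(const EntityStore& store, const std::string& key) {
        // Runs after the write, so the blob is already readable.
        if (store.count(key)) indexed.insert(key);
    }
    void onEntityDelete(const EntityStore& store, const std::string& key) {
        // Runs before the delete, so the old blob (and its geometry)
        // can still be read to locate the index entry to remove.
        if (store.count(key)) indexed.erase(key);
    }
};

void handlePut(EntityStore& store, GeoHooks& hooks,
               const std::string& key, const std::string& blob) {
    store[key] = blob;              // 1. entity write
    hooks.onEntityPut(store, key);  // 2. spatial index update
}

void handleDelete(EntityStore& store, GeoHooks& hooks, const std::string& key) {
    hooks.onEntityDelete(store, key);  // 1. spatial index removal
    store.erase(key);                  // 2. entity deletion
}
```

If `handleDelete` called the hook after `store.erase`, the blob would already be gone and the index entry would be left behind.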
Entity blobs can contain geometry in several formats:

- GeoJSON (recommended):
  ```json
  {
    "id": "entity1",
    "geometry": {
      "type": "Point",
      "coordinates": [10.5, 50.5]
    }
  }
  ```
- Hex-encoded EWKB (here the same point, `POINT(10.5 50.5)`, little-endian):
  ```json
  {
    "id": "entity1",
    "geometry": "010100000000000000000025400000000000404940"
  }
  ```
- Binary EWKB array:
  ```json
  {
    "id": "entity1",
    "geom_blob": [1, 1, 0, 0, 0, ...]
  }
  ```

IMPORTANT: In the MVP implementation, spatial index updates are not atomic with entity writes.
- Entity write and spatial index update happen in separate operations
- Parse/index errors do not abort the entity write (logged only)
- Future versions should integrate into RocksDB transactions or use the saga pattern
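In the MVP, the entity write and the index update are two separate operations. A crash between them can therefore leave the entity and its index entry out of sync. Grouping both writes into one batch that is applied all-or-nothing closes that window, which is the goal of the RocksDB WriteBatch direction.

The sketch below models this with a hypothetical in-memory `Batch`/`KvStore` pair rather than the real RocksDB API. The key names (`entity:...`, `spatial:pk:...`) are illustrative only.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical write batch: collects puts that must land together.
struct Batch {
    std::vector<std::pair<std::string, std::string>> puts;
    void put(std::string k, std::string v) { puts.emplace_back(std::move(k), std::move(v)); }
};

// Hypothetical key-value store with an all-or-nothing apply.
struct KvStore {
    std::map<std::string, std::string> data;

    // failAfter simulates a crash after N writes (-1 = no crash).
    void apply(const Batch& b, int failAfter = -1) {
        std::map<std::string, std::string> staged = data;  // stage on a copy
        int n = 0;
        for (const auto& [k, v] : b.puts) {
            if (n++ == failAfter) throw std::runtime_error("simulated crash");
            staged[k] = v;
        }
        data.swap(staged);  // publish every write at once
    }
};
```

A crash mid-batch leaves the store untouched. The non-atomic MVP path would instead have persisted the entity without its index entry.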
The hooks are designed to be robust:
- Geometry parse errors → logged as warnings, entity write succeeds
- Spatial index failures → logged as warnings, entity write succeeds
- Missing geometry field → silently skipped (not an error)
- Invalid JSON → logged, entity write succeeds
This ensures that geo functionality is additive and doesn't break existing functionality.
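The sketch below connects the geometry formats to these error-handling rules. It decodes a hex-encoded 2D point and wraps the hook body so that parse failures become warnings instead of failed writes. For a 2D point without an SRID, EWKB is byte-identical to plain WKB. The little-endian hex for the point `[10.5, 50.5]` from the GeoJSON example is `010100000000000000000025400000000000404940`.

This is a deliberately narrow, hypothetical parser, not `EWKBParser`:

- It accepts little-endian Point geometries only.
- It has no SRID or Z flag handling.
- It assumes a little-endian host.

`tryIndexGeometry` and `HookResult` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <iostream>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

struct Point2D { double x, y; };

// Decode a hex string into raw bytes; throws on odd length or bad digits.
std::vector<uint8_t> hexToBytes(const std::string& hex) {
    if (hex.size() % 2 != 0) throw std::invalid_argument("odd-length hex");
    auto nibble = [](char c) -> uint8_t {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        throw std::invalid_argument("invalid hex digit");
    };
    std::vector<uint8_t> out;
    for (std::size_t i = 0; i < hex.size(); i += 2)
        out.push_back(static_cast<uint8_t>(nibble(hex[i]) << 4 | nibble(hex[i + 1])));
    return out;
}

// Minimal WKB point decoder: byte order 1 (little-endian), type 1 (Point).
// Layout: 1 byte order + 4 bytes type + 8 bytes x + 8 bytes y = 21 bytes.
// Assumes a little-endian host for the memcpy into uint32_t/double.
Point2D parseWkbPointHex(const std::string& hex) {
    std::vector<uint8_t> b = hexToBytes(hex);
    if (b.size() != 21) throw std::invalid_argument("not a 2D WKB point");
    if (b[0] != 1) throw std::invalid_argument("only little-endian supported");
    uint32_t type;
    std::memcpy(&type, &b[1], 4);
    if (type != 1) throw std::invalid_argument("not a Point");
    Point2D p;
    std::memcpy(&p.x, &b[5], 8);
    std::memcpy(&p.y, &b[13], 8);
    return p;
}

enum class HookResult { Indexed, Skipped, Warned };

// Hook body: never throws, so the entity write is never aborted.
HookResult tryIndexGeometry(const std::optional<std::string>& geometryHex,
                            std::vector<Point2D>& index) {
    if (!geometryHex) return HookResult::Skipped;  // no geometry: not an error
    try {
        index.push_back(parseWkbPointHex(*geometryHex));
        return HookResult::Indexed;
    } catch (const std::exception& e) {
        std::cerr << "WARN geo hook: " << e.what() << '\n';  // log and continue
        return HookResult::Warned;
    }
}
```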
The SpatialIndexManager::searchIntersects() method now performs a two-phase query:
Phase 1: MBR Filtering (Fast)
- Uses Morton-encoded spatial index to find candidates
- Checks if entity MBR intersects query MBR
- Reduces search space by ~95% for typical queries
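Morton (Z-order) encoding interleaves the bits of the quantized x and y cell coordinates. Nearby cells therefore tend to get nearby keys, and a query bbox maps to a small set of key ranges for the range scan.

The sketch uses the standard bit-spreading trick for 32-bit inputs. How THEMIS quantizes coordinates against `total_bounds`, and how many bits it uses, are assumptions here. `quantize` and its 16-bit grid are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Spread the 32 bits of v so that bit i moves to bit 2*i.
uint64_t spreadBits(uint32_t v) {
    uint64_t x = v;
    x = (x | (x << 16)) & 0x0000FFFF0000FFFFULL;
    x = (x | (x << 8))  & 0x00FF00FF00FF00FFULL;
    x = (x | (x << 4))  & 0x0F0F0F0F0F0F0F0FULL;
    x = (x | (x << 2))  & 0x3333333333333333ULL;
    x = (x | (x << 1))  & 0x5555555555555555ULL;
    return x;
}

// Morton code: x bits in even positions, y bits in odd positions. O(1).
uint64_t mortonEncode(uint32_t cx, uint32_t cy) {
    return spreadBits(cx) | (spreadBits(cy) << 1);
}

// Hypothetical quantization of a coordinate into a 16-bit grid cell
// within [lo, hi] (for example, total_bounds from the index config).
uint32_t quantize(double v, double lo, double hi) {
    double t = std::clamp((v - lo) / (hi - lo), 0.0, 1.0);
    return static_cast<uint32_t>(t * 65535.0);
}
```

For example, `mortonEncode(3, 5)` interleaves `x = 0b011` and `y = 0b101` into `0b100111`, which is 39.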
Phase 2: Exact Geometry Check (Accurate)
- Loads entity blob from RocksDB
- Parses geometry using EWKBParser
- Creates query geometry from bbox
- Uses Boost.Geometry to perform an exact `intersects()` check
- Filters out false positives from MBR-only filtering
```
User Query (bbox)
    ↓
Morton Range Calculation
    ↓
RocksDB Range Scan (get candidates by MBR)
    ↓
FOR EACH candidate:
  ├─ MBR.intersects(query_bbox)? → NO: skip
  ├─ Load entity blob from RocksDB
  ├─ Parse geometry (GeoJSON → GeometryInfo)
  ├─ Boost.Geometry exactIntersects(entity_geom, query_geom)? → NO: skip
  └─ YES: add to results
    ↓
Return filtered results (exact matches only)
```
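The two-phase loop can be sketched in a self-contained way. The sketch substitutes two things for the real implementation:

- Instead of Boost.Geometry, the exact check is a hand-written Liang-Barsky segment-versus-box clip for LineStrings.
- The exact backend is a nullable function pointer, so a null backend degrades to MBR-only results, mirroring the `getBoostCpuBackend()` fallback.

`Box`, `Candidate`, `ExactFn`, and this `searchIntersects` are hypothetical and simplified: there is no RocksDB scan and no blob loading.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Pt { double x, y; };
struct Box { double minx, miny, maxx, maxy; };

struct Candidate {
    std::string pk;
    Box mbr;               // from the sidecar (phase 1)
    std::vector<Pt> line;  // parsed LineString geometry (phase 2)
};

bool mbrIntersects(const Box& a, const Box& b) {
    return a.minx <= b.maxx && b.minx <= a.maxx &&
           a.miny <= b.maxy && b.miny <= a.maxy;
}

// Liang-Barsky: does the segment p->q touch the box?
bool segmentIntersectsBox(Pt p, Pt q, const Box& b) {
    double t0 = 0.0, t1 = 1.0;
    double dx = q.x - p.x, dy = q.y - p.y;
    double pv[4] = {-dx, dx, -dy, dy};
    double qv[4] = {p.x - b.minx, b.maxx - p.x, p.y - b.miny, b.maxy - p.y};
    for (int i = 0; i < 4; ++i) {
        if (pv[i] == 0.0) {
            if (qv[i] < 0.0) return false;  // parallel to this edge and outside
        } else {
            double t = qv[i] / pv[i];
            if (pv[i] < 0.0) t0 = std::max(t0, t);  // entering
            else             t1 = std::min(t1, t);  // leaving
            if (t0 > t1) return false;
        }
    }
    return true;
}

// Exact check: any segment of the LineString touches the query box.
bool exactLineIntersects(const Candidate& c, const Box& q) {
    if (c.line.size() == 1) return segmentIntersectsBox(c.line[0], c.line[0], q);
    for (std::size_t i = 0; i + 1 < c.line.size(); ++i)
        if (segmentIntersectsBox(c.line[i], c.line[i + 1], q)) return true;
    return false;
}

using ExactFn = bool (*)(const Candidate&, const Box&);

// Phase 1: MBR filter. Phase 2: exact check if a backend is available.
std::vector<std::string> searchIntersects(const std::vector<Candidate>& cands,
                                          const Box& query, ExactFn exact) {
    std::vector<std::string> out;
    for (const Candidate& c : cands) {
        if (!mbrIntersects(c.mbr, query)) continue;  // MBR miss: skip
        if (exact && !exact(c, query)) continue;     // MBR false positive
        out.push_back(c.pk);
    }
    return out;
}
```

A diagonal line from `(3, 0)` to `(0, 3)` has an MBR that overlaps the box `[0, 0, 1, 1]`, but the line never reaches it. It is returned with `exact == nullptr` and filtered out when `exactLineIntersects` is supplied.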
- Without exact backend: Returns MBR candidates (may include false positives)
- With exact backend: Returns only true geometric intersections
- Overhead: ~1-5ms per candidate for exact check (depends on geometry complexity)
- Typical case: 10-100 candidates → 10-500ms additional latency for exact checks
- Benefit: Eliminates false positives, especially important for complex polygons
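The latency range above follows from multiplying the number of phase-1 candidates by the per-candidate cost of the exact check:

$$
\Delta t_{\text{exact}} = N_{\text{cand}} \cdot t_{\text{check}}, \qquad
10 \times 1\,\text{ms} = 10\,\text{ms}, \qquad
100 \times 5\,\text{ms} = 500\,\text{ms}
$$

Because phase 1 determines $N_{\text{cand}}$, a more selective MBR filter directly reduces the exact-check cost.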
To enable the Boost.Geometry exact backend, ensure Boost is available:

```bash
# vcpkg.json already includes boost dependencies
# The backend is conditionally compiled with the THEMIS_GEO_BOOST_BACKEND flag
```

Build with geo support:

```bash
cmake -DTHEMIS_GEO=ON -DTHEMIS_GEO_BOOST_BACKEND=ON ..
```

If Boost.Geometry is not available:

- The build still succeeds
- `getBoostCpuBackend()` returns `nullptr`
- Queries fall back to MBR-only filtering (no exact checks)
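One common way to get this build-time fallback is a preprocessor guard around the backend factory. The sketch rests on a few assumptions:

- The `THEMIS_GEO_BOOST_BACKEND` CMake option is forwarded as a compile definition of the same name.
- The `IExactBackend` interface and the `BoostCpuExactBackend` and `exactChecksAvailable` names are hypothetical.

Only `getBoostCpuBackend()` and its `nullptr` contract come from this document.

```cpp
#include <cassert>

// Hypothetical interface for an exact-geometry backend.
struct IExactBackend {
    virtual ~IExactBackend() = default;
    virtual const char* name() const = 0;
};

#ifdef THEMIS_GEO_BOOST_BACKEND
// Compiled only when Boost.Geometry support is enabled.
struct BoostCpuExactBackend : IExactBackend {
    const char* name() const override { return "boost_cpu_exact"; }
};

IExactBackend* getBoostCpuBackend() {
    static BoostCpuExactBackend instance;  // one shared instance
    return &instance;
}
#else
// Without Boost: no exact backend, so callers use MBR-only filtering.
IExactBackend* getBoostCpuBackend() { return nullptr; }
#endif

// Caller side: decide whether phase 2 (exact checks) can run.
bool exactChecksAvailable() { return getBoostCpuBackend() != nullptr; }
```

Compiled without `-DTHEMIS_GEO_BOOST_BACKEND`, the `#else` branch is used and exact checks are reported as unavailable.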
Create a spatial index for the `places` table:

```bash
curl -X POST http://localhost:8080/api/spatial/index \
  -H "Content-Type: application/json" \
  -d '{
    "table": "places",
    "geometry_column": "geometry",
    "config": {
      "total_bounds": {"minx": -180, "miny": -90, "maxx": 180, "maxy": 90}
    }
  }'
```

Insert an entity with geometry:

```bash
curl -X PUT http://localhost:8080/api/entities/places:berlin \
  -H "Content-Type: application/json" \
  -d '{
    "key": "places:berlin",
    "blob": "{\"id\":\"berlin\",\"name\":\"Berlin\",\"geometry\":{\"type\":\"Point\",\"coordinates\":[13.4,52.5]}}"
  }'
```

The spatial index is updated automatically.

Query by bounding box:

```bash
curl -X POST http://localhost:8080/api/spatial/search \
  -H "Content-Type: application/json" \
  -d '{
    "table": "places",
    "bbox": {"minx": 13.0, "miny": 52.0, "maxx": 14.0, "maxy": 53.0}
  }'
```

This returns entities whose MBR intersects the query bbox. With the Boost backend enabled, exact geometry checks are also performed.
Run the integration tests:

```bash
cd build
ctest -R test_geo_index_integration -V
```

Tests verify:

- Entity PUT triggers spatial index insert
- searchIntersects returns correct results
- Entity DELETE removes from index
- Error handling (missing geometry, invalid JSON)
- Null spatial manager handling
Future work:

- Transactional Integration
  - Integrate hooks into RocksDB WriteBatch
  - Or use the saga pattern for multi-step transactions
  - Ensure atomicity between entity write and index update
- Exact Geometry in Query Engine
  - Wire the Boost backend into `SpatialIndexManager::searchIntersects()`
  - Load entity blobs, parse geometries, call exact checks
  - Filter out MBR false positives
- Additional Backends
  - SIMD-optimized CPU kernels for batch operations
  - GPU compute shaders for large-scale queries
  - GEOS prepared geometries plugin
- Storage Optimization
  - Migrate fully to per-PK keys
  - Remove bucket JSON format (breaking change)
  - Compact binary sidecar format (not JSON)
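The compact binary sidecar format is not specified further in this document. As a hypothetical illustration, a 2D sidecar (MBR plus centroid) fits in a fixed 48 bytes as six doubles, and reading it back needs no JSON parsing.

The field order and byte layout below are assumptions for illustration only, and the sketch assumes a little-endian host.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical fixed layout: minx, miny, maxx, maxy, cx, cy (6 x 8 bytes).
struct BinarySidecar {
    double minx, miny, maxx, maxy, cx, cy;
};

constexpr std::size_t kSidecarBytes = 6 * sizeof(double);

// Serialize via a plain double array (avoids relying on struct layout).
std::array<uint8_t, kSidecarBytes> encodeSidecar(const BinarySidecar& s) {
    const double fields[6] = {s.minx, s.miny, s.maxx, s.maxy, s.cx, s.cy};
    std::array<uint8_t, kSidecarBytes> out{};
    std::memcpy(out.data(), fields, kSidecarBytes);  // host byte order
    return out;
}

BinarySidecar decodeSidecar(const std::array<uint8_t, kSidecarBytes>& in) {
    double fields[6];
    std::memcpy(fields, in.data(), kSidecarBytes);
    return {fields[0], fields[1], fields[2], fields[3], fields[4], fields[5]};
}
```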
Security considerations:

- Geometry parsing uses exception handling to prevent crashes
- No user input is directly executed (only parsed as JSON/EWKB)
- Spatial index updates are logged for audit trails
- No SQL injection risk (key-value storage only)
Performance characteristics:

- MBR computation: O(n), where n = number of coordinates
- Morton encoding: O(1)
- Bucket read/write: O(k) where k = entities per bucket
- Per-PK write: O(1) additional overhead per insert/delete
- Exact checks: Depends on geometry complexity (typically fast for simple polygons)
References:

- Geo Execution Plan: `docs/geo_execution_plan_over_blob.md`
- Feature Tiering: `docs/geo_feature_tiering.md`
- EWKB Spec: PostGIS Extended Well-Known Binary format
- Boost.Geometry: https://www.boost.org/doc/libs/release/libs/geometry/
Updated: 2025-11-30