makr-code edited this page Nov 30, 2025 · 2 revisions

ThemisDB - Complete Feature List

Version: 1.0
As of: November 2025
Status legend: ✅ Production-Ready | 🔧 Beta | 📋 Planned


🎯 Overview

ThemisDB is a multi-model database with ACID guarantees that unifies relational, graph, vector, and document data models in a single system. It is built on RocksDB (LSM-tree) and adds an extended security and compliance architecture.

Core features:

  • 🔒 ACID-Transaktionen mit MVCC (Snapshot Isolation)
  • 🔍 Multi-Model Support (Relational, Graph, Vector, Document)
  • 🚀 High-Performance (45K writes/s, 120K reads/s)
  • 🛡️ Enterprise Security (TLS 1.3, RBAC, encryption, audit)
  • 📊 Advanced Query Language (AQL with graph traversals and aggregations)
  • 🌐 Production-Ready (85%+ Test Coverage, Comprehensive Monitoring)

📦 Storage & Data Model

Canonical Storage Layer ✅

Status: Production-Ready | Docs: docs/architecture/base_entity.md

  • Base Entity - Unified JSON/binary blob storage for all data models
  • RocksDB TransactionDB - LSM-tree with ACID guarantees
  • VelocyPack/Bincode - High-performance serialization
  • Multi-Format Support - JSON, binary, custom formats
  • Fast Field Extraction - Optimized parsing pipeline

Key Features:

  • Atomic updates across all index layers
  • Write-optimized (append-only LSM-tree)
  • Configurable compression (LZ4, ZSTD, Snappy)
  • BlobDB support for large objects

Multi-Model Mapping ✅

Status: Production-Ready

| Model | Logical Entity | Physical Storage | Key Format |
|---|---|---|---|
| Relational | Row | (PK, Blob) | table:pk |
| Document | JSON Document | (PK, Blob) | collection:pk |
| Graph (Nodes) | Vertex | (PK, Blob) | node:pk |
| Graph (Edges) | Edge | (PK, Blob) | edge:pk |
| Vector | Embedding Object | (PK, Blob) | object:pk |
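
The key formats in the mapping can be read as a simple prefix scheme. The following is a minimal sketch, assuming that the first key segment is the table or collection name for relational and document data and a fixed literal (node, edge, object) for the other models. `make_key` and `split_key` are hypothetical helpers for illustration, not ThemisDB functions.

```python
from typing import Optional, Tuple

# Assumed fixed prefixes for models whose key format starts with a literal.
FIXED_PREFIXES = {"graph_node": "node", "graph_edge": "edge", "vector": "object"}


def make_key(model: str, pk: str, namespace: Optional[str] = None) -> str:
    """Build a "<prefix>:<pk>" storage key for one entity (illustrative only)."""
    if model in ("relational", "document"):
        if not namespace:
            raise ValueError("relational/document keys need a table or collection name")
        prefix = namespace
    elif model in FIXED_PREFIXES:
        prefix = FIXED_PREFIXES[model]
    else:
        raise ValueError("unknown model: " + model)
    return prefix + ":" + pk


def split_key(key: str) -> Tuple[str, str]:
    """Split at the first ':' so that a primary key may itself contain ':'."""
    prefix, _, pk = key.partition(":")
    return prefix, pk
```

Because every model resolves to the same (PK, Blob) pair, a prefix scan over such keys would be enough to enumerate one table, collection, or entity class.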

External Blob Storage ✅

Status: Production-Ready | Docs: docs/storage/CLOUD_BLOB_BACKENDS.md

  • Filesystem Backend - Hierarchical local storage
  • WebDAV/ActiveDirectory - SharePoint & enterprise integration
  • S3 Compatible - Interface ready (AWS, MinIO, etc.)
  • Azure Blob - Interface ready
  • Threshold-Based Selection - Automatic backend choice
  • SHA256 Content Hashing - Deduplication & integrity
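
Threshold-based selection and content hashing can be illustrated with a small in-memory sketch. The 1 MiB cutoff, the backend labels "inline" and "external", and the `DedupStore` class are assumptions made for this example; the actual thresholds and backend names are not stated here.

```python
import hashlib

INLINE_THRESHOLD_BYTES = 1024 * 1024  # assumed cutoff, not a documented default


def content_hash(data: bytes) -> str:
    """SHA-256 hex digest used as the content address."""
    return hashlib.sha256(data).hexdigest()


def choose_backend(size: int, threshold: int = INLINE_THRESHOLD_BYTES) -> str:
    """Small payloads stay inline, larger ones go to an external blob backend."""
    return "inline" if size < threshold else "external"


class DedupStore:
    """Content-addressed store: identical payloads are kept only once."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        digest = content_hash(data)
        self._blobs.setdefault(digest, data)  # second put of same bytes is a no-op
        return digest

    def get(self, digest: str) -> bytes:
        data = self._blobs[digest]
        if content_hash(data) != digest:  # integrity check on read
            raise ValueError("integrity check failed")
        return data

    def __len__(self):
        return len(self._blobs)
```

Using the digest as the storage key gives deduplication and integrity verification from the same value.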

🔍 Indexing & Query

Secondary Indexes ✅

Status: Production-Ready | Docs: docs/features/indexes.md

Index types:

  • Single-Column - Equality lookups
  • Composite - Multi-column indexes
  • Range - Range queries (>, <, BETWEEN)
  • Sparse - Only for existing values
  • Geo-Spatial - R-tree for spatial search
  • TTL (Time-To-Live) - Automatic expiration
  • Full-Text - Inverted index for text search

Features:

  • Automatic index maintenance with MVCC
  • Thread-safe operations
  • Index statistics & cardinality estimation
  • Rebuild & reindex operations
  • Performance metrics

API:

POST /index/create
{ "table": "users", "column": "age", "type": "range" }
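
A client could assemble this request as follows. This is a sketch using only the Python standard library; the base URL is an assumption (port 8765 is taken from the Docker example in the deployment section), and `build_index_request` is a hypothetical helper. The request is built but not sent, so the example runs without a server.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8765"  # assumption: port from the Docker example


def build_index_request(table, column, index_type, base_url=BASE_URL):
    """Build (but do not send) the POST /index/create request shown above."""
    body = json.dumps({"table": table, "column": column, "type": index_type})
    return urllib.request.Request(
        url=base_url + "/index/create",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_index_request("users", "age", "range")
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
```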

Graph Projections ✅

Status: Production-Ready | Docs: docs/features/recursive_path_queries.md

Index structures:

  • Outdex - Outgoing edges (graph:out:node:edge)
  • Indeg - Incoming edges (graph:in:node:edge)
  • Type-Aware - Server-side edge-type filtering
  • Property Storage - Edge properties with weights

Algorithms:

  • BFS (Breadth-First Search) - Depth-limited traversal
  • Dijkstra - Shortest paths (weighted)
  • A* - Heuristic path search
  • Recursive Path Queries - Variable depth (1-N hops)
  • Temporal Graph Queries - Time-range filters

Path Constraints:

  • Last-Edge Constraints
  • No-Vertex Repetition
  • Type-based Pruning
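
Depth-limited BFS with type-based pruning and no vertex repetition can be sketched on a toy adjacency map. The `OUT_EDGES` data and the `bfs` function are illustrative assumptions; they stand in for the outgoing-edge index and do not reflect the server's implementation.

```python
from collections import deque

# Toy stand-in for the outgoing-edge index: node -> [(edge_type, target), ...]
OUT_EDGES = {
    "a": [("FOLLOWS", "b"), ("BLOCKS", "c")],
    "b": [("FOLLOWS", "d")],
    "c": [("FOLLOWS", "d")],
    "d": [("FOLLOWS", "e")],
}


def bfs(start, max_depth, edge_type=None):
    """Return {node: depth} for all nodes reachable within max_depth hops."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depths[node] == max_depth:
            continue  # depth limit reached, do not expand further
        for etype, target in OUT_EDGES.get(node, []):
            if edge_type is not None and etype != edge_type:
                continue  # type-based pruning
            if target not in depths:  # no vertex repetition
                depths[target] = depths[node] + 1
                queue.append(target)
    return depths
```

With `bfs("a", 2, "FOLLOWS")` the BLOCKS edge is pruned and node "e" is cut off by the depth limit.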

Vector Search ✅

Status: Production-Ready | Docs: docs/features/vector_ops.md

HNSW Index:

  • Persistent HNSW - Crash-safe, transactional
  • Distance Metrics - L2, Cosine, Dot Product
  • Batch Operations - Insert 500-1000 vectors
  • KNN Search - Approximate Nearest Neighbors
  • Configurable Parameters - M, efConstruction, efSearch

Performance:

  • Throughput: 1,800 queries/s (CPU)
  • Latency: p50 = 0.55ms, p99 = 2.1ms
  • GPU acceleration planned (target: 50K+ q/s)

API:

POST /vector/search
{ "vector": [0.1, 0.2, ...], "k": 10, "metric": "cosine" }
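
To make the three distance metrics concrete, here is an exact (brute-force) k-nearest-neighbor sketch. It is not HNSW, which answers the same question approximately; the function names and sample vectors are assumptions for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    return 1.0 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Every scorer returns "smaller is closer"; the dot product is negated for that.
SCORERS = {"l2": l2, "cosine": cosine_distance, "dot": lambda a, b: -dot(a, b)}


def knn(query, vectors, k, metric="cosine"):
    """Exact k-nearest neighbors by scanning all vectors."""
    score = SCORERS[metric]
    ranked = sorted(vectors.items(), key=lambda item: score(query, item[1]))
    return [key for key, _ in ranked[:k]]


VECTORS = {"x": [1.0, 0.0], "y": [0.0, 1.0], "z": [0.9, 0.1], "big": [5.0, 5.0]}
```

The choice of metric changes the ranking: cosine and L2 prefer "x" and "z" for the query `[1.0, 0.0]`, while the dot product favors the long vector "big".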

🔎 Query Language (AQL)

Advanced Query Language ✅

Status: Production-Ready | Docs: docs/aql/syntax.md

Syntax constructs:

  • FOR/FILTER/SORT/LIMIT/RETURN - SQL-like semantics
  • Graph Traversals - FOR v,e,p IN 1..3 OUTBOUND start
  • COLLECT/GROUP BY - Aggregations (COUNT, SUM, AVG, MIN, MAX)
  • Subqueries - Nested queries with IN/ALL/ANY
  • Pattern Matching - Graph pattern expressions
  • Temporal Filters - Time-range queries

Query Optimizer:

  • Cost-Based - Index selection, predicate ordering
  • EXPLAIN - Execution plan visualization
  • PROFILE - Runtime metrics & bottleneck analysis
  • Parallelization - Intel TBB task-based execution

Metrics (PROFILE):

  • edges_expanded - Graph traversal expansion rate
  • prune_last_level - Pruning effectiveness
  • index_scan_cost - Index operation costs

Hybrid Search ✅

Status: Production-Ready (Phase 4) | Docs: docs/apis/hybrid_search_api.md

Pre-Filtering:

  • Relational predicate → Candidate bitset
  • Vector HNSW search over filtered candidates
  • Graph expansion with constraints

Post-Filtering:

  • Global vector search → Top-K results
  • Relational/graph filters on the result set

Use Cases:

  • "Find similar documents (vector) from department X (relational) with tag Y (graph)"
  • Fusion of similarity, metadata, and relationships
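
The difference between the two strategies shows up in a small in-memory sketch. The documents, the `dept` field, and the helper names are assumptions for illustration, and an exact scan stands in for the HNSW search.

```python
DOCS = {
    "d1": {"dept": "legal", "vec": [1.0, 0.0]},
    "d2": {"dept": "sales", "vec": [0.9, 0.1]},
    "d3": {"dept": "legal", "vec": [0.0, 1.0]},
    "d4": {"dept": "legal", "vec": [0.8, 0.2]},
}


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(query, candidates, k):
    """Exact nearest neighbors among the given candidate keys."""
    return sorted(candidates, key=lambda key: sq_dist(query, DOCS[key]["vec"]))[:k]


def pre_filter(query, dept, k):
    # Relational predicate first, then vector search over the survivors.
    candidates = [key for key, doc in DOCS.items() if doc["dept"] == dept]
    return top_k(query, candidates, k)


def post_filter(query, dept, k):
    # Global vector search first, then the relational filter on the top-k.
    hits = top_k(query, list(DOCS), k)
    return [key for key in hits if DOCS[key]["dept"] == dept]
```

Pre-filtering always returns up to k matching results, while post-filtering can return fewer than k when non-matching documents occupy slots in the global top-k.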

🔒 Security & Compliance

Enterprise Security Stack ✅

Status: Production-Ready (85% Coverage) | Docs: docs/security/implementation_summary.md

TLS/SSL Hardening ✅

  • TLS 1.3 default (TLS 1.2 fallback)
  • Strong Ciphers - ECDHE-RSA-AES256-GCM-SHA384, ChaCha20-Poly1305
  • mTLS - Client certificate verification
  • HSTS Headers - max-age=31536000; includeSubDomains
  • Certificate Pinning - SHA256 fingerprints for HSM/TSA

Rate Limiting & DoS Protection ✅

  • Token Bucket Algorithm - 100 req/min default
  • Per-IP & Per-User Limits - Configurable thresholds
  • HTTP 429 Responses - Retry-After headers
  • Metrics - Real-time monitoring
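
A token bucket can be sketched in a few lines. This is a generic version of the algorithm, not the server's code; time is passed in explicitly so the behavior is deterministic, and the class and method names are assumptions. The default of 100 req/min would correspond to `TokenBucket(capacity=100, refill_per_sec=100 / 60)`.

```python
class TokenBucket:
    """Generic token bucket: each request consumes one token."""

    def __init__(self, capacity, refill_per_sec, now=0.0):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)  # start full so bursts are allowed
        self.last = now

    def allow(self, now):
        """Refill for the elapsed time, then try to take one token."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # caller would answer with HTTP 429

    def retry_after(self):
        """Seconds until one token is available, usable for a Retry-After header."""
        missing = max(0.0, 1.0 - self.tokens)
        return missing / self.refill_per_sec
```

Keeping one bucket per client IP or per user yields the per-IP and per-user limits.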

Input Validation ✅

  • JSON Schema Validation - Strict type checking
  • AQL Injection Prevention - Parameterized queries
  • Path Traversal Protection - Sanitized file paths
  • Max Body Size - 10MB default limit

Security Headers ✅

  • X-Frame-Options: DENY
  • X-Content-Type-Options: nosniff
  • X-XSS-Protection: 1; mode=block
  • Content-Security-Policy - Configurable
  • CORS Whitelisting - Strict origin control

RBAC (Role-Based Access Control) ✅

Status: Production-Ready | Docs: docs/security/implementation_summary.md

Role Hierarchy:

admin → operator → analyst → readonly

Permissions:

  • data:read, data:write, data:delete
  • keys:rotate, keys:view
  • audit:view, audit:export
  • config:modify
  • Wildcard support: *:*

Features:

  • JSON/YAML configuration
  • User-role mapping store
  • Resource-based access control
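
A permission check over this hierarchy might look like the sketch below. It assumes that each role inherits the grants of the roles to its right in the hierarchy; the grants assigned to each role are illustrative, since the actual role-to-permission mapping is not listed here.

```python
from fnmatch import fnmatchcase

# Ordered from least to most privileged (reverse of admin → ... → readonly).
ROLE_ORDER = ["readonly", "analyst", "operator", "admin"]

# Illustrative direct grants per role; assumed, not taken from ThemisDB.
DIRECT_GRANTS = {
    "readonly": ["data:read"],
    "analyst": ["audit:view"],
    "operator": ["data:write", "keys:view"],
    "admin": ["*:*"],
}


def effective_grants(role):
    """Direct grants of the role plus everything inherited from lower roles."""
    idx = ROLE_ORDER.index(role)
    grants = []
    for inherited in ROLE_ORDER[: idx + 1]:
        grants.extend(DIRECT_GRANTS[inherited])
    return grants


def is_allowed(role, permission):
    """True if any grant pattern (wildcards allowed) matches the permission."""
    return any(fnmatchcase(permission, pattern) for pattern in effective_grants(role))
```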

Encryption ✅

Status: Production-Ready | Docs: docs/security/column_encryption.md

Field-Level Encryption ✅

  • AES-256-GCM - Authenticated encryption
  • Transparent Operations - App-level abstraction
  • Schema-Based - Selective field encryption
  • Index Compatibility - Encrypted fields can be indexed

Key Management:

  • MockKeyProvider - Development/Testing
  • HSMKeyProvider - PKCS#11 HSM integration
  • VaultKeyProvider - HashiCorp Vault

Key Rotation:

  • Lazy Re-Encryption - Zero-downtime rotation
  • Transparent Migration - Gradual re-encryption
  • Audit Trail - Rotation tracking

API:

PUT /config/encryption-schema
{
  "fields": {
    "ssn": { "encrypted": true, "algorithm": "AES-256-GCM" }
  }
}

Audit Log Encryption ✅

  • Encrypt-then-Sign - Confidentiality + Integrity
  • Hash Chain - Tamper-detection (Merkle-like)
  • PKI Signatures - RSA-SHA256 (eIDAS-compliant)
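
The hash-chain part of this design can be illustrated on its own. The sketch below covers only tamper detection; encryption and RSA signing are left out, and the entry layout and function names are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first entry


def entry_hash(prev_hash, event):
    """Hash of the previous entry's hash plus the canonical JSON of this event."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(log, event):
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev, "hash": entry_hash(prev, event)})


def verify(log):
    """Recompute every link; any edited or removed entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True
```

Because each hash covers its predecessor, changing one entry invalidates it and every entry after it.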

Secrets Management ✅

Status: Production-Ready | Docs: docs/security/implementation_summary.md

HashiCorp Vault Integration:

  • KV v2 Engine - Secret storage
  • AppRole Auth - Service authentication
  • Auto Token Renewal - Lease management
  • Rotation Callbacks - Dynamic secret updates
  • Environment Fallback - Development mode

Audit Logging ✅

Status: Production-Ready | Docs: docs/features/audit_logging.md

Event Types (65+):

  • LOGIN_FAILED, PRIVILEGE_ESCALATION_ATTEMPT
  • DATA_ACCESS, DATA_MODIFIED, DATA_DELETED
  • KEY_ROTATED, ENCRYPTION_FAILED
  • UNAUTHORIZED_ACCESS, SCHEMA_CHANGED

Features:

  • Severity Levels - HIGH, MEDIUM, LOW
  • SIEM Integration - Syslog RFC 5424, Splunk HEC
  • Tamper-Proof - Hash chain verification
  • Retention Policies - Auto-archival & purging

API:

GET /audit/logs?severity=HIGH&from=2025-01-01

Compliance ✅

Status: Production-Ready | Docs: docs/features/compliance.md

GDPR/DSGVO:

  • ✅ Right to erasure (Deletion API)
  • ✅ Right of access (Data export)
  • ✅ Pseudonymization (Field encryption)
  • ✅ Data classification (4 levels: offen/vs-nfd/geheim/streng_geheim)

SOC 2 Controls:

  • ✅ CC6.1 - Access Control (RBAC)
  • ✅ CC6.7 - Audit Logs
  • ✅ CC7.2 - Change Management

HIPAA:

  • ✅ §164.312(a)(1) - Access Control
  • ✅ §164.312(e)(1) - Transmission Security (TLS 1.3)

PII Detection (7 types):

  • ✅ Email, Phone, SSN, Credit Card, IBAN, IP, URL
  • ✅ Automatic pattern recognition
  • ✅ YAML-configurable rules
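
Pattern-based recognition can be sketched with regular expressions. The three patterns below are simplified assumptions covering only email, IPv4, and URL; they are not the shipped rule set and would need stricter validation in practice.

```python
import re

# Simplified example patterns (assumed); real rules would be stricter.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "url": re.compile(r"https?://\S+"),
}


def detect_pii(text):
    """Return {pii_type: [matches]} for every pattern that matches the text."""
    found = {}
    for name, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[name] = matches
    return found
```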

Multi-Tenancy ✅

Status: Production-Ready | Docs: docs/features/multi_tenancy.md

Features:

  • Tenant Lifecycle - Create, Update, Delete, Enable/Disable
  • Tenant Identification - Header-based (X-Tenant-ID), Path-based
  • Resource Quotas - Storage, Documents, Collections, Queries, Connections
  • Rate Limiting - Per-tenant requests/sec with burst control
  • Feature Flags - GPU, Vector, Graph, Timeseries, Geo, Full-Text
  • Encryption - Tenant-specific keys, optional mandatory encryption
  • Usage Tracking - Storage, Documents, Requests, Bandwidth
  • Billing Integration - Prometheus metrics export
  • Data Isolation - Complete tenant separation

📊 Time-Series & Analytics

Time-Series Engine ✅

Status: Production-Ready | Docs: docs/features/time_series.md

Features:

  • Gorilla Compression - 10-20x compression ratio
  • Continuous Aggregates - Pre-computed rollups (360-3600x speedup)
  • Retention Policies - Auto-expiration
  • Downsampling - Multi-resolution storage
  • Aggregate Scheduler - Automatic background refresh
  • Query Optimizer - Cost-based aggregate rewriting

Performance:

  • 22/22 tests passing
  • Sub-millisecond query latency (with aggregates)
  • Efficient storage for metrics/logs

OLAP Analytics ✅

Status: Production-Ready | Docs: docs/features/olap_analytics.md

Features:

  • Aggregations - COUNT, SUM, AVG, MIN, MAX, STDDEV, VARIANCE, MEDIAN, PERCENTILE
  • Grouping Operators - CUBE, ROLLUP, GROUPING SETS
  • Window Functions - PARTITION BY, ORDER BY, ROWS/RANGE frames
  • Columnar Store - Vectorized aggregations
  • Materialized Views - Pre-computed aggregations

Window Functions:

  • ROW_NUMBER, RANK, DENSE_RANK
  • LAG, LEAD
  • FIRST_VALUE, LAST_VALUE
  • NTILE
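
The three ranking functions differ only in how they treat ties. The sketch below computes them over a single partition ordered descending; `rank_rows` is a hypothetical helper that follows the standard SQL semantics of these functions.

```python
def rank_rows(values):
    """Return (value, row_number, rank, dense_rank) tuples, ordered descending."""
    ordered = sorted(values, reverse=True)
    out = []
    rank = dense = 0
    prev = object()  # sentinel that never equals a real value
    for row_number, value in enumerate(ordered, start=1):
        if value != prev:
            rank = row_number  # RANK jumps to the current row position
            dense += 1         # DENSE_RANK increases by exactly one
            prev = value
        out.append((value, row_number, rank, dense))
    return out
```

For the scores 100, 90, 90, 80 the tied rows share rank 2, after which RANK continues at 4 and DENSE_RANK at 3.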

Temporal Graphs ✅

Status: Production-Ready | Docs: docs/features/temporal_graphs.md

Features:

  • Temporal Filters - valid_from, valid_to
  • Snapshot Queries - Point-in-time graph state
  • Time-Range Aggregations - Edge property rollups
  • Type-Aware Traversal - Filter by edge type + timestamp

API:

aggregateEdgePropertyInTimeRange(
  "user123", "FOLLOWS", "timestamp",
  from_ts, to_ts, AggregationType::COUNT
)

🔄 Transactions & Consistency

MVCC (Multi-Version Concurrency Control) ✅

Status: Production-Ready (27/27 tests) | Docs: docs/architecture/mvcc_design.md

Features:

  • Snapshot Isolation - Consistent reads
  • Write-Write Conflict Detection - Automatic rollbacks
  • Atomic Updates - Across all index layers
  • Optimistic Concurrency - High throughput

Guarantees:

  • Atomicity - All-or-nothing commits
  • Consistency - Blob + Indexes transactional
  • Isolation - Read Committed / Snapshot
  • Durability - WAL-based recovery
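
Snapshot isolation with write-write conflict detection can be shown with a toy in-memory store. This sketch uses a first-committer-wins rule and a logical commit counter; the class names are assumptions, and it is far simpler than the RocksDB-backed implementation.

```python
class ConflictError(Exception):
    pass


class Store:
    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value), oldest first
        self.clock = 0      # logical commit timestamp

    def begin(self):
        return Txn(self, self.clock)


class Txn:
    def __init__(self, store, snapshot_ts):
        self.store = store
        self.snapshot_ts = snapshot_ts
        self.writes = {}

    def get(self, key):
        if key in self.writes:  # read your own writes
            return self.writes[key]
        for ts, value in reversed(self.store.versions.get(key, [])):
            if ts <= self.snapshot_ts:  # newest version visible to the snapshot
                return value
        return None

    def put(self, key, value):
        self.writes[key] = value  # buffered until commit

    def commit(self):
        for key in self.writes:
            history = self.store.versions.get(key, [])
            if history and history[-1][0] > self.snapshot_ts:
                raise ConflictError(key)  # someone committed this key after our snapshot
        self.store.clock += 1
        for key, value in self.writes.items():
            self.store.versions.setdefault(key, []).append((self.store.clock, value))
```

A transaction that loses the conflict check commits nothing, which corresponds to the automatic rollback.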

Transactions API ✅

Status: Production-Ready | Docs: docs/features/transactions.md

Features:

  • Session-Based Transactions - Long-lived sessions
  • Multi-Index Support - Secondary, Graph, Vector
  • Isolation Levels - read_committed, snapshot
  • Statistics - Success rate, durations

API:

POST /transaction/begin
POST /transaction/commit
POST /transaction/rollback
GET /transaction/stats

📡 Change Data Capture (CDC)

CDC Engine ✅

Status: Production-Ready | Docs: docs/features/change_data_capture.md

Features:

  • Append-Only Event Log - All mutations captured
  • Incremental Consumption - Checkpointing
  • SSE Streaming - Real-time event delivery (experimental)
  • Backpressure Handling - Flow control
  • Retention Policies - Configurable TTL

Event Types:

  • INSERT, UPDATE, DELETE
  • Full entity snapshots
  • Metadata (timestamp, user, transaction)

API:

GET /cdc/events?since=checkpoint_123
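
Incremental consumption with checkpoints follows a simple loop: read everything after the last checkpoint, process it, and remember the new position. The in-memory `ChangeLog` below is an assumed stand-in for the event log; the numeric sequence checkpoint and the event fields are illustrative.

```python
class ChangeLog:
    """Append-only list of change events with 1-based sequence numbers."""

    def __init__(self):
        self._events = []

    def append(self, op, key, entity):
        seq = len(self._events) + 1
        self._events.append({"seq": seq, "op": op, "key": key, "entity": entity})
        return seq

    def read_since(self, checkpoint, limit=100):
        """Return (events with seq > checkpoint, new checkpoint)."""
        batch = self._events[checkpoint:checkpoint + limit]
        new_checkpoint = batch[-1]["seq"] if batch else checkpoint
        return batch, new_checkpoint
```

Persisting the checkpoint only after a batch has been processed gives at-least-once delivery; the `limit` argument is a simple form of flow control.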

🚀 Performance & Optimization

Memory Management ✅

Status: Production-Ready | Docs: docs/performance/memory_tuning.md

Storage Hierarchy:

  • WAL on NVMe - Minimum commit latency
  • Memtable in RAM - Fast ingestion
  • Block Cache (RAM) - Hot data caching (configurable size)
  • Bloom Filters (RAM) - Probabilistic key existence checks
  • SSTables on SSD - Persistent storage (LZ4/ZSTD compressed)

Configuration:

storage:
  memtable_size_mb: 256
  block_cache_size_mb: 1024
  compression:
    default: lz4
    bottommost: zstd

Compression ✅

Status: Production-Ready | Docs: docs/performance/compression_benchmarks.md

Algorithms:

  • LZ4 - Balanced (33.8 MB/s write, 2.1x compression)
  • ZSTD - Space-optimized (32.3 MB/s write, 2.8x compression)
  • Snappy - Alternative option

Strategy:

  • LZ4 for upper levels (faster)
  • ZSTD for the bottommost level (better compression ratio)

Parallelization ✅

Status: Production-Ready | Docs: docs/performance/TBB_INTEGRATION.md

Intel TBB Integration:

  • Task-Based Execution - Work-stealing scheduler
  • Batch Processing - Parallel entity loading (batch size: 50)
  • Index Scans - Parallel predicate evaluation
  • Throughput - 3.5x speedup on 8-core systems

GPU Acceleration ✅

Status: Production-Ready | Docs: docs/performance/GPU_ACCELERATION_PLAN.md

CUDA Backend:

  • ✅ Faiss GPU Integration
  • ✅ Vector distance computation (10-50x speedup)
  • ✅ Batch queries (50K-100K q/s)

Vulkan Backend:

  • ✅ Cross-platform GPU compute
  • ✅ Multi-vendor support (NVIDIA, AMD, Intel)
  • ✅ Compute shaders for vector operations

🌐 APIs & Clients

HTTP REST API ✅

Status: Production-Ready | Docs: docs/apis/openapi.md

Core Endpoints:

  • Entities: PUT/GET/DELETE /entities/{key}
  • Indexes: POST /index/create, POST /index/drop
  • Queries: POST /query (relational), POST /query/aql (AQL)
  • Graph: POST /graph/traverse
  • Vector: POST /vector/search
  • Transactions: POST /transaction/*
  • Admin: POST /admin/backup, GET /admin/stats
  • Monitoring: GET /health, GET /stats, GET /metrics

Content-Type:

  • application/json (primary)
  • application/x-velocypack (optional)

OpenAPI 3.0 Specification ✅

Status: Production-Ready | File: docs/openapi.yaml

  • Complete API documentation
  • Request/Response schemas
  • Authentication schemes
  • Error codes

GraphQL API ✅

Status: Production-Ready | Docs: docs/apis/graphql.md

  • GraphQL Parser - Query, Mutation, Subscription
  • Schema Introspection - SDL Export
  • Field Resolution - Nested selections
  • Built-in Types - Document, Graph, Vector, Timeseries
  • Error Handling - GraphQL spec compliant
  • HTTP Endpoint - POST /graphql

Client SDKs 🔧

Status: Alpha → Beta | Docs: clients/

JavaScript/TypeScript SDK:

  • 🔧 Basic HTTP wrapper (Alpha)
  • 📋 TypeScript definitions
  • 📋 Transaction support
  • 📋 Comprehensive tests

Python SDK:

  • 🔧 Basic HTTP wrapper (Alpha)
  • 📋 Type hints
  • 📋 Async/await support
  • 📋 PyPI package

Go SDK: 📋 Planned
Rust SDK: 📋 Planned
.NET SDK: 📋 Planned


🛠️ Content Processing

Content Architecture ✅

Status: Production-Ready | Docs: docs/architecture/content_architecture.md

Unified Ingestion Pipeline:

  • ContentTypeRegistry - MIME type detection
  • Processor Routing - Domain-specific handlers
  • Metadata Extraction - EXIF, GPS, Tags
  • Chunking - Configurable strategies

Processors:

  • Image Processor - EXIF, thumbnails, 3x3 tile chunking
  • Geo Processor - GeoJSON, GPX parsing & normalization
  • 📋 PDF Processor - Text extraction (planned)
  • 📋 Office Processor - DOCX, XLSX (planned)

API:

POST /content/import
{
  "content": {...},
  "chunks": [...],
  "edges": [...],
  "blob": "..."
}

Geo-Spatial Features ✅

Status: Production-Ready | Docs: docs/geo/

Capabilities:

  • R-Tree Index - Spatial search
  • Geohash - Location encoding
  • GeoJSON Support - Points, Lines, Polygons
  • GPX Processing - Track/Route parsing
  • Distance Queries - Radius search
  • Relational Schema - Geo tables integration

📈 Observability & Monitoring

Metrics & Statistics ✅

Status: Production-Ready | Docs: docs/observability/prometheus_metrics.md

Prometheus Metrics:

  • vccdb_requests_total (counter)
  • vccdb_errors_total (counter)
  • vccdb_qps (gauge)
  • rocksdb_block_cache_usage_bytes (gauge)
  • rocksdb_estimate_num_keys (gauge)
  • vccdb_page_fetch_time_ms_* (histogram)

RocksDB Statistics:

  • Block cache hit/miss rates
  • Compaction metrics
  • Memtable sizes
  • Files per level (L0-L6)

API:

GET /stats        # JSON format
GET /metrics      # Prometheus format

OpenTelemetry Tracing ✅

Status: Production-Ready

Features:

  • ✅ Distributed tracing
  • ✅ Span context propagation
  • ✅ Performance bottleneck detection
  • ✅ OTLP exporter integration

Logging ✅

Status: Production-Ready

spdlog Integration:

  • ✅ Structured logging
  • ✅ Log levels (TRACE, DEBUG, INFO, WARN, ERROR)
  • ✅ File rotation
  • ✅ Console + file outputs

🏗️ Deployment & Operations

Deployment Options ✅

Status: Production-Ready | Docs: docs/guides/deployment.md

Binary:

themis_server --config /etc/themis/config.yaml

Docker:

docker run -p 8765:8765 \
  -v /data:/data \
  ghcr.io/makr-code/themis:latest

Docker Compose:

docker compose up --build

Configuration Formats:

  • ✅ YAML (recommended)
  • ✅ JSON
  • ✅ Environment variables

Container Images ✅

Status: Production-Ready

Registries:

  • GHCR: ghcr.io/makr-code/themis
  • Docker Hub: themisdb/themis (optional)

Tags:

  • latest - Latest stable
  • g<shortsha> - Git commit
  • latest-x64-linux, latest-arm64-linux - Arch-specific

Multi-Arch:

  • ✅ x86_64 (AMD64)
  • ✅ ARM64 (aarch64)

Backup & Recovery ✅

Status: Production-Ready | Docs: docs/guides/deployment.md

Features:

  • RocksDB Checkpoints - Consistent snapshots
  • Point-in-Time Recovery - WAL archiving
  • Incremental Backups - Scripted automation
  • API Endpoint: POST /admin/backup

Scripts:

  • scripts/backup.sh (Linux)
  • scripts/backup.ps1 (Windows)

🧰 Admin Tools

WPF Admin Tools Suite ✅

Status: Production-Ready | Docs: docs/admin_tools/user_guide.md

Tools (7):

  1. Audit Log Viewer - Search, filter, export logs
  2. SAGA Verifier - Distributed transaction consistency
  3. PII Manager - GDPR data subject requests
  4. Key Rotation Dashboard - LEK/KEK/DEK management
  5. Retention Manager - Policy-based archival
  6. Classification Dashboard - Data classification testing
  7. Compliance Reports - Automated reporting

Common Features:

  • Unified Themis Design System
  • Dark/Light theme
  • Export (CSV, PDF, Excel)
  • Real-time search & filtering
  • Error handling & validation

Publish:

.\publish-all.ps1  # Build all tools to dist/

🔌 Plugin Architecture

Plugin System ✅

Status: Production-Ready | Docs: docs/plugins/PLUGIN_MIGRATION.md

Unified Interface:

  • IPlugin - Base interface
  • PluginManager - Discovery & loading
  • ✅ Security verification (signature checking)
  • ✅ Hot-reload support

Plugin Categories:

  1. Blob Storage - Filesystem, WebDAV, S3, Azure
  2. Compute - CUDA, Vulkan, DirectX
  3. 📋 Importers - PostgreSQL, MySQL, CSV
  4. 📋 Embeddings - Sentence-BERT, OpenAI, CLIP
  5. 📋 HSM - PKCS#11, Luna, CloudHSM

Benefits:

  • Modular binaries (Core < 50 MB)
  • On-demand loading
  • Third-party extensions
  • Reduced dependencies

🧪 Testing & Quality

Test Coverage ✅

Status: Production-Ready

Overall Coverage: 85%+

Test Suites:

  • Unit Tests - Core components (269 files tested)
  • Integration Tests - API endpoints, workflows
  • Performance Tests - Benchmarks (Google Benchmark)
  • Security Tests - Encryption, audit, HSM

Test Frameworks:

  • Google Test (C++)
  • Catch2 (alternative)
  • Custom test harnesses

Code Quality ✅

Status: Production-Ready | Docs: docs/development/code_audit_mockups_stubs.md

Static Analysis:

  • clang-tidy - Modern C++ best practices
  • cppcheck - Additional quality checks
  • Gitleaks - Secret scanning

Formatting:

  • clang-format - Consistent style
  • .clang-format config (C++20, 4 spaces)

CI/CD:

  • ✅ GitHub Actions (Linux + Windows)
  • ✅ Coverage reporting
  • ✅ Security scanning

Scripts:

./scripts/run_clang_quality_wsl.sh       # Linux/WSL
.\scripts\run_clang_quality.ps1          # Windows

📚 Documentation

Documentation Suite ✅

Status: Comprehensive | Location: docs/

Main Docs:

Categories:

  • Architecture - Design docs (base_entity, mvcc, content pipeline)
  • Features - Feature guides (32+ docs)
  • Security - Security architecture (10+ docs)
  • APIs - API references (OpenAPI, ContentFS, Hybrid Search)
  • Admin Tools - Tool guides & demos
  • Performance - Tuning & benchmarks
  • Development - Dev guides, audits

Build Docs:

.\build-docs.ps1      # Generate site/
.\sync-wiki.ps1       # Sync to Wiki

🎯 Performance Benchmarks

Typical Results ✅

Platform: Windows 11, i7-12700K, Release build

| Operation | Throughput | Latency (p50) | Latency (p99) |
|---|---|---|---|
| Entity PUT | 45,000 ops/s | 0.02 ms | 0.15 ms |
| Entity GET | 120,000 ops/s | 0.008 ms | 0.05 ms |
| Indexed Query | 8,500 queries/s | 0.12 ms | 0.85 ms |
| Graph Traverse (depth=3) | 3,200 ops/s | 0.31 ms | 1.2 ms |
| Vector ANN (k=10) | 1,800 queries/s | 0.55 ms | 2.1 ms |
| Index Rebuild (100K) | 12,000 entities/s | - | - |

Compression Performance ✅

| Algorithm | Write Throughput | Compression Ratio | Use Case |
|---|---|---|---|
| None | 34.5 MB/s | 1.0x | Development only |
| LZ4 | 33.8 MB/s | 2.1x | Default (balanced) |
| ZSTD | 32.3 MB/s | 2.8x | Bottommost (storage) |

🗺️ Roadmap

Q1 2026 (0-3 months)

Focus: Ecosystem & SDKs

  • v1.0.0 Production Release - All P0/P1 features complete
  • GPU Acceleration (CUDA/Vulkan) - 10-50x Vector speedup
  • Multi-Tenancy - Complete tenant isolation
  • GraphQL API - Full GraphQL server
  • OLAP Analytics - CUBE, ROLLUP, Window Functions
  • 🔧 JavaScript/Python SDK - Production-ready v1.0
  • 🔧 Content Processors - PDF, Office support
  • 🔧 CI/CD Improvements - Matrix builds, security scanning

Q2-Q3 2026 (3-9 months)

Focus: Distributed Systems

  • Distributed Sharding (Phase 1-3) - Auto-Rebalancing with Load Detection
  • 📋 Replication - Leader-Follower, Multi-Master
  • 📋 Multi-DC Deployment - Geo-distributed clusters
  • 📋 Advanced Graph Algorithms - PageRank, Community Detection
  • 📋 Streaming Analytics - Complex Event Processing

Q4 2026+ (9+ months)

Focus: Innovation

  • 📋 Multi-DC Replication - Geo-distributed
  • 📋 Kubernetes Operator - Cloud-native deployment
  • 📋 ML Integration - GNNs, in-database training
  • 📋 Real-Time Streaming - Complex Event Processing
  • 📋 Advanced Analytics - Graph algorithms, forecasting

See also: ROADMAP.md for details


🏆 Production-Ready Status

P0 Features (Critical) ✅

Status: 100% Complete

  • ✅ ACID Transactions (MVCC)
  • ✅ Multi-Model Support (Relational, Graph, Vector, Document)
  • ✅ Secondary Indexes (7 types)
  • ✅ HNSW Persistence
  • ✅ Graph Traversals (BFS, Dijkstra, A*)
  • ✅ AQL Query Language
  • ✅ Enterprise Security (TLS, RBAC, Encryption, Audit)
  • ✅ Observability (Metrics, Tracing, Logging)
  • ✅ Backup & Recovery

Overall Progress

Current Status: ~85% Production-Ready

  • Core Engine: 100%
  • Security Stack: 85%
  • API Layer: 95%
  • Documentation: 90%
  • Client SDKs: 40% (Alpha)
  • GPU Acceleration: 10% (Planned)

📦 Dependencies

Core Libraries (vcpkg)

Storage & Performance:

  • RocksDB - LSM-Tree storage
  • Intel TBB - Parallelization
  • Apache Arrow - Columnar analytics

Serialization & Parsing:

  • simdjson - High-performance JSON
  • VelocyPack - Binary serialization
  • msgpack - Alternative serialization

Vector Search:

  • HNSWlib - ANN index
  • Faiss - GPU-accelerated search (optional)

Networking:

  • Boost.Asio - Async I/O
  • Boost.Beast - HTTP server
  • libcurl - HTTP client (WebDAV, etc.)

Security:

  • OpenSSL - TLS, encryption, PKI
  • PKCS#11 - HSM integration

Utilities:

  • spdlog - Logging
  • yaml-cpp - YAML parsing
  • nlohmann/json - JSON library

Testing:

  • Google Test - Unit tests
  • Google Benchmark - Performance tests

🔗 References

Inspired by:

  • ArangoDB (Multi-model architecture)
  • CozoDB (Hybrid relational-graph-vector)
  • Azure Cosmos DB (Multi-model with ARS format)
  • RocksDB (LSM-Tree foundation)
  • Faiss (Vector search)

Academic Foundations:

  • MVCC (PostgreSQL/Oracle design)
  • LSM-Tree (Google Bigtable, LevelDB)
  • HNSW (Malkov & Yashunin 2018)

📞 Support & Community

Repository: https://github.com/makr-code/ThemisDB
Issues: https://github.com/makr-code/ThemisDB/issues
Discussions: https://github.com/makr-code/ThemisDB/discussions
Wiki: https://github.com/makr-code/ThemisDB/wiki

Documentation:


📄 License

MIT License - See LICENSE file for details


Last updated: 21 November 2025
