-
Notifications
You must be signed in to change notification settings - Fork 0
query_engine
Version: 2.0
Status: Implementiert (MVP)
Letzte Aktualisierung: 2. November 2025
Das Query Engine & AQL-System von THEMIS besteht aus mehreren Komponenten, die zusammen eine effiziente Ausführung von Multi-Modell-Queries ermöglichen:
- AQL (Advanced Query Language) – SQL-ähnliche deklarative Query-Sprache
- AQL Parser – Lexer & Parser für AQL → AST
- AQL Translator – AST → Ausführungspläne (ConjunctiveQuery, JoinQuery, TraversalQuery)
- Query Optimizer – Kardinalitätsschätzung & Index-Auswahl
- Query Engine – Ausführung mit Index-/Full-Scan-Support
- ✅ Einfach: SQL-ähnliche Syntax (FOR, FILTER, SORT, LIMIT, RETURN)
- ✅ Mächtig: Multi-Modell-Support (Relational, Graph, Vector)
- ✅ Optimierbar: Automatische Index-Auswahl via Optimizer
- ✅ Erweiterbar: Schrittweise Features (LET, COLLECT, Joins)
FOR variable IN collection
[LET var = expression [, ...]]
[FILTER condition]
[SORT expression [ASC|DESC] [, ...]]
[LIMIT offset, count]
[RETURN expression]
Execution-Reihenfolge:
-
FOR– Iteration über Collection/Index -
FILTER– Prädikat-Evaluation (mit Index-Nutzung) -
SORT– Sortierung (mit Range-Index wenn möglich) -
LIMIT– Pagination/Offset -
RETURN– Projektion
| Typ | FOR-Klauseln | Features | Beispiel |
|---|---|---|---|
| Relational | 1 | FILTER, SORT, LIMIT, RETURN | FOR u IN users FILTER u.age > 18 |
| Join | 2+ | Multi-FOR, JOIN-Bedingungen | FOR u IN users FOR o IN orders FILTER o.user_id == u._key |
| Graph Traversal | 1 (speziell) | BFS, Depth-Limits, FILTER auf v/e | FOR v IN 1..3 OUTBOUND 'user1' GRAPH 'social' |
| Vector Search | 1 | NEAR(), k-NN | FOR doc IN articles NEAR(doc.embedding, @vec, 10) |
| Aggregation | 1 | COLLECT, AGGREGATE (SUM/AVG/etc.) | FOR sale IN sales COLLECT cat = sale.category AGGREGATE total = SUM(sale.amount) |
class AQLParser {
public:
struct ParseResult {
bool success;
std::string error_message;
std::shared_ptr<Query> ast; // Root AST Node
};
ParseResult parse(std::string_view query);
};Node-Typen:
enum class ASTNodeType {
// Query Nodes
Query, // Root
ForNode, // FOR variable IN collection
FilterNode, // FILTER condition
SortNode, // SORT expr [ASC|DESC]
LimitNode, // LIMIT offset, count
ReturnNode, // RETURN expression
LetNode, // LET variable = expression
CollectNode, // COLLECT ... AGGREGATE ...
// Expressions
BinaryOp, // ==, !=, >, <, AND, OR, +, -, *, /
UnaryOp, // NOT, -
FunctionCall, // CONCAT, SUM, etc.
FieldAccess, // doc.field
Literal, // "string", 123, true, null
Variable, // doc, user
ArrayLiteral, // [1, 2, 3]
ObjectConstruct // {name: doc.name, age: doc.age}
};Beispiel-AST:
FOR u IN users
FILTER u.age > 18 AND u.city == "Berlin"
RETURN u.name
→ AST:
{
"type": "Query",
"children": [
{
"type": "ForNode",
"variable": "u",
"collection": "users"
},
{
"type": "FilterNode",
"condition": {
"type": "BinaryOp",
"operator": "AND",
"left": {
"type": "BinaryOp",
"operator": ">",
"left": {"type": "FieldAccess", "variable": "u", "field": "age"},
"right": {"type": "Literal", "value": 18}
},
"right": {
"type": "BinaryOp",
"operator": "==",
"left": {"type": "FieldAccess", "variable": "u", "field": "city"},
"right": {"type": "Literal", "value": "Berlin"}
}
}
},
{
"type": "ReturnNode",
"expression": {"type": "FieldAccess", "variable": "u", "field": "name"}
}
]
}Binary Operators:
enum class BinaryOperator {
// Comparison
Eq, Neq, Lt, Lte, Gt, Gte, // ==, !=, <, <=, >, >=
// Logical
And, Or, Xor, // AND, OR, XOR
// Arithmetic
Add, Sub, Mul, Div, Mod, // +, -, *, /, %
// Membership
In // IN
};Unary Operators:
enum class UnaryOperator {
Not, // NOT
Minus, // - (unary)
Plus // + (unary)
};Der Translator wandelt AST in ausführbare Query-Pläne um:
class AQLTranslator {
public:
struct TranslationResult {
bool success;
std::string error_message;
// Single-FOR: Relational Query
ConjunctiveQuery query;
// Multi-FOR: Join Query
std::optional<JoinQuery> join;
// Graph: Traversal Query
std::optional<TraversalQuery> traversal;
};
TranslationResult translate(std::shared_ptr<Query> ast);
};Eingabe:
FOR u IN users
FILTER u.age > 18 AND u.city == "Berlin"
SORT u.created_at DESC
LIMIT 10
RETURN u
Ausgabe (ConjunctiveQuery):
ConjunctiveQuery {
table: "users",
predicates: [
{column: "city", op: Eq, value: "Berlin"}
],
rangePredicates: [
{column: "age", lower: "18", includeLower: false, op: Gt}
],
orderBy: {
column: "created_at",
desc: true,
limit: 10
}
}Eingabe:
FOR u IN users
FOR o IN orders
FILTER o.user_id == u._key
RETURN {user: u.name, order: o.id}
Ausgabe (JoinQuery):
JoinQuery {
for_nodes: [
{variable: "u", collection: "users"},
{variable: "o", collection: "orders"}
],
filters: [
{op: Eq, left: "o.user_id", right: "u._key"} // Join-Bedingung
],
return_node: ObjectConstruct{...}
}Eingabe:
FOR v, e, p IN 1..3 OUTBOUND 'user1' GRAPH 'social'
FILTER v.age > 18
RETURN v
Ausgabe (TraversalQuery):
TraversalQuery {
variable: "v",
minDepth: 1,
maxDepth: 3,
direction: Outbound,
startVertex: "user1",
graphName: "social",
filters: [{column: "age", op: Gt, value: "18"}]
}Der Optimizer schätzt Selektivitäten von Prädikaten und ordnet sie optimal:
class QueryOptimizer {
public:
struct Estimation {
PredicateEq pred;
size_t estimatedCount;
bool capped; // true wenn >= maxProbe
};
struct Plan {
std::vector<PredicateEq> orderedPredicates; // Sortiert nach Selektivität
std::vector<Estimation> details;
};
Plan chooseOrderForAndQuery(const ConjunctiveQuery& q, size_t maxProbe = 1000);
};Beispiel:
FOR u IN users
FILTER u.age > 18 AND u.city == "Berlin"
Schätzung:
-
city == "Berlin": ~100 Treffer (selektiv!) -
age > 18: ~5000 Treffer (weniger selektiv)
Optimaler Plan:
- Scan
city == "Berlin"→ 100 Keys - Für jeden Key: Check
age > 18→ ~80 finale Treffer
Vorteil: Nur 100 Entity-Loads statt 5000!
Verfügbare Strategien:
enum class QueryMode {
IndexOptimized, // Optimizer-gesteuert (Kardinalitätsschätzung)
IndexParallel, // Parallele Scans + AND-Merge (für kleine Datasets)
FullScanFallback, // Sequential Scan (nur mit allow_full_scan=true)
IndexRangeAware // Range-Index + Sortierung direkt
};Beispiel-Plan (EXPLAIN):
{
"plan": {
"mode": "index_optimized",
"order": [
{"column": "city", "value": "Berlin"},
{"column": "age", "value": "18"}
],
"estimates": [
{"column": "city", "value": "Berlin", "estimatedCount": 100, "capped": false},
{"column": "age", "value": "18", "estimatedCount": 5000, "capped": false}
]
}
}class QueryEngine {
public:
struct Status {
bool ok;
std::string message;
};
// Relational Query (Single-FOR)
std::pair<Status, std::vector<std::string>>
executeConjunctiveKeys(const ConjunctiveQuery& q);
std::pair<Status, std::vector<BaseEntity>>
executeConjunctiveEntities(const ConjunctiveQuery& q);
// Join Query (Multi-FOR)
std::pair<Status, std::vector<nlohmann::json>>
executeJoin(const JoinQuery& join);
// Graph Traversal (BFS)
std::pair<Status, std::vector<BaseEntity>>
executeTraversal(const TraversalQuery& trav);
};Schritt-für-Schritt:
- Optimizer: Schätze Selektivitäten → Sortiere Prädikate
- Index-Scan: Starte mit selektivstem Prädikat
- Filter-Chain: Wende weitere Prädikate an
- Sort/Limit: Nutze Range-Index wenn möglich
- Return: Projiziere Felder
Code-Flow (vereinfacht):
auto [status, keys] = idx.scanKeysEqual("users", "city", "Berlin"); // 100 Keys
std::vector<std::string> filtered;
for (const auto& key : keys) {
auto entity = loadEntity(key);
if (entity.getFieldAsInt("age") > 18) { // Range-Filter
filtered.push_back(key);
}
}
// Sort/Limit...Algorithmus:
std::vector<nlohmann::json> results;
for (const auto& uKey : getUserKeys()) {
auto user = loadEntity("users", uKey);
for (const auto& oKey : getOrderKeys()) {
auto order = loadEntity("orders", oKey);
if (order.getField("user_id") == user.getField("_key")) { // Join-Bedingung
results.push_back({
{"user", user.getField("name")},
{"order", order.getField("id")}
});
}
}
}Performance-Hinweise:
⚠️ O(n×m) Komplexität (teuer bei großen Collections)- 💡 Nutze Indizes auf Join-Spalten (
user_id) - 💡 Geplant: Hash-Join für große Collections
BFS-Algorithmus mit Pruning:
std::queue<Node> frontier;
std::unordered_set<std::string> visited;
frontier.push({startVertex, depth: 0});
while (!frontier.empty()) {
auto node = frontier.front(); frontier.pop();
if (visited.count(node.pk)) continue;
visited.insert(node.pk);
if (node.depth >= minDepth) {
auto entity = loadEntity(node.pk);
if (evalFilters(entity)) { // v.age > 18
results.push_back(entity);
}
}
if (node.depth < maxDepth) {
auto neighbors = getNeighbors(node.pk, direction);
for (const auto& nb : neighbors) {
if (node.depth + 1 == maxDepth) {
// Konservatives Pruning am letzten Level
auto e = loadEntity(nb.pk);
if (!evalFilters(e)) {
pruned_last_level++;
continue;
}
}
frontier.push({nb.pk, node.depth + 1});
}
}
}Metriken (siehe EXPLAIN):
-
edges_expanded: Anzahl inspizierter Kanten -
pruned_last_level: Durch Filter gedroppt -
frontier_processed_per_depth: BFS-Expansion pro Level
HTTP Request:
POST /query/aql
Content-Type: application/json
{
"query": "FOR u IN users FILTER u.age > 18 AND u.city == 'Berlin' RETURN u",
"explain": true
}Response:
{
"query": "FOR u IN users FILTER u.age > 18 AND u.city == 'Berlin' RETURN u",
"ast": {...},
"plan": {
"mode": "index_optimized",
"order": [
{"column": "city", "value": "Berlin"},
{"column": "age", "value": "18"}
],
"estimates": [
{"column": "city", "value": "Berlin", "estimatedCount": 100, "capped": false},
{"column": "age", "value": "18", "estimatedCount": 5000, "capped": false}
]
}
}Graph-Query:
FOR v IN 1..3 OUTBOUND 'user1' GRAPH 'social'
FILTER v.age > 30
RETURN v
Metrics:
{
"metrics": {
"edges_expanded": 156,
"pruned_last_level": 23,
"filter_evaluations_total": 89,
"filter_short_circuits": 12,
"frontier_processed_per_depth": {
"0": 1,
"1": 5,
"2": 18,
"3": 65
}
}
}Interpretation:
- edges_expanded: 156 Kanten inspiziert (BFS-Kosten)
- pruned_last_level: 23 Vertices am letzten Level gedroppt (Filter wirkt!)
- filter_short_circuits: 12 AND-Short-Circuits (Effizienz)
Relational Query mit Cursor:
{
"query": "FOR u IN users SORT u.created_at DESC LIMIT 10 RETURN u",
"use_cursor": true
}Plan-Details:
{
"plan": {
"mode": "index_rangeaware",
"cursor": {
"used": true,
"cursor_present": false,
"sort_column": "created_at",
"effective_limit": 11,
"anchor_set": false,
"requested_count": 10
}
}
}Prometheus Metrics:
-
vccdb_cursor_anchor_hits_total: Cursor-Anker-Verwendungen -
vccdb_range_scan_steps_total: Besuchte Index-Einträge -
vccdb_page_fetch_time_ms_*: Seitenerzeugung-Dauer
-- ✅ RICHTIG: Selektive Filter zuerst
FOR u IN users
FILTER u.city == "SmallTown" AND u.age > 18 -- city sehr selektiv!
RETURN u
-- ❌ FALSCH: Unselektive Filter zuerst
FOR u IN users
FILTER u.age > 18 AND u.city == "SmallTown" -- age wenig selektiv
RETURN u
-- ✅ RICHTIG: Index auf age + city
CREATE INDEX idx_users_age ON users(age)
CREATE INDEX idx_users_city ON users(city)
FOR u IN users
FILTER u.age > 18 AND u.city == "Berlin" -- Beide Indizes genutzt!
RETURN u
-- ❌ FALSCH: Kein Index → Full Scan
FOR u IN users
FILTER u.random_field == "value" -- Kein Index!
RETURN u
-- ✅ RICHTIG: Filter vor Join
FOR u IN users
FILTER u.active == true -- Reduziert u-Set!
FOR o IN orders
FILTER o.user_id == u._key
RETURN {user: u.name, order: o.id}
-- ❌ FALSCH: Kein Filter → Großer Cross-Product
FOR u IN users
FOR o IN orders
FILTER o.user_id == u._key -- Erst nach Cross-Product!
RETURN {user: u.name, order: o.id}
-- ✅ RICHTIG: Depth begrenzen
FOR v IN 1..3 OUTBOUND 'user1' GRAPH 'social' -- Max 3 Hops
FILTER v.age > 30
RETURN v
-- ❌ FALSCH: Unbegrenzte Depth
FOR v IN 1..10 OUTBOUND 'user1' GRAPH 'social' -- Exponentielles Wachstum!
RETURN v
- ❌ OR-Support: Nur AND im Translator (OR in Arbeit)
- ❌ Feld-zu-Feld-Vergleiche:
u.city == o.citynicht generisch (nur in Join-Bedingungen) - ❌ Subqueries: Noch nicht implementiert
- ❌ Hash-Join: Nur Nested-Loop-Joins (O(n×m))
- ❌ Complex Functions: CONCAT, SUBSTRING in Entwicklung
- OR-Support: Disjunktive Prädikate
- Hash-Join: Für große Collections
- Subqueries: Nested Queries
- Window Functions: ROW_NUMBER, RANK
- CTEs (WITH): Common Table Expressions
- UPSERT: INSERT ... ON CONFLICT UPDATE
- AQL Syntax: aql_syntax.md
- EXPLAIN & PROFILE: aql_explain_profile.md
-
Parser:
include/query/aql_parser.h -
Translator:
include/query/aql_translator.h -
Optimizer:
include/query/query_optimizer.h -
Engine:
include/query/query_engine.h - Cursor Pagination: cursor_pagination.md
- Indexes: indexes.md
- AQL Overview
- AQL Syntax Reference
- EXPLAIN and PROFILE
- Hybrid Queries
- Pattern Matching
- Subquery Implementation
- Subquery Quick Reference
- Fulltext Release Notes
- Hybrid Search Design
- Fulltext Search API
- Content Search
- Pagination Benchmarks
- Stemming
- Hybrid Fusion API
- Performance Tuning
- Migration Guide
- Storage Overview
- RocksDB Layout
- Geo Schema
- Index Types
- Index Statistics
- Index Backup
- HNSW Persistence
- Vector Index
- Graph Index
- Secondary Index
- Security Overview
- RBAC and Authorization
- TLS Setup
- Certificate Pinning
- Encryption Strategy
- Column Encryption
- Key Management
- Key Rotation
- HSM Integration
- PKI Integration
- eIDAS Signatures
- PII Detection
- PII API
- Threat Model
- Hardening Guide
- Incident Response
- SBOM
- Enterprise Overview
- Scalability Features
- Scalability Strategy
- HTTP Client Pool
- Enterprise Build Guide
- Enterprise Ingestion
- Benchmarks Overview
- Compression Benchmarks
- Compression Strategy
- Memory Tuning
- Hardware Acceleration
- GPU Acceleration Plan
- CUDA Backend
- Vulkan Backend
- Multi-CPU Support
- TBB Integration
- Time Series
- Vector Operations
- Graph Features
- Temporal Graphs
- Path Constraints
- Recursive Queries
- Audit Logging
- Change Data Capture
- Transactions
- Semantic Cache
- Cursor Pagination
- Compliance Features
- GNN Embeddings
- Geo Overview
- Geo Architecture
- 3D Game Acceleration
- Geo Feature Tiering
- G3 Phase 2 Status
- G5 Implementation
- Integration Guide
- Content Architecture
- Content Pipeline
- Content Manager
- JSON Ingestion
- Content Ingestion
- Filesystem API
- Image Processor
- Geo Processor
- Policy Implementation
- Developer Guide
- Implementation Status
- Development Roadmap
- Build Strategy
- Build Acceleration
- Code Quality Guide
- AQL LET Implementation
- Audit API Implementation
- SAGA API Implementation
- PKI eIDAS
- WAL Archiving
- Architecture Overview
- Strategic Overview
- Ecosystem
- MVCC Design
- Base Entity
- Caching Strategy
- Caching Data Structures
- Docker Build
- Docker Status
- Multi-Arch CI/CD
- ARM Build Guide
- ARM Packages
- Raspberry Pi Tuning
- Packaging Guide
- Package Maintainers
- Roadmap
- Changelog
- Database Capabilities
- Implementation Summary
- Sachstandsbericht 2025
- Enterprise Final Report
- Test Report
- Build Success Report
- Integration Analysis
- Source Overview
- API Implementation
- Query Engine
- Storage Layer
- Security Implementation
- CDC Implementation
- Time Series
- Utils and Helpers
Updated: 2025-11-30