olap_analytics

OLAP Analytics

Status: ✅ Implementiert
Version: 1.0
Datum: 30. November 2025

Übersicht

ThemisDB unterstützt fortgeschrittene OLAP (Online Analytical Processing) Features für Business Intelligence und Datenanalyse.

Features

✅ Implementiert

Aggregations-Funktionen
- COUNT, SUM, AVG, MIN, MAX
- STDDEV, VARIANCE
- MEDIAN, PERCENTILE
- COUNT_DISTINCT
- FIRST, LAST
Grouping Operators
- Simple GROUP BY
- CUBE (alle Kombinationen)
- ROLLUP (hierarchisch)
- GROUPING SETS (benutzerdefiniert)
Window Functions
- PARTITION BY
- ORDER BY
- ROWS PRECEDING/FOLLOWING
Columnar Store
- Spaltenorientierte Speicherung
- Vektorisierte Aggregationen
- Column Statistics
Materialized Views
- Pre-computed Aggregations
- Manual/Periodic Refresh
- Incremental Updates (geplant)

Verwendung

OLAPQuery erstellen

#include "analytics/olap.h"

using namespace themis::analytics;

OLAPQuery query;
query.collection = "sales";

// Dimensionen
query.dimensions.push_back({"region", "", true});
query.dimensions.push_back({"product", "", true});

// Measures
query.measures.push_back({"total_sales", "amount", Measure::Function::Sum});
query.measures.push_back({"avg_sales", "amount", Measure::Function::Avg});
query.measures.push_back({"order_count", "id", Measure::Function::Count});

// Filter
Filter filter;
filter.field = "year";
filter.op = Filter::Operator::Eq;
filter.value = int64_t(2024);
query.filters.push_back(filter);

// Sortierung
query.sorts.push_back({"total_sales", false, false});  // DESC

// Pagination
query.limit = 100;
query.offset = 0;

Query ausführen

OLAPEngine engine;
auto result = engine.execute(query);

std::cout << "Rows: " << result.total_rows << std::endl;
std::cout << "Execution time: " << result.execution_time_ms << " ms" << std::endl;

for (const auto& row : result.rows) {
    auto region = std::get<std::string>(row.values.at("region"));
    auto total = std::get<double>(row.values.at("total_sales"));
    std::cout << region << ": " << total << std::endl;
}

CUBE Query

CUBE generiert alle möglichen Gruppierungskombinationen:

OLAPQuery query;
query.collection = "sales";
query.grouping_mode = OLAPQuery::GroupingMode::Cube;

query.dimensions.push_back({"region", "", true});
query.dimensions.push_back({"product", "", true});
query.dimensions.push_back({"year", "", true});

query.measures.push_back({"total", "amount", Measure::Function::Sum});

// Generiert:
// - (region, product, year) - Detail
// - (region, product)       - year aggregiert
// - (region, year)          - product aggregiert
// - (product, year)         - region aggregiert
// - (region)                - product, year aggregiert
// - (product)               - region, year aggregiert
// - (year)                  - region, product aggregiert
// - ()                      - Grand Total

auto cells = engine.executeCube("sales", query.dimensions, query.measures);

for (const auto& cell : cells) {
    std::cout << "Grouping ID: " << cell.grouping_id << std::endl;
    for (const auto& [dim, value] : cell.dimensions) {
        if (value) {
            std::cout << "  " << dim << ": " << *value << std::endl;
        } else {
            std::cout << "  " << dim << ": (ALL)" << std::endl;
        }
    }
    std::cout << "  Total: " << cell.measures.at("total") << std::endl;
}

ROLLUP Query

ROLLUP generiert hierarchische Aggregationen:

OLAPQuery query;
query.collection = "sales";
query.grouping_mode = OLAPQuery::GroupingMode::Rollup;

// Hierarchie: Jahr > Quartal > Monat
query.dimensions.push_back({"year", "", true});
query.dimensions.push_back({"quarter", "", true});
query.dimensions.push_back({"month", "", true});

query.measures.push_back({"total", "amount", Measure::Function::Sum});

// Generiert:
// - (year, quarter, month) - Detail
// - (year, quarter)        - Monatssummen
// - (year)                 - Quartalssummen
// - ()                     - Grand Total

auto rows = engine.executeRollup("sales", query.dimensions, query.measures);

for (const auto& row : rows) {
    std::cout << "Level: " << row.level << std::endl;
    // Level 0 = Detail, höhere Level = Subtotals
}

GROUPING SETS

Benutzerdefinierte Gruppierungssätze:

OLAPQuery query;
query.collection = "sales";
query.grouping_mode = OLAPQuery::GroupingMode::GroupingSets;

query.dimensions.push_back({"region", "", true});
query.dimensions.push_back({"product", "", true});
query.dimensions.push_back({"year", "", true});

// Spezifische Kombinationen
query.grouping_sets.push_back({{"region", "product"}});
query.grouping_sets.push_back({{"region", "year"}});
query.grouping_sets.push_back({{"product"}});

auto result = engine.execute(query);

Window Functions

OLAPQuery::WindowSpec window;
window.name = "rolling_avg";
window.partition_by = {"region"};
window.order_by.push_back({"date", true, false});  // ASC
window.rows_preceding = 2;  // 3-Tage gleitender Durchschnitt
window.rows_following = 0;

query.windows.push_back(window);

std::vector<Measure> measures;
measures.push_back({"avg_sales", "amount", Measure::Function::Avg});

std::vector<std::unordered_map<std::string, double>> data = {
    {{"region", 1.0}, {"date", 1.0}, {"amount", 100.0}},
    {{"region", 1.0}, {"date", 2.0}, {"amount", 150.0}},
    {{"region", 1.0}, {"date", 3.0}, {"amount", 200.0}},
    // ...
};

auto windowResults = engine.evaluateWindowFunctions(data, measures, window);

Query Plan

auto plan = engine.explain(query);

std::cout << "Estimated rows: " << plan.estimated_rows << std::endl;
std::cout << "Estimated cost: " << plan.estimated_cost << std::endl;

for (const auto& note : plan.optimization_notes) {
    std::cout << "- " << note << std::endl;
}

Columnar Store

Grundlagen

ColumnarStore store;

// Spalten erstellen
store.createColumn("id", "string");
store.createColumn("region", "string");
store.createColumn("amount", "double");

// Daten einfügen
using Value = std::variant<std::nullptr_t, bool, int64_t, double, std::string>;

std::vector<std::unordered_map<std::string, Value>> rows = {
    {{"id", std::string("1")}, {"region", std::string("North")}, {"amount", 100.0}},
    {{"id", std::string("2")}, {"region", std::string("South")}, {"amount", 200.0}},
    {{"id", std::string("3")}, {"region", std::string("North")}, {"amount", 150.0}}
};

store.appendRows(rows);

// Aggregationen
std::cout << "Sum: " << store.sum("amount") << std::endl;
std::cout << "Avg: " << store.avg("amount") << std::endl;
std::cout << "Min: " << store.min("amount") << std::endl;
std::cout << "Max: " << store.max("amount") << std::endl;
std::cout << "Count: " << store.count("amount") << std::endl;
std::cout << "Distinct: " << store.countDistinct("region") << std::endl;

Gefilterte Aggregation

std::vector<bool> mask = {true, false, true};  // Include rows 0 and 2
double sum = store.sumWhere("amount", mask);   // 100 + 150 = 250

Column Statistics

auto stats = store.getColumnStats("amount");

std::cout << "Rows: " << stats.row_count << std::endl;
std::cout << "Nulls: " << stats.null_count << std::endl;
std::cout << "Distinct: " << stats.distinct_count << std::endl;
std::cout << "Min: " << *stats.min_value << std::endl;
std::cout << "Max: " << *stats.max_value << std::endl;
std::cout << "Avg: " << stats.avg_value << std::endl;

Materialized Views

View erstellen

MaterializedView::Definition def;
def.name = "sales_by_region";
def.source_collection = "sales";
def.dimensions.push_back({"region", "", true});
def.measures.push_back({"total_sales", "amount", Measure::Function::Sum});
def.measures.push_back({"order_count", "id", Measure::Function::Count});
def.refresh_mode = MaterializedView::Definition::RefreshMode::Periodic;
def.refresh_interval_seconds = 3600;  // Stündlich

MaterializedView view(def);
view.refresh();  // Erste Aktualisierung

View abfragen

std::vector<Filter> filters;
Filter f;
f.field = "region";
f.op = Filter::Operator::Eq;
f.value = std::string("North");
filters.push_back(f);

auto result = view.query(filters, {}, 10);

Status prüfen

if (view.isStale()) {
    view.refresh();
}

std::cout << "Last refresh: " 
          << std::chrono::system_clock::to_time_t(view.lastRefreshTime())
          << std::endl;
std::cout << "Row count: " << view.rowCount() << std::endl;

Aggregations-Funktionen

Funktion	Beschreibung
COUNT	Anzahl der Werte
SUM	Summe aller Werte
AVG	Durchschnitt
MIN	Minimum
MAX	Maximum
STDDEV	Standardabweichung
VARIANCE	Varianz
MEDIAN	Median (50. Perzentil)
PERCENTILE	Beliebiges Perzentil
COUNT_DISTINCT	Anzahl eindeutiger Werte
FIRST	Erster Wert
LAST	Letzter Wert

Percentile Beispiel

Measure m;
m.name = "p95_latency";
m.field = "latency";
m.function = Measure::Function::Percentile;
m.percentile_value = 95.0;  // 95. Perzentil

Filter-Operatoren

Operator	Beschreibung
Eq	Gleichheit (=)
Ne	Ungleichheit (!=)
Lt	Kleiner als (<)
Le	Kleiner gleich (<=)
Gt	Größer als (>)
Ge	Größer gleich (>=)
In	Enthält in Liste
NotIn	Nicht in Liste
Contains	String enthält
StartsWith	String beginnt mit
EndsWith	String endet mit
IsNull	Ist NULL
IsNotNull	Ist nicht NULL
Between	Zwischen zwei Werten

Performance-Tipps

1. Index-Nutzung

Filter auf indizierte Spalten verwenden
Sortierung nach indizierten Spalten

2. Materialized Views

Häufig verwendete Aggregationen vorberechnen
Refresh-Intervall an Aktualitätsanforderungen anpassen

3. Columnar Store

Für große Datasets verwenden
Vektorisierte Operationen nutzen

4. Query-Optimierung

EXPLAIN zur Plananalyse
Limit für Ergebnismengen
Selektive Filter verwenden

Limitationen

Keine echte Columnar-Persistenz (in-memory)
Keine automatische View-Auswahl
Keine parallele Aggregation (single-threaded)

Roadmap

Letzte Aktualisierung: 30. November 2025
Maintainer: ThemisDB Team

ThemisDB Documentation - auto-synced from /docs on 2025-11-30

ThemisDB Wiki

Getting Started

SDKs and Clients

Query Language (AQL)

Search and Retrieval

Storage and Indexes

Security and Compliance

Enterprise Features

Performance and Optimization

Features and Capabilities

Geo and Spatial

Content and Ingestion

Sharding and Scaling

APIs and Integration

Admin Tools

Observability

Development

Architecture

Deployment and Operations

Exporters and Integrations

Reports and Status

Compliance and Governance

Testing and Quality

Source Code Documentation

Reference

Updated: 2025-11-30

olap_analytics

OLAP Analytics

Übersicht

Features

✅ Implementiert

Verwendung

OLAPQuery erstellen

Query ausführen

CUBE Query

ROLLUP Query

GROUPING SETS

Window Functions

Query Plan

Columnar Store

Grundlagen

Gefilterte Aggregation

Column Statistics

Materialized Views

View erstellen

View abfragen

Status prüfen

Aggregations-Funktionen

Percentile Beispiel

Filter-Operatoren

Performance-Tipps

1. Index-Nutzung

2. Materialized Views

3. Columnar Store

4. Query-Optimierung

Limitationen

Roadmap

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!