makr-code edited this page Nov 18, 2025 · 1 revision

text_processor.cpp

Path: src/content/text_processor.cpp

Purpose: Tokenization, normalization and simple text transformations used by content indexing and search.
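The normalization and tokenization steps named in the purpose line can be sketched roughly as follows. This is a minimal illustration, not the file's actual code: the function names `collapse_spaces` and `tokenize` are hypothetical, and only the `std::regex multi_space(" +")`, `std::istringstream iss(chunk_data)` and `while (iss >> token)` fragments come from the page itself.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (name assumed): collapse every run of spaces
// into a single space, using the " +" pattern listed on this page.
std::string collapse_spaces(const std::string& text) {
    static const std::regex multi_space(" +");
    return std::regex_replace(text, multi_space, " ");
}

// Hypothetical helper (name assumed): split a chunk on whitespace
// via stream extraction, which skips leading and repeated whitespace.
std::vector<std::string> tokenize(const std::string& chunk_data) {
    std::vector<std::string> tokens;
    std::istringstream iss(chunk_data);
    std::string token;
    while (iss >> token) {
        tokens.push_back(token);
    }
    return tokens;
}
```

Under these assumptions, `collapse_spaces("a   b  c")` yields `"a b c"`, and `tokenize("  hello   world\n")` yields the two tokens `hello` and `world`. Stream extraction already ignores repeated whitespace, so the regex step matters mainly when the normalized string itself is stored or displayed.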

Extracted code fragments (control-flow statements and local declarations; no function signatures were captured):

  • for (size_t i = 0; i < chunk_start_idx; i++) {
  • if (current_pos <= chunk_start_idx) {
  • while (iss >> token) {
  • for (int seed = 0; seed < 3; seed++) {
  • for (int dim_offset = 0; dim_offset < 10; dim_offset++) {
  • for (float val : embedding) {
  • for (float& val : embedding) {
  • if (auto pos = lang.find("text/x-"); pos != std::string::npos) {
  • std::vector<float> embedding(EMBEDDING_DIM, 0.0f);
  • std::istringstream iss(chunk_data);
  • std::regex multi_space(" +");
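Taken together, the fragments above (a zero-initialized `embedding` vector, a loop over three seeds, a loop over ten dimension offsets, and two passes over the vector) are consistent with a hashed bag-of-words embedding followed by L2 normalization. The sketch below shows one way such a routine could look. It is an assumption-laden illustration rather than the actual implementation: the function name `embed_chunk`, the value of `EMBEDDING_DIM`, the use of `std::hash`, and the way seed and offset are mixed into an index are all hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// Assumed value; the page does not state the real dimension.
constexpr std::size_t EMBEDDING_DIM = 128;

// Hypothetical function: hash each whitespace-separated token into the
// vector with several seeds and offsets, then scale to unit L2 length.
std::vector<float> embed_chunk(const std::string& chunk_data) {
    std::vector<float> embedding(EMBEDDING_DIM, 0.0f);
    std::istringstream iss(chunk_data);
    std::string token;
    while (iss >> token) {
        for (int seed = 0; seed < 3; seed++) {
            // Assumed mixing scheme: append the seed to the token before hashing.
            const std::size_t h =
                std::hash<std::string>{}(token + '#' + std::to_string(seed));
            for (int dim_offset = 0; dim_offset < 10; dim_offset++) {
                const std::size_t idx =
                    (h + static_cast<std::size_t>(dim_offset)) % EMBEDDING_DIM;
                embedding[idx] += 1.0f;
            }
        }
    }
    // First pass: accumulate the squared L2 norm.
    float norm_sq = 0.0f;
    for (float val : embedding) {
        norm_sq += val * val;
    }
    // Second pass: divide in place, skipping the all-zero (empty input) case.
    const float norm = std::sqrt(norm_sq);
    if (norm > 0.0f) {
        for (float& val : embedding) {
            val /= norm;
        }
    }
    return embedding;
}
```

With this sketch, any non-empty chunk produces a unit-length vector of `EMBEDDING_DIM` floats, and an empty chunk produces all zeros. Note that `std::hash` is not guaranteed to give the same values across standard library implementations, so a persistent index would likely need a fixed hash function instead.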

Updated: 2025-11-30
