# Index MCP Server

A Model Context Protocol (MCP) server providing semantic search and vector database capabilities using Weaviate. This server enables AI assistants to index, search, and retrieve information from various content types using hybrid search (combining semantic and keyword matching).

## Features

- Hybrid Search: combines semantic (vector) and keyword (BM25) search, so rankings reflect both meaning and exact terms
- Multiple Content Types: support for code, notes, documentation, images, and more
- Flexible Configuration: YAML-based configuration for easy customization
- RAG Support: Retrieval-Augmented Generation for AI-powered answers
- Metadata & Filtering: rich metadata support, with filters on fields such as content type, project, and date range
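Hybrid search merges two ranked result lists into one. The TypeScript sketch below shows one way to do that, modeled on the relative score fusion idea Weaviate describes for hybrid queries: each list is min-max normalized to [0, 1], then the scores are blended as `alpha * vector + (1 - alpha) * keyword`. It is not this server's code. `fuseHybrid`, `normalize`, and `Scored` are hypothetical names, and scoring a document that is missing from one list as 0 for that list is an assumption.

```typescript
// Hypothetical sketch of relative score fusion for hybrid (vector + BM25) search.
type Scored = { id: string; score: number };

// Min-max normalize raw scores into [0, 1]; the best raw score maps to 1.
function normalize(results: Scored[]): { [id: string]: number | undefined } {
  // Null-prototype object so ids like "constructor" cannot collide with built-ins.
  const out: { [id: string]: number | undefined } = Object.create(null);
  if (results.length === 0) return out;
  const scores = results.map((r) => r.score);
  const min = Math.min(...scores);
  const max = Math.max(...scores);
  for (const r of results) {
    // If every score is equal, treat each hit as a top hit (sketch assumption).
    out[r.id] = max === min ? 1 : (r.score - min) / (max - min);
  }
  return out;
}

// alpha = 1 gives pure vector ranking, alpha = 0 gives pure keyword ranking.
function fuseHybrid(vector: Scored[], keyword: Scored[], alpha = 0.5): Scored[] {
  const v = normalize(vector);
  const k = normalize(keyword);
  // Union of document ids that appear in either list.
  const ids = Object.keys(v).concat(Object.keys(k).filter((id) => !(id in v)));
  return ids
    .map((id) => ({
      id,
      // A document missing from one list contributes 0 for that list (assumption).
      score: alpha * (v[id] ?? 0) + (1 - alpha) * (k[id] ?? 0),
    }))
    .sort((a, b) => b.score - a.score);
}
```

With `alpha = 1` the ranking is purely semantic; with `alpha = 0` it is purely keyword-based. Weaviate performs hybrid fusion on the server, so a client typically only chooses `alpha`.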

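## Installation

```bash
npm install
npm run build
```

The MCP client configurations later in this README run `dist/server.js`, which is produced by the build step.
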
## Prerequisites

### Weaviate Setup

This server requires a running Weaviate instance.

#### Local Docker Instance (Recommended)

Create a `docker-compose.yml` file in your project directory:

```yaml
---
services:
  weaviate:
    command:
      - --host
      - 0.0.0.0
      - --port
      - '8080'
      - --scheme
      - http
    image: cr.weaviate.io/semitechnologies/weaviate:1.34.0
    ports:
      - 8080:8080
      - 50051:50051
    volumes:
      - weaviate_data:/var/lib/weaviate
    restart: on-failure:0
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'text2vec-openai'
      ENABLE_MODULES: 'text2vec-openai'
      CLUSTER_HOSTNAME: 'node1'
volumes:
  weaviate_data:
...
```

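Start Weaviate:

```bash
docker compose up -d
```

Verify that it is running (the readiness endpoint returns HTTP 200 once Weaviate is ready):

```bash
curl http://localhost:8080/v1/.well-known/ready
```

Note: Set the `OPENAI_APIKEY` environment variable before starting the Index MCP server, or add it to your `.env` file.

Learn more: see the Weaviate Local Quickstart in the Weaviate documentation.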

#### Weaviate Cloud

Alternatively, sign up for a free Weaviate Cloud instance at console.weaviate.cloud and update the `weaviate` section of your config:

```yaml
weaviate:
  url: "https://your-cluster.weaviate.network"
  apiKey: "your-weaviate-api-key"
  openaiApiKey: "sk-..."
```

## Configuration

The server uses a YAML configuration file to define the server metadata, the Weaviate connection, the collection schema, shared definitions, and which tools are enabled. By default, it looks for `config.yaml` in the project root; to use a different file, set the `INDEX_CONFIG_PATH` environment variable.
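The lookup rule above fits in a few lines. This is a minimal TypeScript sketch of it, not the server's actual loader: `resolveConfigPath` and the `Env` type are hypothetical, the `./config.yaml` default comes from the Environment Variables section, and treating a blank variable as unset is an assumption.

```typescript
// Hypothetical sketch of choosing the config file; the real loader may differ.
type Env = { [name: string]: string | undefined };

// Default documented under "Environment Variables".
const DEFAULT_CONFIG_PATH = "./config.yaml";

function resolveConfigPath(env: Env): string {
  const custom = env["INDEX_CONFIG_PATH"];
  // An unset or blank variable falls back to the default (assumption for blanks).
  if (custom !== undefined && custom.trim() !== "") return custom;
  return DEFAULT_CONFIG_PATH;
}
```

In Node.js this would be called as `resolveConfigPath(process.env)`; how relative paths are resolved against the working directory is not covered by the sketch.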

### Configuration File Structure

```yaml
server:
  name: "index-mcp-server"
  version: "0.1.0"
  description: "Semantic search and vector database MCP server"

weaviate:
  url: "http://localhost:8080"
  timeout: 30000
  retries: 3

schema:
  collectionName: "Document"
  vectorizer:
    type: "text2vec-openai"
    model: "text-embedding-3-small"
  properties:
    - name: "content"
      dataType: "TEXT"
    # ... more properties

shared:
  contentTypes:
    - "code"
    - "note"
    # ... more types

  metadataFields:
    - name: "contentType"
      type: "string"
      required: true
    # ... more fields

  filterFields:
    - name: "contentType"
      type: "array"
    # ... more filters

tools:
  - name: "search_index"
    enabled: true
  - name: "index_new"
    enabled: true
  # ... more tools
```

### Configuration Sections

#### Server Configuration

Defines basic server metadata:

```yaml
server:
  name: "index-mcp-server"
  version: "0.1.0"
  description: "Your server description"
```

#### Weaviate Configuration

Configure your Weaviate instance connection:

```yaml
weaviate:
  url: "http://localhost:8080"  # Weaviate instance URL
  apiKey: "your-api-key"         # Optional: Weaviate API key
  timeout: 30000                 # Request timeout in milliseconds
  retries: 3                     # Number of retry attempts
  openaiApiKey: "sk-..."         # Optional: OpenAI API key for vectorization
```

Environment variables can override these settings:

- `WEAVIATE_URL`: overrides the Weaviate URL
- `WEAVIATE_API_KEY`: sets the Weaviate API key
- `OPENAI_APIKEY`: sets the OpenAI API key used for vectorization
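One plausible way to layer these overrides on top of the YAML values is sketched below in TypeScript. `applyEnvOverrides`, `parseCount`, and `WeaviateConfig` are hypothetical names. The variable names come from this README (including `WEAVIATE_TIMEOUT` and `WEAVIATE_RETRIES` from the Environment Variables section); ignoring empty or non-numeric values is an assumption about the intended behavior.

```typescript
// Hypothetical sketch: layer environment overrides on top of the YAML `weaviate:` block.
type Env = { [name: string]: string | undefined };

interface WeaviateConfig {
  url: string;
  apiKey?: string;
  timeout?: number; // request timeout in milliseconds
  retries?: number; // number of retry attempts
  openaiApiKey?: string;
}

// Accept only non-negative integers; anything else keeps the YAML value (assumption).
function parseCount(raw: string | undefined): number | undefined {
  if (raw === undefined) return undefined;
  const trimmed = raw.trim();
  return /^\d+$/.test(trimmed) ? parseInt(trimmed, 10) : undefined;
}

function applyEnvOverrides(base: WeaviateConfig, env: Env): WeaviateConfig {
  const result: WeaviateConfig = { ...base }; // copy so the parsed YAML is not mutated
  const url = env["WEAVIATE_URL"];
  const apiKey = env["WEAVIATE_API_KEY"];
  const openaiApiKey = env["OPENAI_APIKEY"];
  // Empty strings are treated as "not set" (assumption).
  if (url) result.url = url;
  if (apiKey) result.apiKey = apiKey;
  if (openaiApiKey) result.openaiApiKey = openaiApiKey;
  const timeout = parseCount(env["WEAVIATE_TIMEOUT"]);
  if (timeout !== undefined) result.timeout = timeout;
  const retries = parseCount(env["WEAVIATE_RETRIES"]);
  if (retries !== undefined) result.retries = retries;
  return result;
}
```

Copying the base object keeps the parsed YAML available unchanged, which makes it easy to log or compare the file values against the effective settings.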

#### Schema Configuration

Define your Weaviate collection schema:

```yaml
schema:
  collectionName: "Document"  # Name of the Weaviate collection

  vectorizer:
    type: "text2vec-openai"   # Vectorizer type
    model: "text-embedding-3-small"  # Model to use
    options: {}               # Additional vectorizer options

  properties:
    - name: "content"
      dataType: "TEXT"
      description: "Main content"
    - name: "tags"
      dataType: "TEXT_ARRAY"
      description: "Document tags"
    # Add more properties as needed
```

Supported vectorizer types:

- `text2vec-openai`: OpenAI embeddings
- `text2vec-cohere`: Cohere embeddings
- `text2vec-huggingface`: Hugging Face embeddings
- `none`: no vectorization

The Docker Compose example above enables only the `text2vec-openai` module; other vectorizers require enabling the corresponding module in Weaviate.

Supported data types:

- `TEXT`: text field
- `TEXT_ARRAY`: array of text values
- `DATE`: date/timestamp
- `NUMBER`: numeric value
- `INT`: integer value
- `BOOLEAN`: Boolean value
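Because only the vectorizers and data types listed above are documented as supported, a config loader could reject anything else before talking to Weaviate. The TypeScript sketch below is a hypothetical validator: `validateSchema` and its types are not part of this project's documented API, and the duplicate-name check is an extra assumption.

```typescript
// Hypothetical sketch: reject schema entries this README does not list as supported.
const VECTORIZERS = ["text2vec-openai", "text2vec-cohere", "text2vec-huggingface", "none"];
const DATA_TYPES = ["TEXT", "TEXT_ARRAY", "DATE", "NUMBER", "INT", "BOOLEAN"];

interface PropertyDef {
  name: string;
  dataType: string;
  description?: string;
}

interface SchemaConfig {
  collectionName: string;
  vectorizer: { type: string; model?: string };
  properties: PropertyDef[];
}

// Collect every problem instead of stopping at the first one.
function validateSchema(schema: SchemaConfig): string[] {
  const errors: string[] = [];
  if (VECTORIZERS.indexOf(schema.vectorizer.type) === -1) {
    errors.push(`unsupported vectorizer: ${schema.vectorizer.type}`);
  }
  const seenNames: string[] = [];
  for (const prop of schema.properties) {
    if (DATA_TYPES.indexOf(prop.dataType) === -1) {
      errors.push(`property "${prop.name}": unsupported dataType ${prop.dataType}`);
    }
    // Property names must be unique within the collection (assumed check).
    if (seenNames.indexOf(prop.name) !== -1) errors.push(`duplicate property: ${prop.name}`);
    seenNames.push(prop.name);
  }
  return errors;
}
```

Returning a list of errors rather than throwing on the first one lets a user fix every problem in `config.yaml` in a single pass.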

#### Shared Definitions

Define reusable metadata and filter schemas that are automatically applied to the relevant tools:

```yaml
shared:
  # Content types available across all tools
  contentTypes:
    - "code"
    - "note"
    - "documentation"
    - "todo"
    # ... add your custom types

  # Priority levels
  priorities:
    - "low"
    - "medium"
    - "high"

  # Metadata fields for the index_new and index_existing tools
  metadataFields:
    - name: "contentType"
      type: "string"
      description: "Type of content"
      required: true
      enum: []  # Uses contentTypes from above

    - name: "title"
      type: "string"
      description: "Document title"
      required: true

    - name: "tags"
      type: "array"
      description: "Document tags"
      required: true

    # Add more metadata fields as needed

  # Filter fields for search and query tools
  filterFields:
    - name: "contentType"
      type: "array"
      description: "Filter by content types"

    - name: "project"
      type: "string"
      description: "Filter by project name"

    - name: "dateRange"
      type: "object"
      description: "Filter by date range"
      properties:
        after: "ISO date string"
        before: "ISO date string"

    # Add more filter fields as needed
```

Key benefits of shared definitions:

- DRY principle: define metadata and filters once and reuse them in every tool
- Consistency: all tools use the same field definitions
- Easy customization: add new content types or metadata fields in one place
- Type safety: JSON schemas for tool inputs are generated automatically
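To make the type-safety point concrete, the TypeScript sketch below shows one hypothetical way a schema builder might turn `metadataFields` into a JSON Schema object for a tool's input, filling an empty `enum` from `shared.contentTypes` as the config comment describes. `buildInputSchema` and its types are illustrative only; applying the empty-`enum` rule to any field and assuming array fields hold strings are choices of this sketch.

```typescript
// Hypothetical sketch of a schema builder: shared.metadataFields -> JSON Schema.
interface FieldDef {
  name: string;
  type: "string" | "array" | "object" | "number" | "boolean";
  description?: string;
  required?: boolean;
  enum?: string[];
}

interface JsonSchema {
  type: "object";
  properties: { [name: string]: { [key: string]: unknown } };
  required: string[];
}

function buildInputSchema(fields: FieldDef[], contentTypes: string[]): JsonSchema {
  const schema: JsonSchema = { type: "object", properties: {}, required: [] };
  for (const field of fields) {
    const prop: { [key: string]: unknown } = { type: field.type };
    if (field.description) prop["description"] = field.description;
    // Array fields (such as tags) are assumed to hold strings.
    if (field.type === "array") prop["items"] = { type: "string" };
    if (field.enum) {
      // An empty enum means "use shared.contentTypes", per the config comment.
      prop["enum"] = field.enum.length > 0 ? field.enum : contentTypes;
    }
    schema.properties[field.name] = prop;
    if (field.required) schema.required.push(field.name);
  }
  return schema;
}
```

Generating the schema from the same list the indexer uses means a new content type added to `config.yaml` shows up in every tool's input schema without code changes.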

#### Tool Configuration

Enable or disable specific tools:

```yaml
tools:
  - name: "search_index"
    enabled: true

  - name: "index_new"
    enabled: true

  - name: "index_existing"
    enabled: true

  - name: "unindex"
    enabled: true

  - name: "find_similar_in_index"
    enabled: true

  - name: "ask_index"
    enabled: true
```

Note: Tool names and descriptions are hardcoded in the tool classes. The configuration only controls which tools are enabled or disabled.
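To illustrate how an enabled/disabled list might drive tool creation (the "Tool Factory" described under Architecture), here is a hypothetical TypeScript sketch. The `Tool` interface, the registry shape, and `createTools` are assumptions, not this project's classes; reporting unrecognized names instead of silently ignoring them is a design choice of the sketch.

```typescript
// Hypothetical sketch of a tool factory: only enabled, known tools are instantiated.
interface ToolEntry {
  name: string;
  enabled: boolean;
}

interface Tool {
  name: string;
  run(input: { [key: string]: unknown }): string;
}

type ToolConstructor = () => Tool;

function createTools(
  entries: ToolEntry[],
  registry: { [name: string]: ToolConstructor },
): { tools: Tool[]; unrecognized: string[] } {
  const tools: Tool[] = [];
  const unrecognized: string[] = [];
  for (const entry of entries) {
    if (!entry.enabled) continue; // disabled tools are never exposed to the client
    if (Object.prototype.hasOwnProperty.call(registry, entry.name)) {
      tools.push(registry[entry.name]());
    } else {
      unrecognized.push(entry.name); // e.g. a typo in config.yaml
    }
  }
  return { tools, unrecognized };
}
```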

### Creating Custom Configurations

For different use cases, create separate configuration files:

```bash
# Development configuration
cp config.yaml config.dev.yaml

# Production configuration
cp config.yaml config.prod.yaml

# Custom use case
cp config.yaml config.custom.yaml
```

Then specify which config to use:

```bash
INDEX_CONFIG_PATH=./config.prod.yaml npm start
```

### Example: Custom Content Types

To add custom content types for your specific use case:

```yaml
shared:
  contentTypes:
    - "code"
    - "note"
    - "research-paper"      # Custom type
    - "meeting-notes"       # Custom type
    - "customer-feedback"   # Custom type

  metadataFields:
    - name: "contentType"
      type: "string"
      required: true

    # Add custom metadata for your types
    - name: "department"
      type: "string"
      description: "Department name"

    - name: "confidentiality"
      type: "string"
      description: "Confidentiality level"
      enum: ["public", "internal", "confidential"]
```
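Once custom types and enums such as `confidentiality` are defined, incoming metadata can be checked against them before indexing. The TypeScript sketch below is hypothetical (`validateMetadata` and `FieldDef` are not this project's API). Restricting `contentType` to `shared.contentTypes` follows the shared definitions above; the type checks and error messages are assumptions.

```typescript
// Hypothetical sketch: check a metadata object against shared.metadataFields.
interface FieldDef {
  name: string;
  type: "string" | "array";
  required?: boolean;
  enum?: string[];
}

function validateMetadata(
  metadata: { [key: string]: unknown },
  fields: FieldDef[],
  contentTypes: string[],
): string[] {
  const errors: string[] = [];
  for (const field of fields) {
    const value = metadata[field.name];
    if (value === undefined) {
      if (field.required) errors.push(`missing required field: ${field.name}`);
      continue;
    }
    if (field.type === "string" && typeof value !== "string") {
      errors.push(`${field.name} must be a string`);
      continue;
    }
    if (field.type === "array" && !Array.isArray(value)) {
      errors.push(`${field.name} must be an array`);
      continue;
    }
    // contentType is assumed to be limited to shared.contentTypes; other fields use their own enum.
    const allowed = field.name === "contentType" ? contentTypes : field.enum;
    if (allowed && typeof value === "string" && allowed.indexOf(value) === -1) {
      errors.push(`${field.name} must be one of: ${allowed.join(", ")}`);
    }
  }
  return errors;
}
```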

## Available Tools

### search_index

Hybrid search combining semantic and keyword matching.

### index_new

Index new content with metadata.

### index_existing

Update existing indexed content.

### unindex

Remove content from the index.

### find_similar_in_index

Find similar content based on a reference document.

### ask_index

RAG-based question answering using indexed content.

## Security Features

The Index MCP server includes security filtering that keeps common kinds of sensitive data out of the index.

### Blocked File Patterns

- Environment files: `.env`, `.env.*`, `.env-*`
- Secrets: `secrets.yaml`, `credentials.json`
- Private keys: `.pem`, `.key`, `.p12`, `id_rsa`, etc.
- API keys: `api-keys.json`, `tokens.json`
- Cloud credentials: `.aws/credentials`, `.kube/config`
- Git credentials: `.git-credentials`, `.netrc`

### Blocked Directories

- `node_modules/`, `.git/`, `.vscode/`, `.idea/`
- `dist/`, `build/`, `coverage/`
- `.aws/`, `.ssh/`, `.gnupg/`, `.kube/`
- `.terraform/`, `vendor/`
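The file and directory rules above can be expressed as a simple path check. This TypeScript sketch is hypothetical: it covers only the entries listed in this README (the server's own list is longer, as the "etc." suggests), and `isBlockedPath` and the constant names are invented for illustration.

```typescript
// Hypothetical sketch of path filtering built only from the lists above.
const BLOCKED_DIRS = [
  "node_modules", ".git", ".vscode", ".idea", "dist", "build", "coverage",
  ".aws", ".ssh", ".gnupg", ".kube", ".terraform", "vendor",
];
const BLOCKED_NAMES = [
  ".env", "secrets.yaml", "credentials.json", "id_rsa",
  "api-keys.json", "tokens.json", ".git-credentials", ".netrc",
];
const BLOCKED_EXTENSIONS = [".pem", ".key", ".p12"];

function isBlockedPath(filePath: string): boolean {
  // Normalize Windows separators, then split into non-empty path segments.
  const segments = filePath.replace(/\\/g, "/").split("/").filter((s) => s.length > 0);
  if (segments.length === 0) return false;
  const fileName = segments[segments.length - 1];
  const dirs = segments.slice(0, -1);
  // Any blocked directory anywhere in the path (covers .aws/credentials, .kube/config).
  if (dirs.some((d) => BLOCKED_DIRS.indexOf(d) !== -1)) return true;
  if (BLOCKED_NAMES.indexOf(fileName) !== -1) return true;
  // .env.local, .env.production, .env-dev, ...
  if (/^\.env[.-]/.test(fileName)) return true;
  return BLOCKED_EXTENSIONS.some((ext) => fileName.slice(-ext.length) === ext);
}
```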

### Content Scanning

The system also scans file content for sensitive patterns:

- API keys and tokens (long alphanumeric strings)
- Passwords in configuration
- Private key blocks
- Certificate blocks
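These content checks can be approximated with line-by-line regular expressions. The patterns in this TypeScript sketch are assumptions chosen for illustration, not the server's actual rules, and the long-token heuristic will also flag harmless strings such as commit hashes. `scanContent`, `Finding`, and `RULES` are hypothetical names.

```typescript
// Hypothetical heuristics for the content checks listed above.
interface Finding {
  rule: string;
  line: number; // 1-based line number
}

// No `g` flag, so RegExp.test() keeps no state between calls.
const RULES: { rule: string; pattern: RegExp }[] = [
  { rule: "private-key", pattern: /-----BEGIN [A-Z ]*PRIVATE KEY-----/ },
  { rule: "certificate", pattern: /-----BEGIN CERTIFICATE-----/ },
  { rule: "password", pattern: /(password|passwd|pwd)\s*[:=]\s*\S+/i },
  { rule: "long-token", pattern: /[A-Za-z0-9_-]{32,}/ },
];

function scanContent(text: string): Finding[] {
  const findings: Finding[] = [];
  text.split(/\r?\n/).forEach((lineText, index) => {
    for (const { rule, pattern } of RULES) {
      if (pattern.test(lineText)) findings.push({ rule, line: index + 1 });
    }
  });
  return findings;
}
```

Returning findings with line numbers, rather than a single yes/no, makes it easier to explain why a file was skipped.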

Note: Both `index_new` (when given file paths) and `bulk_index` apply these security filters automatically.

## Usage

The Index MCP server is started automatically by your MCP client (Kilo Code, Claude Desktop, etc.) based on the configuration in your MCP settings file. You don't need to start the server manually.

### Configuration

The server reads its configuration from `config.yaml` by default. To use a different config file, set the `INDEX_CONFIG_PATH` environment variable in the `env` block of the server's entry in your MCP settings (other fields, such as `command` and `args`, are omitted here):

```json
{
  "mcpServers": {
    "index": {
      "env": {
        "INDEX_CONFIG_PATH": "/path/to/custom/config.yaml"
      }
    }
  }
}
```

## Development

```bash
# Watch mode
npm run watch

# Build
npm run build

# Lint
npm run lint
```

## Environment Variables

- `INDEX_CONFIG_PATH`: path to the configuration file (default: `./config.yaml`)
- `WEAVIATE_URL`: Weaviate instance URL (overrides config)
- `WEAVIATE_API_KEY`: Weaviate API key (overrides config)
- `WEAVIATE_TIMEOUT`: request timeout in milliseconds
- `WEAVIATE_RETRIES`: number of retry attempts
- `OPENAI_APIKEY`: OpenAI API key for vectorization

## MCP Settings Configuration

To use this server with Kilo Code or other MCP clients, add it to your MCP settings file.

### Kilo Code Configuration

Add to `kilo-code/settings/mcp_settings.json`:

```json
{
  "mcpServers": {
    "index": {
      "command": "node",
      "args": [
        "/path/to/your/index/dist/server.js"
      ],
      "env": {
        "WEAVIATE_URL": "http://localhost:8080",
        "OPENAI_APIKEY": "sk-...",
        "INDEX_CONFIG_PATH": "/path/to/your/index/config.yaml"
      },
      "alwaysAllow": [
        "search_index",
        "index_new",
        "index_existing",
        "unindex",
        "find_similar_in_index",
        "ask_index",
        "bulk_index"
      ]
    }
  }
}
```

### Claude Desktop Configuration

Add to `Claude/claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "index": {
      "command": "node",
      "args": ["/path/to/your/index/dist/server.js"],
      "env": {
        "WEAVIATE_URL": "http://localhost:8080",
        "OPENAI_APIKEY": "sk-...",
        "INDEX_CONFIG_PATH": "/path/to/your/index/config.yaml"
      }
    }
  }
}
```

Important:

- Replace `/path/to/your/index` with the actual path to this project.
- Set your OpenAI API key in the `OPENAI_APIKEY` environment variable; it is used for vectorization.
- The `alwaysAllow` list (Kilo Code only) pre-approves tool usage for convenience.

## Architecture

The server uses a modular architecture:

- Config Loader: loads and validates the YAML configuration
- Schema Builder: generates tool schemas from the shared definitions
- Tool Factory: creates tool instances based on the configuration
- Weaviate Client: manages the Weaviate connection and operations

This design allows for:

- Easy customization through the YAML config
- Reusable metadata and filter definitions
- Dynamic tool generation
- Support for multiple use cases

## License

MIT

## Author

Patrick Michaelsen
