A Model Context Protocol (MCP) server providing semantic search and vector database capabilities using Weaviate. This server enables AI assistants to index, search, and retrieve information from various content types using hybrid search (combining semantic and keyword matching).
- Hybrid Search: Combines semantic (vector) and keyword (BM25) search for optimal results
- Multiple Content Types: Support for code, notes, documentation, images, and more
- Flexible Configuration: YAML-based configuration for easy customization
- RAG Support: Retrieval-Augmented Generation for AI-powered answers
- Metadata & Filtering: Rich metadata support with powerful filtering capabilities
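
To make the hybrid search idea concrete, here is a minimal sketch of how a vector score and a BM25 score can be blended with a single weight. It is an illustration only: `Scored`, `normalize`, `fuseHybrid`, and the `alpha` parameter are hypothetical names, and the fusion that Weaviate actually applies may differ in detail.

```typescript
// Illustrative sketch only: not this server's or Weaviate's implementation.
interface Scored {
  id: string;
  score: number;
}

// Min-max normalize scores into [0, 1] so both score scales are comparable.
function normalize(results: Scored[]): Record<string, number> {
  const out: Record<string, number> = {};
  if (results.length === 0) return out;
  let min = results[0].score;
  let max = results[0].score;
  for (const r of results) {
    min = Math.min(min, r.score);
    max = Math.max(max, r.score);
  }
  for (const r of results) {
    // When every score is equal, treat each hit as a full match.
    out[r.id] = max === min ? 1 : (r.score - min) / (max - min);
  }
  return out;
}

// alpha = 1 keeps only the vector ranking, alpha = 0 only the keyword ranking.
function fuseHybrid(vectorHits: Scored[], keywordHits: Scored[], alpha: number): Scored[] {
  const v = normalize(vectorHits);
  const k = normalize(keywordHits);
  const ids = Object.keys({ ...v, ...k });
  return ids
    .map((id) => ({ id, score: alpha * (v[id] ?? 0) + (1 - alpha) * (k[id] ?? 0) }))
    .sort((a, b) => b.score - a.score);
}
```

A document that ranks moderately well in both result sets can outrank one that appears in only one of them, which is the main reason to combine the two signals.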

Install the dependencies:

```bash
npm install
```

## Prerequisites
### Weaviate Setup
This server requires a running Weaviate instance.
#### Local Docker Instance (Recommended)
Create a `docker-compose.yml` file in your project directory:
```yaml
---
services:
  weaviate:
    command:
      - --host
      - 0.0.0.0
      - --port
      - '8080'
      - --scheme
      - http
    image: cr.weaviate.io/semitechnologies/weaviate:1.34.0
    ports:
      - 8080:8080
      - 50051:50051
    volumes:
      - weaviate_data:/var/lib/weaviate
    restart: on-failure:0
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'text2vec-openai'
      ENABLE_MODULES: 'text2vec-openai'
      CLUSTER_HOSTNAME: 'node1'
volumes:
  weaviate_data:
...
```

Start Weaviate:

```bash
docker compose up -d
```

Verify it's running:

```bash
curl http://localhost:8080/v1/.well-known/ready
```

Note: Set your `OPENAI_APIKEY` environment variable before starting the Index MCP server, or add it to your `.env` file.

Learn More: Weaviate Local Quickstart
Alternatively, sign up for a free Weaviate Cloud instance at console.weaviate.cloud and update your config:

```yaml
weaviate:
  url: "https://your-cluster.weaviate.network"
  apiKey: "your-weaviate-api-key"
  openaiApiKey: "sk-..."
```

The server uses a YAML configuration file to define Weaviate instance properties, schema, and shared definitions. By default, it looks for `config.yaml` in the project root, but you can specify a custom path using the `INDEX_CONFIG_PATH` environment variable.

```yaml
server:
  name: "index-mcp-server"
  version: "0.1.0"
  description: "Semantic search and vector database MCP server"

weaviate:
  url: "http://localhost:8080"
  timeout: 30000
  retries: 3

schema:
  collectionName: "Document"
  vectorizer:
    type: "text2vec-openai"
    model: "text-embedding-3-small"
  properties:
    - name: "content"
      dataType: "TEXT"
    # ... more properties

shared:
  contentTypes:
    - "code"
    - "note"
    # ... more types
  metadataFields:
    - name: "contentType"
      type: "string"
      required: true
    # ... more fields
  filterFields:
    - name: "contentType"
      type: "array"
    # ... more filters

tools:
  - name: "search_index"
    enabled: true
  - name: "index_new"
    enabled: true
  # ... more tools
```

The `server` section defines basic server metadata:

```yaml
server:
  name: "index-mcp-server"
  version: "0.1.0"
  description: "Your server description"
```

Configure your Weaviate instance connection:

```yaml
weaviate:
  url: "http://localhost:8080"    # Weaviate instance URL
  apiKey: "your-api-key"          # Optional: Weaviate API key
  timeout: 30000                  # Request timeout in milliseconds
  retries: 3                      # Number of retry attempts
  openaiApiKey: "sk-..."          # Optional: OpenAI API key for vectorization
```

Environment variables can override these settings:

- `WEAVIATE_URL`: Override the Weaviate URL
- `WEAVIATE_API_KEY`: Set the Weaviate API key
- `OPENAI_APIKEY`: Set the OpenAI API key for vectorization

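
The override behavior can be pictured as a small merge step that runs after the YAML file has been parsed. The sketch below is an assumption about how such a step might look, not the server's actual code: `WeaviateSettings`, `Env`, and `applyEnvOverrides` are hypothetical names, and treating an empty variable as unset is a design choice made here for illustration.

```typescript
// Hypothetical shape of the `weaviate` config section after YAML parsing.
interface WeaviateSettings {
  url: string;
  apiKey?: string;
  timeout: number;
  retries: number;
  openaiApiKey?: string;
}

// Environment passed in explicitly so the function stays easy to test.
type Env = Record<string, string | undefined>;

// Environment values win over file values; empty strings count as unset.
function applyEnvOverrides(fromFile: WeaviateSettings, env: Env): WeaviateSettings {
  return {
    ...fromFile,
    url: env.WEAVIATE_URL || fromFile.url,
    apiKey: env.WEAVIATE_API_KEY || fromFile.apiKey,
    openaiApiKey: env.OPENAI_APIKEY || fromFile.openaiApiKey,
  };
}
```

In a Node.js process this would typically be called with `process.env` as the second argument. Returning a new object rather than mutating the parsed config keeps the file values available for diagnostics.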
Define your Weaviate collection schema:
```yaml
schema:
  collectionName: "Document"           # Name of the Weaviate collection
  vectorizer:
    type: "text2vec-openai"            # Vectorizer type
    model: "text-embedding-3-small"    # Model to use
    options: {}                        # Additional vectorizer options
  properties:
    - name: "content"
      dataType: "TEXT"
      description: "Main content"
    - name: "tags"
      dataType: "TEXT_ARRAY"
      description: "Document tags"
    # Add more properties as needed
```

Supported vectorizer types:

- `text2vec-openai`: OpenAI embeddings
- `text2vec-cohere`: Cohere embeddings
- `text2vec-huggingface`: HuggingFace embeddings
- `none`: No vectorization

Supported data types:

- `TEXT`: Text field
- `TEXT_ARRAY`: Array of text values
- `DATE`: Date/timestamp
- `NUMBER`: Numeric value
- `INT`: Integer value
- `BOOLEAN`: Boolean value

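
A schema section like this is easy to get wrong by hand, so a loader would plausibly check it against the supported values listed above. The following is a minimal sketch of such a check; `SchemaConfig`, `PropertyConfig`, and `validateSchema` are hypothetical names, and the real server may validate differently or more strictly.

```typescript
// Supported values taken from the lists in this README.
const VECTORIZER_TYPES = ["text2vec-openai", "text2vec-cohere", "text2vec-huggingface", "none"];
const DATA_TYPES = ["TEXT", "TEXT_ARRAY", "DATE", "NUMBER", "INT", "BOOLEAN"];

// Hypothetical TypeScript view of the `schema` section.
interface PropertyConfig {
  name: string;
  dataType: string;
  description?: string;
}

interface SchemaConfig {
  collectionName: string;
  vectorizer: { type: string; model?: string; options?: Record<string, unknown> };
  properties: PropertyConfig[];
}

// Returns a list of human-readable problems; an empty list means the schema passed.
function validateSchema(schema: SchemaConfig): string[] {
  const errors: string[] = [];
  if (schema.collectionName.trim() === "") {
    errors.push("collectionName must not be empty");
  }
  if (VECTORIZER_TYPES.indexOf(schema.vectorizer.type) === -1) {
    errors.push(`unsupported vectorizer type: ${schema.vectorizer.type}`);
  }
  const seen: Record<string, boolean> = {};
  for (const prop of schema.properties) {
    if (DATA_TYPES.indexOf(prop.dataType) === -1) {
      errors.push(`property "${prop.name}" has unsupported dataType: ${prop.dataType}`);
    }
    if (seen[prop.name]) {
      errors.push(`duplicate property name: ${prop.name}`);
    }
    seen[prop.name] = true;
  }
  return errors;
}
```

Collecting every problem instead of throwing on the first one lets a user fix a configuration file in a single pass.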
Define reusable metadata and filter schemas that are automatically applied to relevant tools:
```yaml
shared:
  # Content types available across all tools
  contentTypes:
    - "code"
    - "note"
    - "documentation"
    - "todo"
    # ... add your custom types

  # Priority levels
  priorities:
    - "low"
    - "medium"
    - "high"

  # Metadata fields for index_new and index_existing tools
  metadataFields:
    - name: "contentType"
      type: "string"
      description: "Type of content"
      required: true
      enum: []  # Uses contentTypes from above
    - name: "title"
      type: "string"
      description: "Document title"
      required: true
    - name: "tags"
      type: "array"
      description: "Document tags"
      required: true
    # Add more metadata fields as needed

  # Filter fields for search and query tools
  filterFields:
    - name: "contentType"
      type: "array"
      description: "Filter by content types"
    - name: "project"
      type: "string"
      description: "Filter by project name"
    - name: "dateRange"
      type: "object"
      description: "Filter by date range"
      properties:
        after: "ISO date string"
        before: "ISO date string"
    # Add more filter fields as needed
```

Key Benefits of Shared Definitions:

- DRY Principle: Define metadata and filters once, use everywhere
- Consistency: Ensures all tools use the same field definitions
- Easy Customization: Add new content types or metadata fields in one place
- Type Safety: Automatically generates proper JSON schemas for tools
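
To illustrate how shared definitions could turn into a tool's JSON schema, here is a minimal sketch. It assumes that an empty `enum` on the `contentType` field means "fill in `contentTypes`" (as the config comment suggests) and that array fields hold strings. `MetadataField`, `SharedConfig`, and `buildMetadataSchema` are hypothetical names, not the project's actual Schema Builder.

```typescript
// Hypothetical TypeScript view of the relevant parts of the `shared` section.
interface MetadataField {
  name: string;
  type: string;
  description?: string;
  required?: boolean;
  enum?: string[];
}

interface SharedConfig {
  contentTypes: string[];
  metadataFields: MetadataField[];
}

interface JsonSchemaProperty {
  type: string;
  description?: string;
  enum?: string[];
  items?: { type: string };
}

interface JsonSchemaObject {
  type: "object";
  properties: Record<string, JsonSchemaProperty>;
  required: string[];
}

function buildMetadataSchema(shared: SharedConfig): JsonSchemaObject {
  const properties: Record<string, JsonSchemaProperty> = {};
  const required: string[] = [];
  for (const field of shared.metadataFields) {
    const prop: JsonSchemaProperty = { type: field.type };
    if (field.description) prop.description = field.description;
    if (field.enum) {
      // Assumption: an empty enum on contentType is a placeholder for contentTypes.
      const usesShared = field.enum.length === 0 && field.name === "contentType";
      const values = usesShared ? shared.contentTypes : field.enum;
      if (values.length > 0) prop.enum = values.slice();
    }
    // Assumption: array-typed metadata holds strings (for example, tags).
    if (field.type === "array") prop.items = { type: "string" };
    properties[field.name] = prop;
    if (field.required) required.push(field.name);
  }
  return { type: "object", properties, required };
}
```

Because every tool would call the same builder, adding a content type in one place changes the accepted values everywhere.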
Enable or disable specific tools:
```yaml
tools:
  - name: "search_index"
    enabled: true
  - name: "index_new"
    enabled: true
  - name: "index_existing"
    enabled: true
  - name: "unindex"
    enabled: true
  - name: "find_similar_in_index"
    enabled: true
  - name: "ask_index"
    enabled: true
```

Note: Tool names and descriptions are hardcoded in the tool classes. The configuration only controls which tools are enabled/disabled.

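
Since the configuration only toggles tools that already exist in code, a loader has to reconcile the configured names with the built-in ones. The sketch below shows one way to do that. `ToolToggle` and `resolveEnabledTools` are hypothetical names, and treating tools that are missing from the config as disabled is an assumption, not documented behavior.

```typescript
// One entry of the `tools` list in the configuration.
interface ToolToggle {
  name: string;
  enabled: boolean;
}

interface ResolvedTools {
  enabled: string[]; // known tools that the config switches on
  unknown: string[]; // configured names with no matching built-in tool
}

// `knownTools` stands in for the names hardcoded in the tool classes.
function resolveEnabledTools(toggles: ToolToggle[], knownTools: string[]): ResolvedTools {
  const enabled: string[] = [];
  const unknown: string[] = [];
  for (const toggle of toggles) {
    if (knownTools.indexOf(toggle.name) === -1) {
      unknown.push(toggle.name);
    } else if (toggle.enabled && enabled.indexOf(toggle.name) === -1) {
      enabled.push(toggle.name);
    }
  }
  return { enabled, unknown };
}
```

Reporting unknown names separately makes typos such as `search-index` visible instead of silently disabling a tool.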
For different use cases, create separate configuration files:
```bash
# Development configuration
cp config.yaml config.dev.yaml

# Production configuration
cp config.yaml config.prod.yaml

# Custom use case
cp config.yaml config.custom.yaml
```

Then specify which config to use:

```bash
INDEX_CONFIG_PATH=./config.prod.yaml npm start
```

To add custom content types for your specific use case:

```yaml
shared:
  contentTypes:
    - "code"
    - "note"
    - "research-paper"      # Custom type
    - "meeting-notes"       # Custom type
    - "customer-feedback"   # Custom type

  metadataFields:
    - name: "contentType"
      type: "string"
      required: true
    # Add custom metadata for your types
    - name: "department"
      type: "string"
      description: "Department name"
    - name: "confidentiality"
      type: "string"
      description: "Confidentiality level"
      enum: ["public", "internal", "confidential"]
```

The server provides the following tools:

- `search_index`: Hybrid search combining semantic and keyword matching.
- `index_new`: Index new content with metadata.
- `index_existing`: Update existing indexed content.
- `unindex`: Remove content from the index.
- `find_similar_in_index`: Find similar content based on a reference document.
- `ask_index`: RAG-based question answering using indexed content.

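
When a search runs with the filter fields from the shared configuration (`contentType`, `project`, `dateRange`), those values have to be translated into a Weaviate filter. The sketch below builds a filter object in the `operator` / `operands` / `path` shape used by Weaviate's `where` filters. It is a guess at how the translation might work, not the server's code: `SearchFilters` and `buildWhereFilter` are hypothetical names, and the `createdAt` date property is an assumption because the README does not name one.

```typescript
// Hypothetical filter arguments, mirroring the filterFields in the config.
interface SearchFilters {
  contentType?: string[];
  project?: string;
  dateRange?: { after?: string; before?: string };
}

interface WhereLeaf {
  path: string[];
  operator: "Equal" | "GreaterThanEqual" | "LessThanEqual";
  valueText?: string;
  valueDate?: string;
}

interface WhereGroup {
  operator: "And" | "Or";
  operands: WhereFilter[];
}

type WhereFilter = WhereLeaf | WhereGroup;

// Collapses 0 or 1 operands so the result never contains a redundant group.
function combine(operator: "And" | "Or", operands: WhereFilter[]): WhereFilter | undefined {
  if (operands.length === 0) return undefined;
  if (operands.length === 1) return operands[0];
  return { operator, operands };
}

// `dateProperty` is an assumed property name; adjust it to the actual schema.
function buildWhereFilter(filters: SearchFilters, dateProperty: string = "createdAt"): WhereFilter | undefined {
  const operands: WhereFilter[] = [];
  const typeLeaves = (filters.contentType ?? []).map(
    (t): WhereFilter => ({ path: ["contentType"], operator: "Equal", valueText: t })
  );
  const byType = combine("Or", typeLeaves); // any of the requested types
  if (byType) operands.push(byType);
  if (filters.project) {
    operands.push({ path: ["project"], operator: "Equal", valueText: filters.project });
  }
  const after = filters.dateRange ? filters.dateRange.after : undefined;
  const before = filters.dateRange ? filters.dateRange.before : undefined;
  if (after) operands.push({ path: [dateProperty], operator: "GreaterThanEqual", valueDate: after });
  if (before) operands.push({ path: [dateProperty], operator: "LessThanEqual", valueDate: before });
  return combine("And", operands); // every condition must hold
}
```

Content types are combined with `Or` because a document has a single type, while the remaining conditions are combined with `And`.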
The Index MCP server includes comprehensive security filtering to prevent indexing sensitive data:
- Environment files: `.env`, `.env.*`, `.env-*`
- Secrets: `secrets.yaml`, `credentials.json`
- Private keys: `.pem`, `.key`, `.p12`, `id_rsa`, etc.
- API keys: `api-keys.json`, `tokens.json`
- Cloud credentials: `.aws/credentials`, `.kube/config`
- Git credentials: `.git-credentials`, `.netrc`

The following directories are also excluded:

- `node_modules/`, `.git/`, `.vscode/`, `.idea/`
- `dist/`, `build/`, `coverage/`
- `.aws/`, `.ssh/`, `.gnupg/`, `.kube/`
- `.terraform/`, `vendor/`

The system also scans file content for sensitive patterns:
- API keys and tokens (long alphanumeric strings)
- Passwords in configuration
- Private key blocks
- Certificate blocks
Note: Both index_new (with file paths) and bulk_index automatically apply these security filters.
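
The two filtering stages described above (path-based and content-based) can be sketched as follows. The file and directory names come from the lists in this README, but the regular expressions, the 32-character threshold for "long alphanumeric strings", and the function names `isBlockedPath` and `hasSensitiveContent` are assumptions made for illustration. The server's real patterns are likely more extensive.

```typescript
// File name patterns, matched against the last path segment only.
const BLOCKED_FILE_PATTERNS: RegExp[] = [
  /^\.env($|[.-])/, // .env, .env.*, .env-*
  /^secrets\.yaml$/,
  /^credentials\.json$/,
  /\.(pem|key|p12)$/,
  /^id_rsa$/,
  /^api-keys\.json$/,
  /^tokens\.json$/,
  /^\.git-credentials$/,
  /^\.netrc$/,
];

// Any path that passes through one of these directories is skipped.
const BLOCKED_DIRECTORIES: string[] = [
  "node_modules", ".git", ".vscode", ".idea", "dist", "build", "coverage",
  ".aws", ".ssh", ".gnupg", ".kube", ".terraform", "vendor",
];

function isBlockedPath(filePath: string): boolean {
  const segments = filePath.split(/[\\/]+/).filter((s) => s.length > 0);
  if (segments.length === 0) return false;
  const fileName = segments[segments.length - 1];
  const directories = segments.slice(0, -1);
  if (directories.some((d) => BLOCKED_DIRECTORIES.indexOf(d) !== -1)) return true;
  return BLOCKED_FILE_PATTERNS.some((pattern) => pattern.test(fileName));
}

// Content patterns; the thresholds and keywords here are illustrative guesses.
const SENSITIVE_CONTENT_PATTERNS: RegExp[] = [
  /-----BEGIN [A-Z ]*PRIVATE KEY-----/, // private key blocks
  /-----BEGIN CERTIFICATE-----/, // certificate blocks
  /\b(password|passwd|pwd)\b\s*[:=]\s*\S+/i, // passwords in configuration
  /[A-Za-z0-9_\-]{32,}/, // long token-like strings
];

function hasSensitiveContent(text: string): boolean {
  return SENSITIVE_CONTENT_PATTERNS.some((pattern) => pattern.test(text));
}
```

Checking the path first is cheap and avoids reading files that should never be opened, so content scanning only runs on files that pass the path check.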
The Index MCP server is started automatically by your MCP client (Kilo Code, Claude Desktop, etc.) based on the configuration in your MCP settings file. You don't need to manually start the server.
The server reads configuration from config.yaml by default. To use a different config file, set the INDEX_CONFIG_PATH environment variable in your MCP settings:
```json
{
  "mcpServers": {
    "index": {
      "env": {
        "INDEX_CONFIG_PATH": "/path/to/custom/config.yaml"
      }
    }
  }
}
```

Development commands:

```bash
# Watch mode
npm run watch

# Build
npm run build

# Lint
npm run lint
```

Environment variables:

- `INDEX_CONFIG_PATH`: Path to configuration file (default: `./config.yaml`)
- `WEAVIATE_URL`: Weaviate instance URL (overrides config)
- `WEAVIATE_API_KEY`: Weaviate API key (overrides config)
- `WEAVIATE_TIMEOUT`: Request timeout in milliseconds
- `WEAVIATE_RETRIES`: Number of retry attempts

To use this server with Kilo Code or other MCP clients, add it to your MCP settings file:
Add to kilo-code/settings/mcp_settings.json:
```json
{
  "mcpServers": {
    "index": {
      "command": "node",
      "args": [
        "/path/to/your/index/dist/server.js"
      ],
      "env": {
        "WEAVIATE_URL": "http://localhost:8080",
        "OPENAI_APIKEY": "sk-...",
        "INDEX_CONFIG_PATH": "/path/to/your/index/config.yaml"
      },
      "alwaysAllow": [
        "search_index",
        "index_new",
        "index_existing",
        "unindex",
        "find_similar_in_index",
        "ask_index",
        "bulk_index"
      ]
    }
  }
}
```

Add to `Claude/claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "index": {
      "command": "node",
      "args": ["/path/to/your/index/dist/server.js"],
      "env": {
        "WEAVIATE_URL": "http://localhost:8080",
        "OPENAI_APIKEY": "sk-...",
        "INDEX_CONFIG_PATH": "/path/to/your/index/config.yaml"
      }
    }
  }
}
```

Important:

- Replace `/path/to/your/index` with the actual path to this project
- Set your OpenAI API key in the `OPENAI_APIKEY` environment variable
- The `alwaysAllow` list (Kilo Code only) pre-approves tool usage for convenience
- `OPENAI_APIKEY`: OpenAI API key for vectorization

The server uses a modular architecture:
- Config Loader: Loads and validates YAML configuration
- Schema Builder: Generates tool schemas from shared definitions
- Tool Factory: Creates tool instances based on configuration
- Weaviate Client: Manages Weaviate connection and operations
This design allows for:
- Easy customization through YAML config
- Reusable metadata and filter definitions
- Dynamic tool generation
- Multiple use case support

License: MIT

Author: Patrick Michaelsen