Bluetooth Specifications MCP Server

RAG-powered search for Bluetooth specifications. Ask questions in natural language, get precise answers with citations.

✨ What It Does

Search Bluetooth spec PDFs using hybrid search (BM25 + semantic embeddings) through the Model Context Protocol. Works with Claude Code, Cline, and other MCP clients.

Example:

"How does Bluetooth pairing protect against MITM attacks?"

→ Returns relevant sections from the Core Specification with page numbers and context.

🚀 Quick Start

# 1. Install
python3 -m venv .venv && source .venv/bin/activate
pip install -e .                    # BM25-only (~100MB, 2 min)
# OR
pip install -e ".[semantic]"        # Full features (~7.2GB, 15 min)

# 2. Add PDFs
mkdir specs && cp your-bluetooth-spec.pdf specs/

# 3. Generate cache (first run indexes PDFs)
btmcp-server
# Wait for "Ready! Indexed X chunks", then Ctrl+C

# 4. Configure your MCP client (see Configuration section below)

# Done! Your MCP client will launch the server automatically.

Why generate the cache first? The first run indexes all PDFs, which can take 1-5 minutes depending on PDF size. Running the server standalone first prevents MCP client timeouts.

⚖️ Legal Notice

Users must provide their own legally obtained PDF copies of Bluetooth specifications.

This MCP server provides tools to access Bluetooth specification documents. You are responsible for obtaining PDFs legally from official sources. Visit bluetooth.com/specifications for information about specifications.

📦 Features

Hybrid Search - BM25 keyword + semantic embeddings (RRF fusion)
Smart Chunking - Section-based (keeps multi-page sections together)
Metadata Extraction - Requirements, UUIDs, tables, figures
3 Search Modes - BM25 (exact), semantic (meaning), hybrid (both)
Persistent Cache - Fast subsequent searches
4 MCP Tools - search, refresh, list, check_status

📖 Installation Modes

Mode	Size	Time	Features	When to Use
BM25-only	~100MB	2 min	Keyword search	Quick configuration
Semantic	~7.2GB	15 min	Hybrid search	Best accuracy

Note: You can start with BM25-only and upgrade later with pip install -e ".[semantic]"

💡 Configuration

Claude Code

The easiest way to add the server:

claude mcp add --transport stdio bluetooth-specifications /absolute/path/to/mcp-bluetooth-specification/.venv/bin/btmcp-server

Or manually edit .mcp.json:

{
  "mcpServers": {
    "bt-specs": {
      "command": "/absolute/path/to/.venv/bin/btmcp-server"
    }
  }
}

Cursor

Add to your MCP settings (File → Preferences → MCP):

{
  "mcpServers": {
    "bt-specs": {
      "command": "/absolute/path/to/.venv/bin/btmcp-server"
    }
  }
}

Note: Replace /absolute/path/to/ with your actual project path. The MCP client will launch the server automatically when needed.
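
If you would rather generate this JSON than edit it by hand, something like the following could work. It is only a sketch: the helper name build_mcp_config is hypothetical, and it assumes the virtual environment lives at .venv inside the project directory, as in Quick Start.

```python
import json
from pathlib import Path


def build_mcp_config(project_dir: str, server_name: str = "bt-specs") -> dict:
    """Return an .mcp.json-style dict pointing at the project's btmcp-server.

    Hypothetical helper; assumes the venv is at <project_dir>/.venv.
    """
    # Resolve to an absolute path, as the configuration examples require.
    command = Path(project_dir).expanduser().resolve() / ".venv" / "bin" / "btmcp-server"
    return {"mcpServers": {server_name: {"command": str(command)}}}


if __name__ == "__main__":
    # Print the config for the current directory; paste it into your client's settings.
    print(json.dumps(build_mcp_config("."), indent=2))
```

Run it from the project root and copy the printed JSON into .mcp.json or your client's MCP settings.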

🖥️ Manual Server Mode

For testing or running as a long-running daemon:

source .venv/bin/activate
btmcp-server  # stdio mode (default)

HTTP mode (advanced):

MCP_TRANSPORT=streamable-http btmcp-server  # Runs on http://127.0.0.1:8000

Environment Variables

MCP_TRANSPORT - stdio (default) or streamable-http
MCP_HOST - Host for HTTP mode (default: 127.0.0.1)
MCP_PORT - Port for HTTP mode (default: 8000)
BTMCP_CACHE_DIR - Custom cache directory (default: <project>/.cache/)
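
To make the defaults concrete, here is a minimal sketch of how these variables could be resolved. The function name resolve_settings is hypothetical, the current working directory stands in for <project>, and rejecting unknown transports is an assumption; the server's actual handling may differ.

```python
import os
from pathlib import Path
from typing import Mapping


def resolve_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Resolve the documented variables, falling back to the documented defaults."""
    transport = env.get("MCP_TRANSPORT", "stdio")
    if transport not in ("stdio", "streamable-http"):
        # Assumption: unknown transports are rejected rather than ignored.
        raise ValueError(f"Unsupported MCP_TRANSPORT: {transport!r}")
    return {
        "transport": transport,
        "host": env.get("MCP_HOST", "127.0.0.1"),
        "port": int(env.get("MCP_PORT", "8000")),
        # Assumption: "<project>" is approximated by the current working directory.
        "cache_dir": Path(env.get("BTMCP_CACHE_DIR", Path.cwd() / ".cache")),
    }
```

Passing a plain dict instead of os.environ makes the defaults easy to inspect, for example resolve_settings({"MCP_PORT": "9000"}).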

🔍 MCP Tools

The server provides 4 tools for MCP clients:

`search_specifications`

Search Bluetooth specs with hybrid search.

Parameters:

query (string) - Your question or search terms
mode (string) - "bm25", "semantic", or "hybrid" (default)
top_k (int) - Number of results (default: 3)
filter_pdfs (string) - Filter by PDF names (comma-separated)

Example:

search_specifications(
    query="How does LE Secure Connections work?",
    mode="hybrid",
    top_k=3
)

`list_indexed_specs`

List all indexed PDFs with statistics.

`check_index_status`

Check if cache is fresh or needs rebuilding.

`refresh_index`

Rebuild index from PDFs (use after adding/updating specs).
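
This README does not describe how check_index_status decides that the cache is stale. One plausible approach, sketched below purely as an assumption, is to store a cheap fingerprint (size and modification time) per PDF when indexing and compare it later. The names fingerprint and stale_pdfs are hypothetical.

```python
from pathlib import Path


def fingerprint(pdf: Path) -> tuple[int, int]:
    """Cheap change detector: (size in bytes, modification time in ns)."""
    stat = pdf.stat()
    return (stat.st_size, stat.st_mtime_ns)


def stale_pdfs(specs_dir: Path, cached: dict[str, tuple[int, int]]) -> list[str]:
    """Return names of PDFs that are new or changed relative to `cached`."""
    stale = []
    for pdf in sorted(specs_dir.glob("*.pdf")):
        # A missing entry (new file) or a different fingerprint both count as stale.
        if cached.get(pdf.name) != fingerprint(pdf):
            stale.append(pdf.name)
    return stale
```

An empty list would correspond to a fresh cache; anything else suggests calling refresh_index.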

Testing the Server Manually

We provide three test scripts for manual testing:

1. Automated Test (`test_live.py`)

Runs a full suite of test queries and shows results:

source .venv/bin/activate
python test_live.py

This will:

Load all PDFs from specs/
Run 5 test queries ("GATT Service", "L2CAP layer", "frequency", "routing", "security")
Display results with scores and citations
Show MCP formatted output

2. Interactive Test (`test_interactive.py`)

Search interactively with your own queries:

source .venv/bin/activate
python test_interactive.py

Then type your search queries. Type quit to exit.

Example queries:

ATT protocol
device address
pairing
LE Secure Connections

3. MCP Format Test (`test_mcp_format.py`)

See the exact MCP resource output format:

source .venv/bin/activate
python test_mcp_format.py

Enter a query and see the formatted output that MCP clients would receive.

Configuring for Claude Code

If the server is running in HTTP mode (see Manual Server Mode), create or add the following configuration to your project's .mcp.json:

{
  "mcpServers": {
    "bt-specs": {
      "type": "http",
      "url": "http://127.0.0.1:8000/mcp"
    }
  }
}

To test it, ask Claude Code: "Search the Bluetooth specs for GATT Service"

Using with MCP Clients

The server exposes four MCP tools:

Tool: search_specifications

Parameters:
- query (string) - Search terms (e.g., "GATT Service", "Device Information Service")
- mode (string, optional) - Search mode: "bm25" (keyword), "semantic" (meaning), or "hybrid" (default)
- top_k (int, optional) - Number of results to return (default: 3)
- filter_pdfs (string, optional) - Comma-separated list of PDF names to filter results (e.g., "doc1.pdf,doc2.pdf")
Returns: Formatted results with text and citations

Tool: list_indexed_specs

Parameters: None
Returns: List of all indexed PDFs with statistics (pages, chunks)

Tool: check_index_status

Parameters: None
Returns: Cache freshness status and which PDFs need reindexing

Tool: refresh_index

Parameters: None
Returns: Status message after rebuilding index from PDFs

Choosing the Right Search Mode

The server supports three search modes optimized for different query types:

Mode: `bm25` (Keyword Matching)

Best for: Known terminology, command names, protocol names

Query: "GATT service ATT protocol attributes"
Mode: bm25

Response: Top results focus on exact keyword matches:
- Result 1 (score: 25.851): "Attribute Protocol (ATT) block implements the
  peer-to-peer protocol... The Generic Attribute Profile (GATT) block represents
  the functionality of the ATT Server..." (Core_v6.1.pdf, Page 245)
- Result 2 (score: 24.041): SDP record details for GATT service with ATT
  protocol descriptors and attribute handles (Core_v6.1.pdf, Page 1613)

Use when: You know the exact technical terms and want precise matches
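
To show why exact terms dominate in this mode, here is a standalone sketch of one common variant of the Okapi BM25 formula. It is not the project's code (the dependency list names rank-bm25, whose IDF handling differs slightly), and the whitespace tokenizer and the k1 and b values are assumptions.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with an Okapi BM25 variant."""
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # Terms absent from the document contribute nothing.
            # Smoothed IDF; the "+ 1" keeps it positive even for common terms.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term frequency saturates via k1 and is normalized by document length via b.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

A document that contains none of the query terms scores exactly zero, which is why this mode rewards known terminology and cannot match paraphrases.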

Mode: `semantic` (Meaning-Based)

Best for: Natural questions, conceptual understanding, "how-to" queries

Query: "How does pairing work with secure connections"
Mode: semantic

Response: Conceptually related content about security and pairing:
- Result 1 (score: 0.706): "Secure Simple Pairing has two security goals:
  protection against passive eavesdropping and MITM attacks... uses Elliptic
  Curve Diffie Hellman (ECDH) public key cryptography" (Core_v6.1.pdf, Page 313)
- Result 2 (score: 0.645): "Man-in-the-middle (MITM) attack... Secure Simple
  Pairing offers two user assisted numeric methods: numeric comparison or
  passkey entry" (Core_v6.1.pdf, Page 314)

Use when: Asking how/why questions, exploring concepts, learning about features
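
Semantic ranking can be sketched as comparing embedding vectors. The example below assumes cosine similarity is the metric (the README does not say so explicitly) and uses toy 3-dimensional vectors in place of real model embeddings; both function names are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_similarity(
    query_vec: list[float], chunk_vecs: dict[str, list[float]], top_k: int = 3
) -> list[tuple[str, float]]:
    """Return the top_k (chunk name, similarity) pairs, most similar first."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in chunk_vecs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]
```

Because the comparison happens in vector space, a chunk can rank highly without sharing any literal words with the query.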

Mode: `hybrid` (Best of Both - Default)

Best for: General queries, balanced precision/recall

Query: "LE Secure Connections pairing authentication"
Mode: hybrid (default)

Response: Combines exact matches with conceptual similarity:
- Result 1 (score: 0.016): Complete pairing flow diagram showing "Phase 1:
  Established LL connection, Phase 2: Pairing over SMP (Legacy pairing or
  Secure Connections), Phase 3: Establishment of encrypted connection"
  (Core_v6.1.pdf, Page 1629)
- Result 2 (score: 0.016): Authentication procedure details including
  commitment checks and failure handling (Core_v6.1.pdf, Page 701)

Uses RRF (Reciprocal Rank Fusion) to merge BM25 + semantic rankings.

Use when: Not sure which mode to use (recommended default)
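
Reciprocal Rank Fusion itself is short enough to sketch. The version below uses k=60, the value named in the Architecture section, and assumes 1-based ranks; the function name rrf_fuse and the chunk ids are illustrative, not the project's API.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of chunk ids; each input list is ordered best-first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); contributions add up across lists.
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

With k=60, a chunk ranked first in only one list scores 1/61 ≈ 0.0164, and one ranked first in both scores 2/61 ≈ 0.0328. That explains why hybrid scores are so much smaller than BM25 or semantic scores: only the ranks are combined, not the raw scores.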

Recommendation: Use hybrid (default) for 90% of queries. Only switch to bm25 or semantic if hybrid results don't meet your needs.

Filtering by PDF

Search within specific specifications:

# Search only in Core Specification v6.1
search_specifications(
    query="encryption key management",
    filter_pdfs="Core_v6.1.pdf"
)

Response:
- Result 1 (score: 0.031): Complete table of contents for security chapter
  showing "3 Key management... 3.2.5 Generating the encryption key...
  4.2.5.8 Encryption key refresh" (Core_v6.1.pdf, Page 1016)
- Result 2 (score: 0.030): "Encryption key refresh... shall refresh the
  encryption key within 2^28 ticks... procedures for pause and resume
  encryption" (Core_v6.1.pdf, Page 693)

# Search across multiple specs (comma-separated)
search_specifications(
    query="ATT protocol",
    filter_pdfs="Core_v6.1.pdf,80211MP.TS_.p7.pdf"
)

# Partial names work too
search_specifications(
    query="security",
    filter_pdfs="Core_v6"  # Matches Core_v6.1.pdf
)

Use cases:

Focus on specific specification document
Avoid mixing results from different spec versions
Compare implementations across different documents
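
The filter semantics shown above (comma-separated names, partial matches) could be implemented roughly as follows. This is a sketch under assumptions: matching is a case-sensitive substring test, an empty filter matches everything, and the helper names are hypothetical.

```python
def parse_filter(filter_pdfs: str) -> list[str]:
    """Split a comma-separated filter string, dropping blanks and surrounding whitespace."""
    return [part.strip() for part in filter_pdfs.split(",") if part.strip()]


def pdf_matches(pdf_name: str, filter_pdfs: str) -> bool:
    """True if no filter is given or any filter term is a substring of the PDF name."""
    terms = parse_filter(filter_pdfs)
    return not terms or any(term in pdf_name for term in terms)
```

Under these assumptions, pdf_matches("Core_v6.1.pdf", "Core_v6") is true, which mirrors the partial-name example.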

Semantic Chunking Benefits

Section-based chunking keeps related content together:

With page chunking:

Query: "Device Information Service GATT Service documentation"

❌ Fragmented results:
- Chunk 1: Page 85 (partial introduction)
- Chunk 2: Page 86 (middle of command, no context)
- Chunk 3: Page 87 (table only, incomplete)

Problem: Context broken across page boundaries

With semantic chunking:

Query: "Device Information Service GATT Service documentation"

✅ Complete section:
- Chunk: Section 4.26 "Device Information Service GATT Service"
  - Text: Pages 85-87 combined
  - Introduction + Commands + Tables together
  - Metadata: 6 requirements, 2 hex values, 3 tables

Result: Full context, nothing missing

What we do:

Group multi-page sections (e.g., Section 4.2.1 spanning pages 1-3)
Size constraints: 100-2000 tokens per chunk
50-token overlap between split chunks (preserves continuity)
Merge small trailing chunks (avoid fragments < 100 tokens)

Impact: Better RAG quality - agents get complete semantic units, not arbitrary page cuts.
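
The size rules listed under "What we do" can be sketched as a windowing function over one section's tokens. This is an illustration rather than the code in indexer.py: the function name split_section is hypothetical, "tokens" are assumed to be a pre-split list, and merging a short tail is allowed to push the final chunk slightly over the maximum.

```python
def split_section(
    tokens: list[str], max_tokens: int = 2000, min_tokens: int = 100, overlap: int = 50
) -> list[list[str]]:
    """Split one section into overlapping chunks, merging a too-small trailing chunk."""
    if overlap >= max_tokens:
        raise ValueError("overlap must be smaller than max_tokens")
    spans = []
    start = 0
    while True:
        end = min(start + max_tokens, len(tokens))
        spans.append((start, end))
        if end == len(tokens):
            break
        # The next chunk starts `overlap` tokens before the previous one ended.
        start = end - overlap
    # Merge a trailing fragment into its predecessor instead of keeping a tiny chunk.
    if len(spans) > 1 and spans[-1][1] - spans[-1][0] < min_tokens:
        _, last_end = spans.pop()
        spans[-1] = (spans[-1][0], last_end)
    return [tokens[s:e] for s, e in spans]
```

A section that fits within max_tokens comes back as a single chunk, which is what keeps multi-page sections together.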

Tool Chaining Workflow

Agents can chain multiple tools for complex tasks:

User: "Check if the index is current, then find information about
       encryption key management in the Core spec"

Agent workflow:
1. check_index_status()
   → Returns: "✓ Cache is FRESH - all PDFs up to date
               Total PDFs: 2
               • 80211MP.TS_.p7.pdf: OK
               • Core_v6.1.pdf: OK"

2. search_specifications(
       query="encryption key management",
       filter_pdfs="Core_v6.1.pdf",
       mode="hybrid",
       top_k=2
   )
   → Returns: Two chunks about key management and encryption key refresh
               from Core Specification pages 1016 and 693

3. Agent synthesizes answer:
   - Index is fresh (no rebuild needed)
   - Found detailed security chapter with key management sections
   - Includes encryption key refresh procedures and timing requirements

Available tool combinations:

check_index_status → refresh_index (rebuild if stale)
list_indexed_specs → search_specifications with filter_pdfs
search_specifications → Extract metadata → Follow-up search

Benefits:

Stateful caching (no re-indexing between queries)
Metadata-aware search (requirements, hex values, tables, figures)
Compositional workflows across multiple tools
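
In practice the agent issues these calls itself, but the check → refresh → search chain can be written out as plain Python to make the control flow explicit. In this sketch the three tools are passed in as callables (hypothetical wiring), and a fresh cache is assumed to be reported with the word "FRESH", as in the example output above.

```python
from typing import Callable


def search_with_fresh_index(
    check_index_status: Callable[[], str],
    refresh_index: Callable[[], str],
    search_specifications: Callable[..., str],
    query: str,
    **search_kwargs,
) -> str:
    """Refresh the index only when the status text does not report it as fresh."""
    status = check_index_status()
    # Assumption: a fresh cache is reported with the word "FRESH".
    if "FRESH" not in status:
        refresh_index()
    # Extra keyword arguments (mode, top_k, filter_pdfs) pass straight through.
    return search_specifications(query=query, **search_kwargs)
```

Keeping the tools as parameters means the same flow can be exercised with simple stand-in functions before wiring it to a real MCP client.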

Real Claude Code Conversation

User: How does Bluetooth pairing protect against man-in-the-middle attacks?

Claude: I'll search the Bluetooth specifications for information about MITM
        protection in pairing.

[Uses: search_specifications("How does pairing work with secure connections",
                              mode="semantic", top_k=2)]

Based on the Bluetooth Core Specification v6.1, here's how pairing protects
against man-in-the-middle (MITM) attacks:

1. Secure Simple Pairing Goals (Page 313):
   - Protection against passive eavesdropping using Elliptic Curve Diffie
     Hellman (ECDH) public key cryptography
   - ECDH provides very high degree of strength against passive attacks
   - For Secure Connections: uses P-256 elliptic curve with ~128 bits entropy

2. MITM Protection Methods (Page 314):
   - Offers two user-assisted numeric methods: numeric comparison or passkey
     entry
   - Uses 6-digit numbers providing 1 in 1,000,000 chance of successful MITM
   - If MITM attack fails, users are alerted when connection process fails
   - This level was chosen for FIPS compliance with minimal usability impact

The key insight: While ECDH prevents passive eavesdropping, the numeric
comparison/passkey entry ensures users can verify they're connecting to the
intended device and not an attacker in the middle.

Would you like me to search for more details about the authentication
procedures?

Development

Running Tests

Run all tests:

source .venv/bin/activate
export PYTHONPATH="${PYTHONPATH}:$(pwd)/src"
pytest

Run specific test file:

pytest tests/test_indexer.py -v

Run with coverage:

pytest --cov=btmcp

Linting and Formatting

Check code:

ruff check .

Format code:

ruff format .

Auto-fix issues:

ruff check --fix .

Project Structure

mcp-bluetooth-specification/
├── src/btmcp/
│   ├── __init__.py
│   ├── server.py                # MCP server with 4 tools
│   ├── spec_server.py           # PDF loading, caching, search coordinator
│   ├── pdf_loader.py            # pypdf extraction + NFKC normalization
│   ├── indexer.py               # Semantic chunking + hybrid search
│   ├── models.py                # Dataclasses (metadata structures)
│   └── metadata_extractor.py    # Regex parsers (requirements, hex, tables)
├── tests/
│   ├── test_server.py           # MCP server tests
│   ├── test_spec_server.py      # Integration tests
│   ├── test_mcp_integration.py  # MCP protocol compliance
│   ├── test_pdf_loader.py       # PDF extraction + normalization
│   ├── test_indexer.py          # Chunking + search (36 tests)
│   ├── test_metadata_extractor.py  # Metadata parsing (15 tests)
│   ├── test_models.py           # Dataclass validation (12 tests)
│   └── test_error_handling.py   # Exception propagation (2 tests)
├── test_live.py                 # Automated test with real PDFs
├── test_interactive.py          # Interactive search testing
├── test_mcp_format.py           # MCP format output test
├── specs/                       # Place PDF files here
├── .mcp.json                    # Claude Code configuration
└── pyproject.toml

Architecture

Data Flow:

PDF Files (specs/)
  → PDFLoader.load_pdf() → NFKC normalization
  → Semantic Chunking:
      • Group multi-page sections
      • Size constraints (100-2000 tokens, 50 overlap)
  → MetadataExtractor:
      • Requirements (section references, MUST/SHOULD/MAY levels)
      • Hex values (Service/Characteristic UUIDs like 0x180D)
      • Tables & Figures
  → Indexer:
      • BM25 tokenization
      • Semantic embeddings (all-MiniLM-L6-v2)
  → Cache (pickle):
      • BM25 index
      • Embeddings
      • Metadata

Query
  → MCP Tool (search_specifications)
  → Mode selection (bm25/semantic/hybrid)
  → Hybrid: RRF fusion (k=60)
  → PDF filtering (optional)
  → Top-K Results + Metadata
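
The NFKC step in the data flow matters because text extracted from PDFs often contains ligatures and other compatibility characters that would not match a plain-text query. A minimal sketch using Python's standard library (the wrapper name normalize_text is hypothetical, and pdf_loader.py may do more than this):

```python
import unicodedata


def normalize_text(raw: str) -> str:
    """Apply NFKC so ligatures and compatibility characters match plain-text queries."""
    return unicodedata.normalize("NFKC", raw)


# The "fi" ligature (U+FB01) becomes the two letters "f" and "i".
print(normalize_text("con\ufb01guration"))
# A no-break space (U+00A0) becomes an ordinary space.
print(normalize_text("LE\u00a0Secure Connections"))
```

Without this step, a keyword search for "configuration" would miss any chunk where the extractor emitted the ligature form.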

Components (6 modules, plus the package's __init__.py):

server.py: MCP server, 4 tools (search, refresh, list, check)
spec_server.py: Coordinates PDF loading, caching, searching
pdf_loader.py: pypdf extraction + NFKC Unicode normalization
indexer.py: Semantic chunking, hybrid search (BM25 + embeddings)
models.py: Dataclasses (RequirementInfo, TableInfo, FigureInfo, ChunkMetadata)
metadata_extractor.py: Regex parsers for requirements, hex, tables, figures
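
As a rough idea of what regex-based metadata extraction can look like, the sketch below pulls hex values and requirement keywords out of a chunk. The patterns are assumptions based only on the wording in this README (hex values such as 0x180D, uppercase MUST/SHOULD/MAY levels) and are not the patterns used in metadata_extractor.py.

```python
import re

# Assumed patterns: "0x" followed by hex digits, and uppercase requirement keywords.
HEX_VALUE = re.compile(r"\b0x[0-9A-Fa-f]+\b")
REQUIREMENT = re.compile(r"\b(MUST|SHOULD|MAY)\b")


def extract_metadata(text: str) -> dict:
    """Collect hex values and requirement levels in order of appearance."""
    return {
        "hex_values": HEX_VALUE.findall(text),
        "requirement_levels": REQUIREMENT.findall(text),
    }


sample = "Service UUID 0x180D MUST be present; value 0x2A37 MAY follow."
print(extract_metadata(sample))
```

Storing this alongside each chunk is what would allow counts such as "6 requirements, 2 hex values, 3 tables" to be reported with a search result.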

Troubleshooting

Issue: "ModuleNotFoundError: No module named 'btmcp'"

Solution: Set PYTHONPATH:

export PYTHONPATH="${PYTHONPATH}:$(pwd)/src"

Issue: "Warning: specs directory not found"

Solution: Create specs directory and add PDFs:

mkdir specs
cp /path/to/your/spec.pdf specs/

Issue: Empty search results

Check:

PDFs are loaded on startup (check console output)
Search query matches content in PDFs
Try broader search terms

Disclaimer

This is a personal learning project for exploring MCP and other AI-related technologies, built in my spare time. It is developed largely with Claude Code, which some would call "vibe coding". I provide no warranties; consider the project experimental.

License

BSD-3-Clause - see LICENSE.md

Third-Party Dependencies

Dependency	License	Copyright	Source
mcp	MIT	Anthropic, PBC	https://modelcontextprotocol.io
pypdf	BSD-3-Clause	Mathieu Fenniak and contributors	https://github.com/py-pdf/pypdf
rank-bm25	Apache 2.0	Dorian Brown	https://github.com/dorianbrown/rank_bm25
sentence-transformers	Apache 2.0	Nils Reimers and contributors	https://www.SBERT.net
numpy	BSD-3-Clause	NumPy Developers	https://numpy.org
