RAG-powered search for Bluetooth specifications. Ask questions in natural language, get precise answers with citations.
Search Bluetooth spec PDFs using hybrid search (BM25 + semantic embeddings) through the Model Context Protocol. Works with Claude Code, Cline, and other MCP clients.
Example:
"How does Bluetooth pairing protect against MITM attacks?"
→ Returns relevant sections from Core Specification with page numbers and context.
# 1. Install
python3 -m venv .venv && source .venv/bin/activate
pip install -e . # BM25-only (~100MB, 2 min)
# OR
pip install -e ".[semantic]" # Full features (~7.2GB, 15 min)
# 2. Add PDFs
mkdir specs && cp your-bluetooth-spec.pdf specs/
# 3. Generate cache (first run indexes PDFs)
btmcp-server
# Wait for "Ready! Indexed X chunks", then Ctrl+C
# 4. Configure your MCP client (see Configuration section below)
# Done! Your MCP client will launch the server automatically.
Why generate cache first? The first run indexes all PDFs which can take 1-5 minutes depending on PDF size. Running standalone first prevents MCP client timeouts.
Users must provide their own legally obtained PDF copies of Bluetooth specifications.
This MCP server provides tools to access Bluetooth specification documents. You are responsible for obtaining PDFs legally from official sources. Visit bluetooth.com/specifications for information about specifications.
- Hybrid Search - BM25 keyword + semantic embeddings (RRF fusion)
- Smart Chunking - Section-based (keeps multi-page sections together)
- Metadata Extraction - Requirements, UUIDs, tables, figures
- 3 Search Modes - BM25 (exact), semantic (meaning), hybrid (both)
- Persistent Cache - Fast subsequent searches
- 4 MCP Tools - search, refresh, list, check_status
| Mode | Size | Time | Features | When to Use |
|---|---|---|---|---|
| BM25-only | ~100MB | 2 min | Keyword search | Quick configuration |
| Semantic | ~7.2GB | 15 min | Hybrid search | Best accuracy |
Note: You can start with BM25-only and upgrade later with pip install -e ".[semantic]"
The easiest way to add the server:
claude mcp add --transport stdio bluetooth-specifications /absolute/path/to/mcp-bluetooth-specification/.venv/bin/btmcp-server
Or manually edit .mcp.json:
{
"mcpServers": {
"bt-specs": {
"command": "/absolute/path/to/.venv/bin/btmcp-server"
}
}
}
Add to your MCP settings (File → Preferences → MCP):
{
"mcpServers": {
"bt-specs": {
"command": "/absolute/path/to/.venv/bin/btmcp-server"
}
}
}
Note: Replace /absolute/path/to/ with your actual project path. The MCP client will launch the server automatically when needed.
For testing or running as a long-running daemon:
source .venv/bin/activate
btmcp-server # stdio mode (default)
HTTP mode (advanced):
MCP_TRANSPORT=streamable-http btmcp-server # Runs on http://127.0.0.1:8000
- MCP_TRANSPORT - stdio (default) or streamable-http
- MCP_HOST - Host for HTTP mode (default: 127.0.0.1)
- MCP_PORT - Port for HTTP mode (default: 8000)
- BTMCP_CACHE_DIR - Custom cache directory (default: <project>/.cache/)
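As a rough illustration of how these environment variables and their documented defaults could be resolved, here is a minimal sketch. The `ServerSettings` class and `load_settings` function are hypothetical names chosen for this example; the server's actual configuration code may look quite different.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ServerSettings:
    """Hypothetical container for the documented settings."""
    transport: str
    host: str
    port: int
    cache_dir: Path


def load_settings(env=None, project_root=Path(".")):
    """Resolve settings from environment variables, using the documented defaults."""
    env = os.environ if env is None else env
    transport = env.get("MCP_TRANSPORT", "stdio")
    if transport not in ("stdio", "streamable-http"):
        raise ValueError(f"Unsupported MCP_TRANSPORT: {transport!r}")
    return ServerSettings(
        transport=transport,
        host=env.get("MCP_HOST", "127.0.0.1"),
        port=int(env.get("MCP_PORT", "8000")),
        cache_dir=Path(env.get("BTMCP_CACHE_DIR", str(project_root / ".cache"))),
    )


defaults = load_settings(env={})
http_mode = load_settings(env={"MCP_TRANSPORT": "streamable-http", "MCP_PORT": "9000"})
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.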
The server provides 4 tools for MCP clients:
search_specifications: Search Bluetooth specs with hybrid search.
Parameters:
- query (string) - Your question or search terms
- mode (string) - "bm25", "semantic", or "hybrid" (default)
- top_k (int) - Number of results (default: 3)
- filter_pdfs (string) - Filter by PDF names (comma-separated)
Example:
search_specifications(
query="How does LE Secure Connections work?",
mode="hybrid",
top_k=3
)
list_indexed_specs: List all indexed PDFs with statistics.
check_index_status: Check if cache is fresh or needs rebuilding.
refresh_index: Rebuild index from PDFs (use after adding/updating specs).
We provide three test scripts for manual testing:
Runs a full suite of test queries and shows results:
source .venv/bin/activate
python test_live.py
This will:
- Load all PDFs from specs/
- Run 5 test queries ("GATT Service", "L2CAP layer", "frequency", "routing", "security")
- Display results with scores and citations
- Show MCP formatted output
Search interactively with your own queries:
source .venv/bin/activate
python test_interactive.py
Then type your search queries. Type quit to exit.
Example queries:
- ATT protocol
- device address
- pairing
- LE Secure Connections
See the exact MCP resource output format:
source .venv/bin/activate
python test_mcp_format.py
Enter a query and see the formatted output that MCP clients would receive.
To connect Claude Code to a server running in HTTP mode, add the following configuration to your project's .mcp.json (create the file if it does not exist):
{
"mcpServers": {
"bt-specs": {
"type": "http",
"url": "http://127.0.0.1:8000/mcp"
}
}
}
To test it, ask Claude Code: "Search the Bluetooth specs for GATT Service"
The server exposes four MCP tools:
Tool: search_specifications
- Parameters:
  - query (string) - Search terms (e.g., "GATT Service", "Device Information Service")
  - mode (string, optional) - Search mode: "bm25" (keyword), "semantic" (meaning), or "hybrid" (default)
  - top_k (int, optional) - Number of results to return (default: 3)
  - filter_pdfs (string, optional) - Comma-separated list of PDF names to filter results (e.g., "doc1.pdf,doc2.pdf")
- Returns: Formatted results with text and citations
Tool: list_indexed_specs
- Parameters: None
- Returns: List of all indexed PDFs with statistics (pages, chunks)
Tool: check_index_status
- Parameters: None
- Returns: Cache freshness status and which PDFs need reindexing
Tool: refresh_index
- Parameters: None
- Returns: Status message after rebuilding index from PDFs
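To make the idea behind check_index_status concrete, here is one plausible way to decide whether a cache is stale: compare each PDF's size and modification time against a snapshot taken at indexing time. This is an assumption for illustration only. The README does not say how freshness is actually determined, and `snapshot` and `stale_pdfs` are hypothetical helper names.

```python
import tempfile
from pathlib import Path


def snapshot(specs_dir):
    """Record (size, mtime) for every PDF so later runs can detect changes."""
    return {
        p.name: (p.stat().st_size, p.stat().st_mtime_ns)
        for p in sorted(Path(specs_dir).glob("*.pdf"))
    }


def stale_pdfs(specs_dir, cached_snapshot):
    """Return names of PDFs that are new, changed, or removed since the snapshot."""
    current = snapshot(specs_dir)
    changed = {name for name, sig in current.items() if cached_snapshot.get(name) != sig}
    removed = set(cached_snapshot) - set(current)
    return sorted(changed | removed)


# Demo with a throwaway directory standing in for specs/
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.pdf").write_bytes(b"%PDF-1.7 first")
    cached = snapshot(tmp)
    fresh_result = stale_pdfs(tmp, cached)   # nothing changed yet
    (Path(tmp) / "b.pdf").write_bytes(b"%PDF-1.7 second")
    stale_result = stale_pdfs(tmp, cached)   # b.pdf was added after the snapshot
```

An empty result corresponds to a fresh cache; any listed name would be a candidate for refresh_index.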
The server supports three search modes optimized for different query types:
Best for: Known terminology, command names, protocol names
Query: "GATT service ATT protocol attributes"
Mode: bm25
Response: Top results focus on exact keyword matches:
- Result 1 (score: 25.851): "Attribute Protocol (ATT) block implements the
peer-to-peer protocol... The Generic Attribute Profile (GATT) block represents
the functionality of the ATT Server..." (Core_v6.1.pdf, Page 245)
- Result 2 (score: 24.041): SDP record details for GATT service with ATT
protocol descriptors and attribute handles (Core_v6.1.pdf, Page 1613)
Use when: You know the exact technical terms and want precise matches
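For readers unfamiliar with BM25, the following is a small, dependency-free sketch of Okapi BM25 scoring. It is not the project's code: the project lists rank-bm25 as a dependency, and its tokenization and parameters are not described here. The whitespace tokenizer, `k1=1.5`, `b=0.75`, and the Lucene-style IDF are all assumptions for this illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25 (Lucene-style IDF)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization penalizes long documents.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "attribute protocol att defines attribute handles",
    "l2cap provides channel multiplexing",
    "gatt builds services on top of att",
]
scores = bm25_scores("att protocol", docs)
```

Documents that share no terms with the query score zero, which is why BM25 suits exact terminology but misses paraphrases.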
Best for: Natural questions, conceptual understanding, "how-to" queries
Query: "How does pairing work with secure connections"
Mode: semantic
Response: Conceptually related content about security and pairing:
- Result 1 (score: 0.706): "Secure Simple Pairing has two security goals:
protection against passive eavesdropping and MITM attacks... uses Elliptic
Curve Diffie Hellman (ECDH) public key cryptography" (Core_v6.1.pdf, Page 313)
- Result 2 (score: 0.645): "Man-in-the-middle (MITM) attack... Secure Simple
Pairing offers two user assisted numeric methods: numeric comparison or
passkey entry" (Core_v6.1.pdf, Page 314)
Use when: Asking how/why questions, exploring concepts, learning about features
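Semantic search ranks chunks by how close their embedding vectors are to the query's embedding. The sketch below assumes cosine similarity as the metric, which the README does not state explicitly, and uses toy 3-dimensional vectors in place of real embeddings. `rank_by_similarity` is a hypothetical helper name.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, chunk_vecs, top_k=3):
    """Return (chunk_index, score) pairs for the top_k most similar chunks."""
    scored = [(cosine_similarity(query_vec, vec), i) for i, vec in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(i, score) for score, i in scored[:top_k]]


# Toy vectors standing in for real sentence embeddings.
query = [1.0, 0.0, 1.0]
chunks = [[1.0, 0.0, 0.9], [0.0, 1.0, 0.0], [0.5, 0.5, 0.5]]
ranking = rank_by_similarity(query, chunks, top_k=2)
```

Unlike BM25, a chunk can rank highly here without sharing any literal words with the query.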
Best for: General queries, balanced precision/recall
Query: "LE Secure Connections pairing authentication"
Mode: hybrid (default)
Response: Combines exact matches with conceptual similarity:
- Result 1 (score: 0.016): Complete pairing flow diagram showing "Phase 1:
Established LL connection, Phase 2: Pairing over SMP (Legacy pairing or
Secure Connections), Phase 3: Establishment of encrypted connection"
(Core_v6.1.pdf, Page 1629)
- Result 2 (score: 0.016): Authentication procedure details including
commitment checks and failure handling (Core_v6.1.pdf, Page 701)
Uses RRF (Reciprocal Rank Fusion) to merge BM25 + semantic rankings.
Use when: Not sure which mode to use (recommended default)
Recommendation: Use hybrid (default) for 90% of queries. Only switch to
bm25 or semantic if hybrid results don't meet your needs.
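Reciprocal Rank Fusion itself is simple enough to sketch in a few lines: each chunk earns 1 / (k + rank) from every ranking it appears in, and the contributions are summed. The sketch uses k=60, the value named in the data flow description; the chunk ids and the function name `reciprocal_rank_fusion` are made up for illustration.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of chunk ids: score(d) = sum over lists of 1 / (k + rank)."""
    fused = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            fused[chunk_id] = fused.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


bm25_ranking = ["c7", "c2", "c9"]       # hypothetical chunk ids, best first
semantic_ranking = ["c2", "c4", "c7"]
fused = reciprocal_rank_fusion([bm25_ranking, semantic_ranking])
```

Because RRF only looks at ranks, it never has to reconcile BM25 scores (around 25 in the example above) with semantic scores (below 1). It also suggests why hybrid scores look small: a chunk ranked first in a single list contributes 1/61, roughly 0.016.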
Search within specific specifications:
# Search only in Core Specification v6.1
search_specifications(
query="encryption key management",
filter_pdfs="Core_v6.1.pdf"
)
Response:
- Result 1 (score: 0.031): Complete table of contents for security chapter
showing "3 Key management... 3.2.5 Generating the encryption key...
4.2.5.8 Encryption key refresh" (Core_v6.1.pdf, Page 1016)
- Result 2 (score: 0.030): "Encryption key refresh... shall refresh the
encryption key within 2^28 ticks... procedures for pause and resume
encryption" (Core_v6.1.pdf, Page 693)
# Search across multiple specs (comma-separated)
search_specifications(
query="ATT protocol",
filter_pdfs="Core_v6.1.pdf,80211MP.TS_.p7.pdf"
)
# Partial names work too
search_specifications(
query="security",
filter_pdfs="Core_v6" # Matches Core_v6.1.pdf
)
Use cases:
- Focus on specific specification document
- Avoid mixing results from different spec versions
- Compare implementations across different documents
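The filtering behavior shown above (comma-separated names, partial matches) could be implemented along these lines. This is a sketch under assumptions: case-insensitive substring matching is a guess, and `parse_filter` and `matches_filter` are hypothetical names rather than the project's functions.

```python
def parse_filter(filter_pdfs):
    """Split a comma-separated filter string into lowercase name fragments."""
    if not filter_pdfs:
        return []
    return [part.strip().lower() for part in filter_pdfs.split(",") if part.strip()]


def matches_filter(pdf_name, filter_pdfs):
    """True when no filter is set or any fragment is a substring of the PDF name."""
    fragments = parse_filter(filter_pdfs)
    if not fragments:
        return True
    return any(fragment in pdf_name.lower() for fragment in fragments)


indexed = ["Core_v6.1.pdf", "80211MP.TS_.p7.pdf"]
selected = [name for name in indexed if matches_filter(name, "Core_v6")]
```

Treating an empty filter as "match everything" keeps the unfiltered search path and the filtered one in a single code path.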
Section-based chunking keeps related content together:
With page chunking:
Query: "Device Information Service GATT Service documentation"
❌ Fragmented results:
- Chunk 1: Page 85 (partial introduction)
- Chunk 2: Page 86 (middle of command, no context)
- Chunk 3: Page 87 (table only, incomplete)
Problem: Context broken across page boundaries
With semantic chunking:
Query: "Device Information Service GATT Service documentation"
✅ Complete section:
- Chunk: Section 4.26 "Device Information Service GATT Service"
- Text: Pages 85-87 combined
- Introduction + Commands + Tables together
- Metadata: 6 requirements, 2 hex values, 3 tables
Result: Full context, nothing missing
What we do:
- Group multi-page sections (e.g., Section 4.2.1 spanning pages 1-3)
- Size constraints: 100-2000 tokens per chunk
- 50-token overlap between split chunks (preserves continuity)
- Merge small trailing chunks (avoid fragments < 100 tokens)
Impact: Better RAG quality - agents get complete semantic units, not arbitrary page cuts.
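The size rules listed above can be sketched as a splitter that operates on one section's token list. This is an illustrative reading of those rules, not the project's indexer: `split_section` is a hypothetical name, tokenization is left to the caller, and merging a short tail is allowed to push the final chunk slightly past the maximum.

```python
def split_section(tokens, max_tokens=2000, min_tokens=100, overlap=50):
    """Split one section's tokens into overlapping chunks within the size limits."""
    if len(tokens) <= max_tokens:
        return [tokens]  # the whole section fits in one chunk
    chunks = []
    step = max_tokens - overlap  # each new chunk re-reads the last `overlap` tokens
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break
        start += step
    # Merge a trailing fragment shorter than min_tokens into the previous chunk,
    # skipping the tokens the previous chunk already contains.
    if len(chunks) > 1 and len(chunks[-1]) < min_tokens:
        tail = chunks.pop()
        chunks[-1] = chunks[-1] + tail[overlap:]
    return chunks


short_section = split_section(list(range(500)))
long_section = split_section(list(range(4000)))
merged_tail = split_section(list(range(3990)))
```

With 3990 tokens the third chunk would hold only 90 tokens, so it is folded into the second chunk instead of becoming a fragment.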
Agents can chain multiple tools for complex tasks:
User: "Check if the index is current, then find information about
encryption key management in the Core spec"
Agent workflow:
1. check_index_status()
→ Returns: "✓ Cache is FRESH - all PDFs up to date
Total PDFs: 2
• 80211MP.TS_.p7.pdf: OK
• Core_v6.1.pdf: OK"
2. search_specifications(
query="encryption key management",
filter_pdfs="Core_v6.1.pdf",
mode="hybrid",
top_k=2
)
→ Returns: Two chunks about key management and encryption key refresh
from Core Specification pages 1016 and 693
3. Agent synthesizes answer:
- Index is fresh (no rebuild needed)
- Found detailed security chapter with key management sections
- Includes encryption key refresh procedures and timing requirements
Available tool combinations:
- check_index_status → refresh_index (rebuild if stale)
- list_indexed_specs → search_specifications with filter_pdfs
- search_specifications → Extract metadata → Follow-up search
Benefits:
- Stateful caching (no re-indexing between queries)
- Metadata-aware search (requirements, hex values, tables, figures)
- Compositional workflows across multiple tools
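The first tool combination can be expressed as a small orchestration function. In this sketch `call_tool` is a hypothetical callable standing in for whatever your MCP client uses to invoke a tool, and the check for the word "FRESH" relies on the status text shown in the workflow example; the wording of a stale status is an assumption.

```python
def answer_with_fresh_index(call_tool, query, pdf_filter=None):
    """Chain check_index_status -> refresh_index (if stale) -> search_specifications."""
    status = call_tool("check_index_status", {})
    if "FRESH" not in status:
        call_tool("refresh_index", {})
    arguments = {"query": query, "mode": "hybrid", "top_k": 2}
    if pdf_filter:
        arguments["filter_pdfs"] = pdf_filter
    return call_tool("search_specifications", arguments)


# Stub standing in for a real MCP client session (hypothetical).
calls = []


def fake_call_tool(name, arguments):
    calls.append(name)
    if name == "check_index_status":
        return "Cache is STALE - 1 PDF needs reindexing"  # assumed wording
    return f"{name} ok"


result = answer_with_fresh_index(fake_call_tool, "encryption key management", "Core_v6.1.pdf")
```

In practice an agent performs this chaining on its own; the function only makes the order of calls explicit.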
User: How does Bluetooth pairing protect against man-in-the-middle attacks?
Claude: I'll search the Bluetooth specifications for information about MITM
protection in pairing.
[Uses: search_specifications("How does pairing work with secure connections",
mode="semantic", top_k=2)]
Based on the Bluetooth Core Specification v6.1, here's how pairing protects
against man-in-the-middle (MITM) attacks:
1. Secure Simple Pairing Goals (Page 313):
- Protection against passive eavesdropping using Elliptic Curve Diffie
Hellman (ECDH) public key cryptography
- ECDH provides very high degree of strength against passive attacks
- For Secure Connections: uses P-256 elliptic curve with ~128 bits entropy
2. MITM Protection Methods (Page 314):
- Offers two user-assisted numeric methods: numeric comparison or passkey
entry
- Uses 6-digit numbers providing 1 in 1,000,000 chance of successful MITM
- If MITM attack fails, users are alerted when connection process fails
- This level was chosen for FIPS compliance with minimal usability impact
The key insight: While ECDH prevents passive eavesdropping, the numeric
comparison/passkey entry ensures users can verify they're connecting to the
intended device and not an attacker in the middle.
Would you like me to search for more details about the authentication
procedures?
Run all tests:
source .venv/bin/activate
export PYTHONPATH="${PYTHONPATH}:$(pwd)/src"
pytest
Run specific test file:
pytest tests/test_indexer.py -v
Run with coverage:
pytest --cov=btmcp
Check code:
ruff check .
Format code:
ruff format .
Auto-fix issues:
ruff check --fix .
bluetooth-specifications-mcp/
├── src/btmcp/
│ ├── __init__.py
│ ├── server.py # MCP server with 4 tools
│ ├── spec_server.py # PDF loading, caching, search coordinator
│ ├── pdf_loader.py # pypdf extraction + NFKC normalization
│ ├── indexer.py # Semantic chunking + hybrid search
│ ├── models.py # Dataclasses (metadata structures)
│ └── metadata_extractor.py # Regex parsers (requirements, hex, tables)
├── tests/
│ ├── test_server.py # MCP server tests
│ ├── test_spec_server.py # Integration tests
│ ├── test_mcp_integration.py # MCP protocol compliance
│ ├── test_pdf_loader.py # PDF extraction + normalization
│ ├── test_indexer.py # Chunking + search (36 tests)
│ ├── test_metadata_extractor.py # Metadata parsing (15 tests)
│ ├── test_models.py # Dataclass validation (12 tests)
│ └── test_error_handling.py # Exception propagation (2 tests)
├── test_live.py # Automated test with real PDFs
├── test_interactive.py # Interactive search testing
├── test_mcp_format.py # MCP format output test
├── specs/ # Place PDF files here
├── .mcp.json # Claude Code configuration
└── pyproject.toml
Data Flow:
PDF Files (specs/)
→ PDFLoader.load_pdf() → NFKC normalization
→ Semantic Chunking:
• Group multi-page sections
• Size constraints (100-2000 tokens, 50 overlap)
→ MetadataExtractor:
• Requirements (section references, MUST/SHOULD/MAY levels)
• Hex values (Service/Characteristic UUIDs like 0x180D)
• Tables & Figures
→ Indexer:
• BM25 tokenization
• Semantic embeddings (all-MiniLM-L6-v2)
→ Cache (pickle):
• BM25 index
• Embeddings
• Metadata
Query
→ MCP Tool (search_specifications)
→ Mode selection (bm25/semantic/hybrid)
→ Hybrid: RRF fusion (k=60)
→ PDF filtering (optional)
→ Top-K Results + Metadata
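To illustrate the metadata extraction step in the flow above, here is a minimal regex-based sketch that pulls hex values, requirement keywords, and table or figure references out of a chunk of text. The patterns and the `extract_metadata` function are assumptions written for this example; the project's real parsers in metadata_extractor.py are likely more thorough.

```python
import re

# Illustrative patterns only; the project's actual regexes are not shown in this README.
HEX_VALUE = re.compile(r"\b0x[0-9A-Fa-f]+\b")
REQUIREMENT = re.compile(r"\b(MUST|SHOULD|MAY)\b")
TABLE_OR_FIGURE = re.compile(r"\b(Table|Figure)\s+(\d+(?:\.\d+)*)")


def extract_metadata(text):
    """Collect hex values, requirement levels, and table/figure references."""
    return {
        "hex_values": sorted(set(HEX_VALUE.findall(text))),
        "requirement_levels": REQUIREMENT.findall(text),
        "references": [f"{kind} {num}" for kind, num in TABLE_OR_FIGURE.findall(text)],
    }


sample = (
    "The Heart Rate Service (0x180D) MUST expose the measurement characteristic "
    "(0x2A37). The server MAY support notifications; see Table 3.2 and Figure 4."
)
meta = extract_metadata(sample)
```

Storing this metadata alongside each chunk is what lets a follow-up search target, for example, only chunks that mention a given UUID.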
Components (7 modules including __init__.py):
- server.py: MCP server, 4 tools (search, refresh, list, check)
- spec_server.py: Coordinates PDF loading, caching, searching
- pdf_loader.py: pypdf extraction + NFKC Unicode normalization
- indexer.py: Semantic chunking, hybrid search (BM25 + embeddings)
- models.py: Dataclasses (RequirementInfo, TableInfo, FigureInfo, ChunkMetadata)
- metadata_extractor.py: Regex parsers for requirements, hex, tables, figures
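NFKC normalization matters because text extracted from PDFs often contains ligatures and other compatibility characters that would otherwise break keyword matching. The sketch below uses Python's standard `unicodedata` module; the wrapper name `normalize_extracted_text` is hypothetical and the sample string is invented.

```python
import unicodedata


def normalize_extracted_text(text):
    """Apply NFKC normalization, folding ligatures and compatibility forms."""
    return unicodedata.normalize("NFKC", text)


# \ufb01 and \ufb02 are the "fi" and "fl" ligatures; \uff2c\uff25 are fullwidth "L" and "E".
raw = "con\ufb01guration of \ufb02ow control in \uff2c\uff25"
clean = normalize_extracted_text(raw)
```

Without this step, a BM25 query for "configuration" would not match a chunk whose text contains the single-character ligature.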
Issue: "ModuleNotFoundError: No module named 'btmcp'"
Solution: Set PYTHONPATH:
export PYTHONPATH="${PYTHONPATH}:$(pwd)/src"
Issue: "Warning: specs directory not found"
Solution: Create specs directory and add PDFs:
mkdir specs
cp /path/to/your/spec.pdf specs/
Issue: Empty search results
Check:
- PDFs are loaded on startup (check console output)
- Search query matches content in PDFs
- Try broader search terms
This is a personal learning project to explore MCP and other AI-related technologies, and I work on it in my spare time. It is heavily developed with Claude Code, in what some would call "vibe coding". I provide no warranties. Consider the project experimental.
BSD-3-Clause - see LICENSE.md
| Dependency | License | Copyright | Source |
|---|---|---|---|
| mcp | MIT | Anthropic, PBC | https://modelcontextprotocol.io |
| pypdf | BSD-3-Clause | Mathieu Fenniak and contributors | https://github.com/py-pdf/pypdf |
| rank-bm25 | Apache 2.0 | Dorian Brown | https://github.com/dorianbrown/rank_bm25 |
| sentence-transformers | Apache 2.0 | Nils Reimers and contributors | https://www.SBERT.net |
| numpy | BSD-3-Clause | NumPy Developers | https://numpy.org |