Document page extraction tool powered by DeepSeek-OCR.
⚠️ Important: This package requires PyTorch with CUDA support (GPU Required). PyTorch is NOT automatically installed - you must install it manually first.
Choose the command that matches your CUDA version:
# For CUDA 12.1 (recommended for most users)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
# For CUDA 11.8
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
# For CUDA 12.6
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126
💡 Don't know your CUDA version? Run nvidia-smi to check, or just try CUDA 12.1 (works with most recent drivers).
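If you would rather not read the nvidia-smi header by hand, the choice can be sketched in Python. This is a rough helper, not part of doc-page-extractor: the function names are hypothetical, and the selection rule (take the newest wheel tag listed above that does not exceed the CUDA version the driver reports) is a conservative assumption rather than an official PyTorch rule.

```python
import re
import subprocess

# Wheel tags taken from the install commands above, keyed by (major, minor).
WHEEL_TAGS = {(11, 8): "cu118", (12, 1): "cu121", (12, 6): "cu126"}


def parse_driver_cuda_version(nvidia_smi_output):
    """Return (major, minor) from the 'CUDA Version: X.Y' header, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def pick_wheel_tag(driver_version):
    """Assumed rule: newest listed tag that is not above the driver's CUDA version."""
    candidates = [version for version in WHEEL_TAGS if version <= driver_version]
    if not candidates:
        return None
    return WHEEL_TAGS[max(candidates)]


def install_command(nvidia_smi_output):
    """Build the pip command for the given nvidia-smi output, or None if no match."""
    version = parse_driver_cuda_version(nvidia_smi_output)
    if version is None:
        return None
    tag = pick_wheel_tag(version)
    if tag is None:
        return None
    return (
        "pip install torch torchvision "
        f"--index-url https://download.pytorch.org/whl/{tag}"
    )


if __name__ == "__main__":
    try:
        result = subprocess.run(
            ["nvidia-smi"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        print("nvidia-smi not found or failed; check your NVIDIA driver.")
    else:
        command = install_command(result.stdout)
        print(command or "No matching wheel tag for this driver.")
```

The script only prints a suggested command; it does not install anything, so you can compare its suggestion with the list above before running pip.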
pip install doc-page-extractor
Check if everything is working:
python -c "import doc_page_extractor; import torch; print('✓ Installation successful!'); print('✓ CUDA available:', torch.cuda.is_available())"
Expected output:
✓ Installation successful!
✓ CUDA available: True
If CUDA shows False, see the troubleshooting section below.
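When CUDA shows False, it helps to know whether the cause is a CPU-only PyTorch build or a driver problem. The following is a minimal diagnostic sketch; `cuda_report` and `diagnose` are hypothetical helper names, not part of doc-page-extractor. It relies on `torch.version.cuda` being `None` for PyTorch builds that were not compiled with CUDA.

```python
import importlib.util


def cuda_report():
    """Collect facts that help explain why CUDA might show False."""
    report = {"torch_installed": importlib.util.find_spec("torch") is not None}
    if not report["torch_installed"]:
        return report

    import torch

    report["torch_version"] = torch.__version__
    # None here means this PyTorch build was not compiled with CUDA.
    report["built_with_cuda"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


def diagnose(report):
    """Turn a report dict into a one-line hint."""
    if not report["torch_installed"]:
        return "PyTorch is not installed; install it first."
    if report["built_with_cuda"] is None:
        return "CPU-only PyTorch build detected; reinstall with a CUDA index URL."
    if not report["cuda_available"]:
        return "CUDA build found but no usable GPU; check the driver with nvidia-smi."
    return f"CUDA is ready on {report['device_name']}."


if __name__ == "__main__":
    print(diagnose(cuda_report()))
```

The two failure messages correspond to the two fixes described in the troubleshooting steps: reinstalling PyTorch with a CUDA index URL, or installing NVIDIA drivers.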
from doc_page_extractor import PageExtractor
# Your code here
Install PyTorch first:
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
Check your GPU driver:
nvidia-smi
If the command fails, you need to install NVIDIA drivers:
- Download from: https://www.nvidia.com/download/index.aspx
If it succeeds, you might have CPU-only PyTorch. Reinstall with CUDA:
pip uninstall torch torchvision
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
- Python >= 3.10, < 3.14
- NVIDIA GPU with CUDA 11.8, 12.1, or 12.6 support (Required)
- Sufficient GPU memory (recommended: 4GB+ VRAM)
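The requirements above can be checked before installing with a short script. This is an illustrative sketch only: the helper names are hypothetical, and treating "4GB" as 4 × 1000³ bytes is an assumption made so that cards marketed as 4 GB pass the check.

```python
import sys

MIN_PYTHON = (3, 10)  # inclusive lower bound
MAX_PYTHON = (3, 14)  # exclusive upper bound
RECOMMENDED_VRAM_BYTES = 4 * 1000**3  # assumption: "4GB" read as decimal gigabytes


def python_supported(version_info=None):
    """True if the (major, minor) version satisfies >= 3.10 and < 3.14."""
    if version_info is None:
        version_info = sys.version_info
    return MIN_PYTHON <= tuple(version_info[:2]) < MAX_PYTHON


def vram_sufficient(total_bytes, minimum_bytes=RECOMMENDED_VRAM_BYTES):
    """True if the GPU's total memory meets the recommended minimum."""
    return total_bytes >= minimum_bytes


if __name__ == "__main__":
    print("Python supported:", python_supported())
    try:
        import torch
    except ImportError:
        print("PyTorch not installed; skipping the VRAM check.")
    else:
        if torch.cuda.is_available():
            total = torch.cuda.get_device_properties(0).total_memory
            print("VRAM sufficient:", vram_sufficient(total))
        else:
            print("No CUDA device visible; skipping the VRAM check.")
```

The VRAM figure is a recommendation rather than a hard limit, so a False result here is a warning, not proof that extraction will fail.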
For contributors and developers, see Development Guide for:
- Running tests
- Running lint checks
- Building the package