doc-page-extractor

Document page extraction tool powered by DeepSeek-OCR.

Installation

⚠️ Important: This package requires PyTorch with CUDA support (GPU Required). PyTorch is NOT automatically installed - you must install it manually first.

Step 1: Install PyTorch with CUDA

Choose the command that matches your CUDA version:

# For CUDA 12.1 (recommended for most users)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

# For CUDA 11.8
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

# For CUDA 12.6
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126

💡 Don't know your CUDA version? Run nvidia-smi to check, or just try CUDA 12.1 (works with most recent drivers).
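The version lookup above can also be scripted. The following is a minimal sketch, not part of doc-page-extractor: it assumes the `nvidia-smi` header contains a `CUDA Version: X.Y` field (the highest CUDA version the driver supports) and conservatively picks the newest wheel tag from the commands above that does not exceed it. The function names are hypothetical.

```python
import re
import shutil
import subprocess

# Wheel tags taken from the install commands above, newest first.
WHEEL_TAGS = [((12, 6), "cu126"), ((12, 1), "cu121"), ((11, 8), "cu118")]


def parse_cuda_version(smi_output):
    """Return (major, minor) from the nvidia-smi header, or None if absent."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def pick_wheel_tag(driver_cuda):
    """Pick the newest listed wheel tag that does not exceed the driver's CUDA version."""
    for version, tag in WHEEL_TAGS:
        if driver_cuda >= version:
            return tag
    return None


def suggest_install_command():
    """Build a pip command for this machine, or return None if detection fails."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi"], capture_output=True, text=True, check=False
    )
    version = parse_cuda_version(result.stdout)
    tag = pick_wheel_tag(version) if version else None
    if tag is None:
        return None
    return (
        "pip install torch torchvision "
        f"--index-url https://download.pytorch.org/whl/{tag}"
    )


if __name__ == "__main__":
    print(suggest_install_command() or "Could not detect a supported CUDA version.")
```

If detection fails, fall back to the manual choice described above.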

Step 2: Install doc-page-extractor

pip install doc-page-extractor

Verify Installation

Check if everything is working:

python -c "import doc_page_extractor; import torch; print('✓ Installation successful!'); print('✓ CUDA available:', torch.cuda.is_available())"

Expected output:

✓ Installation successful!
✓ CUDA available: True

If CUDA shows False, see the troubleshooting section below.
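When the one-liner reports False, a slightly longer report can help narrow down the cause. This is a sketch that uses only standard PyTorch attributes (`torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`, `torch.cuda.get_device_name()`); the function name `collect_env_report` and the dictionary keys are illustrative and not part of doc-page-extractor.

```python
import importlib.util


def collect_env_report():
    """Gather PyTorch/CUDA facts without failing when torch is missing."""
    report = {
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_version": None,
        "cuda_build": None,
        "cuda_available": False,
        "device_name": None,
    }
    if not report["torch_installed"]:
        return report

    import torch  # imported lazily so the function works without torch

    report["torch_version"] = torch.__version__
    # torch.version.cuda is None for CPU-only builds of PyTorch.
    report["cuda_build"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in collect_env_report().items():
        print(f"{key}: {value}")
```

A `cuda_build` of `None` alongside an installed torch suggests a CPU-only build.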

Usage

from doc_page_extractor import PageExtractor

# Your code here

Troubleshooting

"PyTorch is required but not installed!"

Install PyTorch first:

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

"CUDA is not available!"

Check your GPU driver:

nvidia-smi

If the command fails, you need to install NVIDIA drivers:

Download from: https://www.nvidia.com/download/index.aspx

If it succeeds, you might have CPU-only PyTorch. Reinstall with CUDA:

pip uninstall torch torchvision
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
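The troubleshooting steps above amount to a small decision tree, sketched here as a pure function. The function name and the returned labels are hypothetical; the final `"unknown"` branch covers cases this README does not describe (for example, a driver too old for the installed CUDA build, which is an assumption on my part).

```python
def diagnose(torch_installed, cuda_build, cuda_available, driver_ok):
    """Map an observed state to the matching troubleshooting step.

    torch_installed: True if `import torch` succeeds
    cuda_build:      value of torch.version.cuda (None on CPU-only builds)
    cuda_available:  result of torch.cuda.is_available()
    driver_ok:       True if `nvidia-smi` ran successfully
    """
    if not torch_installed:
        return "install-pytorch"          # "PyTorch is required but not installed!"
    if cuda_available:
        return "ok"
    if not driver_ok:
        return "install-driver"           # nvidia-smi failed
    if cuda_build is None:
        return "reinstall-cuda-pytorch"   # CPU-only PyTorch build
    return "unknown"                      # not covered by the steps above


if __name__ == "__main__":
    # Example: nvidia-smi works, torch is installed, but it is a CPU-only build.
    print(diagnose(True, None, False, True))
```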

Requirements

Python >= 3.10, < 3.14
NVIDIA GPU with CUDA 11.8, 12.1, or 12.6 support (Required)
Sufficient GPU memory (recommended: 4GB+ VRAM)
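These requirements can be checked programmatically. The sketch below is illustrative only: the helper names are hypothetical, and it assumes "4GB" means 4 GiB (4 × 1024³ bytes). Cards marketed as 4 GB may report slightly less usable memory, so treat the VRAM check as a rough guide.

```python
import sys

MIN_PYTHON = (3, 10)            # inclusive lower bound from the list above
MAX_PYTHON_EXCLUSIVE = (3, 14)  # exclusive upper bound from the list above
RECOMMENDED_VRAM_BYTES = 4 * 1024**3  # assumption: "4GB" read as 4 GiB


def python_supported(version=None):
    """Return True if the (major, minor) version is >= 3.10 and < 3.14."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_PYTHON <= major_minor < MAX_PYTHON_EXCLUSIVE


def vram_sufficient(total_bytes):
    """Return True if the GPU reports at least the recommended memory."""
    return total_bytes >= RECOMMENDED_VRAM_BYTES


def gpu_total_memory():
    """Return total memory of GPU 0 in bytes, or None if it cannot be queried."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory


if __name__ == "__main__":
    print("Python supported:", python_supported())
    total = gpu_total_memory()
    if total is None:
        print("GPU memory: could not be queried")
    else:
        print("VRAM sufficient:", vram_sufficient(total))
```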

Development

Contributors and developers: see the Development Guide for:

Running tests
Running lint checks
Building the package
