Installation & Setup¶
This comprehensive guide covers installing Wurzel and setting up your development environment, including handling special dependencies and various deployment options.
Basic Installation¶
Prerequisites¶
- Python 3.11 or 3.12
- pip (Python package installer)
- Virtual environment (recommended)
Quick Install¶
To get started with Wurzel, install the library using pip:
Development Installation¶
For development work, we recommend using the provided Makefile which handles all dependencies:
# Clone the repository
git clone https://github.com/telekom/wurzel.git
cd wurzel
# Install all dependencies including development tools
make install
This installs:
- Core Wurzel library
- All optional dependencies
- Development tools (linting, testing, documentation)
- Pre-commit hooks setup
Manual Development Setup¶
If you prefer manual installation:
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install in development mode with all extras
pip install -e .[all,lint,test,docs]
# Install direct dependencies (see below)
pip install -r DIRECT_REQUIREMENTS.txt
# Set up pre-commit hooks
pre-commit install
Direct Dependencies¶
Due to PyPI restrictions on direct dependencies, some components require manual installation. This primarily affects the German spaCy model used for semantic text splitting.
ℹ️ Why Direct Dependencies? PyPI has restrictions on packages that include direct dependencies to external URLs for security reasons. The spaCy German model is hosted on GitHub releases rather than PyPI, requiring manual installation.
Manual Installation¶
If you plan to use the semantic text splitting functionality (e.g., SemanticSplitter
), you'll need to manually install the German spaCy model:
pip install https://github.com/explosion/spacy-models/releases/download/de_core_news_sm-3.7.0/de_core_news_sm-3.7.0-py3-none-any.whl
Using DIRECT_REQUIREMENTS.txt¶
If you're working with the source code, you can install from the provided requirements file:
Optional Dependencies¶
Wurzel supports various optional features through extras. You can install only what you need for your specific use case.
Vector Database Support¶
# For Qdrant vector database
pip install wurzel[qdrant]
# For Milvus vector database
pip install wurzel[milvus]
Document Processing¶
Backend Support¶
Development Tools¶
# For linting and code quality tools
pip install wurzel[lint]
# For testing framework and tools
pip install wurzel[test]
# For documentation generation
pip install wurzel[docs]
Install Everything¶
# Install all optional dependencies
pip install wurzel[all]
# Don't forget the direct dependencies!
pip install -r DIRECT_REQUIREMENTS.txt
Docker Installation¶
The Docker image includes all dependencies automatically and is the easiest way to get started:
# Pull the latest image
docker pull ghcr.io/telekom/wurzel:latest
# Or build locally
docker build -t wurzel .
Running with Docker¶
docker run \
-e GIT_USER=wurzel-demo \
-e GIT_MAIL=demo@example.com \
-e DVC_DATA_PATH=/usr/app/data \
-e DVC_FILE=/usr/app/dvc.yaml \
-e WURZEL_PIPELINE=pipelinedemo:pipeline \
ghcr.io/telekom/wurzel:latest
Environment Configuration¶
When running Wurzel, several environment variables can be configured to customize behavior:
Git Configuration¶
GIT_USER
: Git username for repository operations (default:wurzel
)GIT_MAIL
: Git email for repository operations (default:wurzel@example.com
)
DVC Configuration¶
DVC_DATA_PATH
: Path where DVC stores data files (default:/usr/app/dvc-data
)DVC_FILE
: Path to the DVC pipeline definition file (default:/usr/app/dvc.yaml
)DVC_CACHE_HISTORY_NUMBER
: Number of cache versions to keep (default:30
)
Pipeline Configuration¶
WURZEL_PIPELINE
: Specifies which pipeline to execute (e.g.,pipelinedemo:pipeline
)
Monitoring (Optional)¶
PROMETHEUS__GATEWAY
: Prometheus pushgateway URL for metrics
For backend-specific configuration, see:
Verification¶
Verify Basic Installation¶
# Check Wurzel installation
python -c "import wurzel; print('Wurzel installed successfully')"
# Check CLI is available
wurzel --help
Verify spaCy Model¶
# Test spaCy model loading
python -c "import spacy; nlp = spacy.load('de_core_news_sm'); print('spaCy model loaded successfully')"
Run Tests¶
Troubleshooting¶
Common Issues¶
spaCy Model Issues¶
If you encounter issues with the spaCy model:
- Verify the model is installed:
- Reinstall the model if needed:
pip uninstall de-core-news-sm
pip install https://github.com/explosion/spacy-models/releases/download/de_core_news_sm-3.7.0/de_core_news_sm-3.7.0-py3-none-any.whl
Environment Issues¶
- Python Version: Ensure you're using Python 3.11 or 3.12
- Virtual Environment: Always use a virtual environment to avoid conflicts:
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
pip install wurzel
Dependency Conflicts¶
- Clean Installation: Start with a fresh virtual environment
- Upgrade pip: Ensure you have the latest pip version:
Getting Help¶
If you encounter issues not covered here:
- Check the examples directory for working configurations
- Review the API documentation
- Consult the troubleshooting section in backend documentation
- Open an issue on the GitHub repository
Next Steps¶
Once you have Wurzel installed:
- Get Started with Development - Set up your development workflow
- Build Your First Pipeline - Learn the core concepts
- Explore Backends - Understand deployment options