Contributing guidelines

Contributions are welcome!

Big changes

Please open a new issue to discuss or propose a major change. Not only is it fun to discuss big ideas, but we might save each other’s time too. Perhaps some of the work you’re contemplating is already half-done in a development branch.

Code style

We use PEP8, black for code formatting and ruff for everything else. The settings for these programs are in pyproject.toml. Pull requests should follow the style guide. One difference we use from “black” style is that strings shown to the user are always in double quotes (") and strings for internal uses are in single quotes (').

Tests

New features should come with tests that confirm their correctness.

New dependencies

If you are proposing a change that will require a new dependency, we prefer dependencies that are already packaged by Debian or Red Hat. This makes life much easier for our downstream package maintainers. A package that is only available on PyPI or GitHub, and not more widely packaged, may not be accepted.

We are unlikely to accept a dependency on CUDA or other GPU-based libraries, because these are still difficult to package and install on many systems. We recommend implementing these changes as plugins.

Python dependencies must also be license-compatible. GPLv3 or AGPLv3 are likely incompatible with the project’s license, but LGPLv3 is compatible.

New non-Python dependencies

OCRmyPDF uses several external programs (Tesseract, Ghostscript and others) for its functionality. In general we prefer to avoid adding new external programs, and if we are to add external programs, we prefer those that are already packaged by Debian or Red Hat.

Plugins

Some new features may be a good fit for a plugin. Plugins are a way to add features to OCRmyPDF without adding them to the core program. Plugins are installed separately from OCRmyPDF. They are written in Python and can be installed from PyPI. See the plugin documentation.

We are happy to link users to your plugin from the documentation.

Style guide: Is it OCRmyPDF or ocrmypdf?

The program/project is OCRmyPDF and the name of the executable or library is ocrmypdf.