API reference

This page summarizes the rest of the public API. Generally speaking, it is mainly of interest to plugin developers.

ocrmypdf

ocrmypdf.exceptions

ocrmypdf.helpers

Support functions.

exception ocrmypdf.helpers.NeverRaise

An exception that is never raised.

class ocrmypdf.helpers.Resolution(x: T, y: T)

The number of pixels per inch in each 2D direction.

Resolution objects are considered “equal” for == purposes if their x and y values are equal within a reasonable tolerance.
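Tolerance-based equality can be sketched with `math.isclose`. The class below is a minimal stand-in written for illustration, not ocrmypdf's source, and the relative tolerance of `1e-3` is a hypothetical value, because the documentation only promises "a reasonable tolerance".

```python
import math


class Resolution:
    """Illustrative stand-in for ocrmypdf.helpers.Resolution (not its source)."""

    # Hypothetical tolerance; the docs only say "a reasonable tolerance".
    REL_TOL = 1e-3

    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Resolution):
            return NotImplemented
        # Compare each axis with a relative tolerance instead of exact equality,
        # so that tiny floating-point differences do not break comparisons.
        return math.isclose(self.x, other.x, rel_tol=self.REL_TOL) and math.isclose(
            self.y, other.y, rel_tol=self.REL_TOL
        )

    def __repr__(self) -> str:
        return f"Resolution({self.x!r}, {self.y!r})"


print(Resolution(300.0, 300.0) == Resolution(299.9999, 300.0001))  # True
print(Resolution(300.0, 300.0) == Resolution(200.0, 300.0))  # False
```

Exact `==` on floats would treat a DPI computed as 299.9999 as different from 300, which is rarely what PDF processing code wants.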

flip_axis() → Resolution[T]

Return a new Resolution object with x and y swapped.

property is_finite: bool

True if both x and y are finite numbers.

property is_square: bool

True if the resolution is square (x == y).

round(ndigits: int) → Resolution

Round to ndigits after the decimal point.

take_max(vals: Iterable[Any], yvals: Iterable[Any] | None = None) → Resolution

Return a new Resolution object with the maximum resolution of inputs.

take_min(vals: Iterable[Any], yvals: Iterable[Any] | None = None) → Resolution

Return a new Resolution object with the minimum resolution of inputs.
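The two calling forms of `take_max` and `take_min` can be sketched as follows. This is an illustration under stated assumptions, not ocrmypdf's implementation: it assumes that the object's own `x` and `y` take part in the comparison, that `vals` and `yvals` are separate iterables of x and y values when `yvals` is given, and that `vals` yields `(x, y)` pairs when `yvals` is omitted. The private helper `_take` is hypothetical.

```python
from __future__ import annotations

from typing import Any, Callable, Iterable


class Resolution:
    """Minimal stand-in for illustrating take_max/take_min (not ocrmypdf's source)."""

    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y

    def _take(
        self, vals: Iterable[Any], yvals: Iterable[Any] | None, pick: Callable
    ) -> Resolution:
        # Assumption: this object's own x and y take part in the comparison.
        if yvals is not None:
            # vals holds x values and yvals holds y values.
            return Resolution(pick([self.x, *vals]), pick([self.y, *yvals]))
        # Assumption: without yvals, vals yields (x, y) pairs.
        best_x, best_y = self.x, self.y
        for x, y in vals:
            best_x = pick(best_x, x)
            best_y = pick(best_y, y)
        return Resolution(best_x, best_y)

    def take_max(self, vals: Iterable[Any], yvals: Iterable[Any] | None = None) -> Resolution:
        return self._take(vals, yvals, max)

    def take_min(self, vals: Iterable[Any], yvals: Iterable[Any] | None = None) -> Resolution:
        return self._take(vals, yvals, min)


r = Resolution(150, 300)
hi = r.take_max([(200, 100), (72, 400)])
print(hi.x, hi.y)  # 200 400
lo = r.take_min([100, 400], [250, 350])
print(lo.x, lo.y)  # 100 250
```

Note that each axis is reduced independently, so the result need not match any single input pair.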

to_int() → Resolution[int]

Round to nearest integer.

to_scalar() → float

Return the harmonic mean of x and y as a 1D approximation.

A Resolution is always 2D, but it is usually “square” (x == y) and can then be represented by a single number. When it is not square, the harmonic mean is used to approximate the 2D resolution as a single number.
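For two values, the harmonic mean is 2 / (1/x + 1/y), which simplifies to 2xy / (x + y). The hypothetical helper below applies that formula directly; it illustrates the arithmetic described above and is not ocrmypdf's code.

```python
def resolution_to_scalar(x: float, y: float) -> float:
    """Hypothetical helper illustrating the documented Resolution.to_scalar()."""
    # Harmonic mean of two values: 2 / (1/x + 1/y) == 2xy / (x + y).
    # For a square resolution (x == y) this reduces to x itself.
    return 2 * x * y / (x + y)


print(resolution_to_scalar(300, 300))  # 300.0
print(resolution_to_scalar(300, 150))  # 200.0
```

For 300×150 DPI the harmonic mean is 200, lower than the arithmetic mean of 225, so the approximation leans toward the coarser axis.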

ocrmypdf.helpers.available_cpu_count() → int

Return the number of CPUs in the system.
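A CPU count can be obtained from the standard library, but `os.cpu_count()` may return `None` when the count cannot be determined. The hypothetical helper below guards against that with a fallback of 1; it is a sketch of the idea, not how ocrmypdf computes the value.

```python
import os


def cpu_count_sketch() -> int:
    """Hypothetical helper: number of CPUs, never less than 1."""
    # os.cpu_count() returns None if the count cannot be determined,
    # so fall back to a single CPU in that case.
    return os.cpu_count() or 1


print(cpu_count_sketch())
```

On Linux, `len(os.sched_getaffinity(0))` reports the CPUs the current process may actually use, which can be fewer than the system total inside containers or under CPU pinning.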

ocrmypdf.helpers.check_pdf(input_file: Path) → bool

Check if a PDF complies with the PDF specification.

Checks for proper formatting and proper linearization. Uses pikepdf (which, in turn, uses QPDF) to perform the checks.

ocrmypdf.helpers.clamp(n: T, smallest: T, largest: T) → T

Clamp n so that it lies between smallest and largest.
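Clamping is commonly written as a `min` nested inside a `max`. The function below is a minimal sketch of that idiom with the documented signature; it is not copied from ocrmypdf.

```python
from typing import TypeVar

T = TypeVar("T", int, float)


def clamp(n: T, smallest: T, largest: T) -> T:
    """Sketch: return n limited to the closed range [smallest, largest]."""
    # min() caps n at largest; max() then raises it to at least smallest.
    return max(smallest, min(n, largest))


print(clamp(5, 0, 10))   # 5
print(clamp(-3, 0, 10))  # 0
print(clamp(42, 0, 10))  # 10
```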

ocrmypdf.helpers.is_file_writable(test_file: PathLike) → bool

Test whether the target is writable. The test is intentionally racy.

We write to the output file only if processing succeeds, and then replace it atomically. This check confirms that the location is writable before the OCR work begins; because permissions can change afterward, it is an early warning rather than a guarantee.
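A best-effort writability check of this kind can be sketched with `os.access`. The helper below is hypothetical and simplified (ocrmypdf's real function may handle more cases): it assumes an existing target must itself be writable, and a new target needs an existing, writable parent directory. The result is racy by design, since permissions can change before the real write.

```python
from __future__ import annotations

import os
from pathlib import Path


def is_file_writable_sketch(test_file: os.PathLike | str) -> bool:
    """Hypothetical, intentionally racy writability check (illustrative only)."""
    p = Path(test_file)
    if p.exists():
        # An existing target must be a regular file that we may write to.
        return p.is_file() and os.access(p, os.W_OK)
    # A new file needs an existing parent directory that we may write into.
    return p.parent.is_dir() and os.access(p.parent, os.W_OK)
```

Calling this early lets a tool fail fast with a clear message instead of discovering an unwritable destination after a long OCR job.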

ocrmypdf.helpers.is_iterable_notstr(thing: Any) → bool

Is this an iterable type, other than a string?
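Strings are iterable in Python, which is why a plain `Iterable` check is not enough when a function accepts "one value or many". A minimal sketch of the documented check, not taken from ocrmypdf's source:

```python
from collections.abc import Iterable
from typing import Any


def is_iterable_notstr(thing: Any) -> bool:
    """Sketch: True for lists, tuples, generators, etc.; False for str."""
    # str is Iterable, so it must be excluded explicitly.
    return isinstance(thing, Iterable) and not isinstance(thing, str)


print(is_iterable_notstr([1, 2]))  # True
print(is_iterable_notstr("abc"))   # False
print(is_iterable_notstr(5))       # False
```

In this sketch `bytes` still counts as iterable; whether the library treats it the same way is not stated in the documentation.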

ocrmypdf.helpers.monotonic(seq: Sequence) → bool

Does this sequence increase monotonically?
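The documentation does not say whether repeated values count as "increasing". The sketch below assumes a strictly increasing sequence and compares each element with its successor; it is an illustration, not ocrmypdf's code.

```python
from collections.abc import Sequence


def monotonic(seq: Sequence) -> bool:
    """Sketch assuming 'increase monotonically' means strictly increasing."""
    # zip(seq, seq[1:]) yields each adjacent (previous, next) pair.
    return all(b > a for a, b in zip(seq, seq[1:]))


print(monotonic([1, 2, 5]))  # True
print(monotonic([1, 1, 2]))  # False
print(monotonic([3, 2]))     # False
```

Under a non-strict reading, replacing `>` with `>=` would make `[1, 1, 2]` pass.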

ocrmypdf.helpers.page_number(input_file: PathLike) → int

Get the one-based page number implied by the filename (000002.pdf -> 2).
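The filename convention can be illustrated by parsing the leading digits of the base name. The hypothetical helper below assumes the page number is the run of digits at the start of the filename; ocrmypdf's own parsing may differ.

```python
from __future__ import annotations

import os
import re
from pathlib import Path


def page_number_sketch(input_file: os.PathLike | str) -> int:
    """Hypothetical: parse the page number from a name like '000002.pdf'."""
    # Only the final path component matters; directories are ignored.
    match = re.match(r"\d+", Path(input_file).name)
    if match is None:
        raise ValueError(f"no page number in {input_file!r}")
    # int() discards the zero padding: '000002' -> 2
    return int(match.group())


print(page_number_sketch("000002.pdf"))  # 2
```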

ocrmypdf.helpers.pikepdf_enable_mmap() → None

Enable pikepdf memory mapping.

ocrmypdf.helpers.remove_all_log_handlers(logger: Logger) → None

Remove all log handlers, usually used in a child process.

The child process inherits the log handlers from the parent process when a fork occurs. Typically we want to remove all log handlers in the child process so that the child process can set up a single queue handler to forward log messages to the parent process.

ocrmypdf.helpers.safe_symlink(input_file: PathLike, soft_link_name: PathLike) → None

Create a symbolic link at soft_link_name, which references input_file.

Think of this as copying input_file to soft_link_name with less overhead.

Symlinks are created safely: self-linking loops are prevented, and an existing link at the destination is removed. On Windows, the file is copied instead, since creating symlinks may require administrator privileges.
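The documented rules can be sketched with `pathlib` and `shutil`. This hypothetical helper is an illustration, not ocrmypdf's code; it assumes that when both paths already resolve to the same file (a self-link, or an existing link to the same target) the safe action is to do nothing, and that only an existing symlink at the destination is removed.

```python
from __future__ import annotations

import os
import shutil
from pathlib import Path


def safe_symlink_sketch(input_file: os.PathLike | str, soft_link_name: os.PathLike | str) -> None:
    """Hypothetical illustration of the documented safe_symlink rules."""
    src, dst = Path(input_file), Path(soft_link_name)
    # If dst is src itself, or already resolves to it, a new link could form
    # a self-referencing loop, so there is nothing to do.
    if src.resolve() == dst.resolve():
        return
    # Remove an existing link at the destination (regular files are left alone).
    if dst.is_symlink():
        dst.unlink()
    if os.name == "nt":
        # Creating symlinks on Windows may require extra privileges; copy instead.
        shutil.copyfile(src, dst)
    else:
        # Link to an absolute path so the link works from any directory.
        dst.symlink_to(src.resolve())
```

Linking to an absolute target is a design choice in this sketch: a relative target would be interpreted relative to the link's own directory.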

ocrmypdf.helpers.samefile(file1: PathLike, file2: PathLike) → bool

Return True if two files are the same file.

Attempts to account for different relative paths to the same file.
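One way to make differently written paths comparable is to normalize both with `Path.resolve()`, which makes them absolute, collapses `..` components, and follows symlinks. The hypothetical helper below shows that approach; it is not necessarily how ocrmypdf implements `samefile`.

```python
from __future__ import annotations

import os
from pathlib import Path


def samefile_sketch(file1: os.PathLike | str, file2: os.PathLike | str) -> bool:
    """Hypothetical: compare two paths after normalizing them."""
    # resolve() makes each path absolute and follows symlinks, so
    # 'sub/../b.pdf' and an absolute path to b.pdf compare equal.
    return Path(file1).resolve() == Path(file2).resolve()
```

The standard library's `os.path.samefile` instead compares device and inode numbers; it also detects hard links but requires both files to exist.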

ocrmypdf.hocrtransform

ocrmypdf.pdfa

ocrmypdf.quality

ocrmypdf.subprocess