API reference
This page summarizes the rest of the public API. It is mainly of interest to plugin developers.
ocrmypdf
ocrmypdf.exceptions
ocrmypdf.helpers
Support functions.
- @ocrmypdf.helpers.deprecated(deprecated_in=None, removed_in=None, current_version=None, details='')
Decorate a function to signify its deprecation.
This function wraps a method that will soon be removed and does two things. First, it modifies the method's docstring to include a notice about deprecation, e.g., “Deprecated since 0.9.11. Use foo instead.” Second, it raises a DeprecatedWarning, a subclass of the built-in DeprecationWarning, via the warnings module.
Note that the built-in DeprecationWarning is ignored by default in most contexts, so users will only be informed of these warnings if they enable them; see the warnings module documentation for details.
- Parameters:
deprecated_in – The version at which the decorated method is considered deprecated. This will usually be the next version to be released when the decorator is added. The default is None, which effectively means immediate deprecation. If this is not specified, the removed_in and current_version arguments are ignored.
removed_in – The version or datetime.date when the decorated method will be removed. The default is None, meaning that the function is not currently planned to be removed. Note: this parameter cannot be set if deprecated_in=None.
current_version – The source of version information for the currently running code, usually the __version__ attribute of your library. The default is None. When current_version=None, the decorator cannot determine whether the wrapped function is actually in its deprecation period or due for removal, so a DeprecatedWarning is raised in all cases.
details – Extra details to add to the method docstring and warning. For example, the details may point users to a replacement method, such as “Use the foo_bar method instead”. By default there are no details.
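The two effects described above can be illustrated with a simplified stand-in built only on the standard library. sketch_deprecated and SketchDeprecatedWarning are hypothetical names, not part of ocrmypdf, and the sketch ignores removed_in and current_version entirely; with the real helper you would decorate with @ocrmypdf.helpers.deprecated(deprecated_in=..., details=...) instead. Note that the caller has to opt in to see the warning.

```python
import functools
import warnings


class SketchDeprecatedWarning(DeprecationWarning):
    """Hypothetical stand-in for DeprecatedWarning."""


def sketch_deprecated(deprecated_in=None, details=""):
    """Hypothetical, simplified decorator; ignores removed_in/current_version."""

    def decorator(func):
        notice = f"Deprecated since {deprecated_in}." if deprecated_in else "Deprecated."
        if details:
            notice += f" {details}"

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Effect 2: warn on every call, attributing it to the caller's line.
            warnings.warn(
                f"{func.__name__} is deprecated. {notice}",
                category=SketchDeprecatedWarning,
                stacklevel=2,
            )
            return func(*args, **kwargs)

        # Effect 1: prepend the notice to the original docstring.
        wrapper.__doc__ = f"{notice}\n\n{func.__doc__ or ''}"
        return wrapper

    return decorator


@sketch_deprecated(deprecated_in="0.9.11", details="Use foo instead.")
def old_function(x):
    """Double x."""
    return 2 * x


# DeprecationWarning subclasses are often filtered out; opt in explicitly.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_function(21)

print(result)  # 42
print(caught[0].category.__name__)  # SketchDeprecatedWarning
print(old_function.__doc__.splitlines()[0])  # Deprecated since 0.9.11. Use foo instead.
```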
- ocrmypdf.helpers.NeverRaise()
An exception that is never raised.
Deprecated since version 15.4.0.
- class ocrmypdf.helpers.Resolution(x: T, y: T)
The number of pixels per inch in each 2D direction.
Two Resolution objects are considered equal by == if their values agree within a reasonable tolerance.
- flip_axis() -> Resolution[T]
Return a new Resolution object with x and y swapped.
- property is_finite: bool
True if both x and y are finite numbers.
- property is_square: bool
True if the resolution is square (x == y).
- round(ndigits: int) -> Resolution
Round to ndigits after the decimal point.
- take_max(vals: Iterable[Any], yvals: Iterable[Any] | None = None) -> Resolution
Return a new Resolution object with the maximum resolution of the inputs.
- take_min(vals: Iterable[Any], yvals: Iterable[Any] | None = None) -> Resolution
Return a new Resolution object with the minimum resolution of the inputs.
- to_int() -> Resolution[int]
Round to the nearest integer.
- to_scalar() -> float
Return the harmonic mean of x and y as a 1D approximation.
A Resolution is 2D, but it is typically “square” (x == y), in which case it can be represented by a single number. When it is not square, the harmonic mean is used to approximate the 2D resolution as a single number.
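A hypothetical, simplified stand-in shows the tolerance-based equality and the harmonic mean used by to_scalar. SketchResolution is not ocrmypdf's class, and rel_tol=1e-6 is an assumed tolerance, since the real one is not documented here. For a 300 by 150 dpi image, the harmonic mean is 2 · 300 · 150 / (300 + 150) = 200 dpi.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True, eq=False)
class SketchResolution:
    """Hypothetical, simplified stand-in for ocrmypdf.helpers.Resolution."""

    x: float
    y: float

    def __eq__(self, other: object) -> bool:
        # Tolerance-based equality; rel_tol=1e-6 is an assumed value.
        if not isinstance(other, SketchResolution):
            return NotImplemented
        return math.isclose(self.x, other.x, rel_tol=1e-6) and math.isclose(
            self.y, other.y, rel_tol=1e-6
        )

    @property
    def is_finite(self) -> bool:
        return math.isfinite(self.x) and math.isfinite(self.y)

    @property
    def is_square(self) -> bool:
        return self.x == self.y

    def flip_axis(self) -> "SketchResolution":
        # Swap the horizontal and vertical resolutions.
        return SketchResolution(self.y, self.x)

    def to_scalar(self) -> float:
        # Harmonic mean 2xy / (x + y); equals x when the resolution is square.
        # Assumes positive resolutions, so x + y is never zero.
        return 2 * self.x * self.y / (self.x + self.y)


r = SketchResolution(300.0, 150.0)
print(r.to_scalar())  # 200.0
print(r.flip_axis())  # SketchResolution(x=150.0, y=300.0)
print(SketchResolution(300.0, 300.0) == SketchResolution(300.0000001, 300.0))  # True
```

The harmonic mean never exceeds the arithmetic mean and is pulled toward the smaller of the two values, so the scalar estimate leans toward the lower resolution.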
- ocrmypdf.helpers.available_cpu_count() -> int
Return the number of CPUs in the system.
- ocrmypdf.helpers.check_pdf(input_file: Path) -> bool
Check whether a PDF complies with the PDF specification.
Checks for proper formatting and proper linearization. Uses pikepdf (which in turn uses QPDF) to perform the checks.
- ocrmypdf.helpers.clamp(n: T, smallest: T, largest: T) -> T
Clamp the value of n so that it lies between smallest and largest.
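Assuming the bounds are inclusive (the entry above does not say), clamping reduces to a single expression. sketch_clamp is a hypothetical name used only for illustration.

```python
from typing import TypeVar

T = TypeVar("T", int, float)


def sketch_clamp(n: T, smallest: T, largest: T) -> T:
    """Hypothetical equivalent: limit n to the range [smallest, largest]."""
    return max(smallest, min(n, largest))


print(sketch_clamp(5, 0, 10))   # 5
print(sketch_clamp(-3, 0, 10))  # 0
print(sketch_clamp(42, 0, 10))  # 10
```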
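- ocrmypdf.helpers.is_file_writable(test_file: PathLike) -> bool
Test whether the target is writable. The test is intentionally racy: its answer can become stale before the file is actually written.
The output file is written only if processing succeeds, and it is then replaced atomically. Checking that the location is writable before the OCR work begins lets a permissions problem surface early instead of after the work is done.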
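The check-early, replace-atomically pattern described above can be sketched with the standard library. Both functions are hypothetical illustrations, not ocrmypdf's implementation. On POSIX systems os.replace() is an atomic rename only when source and destination are on the same filesystem, which is why the temporary file is created next to the target.

```python
import os
import tempfile
from pathlib import Path


def sketch_is_writable(target: Path) -> bool:
    """Hypothetical early check; intentionally racy, like the helper above."""
    if target.exists():
        return target.is_file() and os.access(target, os.W_OK)
    # A new file can be created if the parent directory is writable.
    return os.access(target.parent, os.W_OK)


def sketch_write_atomically(target: Path, data: bytes) -> None:
    """Hypothetical helper: readers see either the old file or the new one."""
    # Create the temporary file beside the target so os.replace() is a
    # same-filesystem rename.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.replace(tmp_name, target)
    except BaseException:
        os.unlink(tmp_name)
        raise


with tempfile.TemporaryDirectory() as workdir:
    output = Path(workdir) / "output.pdf"
    if sketch_is_writable(output):
        # ... the long-running OCR work would happen here ...
        sketch_write_atomically(output, b"%PDF-1.7\n")
    written = output.read_bytes()

print(written)  # b'%PDF-1.7\n'
```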
- ocrmypdf.helpers.is_iterable_notstr(thing: Any) -> bool
Return True if thing is an iterable type other than a string.
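One plausible way to express this check uses collections.abc.Iterable. sketch_is_iterable_notstr is a hypothetical name, and its edge cases may differ from ocrmypdf's: in this sketch bytes counts as iterable, and objects that are iterable only through __getitem__ are not detected, because isinstance(x, Iterable) looks for __iter__.

```python
from collections.abc import Iterable
from typing import Any


def sketch_is_iterable_notstr(thing: Any) -> bool:
    """Hypothetical equivalent: True for iterables other than str."""
    return isinstance(thing, Iterable) and not isinstance(thing, str)


print(sketch_is_iterable_notstr([1, 2]))  # True
print(sketch_is_iterable_notstr("abc"))   # False
print(sketch_is_iterable_notstr(5))       # False
```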
- ocrmypdf.helpers.page_number(input_file: PathLike) -> int
Get the one-based page number implied by the filename (000002.pdf -> 2).
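A sketch that parses the leading digits of the file name, assuming names follow the zero-padded pattern shown above. sketch_page_number is a hypothetical name, and ocrmypdf's own parsing may differ.

```python
import os
import re
from pathlib import Path


def sketch_page_number(input_file: "str | os.PathLike[str]") -> int:
    """Hypothetical equivalent: read the page number from leading digits."""
    match = re.match(r"\d+", Path(input_file).name)
    if match is None:
        raise ValueError(f"No page number in {input_file!r}")
    # int() drops the zero padding: "000002" -> 2.
    return int(match.group())


print(sketch_page_number("000002.pdf"))  # 2
```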
- ocrmypdf.helpers.pikepdf_enable_mmap() -> None
Enable pikepdf memory mapping.
- ocrmypdf.helpers.remove_all_log_handlers(logger: Logger) -> None
Remove all log handlers from logger. This is usually done in a child process.
When a fork occurs, the child process inherits the log handlers of the parent process. Typically the child should remove all of them so that it can set up a single queue handler that forwards log messages to the parent process.
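The child-side setup described above can be sketched with the standard logging module. sketch_remove_all_log_handlers is a hypothetical name, and a queue.Queue stands in for the multiprocessing queue that a real child process would share with its parent. This sketch only detaches handlers; it does not close them.

```python
import logging
import logging.handlers
import queue


def sketch_remove_all_log_handlers(logger: logging.Logger) -> None:
    """Hypothetical equivalent: detach every handler from logger."""
    # Iterate over a copy, since removeHandler() mutates logger.handlers.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)


# Simulate a handler inherited from the parent process.
log = logging.getLogger("sketch.child")
log.addHandler(logging.StreamHandler())

sketch_remove_all_log_handlers(log)

# Child-side setup: a single handler that forwards records through a queue.
log_queue: "queue.Queue[logging.LogRecord]" = queue.Queue()
log.addHandler(logging.handlers.QueueHandler(log_queue))
log.setLevel(logging.INFO)
log.propagate = False  # avoid duplicate output via the root logger
log.info("page 1 done")

record = log_queue.get_nowait()
print(len(log.handlers), record.getMessage())  # 1 page 1 done
```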
- ocrmypdf.helpers.safe_symlink(input_file: PathLike, soft_link_name: PathLike) -> None
Create a symbolic link at soft_link_name that references input_file.
Think of this as copying input_file to soft_link_name with less overhead.
Symlinks are used safely: self-linking loops are prevented, and an existing link at the destination is removed. On Windows, the file is copied instead, since creating symlinks may require administrator privileges.
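A minimal, POSIX-oriented sketch of the behavior described above. sketch_safe_symlink is a hypothetical name, and details such as raising ValueError for a self-link are assumptions rather than ocrmypdf's code. The self-link test resolves only the link's parent directory, so an existing link that already points at input_file is not mistaken for a loop.

```python
import os
import shutil
import tempfile
from pathlib import Path


def sketch_safe_symlink(input_file: Path, soft_link_name: Path) -> None:
    """Hypothetical sketch: link soft_link_name -> input_file safely."""
    input_file, soft_link_name = Path(input_file), Path(soft_link_name)
    # Resolve the parent only; resolving the link itself would follow it.
    link_path = soft_link_name.parent.resolve() / soft_link_name.name
    if input_file.resolve() == link_path:
        raise ValueError("refusing to create a self-referencing link")
    # Remove an existing link at the destination.
    if soft_link_name.is_symlink():
        soft_link_name.unlink()
    if os.name == "nt":
        # Symlinks may require administrator privileges on Windows; copy instead.
        shutil.copy2(input_file, soft_link_name)
    else:
        soft_link_name.symlink_to(input_file.resolve())


self_link_error = ""
with tempfile.TemporaryDirectory() as d:
    src = Path(d) / "000001.pdf"
    src.write_bytes(b"%PDF-1.7\n")
    link = Path(d) / "input.pdf"
    sketch_safe_symlink(src, link)
    sketch_safe_symlink(src, link)  # the existing link is replaced
    content = link.read_bytes()
    try:
        sketch_safe_symlink(src, src)
    except ValueError as exc:
        self_link_error = str(exc)

print(content)  # b'%PDF-1.7\n'
```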