_pdfium
chunk_iterable(iterable, chunk_size)
Splits an iterable into chunks of specified size, distributing the remainder evenly.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
iterable | Iterable[T] | The iterable to be chunked. | required |
chunk_size | int | The desired size of each chunk. | required |
Returns:
Type | Description |
---|---|
List[List[T]] | A list of lists, where each sublist is a chunk. |
Source code in docprompt/_pdfium.py
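The description above does not spell out what "distributing the remainder evenly" means. The following is a minimal sketch of one plausible reading: use the fewest chunks that keep every chunk at or below `chunk_size`, then spread the leftover items across the leading chunks. The name `chunk_evenly_sketch` is hypothetical and this is not the library's implementation.

```python
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def chunk_evenly_sketch(iterable: Iterable[T], chunk_size: int) -> List[List[T]]:
    """Hypothetical stand-in for chunk_iterable; the real behavior may differ."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    items = list(iterable)
    if not items:
        return []
    # Fewest chunks such that no chunk exceeds chunk_size (ceiling division).
    num_chunks = -(-len(items) // chunk_size)
    # Every chunk gets `base` items; the first `extra` chunks get one more.
    base, extra = divmod(len(items), num_chunks)
    chunks: List[List[T]] = []
    start = 0
    for i in range(num_chunks):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks
```

Under this reading, ten items with `chunk_size=4` yield chunks of sizes 4, 3, and 3 rather than 4, 4, and 2.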
get_pdfium_document(fp, password=None)
Loads a PDF document while holding a lock, to prevent race conditions in threaded environments.
rasterize_page_with_pdfium(fp, page_number, *, return_mode='pil', post_process_fn=None, **kwargs)
Rasterizes a page of a PDF document
rasterize_pdf_with_pdfium(fp, password=None, *, return_mode='pil', post_process_fn=None, **kwargs)
Rasterizes an entire PDF using PDFium and a pool of workers
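As a rough illustration of fanning page rendering out to a pool of workers, the sketch below maps a caller-supplied render function over page numbers. The names `render_pages_with_pool` and `render_page` are hypothetical, and a thread pool is used only to keep the example self-contained. The library may well use worker processes instead, which would require the render function and its arguments to be picklable.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")


def render_pages_with_pool(
    page_numbers: List[int],
    render_page: Callable[[int], T],
    max_workers: int = 4,
) -> Dict[int, T]:
    """Render each page on a worker pool and key the results by page number."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, whichever worker finishes first.
        results = list(pool.map(render_page, page_numbers))
    return dict(zip(page_numbers, results))
```

In practice `render_page` would wrap a single-page rasterizer such as `rasterize_page_with_pdfium(fp, page_number)`.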
rasterize_pdfs_with_pdfium(fps, passwords=None, *, return_mode='pil', post_process_fn=None, **kwargs)
Like `rasterize_pdf_with_pdfium`, but optimized for multiple PDFs by loading all of the PDFs into the workers' memory space.