Providers in Docprompt
Overview
Providers in Docprompt are abstract interfaces that define how to add data to document nodes. They encapsulate various tasks such as OCR, classification, and more. The provider system is designed to be extensible, allowing users to create custom providers to add new functionality to Docprompt.
Key Concepts
AbstractTaskProvider
The AbstractTaskProvider is the base class for all providers in Docprompt. It defines the interface that all task providers must implement.
class AbstractTaskProvider(Generic[PageTaskResult]):
    name: str
    capabilities: List[str]

    def process_document_pages(
        self,
        document: Document,
        start: Optional[int] = None,
        stop: Optional[int] = None,
        **kwargs,
    ) -> Dict[int, PageTaskResult]:
        raise NotImplementedError

    def contribute_to_document_node(
        self,
        document_node: "DocumentNode",
        results: Dict[int, PageTaskResult],
    ) -> None:
        pass

    def process_document_node(
        self,
        document_node: "DocumentNode",
        start: Optional[int] = None,
        stop: Optional[int] = None,
        contribute_to_document: bool = True,
        **kwargs,
    ) -> Dict[int, PageTaskResult]:
        # ... implementation ...
Key features:
- The generic type PageTaskResult allows for type-safe results
- The capabilities list defines what the provider can do
- The process_document_pages method processes pages of a document
- The contribute_to_document_node method adds results to a DocumentNode
- The process_document_node method combines processing and contributing results
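The relationship between the three methods can be sketched with simplified stand-ins. Everything in the block below (`FakeNode`, `SketchProvider`, `WordCountProvider`) is hypothetical and is not Docprompt's implementation, which is elided above. The sketch only assumes that `process_document_node` calls `process_document_pages` and then, when `contribute_to_document` is true, hands the results to `contribute_to_document_node`. For brevity it passes the same node object to both steps, and it assumes 1-based, inclusive page ranges.

```python
from typing import Dict, Generic, List, Optional, TypeVar

T = TypeVar("T")


class FakeNode:
    """Hypothetical stand-in for DocumentNode: page texts plus per-page metadata."""

    def __init__(self, pages: List[str]) -> None:
        self.pages = pages
        self.metadata: Dict[int, dict] = {}


class SketchProvider(Generic[T]):
    """Simplified stand-in for AbstractTaskProvider (not the real class)."""

    name: str = "sketch"

    def process_document_pages(self, node: FakeNode, start: Optional[int] = None,
                               stop: Optional[int] = None, **kwargs) -> Dict[int, T]:
        raise NotImplementedError

    def contribute_to_document_node(self, node: FakeNode, results: Dict[int, T]) -> None:
        pass

    def process_document_node(self, node: FakeNode, start: Optional[int] = None,
                              stop: Optional[int] = None,
                              contribute_to_document: bool = True,
                              **kwargs) -> Dict[int, T]:
        # Assumed flow: process first, then optionally write results to the node.
        results = self.process_document_pages(node, start=start, stop=stop, **kwargs)
        if contribute_to_document:
            self.contribute_to_document_node(node, results)
        return results


class WordCountProvider(SketchProvider[int]):
    """Toy provider that counts whitespace-separated words per page."""

    name = "word-count"

    def process_document_pages(self, node, start=None, stop=None, **kwargs):
        # Assumption: pages are numbered from 1 and start/stop are inclusive.
        first = start or 1
        last = stop or len(node.pages)
        return {n: len(node.pages[n - 1].split()) for n in range(first, last + 1)}

    def contribute_to_document_node(self, node, results):
        for page_number, count in results.items():
            node.metadata.setdefault(page_number, {})[self.name] = count


provider = WordCountProvider()

node = FakeNode(["hello world", "one two three"])
results = provider.process_document_node(node)

# With contribute_to_document=False the node is left untouched.
scratch = FakeNode(["a b"])
preview = provider.process_document_node(scratch, contribute_to_document=False)
```

In this sketch a subclass only overrides the two hooks; the combined method stays in the base class, so every provider gets the same "process, then contribute" behavior.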
CAPABILITIES
The CAPABILITIES enum defines the various capabilities that a provider can have:
class CAPABILITIES(Enum):
    PAGE_RASTERIZATION = "page-rasterization"
    PAGE_LAYOUT_OCR = "page-layout-ocr"
    PAGE_TEXT_OCR = "page-text-ocr"
    PAGE_CLASSIFICATION = "page-classification"
    PAGE_SEGMENTATION = "page-segmentation"
    PAGE_VQA = "page-vqa"
    PAGE_TABLE_IDENTIFICATION = "page-table-identification"
    PAGE_TABLE_EXTRACTION = "page-table-extraction"
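Because providers list their capabilities as the enum's string values, selecting a provider for a task can be a simple membership test. The following is a minimal sketch under that assumption: it redefines a two-member subset of the enum so the block runs on its own, and `FakeOcrProvider`, `FakeClassifier`, and `providers_with` are hypothetical names, not part of Docprompt.

```python
from enum import Enum
from typing import List


class CAPABILITIES(Enum):
    # Subset of the enum above, repeated so this sketch is self-contained.
    PAGE_TEXT_OCR = "page-text-ocr"
    PAGE_CLASSIFICATION = "page-classification"


class FakeOcrProvider:
    """Hypothetical provider that declares a single OCR capability."""

    name = "fake-ocr"
    capabilities = [CAPABILITIES.PAGE_TEXT_OCR.value]


class FakeClassifier:
    """Hypothetical provider that declares a classification capability."""

    name = "fake-classifier"
    capabilities = [CAPABILITIES.PAGE_CLASSIFICATION.value]


def providers_with(capability: CAPABILITIES, providers: List[object]) -> List[object]:
    """Return the providers whose capabilities list contains the given capability."""
    return [p for p in providers if capability.value in p.capabilities]


matches = providers_with(
    CAPABILITIES.PAGE_TEXT_OCR, [FakeOcrProvider(), FakeClassifier()]
)
names = [p.name for p in matches]

# A stored string value can be turned back into the enum member.
member = CAPABILITIES("page-text-ocr")
```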
ResultContainer
The ResultContainer is a generic class that holds the results of a task:
class ResultContainer(BaseModel, Generic[PageOrDocumentTaskResult]):
    results: Dict[str, PageOrDocumentTaskResult] = Field(
        description="The results of the task, keyed by provider", default_factory=dict
    )

    @property
    def result(self):
        return next(iter(self.results.values()), None)
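The `result` property returns the first stored value, or `None` when nothing has been stored yet. The sketch below illustrates that behavior with a plain dataclass stand-in (`SketchResultContainer` is a hypothetical name used so the block runs without Pydantic); the provider keys are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass
class SketchResultContainer(Generic[T]):
    """Dataclass stand-in mirroring ResultContainer's fields and property."""

    results: Dict[str, T] = field(default_factory=dict)

    @property
    def result(self) -> Optional[T]:
        # Dicts preserve insertion order, so this is the first result added.
        return next(iter(self.results.values()), None)


empty: SketchResultContainer[str] = SketchResultContainer()

container: SketchResultContainer[str] = SketchResultContainer()
container.results["provider-a"] = "text from A"
container.results["provider-b"] = "text from B"

first = container.result
by_key = container.results["provider-b"]
```

When several providers have contributed results for the same task, indexing `results` by provider key is the unambiguous way to pick one; `result` is a convenience for the common single-provider case.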
Creating Custom Providers
To extend Docprompt's functionality, you can create custom providers. Here's a shortened example of a built-in OCR provider for GCP:
from typing import Dict, List, Optional

from docprompt.tasks.base import AbstractTaskProvider, CAPABILITIES
from docprompt.schema.layout import TextBlock
from pydantic import Field

# BasePageResult, Document, and DocumentNode are Docprompt types; their
# imports are omitted in this shortened example.


class OcrPageResult(BasePageResult):
    page_text: str = Field(description="The text for the entire page in reading order")
    word_level_blocks: List[TextBlock] = Field(default_factory=list)
    line_level_blocks: List[TextBlock] = Field(default_factory=list)
    block_level_blocks: List[TextBlock] = Field(default_factory=list)
    raster_image: Optional[bytes] = Field(default=None)


class GoogleOcrProvider(AbstractTaskProvider[OcrPageResult]):
    name = "Google Document AI"
    capabilities = [
        CAPABILITIES.PAGE_TEXT_OCR.value,
        CAPABILITIES.PAGE_LAYOUT_OCR.value,
        CAPABILITIES.PAGE_RASTERIZATION.value,
    ]

    def process_document_pages(
        self,
        document: Document,
        start: Optional[int] = None,
        stop: Optional[int] = None,
        **kwargs,
    ) -> Dict[int, OcrPageResult]:
        # Implement OCR logic here
        pass

    def contribute_to_document_node(
        self,
        document_node: "DocumentNode",
        results: Dict[int, OcrPageResult],
    ) -> None:
        # Add OCR results to document node
        pass
Usage
Here's how you can use a provider in your Docprompt workflow:
from docprompt import load_document, DocumentNode
from docprompt.providers.ocr import GoogleOcrProvider

# Load a document
document = load_document("path/to/my.pdf")
document_node = DocumentNode.from_document(document)

# Create and use the OCR provider
ocr_provider = GoogleOcrProvider(...)
ocr_results = ocr_provider.process_document_node(document_node)

# Access OCR results
for page_number, result in ocr_results.items():
    print(f"Page {page_number} text: {result.page_text[:100]}...")
Benefits of Using Providers
- Extensibility: Easily add new functionality to Docprompt by creating custom providers.
- Modularity: Each provider encapsulates a specific task, making the codebase more organized and maintainable.
- Type Safety: Generic types let static type checkers verify that providers produce and consume the correct result types.
- Standardized Interface: All providers follow the same interface, making it easy to switch between different implementations.
- Capability-based Design: Providers declare their capabilities, allowing for dynamic feature discovery and usage.
By building on the provider system, you can assemble document processing pipelines that are easy to extend and customize for your specific needs.