TET Product Family

Name: TET Product Family
SKU: 1398
Price: 1 IDR
Availability: InStock

Category: Utilities

What is PDFlib TET?

PDFlib TET (Text and Image Extraction Toolkit) reliably extracts text, images and metadata from PDF documents. TET delivers the text contents of a PDF as Unicode strings, together with detailed color, glyph and font information and the position of the text on the page. Raster images are extracted in common image formats. Optionally, TET converts PDF documents to an XML-based format called TETML, which contains text and metadata as well as resource information. TET includes advanced content analysis algorithms that determine word boundaries, group text into columns, identify table structures and remove redundant items such as shadow text.

With PDFlib TET you can:

Implement the PDF indexer for a search engine
Repurpose text and images in PDFs
Convert the contents of PDFs to other formats
Process PDFs based on their contents, e.g. splitting based on headings (requires PDFlib+PDI in addition to TET)
Check whether a particular location on the page is empty, e.g. for placing a barcode or stamp

TET also includes the pCOS interface for querying details about a PDF document, such as document information fields, XMP metadata, font lists, page size, and much more (see the pCOS product description and the pCOS Cookbook).
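
As an illustration of the "is this location empty?" use case listed above, the following minimal Python sketch tests a candidate stamp area against a list of text bounding boxes. It does not call the TET API: the `Box` type, the `overlaps` and `region_is_empty` helpers, and the sample coordinates are all hypothetical, and the sketch assumes the word boxes have already been obtained from an extraction step.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in page coordinates (hypothetical helper type)."""
    llx: float  # lower-left x
    lly: float  # lower-left y
    urx: float  # upper-right x
    ury: float  # upper-right y


def overlaps(a: Box, b: Box) -> bool:
    """Return True if the interiors of the two rectangles intersect."""
    return a.llx < b.urx and b.llx < a.urx and a.lly < b.ury and b.lly < a.ury


def region_is_empty(region: Box, content_boxes: Iterable[Box]) -> bool:
    """Return True if no content box intersects the candidate region."""
    return not any(overlaps(region, box) for box in content_boxes)


# Made-up word boxes standing in for positions reported by an extraction tool.
words = [Box(72, 700, 200, 712), Box(72, 680, 310, 692)]

# Candidate area for a barcode or stamp near the bottom right of the page.
stamp_area = Box(400, 50, 550, 120)

print(region_is_empty(stamp_area, words))
```

In practice the content boxes would also need to cover images and vector graphics, not only text, before a region could be treated as free.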