Re: PDF processing – The Document Foundation Mailing List Archives

Thorsten Behrens <thb -AT- libreoffice.org>

Thu, 5 Mar 2020 13:37:23 +0100

Hi,

Michael Weghorn wrote:

On 03/03/2020 12.26, Pietro Paolini wrote:

I wanted to have a look at the source code
to see if there is some sort of PDF "model" being built from the
original PDF document, for instance a set of objects each describing
the graphic meanings of a particular region within the page.


At a quick glance, 'sdext/source/pdfimport' looks like a good place to
start with; I personally don't know more related to your more specific
question.

Yep, that's the place - we currently use poppler to parse the PDF, then generate a tree of quite basic drawing operations from it.

Check sdext/source/pdfimport/tree/genericelements.cxx for the types of objects in that tree, and sdext/source/pdfimport/tree/{draw|writer}treevisiting.cxx for a visitor-pattern style of tree walking. For your needs, you could e.g. check the boundaries of each visited object to see whether they intersect with your region of interest.

Cheers,

-- Thorsten
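To make the suggestion above a bit more concrete, here is a minimal, standalone sketch of a visitor-style tree walk that collects the elements whose bounding box overlaps a region of interest. It does not use the actual pdfimport classes: Rect, Element, TextElement, FrameElement, PageElement, ElementVisitor, RegionCollector, labelsInRegion and makeSamplePage are all hypothetical stand-ins invented for illustration, and the real types in genericelements.cxx may well be shaped differently.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical axis-aligned bounding box in page coordinates.
struct Rect {
    double x, y, w, h;
};

// True if the two rectangles overlap with non-zero area.
bool intersects(const Rect& a, const Rect& b) {
    return a.x < b.x + b.w && b.x < a.x + a.w &&
           a.y < b.y + b.h && b.y < a.y + a.h;
}

struct TextElement;
struct FrameElement;
struct PageElement;

// One visit() overload per concrete element type (classic visitor pattern).
struct ElementVisitor {
    virtual ~ElementVisitor() = default;
    virtual void visit(TextElement&) = 0;
    virtual void visit(FrameElement&) = 0;
    virtual void visit(PageElement&) = 0;
};

// Hypothetical tree node: a bounding box, a label, and owned children.
struct Element {
    Rect bounds;
    std::string label;
    std::vector<std::unique_ptr<Element>> children;

    Element(Rect b, std::string l) : bounds(b), label(std::move(l)) {}
    virtual ~Element() = default;
    virtual void accept(ElementVisitor& v) = 0;

    void add(std::unique_ptr<Element> child) {
        children.push_back(std::move(child));
    }
};

struct TextElement : Element {
    using Element::Element;
    void accept(ElementVisitor& v) override { v.visit(*this); }
};

struct FrameElement : Element {
    using Element::Element;
    void accept(ElementVisitor& v) override { v.visit(*this); }
};

struct PageElement : Element {
    using Element::Element;
    void accept(ElementVisitor& v) override { v.visit(*this); }
};

// Walks the tree and records the label of every text or frame element
// whose bounds overlap the region of interest.
class RegionCollector : public ElementVisitor {
public:
    explicit RegionCollector(Rect region) : region_(region) {
        assert(region.w >= 0 && region.h >= 0);
    }

    void visit(TextElement& e) override { record(e); }
    void visit(FrameElement& e) override { record(e); descend(e); }
    void visit(PageElement& e) override { descend(e); }  // page itself is not recorded

    const std::vector<std::string>& hits() const { return hits_; }

private:
    void record(Element& e) {
        if (intersects(e.bounds, region_)) hits_.push_back(e.label);
    }
    void descend(Element& e) {
        for (auto& child : e.children) child->accept(*this);
    }

    Rect region_;
    std::vector<std::string> hits_;
};

// Convenience wrapper: labels of all matching elements, in visiting order.
std::vector<std::string> labelsInRegion(Element& root, Rect region) {
    RegionCollector collector(region);
    root.accept(collector);
    return collector.hits();
}

// Small made-up page: a frame containing a title, plus a footer line.
std::unique_ptr<PageElement> makeSamplePage() {
    auto page = std::make_unique<PageElement>(Rect{0, 0, 595, 842}, "page");
    auto frame = std::make_unique<FrameElement>(Rect{50, 50, 200, 100}, "frame");
    frame->add(std::make_unique<TextElement>(Rect{60, 60, 80, 12}, "title"));
    page->add(std::move(frame));
    page->add(std::make_unique<TextElement>(Rect{400, 700, 100, 12}, "footer"));
    return page;
}
```

Calling labelsInRegion(*makeSamplePage(), Rect{0, 0, 150, 150}) on this made-up page yields the frame and its title but not the footer. In the real importer the same intersection check would presumably go into the visit methods of a visitor modelled on the existing draw/writer ones.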


