

Leo -

Under Linux there's a utility 'pdftotext' that might be able to produce a .txt file with the plain-text contents of your PDF file. I don't know what's available under Windows (if that's what you're using).
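If Python is at hand, the pdftotext call could be wrapped roughly as follows. This is only a sketch under assumptions: it presumes pdftotext is installed and on the PATH, the helper names are made up for illustration, and 'book.pdf' is a placeholder file name. Note that pdftotext only extracts a text layer that is already in the PDF; it does not do OCR.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, txt_path, keep_layout=True):
    """Return the argument list for a pdftotext call (nothing is executed here)."""
    cmd = ["pdftotext"]
    if keep_layout:
        # -layout asks pdftotext to keep the physical page layout,
        # which tends to keep tabular columns aligned.
        cmd.append("-layout")
    cmd += [str(pdf_path), str(txt_path)]
    return cmd


def pdf_to_text(pdf_path, txt_path=None, keep_layout=True):
    """Run pdftotext and return the path of the text file it wrote."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on PATH")
    pdf_path = Path(pdf_path)
    # Default output name: same as the PDF, with a .txt suffix.
    txt_path = Path(txt_path) if txt_path else pdf_path.with_suffix(".txt")
    subprocess.run(build_pdftotext_command(pdf_path, txt_path, keep_layout), check=True)
    return txt_path


if __name__ == "__main__":
    # 'book.pdf' is a placeholder; only the command is printed here.
    print(build_pdftotext_command("book.pdf", "book.txt"))
```

Calling pdf_to_text("book.pdf") would then write book.txt next to the PDF, provided the tool accepts the file.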

(You say 'text in a Draw file... Is that useful?' – by their nature, draw programmes (unlike paint programmes) do retain the original text of whatever appears graphically.)

- Robert

On 2025-06-08 19:45, Leo L te Braake wrote:
Hi,

Subject says it all: I found a book of more than 200 pages of genealogical
data (Papal Zouave army 1860-70). There's a good-quality PDF in the BNF
Gallica, and a worse one I found on Geneanet. But: this one has been
OCRed. My own OCR capacity has only one limit, but it is an extremely low
one.

I need the data in a database or spreadsheet. That's not very
difficult: been there, done that. But most of the time my workhorse
of the last 30 years, MS Office, was sufficient. To get to my
target, I need a text file to which I can apply a series of S&R
commands to turn it into .CSV, a route with no problems for me.
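
As an illustration of that search-and-replace step, here is a minimal Python sketch. It rests on assumptions that may not hold for this book: one record per line, with fields separated by runs of two or more spaces or by tabs. The function names and the sample lines are placeholders, not real data.

```python
import csv
import io
import re

# Assumed field separator: two or more spaces, or one or more tabs.
FIELD_GAP = re.compile(r" {2,}|\t+")


def lines_to_rows(lines):
    """Split each non-empty line into fields at the assumed gaps."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        rows.append(FIELD_GAP.split(line))
    return rows


def rows_to_csv(rows):
    """Serialise rows as CSV text; fields containing commas get quoted."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder lines standing in for the extracted text.
    sample = ["Name One   Place One   1861", "", "  Name Two  Place Two  1867 "]
    print(rows_to_csv(lines_to_rows(sample)), end="")
```

Real pages will probably need extra rules, for example for records that wrap over several lines.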

1. Adobe can't export or 'save as' the PDF, because it's somehow protected.
2. Microsoft (Word 2007 and 2016) can't open it or import it because it
   is Microsoft, and can't be bothered.
3. I tried to open the PDF in LibreOffice, and when dinner was
   finished, so was LO: it had fabricated a Big Beautiful Bill File.

This file sat in a nameless window overlaying LO Draw. I am not
proficient in LO, and this was the first time I encountered such a
floating window. It appeared that I could select a page in the file,
click on it and have the graphic image selected. Press Delete, and only
the text remained. NB: text in a Draw file... Is that useful?
But it also proved error-prone: sometimes I deleted not the graphic
image, but the whole page. So besides being a long and tedious
process, this is just not the ideal way to handle, say, 250 pages.

 * If I somehow get this text without the graphics in a LO Draw file,
   will I be able to make a Writer file out of it?
 * Is there a better route between the PDF and a .csv file?


Leo

--
To unsubscribe e-mail to: users+unsubscribe@global.libreoffice.org
Problems? https://www.libreoffice.org/get-help/mailing-lists/how-to-unsubscribe/
Posting guidelines + more: https://wiki.documentfoundation.org/Netiquette
List archive: https://listarchives.libreoffice.org/global/users/
Privacy Policy: https://www.documentfoundation.org/privacy
