

Hi,

Subject says it all: I found a book of more than 200 pages of genealogical data (Papal Zouave army, 1860-70). There's a good-quality PDF in the BNF Gallica, and a worse one I found on Geneanet. But that one has been OCRed. My own OCR capacity has only one limit, but it is an extremely low one.

I need the data in a database or spreadsheet. That's not very difficult: been there, done that. But most of the time my workhorse of the last 30 years, MS Office, was sufficient. To get to my target, I need a text file to which I can apply a series of S&R (search and replace) commands to turn it into .CSV, a route that poses no problems for me.
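
For illustration only, the S&R-to-CSV step could look roughly like the Python sketch below. The record layout (`SURNAME (Given name), place, date`), the sample lines, and the names `RECORD`, `text_to_rows` and `rows_to_csv` are all hypothetical: the actual layout of the book is not described here, so the pattern would have to be adapted to the real text.

```python
import csv
import io
import re

# Hypothetical sample lines; the real layout of the book is not known here.
SAMPLE = """\
DUPONT (Jean),  Lille,  12 mars 1861
de  VRIES (Pieter), Amsterdam, 3 janvier 1866
"""

# Assumed record pattern: SURNAME (Given name), place, date
RECORD = re.compile(
    r"^(?P<surname>[^()]+?)\s*\((?P<given>[^)]+)\),"
    r"\s*(?P<place>[^,]+),\s*(?P<date>.+)$"
)
FIELDS = ("surname", "given", "place", "date")


def clean(field):
    """Collapse runs of whitespace, a common leftover in OCRed text."""
    return re.sub(r"\s+", " ", field).strip()


def text_to_rows(text):
    """Split text into matched records and lines that need a manual look."""
    rows, rejects = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = RECORD.match(line)
        if match:
            rows.append([clean(match.group(name)) for name in FIELDS])
        else:
            rejects.append(line)
    return rows, rejects


def rows_to_csv(rows):
    """Write rows as CSV text; the csv module quotes fields where needed."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(FIELDS)
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    rows, rejects = text_to_rows(SAMPLE)
    print(rows_to_csv(rows), end="")
    print(f"{len(rejects)} line(s) did not match the assumed pattern")
```

Lines that do not fit the assumed pattern are collected separately rather than dropped, so OCR errors can be corrected by hand afterwards.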

1. Adobe can't export or 'save as' the PDF, because it's somehow protected.
2. Microsoft (Word 2007 and 2016) can't open it or import it because it
   is Microsoft, and can't be bothered.
3. I tried to open the PDF in LibreOffice, and when dinner was
   finished, so was LO: it had fabricated a Big Beautiful Bill File.

This file sat in a nameless window overlaying LO Draw. I am not proficient in LO, and this was the first time I encountered such a floating window. It turned out that I could select a page in the file, click on it, and have the graphic image selected. Press Delete and only the text remained. NB: text in a Draw file... Is that useful? But it also proved error-prone: sometimes I deleted not the graphic image but the whole page. So besides being a long and tedious process, this is just not the ideal way to handle, say, 250 pages.

 * If I somehow get this text without the graphics in an LO Draw file,
   will I be able to make a Writer file out of it?
 * Is there a better route between the PDF and a .csv file?


Leo




--
To unsubscribe e-mail to: users+unsubscribe@global.libreoffice.org
Problems? https://www.libreoffice.org/get-help/mailing-lists/how-to-unsubscribe/
Posting guidelines + more: https://wiki.documentfoundation.org/Netiquette
List archive: https://listarchives.libreoffice.org/global/users/
Privacy Policy: https://www.documentfoundation.org/privacy