Hi,
Subject says it all: I found a book of more than 200 pages of genealogical data
(Papal Zouave army 1860-70). There's a good-quality PDF in the BNF
Gallica, and a worse one I found on Geneanet. But: this one has been
OCRed. My own OCR capacity has only one limit, but it is an extremely low
one.
I need the data in a database or spreadsheet. That's not very
difficult: been there, done that. Most of the time, my workhorse of
the last 30 years, MS Office, was sufficient. But to reach my
target, I need a text file to which I can apply a series of
search-and-replace (S&R) commands to turn it into .CSV, a route with
no problems for me.
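For reference, the S&R-to-CSV step can also be scripted. Below is a minimal Python sketch that assumes a hypothetical one-record-per-line layout such as "1234. JANSEN, Petrus Johannes, 1845, Utrecht". The book's real layout isn't described here, so the pattern, field names and sample line are made up for illustration and would need adapting:

```python
import csv
import io
import re

# HYPOTHETICAL record layout, e.g. "1234. JANSEN, Petrus Johannes, 1845, Utrecht"
# (number, surname in capitals, given names, birth year, birthplace).
# Note: [A-Z] does not match accented capitals such as "É"; widen it if needed.
RECORD = re.compile(
    r"^(?P<number>\d+)\.\s+"
    r"(?P<surname>[A-Z][A-Z' -]*),\s+"
    r"(?P<given>.+?),\s+"
    r"(?P<born>\d{4}),\s+"
    r"(?P<place>.+?)\s*$"
)
FIELDS = ["number", "surname", "given", "born", "place"]


def text_to_csv(text: str) -> tuple[str, list[str]]:
    """Turn matching lines into CSV rows; return the CSV text plus the
    lines that did not match, so OCR errors can be fixed by hand."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(FIELDS)
    rejected = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        match = RECORD.match(line)
        if match:
            writer.writerow(match.group(*FIELDS))  # tuple of the five named groups
        else:
            rejected.append(line)
    return out.getvalue(), rejected
```

Lines that end up in the rejected list are often OCR misreads (a "1" read as "l", a missing comma). Fixing those by hand and re-running may be quicker than chasing them through a long chain of S&R commands.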
1. Adobe can't export or 'save as' the PDF, because it's somehow protected.
2. Microsoft Word (2007 and 2016) can't open or import it, because it
is Microsoft, and can't be bothered.
3. I tried to open the PDF in LibreOffice, and when dinner was
finished, so was LO: it had fabricated a Big Beautiful Bill File.
This file sat in a nameless window overlaying LO Draw. I am not
proficient in LO, and this was the first time I encountered such a
floating window. It appeared that I could select a page in the file,
click on it and have the graphic image selected. Pressing Delete left
only the text. NB: text in a Draw file... Is that useful?
But it also proved error-prone: sometimes I deleted not the graphic
image, but the whole page. So besides this being a tedious and long
process, this is just not the ideal way to handle, say, 250 pages.
* If somehow I get this text without the graphics in an LO Draw file,
will I be able to make a Writer file out of it?
* Is there a better route between the PDF and a .csv file?
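One possible route that skips Draw: since the PDF already carries an OCR text layer, a script may be able to pull that layer out directly. A rough Python sketch follows, assuming the third-party pypdf library (not mentioned in this thread). The decrypt step can only help when the file opens without a password (permission-only "protection"); files that need a password to open, or that use encryption pypdf can't handle without extra packages, will still fail:

```python
import re
from pathlib import Path


def clean_ocr_text(text: str) -> str:
    """Light clean-up of an OCR text layer before the S&R/CSV step."""
    # Re-join words split by an end-of-line hyphen ("Zoua-\nven" -> "Zouaven").
    # Caution: this also joins genuinely hyphenated names broken at a line end.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of spaces and tabs left over from the page layout.
    text = re.sub(r"[ \t]+", " ", text)
    return "\n".join(line.strip() for line in text.splitlines())


def pdf_to_text(pdf_path: str, txt_path: str) -> int:
    """Extract the existing OCR text layer of every page; return the page count."""
    from pypdf import PdfReader  # third-party, assumed installed: pip install pypdf

    reader = PdfReader(pdf_path)
    if reader.is_encrypted:
        # Assumption: only helps if the PDF opens with an empty user password.
        reader.decrypt("")
    pages = [page.extract_text() or "" for page in reader.pages]
    Path(txt_path).write_text(clean_ocr_text("\n".join(pages)), encoding="utf-8")
    return len(pages)
```

Usage would be something like pdf_to_text("zouaves.pdf", "zouaves.txt") (hypothetical file names). The resulting text file can then go through the usual S&R commands, or a script like the one sketched earlier, to become a .CSV. How good the output is depends entirely on the quality of the existing OCR layer.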
Leo
Context
- [libreoffice-users] Scanned and OCR's PDF to text · Leo L te Braake
Copyright information: Unless otherwise specified, all text and images
on this website are licensed under the Creative Commons
Attribution-Share Alike 3.0 License. This does not include the source
code of LibreOffice, which is licensed under the Mozilla Public
License (MPLv2).