At 16:38 20/07/2020 -0400, John Kaufmann wrote:
Documents archived in Project Gutenberg are typically simple text,
with each line ending in <CR><LF> (Hex:0D0A), so that paragraphs are
separated by an empty line <CR><LF><CR><LF>. I thought it would be
simple to convert one such (5657.txt) to format in Writer, ...
... but stumbled on elementary problems in Find-&-Replace [Ctrl-H]
using regular expressions:
(1) "\n" is not found. Should not "\n" match one of the codes in
<CR><LF>? [If not, what code(s) should "\n" match?]
First, once you have your text in a word processor, you do not have
<CR> or <LF> or <CR><LF> or anything else like that in your text;
instead you have *paragraph breaks*. There is no character there,
despite the pilcrow that you can get Writer to display. And what you
are calling "empty lines" are actually empty paragraphs. "\n" in the
"Search for" field matches line breaks, not paragraph breaks. (And
line breaks are just line breaks - also no "codes".)
(2) Although "$" is found (matches to <CR><LF>), ...
No, "$" does not match anything; instead, it anchors the expression
before it to the end of a paragraph. So an expression ending with "$"
will match text only if it comes at the end of its paragraph.
... "$$" (for successive occurrences of <CR><LF>) is not found. Why?
"$$" makes no sense. If anything it means "this pattern needs to match
something that is *really, really* at the end of a paragraph"!
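If it helps to see the "anchor, not character" idea outside Writer, here
is a minimal sketch in Python. Note the assumptions: Python's "re" module
is a different regex engine from the one in Writer, and the two
paragraphs are modelled here as lines of a single string, which is not
how Writer stores them. Odd patterns such as "$$" may therefore behave
differently in the two programs; the sketch only illustrates that "$"
marks a position rather than consuming anything.

```python
import re

# Two "paragraphs", modelled as lines of one string (an assumption made
# for illustration; Writer keeps paragraphs as separate objects).
text = "first paragraph\nsecond paragraph"

# "$" marks a position, not a character: every match is zero-width.
end_matches = [(m.start(), m.end())
               for m in re.finditer(r"$", text, flags=re.MULTILINE)]

# "h$" matches an "h" only when it is the last character of a line.
anchored = [m.start() for m in re.finditer(r"h$", text, flags=re.MULTILINE)]

# In Python, "$$" merely asserts the same position twice, so it adds
# nothing: it cannot mean "two line ends in a row".
double = [(m.start(), m.end())
          for m in re.finditer(r"$$", text, flags=re.MULTILINE)]

print(end_matches)
print(anchored)
print(double)
```

Every match of "$" has the same start and end offset, which is the sense
in which it "does not match anything".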
(3) Doing Find "$" & Replace with " " (single space), <CR><LF> is
replaced by " " (single space). However, doing Find "$" & Replace
with "@" (single @char), <CR><LF> is replaced by "@@" (double @char). Why?
I don't think that's true. In any case, there are no <CR><LF>s present.
To achieve what you want:
First combine single-line paragraphs:
o Apply Default paragraph style to all the text.
o Select all the text.
o Apply AutoCorrect.
(You may need to adjust the minimum length of such paragraphs in
AutoCorrect Options - possibly to 0%.)
Then remove empty paragraphs:
o Search for "^$" (no quotes) and replace with nothing.
("^" anchors your pattern to the start of a paragraph and "$" to the
end. So "^$" matches a paragraph with nothing in it.)
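As an alternative sketch, and not something Writer itself does, the same
two steps can be applied to the raw text before it is opened in Writer.
The function name "reflow" and the sample string are hypothetical; the
only assumption about the file is the one stated in the question, namely
lines ending in <CR><LF> with an empty line between paragraphs.

```python
import re


def reflow(raw: str) -> list[str]:
    """Join the lines of each paragraph and drop empty paragraphs."""
    # Normalise <CR><LF> line endings to a single "\n".
    text = raw.replace("\r\n", "\n")
    # One or more blank lines separate paragraphs.
    blocks = re.split(r"\n\s*\n", text)
    paragraphs = []
    for block in blocks:
        # Step 1: combine the single-line pieces of one paragraph.
        lines = [line.strip() for line in block.split("\n")]
        joined = " ".join(line for line in lines if line)
        # Step 2: skip paragraphs with nothing in them.
        if joined:
            paragraphs.append(joined)
    return paragraphs


# Hypothetical sample in the layout described in the question.
sample = ("A line of\r\nthe first paragraph.\r\n"
          "\r\n"
          "Second paragraph\r\nhere.\r\n")
paragraphs = reflow(sample)
for paragraph in paragraphs:
    print(paragraph)
```

Writing the resulting paragraphs out one per line would give a file in
which each line becomes one paragraph when opened in Writer.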
I trust this helps.