
Re: [libreoffice-users] Multiple newlines


At 16:38 20/07/2020 -0400, John Kaufmann wrote:
Documents archived in Project Gutenberg are typically simple text, with each line ending in <CR><LF> (Hex:0D0A), so that paragraphs are separated by an empty line <CR><LF><CR><LF>. I thought it would be simple to convert one such (5657.txt) to format in Writer, ...

It is.

... but stumbled on elementary problems in Find-&-Replace [Ctrl-H] using regular expressions:
(1) "\n" is not found. Should not "\n" match one of the codes in <CR><LF>? [If not, what code(s) should "\n" match?]

First, once you have your text in a word processor, you do not have <CR> or <LF> or <CR><LF> or anything else like that in your text; instead you have *paragraph breaks*. There is no character there, despite the pilcrow that you can get Writer to display. And what you are calling "empty lines" are actually empty paragraphs. "\n" in the "Search for" field matches line breaks, not paragraph breaks. (And line breaks are line breaks - also no "codes".)
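
As a rough illustration of that distinction, here is a minimal Python sketch. It assumes a simplified, hypothetical model in which a document is a list of paragraph strings and only a manual line break would be stored as "\n" inside a paragraph; the function name import_as_paragraphs is made up for this example and is not how Writer is implemented.

```python
import re

# Hypothetical model: a document is a list of paragraphs. A manual line
# break would appear as "\n" *inside* a paragraph string; paragraph
# breaks are the boundaries between list items, not characters.
raw = "First line\r\nsecond line\r\n\r\nNext paragraph\r\n"

def import_as_paragraphs(text):
    """Treat each CR LF as a paragraph end; the CR and LF are discarded."""
    lines = text.split("\r\n")
    if lines and lines[-1] == "":
        lines.pop()  # the final CR LF ends the last paragraph, it does not start a new one
    return lines

paragraphs = import_as_paragraphs(raw)

# Searching inside each paragraph for "\n" finds nothing: no paragraph
# in this model contains a line break.
newline_hits = [p for p in paragraphs if re.search(r"\n", p)]

print(paragraphs)
print(newline_hits)
```

In this model the "empty line" of the original file shows up as an empty string in the list, that is, as an empty paragraph.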

(2) Although "$" is found (matches to <CR><LF>), ...

No, "$" does not match anything; instead, it anchors the expression before it to the end of a paragraph. So an expression ending with "$" will match text only if it comes at the end of its paragraph.
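
The same idea can be seen in a small Python sketch. Python's re module stands in here for Writer's regular-expression engine, which is an assumption: the two differ in details, but "$" is a zero-width anchor in both. Each string in the list plays the role of one paragraph.

```python
import re

# Each string stands for one paragraph (an assumption of this sketch).
paragraphs = ["The End", "End of part one", ""]

# "End$" matches only where "End" is the last thing in the paragraph.
pattern = re.compile(r"End$")
matches = [bool(pattern.search(p)) for p in paragraphs]

# "$" on its own matches a position, not a character: the match is empty.
m = re.search(r"$", "The End")
anchor_span = m.span()
anchor_text = m.group()

print(matches)
print(anchor_span, repr(anchor_text))
```

The anchor match has the same start and end offset, which is why there is nothing there to be "found" in the sense of a character.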

... "$$" (for successive occurrences of <CR><LF>) is not found. Why?

"$$" has no sense. If anything it means "this pattern needs to match something that is *really, really* at the end of a paragraph"!

(3) Doing Find "$" & Replace with " " (single space), <CR><LF> is replaced by " " (single space). However, doing Find "$" & Replace with "@" (single @char), <CR><LF> is replaced by "@@" (double @char). Why?

I don't think that's true. In any case, there are no <CR><LF>s present.

To achieve what you want:

First combine single-line paragraphs:
o Apply Default paragraph style to all the text.
o Select all the text.
o Apply AutoCorrect.
(You may need to adjust the minimum length of such paragraphs in AutoCorrect Options - possibly to 0%.)

Then remove empty paragraphs:
o Search for "^$" (no quotes) and replace with nothing.
("^" anchors your pattern to the start of a paragraph and "$" to the end. So "^$" matches a paragraph with nothing in it.)
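
For comparison, the same two steps can be sketched outside Writer in Python, working on the raw text before import. This is only an illustration under stated assumptions: the function name reflow is hypothetical, and unlike AutoCorrect it joins every line within a block regardless of its length.

```python
import re

def reflow(text):
    """Join the lines of each block into one paragraph and drop empty ones.

    A block is a run of lines separated from the next by one or more
    empty lines, as in a typical Project Gutenberg plain-text file.
    """
    text = text.replace("\r\n", "\n")            # normalise CR LF line ends
    blocks = re.split(r"\n\s*\n", text.strip())  # empty lines separate blocks
    return [
        " ".join(line.strip() for line in block.split("\n"))
        for block in blocks
        if block.strip()                          # skip empty paragraphs
    ]

sample = (
    "It was a dark\r\nand stormy night.\r\n"
    "\r\n"
    "The rain fell\r\nin torrents.\r\n"
    "\r\n\r\n"
    "End.\r\n"
)

result = reflow(sample)
for paragraph in result:
    print(paragraph)
```

Writing the resulting paragraphs back out with one line end each would give a file that opens in Writer with one paragraph per original block.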

I trust this helps.

Brian Barker

