

Hello everybody.
Probably the title of this post is not very clear, sorry for that ;).

I have a block of text (HTML code) and need to find <p> tags together with their classes, ids, styles (if any), etc. I'm doing this using one of the following regexes:
<p(.*?)> or (<p([^>]+))>

My text follows this pattern:

<p class="navi_buttons">Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>

<p class="reg">Aliquam mi sapien, rutrum eget sem vel, semper efficitur.<a href="xyz.html" class="topiclink">vitae velit</a></p>

<p class="THIS_SHOULD_BE_AVOIDED">Donec fringilla sapien vitae interdum volutpat.</p>

<p class="nav">Cras nec orci non dolor ultrices luctus sit amet vitae velit.</p>


The problem is that I need to find every occurrence of the <p> tag except those of one certain class (i.e. I want to skip paragraph tags of that class). I don't know how to write a regex exclusion that is treated as a whole string rather than as a set of individual characters. I tried to use back-references, with no success. I want to use a regex because the classes to be avoided are different on each page (but they follow a certain pattern), and the job should be as automatic as possible (the code should be as versatile as possible).
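One direction that may fit this requirement is a negative lookahead, which rejects a whole string rather than a set of characters. The sketch below is in Python only so that it can be run and checked; the name AVOID and its value are placeholders for whatever pattern the unwanted class names follow, and it assumes the class attribute is written with double quotes and holds a single class name. LibreOffice's Find & Replace uses the ICU regex engine, which documents lookahead support, but the pattern has not been tested there.

```python
import re

# Sample paragraphs taken from the post above.
html = '''<p class="navi_buttons">Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>
<p class="reg">Aliquam mi sapien, rutrum eget sem vel, semper efficitur.<a href="xyz.html" class="topiclink">vitae velit</a></p>
<p class="THIS_SHOULD_BE_AVOIDED">Donec fringilla sapien vitae interdum volutpat.</p>
<p class="nav">Cras nec orci non dolor ultrices luctus sit amet vitae velit.</p>'''

# Placeholder (assumption): a regex fragment describing the class names to skip.
AVOID = r'THIS_SHOULD_BE_AVOIDED'

pattern = re.compile(
    r'<p\b'                               # "<p" as a complete tag name, so <pre> is not matched
    r'(?![^>]*\bclass="' + AVOID + r'")'  # negative lookahead: fail if this class appears in the tag
    r'([^>]*)>'                           # capture the attributes up to the closing ">"
)

# Attribute strings of every <p> tag that does not carry the avoided class.
matches = pattern.findall(html)
for attrs in matches:
    print(attrs)
```

Because AVOID is itself a regex fragment, it can be replaced by a pattern that covers the class names that differ from page to page. Unlike a character class such as [^...], the lookahead tests the string as a unit.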
I will appreciate any help. Kind regards,

gordom



