Re: [libreoffice-users] Re: LO Writer (Linux): Way to do text search in set of documents?

Tom Davies <tomcecf -AT- gmail.com>
Mon, 25 Aug 2014 20:54:17 +0100

Hi :)
I think it's easier to just edit the bash script, isn't it?

Surely to get its output into a file all that is needed is something like

   > filename.txt

to be added to the end of the relevant lines? Better still would be if it
could keep appending to the file after creating it with the first bit of
output. I think Python is a bit of overkill for this, although it might be
really nice to have as a permanent Extension written in a decent language
like Python.
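
The difference between creating the file and then adding to it can be
sketched as follows. This is only an illustration, not the script from
the thread: results.txt and the echoed lines are placeholders standing in
for the script's real output.

```shell
#!/bin/sh
# Illustration only: results.txt and the echoed text are placeholders.
outdir=$(mktemp -d)
out="$outdir/results.txt"

# ">" creates the file, or empties it first if it already exists.
echo "first bit of output" > "$out"

# ">>" appends, so everything written earlier is kept.
echo "second bit of output" >> "$out"
echo "third bit of output" >> "$out"

cat "$out"
```

An alternative that leaves the script untouched is to redirect the whole
run once, for example "sh search.sh keyword > results.txt", where
search.sh is a stand-in name for the script.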
Regards from
Tom :)




On 25 August 2014 17:53, Paul D. Mirowsky <p_mirowsky@bentaxna.com> wrote:

If the Python code were modified to also add the filename with its path and
inject it at the end of each paragraph as a URL, it might be possible to
redirect the Python command's output to a .txt file that could be opened
by Writer.

I am not sure whether Writer could be set to recognize the URL and open
the file automatically so that the original document can be edited.

Hhhhhmmmmm



On 8/25/2014 10:03 AM, P. . wrote:

Try this. Even if it isn't exactly an 'out of the box' solution, it
can be useful. In a few words, the script parses the XML file inside the
.odt (which is in fact a zip archive) and searches for a keyword after
extracting the text.

A short excerpt from page 3 of "Extract and Parse ODF Files with
Python":
"In this particular program, I collect all the text as a list of
paragraphs, and then I search for the keywords passed in from the
command line. If the searched word matches, the paragraph is printed
out.

The text found in each <text:p> is Unicode text. You have to convert
this to normal text in order to print correctly and/or use in a
widget. The encode() command translates the Unicode to a printable
string. "


<http://www.linuxjournal.com/article/9347?page=0,2>


On 25 August 2014 15:31, Paul <paulsteyn1@afrihost.co.za> wrote:

Well, it does seem like all your mails do this,

<snip />

On Mon, 25 Aug 2014 13:41:14 +0100
Tom Davies <tomcecf@gmail.com> wrote:

 Hi :)

I suspect that Paul's post below has not yet arrived in Maurice's
time-line.

<snip />

On the other hand it might be good if someone could test Paul's
script. Perhaps it's possible to combine the 2 ideas so that both
the file-name AND the few lines of surrounding text could be
output? Would that help?  Also it might be good to have the
output directed into a file rather than just onto the
command-line?
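
One way to get both the file name and a few lines of surrounding text
could look like the sketch below. It is not the script from the thread:
search_odt and results.txt are made-up names, and the sed step is a rough
tag stripper that assumes paragraphs end with </text:p> and does not
decode XML entities such as &amp;. Because content.xml is often written
with very few line breaks, the text is split at paragraph ends before
grep looks for context.

```shell
#!/bin/sh
# Sketch only: search_odt is an illustrative name.
# Usage: search_odt KEYWORD FILE.odt...
search_odt() {
    keyword=$1
    shift
    for file in "$@"; do
        # unzip -p writes content.xml to standard output without headers.
        # The first sed expression starts a new line after each paragraph,
        # the second strips the remaining XML tags.
        text=$(unzip -p "$file" content.xml | sed -e 's|</text:p>|\
|g' -e 's/<[^>]*>//g')
        # Keep two paragraphs of context on each side of a match.
        if matches=$(printf '%s\n' "$text" | grep -C 2 -- "$keyword"); then
            printf '== %s ==\n%s\n' "$file" "$matches"
        fi
    done
}

# Example call, appending the results to a file:
# search_odt "keyword" ./*.odt >> results.txt
```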

<snip />

Regards from

Tom :)



On 24 August 2014 19:29, Paul <paulsteyn1@afrihost.co.za> wrote:

 Try changing the line:


      unzip -ca "$file" content.xml | grep -ql "$1"

to:

      unzip -ca "$file" content.xml | grep -C 10 "$1"

The "-l" option makes grep show only the names of files that
match, not the content, and "-q" suppresses all output, so both
have to go. The "-C #" option gives # lines of context
around the match. Or you could use "-B #" and "-A #" to print #
lines of leading and trailing context, respectively.
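
The effect of these options can be checked on a small sample before
touching the real script. The sketch below is only an illustration:
sample.txt and the word "needle" are placeholders, and the file stands in
for the text that unzip would write to standard output.

```shell
#!/bin/sh
# Illustration only: sample.txt and "needle" are placeholders.
workdir=$(mktemp -d)
sample="$workdir/sample.txt"
printf '%s\n' one two three needle five six seven > "$sample"

# -l prints only the name of each file that contains a match.
names=$(grep -l needle "$sample")

# -q makes grep print nothing; only its exit status reports the match,
# so the word "matched" below comes from echo, not from grep.
quiet=$(grep -q needle "$sample" && echo matched)

# -C 1 prints the matching line plus one line before and one after.
context=$(grep -C 1 needle "$sample")

printf '%s\n' "-l:" "$names" "-q:" "$quiet" "-C 1:" "$context"
```

When grep reads from a pipe, as in the unzip line above, it has no file
name to report, and GNU grep prints "(standard input)" for "-l".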

You could also write a script to pull the contents of all the
files and concatenate them so that you can use Writer's Find on
one big document, but that would be considerably harder. Try
this first.


Paul



Disclaimer: I haven't actually tested this, just done a "man
grep", but I think the syntax is right...




On Sun, 24 Aug 2014 18:16:35 +0000 (UTC)
Maurice <maurice@bcs.org.uk> wrote:

 On Sun, 24 Aug 2014 11:44:31 -0500, Don Pobanz wrote:


<snip />


