

On 03/21/2013 06:06 AM, Thomas Blasejewicz wrote:

(2013/03/21 15:07), M. Fioretti wrote:
On Wed, Mar 20, 2013 21:02:22 PM -0400, Virgil Arrington wrote:

1. With LO, I use the "Save as" command to save the document in an
HTML format. Do NOT use the Export to HTML feature as, for whatever
reason, the resulting HTML file is not nearly as clean as when using
the "Save as" command. Once saved in the HTML format, I then load
it...
I published a shell script to automate conversion with LO from
{.doc,.odt...}  to CLEAN HTML here:

http://www.techrepublic.com/blog/opensource/how-to-convert-doc-and-odf-files-to-clean-and-lean-html/3708

That sounds very promising ... BUT ...
as an ordinary mortal man I am NOT CAPABLE of understanding that article.
The technical language eludes me completely.
So does the technique itself. I have absolutely no idea as to what to do with that "script".

Is there any chance of selecting a certain file, click somewhere and wait until the conversion process completes (automatically)?
I agree. Clearly he understands what needs to be done, but what he has written is for people with a high-level understanding of programming. Such is not the case with the average person subscribed to this list who would like to use the script. Two things seem to be missing: a more in-depth explanation of the parts of the script, and examples where .odt and .doc files are converted to clean HTML (one for .odt and one for .doc).
     Examples:

soffice --headless --convert-to output_file_extension[:output_filter_name] [--outdir output_dir] files

soffice --headless (This part I understand.)

--convert-to output_file_extension[:output_filter_name] [--outdir output_dir] files

I have no idea what the components of this are. What part goes with what? The only thing that I do understand is that the things contained in brackets are optional. What is this?:

output_file_extension[:output_filter_name]

What is this? What is its purpose?

[--outdir output_dir]

What is the purpose for ending the entire command line with the term "files"? What files? Can several files be listed? Can * be used in place of "files" to batch convert all the files in a folder? Examples please!
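To make those pieces concrete, here is a minimal sketch of how the command line fits together, assuming soffice is on the PATH. The function name to_html and the DRY_RUN variable are made up for illustration; only --headless, --convert-to and --outdir come from the usage line quoted above. Here "html" is the output_file_extension, and the optional :output_filter_name part is simply left out so that LibreOffice picks its default filter for that extension. The trailing "files" stands for one or more input file names, and a shell wildcard such as *.odt expands to exactly such a list.

```shell
#!/bin/sh
# Hypothetical helper (not from the article): convert the named files to HTML.
# Usage: to_html OUTPUT_DIR FILE...
# With DRY_RUN=1 the soffice command is only printed, not executed.
to_html() {
    outdir=$1        # plays the role of "output_dir" in the usage line
    shift            # everything left over is the "files" part
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo soffice --headless --convert-to html --outdir "$outdir" "$@"
    else
        soffice --headless --convert-to html --outdir "$outdir" "$@"
    fi
}
```

With this in place, something like "to_html html_out *.odt" would batch convert every .odt file in the current folder, while "to_html html_out letter.doc report.odt" would convert two named files; html_out, letter.doc and report.odt are placeholder names.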
Another problem line in the article:

 convert_doc_to_html.sh SOURCE_DIR TARGET_DIR

As I understand script files, "convert_doc_to_html.sh" is the name of a script file. Source directory and target directory of what? Here a simple explanation would be helpful. For example, add this to the line:

(SOURCE_DIR is where the file to be converted is located, and TARGET_DIR is where you want the converted HTML file to be created.)
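As an illustration of what those two arguments could mean in practice, here is a rough sketch of a function that takes a source folder and a target folder. It is NOT the script from the article; the name convert_dir, the DRY_RUN switch and the messages are assumptions made up for this example, and it only covers .doc and .odt files.

```shell
#!/bin/sh
# Hypothetical stand-in for a script called as:
#   convert_doc_to_html.sh SOURCE_DIR TARGET_DIR
# Only an illustration of the two arguments, not the article's code.
convert_dir() {
    source_dir=$1   # folder holding the .doc/.odt files to convert
    target_dir=$2   # folder that will receive the .html files
    if [ ! -d "$source_dir" ]; then
        echo "source folder not found: $source_dir" >&2
        return 1
    fi
    mkdir -p "$target_dir" || return 1
    for f in "$source_dir"/*.doc "$source_dir"/*.odt; do
        [ -e "$f" ] || continue     # skip a pattern that matched nothing
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "would convert: $f -> $target_dir"
        else
            soffice --headless --convert-to html --outdir "$target_dir" "$f"
        fi
    done
}
```

Setting DRY_RUN=1 before calling "convert_dir my_docs my_html" lists what would be converted without starting LibreOffice; my_docs and my_html are placeholder folder names.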

Another suggestion: Describe the script file before listing the code for it. Include directions for creating a temporary folder (directory) to contain the .doc or .odt files to be converted. This way lines 4 and 5 can be kept as is: the folder is after all temporary. Also include directions for creating the folder to contain the converted HTML files.

Include more detailed instructions on how to create the /tidy_options.conf/ file and where to save it.
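For instance, the instructions could show something along these lines. This is only a guess at what such a file might contain: the option values below are illustrative HTML Tidy settings, not the ones from the article, and the helper name write_tidy_conf is made up. The file can be saved in any folder, as long as the same path is later handed to tidy.

```shell
#!/bin/sh
# Hypothetical helper: write a small HTML Tidy configuration file.
# The settings are illustrative assumptions, not the article's settings.
# Usage: write_tidy_conf [PATH]   (defaults to ./tidy_options.conf)
write_tidy_conf() {
    conf=${1:-tidy_options.conf}
    cat > "$conf" <<'EOF'
clean: yes
drop-empty-paras: yes
indent: auto
wrap: 0
char-encoding: utf8
tidy-mark: no
quiet: yes
EOF
}
```

Once the file exists, a call such as "tidy -config tidy_options.conf -o clean.html converted.html" would apply it; clean.html and converted.html are placeholder file names.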

I must admit that having to reread your article several times while writing this email has given me a better understanding of what you wrote. It has taken that long for me to be able to piece together what you wrote. Even so, I may still miss some parts because I do not understand even some of the fundamentals of programming languages. (I wonder how many others don't either.)

--Dan

--
For unsubscribe instructions e-mail to: users+help@global.libreoffice.org
Problems? http://www.libreoffice.org/get-help/mailing-lists/how-to-unsubscribe/
Posting guidelines + more: http://wiki.documentfoundation.org/Netiquette
List archive: http://listarchives.libreoffice.org/global/users/
All messages sent to this list will be publicly archived and cannot be deleted
