2013 Archives


Michael Meeks <michael.meeks@suse.com>:
      I was curious about what you'd like to hack on here :-)

I wrote and maintain a tool called 'doclifter' that lifts manual pages
(and most other kinds of documents written in troff-based markups) into
DocBook-XML.  This is a useful tool for several reasons; one is that the
XML can be used to generate higher-quality HTML than you get from a
presentation-level troff to HTML translation.  If all manual pages lifted
cleanly, generating a nice web view of all the world's documentation
would be easy.

Unfortunately, troff markup is such a badly structured tag soup that
automatic lifting doesn't always work. By dint of a bunch of compiler
technology and a couple hundred cliche-recognition rules, doclifter
does a pretty good job; on the 12K pages shipped with a stock Linux
distribution it lifts about 94% of the eligible targets cleanly
without patches.
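To make "lifting" concrete, here is a deliberately tiny Python sketch. It is not doclifter's code and handles only a few cases. .SH section headers become DocBook refsect1 blocks, text lines become para elements, and paired \fB or \fI font changes become emphasis elements. The rule table (INLINE_RULES) and the function names are hypothetical, and real pages need far more rules than this.

```python
import html
import re

# Hypothetical cliche-recognition rules: each maps a troff presentation
# idiom to DocBook markup. Illustrative only; not taken from doclifter.
INLINE_RULES = [
    (re.compile(r"\\fB(.*?)\\f[RP]"), r'<emphasis role="bold">\1</emphasis>'),
    (re.compile(r"\\fI(.*?)\\f[RP]"), r"<emphasis>\1</emphasis>"),
    (re.compile(r"\\-"), "-"),  # troff minus sign -> plain hyphen
]


def lift_inline(text: str) -> str:
    """Escape XML specials, then apply the inline rules in order."""
    text = html.escape(text, quote=False)
    for pattern, replacement in INLINE_RULES:
        text = pattern.sub(replacement, text)
    return text


def lift_sections(troff: str) -> str:
    """Turn .SH headers into <refsect1> blocks and text lines into <para>s."""
    out: list[str] = []
    in_section = False
    for line in troff.splitlines():
        if line.startswith(".SH"):
            if in_section:
                out.append("</refsect1>")
            title = line[3:].strip().strip('"')
            out.append(f"<refsect1><title>{lift_inline(title)}</title>")
            in_section = True
        elif line.startswith((".", "'")):
            continue  # every other request (.TH, .PP, ...) is ignored here
        elif line.strip():
            # Simplification: each text line becomes its own paragraph.
            out.append(f"<para>{lift_inline(line)}</para>")
    if in_section:
        out.append("</refsect1>")
    return "\n".join(out)


if __name__ == "__main__":
    page = (".TH FOO 1\n.SH NAME\nfoo \\- do things\n"
            ".SH DESCRIPTION\nRun \\fBfoo\\fR on \\fIfile\\fP.\n")
    print(lift_sections(page))
```

On that sample page the NAME line comes out as <para>foo - do things</para>, and the bold and italic runs in DESCRIPTION become emphasis elements. The point of the exercise is that presentation markup (a font change) is mapped to structural markup (emphasis), which is what lets the XML drive better HTML later.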

Most of the remaining 6% of troff pages contain markup that is
outright broken even in troff terms.  Your pages, which had an
incorrect \fb where a \fB was needed, are good examples.
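A typo like \fb is easy to miss by eye. As a rough, hypothetical lint (not part of doclifter), the Python sketch below scans troff source for font-change escapes and reports any whose font name is not in an assumed list of common names. Font names are case-sensitive, so \fb is flagged while \fB is accepted.

```python
import re

# Matches \fX, \f(XX and groff's \f[NAME]; exactly one group is set per match.
FONT_ESCAPE = re.compile(r"\\f(?:\((\S\S)|\[([^\]]*)\]|(\S))")

# Assumed set of acceptable font names; extend it for pages that use others.
# The empty name covers groff's \f[], which returns to the previous font.
KNOWN_FONTS = {"R", "I", "B", "BI", "P", "CW", "CR", "CB", "CI", ""}


def suspicious_font_escapes(source: str) -> list[tuple[int, str]]:
    """Return (line number, escape) pairs whose font name is not recognized."""
    problems = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in FONT_ESCAPE.finditer(line):
            name = next(g for g in match.groups() if g is not None)
            # Numeric names (\f1, \f2, ...) select mounted font positions.
            if name not in KNOWN_FONTS and not name.isdigit():
                problems.append((lineno, match.group(0)))
    return problems
```

This is a heuristic: it does not know which fonts a given device actually has mounted, and it does not special-case an escaped backslash in front of the f.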

One of my longer-term projects is cleaning up the Linux/Unix manual-page
corpus so that the remaining 6% gets fixed and becomes automatically liftable.
I've been working on this since 2002, and have shipped about 2000 patches
upstream to several hundred projects.

Recently I fixed up all the X man pages.  Current statistics:

Pages   % total Status
11923   100%    Total pages in stock Ubuntu 13.04
917     7.69%   Already made from XML-DocBook or Doxygen, not eligible
10270   86.14%  Clean lift from troff, no problems
721     6.02%   Clean lift with a fix patch
8       0.07%   Internal error in doclifter
7       0.08%   Incorrect (non-validating) XML generated
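The "about 94%" figure earlier is a share of eligible pages, not of all pages. A minimal Python sketch, using nothing but the counts in the table, shows the derivation: subtract the pages already generated from XML-DocBook or Doxygen from the total, then divide each outcome by what remains. With these particular counts the clean-lift share works out to roughly 93.3%, and about 6.55% of eligible pages needed a patch. The bucket names are labels chosen for this sketch.

```python
# Page counts taken from the statistics table above (Ubuntu 13.04).
COUNTS = {
    "total": 11923,
    "not_eligible": 917,  # already generated from XML-DocBook or Doxygen
    "clean": 10270,
    "clean_with_patch": 721,
    "internal_error": 8,
    "bad_xml": 7,
}


def eligible_shares(counts: dict[str, int]) -> dict[str, float]:
    """Percentage of eligible (troff-sourced) pages in each outcome bucket."""
    eligible = counts["total"] - counts["not_eligible"]
    outcomes = ("clean", "clean_with_patch", "internal_error", "bad_xml")
    # Sanity check: the outcome buckets should account for every eligible page.
    assert sum(counts[k] for k in outcomes) == eligible
    return {k: 100.0 * counts[k] / eligible for k in outcomes}


if __name__ == "__main__":
    for bucket, pct in eligible_shares(COUNTS).items():
        print(f"{bucket:17} {pct:6.2f}%")
```

The four outcome rows add up to exactly 11006 pages, the total minus the 917 ineligible ones, so the table is internally consistent on counts.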

You just got your patches.  The LibreOffice pages now lift clean.

Very occasionally (once every year or two) I run a validation pass on
as much of the manual-page universe as I can easily get my hands on.
In the future, if your pages develop any problems due to careless
changes, I'll ship you another fix.  Otherwise I have no special
interest in LibreOffice, sorry. I think it's a good thing
that the suite exists, but I don't use it myself.
-- 
                Eric S. Raymond <http://www.catb.org/~esr/>
