XHP cleanup

Hi,

I was interested to hear yesterday that there were discussions about
abandoning XHP as the file format for the help files and using plain HTML
instead.

I am (so far) convinced that the actual format is not the real problem
here, and that with a bit of a cleanup, XHP will be as convenient a
format as HTML would be, but with the advantages that:

* we can do the changes incrementally, no big-bang necessary

* there is no (or minimal) impact on the l10n

* it is not blocking any later migration to "something else"

* it keeps the semantics

* it keeps the possibility to embed help files into one another

So let me propose some cleanups I'd like to do:

* get rid of the old attributes that were used only for the
  helpcontent -> helpcontent2 migration (like the 'oldref' or 'l10n'
  attribute)

* make the 'id' attribute non-mandatory, and instead check during the
  build for the presence of the ids that are referenced from somewhere

    + this needs to be done carefully not to affect l10n

* get rid of the xml-lang attribute, and instead mark only the strings that
  are _not_ supposed to be translated.

* make role="paragraph" the default, so only the headings need to be
  marked
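If ids become optional, the build has to catch references to ids that no longer exist. A minimal sketch of such a check in Python, assuming that cross-file references are `href="path/file.xhp#id"` attributes on `<embed>` and `<embedvar>` elements (other referencing elements would need adding); `missing_references()` and `defined_ids()` are hypothetical names:

```python
import xml.etree.ElementTree as ET

# Assumption: cross-file references are href="path/file.xhp#id" attributes
# on these elements; any other referencing elements would need adding here.
REF_TAGS = ("embed", "embedvar")


def defined_ids(root: ET.Element) -> set[str]:
    """Collect every id attribute defined anywhere in one help file."""
    return {el.get("id") for el in root.iter() if el.get("id")}


def missing_references(files: dict[str, str]) -> list[tuple[str, str]]:
    """Return (referencing file, href) pairs whose target id does not exist.

    `files` maps a help file path (as used in hrefs) to its XHP source text.
    """
    roots = {path: ET.fromstring(text) for path, text in files.items()}
    ids = {path: defined_ids(root) for path, root in roots.items()}
    missing = []
    for path, root in roots.items():
        for tag in REF_TAGS:
            for el in root.iter(tag):
                href = el.get("href", "")
                target_file, sep, target_id = href.partition("#")
                if not sep:
                    continue  # reference to a whole file; no id involved
                if target_id not in ids.get(target_file, set()):
                    missing.append((path, href))
    return missing


files = {
    "text/a.xhp": '<body><section id="intro"><paragraph>Hi</paragraph></section></body>',
    "text/b.xhp": '<body><embed href="text/a.xhp#intro"/>'
                  '<embed href="text/a.xhp#gone"/></body>',
}
print(missing_references(files))  # [('text/b.xhp', 'text/a.xhp#gone')]
```

Only ids that something actually points at are checked, so every other id could be dropped without the build noticing; the l10n side of the id question is a separate matter.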

With this, the help descriptions would change from:

<paragraph role="heading" id="hd_id3145649" xml-lang="en-US" level="2" l10n="U" oldref="13">Heading</paragraph>
<paragraph role="paragraph" id="par_id3145663" xml-lang="en-US" l10n="U" oldref="14">The actual text...</paragraph>

to

<paragraph role="heading" level="2">Heading</paragraph>
<paragraph>The actual text...</paragraph>

which is hopefully not much more complex than HTML, and yet possible
incrementally, and without affecting l10n or other parts of the existing
workflow.
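The attribute part of the cleanup is mechanical, which is what makes it possible to do incrementally, file by file. A minimal sketch using Python's `xml.etree.ElementTree`, assuming the attribute names from the example above; `clean_paragraphs()` and `LEGACY_ATTRS` are hypothetical names, and ids are deliberately left untouched:

```python
import xml.etree.ElementTree as ET

# Leftovers of the helpcontent -> helpcontent2 migration (oldref, l10n),
# plus xml-lang, which the proposal replaces by marking untranslatable strings.
LEGACY_ATTRS = ("oldref", "l10n", "xml-lang")


def clean_paragraphs(root: ET.Element) -> None:
    """Strip legacy attributes and the now-default role="paragraph" in place.

    The id attribute is not touched: whether it may be dropped is a
    separate question.
    """
    for para in root.iter("paragraph"):
        for name in LEGACY_ATTRS:
            para.attrib.pop(name, None)
        # role="paragraph" becomes the implicit default; headings keep theirs.
        if para.get("role") == "paragraph":
            del para.attrib["role"]


if __name__ == "__main__":
    sample = (
        '<body>'
        '<paragraph role="heading" id="hd_id3145649" xml-lang="en-US" level="2" '
        'l10n="U" oldref="13">Heading</paragraph>'
        '<paragraph role="paragraph" id="par_id3145663" xml-lang="en-US" '
        'l10n="U" oldref="14">The actual text...</paragraph>'
        '</body>'
    )
    root = ET.fromstring(sample)
    clean_paragraphs(root)
    print(ET.tostring(root, encoding="unicode"))
```

A real conversion script would probably work on the raw text instead, since ElementTree drops comments and the XML declaration, which would make the resulting diffs noisier than necessary.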

Going further, we can later change 'paragraph' to 'p', and introduce 'h2' as
a shortcut for <paragraph role="heading" level="2"> too, if we wish;
but for the moment, I think there are XHP features that are worth
keeping, because as a format, it gives more semantics to the text than
plain HTML would.

Any objections, please? :slight_smile:

Thank you,
Kendy

Hello Jan

> Hi,
>
> I was interested to hear yesterday that there were discussions about
> abandoning XHP as the file format for the help files and using plain HTML
> instead.

What documenters want is an easy way to insert, edit and update help
contents, as well as to add new, modern resources such as multimedia and
graphics to the help pages.

> I am (so far) convinced that the actual format is not the real problem
> here, and that with a bit of a cleanup, XHP will be as convenient a
> format as HTML would be, but with the advantages that:

The format is not the problem as long as we have tools to edit the
contents with the benefits listed above. The only rich editor for XHP is
the HelpAuthoring extension, which is still buggy and demands a long and
steep learning curve. By contrast, users and volunteers are much more
comfortable with the editors available in CMSs, forums and wikis, such as
TinyMCE, CKEditor, or any Markdown editor.

> * we can do the changes incrementally, no big-bang necessary
>
> * there is no (or minimal) impact on the l10n
>
> * it is not blocking any later migration to "something else"
>
> * it keeps the semantics
>
> * it keeps the possibility to embed help files into one another

The main uses we make of XHP are to display information for reading and
to provide the source for a search index (the <bookmark> tags).

To be displayed, XHP is transformed into HTML and presented in a
specially tweaked Writer/Web module. Bookmarks are handled separately.

> So let me propose some cleanups I'd like to do:
>
> * get rid of the old attributes that were used only for the
>   helpcontent -> helpcontent2 migration (like the 'oldref' or 'l10n'
>   attribute)

OK, but cleaning up useless attributes in XHP such as oldref= and l10n=
should not trigger a fuzzy state in our translation process.

> * make the 'id' attribute non-mandatory, and instead check during the
>   build for the presence of the ids that are referenced from somewhere
>
>     + this needs to be done carefully not to affect l10n

ID is absolutely mandatory because "filepath+filename+ID" sets the
uniqueness of the string in the help system for the translation process.
If we change the format from XHP to some format ABC, then there must be a
one-to-one relation between IDs in XHP and IDs in ABC. Another constraint
is that once an ID is set for a string, it must remain the same forever
for that string.

So, we must keep "id" and also "localize".

> * get rid of the xml-lang attribute, and instead mark only the strings that
>   are _not_ supposed to be translated.

OK

> * make role="paragraph" the default, so only the headings need to be
>   marked
>
> With this, the help descriptions would change from:
>
> <paragraph role="heading" id="hd_id3145649" xml-lang="en-US" level="2" l10n="U" oldref="13">Heading</paragraph>
> <paragraph role="paragraph" id="par_id3145663" xml-lang="en-US" l10n="U" oldref="14">The actual text...</paragraph>
>
> to
>
> <paragraph role="heading" level="2">Heading</paragraph>
> <paragraph>The actual text...</paragraph>
>
> which is hopefully not much more complex than HTML, and yet possible
> incrementally, and without affecting l10n or other parts of the existing
> workflow.
>
> Going further, we can later change 'paragraph' to 'p', and introduce 'h2' as
> a shortcut for <paragraph role="heading" level="2"> too, if we wish;
> but for the moment, I think there are XHP features that are worth
> keeping, because as a format, it gives more semantics to the text than
> plain HTML would.

Perfect.
As I said above, we must keep id="...". That is not an issue at all. Note
that if the format ABC is actually HTML, your example becomes

<h2 id="hd_id3145649">Heading</h2>
<p id="par_id3145663">The actual text...</p>

and we keep the uniqueness of the IDs, our translators (including me) are
happy, and users are thrilled to use their preferred JavaScript Markdown editor.
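The mapping above is easy to automate. A minimal sketch in Python, assuming that every non-heading paragraph simply becomes a `<p>` (other XHP paragraph roles would need their own mapping, which is exactly where semantics can get lost) and that inline child elements are copied through unconverted; `paragraph_to_html()` is a hypothetical name:

```python
import xml.etree.ElementTree as ET


def paragraph_to_html(para: ET.Element) -> ET.Element:
    """Map one XHP <paragraph> to an HTML element, keeping its id."""
    if para.get("role") == "heading":
        # Assumption: headings always carry a level; fall back to h1 otherwise.
        tag = "h" + para.get("level", "1")
    else:
        tag = "p"  # role="paragraph", or no role at all after the cleanup
    out = ET.Element(tag)
    if para.get("id"):
        out.set("id", para.get("id"))  # the translation key survives unchanged
    out.text = para.text
    for child in para:  # inline markup is copied as-is in this sketch
        out.append(child)
    return out


src = '<paragraph role="heading" id="hd_id3145649" level="2">Heading</paragraph>'
print(ET.tostring(paragraph_to_html(ET.fromstring(src)), encoding="unicode"))
# <h2 id="hd_id3145649">Heading</h2>
```

Because the id is carried over one-to-one, the "filepath+filename+ID" key stays stable across such a conversion.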

Let me think a bit on the "localize=" attribute...

> Any objections, please? :slight_smile:

I will build a wiki document comparing and commenting on XHP and HTML5 for
evaluation. This is not an endorsement of HTML, though.

Thank you

Hi Olivier,

Olivier Hallot wrote on Fri, 16 Dec 2016 at 08:52 -0200:

> I am (so far) convinced that the actual format is not the real problem
> here, and that with a bit of a cleanup, XHP will be as convenient a
> format as HTML would be, but with the advantages that:

> The format is not the problem as long as we have tools to edit the
> contents with the benefits listed above. The only rich editor for XHP is
> the HelpAuthoring extension, which is still buggy and demands a long and
> steep learning curve. By contrast, users and volunteers are much more
> comfortable with the editors available in CMSs, forums and wikis, such as
> TinyMCE, CKEditor, or any Markdown editor.

Sure; the thing is that from my point of view it's easier to tweak such
an existing tool to consume the XHP markup than to do a big-bang conversion
to general HTML that, on one hand, loses semantics (like the <section>
or <embed> tags) and, on the other, lets very free-form stuff in
(should people use <b> and <i>? Or <em> and <strong>? Or
just <div>s and CSS?).

This can easily become quite messy, so having a stricter XML (like
XHP) is useful from my point of view, so that we can extend the markup
where we need it in a targeted way (e.g. to be able to build books from
it, or to add multimedia content).

> * get rid of the old attributes that were used only for the
> helpcontent -> helpcontent2 migration (like the 'oldref' or 'l10n'
> attribute)

> OK, but cleaning up useless attributes in XHP such as oldref= and l10n=
> should not trigger a fuzzy state in our translation process.

Yes, I've already synced with Cloph on that, see the other mail :slight_smile:

> * make the 'id' attribute non-mandatory, and instead check during the
> build for the presence of the ids that are referenced from somewhere
>
> + this needs to be done carefully not to affect l10n

> ID is absolutely mandatory because "filepath+filename+ID" sets the
> uniqueness of the string in the help system for the translation process.
> If we change the format from XHP to some format ABC, then there must be a
> one-to-one relation between IDs in XHP and IDs in ABC. Another constraint
> is that once an ID is set for a string, it must remain the same forever
> for that string.
>
> So, we must keep "id" and also "localize".

The id is mandatory only if there are two (or more) identical strings in one
file; otherwise the uniqueness is given by the filepath + filename + the
string itself.

This can easily be enforced by a git hook that checks for it (or even
generates the ids where necessary).
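A minimal sketch of the check such a hook could run, assuming the rule stated above (an id is required only when a paragraph's text occurs more than once in the same file) and comparing the full text of each `<paragraph>`; the function names are hypothetical, and generating the missing ids is left out:

```python
import xml.etree.ElementTree as ET
from collections import Counter


def paragraph_text(para: ET.Element) -> str:
    """Full text of a paragraph, including inline markup, whitespace-trimmed."""
    return "".join(para.itertext()).strip()


def ids_needed(xhp_text: str) -> list[str]:
    """Return the texts of id-less paragraphs whose text repeats in the file.

    Assumed rule: filepath + filename + string is unique enough, so an id
    is only required when the same string appears more than once.
    """
    paras = list(ET.fromstring(xhp_text).iter("paragraph"))
    counts = Counter(paragraph_text(p) for p in paras)
    return [
        paragraph_text(p)
        for p in paras
        if counts[paragraph_text(p)] > 1 and not p.get("id")
    ]


def main(paths: list[str]) -> int:
    """Hook entry point: report offending files, return non-zero on failure."""
    status = 0
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for text in ids_needed(f.read()):
                print(f"{path}: repeated string needs an id: {text!r}")
                status = 1
    return status
```

A pre-commit hook would call `main()` with the staged `.xhp` files and reject the commit on a non-zero result.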

> Going further, we can later change 'paragraph' to 'p', and introduce 'h2' as
> a shortcut for <paragraph role="heading" level="2"> too, if we wish;
> but for the moment, I think there are XHP features that are worth
> keeping, because as a format, it gives more semantics to the text than
> plain HTML would.
> Perfect.
> As I said above, we must keep id="...". That is not an issue at all. Note
> that if the format ABC is actually HTML, your example becomes
> <h2 id="hd_id3145649">Heading</h2>
> <p id="par_id3145663">The actual text...</p>
>
> and we keep the uniqueness of the IDs, our translators (including me) are
> happy, and users are thrilled to use their preferred JavaScript Markdown editor.

The complexity here lies in the structure, not in the markup. We need
an editor that "understands" the structure, unfortunately. But again, I
believe that's not a hard problem to sort out :slight_smile:

In my ideal world, I'd like to see an "Improve this text" button in the
help next to each paragraph, leading through a Google (or similar)
sign-in to a web-based editor where the person would update the
text, tick a checkbox agreeing to the license, and have it submitted
to Gerrit for review.

> Let me think a bit on the "localize=" attribute...

> Any objections, please? :slight_smile:

> I will build a wiki document comparing and commenting on XHP and HTML5 for
> evaluation. This is not an endorsement of HTML, though.

A cheat sheet of XHP vs. HTML markup would indeed be useful, if only to
show people that XHP is not that bad once we get rid of the attributes
that we don't need but that are repeated all over the place, making it look
complex :slight_smile:

All the best,
Kendy

Hello world & Kendy,

> Going further, we can later change 'paragraph' to 'p', and introduce 'h2' as
> a shortcut for <paragraph role="heading" level="2"> too, if we wish;
> but for the moment, I think there are XHP features that are worth
> keeping, because as a format, it gives more semantics to the text than
> plain HTML would.

so while you're at it (the previous mail's been perhaps too much content for a
single chunk of text & normal human attention span) ...

... can you enlighten those of us who haven't been here for very long
about what the killer features of the XHP format worth keeping are?

Hi Bubli,

Katarina Behrens wrote on Fri, 16 Dec 2016 at 18:20 +0100:

> Going further, we can later change 'paragraph' to 'p', and introduce 'h2' as
> a shortcut for <paragraph role="heading" level="2"> too, if we wish;
> but for the moment, I think there are XHP features that are worth
> keeping, because as a format, it gives more semantics to the text than
> plain HTML would.

> so while you're at it (the previous mail's been perhaps too much content for a
> single chunk of text & normal human attention span) ...
>
> ... can you enlighten those of us who haven't been here for very long
> about what the killer features of the XHP format worth keeping are?

tl;dr: Semantics & focus on what it is used for.

All the best,
Kendy