Getting rid of 'oldref' in the help files

Hi Cloph, all,

I've recently proposed some help cleanups on the documentation@ ML, and
this is the first one of them. I'm cross-posting to l10n@ and
documentation@ - this change is supposed to be transparent for L10n and
Documentation teams, but they should know :-)

The idea is to get rid of the 'oldref' attribute in the help files; i.e.
change

  <paragraph role="heading" id="hd_id3145649" xml-lang="en-US" level="2" l10n="U" oldref="13">Heading</paragraph>

to

  <paragraph role="heading" id="hd_id3145649" xml-lang="en-US" level="2" l10n="U">Heading</paragraph>

The 'oldref' comes from the helpcontent -> helpcontent2 migration, and the
documentation says "This contains the reference number used by the old
help files and is only used for migration purposes."

Unfortunately, it is used in the msgctxt field in the .po files, like:

#: main0503.xhp
msgctxt ""
"main0503.xhp\n"
"hd_id3155084\n"
"21\n"
"help.text"
msgid "Flexible Application Interface"
msgstr "Snadno přizpůsobitelné uživatelské rozhraní"

The "21\n" above is the oldref.
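To make the effect on the context concrete, here is a minimal Python sketch of dropping that component. It assumes the four-part layout shown above (file name, paragraph id, oldref, "help.text"); the function name `drop_oldref` is hypothetical and this is not the script that will actually be used for the update.

```python
def drop_oldref(msgctxt: str) -> str:
    """Remove the numeric oldref component from a help msgctxt value.

    Assumes the layout shown in the example above: file name, paragraph
    id, oldref and the literal "help.text", separated by newlines.
    """
    parts = msgctxt.split("\n")
    # Only touch contexts that have exactly the four-part layout and a
    # purely numeric third component; anything else is left unchanged.
    if len(parts) == 4 and parts[2].isdigit() and parts[3] == "help.text":
        del parts[2]
    return "\n".join(parts)


old = "main0503.xhp\nhd_id3155084\n21\nhelp.text"
print(repr(drop_oldref(old)))
```

Contexts that do not match the assumed layout pass through untouched, so the sketch can be run over a whole file without special-casing other entries.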

As we discussed on IRC: unless there are any objections, can you
please do your magic with the next translation update so that we remove
these oldrefs from the helpcontent, the .po templates, and the .po
translations themselves?

The helpcontent2 part of that is this:

  git grep -l 'oldref="[0-9]*"' | xargs sed -i 's/ *\<oldref="[0-9]*" *//'
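For anyone who would rather test the attribute removal on a string first, here is a rough Python equivalent of the sed expression above. The names `OLDREF_RE` and `strip_oldref` are made up for illustration and are not part of any existing tooling. Unlike the sed expression, this sketch removes every occurrence on a line and leaves the whitespace after the attribute untouched.

```python
import re

# Matches one oldref="<digits>" attribute together with the whitespace
# in front of it, so the remaining attributes stay separated as before.
OLDREF_RE = re.compile(r'\s+oldref="[0-9]*"')


def strip_oldref(xhp_text: str) -> str:
    """Return xhp_text with all oldref attributes removed."""
    return OLDREF_RE.sub("", xhp_text)


before = ('<paragraph role="heading" id="hd_id3145649" xml-lang="en-US" '
          'level="2" l10n="U" oldref="13">Heading</paragraph>')
print(strip_oldref(before))
```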

Thank you in advance!

[Or - any objections to this change?]

All the best,
Kendy

> this change is supposed to be transparent for L10n and
> Documentation teams, but they should know :-)

It does not seem transparent for the few languages that do not use Pootle (sl and sr); please do not forget those.

It does also influence the help repo (of course), since the change will be a very big commit.

> [Or - any objections to this change?]

No objections as I think it is a good and welcome change, just a question.

As we discussed in ESC (and Olivier sort of pushed), it seems the goal is to move away from .xhp to .xhtml (if I understood it correctly). If that is decided, do we then want to do it as a set of small steps, or write one script that does it all?

Please just see this as a question of how often we want to run these conversions.

have a nice weekend
rgds
jan I.

Hi Jan,

Jan Iversen wrote on Fri, 16 Dec 2016 at 16:27 +0100:

> > this change is supposed to be transparent for L10n and
> > Documentation teams, but they should know :-)

> It does not seem transparent for the few languages that do not use Pootle (sl and sr); please do not forget those.

Thanks for the reminder. I hope Cloph can do the upgrade for them some
way that fits them too, though?

> It does also influence the help repo (of course), since the change will be a very big commit.

> > [Or - any objections to this change?]

> No objections as I think it is a good and welcome change, just a question.

> As we discussed in ESC (and Olivier sort of pushed), it seems the goal
> is to move away from .xhp to .xhtml (if I understood it correctly). If
> that is decided, do we then want to do it as a set of small steps, or
> write one script that does it all?

In another thread on documentation@ I tried to explain why a big-bang
move to HTML is not a good idea, for many reasons; to name the most
important ones:

* a big-bang "let's abandon one technology and hooray for another one"
  always brings lots of regressions that are hard to fix in a timely
  manner; incremental changes are easier to maintain

* HTML does not have markup for some of the semantics that we have (and
  need) in the help files (like <section> or <embed>, to name a few)

* there are many ways to describe the same thing in HTML (<b>'s and
  <i>'s vs. <strong> and <em> vs. <div>'s with CSS vs. who-knows-what),
  which would make the help harder to maintain if we, e.g., want to
  reuse the information from there to generate other representations
  (like books)

> Please just see this as a question of how often we want to run these conversions.

One more may be needed if we agree that the id="..." attribute could be
made optional, because that one affects the msgctxt too.

If we want to make the XHP markup look more like HTML markup (which I
don't object to in general; this is up to agreement between the
Documentation and L10n people), there might be additional conversions
needed, for things like <image> -> <img> etc. - but I'd like to keep
this separate from the cleanup effort / topic.

All the best,
Kendy

> > it seems the goal is to move away from .xhp to .xhtml

I hope you meant HTML 5, because XHTML is a dead end (and good riddance).

> HTML does not have markup for some of the semantics that we have (and need)
> in the help files (like <section> or <embed>, to name a few)

Both <section> and <embed> are part of HTML 5 and there is a good chance
the other things are as well.

> there are many ways to describe the same thing in HTML (<b>'s and <i>'s
> vs. <strong> and <em> vs. <div>'s with CSS vs. who-knows-what)

CSS is the preferred way; it's the same as with styles and direct formatting
in Writer.

Hi,

khagaroth wrote on Fri, 16 Dec 2016 at 17:51 +0100:

> I hope you meant HTML 5, because XHTML is a dead end (and good riddance).

> > HTML does not have markup for some of the semantics that we have (and need)
> > in the help files (like <section> or <embed>, to name a few)

> Both <section> and <embed> are part of HTML 5 and there is a good chance
> the other things are as well.

They are, but they mean a completely different thing than what they mean
in XHP ;-)

<embed> in XHP is more like <object name="foo" type="text/html"
data="foo.inc"></object>.

Similarly <section> is more like a <div> with some associated css.

Again - I'm talking semantics; <object> is a general thing and has no
semantics by itself, and the same goes for <div>. We'd lose this by
converting to plain HTML.

All the best,
Kendy

Hi All

The complete XHP reference is here:

https://wiki.documentfoundation.org/HelpContentAuthoring/3

Regards

CSS is preferred for styling, not semantics. But all of this is moot if
we don’t have an HTML5 help viewer anyway.

Regards,
Khaled

oh and of course HTML help content effectively requires first ditching
the current built-in Writer-based help browser, since Writer can only
handle ~HTML3 or so and if people can just write HTML5 directly they
surely don't want to be restricted to a subset with no tool to verify
they didn't get it wrong, so the pages have to be displayed by a proper
web browser.

but maybe getting rid of the Writer based online help is already part of
the goal, i don't know.

Hi, Jan,

> > It does not seem transparent for the few languages that do not use
> > Pootle (sl and sr); please do not forget those.

> Thanks for the reminder. I hope Cloph can do the upgrade for them some
> way that fits them too, though?

I do not mind this happening if it does not affect the l10n process.

With that in mind, the date or point at which it will be run must be set.

Probably this will happen after the 5.3.x branch? It would make sense to do it
after the 5.3.0 release, when most l10n teams have finished their work,
including translating the help, and have not yet started translating the 5.x
master. Or maybe after the 5.3.1 release?

The sl and sr teams would probably send their po files zipped (or they would
be taken from git); the script should be run over them, and the results
committed to git and returned to the respective l10n teams.

This conversion process (including the majority of l10n from Pootle) should
happen in a short span (like two working days, maybe more if required by
the scripts to go through all the languages and to check all converted
files for consistency by some other scripts).

Probably one (or a few) languages should be taken first to test this
procedure/the scripts and to have better estimates of time and resources
needed.

Best regards, m.

Hi all

On 2016-12-16 at 16:13, Jan Holesovsky wrote:

> The idea is to get rid of the 'oldref' attribute in the help files; i.e.
> change
>
>   <paragraph role="heading" id="hd_id3145649" xml-lang="en-US" level="2" l10n="U" oldref="13">Heading</paragraph>
>
> to
>
>   <paragraph role="heading" id="hd_id3145649" xml-lang="en-US" level="2" l10n="U">Heading</paragraph>

I did not realize that oldref was part of msgctxt, so in my recent help
commits I removed a lot of these from the files I touched, even from
otherwise untouched strings. So I probably created quite a few fuzzy
strings - sorry about that.

With that being said, I fully support this move, like right now :).

Also, if possible, please kill the l10n attribute along with it. These two
make my eyes bleed while editing the original xml.

Regards
Gabor

Hi Jan

One thing I'd like to add for evaluation of using XML for the help
contents in browsers is that, in my experience:

* XSLT (XML style sheets), XPath and XQuery are additional technologies
to master.

* An error in an XSLT statement yields a blank page or a message
with very little indication of the cause (Firefox).

* XSLT seems to be an aging technology. Is the industry betting on this
technology for the future?

* Rendering XML+XSLT is browser-dependent and is not publicly/widely
tested by W3C. We may be forced to test the results across a wide set of
browsers.

Regards

Hi Olivier,

Olivier Hallot wrote on Sat, 17 Dec 2016 at 14:54 -0200:

> One thing I'd like to add for evaluation of using XML for the help
> contents in browsers is that, in my experience:
>
> * XSLT (XML style sheets), XPath and XQuery are additional technologies
> to master.
>
> * An error in an XSLT statement yields a blank page or a message
> with very little indication of the cause (Firefox).
>
> * XSLT seems to be an aging technology. Is the industry betting on this
> technology for the future?
>
> * Rendering XML+XSLT is browser-dependent and is not publicly/widely
> tested by W3C. We may be forced to test the results across a wide set of
> browsers.

Nothing stops us from rewriting the XSLT transformation in plain
JavaScript and handling the XHP files directly via JS, if XSLT is blocking
us. [And this is a reasonably self-contained and easily testable task:
the XHP -> HTML conversion has to give the same results before and after
the rewrite for all the files. We got rid of XSLT in writerfilter
the same way a few years ago.]
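The "same results before and after" check could look roughly like the Python sketch below. Everything in it is an assumption made for illustration: `find_regressions` is a hypothetical helper, and the two toy converters merely stand in for the existing XSLT-based conversion and a rewritten one.

```python
from __future__ import annotations

from typing import Callable, Iterable


def find_regressions(
    sources: Iterable[tuple[str, str]],
    old_convert: Callable[[str], str],
    new_convert: Callable[[str], str],
) -> list[str]:
    """Return the names of files whose output differs between converters."""
    differing = []
    for name, xhp_text in sources:
        if old_convert(xhp_text) != new_convert(xhp_text):
            differing.append(name)
    return differing


# Toy stand-ins for the real converters (hypothetical, illustration only).
def old_convert(xhp: str) -> str:
    return xhp.replace("<emph>", "<em>").replace("</emph>", "</em>")


def new_convert(xhp: str) -> str:
    return xhp.replace("emph>", "em>")


files = [
    ("a.xhp", "<emph>Bold</emph>"),
    ("b.xhp", "<paragraph>Text</paragraph>"),
]
print(find_regressions(files, old_convert, new_convert))
```

An empty list means the rewrite produced identical output for every input; any file name in the list points at a difference to investigate.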

And maybe we'll eventually end up with using the plain HTML5 directly -
I definitely don't want to block evolution (even though at the moment I
see more drawbacks than gains).

But that's my main point - I want an evolution, not a revolution. Every
time I hear about "helpcontent*3*" or "let's move to html5", I get
extremely scared, because such claims seem to suggest that we have to
throw away what we have & rewrite everything first, and miss what we
want to achieve in the first place; which from what I know is:

1) add multimedia content

2) make the editing easier

But neither 1) nor 2) has HTML5 (or a complete rewrite) as a
prerequisite; for both these goals there is an incremental upgrade path
possible: improving XHP step by step.

All the best,
Kendy

Hi Jan, *,

> Jan Iversen wrote on Fri, 16 Dec 2016 at 16:27 +0100:

> > > this change is supposed to be transparent for L10n and
> > > Documentation teams, but they should know :-)

> > It does not seem transparent for the few languages that do not use Pootle (sl and sr); please do not forget those.

> Thanks for the reminder. I hope Cloph can do the upgrade for them some
> way that fits them too, though?

I'll provide the script with which I'll do the auto-translation. Whether
it will work for them without modifications depends on whether they use
the same mechanism to update against the new templates (i.e. either they
have obsolete strings in the newly generated po files, or they have to
keep the old po files to read the old translations from).

ciao
Christian

Hi *,

> > it seems the goal is to move away from .xhp to .xhtml

> I hope you meant HTML 5, because XHTML is a dead end (and good riddance).

What will be in the installation set of course is independent of what
is used as the source-format.

And as kendy already noted, the source format has additional
requirements. (like being translated in a sensible way, i.e. po files
like it is now)

xml can be validated, and converted rather easily. (and of course it
is working already, no need to reinvent the wheel at this point)

ciao
Christian

Hi Martin, *,

> I do not mind this happening if it does not affect the l10n process.

removing those context markers has been done already, and I think
translators didn't notice, so I assume it won't affect the process at
all..

ciao
Christian

Hi,
I don't remember this. Was it done for sl and sr as well? Probably I forgot
...
Best regards, m.
