Workflow between dev, UX and l10n teams

Hi Robinson

To conclude, what l10n team would like to see is:
- a review process for strings before they are committed, to make
sure they respect the en_US standards (capitals, grammar, punctuation,
typography). Maybe adding the Gnome HIG book to our pages [link 2], if
it isn't there already.

That will require a reviewer with en_US skills.

About how much work (read: time) would this review process entail?

Thanks,
--R

If you need to review *all* help & UI, I think it maps to an equivalent
of a 500 or more page handbook.

A person who cannot decide if a string change is semantic or cosmetic to en-US should not be messing around with the string names in the first place, if you ask me.

OK, so maybe occasionally they might get it wrong. That still produces a lot LESS work to fix than landing 2000 cosmetic en-US changes on 50 locales.

Not a good reason for opposing this approach.

Michael

Jan Holesovsky wrote the following on 27/01/2015 at 14:16:

PS the current setup is not foolproof either as we sometimes get really bad strings, linguistically bad that is.

If this is such a concern, then why don't we set up a panel of experienced localizers who are willing to help developers judge if a change is semantic or cosmetic before we land them on l10n in general?

Michael


Hi,

On 2015.01.26 17:40, Jan Holesovsky wrote:

Sophie wrote on Mon 26.01.2015 at 16:19 +0100:

That's why we were thinking of an en_US version as a real language,
different from the sources, and

But at some stage this will have to apply to the sources - and at that
time, it will be even worse than now :-( I'm afraid having en_US as a
separate language will make the situation worse, not better.

also about scripting changes when
possible (like the substitution of ~ by _)

Sure - so I think this was something that could have been automated
with a trivial script; when was this first noticed, please?
Pity that it was not brought to the ESC as a problem...
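For illustration, a minimal Python sketch of such a trivial script, assuming the strings for every locale are already loaded into a plain dictionary (locale -> message id -> text). The catalog layout, the message id `menu.open` and the function names are hypothetical; a real script would read and write the actual translation files:

```python
# Minimal sketch: one mechanical access-key change applied to every locale at once.
# The in-memory layout (locale -> message id -> string) is a hypothetical stand-in
# for the real translation files.


def tilde_to_underscore(text: str) -> str:
    """Swap the '~' access-key marker for '_' (assumes '~' never appears literally)."""
    return text.replace("~", "_")


def apply_to_all_locales(catalogs: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """Apply the same substitution to the source strings and to every translation."""
    return {
        locale: {msg_id: tilde_to_underscore(text) for msg_id, text in messages.items()}
        for locale, messages in catalogs.items()
    }


catalogs = {
    "en-US": {"menu.open": "~Open..."},
    "de": {"menu.open": "Ö~ffnen..."},
}
print(apply_to_all_locales(catalogs))
```

The point is that the person making the en-US change already has the substitution in hand; running it over the other locales is a one-line extension rather than 50 teams redoing it by hand.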

I just wanted to say that I'm fully with Jan on these two statements: I
believe that the right thing to do is to automate massive trivial
changes, not to create a separate pseudo-locale where strings with developer
mistakes and/or insufficient clarity would be carved in stone. Such a
pseudo-locale would not help us solve half of the cosmetic issues, such
as added colons or changed access keys; these would require scripting
anyway. The issues it would solve are either also scriptable
(typographical or letter-case changes) or should be rare by their nature
(typo fixes or sentence improvements; now that some teams work on
master, these should occur in branches even less frequently). On the
other hand, having that source locale would introduce yet another
level of complexity by forcing each developer to decide where each
string change should go, and if you are thinking about making one
or two people accountable for these decisions, then why not ask them
to review strings that are about to be landed into en-US instead?

In general, I think it's kind of sloppy (sorry, can't think of the right
word right now) to leave badly worded strings in the source as they are
and fix them in a separate locale instead. I don't know how many fixes
like that (specifically excluding typography, colons and similar massive
replacements) end up in each release, but assuming there aren't many
(e.g. a dozen or two), I really don't think they deserve all this fuss.

Regards,
Rimas

Hi Jan,

On 2015.01.26 16:43, Jan Holesovsky wrote:

Mihovil Stanić wrote on Mon 26.01.2015 at 10:25 +0100:

Cosmetic changes (~ to _ or "Status" to "Status:" or ... to … or those
different quote styles I don't even have on my keyboard) and anything
similar - NOT OK if you don't script it for all languages
Cosmetic changes ("Big brown fox" -> "Big Brown Fox") - NOT OK at all,
change just for en_us, don't change my strings and don't even notify me
you did it in en_us

I see 2 problems here:

1) There is no tool that would detect these trivial changes, and would
   act accordingly.

Regarding 1) - I thought that Pootle detects the trivial changes
in some way and offers the original translation. Does it not? What can
be done to improve that, so that for translators it is just a matter of
checking, not a matter of translating? [Or even what you suggest - that
it would just update the source strings without touching the
translations?]
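As a rough idea of what detecting trivial changes could look like, here is a hypothetical Python sketch (not an existing Pootle feature) that treats a change as cosmetic when both versions reduce to the same canonical form once access-key markers, ellipsis style, double-quote style, a trailing colon and letter case are ignored. Which details count as cosmetic is an assumption of the sketch:

```python
import re


def normalize(text: str) -> str:
    """Reduce a UI string to a canonical form that ignores cosmetic details.
    The list of ignored details is an assumption, not an agreed rule."""
    text = text.replace("~", "").replace("_", "")     # access-key markers (also drops literal '_')
    text = text.replace("\u2026", "...")              # ellipsis character vs. three dots
    text = re.sub("[\u201c\u201d\u201e]", '"', text)  # typographic vs. straight double quotes
    text = text.strip().rstrip(":").strip()           # trailing colon
    return text.casefold()                            # letter case


def is_cosmetic_change(old: str, new: str) -> bool:
    """True if old and new differ, but only in details that normalize() ignores."""
    return old != new and normalize(old) == normalize(new)


print(is_cosmetic_change("~Status", "_Status:"))          # access key + colon
print(is_cosmetic_change("Big brown fox", "Big Brown Fox"))
print(is_cosmetic_change("Open file", "Close file"))
```

A `True` result only marks a candidate for automatic handling; stripping `_` and ignoring case can also hide changes that a human should still look at.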

Pootle does offer the original translation, but the localizer still has
to approve it.

Furthermore, Pootle does not apply any automatic changes. If you had
e.g. "Some ~string", and you change it to "Some _string", Pootle will
show the original translation as a hint, but the user will still have to
port this trivial change to the translation manually.
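A sketch of the missing step, assuming a small, hypothetical list of known mechanical transformations: if the new source string is exactly one of those transformations applied to the old one, the same transformation is applied to the existing translation; anything else is left for a translator. This illustrates the idea only and is not something Pootle is known to do:

```python
from typing import Callable, Optional

# A deliberately small, assumed list of mechanical transformations.
KNOWN_TRANSFORMS: list[Callable[[str], str]] = [
    lambda s: s.replace("~", "_"),         # access-key marker "~" -> "_"
    lambda s: s.replace("...", "\u2026"),  # three dots -> ellipsis character
]


def port_trivial_change(old_src: str, new_src: str, old_tr: str) -> Optional[str]:
    """Return the updated translation if new_src is exactly a known mechanical
    transformation of old_src; otherwise return None so that a translator
    still reviews the string."""
    if old_src == new_src:
        return old_tr  # nothing changed in the source
    for transform in KNOWN_TRANSFORMS:
        if transform(old_src) == new_src:
            return transform(old_tr)
    return None


print(port_trivial_change("Some ~string", "Some _string", "Eine ~Zeichenkette"))
```

Returning `None` rather than guessing keeps the localizer in charge of every change the script cannot show to be mechanical.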

Needless to say, these minor differences sometimes escape the
localizers' notice, which results in errors in the locale (I've seen
incorrect access key identifiers in the menus at least once).

However, while you are correct that there is no tool to detect these
changes, I don't think there has to be. The person who implements the
change knows better than anyone whether or not it can be automated,
perhaps they even automated it themselves. For example, I seriously
doubt that somebody went over all L10n files and changed triple dots to
ellipses manually; this was most likely a scripted change. The same, or a very
similar, script would probably have worked with all other locales, but I
guess that person simply didn't think about it.
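Assuming the translations are stored as gettext PO files in a checkout the script can write to (whether such commits would survive the next import from Pootle is a separate question), the triple-dots-to-ellipsis change could be replayed over every locale with something like this hypothetical sketch. It only touches string lines (`msgid`, `msgid_plural`, `msgstr`, and quoted continuation lines) and leaves comments alone:

```python
from pathlib import Path

# PO lines that carry string content; comments (#...) and msgctxt are left alone.
STRING_PREFIXES = ("msgid ", "msgid_plural ", "msgstr ", "msgstr[", '"')


def dots_to_ellipsis(line: str) -> str:
    """Replace three dots with the ellipsis character on string-bearing lines only."""
    if line.startswith(STRING_PREFIXES):
        return line.replace("...", "\u2026")
    return line


def convert_tree(root: Path) -> int:
    """Rewrite every *.po file under root in place; return how many files changed."""
    changed = 0
    for po_file in sorted(root.rglob("*.po")):
        old = po_file.read_text(encoding="utf-8")
        new = "".join(dots_to_ellipsis(line) for line in old.splitlines(keepends=True))
        if new != old:
            po_file.write_text(new, encoding="utf-8")
            changed += 1
    return changed
```

Running it a second time reports no changed files, so it is safe to re-run after a partial import.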

Similarly, changes to the quote characters used could most likely have been
isolated and transplanted to locales.
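Isolating a quote-style change could look like this hypothetical sketch: straight double quotes are replaced pairwise with each locale's own opening and closing marks. The per-locale table is an illustrative assumption that each l10n team would have to fill in, and strings with an unmatched quote are deliberately left untouched:

```python
import re

# Opening/closing marks per locale; illustrative entries only, each team
# would supply its own conventions.
QUOTES = {
    "en-US": ("\u201c", "\u201d"),  # “…”
    "de": ("\u201e", "\u201c"),     # „…“
}


def straight_to_typographic(text: str, locale: str) -> str:
    """Replace each pair of straight double quotes with the locale's marks.
    Strings with an odd number of straight quotes are returned unchanged,
    because the pairing would be ambiguous."""
    if text.count('"') % 2:
        return text
    opening, closing = QUOTES[locale]
    return re.sub(r'"([^"]*)"', lambda m: opening + m.group(1) + closing, text)


print(straight_to_typographic('Klicken Sie auf "OK"', "de"))
```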

Adding colons to certain strings only would probably have been slightly
more difficult, but still scriptable.
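One way to make the colon change scriptable, sketched under stated assumptions: first collect the message ids whose only source change was an appended colon, then append a colon to just those translations. The catalog layout and the per-locale colon table are hypothetical; French typography, for example, normally puts a non-breaking space before a colon:

```python
def strings_gaining_colon(old_src: dict[str, str], new_src: dict[str, str]) -> set[str]:
    """Message ids whose only source change was an appended colon ("Status" -> "Status:")."""
    return {
        msg_id
        for msg_id, new in new_src.items()
        if msg_id in old_src and new == old_src[msg_id] + ":"
    }


# Hypothetical per-locale colon; locales not listed get a plain ":".
COLON = {"fr": "\u00a0:"}


def add_colons(translations: dict[str, str], ids: set[str], locale: str) -> dict[str, str]:
    """Append the locale's colon to the selected translations, unless one is already there."""
    colon = COLON.get(locale, ":")
    return {
        msg_id: text + colon if msg_id in ids and not text.rstrip().endswith(":") else text
        for msg_id, text in translations.items()
    }


ids = strings_gaining_colon({"a": "Status", "b": "Name"}, {"a": "Status:", "b": "Full name"})
print(add_colons({"a": "Estado", "b": "Nombre"}, ids, "es"))
```

Only `"a"` is touched here: `"b"` changed in wording, so it stays with the translators.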

And none of that requires any "tool to detect trivial changes"... ;-)

2) The texts for translations are updated in big 'code' drops, without
   possibility for translators to affect the process in any way - for
   them it is too late.

Regarding 2) - I'm glad that you say that the strings will now be
getting to Pootle immediately after the code / string changes in master.
I think it is important that the translators will be able to deal with
the changes immediately, not several months later, so that they can
cooperate, and not only react.

In general, I don't think that setting extremely strict rules works,
unless you have a means of enforcing them - like via a commit hook or so
(and that is an extremely unpopular way to do things).

It is always much better to communicate - if you see a developer who
commits a change that causes you grief, please _do_ tell _him/her_
immediately, and - if possible - in a friendly way. I'm sure he/she
will do much better the next time.

Unfortunately I did not see any notice that this or that change
was problematic for localization on the development mailing list - were
there such warnings there? Like "commit XY caused AB - please don't do
such things, unless we agree how to do that effectively / without pain"?
Or was it impossible so far because the strings in Pootle were not
synced with master?

I fully agree with you here, and yes, so far communicating these issues
was really difficult because these massive changes appeared in front of
the localizers' eyes way too late in the process.

Regards,
Rimas

Hi,

On 2015.01.27 15:39, Olivier Hallot wrote:

To conclude, what l10n team would like to see is:
- a review process for strings before they are committed, to make
sure they respect the en_US standards (capitals, grammar, punctuation,
typography). Maybe adding the Gnome HIG book to our pages [link 2], if
it isn't there already.

That will require a reviewer with en_US skills.

About how much work (read: time) would this review process entail?

Thanks,
--R

If you need to review *all* help & UI, I think it maps to an equivalent
of a 500 or more page handbook.

But you don't. L10n are only asking for review of strings when they
are being changed.

Rimas

Hi Rimas, all

Hi Jan,

On 2015.01.26 16:43, Jan Holesovsky wrote:
> Mihovil Stanić wrote on Mon 26.01.2015 at 10:25 +0100:
>
>> Cosmetic changes (~ to _ or "Status" to "Status:" or ... to … or those
>> different quote styles I don't even have on my keyboard) and anything
>> similar - NOT OK if you don't script it for all languages
>> Cosmetic changes ("Big brown fox" -> "Big Brown Fox") - NOT OK at all,
>> change just for en_us, don't change my strings and don't even notify me
>> you did it in en_us
> I see 2 problems here:
>
> 1) There is no tool that would detect these trivial changes, and would
> act accordingly.
>
>
> Regarding 1) - I thought that Pootle detects the trivial changes
> in some way and offers the original translation. Does it not? What can
> be done to improve that, so that for translators it is just a matter of
> checking, not a matter of translating? [Or even what you suggest - that
> it would just update the source strings without touching the
> translations?]

Pootle does offer the original translation, but the localizer still has
to approve it.

Furthermore, Pootle does not apply any automatic changes. If you had
e.g. "Some ~string", and you change it to "Some _string", Pootle will
show the original translation as a hint, but the user will still have to
port this trivial change to the translation manually.

Needless to say, these minor differences sometimes escape the
localizers' notice, which results in errors in the locale (I've seen
incorrect access key identifiers in the menus at least once).

However, while you are correct that there is no tool to detect these
changes, I don't think there has to be. The person who implements the
change knows better than anyone whether or not it can be automated,
perhaps they even automated it themselves. For example, I seriously
doubt that somebody went over all L10n files and changed triple dots to
ellipses manually; this was most likely a scripted change. The same, or a very
similar, script would probably have worked with all other locales, but I
guess that person simply didn't think about it.

Similarly, changes to the quote characters used could most likely have been
isolated and transplanted to locales.

Adding colons to certain strings only would probably have been slightly
more difficult, but still scriptable.

And none of that requires any "tool to detect trivial changes"... ;-)

That's the point of this discussion; thanks, Rimas, for making it :-)
The l10n team can always react, and now earlier, but making the scripting part
of the commit, i.e. the job of the person making the change, is more natural in the
workflow. In other words, our product is not en_US only.

> 2) The texts for translations are updated in big 'code' drops, without
> possibility for translators to affect the process in any way - for
> them it is too late.
>
>
> Regarding 2) - I'm glad that you say that the strings will now be
> getting to Pootle immediately after the code / string changes in master.
> I think it is important that the translators will be able to deal with
> the changes immediately, not several months later, so that they can
> cooperate, and not only react.
>
> In general, I don't think that setting extremely strict rules works,
> unless you have a means of enforcing them - like via a commit hook or so
> (and that is an extremely unpopular way to do things).
>
> It is always much better to communicate - if you see a developer who
> commits a change that causes you grief, please _do_ tell _him/her_
> immediately, and - if possible - in a friendly way. I'm sure he/she
> will do much better the next time.
>
> Unfortunately I did not see any notice that this or that change
> was problematic for localization on the development mailing list - were
> there such warnings there? Like "commit XY caused AB - please don't do
> such things, unless we agree how to do that effectively / without pain"?
> Or was it impossible so far because the strings in Pootle were not
> synced with master?

I fully agree with you here, and yes, so far communicating these issues
was really difficult because these massive changes appeared in front of
the localizers' eyes way too late in the process.

What we should take care of, though, is not to overcomplicate the l10n
team's work by relying on this fact. So, as I already said, it should be shared
work and shared vigilance among the teams concerned.
Cheers
Sophie

Review strings in context.

Whoever volunteers for this task will need to go through _all_ of the
existing help, UI, and other things, before reviewing strings when they
are changed.

The task requires:
* Copy editing;
* Line editing;
* Proofreading;
amongst other editing tasks.

FWIW, this also means that the l10n, a11y, and i18n teams will be dumped
with a slew of changes that might, but probably won't, affect their
existing translations, but will still need to be verified to ensure that
their translations, etc. are not broken.

BTW, does anybody know where the *current* _LibreOffice Manual of
Style_ can be obtained from?

jonathon

If you need to review *all* help & UI, I think it maps to an equivalent of a 500 or more page handbook.

But you don't. L10n are only asking for review of strings when they are being changed.

Review strings in context.

Whoever volunteers for this task will need to go through _all_ of the
existing help, UI, and other things, before reviewing strings when they
are changed.

So basically there's a steep learning curve, but once someone has an
active knowledge of the current text, then they should be able to do
work in small deltas. As long as there's clear documentation about
getting up to speed (and the potential time to do so), the workflow
seems plausible.

The task requires:
* Copy editing;
* Line editing;
* Proofreading;
amongst other editing tasks.

Yup, sounds challenging.

FWIW, this also means that the l10n, a11y, and i18n teams will be dumped
with a slew of changes that might, but probably won't, affect their
existing translations, but will still need to be verified to ensure that
their translations, etc. are not broken.

I keep on hearing about these big changes. Some of them sound
scriptable, but have there been some that are not? Aside from the
bulk changes, how many string additions/modifications/etc. are there
on a weekly or monthly basis?
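Given two snapshots of the en-US source strings (message id -> text), for example taken a week or a month apart, counting additions, removals and modifications is straightforward. A minimal sketch, assuming such snapshots can be extracted; how to obtain them is not covered here, and the function name is hypothetical:

```python
from collections import Counter


def string_delta(old: dict[str, str], new: dict[str, str]) -> Counter:
    """Count added, removed and modified strings between two snapshots
    of the source strings (message id -> text)."""
    delta = Counter()
    for msg_id in old.keys() | new.keys():
        if msg_id not in old:
            delta["added"] += 1
        elif msg_id not in new:
            delta["removed"] += 1
        elif old[msg_id] != new[msg_id]:
            delta["modified"] += 1
    return delta


print(string_delta({"a": "x", "b": "y", "c": "z"}, {"a": "x", "b": "Y", "d": "w"}))
```

Run over successive snapshots, this gives the weekly or monthly numbers asked about, separately from the bulk changes.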

BTW, does anybody know where the *current* _LibreOffice Manual of
Style_ can be obtained from?

Once we find it, let's add a redirect here:
https://wiki.documentfoundation.org/Manual_of_style

(Or it can live there, if it doesn't have a good home yet! :-)

If it's more of an official-ish document, perhaps this would be a better home:
https://wiki.documentfoundation.org/TDF/Manual_of_style

Best,
--R

So basically there's a steep learning curve,

I wouldn't describe it as "a steep learning curve" as much as it is
ensuring that the current text is grammatically correct en_US, with no
spelling errors, _and_ conforms to the conventions in the Style Manual.

Rephrased: The person has to translate everything into en_US, then
copyedit everything.

but once someone has an active knowledge of the current text, then
they should be able to do work in small deltas.

In theory, once everything is in en_US, and conforms to the style
manual, then the only "knowledge" required will be to read the material
once or twice, before being able to work on small deltas.

As long as there's clear documentation about getting up to speed (and
the potential time to do so), the workflow seems plausible.

That initial translation & copyedit is there, purely because of a number
of minor issues. Things that 99.99% of the users probably won't notice,
but overall detract from the project. The things that one is not
conscious of, but nonetheless have a negative impact.

I keep on hearing about these big changes. Some of them sound scriptable, but have there been some that are not?

Presentation markup is usually scriptable.

Vocabulary, spelling, syntax, and grammatical changes are not
scriptable. (Well they can be, but you end up with "invisible, insane"
for "out of sight, out of mind".)
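To illustrate why presentation markup is scriptable: renaming a markup element only touches the tags, never the translated words between them, so the same script is safe for every locale. A hypothetical sketch; `emph` and `strong` are just example tag names:

```python
import re


def rename_tag(text: str, old: str, new: str) -> str:
    """Rename an inline markup element (keeping its attributes) without
    touching the translated text between the opening and closing tags."""
    text = re.sub(
        rf"<{re.escape(old)}(\s[^>]*)?>",
        lambda m: f"<{new}{m.group(1) or ''}>",
        text,
    )
    return text.replace(f"</{old}>", f"</{new}>")


print(rename_tag("Wählen Sie <emph>Format</emph>", "emph", "strong"))
```

No comparable substitution can fix vocabulary or grammar, because those depend on meaning rather than on a fixed pattern.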

If it's more of an official-ish document, perhaps this would be a better home:
https://wiki.documentfoundation.org/TDF/Manual_of_style

My sense is that whilst Sun wanted the project to use it, its usage was
honoured by the breach thereof.

jonathon

Am I right in reading into this, that master is using American English?
And if so, why? Seeing as LibreOffice is, at heart, a European program
surely it should be using English?

Like British English? RP? Let's be specific here...

What proportion of developers are native American speakers?

Not that many. It would be great to see more involvement from the US,
but I think that promoting the "this is a European Project" attitude
can really hurt those numbers. LibreOffice is a global project: No one
country or continent should try to claim it for itself.

Bear in mind that most English variants use English spelling, not
American spelling.

I think the phrase "American English spelling" is clearer -- there are
lots of languages spoken here across the pond, so the phrase "American
spelling" is ambiguous.

At the end of the day, not enforcing en_us as a translation means that
the majority of us (including those of us that speak English rather than
American as our native language) are forced to suffer pain as the
foundations are messed up underneath us.

Whoa there, cowboy! (or whatever the British equivalent is) I think
that British, American, Canadian, etc. English are all pretty
similar, so while I agree that we might have our little differences
about an extra 'u' in color, or whether the big vehicle that picks up
the trash is a Lorry or a Truck, it's not a big deal compared to the
difference between the Englishes and French or Spanish.

And by allowing that *minority*
to avoid suffering, they are enabled to cause unnecessary pain without
even realising what they are doing!

*facepalm*

I know that you're just getting some stuff off your chest, and sure, I
get it: languages can be tough. So we have a couple of beers, find
the veritas in the vino, and start speaking French (wait, maybe that's
just what I do). More seriously, I'm trying to get people interested
in LibreOffice in the US, and it's really important for us to make the
project welcoming to users and new contributors.

You want to propose some changes? Sure, great plan. But please check
that your method of delivery doesn't paint the Americans as the
outsiders and buffoons of your diatribe, because the reality is that
we really don't have much going on in the US yet, and there's already
a hesitancy to interact with what is perceived as aloof Europeans. I
think that growth in the US has the potential to give a ton back to
the LibreOffice community in Development, Documentation, QA, and so
forth, but we need to go the extra mile there, not tell people that,
before they've opened a single spreadsheet or triaged a single bug,
they are somehow (?) "causing pain."

The rule should be simple. Any changes of meaning can be edited directly
in master. If it's non-native English, and so poor that it's
hard to comprehend, then it can be corrected in master. If it's clear
comprehensible English, whether English or Strine or American or
International or whatever, then it's off-limits for changes to master,
and has to be done in Pootle or whatever as a localisation.

I like the general idea, but I am concerned about the feasibility. Notes:

1) Will inconsistency of spelling (e.g. color vs. colour), inconsistency
of grammar, etc. within the sources in master make translation harder
for the native-lang teams?

2) What will the language be for builds w/o langpacks? Just a generic
'English'? (maybe we can call it "LibreOffice English" :-)

3) Who's going to step up to maintain en_US? (I'd love to help, but
I'm working tons of hours as it is)

Cheers,
--R

Hi Jonathon,

2015.01.27 23:15, jonathon wrote:

>> If you need to review *all* help & UI, I think it maps to an
>> equivalent of a 500 or more page handbook.

> But you don't. L10n are only asking for review of strings when
> they are being changed.

Review strings in context.

Whoever volunteers for this task will need to go through _all_ of the
existing help, UI, and other things, before reviewing strings when they
are changed.

The task requires:
* Copy editing;
* Line editing;
* Proofreading;
amongst other editing tasks.

FWIW, this also means that the l10n, a11y, and i18n teams will be dumped
with a slew of changes that might, but probably won't, affect their
existing translations, but will still need to be verified to ensure that
their translations, etc. are not broken.

I really don't see a revision of all existing strings as a requirement
to start reviewing newly added ones. Of course, it would be beneficial,
but not at all a requirement. You don't need to read 500 pages' worth
of text to tell whether or not a certain string is clear, concise and
grammatically, syntactically and typographically correct. Especially if
you are a native English speaker and have a style guide at hand.

Rimas

Am I right in reading into this, that master is using American English?
And if so, why? Seeing as LibreOffice is, at heart, a European program
surely it should be using English?

Okay, maybe I'm being a little facetious here, but I do get very fed up
when people assume that "English" means American English, and not English.

Surely it would make a lot of sense to state that "Master uses
international English (i.e. any well-understood variant), and for local
consistency en_gb, en_us, en_au etc. have to translate it".

What proportion of developers are native American speakers?

Bear in mind that most English variants use English spelling, not
American spelling. Most non-native speakers of English (even very good
speakers) "don't get" English tenses very well. Likewise, most
non-native speakers use odd grammar and word order - very comprehensible
but a dead giveaway that they are non-native.

At the end of the day, not enforcing en_us as a translation means that
the majority of us (including those of us that speak English rather than
American as our native language) are forced to suffer pain as the
foundations are messed up underneath us. And by allowing that *minority*
to avoid suffering, they are enabled to cause unnecessary pain without
even realising what they are doing!

The rule should be simple. Any changes of meaning can be edited directly
in master. If it's non-native English, and so poor that it's
hard to comprehend, then it can be corrected in master. If it's clear
comprehensible English, whether English or Strine or American or
International or whatever, then it's off-limits for changes to master,
and has to be done in Pootle or whatever as a localisation.

Cheers,
Wol

I really don't see a revision of all existing strings as a requirement to start reviewing newly added ones.

At some point in time that review has to be done. To minimize the
overall workload, it is easier, and simpler, to do it before reviewing
newly added strings, than afterwards. (For starters, doing it afterwards
means having to review those newly added strings at least twice, and
maybe thrice.)

Especially if you are a native English speaker and have a style guide
at hand.

Just as the English language has never met a word that it has not
adopted as its own, so it has never met a grammatical construct that it
has not adapted and mutilated. One direct consequence of both those
facets of acquisition is that it is incredibly difficult to write a
sentence in English that is grammatically incoherent, but extremely easy
to write a sentence that grammatically means the opposite of what was
intended.

IOW, that certain string might be "clear, concise and
grammatically, syntactically and typographically correct", but mean
something other than intended, because the vocabulary is usually used to
mean something else elsewhere.

BTW, when you say "style guide", which specific one do you mean?

jonathon

PS the current setup is not foolproof either as we sometimes get really
bad strings, linguistically bad that is.

If this is such a concern, then why don't we set up a panel of experienced
localizers who are willing to help developers judge if a change is semantic
or cosmetic before we land them on l10n in general?

Michael

+1 to both your comments, Michael.

Jan Holesovsky wrote the following on 27/01/2015 at 14:16:

be deciding if a change should be applied in the sources (i.e. it is a
change needed for all languages) and what is just making the original
more consistent? And again - what to do if the person mis-judges?

And thank you to you, Kendy (aka Jan Holesovsky), for answering my
"why". I do see that it is a complex matter.

I really don't see a revision of all existing strings as a
requirement to start reviewing newly added ones.

At some point in time that review has to be done. To minimize the
overall workload, it is easier, and simpler, to do it before reviewing
newly added strings, than afterwards. (For starters, doing it
afterwards means having to review those newly added strings at least
twice, and maybe thrice.)

You're right, but if the choice is between no reviews at all and reviews of new strings only, which would you choose? I'd go for the latter.

Especially if you are a native English speaker and have a style guide
at hand.

Just as the English language has never met a word that it has not
adopted as its own, so it has never met a grammatical construct that it
has not adapted and mutilated. One direct consequence of both those
facets of acquisition is that it is incredibly difficult to write a
sentence in English that is grammatically incoherent, but extremely easy
to write a sentence that grammatically means the opposite of what was
intended.

IOW, that certain string might be "clear, concise and
grammatically, syntactically and typographically correct", but mean
something other than intended, because the vocabulary is usually used to
mean something else elsewhere.

Let's add "semantically correct" or "contextually correct" to my list of requirements then. :-)

BTW, when you say "style guide", which specific one do you mean?

The one you're looking for, assuming it exists. If not, it could be a combination of the Gnome HIG and any American English style guide we (the LibO community) would deem acceptable and suited to our needs (e.g. The Chicago Manual of Style).

I still have the one for French, but it's copyrighted by Sun anyway (from
2006). Gnome HIG contains a lot of information that we can use easily.

Cheers
Sophie

Hi again,

BTW, when you say "style guide", which specific one do you mean?

The one you're looking for, assuming it exists. If not, it could be a
combination of Gnome HIG and any American English style guide we (the
LibO community) would deem acceptable and meeting our needs (e.g. The
Chicago Manual of Style).

In fact, I just thought that it doesn't even have to be a formal manual: if somebody were willing to oversee style consistency in our strings, and that style looked acceptable to our en-US users, then why not? Especially if that person were willing to formalize these rules into a written style manual along the way.

--
Rimas

This document may be of interest:
https://obriend.fedorapeople.org/WritingStyleGuide/
It was only recently (last year) open-sourced and made public.

I am not sure I understand you here (to me, the "otherwise" part reads: "if there is no way to script changes, wait until there is a script available," which would not make sense).

When talking about (developer-side) scripting, is it actually OK to commit modifications to the translations in the translations git sub-repo? My understanding was that such modifications would be overwritten by the next "import commit" (as typically done by Andras, AFAIU from some Pootle database).