On 03/05/2016 15:51, Kruno wrote:
>not doubled maintained word lists by multiple maintainers (not knowing each other)
>will not and can not be resolved.
With a central repository for working on dictionaries, it is far easier
for two individuals interested in the same dictionary to find each
other, than if they are working on two different sites, in different
locations.
>Who's dictionary to include to that single repository, how to merge
As a practical matter, a repository that allows only one dictionary per
language is not viable. At a minimum, you'll have specialized
dictionaries.
>how to merge affix files with different affix classes (that will be a mess).
I've seen some tools for automating the creation of affix files.
I don't know how well they work, though.
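To make the affix-class problem concrete, here is a minimal sketch (not
one of the tools mentioned above) that reports which affix flags two
.aff files both define, but with different rules. The function names
are hypothetical, and the header-line test is a simplifying assumption
about how PFX/SFX blocks are laid out.

```python
from collections import defaultdict


def affix_classes(aff_text):
    """Map each affix flag to the set of PFX/SFX rule lines that use it."""
    classes = defaultdict(set)
    for line in aff_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] not in ("PFX", "SFX"):
            continue
        kind, flag = fields[0], fields[1]
        # Assumption: a class header looks like "SFX D Y 2"
        # (cross-product Y/N, then a rule count). Skip it, keep the rules.
        is_header = (
            len(fields) == 4 and fields[2] in ("Y", "N") and fields[3].isdigit()
        )
        if not is_header:
            classes[flag].add((kind, *fields[2:]))
    return dict(classes)


def conflicting_flags(aff_a, aff_b):
    """Return flags that both files define, but with different rule sets."""
    a, b = affix_classes(aff_a), affix_classes(aff_b)
    return sorted(flag for flag in a.keys() & b.keys() if a[flag] != b[flag])


# Two made-up affix files: both use flag D, but for different rules.
AFF_A = "SET UTF-8\nSFX S Y 1\nSFX S 0 s .\nSFX D Y 1\nSFX D 0 ed .\n"
AFF_B = "SET UTF-8\nSFX S Y 1\nSFX S 0 s .\nSFX D Y 1\nSFX D 0 d e\n"

print(conflicting_flags(AFF_A, AFF_B))
```

Any flag it reports would have to be renamed in one of the files, and
in that file's .dic entries, before the two could be merged.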
This goes back to my claim that spell checking without built-in grammar
checking is useless.
>Why you think that included dictionary is 'standard' and is better
>then the other one?
Any dictionary project has to include the ability to have the same
language in at least two different writing systems --- Braille (^1) and
the standard writing system for the language.
>The other guy will give up his work?
The proposal does not require the other guy to give up his project.
I wouldn't be surprised to see the other guy create a more specialized
dictionary.
* John Doe creates a general purpose dictionary;
* Jane Doe creates a name and places dictionary;
* John Roe creates a scientific terminology dictionary;
* Jane Roe creates a basic words dictionary.
>Who will hunt all those 'other' guys telling them 'Yo, dude, leave that, do this shit!'
As far as existing spell checking and wordlist projects go, nobody is
going to tell them to "leave that, do this". What might happen is that
known, existing projects are offered space, etc. in the proposed
repository/incubator, but they will stay where they currently are, due
to how their workflow operates.
>How will such a repository resolve competition between two English dictionaries?
Since you specifically mentioned English, there currently are versions
of English for a dozen locales, plus around half a dozen specialist
dictionaries.
Most users won't choose the English (OED) variant, because it has too
many words in it. Too many words means that wrongly used words get
flagged as correctly spelled: the "Eye right withe aye pin" phenomenon.
>Nobody can (or should) just declare 'we are building dictionary
>repository - here use this, not that' just because being in position of
>power to do that.
The proposal does not mandate that only the proposed space/workflow/etc
be used. In an ideal world, existing groups would be able to drop their
work-product into the repository, with only one change to their workflow
--- a bot that automatically uploads their new, verified, approved work
product into the repository. Furthermore, this change would occur if,
and only if, the existing group wanted to do so.
This proposal is about non-technical types being able to _easily_ create
viable dictionaries for their specific use-case. It doesn't matter if
that use-case is a dictionary in Pondo, or a dictionary of people and
places in Bharat, or a dictionary in Moon.
The other part of the proposal is that even if the original dictionary
creator abandons the dictionary, it can still be maintained, and updated.
The third part of the proposal is that whilst it is initially for LibO,
the hope is that it becomes the source for dictionaries for FLOSS projects.
#####
Hypothetical situation. One of Kevin Scannell's students decides that
what the world needs is a dictionary in each of the 2,500 languages
that have been reduced to a writing system. So said student walks through
Kevin's word lists, and creates a dictionary project for each of the
2,000 languages that Kevin maintains word lists for. A year later, said
student graduates, and forgets about their dictionaries.
Under the current scenario, when said student abandons their
dictionaries, the only way other people can update them is by forking
them --- assuming that the license allows forking.
Under the proposed scenario, if said student creates the dictionaries in
the repository, when said student abandons them, other people can still
update the dictionaries, which can then be distributed to LibO, etc.
I'll grant that were said student to create 2,000+ dictionaries for
LibO, it would break the UI. However, as far as the proposal goes, that
breakage is irrelevant.
>use of hunspell features correctly (not simple word lists, but by logic)
>what this mean?
For non-techies, creating a HunSpell dictionary is a non-starter,
because they don't understand the vocabulary that it uses.
For techies, the technical description is, at best, off-putting.
>features but those dictionaries who do just word list will continue do
>just the word list because is purgatory (or hell) to do this right if
>you were not doing it right from the beginning.
Depending upon how it is coded/implemented, it is _theoretically_
possible for the repository/whatever to accept everything from a simple
word list to a completed, correctly formed HunSpell Dictionary, and
spit out a LibO/AOo/EO/TBird spelling dictionary that includes all of
the bells and whistles of the most obscure features of HunSpell.
Note: I said theoretically possible. As a practical matter, this
functionality will take several hundred iterations to achieve.
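As a rough illustration of the simplest end of that range, here is a
minimal sketch that turns a plain word list into the text of a bare
HunSpell .aff/.dic pair with no affix rules at all. The function name
is hypothetical, and nothing here is part of the proposal itself. It
relies only on two points of the file format: the .dic file starts with
a word count, and the .aff file can declare its encoding with SET.

```python
def wordlist_to_hunspell(words):
    """Return (aff_text, dic_text) for a plain word list, without affix rules."""
    # Drop blank entries and duplicates; sort so the output is stable.
    unique = sorted({w.strip() for w in words if w.strip()})
    # The .aff file only declares the encoding; no PFX/SFX classes.
    aff_text = "SET UTF-8\n"
    # The .dic file is a word count followed by one word per line.
    # Caveat: a word containing "/" would be read as "word/flags".
    dic_text = "\n".join([str(len(unique)), *unique]) + "\n"
    return aff_text, dic_text


aff, dic = wordlist_to_hunspell(["pin", "eye", "pin", " aye "])
print(dic)
```

Everything beyond this, such as collapsing related word forms into
affix classes, is where the several hundred iterations would go.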
>but don't expect it to be lively as Pootle and translations - it's just not gonna happen, that's
>not realistic.
Depending upon how "lively" is defined, I doubt that anybody expects the
proposed repository/whatever to be as lively as Pootle is.
That said:
Ethnologue claims that there are 7097 living languages.
Ethnologue claims 306 extinct languages, counting from 1950.
Linguist.org claims 573 extinct languages.
Linguist claims 13 constructed languages.
All of those figures are, at best, guesses. Guesses that have their
basis in social, political, cultural, and ethnic rationales.
Peter Daniels claims 80 writing systems.
This is a more significant figure to work with, because languages can be
written in multiple writing systems. Probably the best contemporary
example is Turkish, which can be correctly written in the Arabic,
Cyrillic, Latin, and Braille Writing Systems.
As such, the end result probably will be far more projects than Pootle
or Translations, but far less activity than they have.
^1: I'm ignoring the not-so-little issue that Braille support in LibO is
minimal.
jonathon
--
To unsubscribe e-mail to: l10n+unsubscribe@global.libreoffice.org
Problems? http://www.libreoffice.org/get-help/mailing-lists/how-to-unsubscribe/
Posting guidelines + more: http://wiki.documentfoundation.org/Netiquette
List archive: http://listarchives.libreoffice.org/global/l10n/
All messages sent to this list will be publicly archived and cannot be deleted