
On 03/05/2016 15:51, Kruno wrote:

>not doubled maintained word lists by multiple maintainers (not knowing each other)
>will not and can not be resolved.

With a central repository for working on dictionaries, it is far easier for two individuals interested in the same dictionary to find each other than if they are working on two different sites, in different locations.

>Who's dictionary to include to that single repository, how to merge

As a practical matter, a repository that only allows for one dictionary per language is not viable. At a minimum, you'll have specialized dictionaries.

>how to merge affix files with different affix classes (that will be a mess).

I've seen some tools for automating the creation of affix files.
I don't know how well they work, though.
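
To make the "different affix classes" problem concrete, here is a minimal sketch of one small piece of it: renaming the flags of a second dictionary so that they no longer collide with the flags of the first. It assumes single-character flags, and the function names and sample flags are made up for illustration. It does not describe any existing tool, and it does not try to detect two flags that mean the same thing.

```python
import string

def remap_flags(flags_a, flags_b):
    """Map each flag of dictionary B to a flag that does not collide with A.

    Assumes single-character flags. Raises StopIteration if the pool of
    ASCII letters runs out.
    """
    used = set(flags_a)
    # Candidate replacements: letters used by neither dictionary.
    free = (c for c in string.ascii_letters if c not in used and c not in flags_b)
    mapping = {}
    for flag in sorted(flags_b):
        mapping[flag] = next(free) if flag in used else flag
        used.add(mapping[flag])
    return mapping

def rewrite_entry(entry, mapping):
    """Rewrite the flags of one 'word/FLAGS' entry; plain words pass through."""
    word, sep, flags = entry.partition("/")
    return word + sep + "".join(mapping[f] for f in flags)

# Hypothetical example: both dictionaries use flag "A" for different rules.
mapping = remap_flags({"A", "B"}, {"A", "C"})
print(rewrite_entry("walk/AC", mapping))
```

The rule lines in the second affix file would need the same renaming, and deciding whether two differently named classes are really the same rule set is the part that stays messy.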

This goes back to my claim that spell checking without built-in grammar checking is useless.

>Why you think that included dictionary is 'standard' and is better then the other one?

Any dictionary project has to include the ability to have the same language in at least two different writing systems --- Braille (^1) and the standard writing system for the language.

>The other guy will give up his work?

The proposal does not require the other guy to give up his project.

I wouldn't be surprised to see the other guy create a more specialized dictionary.

* John Doe creates a general purpose dictionary;
* Jane Doe creates a name and places dictionary;
* John Roe creates a scientific terminology dictionary;
* Jane Roe creates a basic words dictionary;

>Who will hunt all those 'other' guys telling them 'Yo, dude, leave that, do this shit!'

As far as existing spell checking and wordlist projects go, nobody is going to tell them to "leave that, do this". What might happen is that known, existing projects are offered space, etc., in the proposed repository/incubator, but they will stay where they currently are, because of how their workflow operates.

>How will such a repository resolve competition between two English dictionaries?

Since you specifically mentioned English, there currently are versions of English for a dozen locales, plus around half a dozen specialist dictionaries.

Most users won't choose the English (OED) variant, because it has too many words in it. Too many words means that words that are wrongly used get flagged as correctly spelled. The "Eye right withe aye pin" phenomenon.
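
A small sketch of that phenomenon, using two tiny made-up word lists (neither is taken from a real dictionary): the larger the list, the more wrongly used words pass a plain list lookup.

```python
def misspelled(text, wordlist):
    """Return the words in text that are absent from wordlist (case-insensitive)."""
    known = {w.lower() for w in wordlist}
    return [w for w in text.lower().split() if w not in known]

# Hypothetical word lists, for illustration only.
basic_words = {"i", "write", "with", "a", "pen", "eye", "right", "pin"}
large_words = basic_words | {"withe", "aye"}  # adds rarer but valid words

sentence = "Eye right withe aye pin"
print(misspelled(sentence, basic_words))  # ['withe', 'aye']
print(misspelled(sentence, large_words))  # []
```

Even the small list lets "eye", "right" and "pin" through, which is why a list lookup alone cannot catch this kind of error.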

>Nobody can (or should) just declare 'we are building dictionary
>repository - here use this, not that' just because being in position of
>power to do that.

The proposal does not mandate that only the proposed space/workflow/etc. be used. In an ideal world, existing groups would be able to drop their work-product into the repository, with only one change to their workflow --- a bot that automatically uploads their new, verified, approved work product into the repository. Furthermore, this change would occur if, and only if, the existing group wanted to do so.

This proposal is about non-technical types being able to _easily_ create viable dictionaries for their specific use-case. It doesn't matter if that use-case is a dictionary in Pondo, or a dictionary of people and places in Bharat, or a dictionary in Moon.

The other part of the proposal is that even if the original dictionary creator abandons the dictionary, it can still be maintained, and updated.

The third part of the proposal is that whilst it is initially for LibO, the hope is that it becomes the source for dictionaries for FLOSS projects.


Hypothetical situation. One of Kevin Scannell's students decides that what the world needs is a dictionary in each of the 2,500 languages that have been reduced to a writing system. So said student walks through Kevin's word lists, and creates a dictionary project for each of the 2,000 languages that Kevin maintains word lists for. A year later, said student graduates, and forgets about their dictionaries.

Under the current scenario, when said student abandons their dictionaries, the only way other people can update them is by forking them --- assuming that the license allows forking.

Under the proposed scenario, if said student creates the dictionaries in the repository, when said student abandons them, other people can still update the dictionaries, which can then be distributed to LibO, etc.

I'll grant that were said student to create 2,000+ dictionaries for LibO, it would break the UI. However, as far as the proposal goes, that breakage is irrelevant.

>use of hunspell features correctly (not simple word lists, but by logic)
>what this mean?

For non-techies, creating a HunSpell dictionary is a non-starter, because they don't understand the vocabulary that it uses.

For techies, the technical description is, at best, off-putting.

>features but those dictionaries who do just word list will continue do
>just the word list because is purgatory (or hell) to do this right if
>you were not doing it right from the beginning.

Depending upon how it is coded/implemented, it is _theoretically_ possible for the repository/whatever to accept everything from a simple word list to a completed, correctly formed HunSpell dictionary, and spit out a LibO/AOo/EO/TBird spelling dictionary that includes all of the bells and whistles of the most obscure features of HunSpell. Note: I said theoretically possible. As a practical matter, this functionality will take several hundred iterations to achieve.
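
The simplest end of that range can be sketched in a few lines: turning a plain word list into the text of a minimal HunSpell dictionary pair. The function name is made up for illustration and this is not part of any existing tool. The output uses no affix rules at all, which is exactly the "just a word list" case.

```python
def wordlist_to_hunspell(words, encoding="UTF-8"):
    """Build minimal .aff and .dic file contents from a plain word list.

    The .aff text only declares the character encoding. The .dic text
    starts with the entry count, followed by one word per line.
    """
    # Strip whitespace, drop empty entries and duplicates, then sort.
    unique = sorted({w.strip() for w in words if w.strip()})
    aff_text = f"SET {encoding}\n"
    dic_text = "\n".join([str(len(unique)), *unique]) + "\n"
    return aff_text, dic_text

aff, dic = wordlist_to_hunspell(["cat", "dog", "cat", " bird "])
print(aff, end="")
print(dic, end="")
```

Words that contain a slash would need special handling, because the slash introduces the flags in a .dic entry. Everything beyond this point (affix classes, compounding, and so on) is where the several hundred iterations would go.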

>but don't expect it to be lively as Pootle and translations - it's just not gonna happen, that's
>not realistic.

Depending upon how "lively" is defined, I doubt that anybody expects the proposed repository/whatever to be as lively as Pootle is.

That said:

Ethnologue claims that there are 7097 living languages.
Ethnologue claims 306 extinct languages, counting from 1950. claims 573 extinct languages.
Linguist claims 13 constructed languages.

All of those figures are, at best, guesses. Guesses that have their basis in social, political, cultural, and ethnic rationales.

Peter Daniels claims 80 writing systems.
This is a more significant figure to work with, because languages can be written in multiple writing systems. Probably the best contemporary example is Turkish, which can be correctly written in the Arabic, Cyrillic, Latin, and Braille Writing Systems.

As such, the end result probably will be far more projects than Pootle or Translations, but far less activity than they have.

^1: I'm ignoring the not-so-little issue that Braille support in LibO is minimal.

