Hi there,
I am the owner of the Tamil localisation effort, and am creating the
grammar checker for LibreOffice using Lightproof. I am having trouble
matching the diacritic marks that are so common in Tamil. For example, the pattern
\b(த[ா-ௌ]*\S*)\b
matches தாலம but not தாலம்.
I would like to match the whole word, including the diacritic mark, but I'm
not sure how to capture it.
I would appreciate hearing from anyone who has faced a similar problem in
their language and has solved it.
Cheers,
Elanjelian
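A small diagnostic sketch (not part of the original message) that may help narrow this down: it prints the Unicode general category of each code point in the second example word. The word is written with escapes so the code points are unambiguous, and the variable names are only illustrative. The final character, the pulli, is a combining mark rather than a letter, which is a plausible reason for \b to treat it differently from the consonants.

```python
import unicodedata

# The word from the example, spelled out as code points:
# TA (U+0BA4), vowel sign AA (U+0BBE), LA (U+0BB2), MA (U+0BAE), pulli (U+0BCD)
word = "\u0BA4\u0BBE\u0BB2\u0BAE\u0BCD"

# Pair each code point with its Unicode general category
# (Lo = letter, Mc = spacing combining mark, Mn = nonspacing mark).
categories = [(f"U+{ord(ch):04X}", unicodedata.category(ch)) for ch in word]

for code_point, category in categories:
    print(code_point, category)
```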
Hi,
AFAIK it's a known bug in Python 2, which doesn't support Unicode
completely. So you need to switch to Python 3 to process your language
without this kind of problem.
I'm not sure when Lightproof will deliver Python 3 support; László can
probably tell you more.
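One possible workaround for the pattern in the question, sketched here for Python 3 as an assumption rather than a tested Lightproof rule: drop \b and instead require that the match is not preceded or followed by a character from the Tamil block (U+0B80 to U+0BFF). This does not rely on how \b classifies combining marks. The sample text and variable names are made up for illustration, and whether Lightproof's rule syntax accepts lookarounds unchanged has not been verified.

```python
import re

# Illustrative words, written as code points to avoid ambiguity:
without_pulli = "\u0BA4\u0BBE\u0BB2\u0BAE"        # the word without the final mark
with_pulli = "\u0BA4\u0BBE\u0BB2\u0BAE\u0BCD"     # the same word ending in the pulli
text = f"{without_pulli}, {with_pulli}."

# Assumption: "a word" means a maximal run of Tamil-block characters.
# The lookbehind/lookahead take over the job of \b at both ends.
tamil_word = re.compile(
    r"(?<![\u0B80-\u0BFF])"      # not preceded by a Tamil character
    r"(\u0BA4[\u0B80-\u0BFF]*)"  # TA followed by any Tamil letters or marks
    r"(?![\u0B80-\u0BFF])"       # not followed by a Tamil character
)

matches = tamil_word.findall(text)
print(matches)
```

With this approach, adjacent punctuation such as the comma and full stop in the sample is left out of the match, and the trailing mark is kept as part of the word.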