2013 Archives by date, by thread · List index


Hi,

The data files for libexttextcat in this directory:

https://github.com/giuliopaci/libexttextcat/tree/master/langclass/ShortTexts

contain a garbled Hungarian version. It is almost in ISO-8859-1, but some
characters are destroyed because that encoding doesn't contain all
Hungarian characters.
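
To illustrate which characters are affected, here is a small sketch (not taken from the libexttextcat sources; the helper name is made up for this example). It assumes only the standard Python codecs and lists the Hungarian accented letters that a given encoding cannot represent:

```python
# Hungarian accented lowercase letters.
HUNGARIAN_ACCENTED = "áéíóöőúüű"

def unencodable(text, encoding):
    """Return the characters of `text` that `encoding` cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing.append(ch)
    return "".join(missing)

print(unencodable(HUNGARIAN_ACCENTED, "iso-8859-1"))  # őű
print(unencodable(HUNGARIAN_ACCENTED, "iso-8859-2"))  # (empty line)
```

ISO-8859-1 has no ő or ű, so any Hungarian text forced into it loses those two letters, while ISO-8859-2 and UTF-8 cover all of them.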

A good UTF-8 version is easy to pick up from

http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=hng

and see the difference.

It's not clear whether this prevents libexttextcat from classifying
Hungarian text correctly, but it may stop Hungarian detection working for
UTF-8 input, because most of the other files are in UTF-8.
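
A rough way to tell whether a given data file is valid UTF-8 is simply to try decoding it. The sketch below is only an illustration under my own assumptions: the function name is hypothetical, and it uses an in-memory Hungarian phrase rather than the actual file, whose exact name I have not given here.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

# A short Hungarian phrase chosen for illustration only.
sample = "Minden emberi lény"

print(is_valid_utf8(sample.encode("utf-8")))       # True
print(is_valid_utf8(sample.encode("iso-8859-1")))  # False
```

For a real file, the same function could be applied to the bytes read with `open(path, "rb").read()`. Note that pure ASCII text passes this check in either encoding, so it only catches files that actually contain accented characters.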

Cheers

Mark
