

Hi Nasrin:

Your comment that the "encoding is unicode" is rather meaningless, since
Unicode simply assigns a unique number (a code point) to nearly every
character used to write something down. The actual "encoding" you care about
is the specific method used to represent these numbers as bytes, which is
typically UTF-8 for most uses, but can be UTF-16 or UTF-32 for special cases.
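
To make the difference between a code point and an encoding concrete, here
is a minimal Python sketch. The function name encode_all is only
illustrative and is not part of LibreOffice or of my pdf; it shows one
character being turned into bytes by three different encodings.

```python
def encode_all(ch):
    """Return the code point of ch and its bytes in three Unicode encodings."""
    # The "-be" variants give big-endian bytes without a byte order mark.
    encodings = ("utf-8", "utf-16-be", "utf-32-be")
    return ord(ch), {name: ch.encode(name) for name in encodings}


code_point, encoded = encode_all("غ")  # ARABIC LETTER GHAIN
print(f"code point: U+{code_point:04X}")
for name, data in encoded.items():
    print(f"{name:10} {len(data)} bytes: {data.hex()}")
```

The code point is the same in every case; only the byte representation
changes.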

If you go to BUG #92655
(https://bugs.documentfoundation.org/show_bug.cgi?id=92655), and then to the
second comment (marked as Comment #1) and download the attachment (117160)
listed there, you'll find a 32-page PDF document I created some time back
titled "Exploring Complex Text Layout." This document covers a lot of what
you need to understand to use other scripts in LibreOffice Writer,
particularly if you are using more than one script/language in the same
document.

Since you are presumably writing Farsi, a language which uses the Arabic
script, you may be interested to know that this script, along with Thai,
Devanagari (for Hindi), and Hebrew, is used in some of the examples in my pdf.

Unfortunately, I am unfamiliar with Farsi, so my Arabic script examples are
in the Arabic language (well, one flavor of that), but I'm sure you'll find
the discussion informative, as it covers a lot of the niceties such as
contextual alteration of the character forms, kashideh justification, issues
with using right-to-left scripts in Writer, and so forth.

Beginning on page 33 of the document, there is an explanation of how Unicode
values may be converted into one, two, three, or four bytes in standard
UTF-8 encoding, and why these options are all needed. While UTF-32
characters are always 32 bits (4 bytes) long, the size of a UTF-8 character
varies with the code point value of *each individual character*. While the
غ and ي characters are each two bytes long, a space or a carriage return is
still only one byte long. Although it
seems on the surface to be a complicated way of doing things, it's actually
a really cool way of achieving a rather difficult objective.
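
The one-to-four byte scheme can be sketched in a few lines of Python. This
is a hand-written illustration of the standard UTF-8 bit layout, not code
taken from my pdf, and the function name utf8_encode_code_point is
hypothetical. In practice you would simply call str.encode("utf-8"); the
loop at the end checks the sketch against that built-in.

```python
def utf8_encode_code_point(cp):
    """Encode one Unicode code point as UTF-8 bytes, by hand."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a valid Unicode scalar value")
    if cp < 0x80:
        # 1 byte: 0xxxxxxx (ASCII, including space and carriage return)
        return bytes([cp])
    if cp < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx (covers the Arabic block)
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx (covers Thai, Devanagari)
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


for ch in [" ", "\r", "غ", "ي", "ก", "😀"]:
    data = utf8_encode_code_point(ord(ch))
    assert data == ch.encode("utf-8")  # agrees with Python's own encoder
    print(f"U+{ord(ch):04X} -> {len(data)} byte(s): {data.hex()}")
```

The leading bits of the first byte tell a reader how many bytes follow,
which is what lets one-byte and multi-byte characters sit side by side in
the same file.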

There is also a section in the pdf about various ways of entering
characters; having a keyboard mapped to another language is great until you
need to switch back and forth on a regular basis.

The bottom line is that the font in use must contain the characters from
the Unicode Arabic block (U+0600 to U+06FF) in order to display
Persian/Farsi, which is why I suggested the fonts that I did, as I know that
those include these characters.
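
If you want to see which characters in a piece of text fall outside that
block, a small Python sketch along these lines may help. The names
ARABIC_BLOCK and outside_arabic_block are made up for this example, and the
script only looks at code points; it does not inspect any font file.

```python
# Code points U+0600 through U+06FF (range excludes its upper bound).
ARABIC_BLOCK = range(0x0600, 0x0700)


def outside_arabic_block(text):
    """List the non-whitespace code points in text that are not in U+0600-U+06FF."""
    return [f"U+{ord(ch):04X}"
            for ch in text
            if not ch.isspace() and ord(ch) not in ARABIC_BLOCK]


print(outside_arabic_block("فارسی"))      # every letter is inside the block
print(outside_arabic_block("فارسی abc"))  # the Latin letters are reported
```

Anything the function reports needs to come from some other part of the
font, or from a fallback font.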

You are stepping into a very interesting area, particularly as LibreOffice
is not particularly good with right-to-left languages if you attempt some
things that are trivial in English (e.g. rotating text in a table cell among
other things). If you are interested in things to look out for with Writer,
go back to the same bug referenced above and download the first attachment
#117159 for a tour of the issues you might face.

I hope this helps you get an idea of what you are stepping into.

Good Luck,

Frank



