On 4 November 2010 23:38, Mattias Johnsson <m.t.johnsson@gmail.com> wrote:
After spending a number of hours learning more about Unicode than I
ever wanted to know, I've fixed the bug where the word counter in
Writer counts an opening quote mark (Unicode code point U+201C) as an
extra word.

Turns out that the opening quote lives in Unicode block 40 (General
Punctuation, UBLOCK_GENERAL_PUNCTUATION), which was not given an
associated script type in breakiteratorImpl.cxx. This means that its
script type was defaulting to "WEAK" rather than "LATIN", and "WEAK"
is taken as a word break.
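
To illustrate the defaulting behaviour described above, here is a minimal, self-contained sketch of a block-range-to-script lookup. It is not the code from breakiteratorImpl.cxx: the names Script, BlockRange, kBefore, kAfter and scriptForBlock are hypothetical, and the numeric block codes other than 40 are only illustrative. It just shows how a block that falls in a gap of the table ends up as "weak", and how adding a single one-block range changes the result.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the script types named in the message.
enum class Script { Latin, Asian, Complex, Weak };

// One table row: an inclusive range of Unicode block codes and its script.
struct BlockRange {
    int from;
    int to;
    Script script;
};

// Toy table before the fix: block 40 falls in the gap between two ranges.
// (Block codes other than 40 are illustrative assumptions.)
const BlockRange kBefore[] = {
    {38, 39, Script::Latin},
    {58, 74, Script::Asian},
};

// Toy table after the fix: a one-block range maps block 40 to Latin.
const BlockRange kAfter[] = {
    {38, 39, Script::Latin},
    {40, 40, Script::Latin},
    {58, 74, Script::Asian},
};

// Return the script of the first range containing `block`;
// any block not covered by the table defaults to Weak.
template <std::size_t N>
Script scriptForBlock(const BlockRange (&table)[N], int block) {
    for (std::size_t i = 0; i < N; ++i) {
        if (block >= table[i].from && block <= table[i].to) {
            return table[i].script;
        }
    }
    return Script::Weak;
}

// Small self-check showing the behaviour before and after the added row.
inline void checkBlock40() {
    assert(scriptForBlock(kBefore, 40) == Script::Weak);
    assert(scriptForBlock(kAfter, 40) == Script::Latin);
}
```

With the toy "before" table, block 40 resolves to Weak, which is the case that produced the extra word; with the extra row it resolves to Latin, like the neighbouring Latin ranges.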

Thought I should get in fast and submit this before John LeMoyne
Castle's heroic efforts fix all the word counter problems :-P

Patch attached. It's a very minimal change, so it's probably safe to
push into the 3.3 branch as well as master.

Cheers,
Mattias

And this time with the patch actually attached...
From 9f694f9585c98f486abfd8b5d126ece128a58999 Mon Sep 17 00:00:00 2001
From: Mattias Johnsson <m.t.johnsson@gmail.com>
Date: Thu, 4 Nov 2010 23:25:02 +1100
Subject: [PATCH] An opening quote should not be counted as a word by word count tool

---
 .../source/breakiterator/breakiteratorImpl.cxx     |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/i18npool/source/breakiterator/breakiteratorImpl.cxx b/i18npool/source/breakiterator/breakiteratorImpl.cxx
index ec21299..c47f7be 100644
--- a/i18npool/source/breakiterator/breakiteratorImpl.cxx
+++ b/i18npool/source/breakiterator/breakiteratorImpl.cxx
@@ -459,6 +459,7 @@ static UBlock2Script scriptList[] = {
     {UBLOCK_CHEROKEE, UBLOCK_RUNIC, ScriptType::LATIN},
     {UBLOCK_KHMER, UBLOCK_MONGOLIAN, ScriptType::COMPLEX},
     {UBLOCK_LATIN_EXTENDED_ADDITIONAL, UBLOCK_GREEK_EXTENDED, ScriptType::LATIN},
+    {UBLOCK_GENERAL_PUNCTUATION, UBLOCK_GENERAL_PUNCTUATION, ScriptType::LATIN},
     {UBLOCK_CJK_RADICALS_SUPPLEMENT, UBLOCK_HANGUL_SYLLABLES, ScriptType::ASIAN},
     {UBLOCK_CJK_COMPATIBILITY_IDEOGRAPHS, UBLOCK_CJK_COMPATIBILITY_IDEOGRAPHS, ScriptType::ASIAN},
     {UBLOCK_ARABIC_PRESENTATION_FORMS_A, UBLOCK_ARABIC_PRESENTATION_FORMS_A, ScriptType::COMPLEX},
-- 
1.7.1

