Re: [Grammar checker] Undocumented change in the API for LO 4

Stephan Bergmann <sbergman -AT- redhat.com>
Tue, 05 Mar 2013 16:42:11 +0100

I have zero insight into that area of the code, but from what I gather:

GrammarCheckingIterator::GetSuggestedEndOfSentence(rText, ...) -- where rText apparently is one single paragraph -- used to be convoluted code that always returned rText.getLength() for the last few years, whether that change was intentional or not. (From <http://cgit.freedesktop.org/libreoffice/core/commit/?id=9f2fde7ab5de20926bb25a6b298b4e5dffb66eb2> "#i103496#: split svtools; improve ConfitItems" it would look odd if it were really intentional -- why not clean the function up to a single line then? But who knows.)

From what I understand of linguistic/source/gciterator.cxx, the two calls to n = GrammarCheckingIterator::GetSuggestedEndOfSentence are in two loops that each: use that n as the nSuggestedBehindEndOfSentencePosition argument to a css.linguistic2.XProofreader.doProofreading call, and then determine whether to do further iterations of the loop based on the returned css.linguistic2.ProofreadingResult, esp. its nBehindEndOfSentencePosition.
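
To make that reading concrete, here is a toy Python model of such a loop. It is only a sketch of the description above, not the code in gciterator.cxx: the names run_loop, suggested_end_of_sentence and echoing_proofreader, the "next '. '" heuristic, and the no-progress guard are all assumptions made for illustration, and ProofreadingResult is cut down to two fields.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ProofreadingResult:
    """Simplified stand-in for css.linguistic2.ProofreadingResult."""
    nStartOfSentencePosition: int
    nBehindEndOfSentencePosition: int


# Hypothetical signature: (text, start, suggested_end) -> ProofreadingResult
Proofreader = Callable[[str, int, int], ProofreadingResult]


def suggested_end_of_sentence(text: str, start: int) -> int:
    """Toy suggestion: position just behind the next '. ', else paragraph end."""
    pos = text.find(". ", start)
    return len(text) if pos == -1 else pos + 2


def run_loop(text: str, proofreader: Proofreader) -> List[Tuple[int, int]]:
    """Return the (start, suggested_end) pair of every proofreader call."""
    calls = []
    start = 0
    while start < len(text):
        suggested = suggested_end_of_sentence(text, start)
        result = proofreader(text, start, suggested)
        calls.append((start, suggested))
        behind = result.nBehindEndOfSentencePosition
        if behind <= start:
            break  # assumed guard: no progress reported, so stop
        start = behind  # the returned position drives the next iteration
    return calls


def echoing_proofreader(text: str, start: int, suggested_end: int) -> ProofreadingResult:
    """Hypothetical client that accepts the suggested end unchanged."""
    return ProofreadingResult(start, suggested_end)
```

In this model, run_loop("One. Two. Three.", echoing_proofreader) calls the proofreader three times, each time with the whole paragraph text and a different start position.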

Now, it beats me why anybody designed css.linguistic2.ProofreadingResult that way, to contain all the data already passed into css.linguistic2.XProofreader.doProofreading anyway. But could it be that clients that observe that "[with] LibreOffice 4, each paragraph of a text is passed several times to [doProofreading]" fail to set nBehindEndOfSentencePosition in the css.linguistic2.ProofreadingResult they return, to properly reflect their idea of how much they have already consumed?
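
If that guess is right, a client that wants to handle a whole paragraph in one go would report the paragraph end itself instead of echoing the suggestion. A minimal Python sketch of the idea follows; count_calls is a hypothetical toy driver, the fixed five-character suggestion is arbitrary, and none of this is the real UNO interface.

```python
from dataclasses import dataclass


@dataclass
class ProofreadingResult:
    """Simplified stand-in for css.linguistic2.ProofreadingResult."""
    nStartOfSentencePosition: int
    nBehindEndOfSentencePosition: int


def proofread_whole_paragraph(text, start, suggested_end):
    """Hypothetical client: ignore the suggestion, report everything consumed."""
    return ProofreadingResult(start, len(text))


def proofread_echo(text, start, suggested_end):
    """Hypothetical client: hand the suggested end straight back."""
    return ProofreadingResult(start, suggested_end)


def suggest_five_chars(text, start):
    """Arbitrary toy suggestion: five characters further, capped at the end."""
    return min(len(text), start + 5)


def count_calls(text, proofreader, suggest=suggest_five_chars):
    """Toy driver: count proofreader calls until the paragraph is consumed."""
    calls, start = 0, 0
    while start < len(text):
        result = proofreader(text, start, suggest(text, start))
        calls += 1
        if result.nBehindEndOfSentencePosition <= start:
            break  # assumed guard against a client that reports no progress
        start = result.nBehindEndOfSentencePosition
    return calls
```

With this driver, proofread_whole_paragraph is called once per paragraph, while proofread_echo is called once per suggested chunk.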


Stephan

On 03/05/2013 11:12 AM, Marcin Milkowski wrote:

what's the supposed regression, exactly? Do we have only sentences as
segmented by LO? This would be a serious drawback as ICU methods are
less than perfect, and our results are much more reliable (the
BreakIterator simply uses a static list of abbreviations which is a vast
simplification that cannot really capture a lot of ambiguous dots, so
it's broken by design).

Best,
Marcin

On Mon, Mar 4, 2013 at 9:58 PM, Németh László <nemeth@numbertext.org> wrote:

    Hi,

    As far as I know, that was an intended change by the original author,
    Thomas Lange, supported by the contributors, e.g. Marcin Miłkowski and
    Daniel Naber, to meet real needs: better sentence boundary
    disambiguation and grammar checking by LanguageTool and other grammar
    checker components. So the current state is a drawback. I suggest
    reverting it (maybe it would also be good to add some comments to
    ProofreadingResult.idl to prevent similar changes).

    Best regards,
    László

    2013/3/4 Olivier R. <olivier.noreply@gmail.com>:
     > Caolán McNamara wrote
     >> do you get the pre LO 4 behaviour ?
     >
     > Probably.
     > With LO 3, in doProofreading:
     > - nStartOfSentencePos was always the beginning of the paragraph (=0)
     > - nSuggestedSentenceEndPos was always the end of the paragraph
     >   (=length of rText)
     >
     > And each paragraph was passed once to the GC.
     >
     >
     >
     >> Assuming that you do, then it appears to me that the current LO4
     >> behaviour is the original programmer intent and that the intermediate
     >> behaviour was a bug (from the programmer intent perspective anyway) in
     >> whatever versions got released between
     >> 9f2fde7ab5de20926bb25a6b298b4e5dffb66eb2 and LO4
     >
     > Yes, we can assume that was the original programmer intent.
     > But it worked another way for 3 years and nobody complained about it. :)
     > I prefer the unintended behavior, as LO then does not wrongly assume
     > where sentences end.
     >
     > So what will LO do?
