

On 17.08.2017 17:10, Ashod Nakashian wrote:
Hi Thorsten,

On Wed, Aug 16, 2017 at 5:22 AM, Thorsten Behrens <thb@libreoffice.org
<mailto:thb@libreoffice.org>> wrote:

    Miklos Vajna wrote:
    > The idea is that per-paragraph signature should be non-chained, similar
    > to per-document signatures, so the Writer field(s) representing the
    > signature(s) should be filtered out before hashing, but otherwise this
    > just takes the paragraph text as-is. (My understanding is that ODF
    > specifies what the exact paragraph string is for a <text:p> element.)
    >
    Hi Miklos,

    ok - as long as that could be described (or pseudo code given),
    that'll do I guess. Just be aware that text:p can still be quite
    complex in xml, with whitespace mangling & all sorts of child elements
    (see paragraph-content-or-hyperlink / paragraph-content in the
    schema).


The code currently in master is a temporary first step. The logic I
have locally, ready to push soon, uses only Text portions.

Roughly as follows:

  OUStringBuffer strBuf;
  for (auto& portion : paragraphTextPortions) {
      if (portion.TextPortionType == "Text")
          strBuf.append(portion.Text);
  }
  sign(strBuf.makeStringAndClear());

I expect this should exclude any unwanted fields/characters/LO-specific
conversions etc.
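For illustration, here is a self-contained model of that filter. Plain structs stand in for the UNO text-portion enumeration. `Portion` and `textForSigning` are hypothetical names, and this is only a sketch under that assumption; the real code would read the `TextPortionType` property from each portion returned by the paragraph's enumeration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one entry of a paragraph's text-portion
// enumeration; in LibreOffice this would come from the UNO API.
struct Portion {
    std::string type;  // value of the "TextPortionType" property
    std::string text;  // the portion's string content
};

// Concatenate only the plain "Text" portions, skipping fields, frames,
// footnotes, metadata and every other portion type.
std::string textForSigning(const std::vector<Portion>& portions) {
    std::string result;
    for (const Portion& p : portions) {
        if (p.type == "Text")
            result += p.text;
    }
    return result;
}
```

With this policy, a paragraph made of "Hello ", a signature field and "world" hashes as "Hello world", so the field representing the signature never influences the signed string.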

Let me know if there are concerns with this approach.

there are some other portions that, depending on what you want to do,
could be interpreted as containing text:

* "TextField" "generates" text
* "Frame" references paragraphs which contain text
* "Footnote" references paragraphs which contain text
* "InContentMetadata" contains text that is in the paragraph, but you
  have to recursively enumerate its text portions to get at it, it's not
  in the paragraph's enumeration
  (your use case makes me regret that choice of API representation)
* "TextField" may be a "MetadataField" which doesn't generate text but
  has to be recursively enumerated just like "InContentMetadata"
* "Annotation" references (editengine) paragraphs which contain text
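If some of these portion types should contribute, the enumeration has to become recursive. The sketch below models one possible policy, again without UNO. The `PortionNode` tree, the `isMetadataField` flag and the policy itself are assumptions for illustration. It keeps text that physically sits in the paragraph, including text nested in "InContentMetadata" portions and metadata fields. It skips generated field text and content that lives in other paragraphs (frames, footnotes, annotations). Which of these to include is exactly the open design question here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical tree model of a paragraph's portions. "InContentMetadata"
// portions, and "TextField" portions that are metadata fields, carry
// nested portions that do not appear in the paragraph's own enumeration.
struct PortionNode {
    std::string type;                  // "Text", "TextField", "Frame", ...
    std::string text;                  // content of a "Text" portion
    bool isMetadataField = false;      // only meaningful for "TextField"
    std::vector<PortionNode> children; // nested portions, if any
};

// Append the inline text of `portions` to `out`, recursing into metadata.
void collectInlineText(const std::vector<PortionNode>& portions, std::string& out) {
    for (const PortionNode& p : portions) {
        if (p.type == "Text") {
            out += p.text;
        } else if (p.type == "InContentMetadata" ||
                   (p.type == "TextField" && p.isMetadataField)) {
            collectInlineText(p.children, out);  // must be enumerated separately
        }
        // Frames, footnotes, annotations and ordinary text fields are
        // deliberately ignored by this particular policy.
    }
}

std::string inlineText(const std::vector<PortionNode>& portions) {
    std::string out;
    collectInlineText(portions, out);
    return out;
}
```

Under this policy, a paragraph of "Signed ", an ordinary field, metadata wrapping "meta ", a metadata field wrapping "field ", a footnote and "end" yields "Signed meta field end".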

there are various other functions to get "cleaned up" text from a
paragraph, such as SwTextNode::GetExpandText() and class
ModelToViewHelper but i'm not even sure why there are several different
ones and when to use which one.

