

Thank you Kohei for the information!

Noel mentioned that much of Writer’s current implementation covers the conversion of HTML into 
Writer’s DOM. How complete is that step? It sounds like an iterative approach to me: once the DOM 
tree has been replicated, the CSS needs to be cascaded accordingly—and that step seems to be rather 
incomplete?
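
To make the two steps concrete, here is a minimal sketch in Python. It is not Writer's or orcus's code: the `Node` class, the `cascade` function, the tag-only selectors and the `INHERITED` set are all hypothetical names invented for illustration, and real CSS adds specificity, origins and many more selector types. It only shows the shape of the idea: the tree is built first, then styles are computed top-down from inherited values plus matching rules.

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Illustrative subset of properties that children inherit from their parent.
# (Assumption for this sketch; real CSS defines inheritance per property.)
INHERITED = {"color", "font-family", "font-size"}

@dataclass
class Node:
    tag: str
    children: list[Node] = field(default_factory=list)
    style: dict[str, str] = field(default_factory=dict)

def cascade(node, rules, parent_style=None):
    """Step 2: compute each node's style from inherited values plus matching rules."""
    parent_style = parent_style or {}
    # Start from the inheritable part of the parent's computed style.
    style = {k: v for k, v in parent_style.items() if k in INHERITED}
    # Apply rules in source order; later matching rules override earlier ones.
    for selector, declarations in rules:
        if selector == node.tag:
            style.update(declarations)
    node.style = style
    for child in node.children:
        cascade(child, rules, style)

# Step 1 (assumed already done): a DOM tree replicated from the HTML.
doc = Node("body", [Node("p", [Node("em")]), Node("h1")])
rules = [
    ("body", {"color": "black", "margin": "8px"}),
    ("p", {"color": "gray"}),
    ("em", {"font-style": "italic"}),
]
cascade(doc, rules)
```

In this sketch the `em` element ends up with the gray color inherited from its `p` parent, while `margin` stays on `body` because it is not in the inherited set.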

Again, I haven’t had any time to noodle through the Writer code so I am just trying to put together 
a picture based on this discussion…

Many greetings,
Jens


On Jan 13, 2019, at 03:46, Kohei Yoshida <libreoffice@kohei.us> wrote:

>> I believe that Kohei started doing some parsing work over in the orcus
>> library at
>>   https://gitlab.com/orcus/orcus
>> and we use some of that (e.g. very very basic CSS parsing) somewhere in our
>> code.
>
> Just to clarify on this a bit.  The orcus library provides a C++ template-based CSS parser which
> supports a pretty wide variety of the current CSS feature set.  It's not 100% feature complete,
> but it does handle more than just a basic set of CSS structures, to say the least.
>
> We currently use that orcus CSS parser to handle some very basic cell formatting imports in Calc
> for now, but that can be extended if needed.
>
> Now, on the Writer side it's a different story.  There is *some* code sharing between Writer and
> Calc wrt HTML parsing, but the CSS parsing code is not shared between the two.  AFAICR Writer has
> its own CSS parser that does not use the orcus CSS parser, and nobody is maintaining that code
> right now.
>
>> But for normal HTML we still use our own parser.
>
> Yup.
>
>> And the parsing is only a very small part anyhow, most of the work is in
>> converting the HTML model to our own document model.
>
> Yes, and that part is handled independently between Writer and Calc.
>
> Kohei
>
> --
> Kohei Yoshida, LibreOffice Calc volunteer hacker

--
Jens Tröger
http://savage.light-speed.de/

