Michael,
We'd love to meet and discuss! Unfortunately, a lot of us are off for
break starting next week so it might be best to sync up early next year.
Would the week of the 6th work for you? 8am PT/10am CT/4pm GMT any day should
work!
We started by having the relational database be a simple persistent
storage layer which, when coupled with an index to retrieve data by
position, allows us to scroll through large datasets of billions of rows
with ease. We developed a new positional index to handle insertions and
deletions in O(log(n)) -- https://arxiv.org/pdf/1708.06712.pdf. I agree
that pushing the computation to the relational database does have
overheads; but at the same time, it allows for scaling to arbitrarily
large datasets.
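To make the positional-index idea concrete, here is a minimal sketch in
Python. It is not the structure from the linked paper; it swaps in an
implicit treap (a randomized tree that stores subtree sizes), which gives
lookup, insertion and deletion by position in expected O(log(n)). All the
names (Node, split, merge, insert, delete, get) are hypothetical, and the
stored values stand in for whatever row identifier the database uses.

```python
import random


class Node:
    """One row in the index; `size` counts the rows in this subtree."""
    __slots__ = ("value", "priority", "size", "left", "right")

    def __init__(self, value):
        self.value = value                # assumed: a row id in the database
        self.priority = random.random()   # random heap priority keeps the tree balanced in expectation
        self.size = 1
        self.left = None
        self.right = None


def size(node):
    return node.size if node else 0


def update(node):
    node.size = 1 + size(node.left) + size(node.right)
    return node


def split(node, k):
    """Split a tree into (first k rows, remaining rows)."""
    if node is None:
        return None, None
    if size(node.left) >= k:
        # All of the first k rows live in the left subtree.
        left, right = split(node.left, k)
        node.left = right
        return left, update(node)
    # This node and its left subtree belong to the first part.
    left, right = split(node.right, k - size(node.left) - 1)
    node.right = left
    return update(node), right


def merge(a, b):
    """Concatenate two trees, every row of `a` coming before every row of `b`."""
    if a is None:
        return b
    if b is None:
        return a
    if a.priority > b.priority:
        a.right = merge(a.right, b)
        return update(a)
    b.left = merge(a, b.left)
    return update(b)


def insert(root, pos, value):
    """Insert `value` so that it becomes row number `pos` (0-based)."""
    left, right = split(root, pos)
    return merge(merge(left, Node(value)), right)


def delete(root, pos):
    """Remove row number `pos`; later rows shift up by one."""
    left, rest = split(root, pos)
    _, right = split(rest, 1)
    return merge(left, right)


def get(root, pos):
    """Return the value stored at row number `pos`."""
    node = root
    while node:
        left_size = size(node.left)
        if pos < left_size:
            node = node.left
        elif pos == left_size:
            return node.value
        else:
            pos -= left_size + 1
            node = node.right
    raise IndexError(pos)
```

The point of the sketch is that no row stores its own position: positions
are derived from subtree sizes on the way down, so inserting or deleting a
row never requires renumbering the rows after it.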
Ooh - nice paper. Your crawled data-set looks quite interesting too, we
run wide-scale crash-testing on the LibreOffice code-base across ~100k
files and enlarging our corpus there: or better, getting some
statistical view of which OOXML attributes (and thus features) are most
used out there would be extremely useful to us as we develop the core.
I like the data on spreadsheet and formula shape - that is very
useful.
Do you have data on the geometry of formulae - as in rows vs. columns ?
[ we switched to columnar storage based mostly on experience rather than
hard data ;-].
It is also interesting to have access to very large (1.3m row)
data-sets that can have useful analysis done on them - would love to see
the source data there.
Again, this is something that we'd be happy to share; this might just take
a bit more work since it's an older codebase.
I believe we did use the geometry of the formulae to determine the best
storage representation, so it's there somewhere :-)
Sounds good, cf. above - if we can't make that - early in the new year
would be great.
I look forward to talking,
Likewise!
Aditya