

Michael, just checking in to see what might be a good time to chat. We're
excited to connect!

Aditya

On Fri, Dec 13, 2019 at 2:22 PM Aditya Parameswaran <adityagp@berkeley.edu>
wrote:

Michael,

We'd love to meet and discuss! Unfortunately, a lot of us are off for
break starting next week, so it might be best to sync up early next year.
Would the week of the 6th work for you? 8am PT/10am CT/4pm GMT any day
should work!

We started by having the relational database be a simple persistent
storage layer which, when coupled with an index to retrieve data by
position, allows us to scroll through large datasets of billions of rows
with ease. We developed a new positional index to handle insertions and
deletions in O(log(n)) -- https://arxiv.org/pdf/1708.06712.pdf. I agree
that pushing the computation to the relational database does have
overheads, but at the same time it allows for scaling to arbitrarily
large datasets.
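
To illustrate what a positional index does, here is a minimal sketch. It
is an assumption-laden illustration, not the structure from the paper
linked above: it uses an implicit treap (a randomized order-statistic
tree), which gives expected rather than worst-case O(log(n)) lookup,
insertion and deletion by position. The names PositionalIndex, insert,
delete and get are hypothetical, and the stored values stand in for
whatever row identifiers the storage layer would use.

```python
import random


class _Node:
    """Tree node; `size` counts the nodes in the subtree rooted here."""
    __slots__ = ("value", "priority", "size", "left", "right")

    def __init__(self, value):
        self.value = value
        self.priority = random.random()  # random heap priority keeps the tree balanced in expectation
        self.size = 1
        self.left = None
        self.right = None


def _size(node):
    return node.size if node else 0


def _update(node):
    node.size = 1 + _size(node.left) + _size(node.right)


def _split(node, k):
    """Split a tree into (first k items, remaining items)."""
    if node is None:
        return None, None
    if _size(node.left) >= k:
        left, right = _split(node.left, k)
        node.left = right
        _update(node)
        return left, node
    left, right = _split(node.right, k - _size(node.left) - 1)
    node.right = left
    _update(node)
    return node, right


def _merge(a, b):
    """Concatenate two trees; every item of `a` precedes every item of `b`."""
    if a is None:
        return b
    if b is None:
        return a
    if a.priority > b.priority:
        a.right = _merge(a.right, b)
        _update(a)
        return a
    b.left = _merge(a, b.left)
    _update(b)
    return b


class PositionalIndex:
    """Hypothetical index mapping 0-based positions to stored values."""

    def __init__(self):
        self._root = None

    def __len__(self):
        return _size(self._root)

    def insert(self, pos, value):
        """Insert `value` so that it ends up at position `pos`."""
        if not 0 <= pos <= len(self):
            raise IndexError(pos)
        left, right = _split(self._root, pos)
        self._root = _merge(_merge(left, _Node(value)), right)

    def delete(self, pos):
        """Remove and return the value at position `pos`."""
        if not 0 <= pos < len(self):
            raise IndexError(pos)
        left, rest = _split(self._root, pos)
        target, right = _split(rest, 1)
        self._root = _merge(left, right)
        return target.value

    def get(self, pos):
        """Return the value at position `pos` by descending on subtree sizes."""
        if not 0 <= pos < len(self):
            raise IndexError(pos)
        node = self._root
        while True:
            left_size = _size(node.left)
            if pos < left_size:
                node = node.left
            elif pos == left_size:
                return node.value
            else:
                pos -= left_size + 1
                node = node.right
```

Because positions are derived from subtree sizes rather than stored
explicitly, inserting or deleting a row in the middle does not require
renumbering the rows that follow it.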

        Ooh - nice paper. Your crawled data-set looks quite interesting
too: we run wide-scale crash-testing on the LibreOffice code-base across
~100k files, and enlarging our corpus there - or better, getting some
statistical view of which OOXML attributes (and thus features) are most
used out there - would be extremely useful to us as we develop the core.

        I like the data on spreadsheet and formula shape - that is very
useful. Do you have data on the geometry of formulae - as in rows vs.
columns? [ we switched to columnar storage based mostly on experience
rather than hard data ;-].

        It is also interesting to have access to very large (1.3m row)
data-sets that can have useful analysis done on them - would love to see
the source data there.


Again, this is something that we'd be happy to share; this might just take
a bit more work since it's an older codebase.
I believe we did use the geometry of the formulae to determine the best
storage representation, so it's there somewhere :-)

        Sounds good, cf. above - if we can't make that - early in the new
year would be great.

        I look forward to talking,


Likewise!

Aditya

