Re: Fw: benchmark of Excel, Calc, Google Docs

Wol <antlists -AT- youngman.org.uk>
Wed, 11 Dec 2019 02:30:06 +0000

On 10/12/2019 17:30, Aditya Parameswaran wrote:

Wols,

Thanks for sharing your concerns, replete with quotes from Einstein :-) !
I believe I share those concerns... but there are of course some liberties we can take as an academic group that you folks managing a popular spreadsheet tool cannot take (e.g., doing a proof of concept as opposed to a robust implementation). From a research/academic standpoint it is valuable to note that something is possible, even if the solution is not ideal given other pragmatic considerations. BTW, I don't believe that anything we're doing *requires* a relational database -- a NoSQL setup would work just fine.

Bear in mind I'm not a regular developer. My name is on the list of credits, and I follow the mailing list, but I'm more one of those annoying people who moans about it. However, as they say, "you want customers who moan, because they're the people who want your product to improve. It's the people who leave without moaning that you should be concerned about".

I'd be happy to discuss more. Our goal is to understand the stumbling blocks in translating our work into spreadsheet tools such as Calc and see how we can best help with what we've learned.

I'm more into proving that relational databases are basically rubbish (as is the mathematical justification behind them) :-) That said, the maths itself is good, provided you don't use it to build a real-world database engine!

Okay, to demolish RDBMSs :-) I don't know which rule it is (no 1?) that says "Data comes in rows and columns". So your first job in building your relational application is to call in a data analyst, to whom you give the job of bashing square pegs into round holes...

To me, data is what the user gives you. Metadata is what the analyst deduces about the data. AND YOU DON'T MIX THE TWO!!!

Given that an RDBMS stores data as sets, and a lot of data comes as lists (which contain a lot of metadata), the only way to store a list in an RDBMS is to muddle data and metadata. Put mathematically, if you store a list when you want a set, you can just throw away the information you don't want (or need). However, once you've thrown that information away (i.e. the order of the items) so you can store your list in a set-based RDBMS, *you can't get it back*!
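
To make the set-versus-list point concrete, here is a minimal Python sketch. It is only an illustration: the shopping data, the table name and the "pos" column are invented for the example, and SQLite stands in for "an RDBMS" in general. It shows that turning a list into a set discards order and repeats, and that keeping the list in a table means adding the position as an explicit column.

```python
import sqlite3

# A list: order and repetition are part of what the user gave us.
shopping = ["milk", "eggs", "milk", "bread"]

# A set keeps membership only; position and repeats are gone for good.
as_set = set(shopping)

# To keep the list in a table, the order has to become an explicit
# column ("pos" is a hypothetical name chosen for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shopping (pos INTEGER PRIMARY KEY, item TEXT)")
conn.executemany(
    "INSERT INTO shopping (pos, item) VALUES (?, ?)",
    list(enumerate(shopping)),
)

# Only an ORDER BY on the extra column gives the original list back.
restored = [row[0] for row in
            conn.execute("SELECT item FROM shopping ORDER BY pos")]
```

The "pos" column is exactly the kind of analyst-supplied metadata that ends up stored alongside the user's data.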

My database of choice is Pick, where the basic unit of storage is the object, not the row. And because an object is an n-dimensional array, rather than the 1-dimensional row, I can store an entire relational view in it. (Which is what you should do, relational maths is okay, RDBMSs aren't :-) And because I have an n-dimensional array, I can store lists of lists :-)
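
As a rough analogy only (this is plain Python, not Pick syntax, and the invoice fields are made up for the example), a single nested record holding lists of lists might look like this:

```python
# One "object": an invoice that carries its own line items, and each
# line item carries its own list of serial numbers (a list of lists).
invoice = {
    "id": "INV-1",
    "customer": "ACME",
    "lines": [
        {"product": "widget", "qty": 2, "serials": ["W-001", "W-002"]},
        {"product": "gadget", "qty": 1, "serials": ["G-100"]},
    ],
}

# Order is preserved at every level, so positional access just works.
second_line_first_serial = invoice["lines"][1]["serials"][0]

# The whole "view" lives in one record; no join is needed to total it.
serial_count = sum(len(line["serials"]) for line in invoice["lines"])
```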

The problem I have at this point is that I'm now discussing the merit of databases. I'm not actually helping you "translate your work into spreadsheet tools". It's easy for me to extract a two-dimensional table from Pick and, truth be told, I can probably extract it faster (including massaging it into shape) from Pick than selecting it from an RDBMS! But unless we can find out some way of displaying a 3- or 4-dimensional array (a relational view) in a spreadsheet that isn't horribly confusing, I don't really know what to suggest other than to use a SQL query as the layer between the database and the spreadsheet. The database may be n-dimensional, but if your spreadsheet only has two, you might as well use a 2-dimensional query and let the database handle the complexity. It's not that hard.
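
A small sketch of that "2-dimensional query" idea, again in Python rather than Pick or SQL, with a hypothetical invoice record and a hypothetical flatten() helper: the nested record stays as it is in storage, and the query layer flattens it into rows and columns that a spreadsheet grid can show.

```python
# Hypothetical nested record, standing in for an n-dimensional object.
invoice = {
    "id": "INV-1",
    "lines": [
        {"product": "widget", "serials": ["W-001", "W-002"]},
        {"product": "gadget", "serials": ["G-100"]},
    ],
}

def flatten(record):
    """Return one flat row per serial number: a 2-D view of the record."""
    rows = []
    for line_no, line in enumerate(record["lines"], start=1):
        for serial in line["serials"]:
            rows.append((record["id"], line_no, line["product"], serial))
    return rows

# Each tuple is one spreadsheet row; each tuple position is one column.
rows = flatten(invoice)
```

The repetition of the invoice id and product on every row is the price of squeezing the nested structure into two dimensions.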

(I have - personally - managed a ram-starved 32-user Pick system that provided perfectly acceptable user response times despite thrashing like mad...)

(Oh, and as for pushing calculation into the database, Pick has had virtual fields for ages. I can define a field as being a calculation, e.g. VAT = PRICE * 20%, so I can "store price, retrieve vat" and it's just automagical :-) (I get the impression most users are unaware of similar capabilities in RDBMSs)
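
For comparison, one way to get a "store price, retrieve vat" effect on the relational side is a view. The sketch below uses Python's sqlite3 module; the table, view and column names are invented for the example, prices are held as integer pence to sidestep floating-point rounding, and it is not meant to mirror how Pick defines a virtual field.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Only the price is stored (in pence, as an integer).
conn.execute("CREATE TABLE item (name TEXT, price_pence INTEGER)")

# The view computes VAT at 20% on the fly; nothing extra is stored.
conn.execute(
    "CREATE VIEW item_with_vat AS "
    "SELECT name, price_pence, price_pence * 20 / 100 AS vat_pence "
    "FROM item"
)

conn.execute("INSERT INTO item VALUES ('widget', 1000)")

# Store price, retrieve VAT.
vat = conn.execute(
    "SELECT vat_pence FROM item_with_vat WHERE name = 'widget'"
).fetchone()[0]
```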


Cheers,
Wol


Cheers,
Aditya

On Mon, Dec 9, 2019 at 3:37 PM Wols Lists <antlists@youngman.org.uk> wrote:


    On 09/12/19 19:14, Aditya Parameswaran wrote:
     >     The idea of converting to SQL queries is an interesting one
     >     but I find it very hard to believe it would provide any
     >     performance advantage at the same memory footprint.
     >     Furthermore - I'd be interested to know how you do other
     >     spreadsheet operations: row & column insertion, addressing,
     >     and dependency work on top of a SQL database with any
     >     efficiency.
     >
     >
     > We started by having the relational database be a simple persistent
     > storage layer, which, when coupled with an index to retrieve data by
     > position, can allow us to scroll through large datasets of billions
     > of rows with ease. We developed a new positional index to handle
     > insertions and deletions in O(log(n)) --
     > https://arxiv.org/pdf/1708.06712.pdf. I agree that pushing the
     > computation to the relational database does have overheads; but at
     > the same time, it allows for scaling to arbitrarily large datasets.
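
For readers unfamiliar with the idea of a positional index, the sketch below shows one generic way to get insert, delete and lookup by row position in expected O(log n) time: a treap whose nodes track subtree sizes. This is a swapped-in textbook structure chosen for brevity, not the index described in the linked paper, and every name in it is hypothetical.

```python
import random

class Node:
    def __init__(self, value):
        self.value = value
        self.prio = random.random()   # random heap priority keeps the tree balanced
        self.size = 1                 # number of nodes in this subtree
        self.left = None
        self.right = None

def size(node):
    return node.size if node else 0

def update(node):
    node.size = 1 + size(node.left) + size(node.right)

def split(node, k):
    """Split a tree into (first k rows, remaining rows)."""
    if node is None:
        return None, None
    if size(node.left) >= k:
        left, node.left = split(node.left, k)
        update(node)
        return left, node
    node.right, right = split(node.right, k - size(node.left) - 1)
    update(node)
    return node, right

def merge(a, b):
    """Join two trees where every row of a precedes every row of b."""
    if a is None:
        return b
    if b is None:
        return a
    if a.prio > b.prio:
        a.right = merge(a.right, b)
        update(a)
        return a
    b.left = merge(a, b.left)
    update(b)
    return b

def insert_at(root, pos, value):
    left, right = split(root, pos)
    return merge(merge(left, Node(value)), right)

def delete_at(root, pos):
    left, rest = split(root, pos)
    _, right = split(rest, 1)
    return merge(left, right)

def get_at(root, pos):
    node = root
    while node:
        left_size = size(node.left)
        if pos < left_size:
            node = node.left
        elif pos == left_size:
            return node.value
        else:
            pos -= left_size + 1
            node = node.right
    raise IndexError(pos)

def to_list(node):
    return to_list(node.left) + [node.value] + to_list(node.right) if node else []

# Usage: build three rows, insert a row in the middle, then delete the first.
root = None
for i, row in enumerate(["r0", "r1", "r2"]):
    root = insert_at(root, i, row)
root = insert_at(root, 1, "new")
after_insert = to_list(root)
third_row = get_at(root, 2)
root = delete_at(root, 0)
after_delete = to_list(root)
```

Because positions are derived from subtree sizes rather than stored, inserting a row does not require renumbering the rows after it.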

    "the quickest way to optimise database access is to ditch first normal
    form".

    A provocative statement I know, but I'm very much in the NoSQL camp. I
    can hunt up the details of a face-off between Oracle and Cache, where
    Oracle had to "cheat" to achieve 100K tpm (things like deferring index
    updates) whereas Cache blasted through 250K barely breaking a sweat ...
    (or it might well have been tps)

    The maths supports this ...

    That said, a spreadsheet is inherently first normal form, so tying a
    spreadsheet and a relational database together MAY make sense.

    In general though, Einstein said "make things as simple as possible BUT
    NO SIMPLER". Relational oversimplifies the database side, which means
    the application side is over-complex in order to compensate. (Which is
    why Cache blew Oracle out of the water.)

    I'm quite happy to wax lyrical, but I'd rather not preach to an audience
    who aren't at least interested. Feel free to ask me to continue on list,
    or contact me privately, and I'll try to prove everything as
    mathematically as I can :-)

     > but at the same time, it allows for scaling to arbitrarily
     > large datasets.

    At the price of massive unnecessary complexity.

    Cheers,
    Wol

