Re: tdf#109097 - mdds accelerating random lookups

Kohei Yoshida <libreoffice -AT- kohei.us>
Sat, 26 Aug 2017 09:39:55 -0400

On Thu, 2017-08-24 at 15:33 +0530, Dennis Francis wrote:

Hello,

As this bug report (https://bugs.documentfoundation.org/show_bug.cgi?
id=109097) suggests
it is highly desirable to improve the lookup time in
multi_type_vector (currently in worst case 
it takes O(N)) without position hints. One way to improve the
situation is to store absolute 
position of first element of each block in the block instead of the
block size member. This 
can improve the lookup time up to log(log(N)) using linear
interpolation search or at least log(N) 
if we fallback to binary search after a few initial passes of linear-
interpolation search.


Agreed.  That's certainly one way to improve the situation, and I'm
pretty sure that's the direction you guys are going (or hoping to go,
with our blessing).

But I have to point out that, the bug you are referencing may not be
the best one to reference to push your agenda, since that performance
issue described in the bug report can be solved by switching to using
position hints, and modifying the existing code that accesses and
updates the matrix content.  That's precisely what Markus was
suggesting and what we've been doing traditionally prior to this
conversation.

I'd rather use the argument that, the change you are proposing does
make sense from the point of view of making mdds::multi_type_vector
more friendly to concurrent read-only access from multiple threads.  In
that light, use of position hints may not be desirable since it would
incur additional costs due to the position hints being additional
states that we would need to keep and share between multiple threads.
That would require thread locking semantics, which you guys are
understandably trying to avoid.


Down side of this approach is that in the worst case, insert/deletes
incur O(N) cost because
absolute positions stored in each block subsequent to the
delete/insert block have to be changed,


This downside is also correct.  We should also note that in this
scenario, N refers to the number of blocks, not the number of cells,
the former of which can be substantially lower in typical real-life
scenarios than the latter.  One good news is that, this position update
can potentially can concurrently if necessary since such operation only
modifies the size of one block at any given moment, and the delta size
can be applied to all the subsequent blocks independently.

I'm not suggesting we should make it multi-threaded from the get-go
(I'd rather you not do that, at least not without some benchmarking),
but when the worst comes to worst we could go that route.

Before working on any change from my side, I would love to hear
everyone's thoughts on the topic first. Thanks !


So, my short answer is "it makes sense".  There are some risks involved
especially with regard to the potential cost on insertions and
deletions which we just don't know ahead of time.  But my gut feeling
tells me that it'll probably be "fine for most common use cases".

But I have to warn you; making this change would not be a walk in the
park. :-)  multi_type_vector has tons of code branches to deal with all
sorts of situations, and you need to be prepared to tackle all of those
branches.  I hope you are mentally prepared for it. :-)

That's all I gotta say.  Please keep me in the loop.

Kohei

Context

tdf#109097 - mdds accelerating random lookups · Dennis Francis
- Re: tdf#109097 - mdds accelerating random lookups · Michael Meeks
- Re: tdf#109097 - mdds accelerating random lookups · Eike Rathke
  - Re: tdf#109097 - mdds accelerating random lookups · Michael Meeks
- Re: tdf#109097 - mdds accelerating random lookups · Kohei Yoshida
  - Re: tdf#109097 - mdds accelerating random lookups · Kohei Yoshida
- Re: tdf#109097 - mdds accelerating random lookups · Noel Grandin
  - Re: tdf#109097 - mdds accelerating random lookups · Michael Meeks
    - Re: tdf#109097 - mdds accelerating random lookups · Kohei Yoshida

Privacy Policy | Impressum (Legal Info) | Copyright information: Unless otherwise specified, all text and images on this website are licensed under the Creative Commons Attribution-Share Alike 3.0 License. This does not include the source code of LibreOffice, which is licensed under the Mozilla Public License (MPLv2). "LibreOffice" and "The Document Foundation" are registered trademarks of their corresponding registered owners or are in actual use as trademarks in one or more countries. Their respective logos and icons are also subject to international copyright laws. Use thereof is explained in our trademark policy.