On Thu, 2017-08-24 at 15:33 +0530, Dennis Francis wrote:
Hello,
As this bug report (https://bugs.documentfoundation.org/show_bug.cgi?
id=109097) suggests
it is highly desirable to improve the lookup time in
multi_type_vector (currently in worst case
it takes O(N)) without position hints. One way to improve the
situation is to store absolute
position of first element of each block in the block instead of the
block size member. This
can improve the lookup time up to log(log(N)) using linear
interpolation search or at least log(N)
if we fallback to binary search after a few initial passes of linear-
interpolation search.
Agreed. That's certainly one way to improve the situation, and I'm
pretty sure that's the direction you guys are going (or hoping to go,
with our blessing).
But I have to point out that, the bug you are referencing may not be
the best one to reference to push your agenda, since that performance
issue described in the bug report can be solved by switching to using
position hints, and modifying the existing code that accesses and
updates the matrix content. That's precisely what Markus was
suggesting and what we've been doing traditionally prior to this
conversation.
I'd rather use the argument that, the change you are proposing does
make sense from the point of view of making mdds::multi_type_vector
more friendly to concurrent read-only access from multiple threads. In
that light, use of position hints may not be desirable since it would
incur additional costs due to the position hints being additional
states that we would need to keep and share between multiple threads.
That would require thread locking semantics, which you guys are
understandably trying to avoid.
Down side of this approach is that in the worst case, insert/deletes
incur O(N) cost because
absolute positions stored in each block subsequent to the
delete/insert block have to be changed,
This downside is also correct. We should also note that in this
scenario, N refers to the number of blocks, not the number of cells,
the former of which can be substantially lower in typical real-life
scenarios than the latter. One good news is that, this position update
can potentially can concurrently if necessary since such operation only
modifies the size of one block at any given moment, and the delta size
can be applied to all the subsequent blocks independently.
I'm not suggesting we should make it multi-threaded from the get-go
(I'd rather you not do that, at least not without some benchmarking),
but when the worst comes to worst we could go that route.
Before working on any change from my side, I would love to hear
everyone's thoughts on the topic first. Thanks !
So, my short answer is "it makes sense". There are some risks involved
especially with regard to the potential cost on insertions and
deletions which we just don't know ahead of time. But my gut feeling
tells me that it'll probably be "fine for most common use cases".
But I have to warn you; making this change would not be a walk in the
park. :-) multi_type_vector has tons of code branches to deal with all
sorts of situations, and you need to be prepared to tackle all of those
branches. I hope you are mentally prepared for it. :-)
That's all I gotta say. Please keep me in the loop.
Kohei
Context
Privacy Policy |
Impressum (Legal Info) |
Copyright information: Unless otherwise specified, all text and images
on this website are licensed under the
Creative Commons Attribution-Share Alike 3.0 License.
This does not include the source code of LibreOffice, which is
licensed under the Mozilla Public License (
MPLv2).
"LibreOffice" and "The Document Foundation" are
registered trademarks of their corresponding registered owners or are
in actual use as trademarks in one or more countries. Their respective
logos and icons are also subject to international copyright laws. Use
thereof is explained in our
trademark policy.