Joe Smith wrote:
I've looked at the code a bit, and there does seem to be only one point of contact with the rest of the suite: textsearch.cxx, which handles all types of text search (normal, regexp and fuzzy). It calls Regexpr::re_search(), which in turn calls re_match2() to run the actual regexp match. So the structure makes it easy to replace the regexp code in one place. Unfortunately, the way these functions work does not map well onto the Boost RE classes, although I'm sure it would be possible with an interface layer. For example, the Boost engine handles locale-specific issues internally, whereas OOo's engine knows almost nothing about character case or multi-character sequences. Instead, the text to be searched is prepped by running it through a filter first. I don't understand the i18n and character encoding issues well enough to guess what that filter is actually doing or how it should be handled.
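To make the "interface layer" idea concrete, here is a rough sketch of what such an adapter could look like. Everything in it is an assumption for illustration: RegexAdapter, MatchRange and prefilter() are hypothetical names, std::regex stands in for the Boost RE classes so the snippet builds without Boost, and the ASCII lower-casing is only a placeholder for the real filter, whose behaviour is not described above.

```cpp
#include <cctype>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical result type: half-open [start, end) offsets into the text.
struct MatchRange {
    std::size_t start;
    std::size_t end;
};

// Placeholder for the pre-search filter (assumption: ASCII lower-casing only).
// It preserves length, so offsets in the filtered text are also valid
// offsets in the original text.
std::string prefilter(const std::string& text) {
    std::string out(text);
    for (char& c : out) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return out;
}

// Hypothetical adapter: the single place where the caller touches the regex
// engine. std::regex is used here as a stand-in for the Boost RE classes.
class RegexAdapter {
public:
    // The pattern is expected to be in filtered (lower-case) form already;
    // lower-casing it blindly would corrupt escapes such as \D or \W.
    explicit RegexAdapter(const std::string& pattern) : regex_(pattern) {}

    // Searches text starting at offset `from`. On success, stores the match
    // offsets (relative to the start of text) in `out` and returns true.
    bool search(const std::string& text, std::size_t from, MatchRange& out) const {
        if (from > text.size()) {
            return false;
        }
        const std::string filtered = prefilter(text);
        std::smatch m;
        const auto begin = filtered.cbegin() + static_cast<std::ptrdiff_t>(from);
        if (!std::regex_search(begin, filtered.cend(), m, regex_)) {
            return false;
        }
        // m.position(0) is relative to `begin`, so shift it back by `from`.
        out.start = from + static_cast<std::size_t>(m.position(0));
        out.end = out.start + static_cast<std::size_t>(m.length(0));
        return true;
    }

private:
    std::regex regex_;
};
```

A caller would construct RegexAdapter with a pattern such as "wor[a-z]+" and call search("Hello WORLD", 0, range); because the placeholder filter lower-cases the text first, the match covers the word WORLD in the original string. An engine that handles case and locale internally would make prefilter() unnecessary, which is the mismatch described above.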
Hi Joe,

hm - then I think a combination of those two approaches might be a winning strategy. LibO uses icu for all that nifty transliteration stuff and whatnot. I notice that newer boost versions also optionally support icu, so maybe that already gives us good enough coverage. I'd be tempted to just give it a whirl and add it as an optional, experimental feature for people to play with.

Cheers,
-- Thorsten