Hi all,

Thanks for all the replies and comments.

Attached is a new bunch of patches against master. I've reworked the helpindexer.cpp code so that it can be used as a library, and I changed xmlhelp/source/cxxhelp/provider/databases.cxx to call it.

The good news is that I think this gets rid of the Java invocation on startup. The bad news is that this breaks the build, as I explain below. I attach these work-in-progress patches anyway, because I won't get around to working on this for a few days at least.

1. I converted the HelpIndexer from C++'s std::string and std::wstring to rtl::OUString. This created a new problem (HelpIndexer.cxx:106): how to convert the rtl::OUString to the TCHAR* that CLucene needs. How can I convert an OUString to a TCHAR* (wchar_t*) in a way that won't break platform independence? The current conversion garbles the "path" field in the index.

2. In xmlhelp/source/cxxhelp/provider/makefile.mk, I've hacked the include path to include l10ntools/source/help, which is probably not a good idea. I also don't know how to link in the HelpIndexer.o file from xmlhelp (or how to create a .so for it that xmlhelp can find).

3. The conversion from using UNIX dirent.h and friends to using 'sal' still needs to happen, and I think that will help get rid of some awkward string conversions too.

4. The patch assumes both libclucene-core and libclucene-contribs-lib are available from pkg-config. Disable the '#define TODO' and the relevant line in the Makefile to only depend on libclucene-core.
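
As a rough sketch for the conversion question in point 1: rtl::OUString holds UTF-16 code units, while wchar_t is 16 bits wide on Windows and usually 32 bits elsewhere, so a plain element-by-element copy only works for characters outside surrogate pairs. The example below uses std::u16string as a stand-in for rtl::OUString so that it compiles without the sal headers, and utf16ToWide is a hypothetical name that does not appear in the patches. It is one possible approach under those assumptions, not a tested fix for HelpIndexer.cxx.

```cpp
#include <cstddef>
#include <string>

// Convert UTF-16 code units to a std::wstring.
// Assumption: std::u16string stands in for rtl::OUString (both hold UTF-16).
std::wstring utf16ToWide(std::u16string const &source) {
    std::wstring target;
    target.reserve(source.size());
    for (std::size_t i = 0; i < source.size(); ++i) {
        char32_t c = source[i];
        // With a 32-bit wchar_t, merge a surrogate pair into one code point.
        // With a 16-bit wchar_t, the code units are copied through unchanged.
        if (sizeof(wchar_t) >= 4 && c >= 0xD800 && c <= 0xDBFF
            && i + 1 < source.size()) {
            char32_t low = source[i + 1];
            if (low >= 0xDC00 && low <= 0xDFFF) {
                c = 0x10000 + ((c - 0xD800) << 10) + (low - 0xDC00);
                ++i; // the low surrogate has been consumed
            }
        }
        target.push_back(static_cast<wchar_t>(c));
    }
    return target;
}
```

The resulting std::wstring can be handed to CLucene through c_str(), provided TCHAR really is wchar_t in the build, as the code comment in the patch assumes.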

Cheers,

Gert

On 02/14/2012 05:24 PM, Caolán McNamara wrote:
> On Tue, 2012-02-14 at 17:04 +0100, G.H.M.Valkenhoef, van wrote:
>
>> I noticed that CJK-based indexing is only enabled for the Japanese
>> language. Maybe this can be fixed by adding more languages to be
>> CJK-indexed.
>
> Indeed, opengrok for "CJKAnalyzer" and see if running zh-* (and possibly
> ko) through org.apache.lucene.analysis.cjk.CJKAnalyzer makes a
> difference.
>
> Which sadly might mean we need the clucene version of that too :-)
>
> C.
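
If more languages do turn out to benefit from CJK indexing, the language test in indexDocuments() could move into a small helper, sketched below. The name wantsCjkAnalyzer is hypothetical, and the set of codes (ja, ko, zh and zh-*) is only the suggestion from the quoted discussion. Whether zh-* and ko actually index better this way is still an open question in this thread.

```cpp
#include <string>

// Hypothetical helper: should this help language use the CJK analyzer?
// Assumption: the candidate set is ja, ko, zh and any zh-* variant.
bool wantsCjkAnalyzer(std::string const &lang) {
    if (lang == "ja" || lang == "ko" || lang == "zh") {
        return true;
    }
    // Matches regional variants such as "zh-CN" or "zh-TW".
    return lang.compare(0, 3, "zh-") == 0;
}
```

The existing d_lang.compare("ja") == 0 check would then become wantsCjkAnalyzer(d_lang), which keeps the analyzer selection in one place.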


From acd382ec5ca930df837ceac00df8fd181b38cac4 Mon Sep 17 00:00:00 2001
From: Gert van Valkenhoef <g.h.m.van.valkenhoef@rug.nl>
Date: Tue, 14 Feb 2012 19:31:18 +0100
Subject: [PATCH 1/3] Add C++ HelpIndexer

---
 l10ntools/prj/build.lst               |    2 +-
 l10ntools/prj/d.lst                   |    6 +-
 l10ntools/source/help/helpindexer.cxx |  247 +++++++++++++++++++++++++++++++++
 l10ntools/source/help/makefile.mk     |   30 ++---
 4 files changed, 263 insertions(+), 22 deletions(-)
 create mode 100644 l10ntools/source/help/helpindexer.cxx

diff --git a/l10ntools/prj/build.lst b/l10ntools/prj/build.lst
index ed919a5..8e3ea70 100644
--- a/l10ntools/prj/build.lst
+++ b/l10ntools/prj/build.lst
@@ -1,4 +1,4 @@
-tr l10ntools : BERKELEYDB:berkeleydb EXPAT:expat LIBXSLT:libxslt LUCENE:lucene sal NULL
+tr l10ntools : BERKELEYDB:berkeleydb EXPAT:expat LIBXSLT:libxslt sal NULL
 tr     l10ntools                                               usr1    -       all     tr_mkout NULL
 tr     l10ntools\inc                                   nmake   -       all     tr_inc NULL
 tr     l10ntools\source                                        nmake   -       all     tr_src tr_inc NULL
diff --git a/l10ntools/prj/d.lst b/l10ntools/prj/d.lst
index eded848..174bb6c 100644
--- a/l10ntools/prj/d.lst
+++ b/l10ntools/prj/d.lst
@@ -26,12 +26,14 @@ mkdir: %_DEST%\bin\help\com\sun\star\help
 ..\%__SRC%\bin\txtconv %_DEST%\bin\txtconv
 ..\%__SRC%\bin\ulfconv %_DEST%\bin\ulfconv
 ..\%__SRC%\class\FCFGMerge.jar %_DEST%\bin\FCFGMerge.jar
-..\%__SRC%\class\HelpIndexerTool.jar %_DEST%\bin\HelpIndexerTool.jar
-..\%__SRC%\bin\HelpLinker %_DEST%\bin\HelpLinker
 ..\%__SRC%\bin\HelpCompiler %_DEST%\bin\HelpCompiler
 ..\%__SRC%\bin\HelpCompiler.exe %_DEST%\bin\HelpCompiler.exe
+..\%__SRC%\bin\HelpLinker %_DEST%\bin\HelpLinker
 ..\%__SRC%\bin\HelpLinker.exe %_DEST%\bin\HelpLinker.exe
 ..\%__SRC%\bin\HelpLinker* %_DEST%\bin
+..\%__SRC%\bin\HelpIndexer %_DEST%\bin\HelpIndexer
+..\%__SRC%\bin\HelpIndexer.exe %_DEST%\bin\HelpIndexer.exe
+..\%__SRC%\bin\HelpIndexer* %_DEST%\bin
 
 ..\scripts\localize %_DEST%\bin\localize
 ..\scripts\fast_merge.pl %_DEST%\bin\fast_merge.pl
diff --git a/l10ntools/source/help/helpindexer.cxx b/l10ntools/source/help/helpindexer.cxx
new file mode 100644
index 0000000..c327119
--- /dev/null
+++ b/l10ntools/source/help/helpindexer.cxx
@@ -0,0 +1,247 @@
+#include <CLucene/StdHeader.h>
+#include <CLucene.h>
+#ifdef TODO
+#include <CLucene/analysis/LanguageBasedAnalyzer.h>
+#endif
+
+#include <unistd.h>
+#include <sys/stat.h>
+#include <dirent.h>
+#include <errno.h>
+#include <string.h>
+
+#include <string>
+#include <iostream>
+#include <algorithm>
+#include <set>
+
+// I assume that TCHAR is defined as wchar_t throughout
+
+using namespace lucene::document;
+
+class HelpIndexer {
+       private:
+               std::string d_lang;
+               std::string d_module;
+               std::string d_captionDir;
+               std::string d_contentDir;
+               std::string d_indexDir;
+               std::string d_error;
+               std::set<std::string> d_files;
+
+       public:
+
+       /**
+        * @param lang Help files language.
+        * @param module The module of the helpfiles.
+        * @param captionDir The directory to scan for caption files.
+        * @param contentDir The directory to scan for content files.
+        * @param indexDir The directory to write the index to.
+        */
+       HelpIndexer(std::string const &lang, std::string const &module,
+               std::string const &captionDir, std::string const &contentDir,
+               std::string const &indexDir);
+
+       /**
+        * Run the indexer.
+        * @return true if index successfully generated.
+        */
+       bool indexDocuments();
+
+       /**
+        * Get the error string (empty if no error occurred).
+        */
+       std::string const & getErrorMessage();
+
+       private:
+
+       /**
+        * Scan the caption & contents directories for help files.
+        */
+       bool scanForFiles();
+
+       /**
+        * Scan for files in the given directory.
+        */
+       bool scanForFiles(std::string const &path);
+
+       /**
+        * Fill the Document with information on the given help file.
+        */
+       bool helpDocument(std::string const & fileName, Document *doc);
+
+       /**
+        * Create a reader for the given file, and create an "empty" reader in case the file doesn't exist.
+        */
+       lucene::util::Reader *helpFileReader(std::string const & path);
+
+       std::wstring string2wstring(std::string const &source);
+};
+
+HelpIndexer::HelpIndexer(std::string const &lang, std::string const &module,
+       std::string const &captionDir, std::string const &contentDir, std::string const &indexDir) :
+d_lang(lang), d_module(module), d_captionDir(captionDir), d_contentDir(contentDir), d_indexDir(indexDir), d_error(""), d_files() {}
+
+bool HelpIndexer::indexDocuments() {
+       if (!scanForFiles()) {
+               return false;
+       }
+
+#ifdef TODO
+       // Construct the analyzer appropriate for the given language
+       lucene::analysis::Analyzer *analyzer = (
+               d_lang.compare("ja") == 0 ?
+               (lucene::analysis::Analyzer*)new lucene::analysis::LanguageBasedAnalyzer(L"cjk") :
+               (lucene::analysis::Analyzer*)new lucene::analysis::standard::StandardAnalyzer());
+#else
+       lucene::analysis::Analyzer *analyzer = (
+               (lucene::analysis::Analyzer*)new lucene::analysis::standard::StandardAnalyzer());
+#endif
+
+       lucene::index::IndexWriter writer(d_indexDir.c_str(), analyzer, true);
+
+       // Index the identified help files
+       Document doc;
+       for (std::set<std::string>::iterator i = d_files.begin(); i != d_files.end(); ++i) {
+               doc.clear();
+               if (!helpDocument(*i, &doc)) {
+                       delete analyzer;
+                       return false;
+               }
+               writer.addDocument(&doc);
+       }
+
+       // Optimize the index
+       writer.optimize();
+
+       delete analyzer;
+       return true;
+}
+
+std::string const & HelpIndexer::getErrorMessage() {
+       return d_error;
+}
+
+bool HelpIndexer::scanForFiles() {
+       if (!scanForFiles(d_contentDir)) {
+               return false;
+       }
+       if (!scanForFiles(d_captionDir)) {
+               return false;
+       }
+       return true;
+}
+
+bool HelpIndexer::scanForFiles(std::string const & path) {
+       DIR *dir = opendir(path.c_str());
+       if (dir == 0) {
+               d_error = "Error reading directory " + path + ": " + strerror(errno);
+               return false;
+       }
+
+       struct dirent *ent;
+       struct stat info;
+       while ((ent = readdir(dir)) != 0) {
+               if (stat((path + "/" + ent->d_name).c_str(), &info) == 0 && S_ISREG(info.st_mode)) {
+                       d_files.insert(ent->d_name);
+               }
+       }
+
+       closedir(dir);
+
+       return true;
+}
+
+bool HelpIndexer::helpDocument(std::string const & fileName, Document *doc) {
+       // Add the help path as an indexed, untokenized field.
+       std::wstring path(L"#HLP#" + string2wstring(d_module) + L"/" + string2wstring(fileName));
+       doc->add(*new Field(_T("path"), path.c_str(), Field::STORE_YES | Field::INDEX_UNTOKENIZED));
+
+       // Add the caption as a field.
+       std::string captionPath = d_captionDir + "/" + fileName;
+       doc->add(*new Field(_T("caption"), helpFileReader(captionPath), Field::STORE_NO | Field::INDEX_TOKENIZED));
+       // FIXME: does the Document take responsibility for the FileReader or should I free it somewhere?
+
+       // Add the content as a field.
+       std::string contentPath = d_contentDir + "/" + fileName;
+       doc->add(*new Field(_T("content"), helpFileReader(contentPath), Field::STORE_NO | Field::INDEX_TOKENIZED));
+       // FIXME: does the Document take responsibility for the FileReader or should I free it somewhere?
+
+       return true;
+}
+
+lucene::util::Reader *HelpIndexer::helpFileReader(std::string const & path) {
+       if (access(path.c_str(), R_OK) == 0) {
+               return new lucene::util::FileReader(path.c_str(), "UTF-8");
+       } else {
+               return new lucene::util::StringReader(L"");
+       }
+}
+
+std::wstring HelpIndexer::string2wstring(std::string const &source) {
+       std::wstring target(source.length(), L' ');
+       std::copy(source.begin(), source.end(), target.begin());
+       return target;
+}
+
+int main(int argc, char **argv) {
+       const std::string pLang("-lang");
+       const std::string pModule("-mod");
+       const std::string pOutDir("-zipdir");
+       const std::string pSrcDir("-srcdir");
+
+       std::string lang;
+       std::string module;
+       std::string srcDir;
+       std::string outDir;
+
+       bool error = false;
+       for (int i = 1; i < argc; ++i) {
+               if (pLang.compare(argv[i]) == 0) {
+                       if (i + 1 < argc) {
+                               lang = argv[++i];
+                       } else {
+                               error = true;
+                       }
+               } else if (pModule.compare(argv[i]) == 0) {
+                       if (i + 1 < argc) {
+                               module = argv[++i];
+                       } else {
+                               error = true;
+                       }
+               } else if (pOutDir.compare(argv[i]) == 0) {
+                       if (i + 1 < argc) {
+                               outDir = argv[++i];
+                       } else {
+                               error = true;
+                       }
+               } else if (pSrcDir.compare(argv[i]) == 0) {
+                       if (i + 1 < argc) {
+                               srcDir = argv[++i];
+                       } else {
+                               error = true;
+                       }
+               } else {
+                       error = true;
+               }
+       }
+
+       if (error) {
+               std::cerr << "Error parsing command-line arguments" << std::endl;
+       }
+
+       if (error || lang.empty() || module.empty() || srcDir.empty() || outDir.empty()) {
+               std::cerr << "Usage: HelpIndexer -lang ISOLangCode -mod HelpModule -srcdir SourceDir -zipdir OutputDir" << std::endl;
+               return 1;
+       }
+
+       std::string captionDir(srcDir + "/caption");
+       std::string contentDir(srcDir + "/content");
+       std::string indexDir(outDir + "/" + module + ".idxl");
+       HelpIndexer indexer(lang, module, captionDir, contentDir, indexDir);
+       if (!indexer.indexDocuments()) {
+               std::cerr << indexer.getErrorMessage() << std::endl;
+               return 2;
+       }
+       return 0;
+}
diff --git a/l10ntools/source/help/makefile.mk b/l10ntools/source/help/makefile.mk
index bab01b8..e22c6a3 100644
--- a/l10ntools/source/help/makefile.mk
+++ b/l10ntools/source/help/makefile.mk
@@ -60,8 +60,10 @@ SLOFILES=\
 EXCEPTIONSFILES=\
         $(OBJ)$/HelpLinker.obj \
         $(OBJ)$/HelpCompiler.obj \
+        $(OBJ)$/helpindexer.obj \
         $(SLO)$/HelpLinker.obj \
         $(SLO)$/HelpCompiler.obj
+
 .IF "$(OS)" == "MACOSX" && "$(CPU)" == "P" && "$(COM)" == "GCC"
 # There appears to be a GCC 4.0.1 optimization error causing _file:good() to
 # report true right before the call to writeOut at HelpLinker.cxx:1.12 l. 954
@@ -72,6 +74,9 @@ NOOPTFILES=\
         $(SLO)$/HelpLinker.obj
 .ENDIF
 
+PKGCONFIG_MODULES=libclucene-core
+.INCLUDE : pkg_config.mk
+
 APP1TARGET= $(TARGET)
 APP1OBJS=\
       $(OBJ)$/HelpLinker.obj \
@@ -79,6 +84,12 @@ APP1OBJS=\
 APP1RPATH = NONE
 APP1STDLIBS+=$(SALLIB) $(BERKELEYLIB) $(XSLTLIB) $(EXPATASCII3RDLIB)
 
+APP2TARGET=HelpIndexer
+APP2OBJS=\
+      $(OBJ)$/helpindexer.obj
+APP2RPATH = NONE
+APP2STDLIBS+=$(SALLIB) $(PKGCONFIG_LIBS)
+
 SHL1TARGET     =$(LIBBASENAME)$(DLLPOSTFIX)
 SHL1LIBS=      $(SLB)$/$(TARGET).lib
 .IF "$(COM)" == "MSC"
@@ -93,26 +104,7 @@ SHL1USE_EXPORTS     =ordinal
 DEF1NAME       =$(SHL1TARGET) 
 DEFLIB1NAME    =$(TARGET)
 
-JAVAFILES = \
-    HelpIndexerTool.java                               \
-    HelpFileDocument.java
-
-
-JAVACLASSFILES = \
-    $(CLASSDIR)$/$(PACKAGE)$/HelpIndexerTool.class                             \
-    $(CLASSDIR)$/$(PACKAGE)$/HelpFileDocument.class
 
-.IF "$(SYSTEM_LUCENE)" == "YES"
-EXTRAJARFILES += $(LUCENE_CORE_JAR) $(LUCENE_ANALYZERS_JAR)
-.ELSE
-JARFILES += lucene-core-2.3.jar lucene-analyzers-2.3.jar
-.ENDIF
-JAVAFILES = $(subst,$(CLASSDIR)$/$(PACKAGE)$/, $(subst,.class,.java $(JAVACLASSFILES)))
-
-JARCLASSDIRS      = $(PACKAGE)/*
-JARTARGET             = HelpIndexerTool.jar
-JARCOMPRESS        = TRUE 
- 
 # --- Targets ------------------------------------------------------
 
 .INCLUDE :  target.mk
-- 
1.7.0.4

From 7388ee77361a1f8dad84b98306cbfe92c9a7ca3c Mon Sep 17 00:00:00 2001
From: Gert van Valkenhoef <g.h.m.van.valkenhoef@rug.nl>
Date: Tue, 14 Feb 2012 20:19:37 +0100
Subject: [PATCH 2/3] Separate HelpIndexer into header, implementation, and main

---
 l10ntools/source/help/HelpIndexer.cxx      |  123 ++++++++++++++
 l10ntools/source/help/HelpIndexer.hxx      |   71 ++++++++
 l10ntools/source/help/HelpIndexer_main.cxx |   66 ++++++++
 l10ntools/source/help/helpindexer.cxx      |  247 ----------------------------
 l10ntools/source/help/makefile.mk          |    8 +-
 5 files changed, 265 insertions(+), 250 deletions(-)
 create mode 100644 l10ntools/source/help/HelpIndexer.cxx
 create mode 100644 l10ntools/source/help/HelpIndexer.hxx
 create mode 100644 l10ntools/source/help/HelpIndexer_main.cxx
 delete mode 100644 l10ntools/source/help/helpindexer.cxx

diff --git a/l10ntools/source/help/HelpIndexer.cxx b/l10ntools/source/help/HelpIndexer.cxx
new file mode 100644
index 0000000..ed0ce39
--- /dev/null
+++ b/l10ntools/source/help/HelpIndexer.cxx
@@ -0,0 +1,123 @@
+#include "HelpIndexer.hxx"
+
+#define TODO
+
+#ifdef TODO
+#include <CLucene/analysis/LanguageBasedAnalyzer.h>
+#endif
+
+#include <unistd.h>
+#include <sys/stat.h>
+#include <dirent.h>
+#include <errno.h>
+#include <string.h>
+
+#include <algorithm>
+
+using namespace lucene::document;
+
+HelpIndexer::HelpIndexer(std::string const &lang, std::string const &module,
+       std::string const &captionDir, std::string const &contentDir, std::string const &indexDir) :
+d_lang(lang), d_module(module), d_captionDir(captionDir), d_contentDir(contentDir), d_indexDir(indexDir), d_error(""), d_files() {}
+
+bool HelpIndexer::indexDocuments() {
+       if (!scanForFiles()) {
+               return false;
+       }
+
+#ifdef TODO
+       // Construct the analyzer appropriate for the given language
+       lucene::analysis::Analyzer *analyzer = (
+               d_lang.compare("ja") == 0 ?
+               (lucene::analysis::Analyzer*)new lucene::analysis::LanguageBasedAnalyzer(L"cjk") :
+               (lucene::analysis::Analyzer*)new lucene::analysis::standard::StandardAnalyzer());
+#else
+       lucene::analysis::Analyzer *analyzer = (
+               (lucene::analysis::Analyzer*)new lucene::analysis::standard::StandardAnalyzer());
+#endif
+
+       lucene::index::IndexWriter writer(d_indexDir.c_str(), analyzer, true);
+
+       // Index the identified help files
+       Document doc;
+       for (std::set<std::string>::iterator i = d_files.begin(); i != d_files.end(); ++i) {
+               doc.clear();
+               if (!helpDocument(*i, &doc)) {
+                       delete analyzer;
+                       return false;
+               }
+               writer.addDocument(&doc);
+       }
+
+       // Optimize the index
+       writer.optimize();
+
+       delete analyzer;
+       return true;
+}
+
+std::string const & HelpIndexer::getErrorMessage() {
+       return d_error;
+}
+
+bool HelpIndexer::scanForFiles() {
+       if (!scanForFiles(d_contentDir)) {
+               return false;
+       }
+       if (!scanForFiles(d_captionDir)) {
+               return false;
+       }
+       return true;
+}
+
+bool HelpIndexer::scanForFiles(std::string const & path) {
+       DIR *dir = opendir(path.c_str());
+       if (dir == 0) {
+               d_error = "Error reading directory " + path + ": " + strerror(errno);
+               return false;
+       }
+
+       struct dirent *ent;
+       struct stat info;
+       while ((ent = readdir(dir)) != 0) {
+               if (stat((path + "/" + ent->d_name).c_str(), &info) == 0 && S_ISREG(info.st_mode)) {
+                       d_files.insert(ent->d_name);
+               }
+       }
+
+       closedir(dir);
+
+       return true;
+}
+
+bool HelpIndexer::helpDocument(std::string const & fileName, Document *doc) {
+       // Add the help path as an indexed, untokenized field.
+       std::wstring path(L"#HLP#" + string2wstring(d_module) + L"/" + string2wstring(fileName));
+       doc->add(*new Field(_T("path"), path.c_str(), Field::STORE_YES | Field::INDEX_UNTOKENIZED));
+
+       // Add the caption as a field.
+       std::string captionPath = d_captionDir + "/" + fileName;
+       doc->add(*new Field(_T("caption"), helpFileReader(captionPath), Field::STORE_NO | Field::INDEX_TOKENIZED));
+       // FIXME: does the Document take responsibility for the FileReader or should I free it somewhere?
+
+       // Add the content as a field.
+       std::string contentPath = d_contentDir + "/" + fileName;
+       doc->add(*new Field(_T("content"), helpFileReader(contentPath), Field::STORE_NO | Field::INDEX_TOKENIZED));
+       // FIXME: does the Document take responsibility for the FileReader or should I free it somewhere?
+
+       return true;
+}
+
+lucene::util::Reader *HelpIndexer::helpFileReader(std::string const & path) {
+       if (access(path.c_str(), R_OK) == 0) {
+               return new lucene::util::FileReader(path.c_str(), "UTF-8");
+       } else {
+               return new lucene::util::StringReader(L"");
+       }
+}
+
+std::wstring HelpIndexer::string2wstring(std::string const &source) {
+       std::wstring target(source.length(), L' ');
+       std::copy(source.begin(), source.end(), target.begin());
+       return target;
+}
diff --git a/l10ntools/source/help/HelpIndexer.hxx b/l10ntools/source/help/HelpIndexer.hxx
new file mode 100644
index 0000000..56122e7
--- /dev/null
+++ b/l10ntools/source/help/HelpIndexer.hxx
@@ -0,0 +1,71 @@
+#ifndef HELPINDEXER_HXX
+#define HELPINDEXER_HXX
+
+#include <CLucene/StdHeader.h>
+#include <CLucene.h>
+
+#include <string>
+#include <set>
+
+// I assume that TCHAR is defined as wchar_t throughout
+
+class HelpIndexer {
+       private:
+               std::string d_lang;
+               std::string d_module;
+               std::string d_captionDir;
+               std::string d_contentDir;
+               std::string d_indexDir;
+               std::string d_error;
+               std::set<std::string> d_files;
+
+       public:
+
+       /**
+        * @param lang Help files language.
+        * @param module The module of the helpfiles.
+        * @param captionDir The directory to scan for caption files.
+        * @param contentDir The directory to scan for content files.
+        * @param indexDir The directory to write the index to.
+        */
+       HelpIndexer(std::string const &lang, std::string const &module,
+               std::string const &captionDir, std::string const &contentDir,
+               std::string const &indexDir);
+
+       /**
+        * Run the indexer.
+        * @return true if index successfully generated.
+        */
+       bool indexDocuments();
+
+       /**
+        * Get the error string (empty if no error occurred).
+        */
+       std::string const & getErrorMessage();
+
+       private:
+
+       /**
+        * Scan the caption & contents directories for help files.
+        */
+       bool scanForFiles();
+
+       /**
+        * Scan for files in the given directory.
+        */
+       bool scanForFiles(std::string const &path);
+
+       /**
+        * Fill the Document with information on the given help file.
+        */
+       bool helpDocument(std::string const & fileName, lucene::document::Document *doc);
+
+       /**
+        * Create a reader for the given file, and create an "empty" reader in case the file doesn't exist.
+        */
+       lucene::util::Reader *helpFileReader(std::string const & path);
+
+       std::wstring string2wstring(std::string const &source);
+};
+
+#endif
diff --git a/l10ntools/source/help/HelpIndexer_main.cxx b/l10ntools/source/help/HelpIndexer_main.cxx
new file mode 100644
index 0000000..a1dd50b
--- /dev/null
+++ b/l10ntools/source/help/HelpIndexer_main.cxx
@@ -0,0 +1,66 @@
+#include "HelpIndexer.hxx"
+
+#include <string>
+#include <iostream>
+
+int main(int argc, char **argv) {
+       const std::string pLang("-lang");
+       const std::string pModule("-mod");
+       const std::string pOutDir("-zipdir");
+       const std::string pSrcDir("-srcdir");
+
+       std::string lang;
+       std::string module;
+       std::string srcDir;
+       std::string outDir;
+
+       bool error = false;
+       for (int i = 1; i < argc; ++i) {
+               if (pLang.compare(argv[i]) == 0) {
+                       if (i + 1 < argc) {
+                               lang = argv[++i];
+                       } else {
+                               error = true;
+                       }
+               } else if (pModule.compare(argv[i]) == 0) {
+                       if (i + 1 < argc) {
+                               module = argv[++i];
+                       } else {
+                               error = true;
+                       }
+               } else if (pOutDir.compare(argv[i]) == 0) {
+                       if (i + 1 < argc) {
+                               outDir = argv[++i];
+                       } else {
+                               error = true;
+                       }
+               } else if (pSrcDir.compare(argv[i]) == 0) {
+                       if (i + 1 < argc) {
+                               srcDir = argv[++i];
+                       } else {
+                               error = true;
+                       }
+               } else {
+                       error = true;
+               }
+       }
+
+       if (error) {
+               std::cerr << "Error parsing command-line arguments" << std::endl;
+       }
+
+       if (error || lang.empty() || module.empty() || srcDir.empty() || outDir.empty()) {
+               std::cerr << "Usage: HelpIndexer -lang ISOLangCode -mod HelpModule -srcdir SourceDir -zipdir OutputDir" << std::endl;
+               return 1;
+       }
+
+       std::string captionDir(srcDir + "/caption");
+       std::string contentDir(srcDir + "/content");
+       std::string indexDir(outDir + "/" + module + ".idxl");
+       HelpIndexer indexer(lang, module, captionDir, contentDir, indexDir);
+       if (!indexer.indexDocuments()) {
+               std::cerr << indexer.getErrorMessage() << std::endl;
+               return 2;
+       }
+       return 0;
+}
diff --git a/l10ntools/source/help/helpindexer.cxx b/l10ntools/source/help/helpindexer.cxx
deleted file mode 100644
index c327119..0000000
--- a/l10ntools/source/help/helpindexer.cxx
+++ /dev/null
@@ -1,247 +0,0 @@
-#include <CLucene/StdHeader.h>
-#include <CLucene.h>
-#ifdef TODO
-#include <CLucene/analysis/LanguageBasedAnalyzer.h>
-#endif
-
-#include <unistd.h>
-#include <sys/stat.h>
-#include <dirent.h>
-#include <errno.h>
-#include <string.h>
-
-#include <string>
-#include <iostream>
-#include <algorithm>
-#include <set>
-
-// I assume that TCHAR is defined as wchar_t throughout
-
-using namespace lucene::document;
-
-class HelpIndexer {
-       private:
-               std::string d_lang;
-               std::string d_module;
-               std::string d_captionDir;
-               std::string d_contentDir;
-               std::string d_indexDir;
-               std::string d_error;
-               std::set<std::string> d_files;
-
-       public:
-
-       /**
-        * @param lang Help files language.
-        * @param module The module of the helpfiles.
-        * @param captionDir The directory to scan for caption files.
-        * @param contentDir The directory to scan for content files.
-        * @param indexDir The directory to write the index to.
-        */
-       HelpIndexer(std::string const &lang, std::string const &module,
-               std::string const &captionDir, std::string const &contentDir,
-               std::string const &indexDir);
-
-       /**
-        * Run the indexer.
-        * @return true if index successfully generated.
-        */
-       bool indexDocuments();
-
-       /**
-        * Get the error string (empty if no error occurred).
-        */
-       std::string const & getErrorMessage();
-
-       private:
-
-       /**
-        * Scan the caption & contents directories for help files.
-        */
-       bool scanForFiles();
-
-       /**
-        * Scan for files in the given directory.
-        */
-       bool scanForFiles(std::string const &path);
-
-       /**
-        * Fill the Document with information on the given help file.
-        */
-       bool helpDocument(std::string const & fileName, Document *doc);
-
-       /**
-        * Create a reader for the given file, and create an "empty" reader in case the file doesn't exist.
-        */
-       lucene::util::Reader *helpFileReader(std::string const & path);
-
-       std::wstring string2wstring(std::string const &source);
-};
-
-HelpIndexer::HelpIndexer(std::string const &lang, std::string const &module,
-       std::string const &captionDir, std::string const &contentDir, std::string const &indexDir) :
-d_lang(lang), d_module(module), d_captionDir(captionDir), d_contentDir(contentDir), d_indexDir(indexDir), d_error(""), d_files() {}
-
-bool HelpIndexer::indexDocuments() {
-       if (!scanForFiles()) {
-               return false;
-       }
-
-#ifdef TODO
-       // Construct the analyzer appropriate for the given language
-       lucene::analysis::Analyzer *analyzer = (
-               d_lang.compare("ja") == 0 ?
-               (lucene::analysis::Analyzer*)new lucene::analysis::LanguageBasedAnalyzer(L"cjk") :
-               (lucene::analysis::Analyzer*)new lucene::analysis::standard::StandardAnalyzer());
-#else
-       lucene::analysis::Analyzer *analyzer = (
-               (lucene::analysis::Analyzer*)new lucene::analysis::standard::StandardAnalyzer());
-#endif
-
-       lucene::index::IndexWriter writer(d_indexDir.c_str(), analyzer, true);
-
-       // Index the identified help files
-       Document doc;
-       for (std::set<std::string>::iterator i = d_files.begin(); i != d_files.end(); ++i) {
-               doc.clear();
-               if (!helpDocument(*i, &doc)) {
-                       delete analyzer;
-                       return false;
-               }
-               writer.addDocument(&doc);
-       }
-
-       // Optimize the index
-       writer.optimize();
-
-       delete analyzer;
-       return true;
-}
-
-std::string const & HelpIndexer::getErrorMessage() {
-       return d_error;
-}
-
-bool HelpIndexer::scanForFiles() {
-       if (!scanForFiles(d_contentDir)) {
-               return false;
-       }
-       if (!scanForFiles(d_captionDir)) {
-               return false;
-       }
-       return true;
-}
-
-bool HelpIndexer::scanForFiles(std::string const & path) {
-       DIR *dir = opendir(path.c_str());
-       if (dir == 0) {
-               d_error = "Error reading directory " + path + strerror(errno);
-               return true;
-       }
-
-       struct dirent *ent;
-       struct stat info;
-       while ((ent = readdir(dir)) != 0) {
-               if (stat((path + "/" + ent->d_name).c_str(), &info) == 0 && S_ISREG(info.st_mode)) {
-                       d_files.insert(ent->d_name);
-               }
-       }
-
-       closedir(dir);
-
-       return true;
-}
-
-bool HelpIndexer::helpDocument(std::string const & fileName, Document *doc) {
-       // Add the help path as an indexed, untokenized field.
-       std::wstring path(L"#HLP#" + string2wstring(d_module) + L"/" + string2wstring(fileName));
-       doc->add(*new Field(_T("path"), path.c_str(), Field::STORE_YES | Field::INDEX_UNTOKENIZED));
-
-       // Add the caption as a field.
-       std::string captionPath = d_captionDir + "/" + fileName;
-       doc->add(*new Field(_T("caption"), helpFileReader(captionPath), Field::STORE_NO | Field::INDEX_TOKENIZED));
-       // FIXME: does the Document take responsibility for the FileReader or should I free it somewhere?
-
-       // Add the content as a field.
-       std::string contentPath = d_contentDir + "/" + fileName;
-       doc->add(*new Field(_T("content"), helpFileReader(contentPath), Field::STORE_NO | Field::INDEX_TOKENIZED));
-       // FIXME: does the Document take responsibility for the FileReader or should I free it somewhere?
-
-       return true;
-}
-
-lucene::util::Reader *HelpIndexer::helpFileReader(std::string const & path) {
-       if (access(path.c_str(), R_OK) == 0) {
-               return new lucene::util::FileReader(path.c_str(), "UTF-8");
-       } else {
-               return new lucene::util::StringReader(L"");
-       }
-}
-
-std::wstring HelpIndexer::string2wstring(std::string const &source) {
-       std::wstring target(source.length(), L' ');
-       std::copy(source.begin(), source.end(), target.begin());
-       return target;
-}
-
-int main(int argc, char **argv) {
-       const std::string pLang("-lang");
-       const std::string pModule("-mod");
-       const std::string pOutDir("-zipdir");
-       const std::string pSrcDir("-srcdir");
-
-       std::string lang;
-       std::string module;
-       std::string srcDir;
-       std::string outDir;
-
-       bool error = false;
-       for (int i = 1; i < argc; ++i) {
-               if (pLang.compare(argv[i]) == 0) {
-                       if (i + 1 < argc) {
-                               lang = argv[++i];
-                       } else {
-                               error = true;
-                       }
-               } else if (pModule.compare(argv[i]) == 0) {
-                       if (i + 1 < argc) {
-                               module = argv[++i];
-                       } else {
-                               error = true;
-                       }
-               } else if (pOutDir.compare(argv[i]) == 0) {
-                       if (i + 1 < argc) {
-                               outDir = argv[++i];
-                       } else {
-                               error = true;
-                       }
-               } else if (pSrcDir.compare(argv[i]) == 0) {
-                       if (i + 1 < argc) {
-                               srcDir = argv[++i];
-                       } else {
-                               error = true;
-                       }
-               } else {
-                       error = true;
-               }
-       }
-
-       if (error) {
-               std::cerr << "Error parsing command-line arguments" << std::endl;
-       }
-
-       if (error || lang.empty() || module.empty() || srcDir.empty() || outDir.empty()) {
-               std::cerr << "Usage: HelpIndexer -lang ISOLangCode -mod HelpModule -srcdir SourceDir -zipdir OutputDir" << std::endl;
-               return 1;
-       }
-
-       std::string captionDir(srcDir + "/caption");
-       std::string contentDir(srcDir + "/content");
-       std::string indexDir(outDir + "/" + module + ".idxl");
-       HelpIndexer indexer(lang, module, captionDir, contentDir, indexDir);
-       if (!indexer.indexDocuments()) {
-               std::cerr << indexer.getErrorMessage() << std::endl;
-               return 2;
-       }
-       return 0;
-}
diff --git a/l10ntools/source/help/makefile.mk b/l10ntools/source/help/makefile.mk
index e22c6a3..1283535 100644
--- a/l10ntools/source/help/makefile.mk
+++ b/l10ntools/source/help/makefile.mk
@@ -60,7 +60,8 @@ SLOFILES=\
 EXCEPTIONSFILES=\
         $(OBJ)$/HelpLinker.obj \
         $(OBJ)$/HelpCompiler.obj \
-        $(OBJ)$/helpindexer.obj \
+        $(OBJ)$/HelpIndexer.obj \
+        $(OBJ)$/HelpIndexer_main.obj \
         $(SLO)$/HelpLinker.obj \
         $(SLO)$/HelpCompiler.obj
 
@@ -74,7 +75,7 @@ NOOPTFILES=\
         $(SLO)$/HelpLinker.obj
 .ENDIF
 
-PKGCONFIG_MODULES=libclucene-core
+PKGCONFIG_MODULES=libclucene-core libclucene-contribs-lib
 .INCLUDE : pkg_config.mk
 
 APP1TARGET= $(TARGET)
@@ -86,7 +87,8 @@ APP1STDLIBS+=$(SALLIB) $(BERKELEYLIB) $(XSLTLIB) $(EXPATASCII3RDLIB)
 
 APP2TARGET=HelpIndexer
 APP2OBJS=\
-      $(OBJ)$/helpindexer.obj
+      $(OBJ)$/HelpIndexer.obj \
+      $(OBJ)$/HelpIndexer_main.obj
 APP2RPATH = NONE
 APP2STDLIBS+=$(SALLIB) $(PKGCONFIG_LIBS)
 
-- 
1.7.0.4

From c44f78a37c2e4919b7c6fc01efa8a04a81b014be Mon Sep 17 00:00:00 2001
From: Gert van Valkenhoef <g.h.m.van.valkenhoef@rug.nl>
Date: Tue, 14 Feb 2012 21:56:08 +0100
Subject: [PATCH 3/3] HelpIndexer using rtl::OUString, called from xmlhelp

---
 l10ntools/source/help/HelpIndexer.cxx         |   59 ++++++++------
 l10ntools/source/help/HelpIndexer.hxx         |   32 ++++----
 l10ntools/source/help/HelpIndexer_main.cxx    |    9 ++-
 xmlhelp/source/cxxhelp/provider/databases.cxx |  102 +++++++++++--------------
 xmlhelp/source/cxxhelp/provider/makefile.mk   |    5 +
 5 files changed, 105 insertions(+), 102 deletions(-)

diff --git a/l10ntools/source/help/HelpIndexer.cxx b/l10ntools/source/help/HelpIndexer.cxx
index ed0ce39..f86d265 100644
--- a/l10ntools/source/help/HelpIndexer.cxx
+++ b/l10ntools/source/help/HelpIndexer.cxx
@@ -6,6 +6,8 @@
 #include <CLucene/analysis/LanguageBasedAnalyzer.h>
 #endif
 
+#include <rtl/string.hxx>
+
 #include <unistd.h>
 #include <sys/stat.h>
 #include <dirent.h>
@@ -16,9 +18,10 @@
 
 using namespace lucene::document;
 
-HelpIndexer::HelpIndexer(std::string const &lang, std::string const &module,
-       std::string const &captionDir, std::string const &contentDir, std::string const &indexDir) :
-d_lang(lang), d_module(module), d_captionDir(captionDir), d_contentDir(contentDir), d_indexDir(indexDir), d_error(""), d_files() {}
+HelpIndexer::HelpIndexer(rtl::OUString const &lang, rtl::OUString const &module,
+       rtl::OUString const &captionDir, rtl::OUString const &contentDir, rtl::OUString const &indexDir) :
+d_lang(lang), d_module(module), d_captionDir(captionDir), d_contentDir(contentDir), d_indexDir(indexDir),
+d_error(), d_files() {}
 
 bool HelpIndexer::indexDocuments() {
        if (!scanForFiles()) {
@@ -28,7 +31,7 @@ bool HelpIndexer::indexDocuments() {
 #ifdef TODO
        // Construct the analyzer appropriate for the given language
        lucene::analysis::Analyzer *analyzer = (
-               d_lang.compare("ja") == 0 ?
+               d_lang.compareToAscii("ja") == 0 ?
                (lucene::analysis::Analyzer*)new lucene::analysis::LanguageBasedAnalyzer(L"cjk") :
                (lucene::analysis::Analyzer*)new lucene::analysis::standard::StandardAnalyzer());
 #else
@@ -36,11 +39,13 @@ bool HelpIndexer::indexDocuments() {
                (lucene::analysis::Analyzer*)new lucene::analysis::standard::StandardAnalyzer());
 #endif
 
-       lucene::index::IndexWriter writer(d_indexDir.c_str(), analyzer, true);
+       rtl::OString indexDirStr;
+       d_indexDir.convertToString(&indexDirStr, RTL_TEXTENCODING_ASCII_US, 0);
+       lucene::index::IndexWriter writer(indexDirStr.getStr(), analyzer, true);
 
        // Index the identified help files
        Document doc;
-       for (std::set<std::string>::iterator i = d_files.begin(); i != d_files.end(); ++i) {
+       for (std::set<rtl::OUString>::iterator i = d_files.begin(); i != d_files.end(); ++i) {
                doc.clear();
                if (!helpDocument(*i, &doc)) {
                        delete analyzer;
@@ -56,7 +61,7 @@ bool HelpIndexer::indexDocuments() {
        return true;
 }
 
-std::string const & HelpIndexer::getErrorMessage() {
+rtl::OUString const & HelpIndexer::getErrorMessage() {
        return d_error;
 }
 
@@ -70,18 +75,23 @@ bool HelpIndexer::scanForFiles() {
        return true;
 }
 
-bool HelpIndexer::scanForFiles(std::string const & path) {
-       DIR *dir = opendir(path.c_str());
+bool HelpIndexer::scanForFiles(rtl::OUString const & path) {
+       rtl::OString pathStr;
+       path.convertToString(&pathStr, RTL_TEXTENCODING_ASCII_US, 0);
+       DIR *dir = opendir(pathStr.getStr());
        if (dir == 0) {
-               d_error = "Error reading directory " + path + strerror(errno);
+               d_error = rtl::OUString(RTL_CONSTASCII_USTRINGPARAM("Error reading directory ")) + path +
+                        rtl::OUString::createFromAscii(strerror(errno));
                return true;
        }
 
        struct dirent *ent;
        struct stat info;
        while ((ent = readdir(dir)) != 0) {
-               if (stat((path + "/" + ent->d_name).c_str(), &info) == 0 && S_ISREG(info.st_mode)) {
-                       d_files.insert(ent->d_name);
+               rtl::OString entPath(pathStr);
+               entPath += rtl::OString(RTL_CONSTASCII_STRINGPARAM("/")) + rtl::OString(ent->d_name);
+               if (stat(entPath.getStr(), &info) == 0 && S_ISREG(info.st_mode)) {
+                       d_files.insert(rtl::OUString::createFromAscii(ent->d_name));
                }
        }
 
@@ -90,34 +100,31 @@ bool HelpIndexer::scanForFiles(std::string const & path) {
        return true;
 }
 
-bool HelpIndexer::helpDocument(std::string const & fileName, Document *doc) {
+bool HelpIndexer::helpDocument(rtl::OUString const & fileName, Document *doc) {
        // Add the help path as an indexed, untokenized field.
-       std::wstring path(L"#HLP#" + string2wstring(d_module) + L"/" + string2wstring(fileName));
-       doc->add(*new Field(_T("path"), path.c_str(), Field::STORE_YES | Field::INDEX_UNTOKENIZED));
+       rtl::OUString path = rtl::OUString(RTL_CONSTASCII_USTRINGPARAM("#HLP#")) + d_module + rtl::OUString(RTL_CONSTASCII_USTRINGPARAM("/")) + fileName;
+       // FIXME: the (TCHAR*) cast is a problem, because TCHAR does not match sal_Unicode
+       doc->add(*new Field(_T("path"), (TCHAR*)path.getStr(), Field::STORE_YES | Field::INDEX_UNTOKENIZED));
 
        // Add the caption as a field.
-       std::string captionPath = d_captionDir + "/" + fileName;
+       rtl::OUString captionPath = d_captionDir + rtl::OUString(RTL_CONSTASCII_USTRINGPARAM("/")) + fileName;
        doc->add(*new Field(_T("caption"), helpFileReader(captionPath), Field::STORE_NO | Field::INDEX_TOKENIZED));
        // FIXME: does the Document take responsibility for the FileReader or should I free it somewhere?
 
        // Add the content as a field.
-       std::string contentPath = d_contentDir + "/" + fileName;
+       rtl::OUString contentPath = d_contentDir + rtl::OUString(RTL_CONSTASCII_USTRINGPARAM("/")) + fileName;
        doc->add(*new Field(_T("content"), helpFileReader(contentPath), Field::STORE_NO | Field::INDEX_TOKENIZED));
        // FIXME: does the Document take responsibility for the FileReader or should I free it somewhere?
 
        return true;
 }
 
-lucene::util::Reader *HelpIndexer::helpFileReader(std::string const & path) {
-       if (access(path.c_str(), R_OK) == 0) {
-               return new lucene::util::FileReader(path.c_str(), "UTF-8");
+lucene::util::Reader *HelpIndexer::helpFileReader(rtl::OUString const & path) {
+       rtl::OString pathStr;
+       path.convertToString(&pathStr, RTL_TEXTENCODING_ASCII_US, 0);
+       if (access(pathStr.getStr(), R_OK) == 0) {
+               return new lucene::util::FileReader(pathStr.getStr(), "UTF-8");
        } else {
                return new lucene::util::StringReader(L"");
        }
 }
-
-std::wstring HelpIndexer::string2wstring(std::string const &source) {
-       std::wstring target(source.length(), L' ');
-       std::copy(source.begin(), source.end(), target.begin());
-       return target;
-}
diff --git a/l10ntools/source/help/HelpIndexer.hxx b/l10ntools/source/help/HelpIndexer.hxx
index 56122e7..833e5e7 100644
--- a/l10ntools/source/help/HelpIndexer.hxx
+++ b/l10ntools/source/help/HelpIndexer.hxx
@@ -4,20 +4,20 @@
 #include <CLucene/StdHeader.h>
 #include <CLucene.h>
 
-#include <string>
+#include <rtl/ustring.hxx>
 #include <set>
 
 // I assume that TCHAR is defined as wchar_t throughout
 
 class HelpIndexer {
        private:
-               std::string d_lang;
-               std::string d_module;
-               std::string d_captionDir;
-               std::string d_contentDir;
-               std::string d_indexDir;
-               std::string d_error;
-               std::set<std::string> d_files;
+               rtl::OUString d_lang;
+               rtl::OUString d_module;
+               rtl::OUString d_captionDir;
+               rtl::OUString d_contentDir;
+               rtl::OUString d_indexDir;
+               rtl::OUString d_error;
+               std::set<rtl::OUString> d_files;
 
        public:
 
@@ -28,9 +28,9 @@ class HelpIndexer {
         * @param contentDir The directory to scan for content files.
         * @param indexDir The directory to write the index to.
         */
-       HelpIndexer(std::string const &lang, std::string const &module,
-               std::string const &captionDir, std::string const &contentDir,
-               std::string const &indexDir);
+       HelpIndexer(rtl::OUString const &lang, rtl::OUString const &module,
+               rtl::OUString const &captionDir, rtl::OUString const &contentDir,
+               rtl::OUString const &indexDir);
 
        /**
         * Run the indexer.
@@ -41,7 +41,7 @@ class HelpIndexer {
        /**
         * Get the error string (empty if no error occurred).
         */
-       std::string const & getErrorMessage();
+       rtl::OUString const & getErrorMessage();
 
        private:
 
@@ -53,19 +53,17 @@ class HelpIndexer {
        /**
         * Scan for files in the given directory.
         */
-       bool scanForFiles(std::string const &path);
+       bool scanForFiles(rtl::OUString const &path);
 
        /**
         * Fill the Document with information on the given help file.
         */
-       bool helpDocument(std::string const & fileName, lucene::document::Document *doc);
+       bool helpDocument(rtl::OUString const & fileName, lucene::document::Document *doc);
 
        /**
         * Create a reader for the given file, and create an "empty" reader in case the file doesn't exist.
         */
-       lucene::util::Reader *helpFileReader(std::string const & path);
-
-       std::wstring string2wstring(std::string const &source);
+       lucene::util::Reader *helpFileReader(rtl::OUString const & path);
 };
 
 #endif
diff --git a/l10ntools/source/help/HelpIndexer_main.cxx b/l10ntools/source/help/HelpIndexer_main.cxx
index a1dd50b..3d69630 100644
--- a/l10ntools/source/help/HelpIndexer_main.cxx
+++ b/l10ntools/source/help/HelpIndexer_main.cxx
@@ -57,9 +57,14 @@ int main(int argc, char **argv) {
        std::string captionDir(srcDir + "/caption");
        std::string contentDir(srcDir + "/content");
        std::string indexDir(outDir + "/" + module + ".idxl");
-       HelpIndexer indexer(lang, module, captionDir, contentDir, indexDir);
+       HelpIndexer indexer(
+               rtl::OUString::createFromAscii(lang.c_str()),
+               rtl::OUString::createFromAscii(module.c_str()),
+               rtl::OUString::createFromAscii(captionDir.c_str()),
+               rtl::OUString::createFromAscii(contentDir.c_str()),
+               rtl::OUString::createFromAscii(indexDir.c_str()));
        if (!indexer.indexDocuments()) {
-               std::cerr << indexer.getErrorMessage() << std::endl;
+               std::cerr << rtl::OUStringToOString(indexer.getErrorMessage(), RTL_TEXTENCODING_UTF8).getStr() << std::endl;
                return 2;
        }
        return 0;
diff --git a/xmlhelp/source/cxxhelp/provider/databases.cxx b/xmlhelp/source/cxxhelp/provider/databases.cxx
index 4a4a756..14fe6b5 100644
--- a/xmlhelp/source/cxxhelp/provider/databases.cxx
+++ b/xmlhelp/source/cxxhelp/provider/databases.cxx
@@ -39,6 +39,12 @@
 #include <algorithm>
 #include <string.h>
 
+// EDIT FROM HERE
+
+#include <HelpIndexer.hxx>
+
+// EDIT ENDS HERE
+
 // Extensible help
 #include "com/sun/star/deployment/ExtensionManager.hpp"
 #include "com/sun/star/deployment/thePackageManagerFactory.hpp"
@@ -2113,78 +2119,60 @@ rtl::OUString IndexFolderIterator::implGetIndexFolderFromPackage( bool& o_rbTemp
             // TEST
             //bIsWriteAccess = false;
 
-            Reference< script::XInvocation > xInvocation;
-            Reference< XMultiComponentFactory >xSMgr( m_xContext->getServiceManager(), UNO_QUERY );
+// EDIT FROM HERE
             try
             {
-                xInvocation = Reference< script::XInvocation >(
-                    m_xContext->getServiceManager()->createInstanceWithContext( rtl::OUString(RTL_CONSTASCII_USTRINGPARAM(
-                    "com.sun.star.help.HelpIndexer" )), m_xContext ) , UNO_QUERY );
-
-                if( xInvocation.is() )
-                {
-                    Sequence<uno::Any> aParamsSeq( bIsWriteAccess ? 6 : 8 );
-
-                    aParamsSeq[0] = uno::makeAny( rtl::OUString(RTL_CONSTASCII_USTRINGPARAM( "-lang" )) );
-
-                    rtl::OUString aLang;
-                    sal_Int32 nLastSlash = aLangURL.lastIndexOf( '/' );
-                    if( nLastSlash != -1 )
-                        aLang = aLangURL.copy( nLastSlash + 1 );
-                    else
-                        aLang = rtl::OUString(RTL_CONSTASCII_USTRINGPARAM( "en" ));
-                    aParamsSeq[1] = uno::makeAny( aLang );
+                rtl::OUString aLang;
+                sal_Int32 nLastSlash = aLangURL.lastIndexOf( '/' );
+                if( nLastSlash != -1 )
+                    aLang = aLangURL.copy( nLastSlash + 1 );
+                else
+                    aLang = rtl::OUString(RTL_CONSTASCII_USTRINGPARAM( "en" ));
 
-                    aParamsSeq[2] = uno::makeAny( rtl::OUString(RTL_CONSTASCII_USTRINGPARAM( "-mod" )) );
-                    aParamsSeq[3] = uno::makeAny( rtl::OUString(RTL_CONSTASCII_USTRINGPARAM( "help" )) );
+               rtl::OUString aMod(RTL_CONSTASCII_USTRINGPARAM("help"));
 
-                    rtl::OUString aZipDir = aLangURL;
-                    if( !bIsWriteAccess )
+                rtl::OUString aZipDir = aLangURL;
+                if( !bIsWriteAccess )
+                {
+                    rtl::OUString aTempFileURL;
+                    ::osl::FileBase::RC eErr = ::osl::File::createTempFile( 0, 0, &aTempFileURL );
+                    if( eErr == ::osl::FileBase::E_None )
                     {
-                        rtl::OUString aTempFileURL;
-                        ::osl::FileBase::RC eErr = ::osl::File::createTempFile( 0, 0, &aTempFileURL );
-                        if( eErr == ::osl::FileBase::E_None )
+                        rtl::OUString aTempDirURL = aTempFileURL;
+                        try
                         {
-                            rtl::OUString aTempDirURL = aTempFileURL;
-                            try
-                            {
-                                m_xSFA->kill( aTempDirURL );
-                            }
-                            catch (Exception &)
-                            {}
-                            m_xSFA->createFolder( aTempDirURL );
-
-                            aZipDir = aTempDirURL;
-                            o_rbTemporary = true;
+                            m_xSFA->kill( aTempDirURL );
                         }
+                        catch (Exception &)
+                        {}
+                        m_xSFA->createFolder( aTempDirURL );
+
+                        aZipDir = aTempDirURL;
+                        o_rbTemporary = true;
                     }
+                }
 
-                    aParamsSeq[4] = uno::makeAny( rtl::OUString(RTL_CONSTASCII_USTRINGPARAM( "-zipdir" )) );
-                    rtl::OUString aSystemPath;
-                    osl::FileBase::getSystemPathFromFileURL( aZipDir, aSystemPath );
-                    aParamsSeq[5] = uno::makeAny( aSystemPath );
+                rtl::OUString aTargetDir;
+                osl::FileBase::getSystemPathFromFileURL( aZipDir, aTargetDir );
 
-                    if( !bIsWriteAccess )
-                    {
-                        aParamsSeq[6] = uno::makeAny( rtl::OUString(RTL_CONSTASCII_USTRINGPARAM( "-srcdir" )) );
-                        rtl::OUString aSrcDirVal;
-                        osl::FileBase::getSystemPathFromFileURL( aLangURL, aSrcDirVal );
-                        aParamsSeq[7] = uno::makeAny( aSrcDirVal );
-                    }
+                rtl::OUString aSourceDir;
+                osl::FileBase::getSystemPathFromFileURL( aLangURL, aSourceDir );
 
-                    Sequence< sal_Int16 > aOutParamIndex;
-                    Sequence< uno::Any > aOutParam;
-                    uno::Any aRet = xInvocation->invoke( rtl::OUString(RTL_CONSTASCII_USTRINGPARAM( "createIndex" )),
-                        aParamsSeq, aOutParamIndex, aOutParam );
+               rtl::OUString aCaption(RTL_CONSTASCII_USTRINGPARAM("/caption"));
+               rtl::OUString aContent(RTL_CONSTASCII_USTRINGPARAM("/content"));
 
-                    if( bIsWriteAccess )
-                        aIndexFolder = implGetFileFromPackage( rtl::OUString(RTL_CONSTASCII_USTRINGPARAM( ".idxl" )), xPackage );
-                    else
-                        aIndexFolder = aZipDir + rtl::OUString(RTL_CONSTASCII_USTRINGPARAM( "/help.idxl" ));
-                }
+               HelpIndexer aIndexer(aLang, aMod, aSourceDir + aCaption, aSourceDir + aContent, aTargetDir);
+
+                if( bIsWriteAccess )
+                    aIndexFolder = implGetFileFromPackage( rtl::OUString(RTL_CONSTASCII_USTRINGPARAM( ".idxl" )), xPackage );
+                else
+                    aIndexFolder = aZipDir + rtl::OUString(RTL_CONSTASCII_USTRINGPARAM( "/help.idxl" ));
             }
             catch (Exception &)
             {}
+
+// EDIT UNTIL HERE
+
         }
     }
 
diff --git a/xmlhelp/source/cxxhelp/provider/makefile.mk b/xmlhelp/source/cxxhelp/provider/makefile.mk
index b709797..05f4ead 100644
--- a/xmlhelp/source/cxxhelp/provider/makefile.mk
+++ b/xmlhelp/source/cxxhelp/provider/makefile.mk
@@ -67,6 +67,11 @@ LIBXSLTINCDIR=external$/libxslt
 CFLAGS+= -I$(SOLARINCDIR)$/$(LIBXSLTINCDIR)
 .ENDIF
 
+CFLAGS+= -I$(SRC_ROOT)$/l10ntools$/source$/help
+
+PKGCONFIG_MODULES=libclucene-core libclucene-contribs-lib
+.INCLUDE : pkg_config.mk
+
 .IF "$(GUI)"=="WNT"
 .IF "$(COM)"=="MSC"
 CFLAGS+=-GR
-- 
1.7.0.4

From 2f2ab3b5fca95c5ae39939b361d291cfe0a6cbb4 Mon Sep 17 00:00:00 2001
From: Gert van Valkenhoef <g.h.m.van.valkenhoef@rug.nl>
Date: Tue, 14 Feb 2012 19:31:41 +0100
Subject: [PATCH] Use C++ HelpIndexer

---
 helpcontent2/settings.pmk    |   12 ------------
 helpcontent2/util/target.pmk |   21 +++------------------
 2 files changed, 3 insertions(+), 30 deletions(-)

diff --git a/helpcontent2/settings.pmk b/helpcontent2/settings.pmk
index 185438e..3716281 100755
--- a/helpcontent2/settings.pmk
+++ b/helpcontent2/settings.pmk
@@ -1,17 +1,5 @@
 .INCLUDE : $(LOCAL_COMMON_OUT)/inc$/aux_langs.mk
 .INCLUDE : $(LOCAL_COMMON_OUT)/inc$/help_exist.mk
 
-my_cp:=$(CLASSPATH)$(PATH_SEPERATOR)$(SOLARBINDIR)$/jaxp.jar$(PATH_SEPERATOR)$(SOLARBINDIR)$/juh.jar$(PATH_SEPERATOR)$(SOLARBINDIR)$/parser.jar$(PATH_SEPERATOR)$(SOLARBINDIR)$/xt.jar$(PATH_SEPERATOR)$(SOLARBINDIR)$/unoil.jar$(PATH_SEPERATOR)$(SOLARBINDIR)$/ridl.jar$(PATH_SEPERATOR)$(SOLARBINDIR)$/jurt.jar$(PATH_SEPERATOR)$(SOLARBINDIR)$/xmlsearch.jar$(PATH_SEPERATOR)$(SOLARBINDIR)$/LuceneHelpWrapper.jar$(PATH_SEPERATOR)$(SOLARBINDIR)$/HelpIndexerTool.jar$
-
-.IF "$(SYSTEM_LUCENE)" == "YES"
-my_cp!:=$(my_cp)$(PATH_SEPERATOR)$(LUCENE_CORE_JAR)$(PATH_SEPERATOR)$(LUCENE_ANALYZERS_JAR)
-.ELSE
-my_cp!:=$(my_cp)$(PATH_SEPERATOR)$(SOLARBINDIR)/lucene-core-2.3.jar$(PATH_SEPERATOR)$(SOLARBINDIR)/lucene-analyzers-2.3.jar
-.ENDIF
- 
-.IF "$(SYSTEM_DB)" != "YES"
-JAVA_LIBRARY_PATH= -Djava.library.path=$(SOLARSHAREDBIN)
-.ENDIF 
-
 aux_alllangiso_all:=$(foreach,i,$(alllangiso) $(foreach,j,$(aux_langdirs) $(eq,$i,$j  $i $(NULL))))
 aux_alllangiso:=$(foreach,i,$(aux_alllangiso_all) $(foreach,j,$(help_exist) $(eq,$i,$j  $i $(NULL))))
diff --git a/helpcontent2/util/target.pmk b/helpcontent2/util/target.pmk
index 40f6e5d..7dd7e5b 100755
--- a/helpcontent2/util/target.pmk
+++ b/helpcontent2/util/target.pmk
@@ -30,25 +30,10 @@ LINKALLADDEDDEPS=$(foreach,i,$(aux_alllangiso) $(subst,LANGUAGE,$i $(LINKADDEDDP
 
 ALLTAR : $(LINKALLTARGETS)
 
-.IF "$(SYSTEM_DB)" != "YES"
-JAVA_LIBRARY_PATH= -Djava.library.path=$(SOLARSHAREDBIN)
-.ENDIF
-
 XSL_DIR*:=$(SOLARBINDIR)
 
 $(LINKALLTARGETS) : $(foreach,i,$(LINKLINKFILES) $(COMMONMISC)$/$$(@:b:s/_/./:e:s/.//)/$i) $(subst,LANGUAGE,$$(@:b:s/_/./:e:s/.//) $(LINKADDEDDEPS)) $(COMMONMISC)$/xhp_changed.flag
     $(HELPLINKER) @$(mktmp -mod $(LINKNAME) -src $(COMMONMISC) -sty $(XSL_DIR)/embed.xsl -zipdir $(MISC)$/ziptmp$(@:b) -idxcaption $(XSL_DIR)/idxcaption.xsl -idxcontent $(XSL_DIR)/idxcontent.xsl -lang {$(subst,$(LINKNAME)_, $(@:b))} $(subst,LANGUAGE,{$(subst,$(LINKNAME)_, $(@:b))} $(LINKADDEDFILES)) $(foreach,i,$(LINKLINKFILES) $(COMMONMISC)$/{$(subst,$(LINKNAME)_, $(@:b))}/$i) -o $@.$(INPATH))
-.IF "$(SOLAR_JAVA)" == "TRUE"
-.IF "$(CHECK_LUCENCE_INDEXER_OUTPUT)" == ""
-    $(JAVAI) $(JAVAIFLAGS) $(JAVA_LIBRARY_PATH) -cp "$(my_cp)" com.sun.star.help.HelpIndexerTool -lang $(@:b:s/_/./:e:s/.//) -mod $(LINKNAME) -zipdir $(MISC)$/ziptmp$(@:b) -o $@.$(INPATH)
-.ELSE
-    $(JAVAI) $(JAVAIFLAGS) $(JAVA_LIBRARY_PATH) -cp "$(my_cp)" com.sun.star.help.HelpIndexerTool -lang $(@:b:s/_/./:e:s/.//) -mod $(LINKNAME) -zipdir $(MISC)$/ziptmp$(@:b) -o $@.$(INPATH) -checkcfsandsegname _0 _3
-.ENDIF
-   $(RENAME) $@.$(INPATH) $@
-.ELSE
-    -$(RM) $(MISC)$/ziptmp$(@:b)$/content/*.*
-    -$(RM) $(MISC)$/ziptmp$(@:b)$/caption/*.*
-    zip -j -D $@.$(INPATH) $(MISC)$/ziptmp$(@:b)$/*
-    $(RENAME) $@.$(INPATH) $@
-    -$(RM) $(MISC)$/ziptmp$(@:b)$/*.*
-.ENDIF
+    $(HELPINDEXER) -lang $(@:b:s/_/./:e:s/.//) -mod $(LINKNAME) -srcdir $(MISC)$/ziptmp$(@:b) -zipdir $(MISC)$/ziptmp$(@:b)
+    cd $(MISC)$/ziptmp$(@:b) && zip -rX --filesync zipfile.zip $(LINKNAME).*
+    $(RENAME) $(MISC)$/ziptmp$(@:b)$/zipfile.zip $@
-- 
1.7.0.4
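
A note on the (TCHAR*) cast flagged with a FIXME in HelpIndexer.cxx: one possible way around it is to copy the UTF-16 code units into a std::wstring instead of casting the pointer. The following is only a sketch, not part of the patches above. The helper name utf16ToWide is hypothetical, and it assumes that the OUString buffer holds 16-bit UTF-16 code units and that TCHAR is wchar_t (as the comment in HelpIndexer.hxx also assumes). Where wchar_t is wider than 16 bits, surrogate pairs are combined into a single code point; where it is 16 bits, the units are copied unchanged.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not in the patch): convert UTF-16 code units to a
// std::wstring, independent of whether wchar_t is 16 or 32 bits wide.
std::wstring utf16ToWide(const std::uint16_t *src, std::size_t len) {
    std::wstring out;
    out.reserve(len);
    for (std::size_t i = 0; i < len; ++i) {
        std::uint32_t c = src[i];
        // With a wide wchar_t, merge a high/low surrogate pair into one
        // code point; with a 16-bit wchar_t, keep the pair as it is.
        if (sizeof(wchar_t) > 2 && c >= 0xD800 && c <= 0xDBFF && i + 1 < len) {
            std::uint32_t low = src[i + 1];
            if (low >= 0xDC00 && low <= 0xDFFF) {
                c = 0x10000 + ((c - 0xD800) << 10) + (low - 0xDC00);
                ++i;
            }
        }
        out.push_back(static_cast<wchar_t>(c));
    }
    return out;
}
```

In helpDocument() this would presumably be called with the buffer and length of the OUString (getStr() and getLength(), with a pointer cast if sal_Unicode is a distinct 16-bit type), and the c_str() of the result passed to the Field constructor. Whether CLucene copies the string or keeps the pointer would need checking before relying on a temporary.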

