--- Begin Message ---
Hi Sam, *,
Please also forward this to the Apache list, where I'm not subscribed
(I suggest only Sam does so, to prevent 50 people forwarding the
very same mail :-D)
On Sun, Jun 5, 2011 at 1:32 AM, Sam Ruby <rubys@apache.org> wrote:
> On Sat, Jun 4, 2011 at 7:03 PM, Christian Lohmaier
> <lohmaier+ooofuture@googlemail.com> wrote:
>> As far as I know, there is only the "intent" of Oracle to
>> donate it under the Apache License, but no clear statement has been
>> made as to what exact source code this will cover.
> The ASF has a signed software grant with a specific list of source files.
>> It's not even clear whether it will be the current codebase or some
>> older version IBM is basing their version on.
> It is the codebase on openoffice.org. The intent is to move the full
> version history. The mechanics of this have yet to be worked out.
Since a link to that "list of source files" has been provided on the
Apache list, and there have been claims that this list covers the
whole source, I had a deeper look myself.
First of all: it doesn't contain any history data/Mercurial database
files, so how that point is covered is not clear to me at all. But on
to my analysis of the Oracle-provided file list that was made available here:
http://people.apache.org/~rubys/openoffice.files.txt
1st observation: Some file paths are split. The lines are broken at
varying line lengths, not at "word boundaries" such as the dot before
the file extension or the slash that delimits directories, but in the
middle of the string; see http://libreoffice.pastebin.ca/2075460 for a
patch that fixes those.
2nd observation: The file is not sorted alphabetically (at least its
order differs from the output of sort, which is what the comm tool used
later expects), so sort it:
sort openoffice.files.txt > sorted_ooo.lst
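Background on why the sort step matters: comm expects both inputs to be sorted in the collating order of the current locale, and sort's order depends on the locale as well (LC_COLLATE), so a list sorted under one locale can look "unsorted" to comm run under another. A minimal sketch, assuming a POSIX shell, that pins everything to byte-wise C-locale order; the helper names is_sorted and csort are hypothetical:

```shell
# is_sorted FILE: exit status 0 if FILE is already in byte-wise (C locale)
# order, non-zero otherwise. sort -c only checks; it prints no output.
is_sorted() {
    LC_ALL=C sort -c "$1" 2>/dev/null
}

# csort FILE: sort FILE byte-wise, independent of the user's locale, so
# that a later "LC_ALL=C comm ..." agrees with it about the ordering.
csort() {
    LC_ALL=C sort "$1"
}
```

For example, is_sorted openoffice.files.txt || csort openoffice.files.txt > sorted_ooo.lst; the later comm calls would then also be run with LC_ALL=C.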
In order to do the comparison, clone the current repo
hg clone http://hg.services.openoffice.org/DEV300/
and create a file list, excluding the repository's metadata in .hg/:
find DEV300/ -type f -not -path 'DEV300/.hg/*' | cut -c 8- | sort > repo.lst
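The cut -c 8- step works because "DEV300/" is exactly seven characters long. A sketch that avoids the hard-coded offset by running find from inside the checkout, assuming a POSIX find (which spells -not as !); list_files is a hypothetical helper name. Running hg manifest inside the clone would be an alternative that lists only the files tracked by Mercurial.

```shell
# list_files DIR: print all regular files below DIR, relative to DIR,
# skipping Mercurial's .hg metadata directory, sorted byte-wise.
list_files() {
    (
        cd "$1" || exit 1
        find . -type f ! -path './.hg/*' | sed 's|^\./||' | LC_ALL=C sort
    )
}
```

Usage: list_files DEV300 > repo.lst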
raw numbers:
wc -l repo.lst sorted_ooo.lst
69076 repo.lst
39616 sorted_ooo.lst
So calling this "seems to include the full repo", and even twice, shows
either malicious intent or no clue at all. Christian Lippka really
should know better, but he has stated this at least twice.
Close to 30000 files gone, but who cares, "source seems complete"...
Now to the interesting numbers:
Files in Oracle's list but not in the repo list (= files most likely
moved while refactoring the code (gbuildification of modules and
similar) = an indication of when the snapshot was taken):
$ comm -1 -3 repo.lst sorted_ooo.lst | wc -l
455
Digging in hg's history shows that the snapshot of the sources must
have been taken before 2011-03-21, as those files were [re]moved in the
following CWSs:
276288 2011-03-21 CWS-TOOLING: integrate CWS dr78
276552 2011-03-29 CWS-TOOLING: integrate CWS ka102
276583 2011-03-29 CWS-TOOLING: integrate CWS vcl2gnumake
276711 2011-04-01 CWS-TOOLING: integrate CWS solaris11
276673 2011-04-01 CWS-TOOLING: integrate CWS calcvba
276692 2011-04-01 CWS-TOOLING: integrate CWS mav60
So one can clearly say that those changes are not part of the sources,
and hence the code is at most in the state of m103 (which of course
doesn't exclude that the codebase is even older). The changes of at
least 27 CWSs (+3 masterfix ones) that have been integrated into the
OOo code in the meantime are definitely missing.
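For reference, comm reads two sorted files and prints three columns: lines only in the first file, lines only in the second, and lines common to both; -1, -2 and -3 suppress the respective column. Hence comm -1 -3 repo.lst sorted_ooo.lst keeps only the paths unique to Oracle's list, and comm -2 -3 only those unique to the repo. A minimal sketch of a helper (name hypothetical) that reports all three counts in one go, assuming both inputs were sorted in the C locale:

```shell
# comm_counts A B: print "<only in A> <only in B> <in both>" for two
# files that are already sorted in the C locale.
comm_counts() {
    # tr strips the padding some wc implementations add
    only_a=$(LC_ALL=C comm -2 -3 "$1" "$2" | wc -l | tr -d ' ')
    only_b=$(LC_ALL=C comm -1 -3 "$1" "$2" | wc -l | tr -d ' ')
    both=$(LC_ALL=C comm -1 -2 "$1" "$2" | wc -l | tr -d ' ')
    echo "$only_a $only_b $both"
}
```

For example: comm_counts repo.lst sorted_ooo.lst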
Files in repo, but not in Oracle's list:
$ comm -2 -3 repo.lst sorted_ooo.lst |wc -l
29915
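The two comm counts can be cross-checked against the raw wc -l totals: each list's total minus its unique entries must give the same number of paths common to both lists. Plugging in the figures quoted in this mail:

```shell
# Figures quoted in this mail
repo_total=69076     # wc -l repo.lst
oracle_total=39616   # wc -l sorted_ooo.lst
only_repo=29915      # comm -2 -3 repo.lst sorted_ooo.lst | wc -l
only_oracle=455      # comm -1 -3 repo.lst sorted_ooo.lst | wc -l

# Paths present in both lists, computed from either side
echo "common via repo.lst:       $((repo_total - only_repo))"
echo "common via sorted_ooo.lst: $((oracle_total - only_oracle))"
```

Both lines print 39161, so the numbers are consistent with each other.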
sdf files = translation files: these are not included in either list.
The few sdf files that are in the repo are for test cases/gsicheck, as
the translations have been split off into a separate repository
http://hg.services.openoffice.org/master_l10n/DEV300/
So those don't even factor into the difference!
$ grep -c sdf$ repo.lst sorted_ooo.lst
repo.lst:10
sorted_ooo.lst:0
Image files = binary files
egrep -c '(bmp|png|gif|jpe?g)$' repo.lst sorted_ooo.lst
repo.lst:12352
sorted_ooo.lst:0
So this is one big chunk, all toolbar icons for the different themes,
cursors, artwork for the installers, etc.
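Note that the patterns sdf$ and (bmp|png|gif|jpe?g)$ are not anchored at the extension's dot and are case-sensitive, so they would also count a name like foo.xbmp and miss one like LOGO.PNG. A stricter sketch (helper name hypothetical; its counts may differ slightly from the ones quoted above):

```shell
# count_ext REGEX FILE...: count the lines of each FILE whose extension
# (the part after a dot at the end of the name) matches the extended
# regex REGEX, ignoring case. With several files grep prints "file:count".
count_ext() {
    ext=$1
    shift
    grep -E -i -c "\\.($ext)\$" "$@"
}
```

Usage: count_ext sdf repo.lst sorted_ooo.lst, or count_ext 'bmp|png|gif|jpe?g' repo.lst sorted_ooo.lst.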
But what are the remaining 17563 files? A bit of shell-fu gives a hint:
$ comm -2 -3 repo.lst sorted_ooo.lst | egrep -v '(bmp|png|gif|jpe?g)$' |
  sed -n -e 's/.*\.\([^./]*\)$/\1/p' | sort | uniq -c | sort -rn | head
1716 ott
1329 xml
1140 xlb
813 xcu
749 cfg
710 csv
588 txt
555 h
472 css
459 java
OK, the user won't get any templates (.ott) either, too bad, but the
next ones are interesting: no configuration schemes, no configuration
data either.
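The sed -n 's/.*\.\([^./]*\)$/\1/p' step prints only the text after the last dot of the final path component and silently drops names without an extension (so makefile.mk counts as mk, while a plain Makefile is not counted at all). Wrapped into a reusable helper (name hypothetical) that reads a path list on stdin:

```shell
# ext_histogram: read paths on stdin, print "<count> <extension>" lines,
# most frequent first. Paths without an extension produce no line.
ext_histogram() {
    sed -n -e 's/.*\.\([^./]*\)$/\1/p' |
        sort | uniq -c | sort -rn |
        awk '{ print $1, $2 }'   # strip the padding uniq -c adds
}
```

Usage: comm -2 -3 repo.lst sorted_ooo.lst | egrep -v '(bmp|png|gif|jpe?g)$' | ext_histogram | head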
Let's have a closer look:
$ comm -2 -3 repo.lst sorted_ooo.lst | grep xcu$ | awk -F/ '{print $1}' | sort | uniq -c
32 dictionaries
4 extensions
716 filter
3 lingucomponent
2 mysqlc
21 odk
16 officecfg
1 pyuno
3 scripting
7 sdext
5 sfx2
3 testautomation
Want to load documents? Too bad, Apache won't know about the filters.
Want to save? Hah, that's a good one, Apache-OOo doesn't know about
export filters either.
Spellchecking? Ha, dream on... (but that one is understandable, as
dictionaries are mostly third-party stuff, so it's excused)
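The awk -F/ '{print $1}' step reduces each path to its top-level directory, i.e. the OOo module. A sketch generalising the xcu breakdown to any extension (function name hypothetical; the argument is used as an extended regex after a literal dot):

```shell
# by_module EXT: read paths on stdin and print "<count> <module>" for
# every top-level directory containing files ending in .EXT, largest first.
by_module() {
    grep -E "\\.$1\$" |
        awk -F/ '{ print $1 }' |
        sort | uniq -c | sort -rn |
        awk '{ print $1, $2 }'   # strip the padding uniq -c adds
}
```

Usage: comm -2 -3 repo.lst sorted_ooo.lst | by_module xcu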
Never mind the other binary files (various OOo documents, also some
MS Office documents, the palettes, icon/wav files for the gallery);
the interesting ones include:
Tons of XML files:
comm -2 -3 repo.lst sorted_ooo.lst | grep xml$ | awk -F/ '{print $1}' |
  sort | uniq -c | sort -nr | head
235 sw
201 i18npool
154 sc
129 sd
112 testautomation
64 dictionaries
51 toolkit
45 desktop
34 scripting
29 svx
I didn't look into that more closely, but
$ comm -2 -3 repo.lst sorted_ooo.lst | grep xml$ | grep toolbar |wc -l
392
So you want to use toolbar buttons? Too bad, the corresponding
definitions are not included, so you won't get most (if any) toolbars.
Good luck defining your own from scratch.
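grep toolbar matches the word anywhere in the path, including a file name such as a hypothetical sw/toolbarhelp.xml. Assuming the definitions live in directories named toolbar (as in OOo's */uiconfig/*/toolbar/ layout), a stricter sketch (helper name hypothetical) counts only XML files directly inside such a directory:

```shell
# count_toolbar_xml: read paths on stdin and count the .xml files that sit
# directly inside a directory named "toolbar" (assumed layout).
count_toolbar_xml() {
    grep -c '/toolbar/[^/]*\.xml$'
}
```

Usage: comm -2 -3 repo.lst sorted_ooo.lst | count_toolbar_xml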
But never mind all that boring "non-code" stuff.
134 patches (for the external modules) are missing (OK, that one is
arguable, as the external modules won't be part of Apache-OOo in the
long run anyway).
You want to actually build this thing? Well, too bad: the build.lst
files that define the inter-module and directory dependencies, and the
d.lst files that list each module's files to be exported for use by
other modules, are not included either:
$ grep -c d.lst repo.lst sorted_ooo.lst
repo.lst:425
sorted_ooo.lst:0
Similarly, 302 *.mk files are only in the repo, among them the
solenv/inc/_tg_*.mk ones: the templates that define the very basic
target rules used throughout the build (and that are expanded by
mkunroll to produce the makefiles that are then included by the actual
build).
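Because grep's . matches any character and the pattern d.lst is not anchored, it also matches build.lst, so the 425 presumably covers both kinds of file. A sketch (helper name hypothetical) that counts d.lst, build.lst and *.mk files separately by looking only at the last path component:

```shell
# build_file_counts: read paths on stdin and print how many of them are
# d.lst, build.lst and *.mk files respectively.
build_file_counts() {
    awk -F/ '
        $NF == "d.lst"     { dlst++ }
        $NF == "build.lst" { buildlst++ }
        $NF ~ /\.mk$/      { mk++ }
        END { printf "d.lst=%d build.lst=%d mk=%d\n", dlst, buildlst, mk }
    '
}
```

Usage: comm -2 -3 repo.lst sorted_ooo.lst | build_file_counts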
So with this snapshot, Apache-OOo is far from being able to deliver
something that is even close to OOo as it is now. It is missing all
translations, all artwork, the build-dependency definitions that are
absolutely needed to do a build, the toolbar definitions and the
filter configurations.
Apart from the systematic omission of images, seemingly random source
files are missing as well, probably because they don't carry the default
copyright header, for example binfilter/inc/bf_svx/svxslots.hxx.
So calling this list "complete" or stating something along the lines
of "looks like a straight dump from hg" is a joke.
So Oracle definitely needs to revise that list, and include at least
the translations, the artwork, the configuration data/xml-files, the
randomly omitted files, etc. And while they're on it, they could base
their list on the current m106 milestone.
ciao
Christian
--
Unsubscribe instructions: E-mail to discuss+help@documentfoundation.org
Posting guidelines + more: http://wiki.documentfoundation.org/Netiquette
List archive: http://listarchives.documentfoundation.org/www/discuss/
All messages sent to this list will be publicly archived and cannot be deleted
--- End Message ---