Re: [libreoffice-website] downloading entire NA-CMS site [download errors found]

Christian Lohmaier <lohmaier+ooofuture -AT- googlemail.com>
Fri, 19 Aug 2011 22:29:44 +0200

Hi *,

On Wed, Aug 17, 2011 at 6:02 PM, webmaster for Kracked Press
Productions <webmaster@krackedpress.com> wrote:


> I re-found HTTrack web site copier. [Windows and Ubuntu versions]
>
> So I decided to download the entire CMS site version of the NA-DVD project.


No need to do it that way. The scripts on the server and the export of
the HTML pages already do this.

> This way, I can make sure that I have all of the linked files and with all
> the proper file/folder trees in tack.


Not sure what you mean by "in tack" - but that's what the framework does.

You create the export of the HTML, run the "gather everything that is
linked" script over those HTML pages (i.e. to see which files and
themes also need to be copied), and copy those alongside the HTML pages.
Burn it to an ISO and you're done.
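
As a rough illustration of the "gather everything that is linked" step (this is not the actual script on the server, whose code isn't shown here), a minimal Python sketch could look like the following. The names `LinkCollector` and `linked_files` are made up for this example, and it assumes the exported pages reference local files via plain `href`/`src` attributes.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect site-local href/src targets from one exported HTML page."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                parsed = urlparse(value)
                # Skip external URLs (they have a scheme or host) and
                # pure fragment links such as "#top" (empty path).
                if not parsed.scheme and not parsed.netloc and parsed.path:
                    self.links.add(parsed.path)


def linked_files(html_text):
    """Return the sorted local paths that a page links to."""
    collector = LinkCollector()
    collector.feed(html_text)
    return sorted(collector.links)
```

Running `linked_files` over every exported page and copying the union of the returned paths next to the HTML gives the tree that goes onto the ISO, without crawling the live site.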

There is no need to use a webcrawler for this.

If you're just interested in the theme, just download the German ISO;
it has all that. The CSS is the same after all, and you can play
around with the CSS as much as you like.

> Currently I am 2.5 hours into the download process, with 387 files and 845MB
> downloaded.  [about 0.10 MB per second download speed]


Hmm. That's rather bad, unless your connection doesn't allow for more.
The servers have a GB connection, and the static files are served by
Apache directly.
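
For reference, the quoted figures can be sanity-checked with a few lines of Python. This is only a back-of-the-envelope sketch; it assumes 1 GB = 1000 MB and that the rate stays constant, and the function names are invented for the example.

```python
def rate_mb_per_s(megabytes, hours):
    """Average transfer rate in MB/s over the given duration."""
    return megabytes / (hours * 3600)


def hours_needed(total_megabytes, rate):
    """Hours required to transfer total_megabytes at rate MB/s."""
    return total_megabytes / rate / 3600


# 845 MB in 2.5 hours, as reported in the quoted mail.
rate = rate_mb_per_s(845, 2.5)

# Estimated total time for the 3 to 4 GB site at that same rate.
low = hours_needed(3000, rate)
high = hours_needed(4000, rate)
```

Here `rate` comes out a little under the reported 0.10 MB/s, and at that pace the full 3 to 4 GB would take roughly 9 to 12 hours.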

> Once the entire site
> is downloaded [3 to 4GB] then I will not need to download any more large
> sessions, just update sessions.


But again I'm really asking myself:
Why the heck are you just unwilling to give the established process a try?
It was especially created to *NOT* have to crawl the site, to *NOT*
shuffle tons of GB around for this.

> Actually I have found some download errors with the openclipart folder
> files.  The exact path is not shown, since they shorten the displayed path.
> Right now the "fall2010" and "halloween2010" archives have errored out.


Sorry, but that's a useless description.

"The exact path is not shown" - are you serious? What crappy crawler
is that then?

http://dvd.north-america.libreofficebox.org/assets/artwork/clipart/openclipart/openclipart-fall2010-full.zip
as linked from http://dvd.north-america.libreofficebox.org/en/artwork/
does exist, so does
http://dvd.north-america.libreofficebox.org/assets/artwork/clipart/openclipart/openclipart-halloween2010-full.tar.gz

So if you meant to point out a problem with the SilverStripe-based
site, you have to be more specific.
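
One way to produce a more specific report is to check each suspect URL directly and note the HTTP status it returns. The following is a minimal sketch using only Python's standard library; `url_status` is a hypothetical helper name, and the `opener` parameter exists only so the function can be exercised without network access.

```python
import urllib.error
import urllib.request


def url_status(url, opener=urllib.request.urlopen, timeout=10):
    """Send a HEAD request and return the HTTP status code.

    Returns None if the server could not be reached at all.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except urllib.error.URLError:
        # DNS failure, refused connection, timeout and similar.
        return None
```

Calling `url_status` on the two archive URLs above and including the full URL plus the returned status in the report would make it clear whether the file is missing on the server or the crawler failed on its own side.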

ciao
Christian

