Bay 12 Games Forum

Author Topic: DF Wiki offline dump  (Read 3631 times)

Raltay

  • Escaped Lunatic
DF Wiki offline dump
« on: September 16, 2010, 02:40:45 pm »

Hello everybody,

I know that this topic has been brought up a few times already, but nonetheless I would like to ask you, the DF community, for help.

First, I tried exporting the wiki through the special "Export" page and using the result with WikiTaxi:

The first problem was that I couldn't find a way to export the whole wiki, only single pages or a category of pages. After a while, I discovered that the "Current" category gives me all articles related to the newest version. I downloaded the XML file and imported it with WikiTaxi:

a) At first it seemed to work fine... but only for single pages. The WikiTaxi search engine didn't seem to work at all, so the only way to reach a wanted page was to type in its exact name.
b) There were also no images (understandable), but none of the hyperlinks worked either. All of them were converted into "Template X" hyperlinks that led nowhere. Downloading the XML with "Include templates" checked didn't help, because the WikiTaxi importer always failed on such a file (SQLite Error 19).
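
For completeness, here is roughly how the same export could be scripted instead of clicking through the web form. This is only a sketch: the base URL and file names are placeholders of mine, the helper function is my own, it uses the third-party requests library, and it assumes the wiki exposes the standard MediaWiki api.php and Special:Export endpoints.

Code:
import time
import requests

# Placeholder names; assumes a stock MediaWiki install with api.php enabled
# and a "Current" category as described above.
WIKI_BASE = "http://example.org/w"            # placeholder base URL of the wiki
API = WIKI_BASE + "/api.php"
EXPORT = WIKI_BASE + "/index.php?title=Special:Export"

def category_members(category):
    """Yield the title of every page in the given category, following continuation."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": "Category:" + category,
        "cmlimit": "500",
        "format": "json",
    }
    while True:
        data = requests.get(API, params=params).json()
        for member in data["query"]["categorymembers"]:
            yield member["title"]
        if "continue" not in data:            # no more results
            break
        params.update(data["continue"])       # carry the continuation token

titles = list(category_members("Current"))

# Export in batches of 50 titles so no single request is heavy on the server.
for i in range(0, len(titles), 50):
    batch = titles[i:i + 50]
    resp = requests.post(EXPORT, data={
        "pages": "\n".join(batch),
        "templates": "1",                     # also export the templates the pages use
        "curonly": "1",                       # current revision only, no history
    })
    with open("df_wiki_current_%03d.xml" % (i // 50), "wb") as out:
        out.write(resp.content)
    time.sleep(2)                             # pause between requests to be polite

Each batch comes back as a separate XML file, so the pieces would still have to be imported one at a time or merged before WikiTaxi sees them.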

I didn't/don't want to use tools like httrack, because it takes REALLY LONG to download the site (and I was not able to specify +/- rules that would stop it from downloading articles about previous versions of DF; i*32a* and -*40d* didn't seem to do the trick), and it puts a high load on the server.

So, what to do now?

Any advice will be GREATLY appreciated.

Thank you in advance.
Logged

Überzwerg

  • Bay Watcher
Re: DF Wiki offline dump
« Reply #1 on: September 17, 2010, 10:55:14 am »

With Special:AllPages you can get a list of all articles in a namespace: http://meta.wikimedia.org/wiki/Help:Export
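
If you want that list programmatically, list=allpages in the MediaWiki API is the equivalent of Special:AllPages. A minimal sketch, with a placeholder api.php URL and assuming nothing about the DF wiki beyond a standard MediaWiki install:

Code:
import requests

API = "http://example.org/w/api.php"           # placeholder api.php location

def all_pages(namespace=0):
    """Yield every page title in the given namespace (0 = main article namespace)."""
    params = {
        "action": "query",
        "list": "allpages",
        "apnamespace": str(namespace),
        "aplimit": "500",
        "format": "json",
    }
    while True:
        data = requests.get(API, params=params).json()
        for page in data["query"]["allpages"]:
            yield page["title"]
        if "continue" not in data:             # no more results
            break
        params.update(data["continue"])        # carry the continuation token

for title in all_pages():
    print(title)

The resulting titles could then be fed to Special:Export in batches, as in the sketch above.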
Logged

Emi

  • Bay Watcher
Re: DF Wiki offline dump
« Reply #2 on: September 21, 2010, 03:55:10 pm »

It'll be a little tricky. The XML dump of the full database is around a GB of data. I wouldn't be surprised if Locriani disabled it intentionally because of the huge amount of load it would add.
Logged