Bay 12 Games Forum

Pages: 1 ... 4 5 [6] 7 8

Author Topic: Task Force Needed: Do your part for Forum Preservation!  (Read 11040 times)

SolarShado

  • Bay Watcher
  • Psi-Blade => Your Back
Re: Task Force Needed: Do your part for Forum Preservation!
« Reply #75 on: August 14, 2009, 10:26:26 pm »

Er... Here's an idea:
  • (If on *nix, skip to step 2) Install Cygwin
  • Make sure you've got "wget"
  • Use wget to grab every page. It'll probably take some careful use of the command options, but it should be doable.
Avid (rabid?) Linux user. Preferred flavor: Arch

qwertyuiopas

  • Bay Watcher
  • Photoshop is for elves who cannot use MSPaint.
    • uristqwerty.ca, my current (barren) site.
Re: Task Force Needed: Do your part for Forum Preservation!
« Reply #76 on: August 15, 2009, 07:20:17 am »

The bottleneck is CPU, though I guess that, as a result, it is slowed to a speed that wouldn't be too bad for the forum...

But you just open it as a web page in your browser, and when it finishes, the page should contain a massive amount of text. Copy/paste that into a new file, then open that new file as a web page.
Eh?
Eh!

qwertyuiopas

  • Bay Watcher
  • Photoshop is for elves who cannot use MSPaint.
Re: Task Force Needed: Do your part for Forum Preservation!
« Reply #77 on: August 15, 2009, 07:27:01 am »

Also, I just learned that Firefox hides its errors, so I can debug it now.

Edit:
It seems to get stuck on "Error: uncaught exception: [Exception... "Access to restricted URI denied"  code: "1012" nsresult: "0x805303f4 (NS_ERROR_DOM_BAD_URI)"  location: "file:///G:/Documents%20and%20Settings/Michael/Desktop/VNI/VNparse%20ff.html Line: 55"]"


Edit:
It has problems because "Security Error: Content at file:///G:/Documents%20and%20Settings/Michael/Desktop/VNI/VNparse%20ff.html may not load data from http://www.bay12games.com/forum/index.php?topic=39823.0."
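Both errors come from the browser's same-origin policy: a page may only read data from the same scheme, host and port it was loaded from, and a page opened from a file: URL never matches a web origin. The Python sketch below is a simplified model of that comparison (real browsers apply more rules than this); it also shows why bay12games.com and www.bay12games.com count as different origins.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple a browser compares, or None for file: URLs."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme == "file":
        # Local files get an opaque origin that never equals a web origin.
        return None
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, parts.hostname, port)


def same_origin(a, b):
    """True only if both URLs share the same non-opaque origin."""
    oa, ob = origin(a), origin(b)
    return oa is not None and oa == ob


page = "file:///G:/Documents%20and%20Settings/Michael/Desktop/VNI/VNparse%20ff.html"
forum = "http://www.bay12games.com/forum/index.php?topic=39823.0"
print(same_origin(page, forum))                                   # False
print(same_origin("http://bay12games.com/", forum))               # False
print(same_origin("http://www.bay12games.com:80/forum/", forum))  # True
```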
« Last Edit: August 15, 2009, 07:33:39 am by qwertyuiopas »

Armok

  • Bay Watcher
  • God of Blood
Re: Task Force Needed: Do your part for Forum Preservation!
« Reply #78 on: August 15, 2009, 02:35:38 pm »

Quick thrown-together report. I don't have any more time today, but I should say what little I did find before going to sleep, so you can work on it overnight:
I tried to run it in both Explorer and Firefox, and it didn't work. I'm currently short on time, so I can't do more extensive tests. Neither gave any error messages or used any noticeable amount of CPU. In Firefox the page simply remained blank, while in Explorer it was first blank, then after a while it listed the URL of the same topic several times as well as a bunch of tags, but then nothing happened despite waiting.
So says Armok, God of blood.
Sszsszssoo...
Sszsszssaaayysss...
III...

qwertyuiopas

  • Bay Watcher
  • Photoshop is for elves who cannot use MSPaint.
Re: Task Force Needed: Do your part for Forum Preservation!
« Reply #79 on: August 15, 2009, 03:43:04 pm »

Okay, it is now a group effort.

Here is the final code.

Set start and end to a range, probably ~10 topics wide, and claim your range.

I'll do 0-10, someone should do 10-20 or so.

Post a .zip of the results of your range.

I have added parse tags to denote the post header, start, end, page start, page end, topic start, and topic end.

Here are the first 10, with start=0 and end=10. Next person should use start=10.
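Since the next person starts at 10, start appears to be inclusive and end exclusive, i.e. each claim is a half-open range [start, end). Assuming that reading, a hypothetical helper like the one below splits the topic list into claimable chunks without gaps or overlaps; setting end to the total topic count would cover everything.

```python
def claim_ranges(total, size=10):
    """Split topic indices [0, total) into half-open (start, end) claims of `size` topics."""
    return [(start, min(start + size, total)) for start in range(0, total, size)]


# Hypothetical total; the real VN topic count is not stated in the thread.
print(claim_ranges(25))  # [(0, 10), (10, 20), (20, 25)]
```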

There are HTML comments denoting major points if anyone wants to make a reparser to convert it all to bbcode.
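A reparser along those lines could cut the dump at the comment markers and translate a handful of tags. The marker text `<!-- POST START -->` / `<!-- POST END -->` below is hypothetical (the script's actual comments aren't reproduced here), and the tag table only covers bold, italics, links and line breaks. Regexes are acceptable for one known, machine-generated layout like this, not for arbitrary HTML.

```python
import html
import re

# Hypothetical marker text; use whatever comments the dump script really emits.
POST_RE = re.compile(r"<!-- POST START -->(.*?)<!-- POST END -->", re.S)

# A deliberately small HTML -> BBCode table; real posts need more entries.
TAG_MAP = [
    (re.compile(r"<b>(.*?)</b>", re.S | re.I), r"[b]\1[/b]"),
    (re.compile(r"<i>(.*?)</i>", re.S | re.I), r"[i]\1[/i]"),
    (re.compile(r'<a href="([^"]*)"[^>]*>(.*?)</a>', re.S | re.I), r"[url=\1]\2[/url]"),
    (re.compile(r"<br\s*/?>", re.I), "\n"),
]


def html_to_bbcode(fragment):
    """Translate the known tags, then decode entities such as &amp;."""
    for pattern, replacement in TAG_MAP:
        fragment = pattern.sub(replacement, fragment)
    return html.unescape(fragment).strip()


def reparse(dump):
    """Return the BBCode of every post found between the markers, in order."""
    return [html_to_bbcode(body) for body in POST_RE.findall(dump)]
```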

Edit: forgot the code. Duh.

Spoiler (click to show/hide)

Save it as a .html file, edit start and end, and copy the result into a new file. It *should* work in most/all IE versions, because, unlike Firefox, they allow JavaScript to access external domains.
« Last Edit: August 15, 2009, 03:44:52 pm by qwertyuiopas »

SolarShado

  • Bay Watcher
  • Psi-Blade => Your Back
Re: Task Force Needed: Do your part for Forum Preservation!
« Reply #80 on: August 15, 2009, 09:34:04 pm »

wget has a bunch of options. If slowing down the forum is an issue, get someone with an always-on connection to do it and set wget to wait a while (30sec, an hour, whatever) between files.
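A polite recursive run might use an argument list like the one below, built in Python so it can be printed for a shell or passed to `subprocess.run`. The flags are standard GNU wget options; the depth, the wait time and the `cookies.txt` file name are assumptions, and a recursive crawl of a forum will also follow links you don't need (profiles, login pages) unless you filter them further.

```python
import shlex


def wget_command(start_url, wait_seconds=30, depth=2, cookies_file=None):
    """Build a recursive wget invocation that pauses between downloads."""
    args = [
        "wget",
        "--recursive",
        f"--level={depth}",        # how many links deep to follow
        "--no-parent",             # don't climb above the starting directory
        f"--wait={wait_seconds}",  # seconds to pause between retrievals
        "--page-requisites",       # also fetch stylesheets, images, etc.
        "--convert-links",         # rewrite links so the copy browses offline
    ]
    if cookies_file:
        args.append(f"--load-cookies={cookies_file}")  # Netscape-format cookies.txt
    args.append(start_url)
    return args


if __name__ == "__main__":
    cmd = wget_command("http://www.bay12games.com/forum/index.php?topic=39823.0", wait_seconds=3600)
    print(shlex.join(cmd))  # paste into a shell, or pass cmd to subprocess.run
```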

qwertyuiopas

  • Bay Watcher
  • Photoshop is for elves who cannot use MSPaint.
Re: Task Force Needed: Do your part for Forum Preservation!
« Reply #81 on: August 15, 2009, 10:39:04 pm »

What *can* wget do?

It needs to load each page of VN, find the links to each topic's first page, then, for each topic, separate each post's content HTML from the "junk", and if the topic has multiple pages, repeat for the rest of the pages.
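The first two steps (walk the board's index pages, collect the topic links) can be sketched as below; fetching is left to whatever tool does the downloading. It assumes SMF-style URLs like the `index.php?topic=39823.0` seen in this thread, where `.0` is the offset of a topic's first page. The board ID and the 20 topics per index page are hypothetical and should be read off the real board.

```python
import re

# Links to a topic's first page end in ".0", e.g. index.php?topic=39823.0
TOPIC_LINK = re.compile(r"index\.php\?topic=(\d+)\.0\b")


def topic_ids(board_html):
    """Topic IDs linked from one board index page, de-duplicated, in page order."""
    return list(dict.fromkeys(int(tid) for tid in TOPIC_LINK.findall(board_html)))


def board_page_urls(base, board_id, pages, topics_per_page=20):
    """URLs of a board's index pages (offset assumed to step by topics_per_page)."""
    return [f"{base}/index.php?board={board_id}.{i * topics_per_page}" for i in range(pages)]
```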

My script does that, and additionally leaves tags for a future program to parse the output back into BBCode by marking the start and end of each post/page/topic. It should be compatible with most/all modern browsers, except the ones that go that extra inch of security to ensure that your hand-written and totally safe script is not allowed to access other domains, because of a potential security threat, without even an option to temporarily disable it. Seriously, Firefox is moving toward the Mac corner of the Windows-Mac-Linux triangle: user-friendly options but not secure (Windows), no options (Mac, though that might just be the old models), and a setting for everything but nothing easy to get to (Linux). When JavaScript from bay12games.com is not allowed to access www.bay12games.com, and the only way to change it is to remove a security feature and recompile it yourself, you KNOW they are going too far.


Armok

  • Bay Watcher
  • God of Blood
Re: Task Force Needed: Do your part for Forum Preservation!
« Reply #82 on: August 15, 2009, 10:58:59 pm »

I just get
Spoiler (click to show/hide)
And then it does nothing. Using the most recent script and Explorer 6.0.
Also, what should you set as max if you want it to back up everything?

SolarShado

  • Bay Watcher
  • Psi-Blade => Your Back
Re: Task Force Needed: Do your part for Forum Preservation!
« Reply #83 on: August 16, 2009, 02:48:05 am »

wget can basically do the equivalent of going to a page and doing File->Save As, but it can do that recursively for a bunch of pages, and it'll grab stylesheets/images/whatever else.

Theoretically (damn, I can't spell...), you could use it to download the entire public internet, I suppose...

I'm not sure what you'd need to do to make it work with the newly hidden VN, but I'm confident that it could be done.

While its thoroughness might be overkill, it could save you a lot of work. You might still want a script to read the HTML files, but that would be much easier if the files were archived to your HD.
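For the "script to read the HTML files" step, Python's standard `html.parser` is enough to pull post bodies out of a saved mirror. This sketch assumes each post body sits in a `<div class="post">`; check that against the saved pages before relying on it. It also assumes the saved files end in `.html`, which wget's `-E` option (named `--adjust-extension` in newer versions) arranges for pages like `index.php?topic=...`.

```python
from html.parser import HTMLParser
from pathlib import Path


class PostExtractor(HTMLParser):
    """Collect the text inside each <div class="post"> (assumed post-body markup)."""

    def __init__(self):
        super().__init__()
        self.posts = []
        self._depth = 0  # > 0 while inside a post div; counts nested divs

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth:
            self._depth += 1
        elif "post" in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self.posts.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.posts[-1] += data


def posts_in_mirror(folder, pattern="*.html"):
    """Yield (relative path, list of post texts) for each saved page under folder."""
    root = Path(folder)
    for path in sorted(root.rglob(pattern)):
        parser = PostExtractor()
        # errors="replace" tolerates pages saved in an unexpected encoding.
        parser.feed(path.read_text(encoding="utf-8", errors="replace"))
        parser.close()
        yield str(path.relative_to(root)), parser.posts
```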

qwertyuiopas

  • Bay Watcher
  • Photoshop is for elves who cannot use MSPaint.
Re: Task Force Needed: Do your part for Forum Preservation!
« Reply #84 on: August 16, 2009, 01:28:06 pm »

Okay, how about this:
I give you a script that gets the first page of all VN topics, and you figure out how to wget them, then parse whether they have a second page and, if so, continue.
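Whether a topic has more pages can be decided from its first page alone. SMF addresses topic pages by post offset (`topic=ID.0`, `topic=ID.15`, ...), and the pagination bar may elide middle pages, just like the "Pages: 1 ... 4 5 [6] 7 8" bar on this thread. The sketch below therefore reads the page step and the last offset from the links and fills in the gaps. It assumes page two is always linked from page one; the base URL is whatever the forum root is.

```python
import re


def topic_page_urls(base, topic_id, first_page_html):
    """Every page URL of a topic, inferred from page one's pagination links.

    The step is taken from the smallest non-zero offset (the link to page two),
    and pages hidden behind "..." are filled in up to the largest offset seen.
    """
    link = re.compile(rf"index\.php\?topic={topic_id}\.(\d+)\b")
    offsets = sorted({int(o) for o in link.findall(first_page_html)} | {0})
    if len(offsets) == 1:
        return [f"{base}/index.php?topic={topic_id}.0"]  # single-page topic
    step, last = offsets[1], offsets[-1]
    return [f"{base}/index.php?topic={topic_id}.{o}" for o in range(0, last + 1, step)]
```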

Spoiler (click to show/hide)

Armok

  • Bay Watcher
  • God of Blood
Re: Task Force Needed: Do your part for Forum Preservation!
« Reply #85 on: August 17, 2009, 03:04:21 am »

I'm confused. I want to help but I really don't understand enough about what's going on to do much else than follow direct orders/instructions.

Armok

  • Bay Watcher
  • God of Blood
Re: Task Force Needed: Do your part for Forum Preservation!
« Reply #86 on: August 19, 2009, 01:17:00 pm »

Why isn't more happening with this? I'm starting to get worried; we have a deadline, you know.

I find it extremely frustrating to not know what I have to do in order to help.

SolarShado

  • Bay Watcher
  • Psi-Blade => Your Back
Re: Task Force Needed: Do your part for Forum Preservation!
« Reply #87 on: August 19, 2009, 03:50:44 pm »

Well qwertyuiopas, I might give that a try, but I'm not one of the VN archivers, nor am I really interested in becoming one. Besides, I don't seem to have much free time lately...

wget has an option to recursively grab many pages in one go.
Armok, wget isn't too hard to figure out, nor is installing Cygwin, especially if you're familiar with a command line. I can write up a step-by-step guide for you if you want.

That's assuming someone can figure out how to get wget through to the actual board... I'd assume it involves cookies...
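Cookies are indeed the usual route: log in with a browser, export its cookies to a Netscape-format `cookies.txt`, and hand that file to the downloader. wget reads that format with `--load-cookies`; the Python sketch below does the same with the standard library's `MozillaCookieJar`. How you export the file depends on the browser, and the cookie names in the test are made up.

```python
import http.cookiejar
import urllib.request


def opener_with_browser_cookies(cookies_txt):
    """Return (opener, jar): a urllib opener that sends the exported login cookies.

    cookies_txt is a Netscape/Mozilla-format cookies.txt, the same format
    wget's --load-cookies option reads.
    """
    jar = http.cookiejar.MozillaCookieJar(cookies_txt)
    # Session cookies (empty expiry field) are marked "discard"; keep them too.
    jar.load(ignore_discard=True)
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


# Usage (needs a real export and network access):
# opener, _ = opener_with_browser_cookies("cookies.txt")
# page = opener.open("http://www.bay12games.com/forum/index.php?topic=39823.0").read()
```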

qwertyuiopas

  • Bay Watcher
  • Photoshop is for elves who cannot use MSPaint.
Re: Task Force Needed: Do your part for Forum Preservation!
« Reply #88 on: August 19, 2009, 07:49:56 pm »

That is the one part where javascript is undeniably better.

If JavaScript had easy client-side file output (and if it does, please tell me), it would be perfect for it.

And I have found a potential flaw in my code (though if it supported file output, it would dump everything and a proper C or similar program could parse it all): deleted accounts. Their posts would probably be skipped.
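One way around the deleted-account problem, sketched under assumed markup: treat the profile link as optional when reading a post header, and fall back to the plain name (or a placeholder) instead of dropping the post. Both header shapes below are hypothetical; the real ones would have to be copied from a post by a deleted account.

```python
import re

# Hypothetical header markup: a live account's name is a profile link, while a
# deleted account's name appears as plain bold text with no link.
PROFILE_LINK = re.compile(r'<a href="[^"]*action=profile;u=\d+"[^>]*>(.*?)</a>', re.S)
PLAIN_NAME = re.compile(r"<b>(.*?)</b>", re.S)


def post_author(header_html):
    """Return the poster's name, falling back instead of skipping the post."""
    for pattern in (PROFILE_LINK, PLAIN_NAME):
        match = pattern.search(header_html)
        if match:
            return match.group(1).strip()
    return "(unknown author)"  # still keep the post when no name is found
```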

qwertyuiopas

  • Bay Watcher
  • Photoshop is for elves who cannot use MSPaint.
Re: Task Force Needed: Do your part for Forum Preservation!
« Reply #89 on: August 19, 2009, 08:13:15 pm »

‼idea‼

I had just shut down my computer when the solution hit me:

JavaScript, though it retains the login information required to access VN, cannot create meaningful output.

HOWEVER, it can send text to web pages.

AND I have a web server from long long ago when I was looking at PHP.

AND PHP can write to files!



So, I will have to do this myself, but I will write a JavaScript page that systematically downloads each page of VN and posts it to a PHP page running on localhost, which then stores the newly received page to a file. Then I .zip it all and upload!


HOWEVER!

If I distribute it, it would only be to one or two trusted people as it may contain login information.

I WILL validate the output manually, to ensure a total save.
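Part of that validation can be automated: since the output carries start/end markers for posts, pages and topics, a check that every START has a matching, properly nested END catches truncated pages before anyone reads them by hand. The marker spelling below is hypothetical.

```python
import re

# Hypothetical markers, matching whatever start/end comments the dump emits.
MARKER = re.compile(r"<!-- (POST|PAGE|TOPIC) (START|END) -->")


def check_markers(text):
    """Return a list of problems: unmatched or mis-nested START/END markers."""
    problems = []
    stack = []
    for match in MARKER.finditer(text):
        kind, edge = match.groups()
        if edge == "START":
            stack.append(kind)
        elif not stack or stack[-1] != kind:
            problems.append(f"unexpected {kind} END at offset {match.start()}")
        else:
            stack.pop()
    problems.extend(f"{kind} START never closed" for kind in stack)
    return problems
```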


And I may be unable to do it myself.



SO, I encourage any of you who have the required experience, setup, and access permission to go do it yourselves, just in case.