
Posted: Tue Aug 19, 2008 8:37 pm
by dch24
The thing is, it turns out wget isn't the right tool for the job, so I've got to write it up in Perl when I can find a spare minute.

Then running it every two weeks would be completely automatic.

Posted: Sat Dec 06, 2008 7:24 pm
by dch24
OK, I've modified the swish-e spider to crawl talk-polywell.org

I posted the backup of the forums here:
http://polywell.nfshost.com/2008_12_06_few_links.tar.bz2 (6,047,321 bytes)

The script I use is here:
http://polywell.nfshost.com/2008_12_06_phpbb_spider.txt

Joe, you'll see lots of accesses by the user agent 'swish-e'. That was me developing the spider. I've now set the user agent to 'polywell-backup-bot-0.1'.
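
For anyone who wants to see what setting the user agent amounts to, here is a minimal sketch. It is Python rather than the Perl the spider is actually written in, and it is not taken from the posted script. The helper names build_request and fetch are made up for illustration; only the user agent string comes from this post.

```python
import urllib.request

# User agent string from the post; everything else here is illustrative.
USER_AGENT = "polywell-backup-bot-0.1"


def build_request(url):
    """Build a request that identifies itself with the backup bot's user agent."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch(url, timeout=30):
    """Download one page and return its raw bytes (hypothetical helper)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read()
```

With something like this, every page fetched through fetch() should show up in the server logs under 'polywell-backup-bot-0.1' instead of a library default.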

It still has lots of trouble loading outside links, but it did get a few. I think some of them might be spam. I'll try to track those down and filter them out.
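
One way the outside links could be separated from the forum's own pages is sketched below, again in Python and not copied from the actual spider. The names is_on_site and split_links are hypothetical, and the rule used (same host or a subdomain of it counts as on-site) is an assumption about what "outside" should mean.

```python
from urllib.parse import urlsplit

# Host of the forum being backed up (from the post).
SITE_HOST = "talk-polywell.org"


def is_on_site(url, site_host=SITE_HOST):
    """Return True if the link points at the forum itself (assumed rule)."""
    parts = urlsplit(url)
    # Anything that is not a web link (mailto:, javascript:, ...) is off-site.
    if parts.scheme not in ("", "http", "https"):
        return False
    host = parts.hostname
    if host is None:
        # Relative link with no host, so it stays on the forum.
        return True
    host = host.lower()
    # Accept the bare host and subdomains such as www.talk-polywell.org.
    return host == site_host or host.endswith("." + site_host)


def split_links(urls):
    """Split links into (on_site, off_site) lists, preserving order."""
    on_site = [u for u in urls if is_on_site(u)]
    off_site = [u for u in urls if not is_on_site(u)]
    return on_site, off_site
```

The off_site list is then the place to look for spam: those links can be reviewed by hand or checked against a blocklist before the spider tries to load them.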

If anything here at talk-polywell.org didn't get downloaded, please let me know.

Unlike previous backups, this file is only 5.8 MB! That's because I improved the spidering a lot. So I can leave it posted for a while and not run out of money. :-)