So for all of you wonderful people out there wanting a Loud Whisper Official Posts Authentically Datamined Textfile of Words, here ya go. It took friggin forever even with the scraper (It's 15 fucking megabytes holy shit), but here's 32,000 posts worth of distilled LW. 200 proof.
The weirdest textfile you ever did see. (It's on tinyupload cause it's too big for pastebin)
Side Note: I am actually ridiculously proud of the scraper, having never programmed a working thing in my life beyond a simple calculator. It's a work of art. Even if it is likely horrible optimization-wise.
If you want to use the scraper, I'll post the code below. It requires the following:
requests (Can be downloaded through pip)
BeautifulSoup 4 (Also pip)
lxml (Ditto)
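If you've not used pip before, this one line should grab all three at once (the pip package name for BeautifulSoup 4 is beautifulsoup4):
pip install requests beautifulsoup4 lxml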
I take no responsibility for anything this does to your computer, blah blah blah, use at your own risk, try to not crash any websites. It's commented, but only as far as I understand it. It's likely that some comments are entirely wrong and show a fundamental misunderstanding of everything.
############ O.Wilde's Bay 12 Scraper ############
################## V.1 1/30/16 ##################
import requests
import bs4
import lxml
import re
from html.parser import HTMLParser
###########################################################################
def findprofile(profilepage):
    print('Finding pages to scrape...')
    user = re.sub(re.escape('http://www.bay12forums.com/smf/index.php?action=profile;u='), '', profilepage) #Removes everything but the user ID number from the profile link. re.escape keeps the ? and . from being treated as regex wildcards.
    url = 'http://www.bay12forums.com/smf/index.php?action=profile;area=showposts;sa=messages;u=' + user #Adds the user ID number obtained in the last step to build the link to their messages page.
    return url
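# For example, with a made-up user ID: a profile link like
#   http://www.bay12forums.com/smf/index.php?action=profile;u=12345
# should come back out of findprofile as
#   http://www.bay12forums.com/smf/index.php?action=profile;area=showposts;sa=messages;u=12345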
def scrapeposts(url):
    page = requests.get(url)
    soup = bs4.BeautifulSoup(page.text, "lxml") #Takes the text of the HTML contained on the URL messages page and makes it usable for our purposes
    data = [a.attrs.get('href') for a in soup.select('div.pagesection a.navPages')] #Grabs the href of every page-navigation link (a.navPages) inside the div.pagesection blocks, i.e. the links to the other pages of posts.
    ppg = re.sub('http(.+?)start=', '', data[0]) #Finds the number of posts per page (the start value of the second page equals the posts shown per page)
    tp = re.sub('http(.+?)start=', '', data[len(data)-1]) #Finds the post number that the final page of posts starts on
    pagenumber = (int(tp) // int(ppg)) + 1 #Finds the total number of pages of posts (integer division, so the page count stays a whole number)
    counter = 0
    scrapedata = ''
    while counter < pagenumber: #While the page we are working on is less than the total number of pages
        counter = counter + 1 #Add 1 to the counter
        print('Now scraping page ' + str(counter) + ' out of ' + str(pagenumber) + '!')
        scrapeurl = url + ';start=' + str((counter - 1) * int(ppg)) #Builds the URL of this page of posts; each page starts ppg posts after the last one
        scrapepage = requests.get(scrapeurl)
        scrapesoup = bs4.BeautifulSoup(scrapepage.text, "lxml") #Takes the text of the HTML contained on that page and makes it usable for our purposes
        scrapetext = scrapesoup.select('div.list_posts') #Selects the data contained in our HTML that we want to mine. Specifically, the posts on this page
        scrapedata = scrapedata + ' ' + str(scrapetext) #Adds our freshly scraped data to the string of scraped data mined so far.
    print('Done!')
    return scrapedata
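# Rough worked example with made-up numbers: the navPages links end in things like
# ;start=15, ;start=30, ... ;start=3195. The second page starting at 15 means 15 posts
# per page, and a final page starting at post 3195 gives 3195 // 15 + 1 = 214 pages
# for the loop to walk through.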
def cleardata(data):
    print('Removing HTML...')
    cleareddata = re.sub('<[^<]+?>', ' ', str(data)) #Removes all strings contained within <...>, this is to remove HTML tags. Replaces with a space.
    print('Removing Quote Tags...')
    cleareddata1 = re.sub("Quote(.+)pm", ' ', cleareddata) #Removes quote headers ending in pm and replaces with a space
    cleareddata2 = re.sub("Quote(.+)am", ' ', cleareddata1) #Same as above, but for am. (I could do this in one step, but I don't know how.)
    print('Removing Tabs...')
    cleareddata3 = re.sub(r'\s', ' ', cleareddata2) #Replaces every whitespace character (tabs, newlines, etc.) with a plain space
    print('Removing Non-ASCII Data...')
    cleareddata4 = re.sub(r'[^\x00-\x7F]', ' ', cleareddata3) #Removes any non-ASCII characters so the text file can be written without encoding errors.
    print('Removing Spaces...')
    cleareddata5 = re.sub(r'\s+', ' ', cleareddata4) #Collapses the clutter of spaces created by the previous few steps into a single space each.
    return cleareddata5
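# Quick before/after with a made-up snippet: something like
#   <div class="post">Urist likes plump helmets</div>
# comes out of cleardata as roughly
#   ' Urist likes plump helmets '
# (tags swapped for spaces, then the extra spaces squashed down to one apiece).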
def writefile(data, name):
    print('Writing File...')
    file = open(name + '.txt', 'w') #Creates a new file in which to save our data
    file.write(data) #Writes our data to the file
    file.close() #Closes the file
    input('Posts have been scraped, and file created. Thank you for using the Bay 12 Scraper by O.Wilde!')
###########################################################################
url = findprofile(input("Please input the profile of the member whose posts you would like to scrape: ")) #Asks for a profile link to scrape, and calls findprofile using that link. Sets url equal to the returned value
messydata = scrapeposts(url)
cleandata = cleardata(messydata)
writefile(cleandata, input('Please input the name of the text file you want to be generated. WARNING: Any file with the same name will be overwritten!!!: '))
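To actually run it: save the code as something like bay12scraper.py (the name doesn't matter, that's just an example), make sure you're on Python 3 (it's written for 3, not 2), then:
python bay12scraper.py
It'll ask for the profile link, chug through the pages, and ask what to call the text file at the end.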