Topic: Lurker Tracker - Web Edition (Read 21927 times)

Think0028 · « **on:** June 10, 2011, 08:31:14 pm »

Hey everyone!

So I've been kind of distracted in my Mafia games today; apologies to the people I'm playing with. My only excuse is that I've been busy working on something, namely, this:

http://think0028.com/lurkertracker.html

Just put in the thread URL and any replacements, and it does everything else automatically.

It's not very fancy right now, but it handles replacements, lets you mark players as moderators or as dead (dead players are sorted to the bottom), and shows the standard outputs: links to all posts, each player's last red text, and the time since they last posted. Large threads take a while to load the first time, but it caches pages, so later loads are much faster. Hope it's handy for everyone.
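
A minimal Python 3 sketch of the replacement step, mirroring the loop in `parseMessage` in the source below: (replaced, replacement) pairs are applied in order, and marking a moderator is just one more pair that maps to `Moderator`. The function name `resolve_author` and the sample names are hypothetical.

```python
def resolve_author(author, pairs):
    """Rename a post's author by applying (replaced, replacement) pairs in order.

    Because the pairs run in sequence, Alice -> Bob followed by Bob -> Carol
    resolves Alice to Carol, but one account cannot map to two slots.
    """
    for replaced, replacement in pairs:
        if author == replaced:
            author = replacement
    return author


# The moderator field becomes an ordinary pair, as in the script's form handling.
pairs = [("ModName", "Moderator"), ("Alice", "Bob"), ("Bob", "Carol")]
print(resolve_author("Alice", pairs))    # Carol
print(resolve_author("ModName", pairs))  # Moderator
print(resolve_author("Dave", pairs))     # Dave
```

This matches the limitation noted in the source comment: replacement chains work, but a single account playing two separate roles does not.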

Now also supports sorting either alphabetically or by most recent post.
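
A Python 3 sketch of the two sort orders, following the source below: alphabetical is case-insensitive, "most recent post" sorts the `YYYY-MM-DD HH:MM:SS` timestamp strings in reverse, names containing `Dead:` are listed after everyone else, and `Nonplayer` is left out. `order_players` and the sample data are hypothetical.

```python
def order_players(last_post, mode="alpha"):
    """Return player names in display order.

    last_post maps name -> last post time as 'YYYY-MM-DD HH:MM:SS', a format
    that sorts correctly as plain text.
    """
    names = [n for n in last_post if n != "Nonplayer"]
    if mode == "post":
        # Most recent post first.
        names.sort(key=lambda n: (last_post[n], n), reverse=True)
    else:
        # Case-insensitive alphabetical, ties broken by the exact name.
        names.sort(key=lambda n: (n.lower(), n))
    living = [n for n in names if "Dead:" not in n]
    dead = [n for n in names if "Dead:" in n]
    return living + dead


last_post = {
    "bob": "2011-06-10 22:30:00",
    "Alice": "2011-06-10 21:00:00",
    "Dead: Carl": "2011-06-09 10:00:00",
    "Nonplayer": "2011-06-10 23:00:00",
}
print(order_players(last_post))          # ['Alice', 'bob', 'Dead: Carl']
print(order_players(last_post, "post"))  # ['bob', 'Alice', 'Dead: Carl']
```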

I've done some debugging, but I'd appreciate stress testing, especially of replacements.

To make a direct link to the Lurker Tracker for a specific game, just do

http://think0028.com/lurkertracker.py?url=yourthreadsurl

replacing yourthreadsurl with the URL of any page of the thread.
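
Pasting the raw URL works for a plain `index.php?topic=` address, but percent-encoding it is safer if a URL ever contains `&` or `#`. A Python 3 sketch of building the link that way; `tracker_link` and the topic number are hypothetical, and it assumes the script decodes the query string as usual for CGI form fields.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint taken from the post above.
BASE = "http://think0028.com/lurkertracker.py"


def tracker_link(thread_url):
    """Build a direct Lurker Tracker link for a thread URL.

    urlencode percent-escapes the thread URL's own '?', '=' and '&' so they
    cannot be mistaken for parts of the tracker's query string.
    """
    return BASE + "?" + urlencode({"url": thread_url})


# Hypothetical topic number, for illustration only.
thread = "http://www.bay12forums.com/smf/index.php?topic=85000.0"
link = tracker_link(thread)
print(link)
# Decoding the query string gives back the original thread URL.
print(parse_qs(urlsplit(link).query)["url"][0] == thread)  # True
```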

Source:

Code:

#!/usr/bin/python

import urllib2
import cgi
import cgitb
import os
import cPickle
import sqlite3
import sys
import zipfile
from datetime import datetime, timedelta
cgitb.enable()

# Hours subtracted from (server now - post time) when computing "Last posted"
TIMEDIFF = -3

#global variable weeeeeeee
replyNumber = 0


database = 'lurkers.sql'

def createDatabase(verbose = None):
    global database
    conn = sqlite3.connect(database)
    c = conn.cursor()
        
    createPostTable = """
    CREATE TABLE IF NOT EXISTS 'posts' (
      number INTEGER,
      post TEXT,
      author TEXT,
      time TEXT,
      replyNumber INTEGER,
      url TEXT
      );
     """
    
    c.execute(createPostTable)
    
    conn.commit()
    c.close()

def burnQuotes(text):
    #remove quotes from post
    answer = ''
    quotedepth = 0
    i = 0
    while i < len(text):
        if i < len(text)-11:
            if text[i:i+11] == '<blockquote':
                quotedepth += 1
        if quotedepth > 0 and i < len(text)-13:
            if text[i:i+13] == '</blockquote>':
                quotedepth += -1
                i = text.index('>', i)+1
        if quotedepth == 0:
            answer += text[i]
        i += 1
    return answer.strip()

def findRed(text):
    #finds red text
    answer = ''
    reddepth = 0
    i = 0
    while i < len(text):
        marker = 'color: red'
        if i < len(text)-len(marker):
            if text[i:i+len(marker)] == marker:
                if reddepth == 0:
                    answer = ''
                reddepth += 1
                i = text.index('>', i)+1
        if reddepth > 0 and i < len(text)-7:
            if text[i:i+7] == '</span>':
                reddepth += -1
        if reddepth != 0:
            answer += text[i]
        i += 1
    return answer.strip()

def findBlue(text):
    #finds blue text
    answer = ''
    reddepth = 0
    i = 0
    while i < len(text):
        marker = 'color: blue'
        if i < len(text)-len(marker):
            if text[i:i+len(marker)] == marker:
                if reddepth == 0:
                    answer = ''
                reddepth += 1
                i = text.index('>', i)+1
        if reddepth > 0 and i < len(text)-7:
            if text[i:i+7] == '</span>':
                reddepth += -1
        if reddepth != 0:
            answer += text[i]
        i += 1
    return answer.strip()


def parseMessage(postTuple, replacedList, replaceList):
    text = postTuple[1]
    number = postTuple[0]
    author = postTuple[2]
    time = postTuple[3]
    rnumber = postTuple[4]
    #Supports someone replacing in for themselves down the line, not for
    #someone playing two separate roles
    for i in xrange(len(replacedList)):
        if replacedList[i] == author:
            author = replaceList[i]
    text = burnQuotes(text)
    red = findRed(text)
    blue = findBlue(text)
    return [author, text, red, number, time, rnumber, blue]

def getPages(url, start=0):
    #Returns list of strings of every page url
    answer = []
    f = urllib2.urlopen(url+'.0')
    text = f.read()
    f.close()
    text = text[text.index('Pages')+12:]
    titletext = text[text.index('Topic: ')+7:]
    titletext = titletext[:titletext.index('</span>')]
    title = titletext[:titletext.rindex('&nbsp;')]
    try:
        text = text[:text.index('</div>')]
        replies = text[text.rindex('.')+1:text.rindex('"')]
        replies = int(replies)
    except:
        replies = 0 #if there's only one page, it comes here
    i = 0
    while i <= replies:
        if i >= 15*(start/15):
            answer += [url+'.'+str(i)]
        i += 15
    return (answer, title)

def getPosts(page, start=0):
    global replyNumber
    
    conn = sqlite3.connect(database)
    c = conn.cursor()
    #Returns list of post numbers and post text and post authors and time in a page given by a URL
    answer = []
    f = urllib2.urlopen(page)
    text = f.read()
    f.close()
    dateText = text
    marker = '<li id="time" class="smalltext floatright">'
    dateText = dateText[dateText.index(marker)+len(marker):]
    dateText = dateText[:dateText.index(',', dateText.index(',')+1)+1]
    while text.find("subject_") != -1:
        marker = 'View the profile of '
        text = text[text.index(marker)+len(marker):]
        author = text[:text.index('"')]
        marker = '<h5 id="subject_'
        text = text[text.index(marker)+len(marker):]
        try:
            number = int(text[:text.index('"')])
        except:
            continue
        text = text[text.index('on:</strong>')+12:]
        timeString = text[:text.index(" &#187;")]
        timeString = timeString.replace('<strong>Today</strong> at', dateText).strip()
        time = datetime.strptime(timeString[:-2]+timeString[-2:].upper(), '%B %d, %Y, %I:%M:%S %p')
        marker = '<div class="inner" id="msg_'+str(number)+'">'
        text = text[text.index(marker)+len(marker):]
        post = text[:text.index('<div class="moderatorbar"')]
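        # Strip the two closing </div> tags that end the post body
        # (slice by post's own indices, not the rest of the page's)
        post = post[:post.rindex('</div>')]
        post = post[:post.rindex('</div>')].strip()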
        if replyNumber > start:
            answer += [(number, post, author, time, replyNumber)]
        replyNumber += 1
    for j in answer:
        c.execute('insert into posts values (?,?,?,?,?,?)', (j[0],unicode(j[1], "utf-8", "ignore"),
                                                             unicode(j[2], "utf-8", "ignore"),
                                                             j[3],
                                                             j[4],
                                                             unicode(url, "utf-8", "ignore")))
    conn.commit()
    c.close()
    return answer

form = cgi.FieldStorage()
createDatabase()
url = form.getfirst('url')
if not url:
    print 'Content-type: text/html\n\n'
    print '<html><body>Please supply a Bay12 forum URL.</body></html>'
    sys.exit()
if 'bay12forums.com/smf/index.php?topic=' not in url:   
    print 'Content-type: text/html\n\n'
    print '<html><body>Sorry, this currently only supports Bay12 forum threads.</body></html>'
    sys.exit()
print 'Content-type: text/html\n\n'
postStart = 0
absolutePostCount = False
if '.msg' in url:
    postStart = url[url.rindex('.msg')+4:]
    if '#' in postStart:
        postStart = postStart[:postStart.rindex('#')]
    absolutePostCount = True
onlyAfterStart = form.getfirst('onlyAfterStart')
if onlyAfterStart:
    postStart = form.getfirst('postStart')
    if '.msg' in postStart:
        postStart = postStart[postStart.rindex('.msg')+4:]
        if '#' in postStart:
            postStart = postStart[:postStart.rindex('#')]
        absolutePostCount = True
    else:
        absolutePostCount = False
try:
    postStart = int(postStart)
except:
    postStart = 0
full = False
if form.getfirst('full') == 'on':
    full = True
replacedList = []
replaceList = []
moderator = form.getfirst('moderator')
if moderator:
    replacedList += [moderator]
    replaceList += ['Moderator']
replaces = form.getfirst('replace')
if replaces:
    replaces = int(replaces)
    for i in xrange(replaces):
        replaced = form.getfirst('replaced'+str(i))
        replace = form.getfirst('replace'+str(i))
        if replace and replaced:
            replacedList += [replaced.strip()]
            replaceList += [replace.strip()]
url = url[:url.rindex('.')]
redDict = {}
actionDict = {}
postDict = {}
timeDict = {}

posts = []
conn = sqlite3.connect(database)
c = conn.cursor()
c.execute('select number, post, author, time, replyNumber from posts where url=? order by number', (url,))
pageposts = [c.fetchall()]
start = -1
if pageposts != [[]]:
    start = pageposts[0][-1][4]
replyNumber = max(0,15*(start/15))
pages = getPages(url, start)
title = pages[1]
pages = pages[0]
pageposts2 =  map(lambda x: getPosts(x, start), pages)
for i in pageposts2:
    newpage = []
    for j in i:
        if j not in pageposts[0]:
            newpage += [j]
    pageposts += [newpage]
conn.commit()
c.close()
for i in pageposts:
    posts += map(lambda x: parseMessage(x, replacedList, replaceList), i)
for j in posts:
    if (not absolutePostCount and int(j[5]) < int(postStart)) or (absolutePostCount and int(j[3]) < int(postStart)):
        continue
    if j[0] not in postDict:
        postDict[j[0]] = []
    postDict[j[0]] += [(url+'.msg'+str(j[3])+'#msg'+str(j[3]), j[5])]
    if j[0] not in redDict:
        redDict[j[0]] = []
    if j[0] not in actionDict:
        actionDict[j[0]] = []
    if str(j[2]) != '':
        redDict[j[0]] += [(str(j[2]), j[3])]
        actionDict[j[0]] += [(str(j[2]), j[3], 'red', j[5])]
    if str(j[6]) != '':
        actionDict[j[0]] += [(str(j[6]), j[3], 'blue', j[5])]
    try:
        timeDict[j[0]] = unicode(j[4].strftime('%Y-%m-%d %H:%M:%S'), 'utf-8', 'ignore')
    except:
        timeDict[j[0]] = j[4]
print "<html><title>Lurker Tracker - " + title + "</title>"
print '<head><script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.6.1/jquery.min.js"></script>'
print '''<script type="text/javascript">
function showVoteLog(user) {
    $('#voteLog'+user).toggle('slow')
    if ($('#showLog'+user).html() == 'Show Vote Log') {
        $('#showLog'+user).html('Hide Vote Log')
    } else {
        $('#showLog'+user).html('Show Vote Log')
    }
}
</script>'''
print '</head><body>'
print "Players: <br />"
copypasta = '[hr][center][b][i]LurkerTracker[/i][/b][/center][hr][font=courier][size=8pt]'
users = postDict.keys()
sort = form.getfirst('sort')
for i in xrange(len(users)):
    if not sort or sort == 'alpha':
        sortableUser = (users[i].lower(), users[i])
    elif sort == 'post':
        sortableUser = (timeDict[users[i]], users[i])
    else:
        sortableUser = (users[i].lower(), users[i])
    users[i] = sortableUser
users.sort()
if sort and sort == 'post':
    users.reverse()
for i in xrange(len(users)):
    users[i] = users[i][1]
counter = 0
for user in users:
    counter += 1
    if 'Dead:' not in user and 'Nonplayer' != user:
        print str(user) + ": <br />"
        print "<ul>"
        print "<li>Posts: "
        for i in xrange(len(postDict[user])):
            print "<a href='"+postDict[user][i][0]+"'>["+str(i)+"]</a>"
        print "</li>"
        if redDict[user] != [] and user != 'Moderator':
            print "<li>Last Vote: <a style='color: red' href='" + url+'.msg'+str(redDict[user][-1][1])+'#msg'+str(redDict[user][-1][1]) + "'>" + redDict[user][-1][0] + "</a></li>"
            print """<li><button id='showLog"""+str(counter)+"""' onclick="showVoteLog('"""+str(counter)+"""')">Show Vote Log</button></li>"""
        time = datetime.strptime(timeDict[user], u'%Y-%m-%d %H:%M:%S')
        diff = datetime.now() - time - timedelta(hours=TIMEDIFF)
        # Whole hours since the last post, floored
        hours = int(diff.total_seconds() // 3600)
        print "<li>Last posted: "
        if not full:
            print str(hours)
            if hours != 1:
                print " hours ago. "
            else:
                print " hour ago. "
        else:
            # split avoids a ValueError when diff has no microseconds part
            print str(diff).split('.')[0] + ' ago. '
        print "</li>"
        print "</ul>"
        print "<div style='display: none' id='voteLog"+str(counter)+"'><ul>"
        for i in actionDict[user]:
            print "<li><a style='color: "+i[2]+"' href='" + url+'.msg'+str(i[1])+'#msg'+str(i[1]) + "'>Reply #"+str(i[3])+" - "+ str(i[0]) + "</a></li>"
        print "</ul></div>"
        copypasta += "[b]" +str(user) + "[/b]: "
        copypasta += "Last posted: "
        if not full:
            copypasta += str(hours)
            if hours != 1:
                copypasta += " hours ago. "
            else:
                copypasta += " hour ago. "
        else:
            # split avoids a ValueError when diff has no microseconds part
            copypasta += str(diff).split('.')[0] + ' ago. '
        if redDict[user] != []:
            copypasta += "Last vote for [color=red][url=" + url+'.msg'+str(redDict[user][-1][1])+'#msg'+str(redDict[user][-1][1]) + "]"+redDict[user][-1][0] + "[/url][/color] "
        #copypasta += "Posts: "
        #for i in xrange(len(postDict[user])):
        #    copypasta += "[url=" +postDict[user][i]+"]["+str(i)+"][/url]"
        #copypasta += "\n"
for user in users:
    if 'Dead:' in  user and 'Nonplayer' not in user:
        print str(user) + ": <br />"
        print "<ul>"
        print "<li>Posts: "
        for i in xrange(len(postDict[user])):
            print "<a href='"+postDict[user][i][0]+"'>["+str(postDict[user][i][1])+"]</a>"
        print "</li>"
        print "</ul>"
        copypasta += "[b]" +str(user) + "[/b]: "
        #copypasta += "Posts: "
        #for i in xrange(len(postDict[user])):
        #    copypasta += "[url=" +postDict[user][i]+"]["+str(i)+"][/url]"
        #copypasta += "\n"
copypasta += "[/size][/font][hr]"
print "<br /> Copypasta for forums: <textarea cols='80' rows='5'>" + copypasta + '</textarea>'
print "<br /><a href='http://think0028.com/lurkertracker.py?"
formstring = ''
for i in form:
    if i != 'url' and i != 'postStart':
        formstring += str(i)+"="+str(form.getfirst(i))+"&"
ps = form.getfirst('postStart')
if not ps:
    ps = '0'
if '#' in ps:
    ps = ps[:ps.rindex('#')]
u = form.getfirst('url')
if '#' in u:
    u = u[:u.rindex('#')]
formstring = formstring + "postStart=" + ps + "&"
formstring = formstring + "url=" + u
print formstring
print "'>Permalink to this configuration</a>"
print '</body></html>'
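
`burnQuotes`, `findRed` and `findBlue` above walk the HTML one character at a time while counting nesting depth. A Python 3 sketch of the same idea on top of the standard library's `html.parser.HTMLParser`, which finds the tag boundaries; `PostScanner` and `scan_post` are hypothetical names, and it assumes colored text renders as `<span style="color: red">` as the script's marker implies. One difference: it tracks every open `<span>`, so an uncolored span nested inside a red one no longer ends the red text early.

```python
from html.parser import HTMLParser


class PostScanner(HTMLParser):
    """Track the last red and blue span text that sits outside any quote."""

    COLORS = ("red", "blue")

    def __init__(self):
        super().__init__()
        self.quote_depth = 0
        self.open_spans = []  # color of each open <span>, or None if uncolored
        self.current = {c: [] for c in self.COLORS}
        self.last = {c: "" for c in self.COLORS}

    def handle_starttag(self, tag, attrs):
        if tag == "blockquote":
            self.quote_depth += 1
        elif tag == "span":
            style = dict(attrs).get("style") or ""
            color = next((c for c in self.COLORS if "color: " + c in style), None)
            # A new outermost span of this color starts a fresh capture.
            if color and color not in self.open_spans and not self.quote_depth:
                self.current[color] = []
            self.open_spans.append(color)

    def handle_endtag(self, tag):
        if tag == "blockquote" and self.quote_depth:
            self.quote_depth -= 1
        elif tag == "span" and self.open_spans:
            color = self.open_spans.pop()
            # Closing the outermost span of a color records its text.
            if color and color not in self.open_spans and not self.quote_depth:
                self.last[color] = "".join(self.current[color]).strip()

    def handle_data(self, data):
        if self.quote_depth:
            return  # quoted votes are ignored, like burnQuotes
        for color in self.COLORS:
            if color in self.open_spans:
                self.current[color].append(data)


def scan_post(html):
    """Return (last red text, last blue text) for one post's HTML."""
    scanner = PostScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.last["red"], scanner.last["blue"]


sample = ('<blockquote>Quote <span style="color: red">Bob</span></blockquote>'
          'I agree. <span style="color: red">Alice</span> then '
          '<span style="color: red">Carol</span> and '
          '<span style="color: blue">kill Dave</span>')
print(scan_post(sample))  # ('Carol', 'kill Dave')
```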

Toaster · « **Reply #1 on:** June 10, 2011, 09:40:02 pm »

It bombs on Dethy.

Spoiler: Error

Bdthemag · « **Reply #2 on:** June 10, 2011, 09:42:53 pm »

Oh no! How will I procrastinate without being caught now!

Anyways this is pretty awesome and very useful.

webadict · « **Reply #3 on:** June 10, 2011, 09:43:00 pm »

For a second, I thought this was going to be about me.

Bdthemag · « **Reply #4 on:** June 10, 2011, 09:45:45 pm »

Anything created in the Mafia subforums will eventually be about Webadict.

Think0028 · « **Reply #5 on:** June 10, 2011, 09:48:43 pm »

Ah, the method to get the time currently does not work on locked threads. Let me fix that real quick.

Pandarsenic · « **Reply #6 on:** June 10, 2011, 10:30:17 pm »

Quote from: webadict on June 10, 2011, 09:43:00 pm

For a second, I thought this was going to be about me.

Srsly. I was like "But Wuba never lurks."

Think0028 · « **Reply #7 on:** June 11, 2011, 02:03:17 am »

Bug fixed! Lurker Tracker: Kill Webadict Now edition now works on locked threads as well, and now only accesses each page of a thread once. Further speed optimizations will need to somehow change how many posts I can get in one URL request.
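
Fetching each page once means building the page list up front. The list `getPages` walks in the source follows SMF's 15-replies-per-page offsets (`.0`, `.15`, `.30`, and so on). A Python 3 sketch of that arithmetic, assuming the same fixed page size; `page_urls` and its parameter names are hypothetical.

```python
POSTS_PER_PAGE = 15  # page size the script assumes


def page_urls(topic_url, last_page_offset, last_cached_reply=-1):
    """List the page URLs still worth fetching.

    topic_url is the thread URL without its '.N' suffix, last_page_offset is
    the offset in the last pagination link (0 for a one-page thread). Fully
    cached pages are skipped, but the page holding the last cached reply is
    fetched again because new replies may have landed on it.
    """
    first = max(0, POSTS_PER_PAGE * (last_cached_reply // POSTS_PER_PAGE))
    return ["%s.%d" % (topic_url, offset)
            for offset in range(first, last_page_offset + 1, POSTS_PER_PAGE)]


topic = "http://www.bay12forums.com/smf/index.php?topic=85000"  # hypothetical
print(page_urls(topic, 45))      # offsets .0, .15, .30, .45
print(page_urls(topic, 45, 32))  # offsets .30, .45
```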

Vector · « **Reply #8 on:** June 11, 2011, 09:24:02 am »

Quote from: Think0028 on June 11, 2011, 02:03:17 am

Bug fixed! Lurker Tracker: Kill Webadict Now edition now works on locked threads as well, and now only accesses each page of a thread once. Further speed optimizations will need to somehow change how many posts I can get in one URL request.

Can you employ a depaginator first?

webadict · « **Reply #9 on:** June 11, 2011, 10:33:06 am »

Bug? I used it just now on Third Party and I got this:

Quote

Powder Miner:

Posts:
[1] [2]

Last Vote:
Time since last posted -1 day, 23:58:16

-1 days seems... err... Yeah. Also, what is the second half supposed to mean? It sort of confuses me.

Also, having an easy copypasta method would be amazing.

JanusTwoface · « **Reply #10 on:** June 11, 2011, 10:44:27 am »

Quote from: Think0028 on June 11, 2011, 02:03:17 am

Bug fixed! Lurker Tracker: Kill Webadict Now edition now works on locked threads as well, and now only accesses each page of a thread once. Further speed optimizations will need to somehow change how many posts I can get in one URL request.

Does that include over multiple people accessing the tool? Because tools like this can really slam a server if they try to download all of a larger thread several times.

You could probably save the processed version of earlier pages somewhere (if you're not already), it's not like they'll change...
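
A minimal Python 3 sketch of that suggestion, assuming posts are already parsed into tuples: cache them in SQLite keyed by thread URL and post number, and use the highest cached reply index to decide which pages still need fetching. The table layout and function names are hypothetical; the `posts` table in the source above has no primary key.

```python
import sqlite3


def open_cache(path=":memory:"):
    """Open (or create) the post cache; pass a file path to persist it."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS posts (
               url TEXT, number INTEGER, reply INTEGER,
               author TEXT, time TEXT, post TEXT,
               PRIMARY KEY (url, number))""")
    return conn


def last_cached_reply(conn, url):
    """Highest reply index stored for this thread, or -1 if none."""
    (top,) = conn.execute(
        "SELECT MAX(reply) FROM posts WHERE url = ?", (url,)).fetchone()
    return -1 if top is None else top


def store_posts(conn, url, posts):
    """Save (number, reply, author, time, post) tuples for one thread.

    INSERT OR REPLACE keyed on (url, number) means re-fetching the last,
    partly cached page overwrites rows instead of duplicating them.
    """
    with conn:  # commits on success
        conn.executemany(
            "INSERT OR REPLACE INTO posts VALUES (?, ?, ?, ?, ?, ?)",
            [(url,) + tuple(p) for p in posts])


conn = open_cache()
thread = "http://www.bay12forums.com/smf/index.php?topic=85000"  # hypothetical
print(last_cached_reply(conn, thread))  # -1
store_posts(conn, thread, [(1001, 0, "Alice", "2011-06-10 20:00:00", "Hi"),
                           (1002, 1, "Bob", "2011-06-10 20:05:00", "Hey")])
store_posts(conn, thread, [(1002, 1, "Bob", "2011-06-10 20:05:00", "Hey (edited)")])
print(last_cached_reply(conn, thread))  # 1
print(conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0])  # 2
```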

Think0028 · « **Reply #11 on:** June 11, 2011, 12:51:11 pm »

Quote from: webadict on June 11, 2011, 10:33:06 am

Bug? I used it just now on Third Party and I got this:
Quote
Powder Miner:

Posts:
[1] [2]

Last Vote:
Time since last posted -1 day, 23:58:16

-1 days seems... err... Yeah. Also, what is the second half supposed to mean? It sort of confuses me.

Also, having an easy copypasta method would be amazing.

That's... weird. That's really weird. That suggests the post occurred 1 minute and 44 seconds ahead of server time... which shouldn't happen. Hrm. I'll do some tests.
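
Python's `timedelta` normalizes a negative duration so that only the day count is negative, which is why a gap of a couple of minutes prints as `-1 day, 23:58:16`. A quick Python 3 check of the numbers from the report above:

```python
from datetime import timedelta

# A post stamped 104 seconds after the server's "now" gives a negative gap.
diff = timedelta(seconds=-104)
print(diff)                  # -1 day, 23:58:16
print(diff.total_seconds())  # -104.0
print(abs(diff))             # 0:01:44
```

So the reported value means the post appeared 104 seconds in the future, consistent with a clock offset between the server and the forum.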

Copypasta is much easier and I can do that next.
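
A minimal Python 3 sketch of a copypasta builder, reusing the BBCode header, footer, and per-player wording from the source above; `copypasta_line` and `copypasta` are hypothetical names. One deliberate difference: it puts each player on its own line, which the script leaves commented out.

```python
def copypasta_line(user, hours, vote=None, vote_url=None):
    """One player's entry, using the BBCode tags the script emits."""
    unit = "hour" if hours == 1 else "hours"
    line = "[b]%s[/b]: Last posted: %d %s ago. " % (user, hours, unit)
    if vote and vote_url:
        line += "Last vote for [color=red][url=%s]%s[/url][/color] " % (vote_url, vote)
    return line.rstrip()


def copypasta(lines):
    """Wrap player entries in the header and footer the script uses."""
    return ("[hr][center][b][i]LurkerTracker[/i][/b][/center][hr]"
            "[font=courier][size=8pt]" + "\n".join(lines) + "[/size][/font][hr]")


# Hypothetical post link, for illustration only.
vote_url = "http://www.bay12forums.com/smf/index.php?topic=85000.msg1002#msg1002"
alice = copypasta_line("Alice", 1, "Bob", vote_url)
bob = copypasta_line("Bob", 5)
print(copypasta([alice, bob]))
```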

Currently I'm not saving earlier versions of a page, as I was worried about people editing their posts; then I stopped and hit myself in the head. That said, if I can grab the whole thread in a single page, I'm not sure how much I'd need to.

Think0028 · « **Reply #12 on:** June 11, 2011, 01:48:10 pm »

1) It now caches pages and does copypasta.
2) The time thing was a timezone error, now fixed.
3) What do you mean by not understanding the second half, webadict?

webadict · « **Reply #13 on:** June 11, 2011, 02:50:34 pm »

Quote from: Think0028 on June 11, 2011, 01:48:10 pm

1) It now caches pages and does copypasta.
2) The time thing was a timezone error, now fixed.
3) What do you mean by not understanding the second half, webadict?

The way the second half reads in the time since last posted.

I guess it shows how many hours, minutes, and seconds ago they last posted, but that much detail is unnecessary. Just follow the old style and say they last posted XX hours ago.
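
A minimal Python 3 sketch of the "XX hours ago" style requested here, using the same floor-to-whole-hours arithmetic and hour offset as the script's `TIMEDIFF` handling; `hours_ago` is a hypothetical name, and the right offset depends on the server's clock.

```python
from datetime import datetime, timedelta


def hours_ago(last_post, now, offset_hours=0):
    """Render the gap since last_post as whole hours, e.g. '3 hours ago.'

    offset_hours plays the role of the script's TIMEDIFF: the gap is
    now - last_post - offset_hours, floored to whole hours.
    """
    gap = now - last_post - timedelta(hours=offset_hours)
    hours = int(gap.total_seconds() // 3600)
    return "%d %s ago." % (hours, "hour" if hours == 1 else "hours")


post_time = datetime(2011, 6, 11, 12, 0)
print(hours_ago(post_time, datetime(2011, 6, 11, 14, 30)))  # 2 hours ago.
print(hours_ago(post_time, datetime(2011, 6, 11, 10, 30), offset_hours=-3))  # 1 hour ago.
```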

Think0028 · « **Reply #14 on:** June 11, 2011, 02:54:43 pm »

Ah, okay. I'll change that now.

Bay 12 Games Forum
