And now, a brief word from our sponsors...
Tired of lurkers lurking like lurkscum? Annoyed that your little brain can't keep track of people's whereabouts?
Fingers cramped from too much clicking looking for that elusive scummy post?
Never fear, The LurkerTracker® is here!
As I've received a couple of requests, I'm opening this topic to post the source of the LurkerTracker and the process for using it. Hopefully people will find it useful and even grow it into something better.
I - What is it?
The LurkerTracker is a Perl script that takes a file with one or more saved pages of forum posts and creates a summary of who posted what and when, with URLs, and how long it has been since their last post. If you haven't seen it, here's an example for the currently running VM4, so you know what it looks like:
LurkerTracker
GlyphGryph has posted 16 times; the last on: 2010/11/13 14:00, 52 hours ago. Posts: [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15][16]
IronyOwl has posted 17 times; the last on: 2010/11/13 18:26, 47 hours ago. Posts: [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15][16][17]
JanusTwoface has posted 21 times; the last on: 2010/11/13 17:05, 48 hours ago. Posts: [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20][21]
Ottofar has posted 3 times; the last on: 2010/11/13 03:01, 63 hours ago. Posts: [1][2][3]
Pandarsenic has posted 20 times; the last on: 2010/11/13 02:20, 63 hours ago. Posts: [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20]
Zathras has posted 2 times; the last on: 2010/11/13 01:50, 64 hours ago. Posts: [1][2]
webadict has posted 15 times; the last on: 2010/11/15 09:04, 8 hours ago. Posts: [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15]
*Updated* II - I can haz surz plz?
Sure. There are two parts: a small preprocessing script that works around my feeble regexp-fu by making post URLs easier to find, and the lurkertracker proper. They both read from standard input and write to standard output, so the typical use is to pipe them together like:
$ ./posturlfix.pl < saved_pages | ./lurkertracker.pl > summary.txt
Then just copy the contents of summary.txt and paste it into your post.
#!/usr/bin/perl
# Preprocessor for lurkertracker;
# move the post-url closer to the h5 header so it's easier to find.
#
# transforms these two lines:
# <h5 id="subject_1702622">
# <a href="http://www.bay12forums.com/[...]"
# into this one line:
# <h5 id="subject_1702622" url=http://www.bay12forums.com/[...]>
#
# see lurkertracker.pl for more details.
undef $/; # ignore newlines
while (<>) { # get whole file
    s/(<h5 id=".*?")>.*?<a href="(.*?)".*?>/$1 url=$2>/gs;
    print; # print file to STDOUT
}
#!/usr/bin/perl -w
#
# The LurkerTracker
#
# This script will read from STDIN a saved page (or concatenation of pages)
# of forum posts from Bay12, and produce a summary of who has posted what when,
# along with length of time since the last.
# Due to my feeble regexp-fu, a preprocessing step is needed: posturlfix.pl
#
# Typical usage:
# $ posturlfix.pl < saved_forum_pages | lurkertracker.pl > summary.txt
#
# Several pages can be concatenated together; the script does not care about
# mangled HTML or gaps.
# Beware of posts with "Today" in date, as those will be treated as localtime's
# current date. You should re-save those pages for later summaries when "Today"
# is replaced by the actual date. You should also adjust $TZOFF to match your
# local timezone offset to forum time in seconds.
#
# I hope people find this useful and not too cumbersome. Possible improvements
# abound, mostly on finding a way to do without the preprocessing, and getting
# the pages in real time. Hopefully it evolves into something better.
#
# Suggestions always welcome. Discussion thread and updated source is at:
# http://www.bay12forums.com/smf/index.php?topic=70728.0
#
# The author wishes no credit, and in fact wishes to distance himself from
# such shoddy code. All rights returned to the Public Domain.
use strict;
# Some warnings triggered for uninitialised values, first time uses,
# and otherwise unviable comparisons, but can be ignored if debug print
# shows correct data input.
use Time::Local; # For date/time conversions. May or may not be needed.
my @posts = ();
#( (Author, post-url, post#, date, h, m, am, timestamp), ...);
my %dudes = ();
#[Player => (latest-timestamp, count, url, url, url...), ...];
my %votes = ();
#[Player => (latest-red-text, url), ...]
my $prodtime = 129600; my $prod;
# $prodtime is 36hrs in seconds, edit for different prod rules.
# Players to be prodded will be appended to $prod
# if last post was more than $prodtime ago.
# Replace list for Paranormal 17; left as example.
# Be sure to remove or replace with that of your current game.
my %replace = ( OldPlayer    => "NewPlayer",
                PlayerA      => "PlayerA: dead, town",
                RandomPoster => "Nonplayer",
                Mephansteras => "Moderator" );
# just add AnyOneElse => "Nonplayer",
# and 'So and So' => "So and So: dead, role",
# or OldPlayer => "NewPlayer"
# Moderators should be specified to avoid asking them to prod themselves.
my @reds = (); # temporarily hold red text in post
my $i=-1; # index
my $TZOFF=7200; # timezone offset to forum time in seconds
my $DEBUG=0; # Change for debug-mode printout of ordered post list.
my %months = ( January => 1, February => 2, March => 3, April => 4,
May => 5, June => 6, July => 7, August => 8,
September => 9, October => 10, November => 11, December => 12 );
# I - Find the stuff
while (<STDIN>) {
    # find post author, and increment index
    /<h4><a href=.*>(.*)<\/a><\/h4>/ && ( $posts[++$i][0] =
        $replace{$1} ? $replace{$1} : $1 ); # Change to replacement if any.
    # find post url, as fixed by preprocessing script posturlfix.pl
    /<h5.*url=(.*?)>/ && ( $posts[$i][1] = $1 );
    # find post number, date and time.
    /<strong>Reply #(\d*) on:<\/strong> (.*?) (\d{2}):(\d{2}):\d{2} (am|pm)/ &&
        ( ($posts[$i][2], $posts[$i][3], $posts[$i][4], $posts[$i][5], $posts[$i][6]) =
          ($1, $2, $3, $4, $5) );
    # prevent signatures and quotes messing the red text finder
    next if /<div class="signature".*\/div>/;
    while ( /(.*)<blockquote.*?<\/blockquote>(.*)/ ) { $_ = $1 . $2 ; };
    # find red text, record it and the url in $votes{dude}
    @reds = /<span style="color: (?:red|\#ff0000);" class="bbc_color">(.*?)<\/span>/g;
    ( ($votes{$posts[$i][0]}[0], $votes{$posts[$i][0]}[1]) =
      ($reds[$#reds], $posts[$i][1]) ) if @reds;
};
# II - Print the stuff [DEBUG]
if ($DEBUG) { # Note: replacements have already happened.
    print "Author\tPost#\tDate\tURL\n";
    print "--------------------------\n";
    for $i (0 .. $#posts) {
        print $i+1, ": $posts[$i][0]\t#$posts[$i][2]\t$posts[$i][3] ";
        print "at $posts[$i][4]:$posts[$i][5] $posts[$i][6]\n$posts[$i][1]\n";
    };
    print "\n Total posts: ", $#posts+1, "\n";
};
# III - Clean up the stuff
# clean up html from red text
my $red;
foreach $red (keys %votes) {
    $votes{$red}[0] =~ s/&nbsp;//g; # strip non-breaking-space entities
    $votes{$red}[0] =~ s/<.*?>//g;  # strip any leftover tags
};
# clean up dates, populate dudes
my ($year, $month, $day, $h, $m, $ts);
for $i (0 .. $#posts) {
    # Fix dates
    # for $posts[i][j],
    # where i is index, j is 3: month, day, year; 4: h; 5: m; 6: am; 7: ts
    # split if in the form "Month day, year,"; warnings triggered if "Today"
    ($month, $day, $year) = split(/ /,$posts[$i][3]);
    $day =~ s/,//; # remove comma
    $month = $months{$month}-1; # make month a number, zero based
    $year =~ s/,//; # remove comma
    $year -= 1900; # make year 1900 based as per localtime
    # if "Today", use localtime's date for today instead (beware!)
    ($day, $month, $year)= (localtime)[3,4,5] if $posts[$i][3] =~ /Today/;
    # fix hour of the day to 24hrs format
    $h = $posts[$i][4];
    $h = 0 if ($posts[$i][6] eq "am" and $h==12);
    $h += ($posts[$i][6] eq "pm" and $h<12) ? 12 : 0;
    $m = $posts[$i][5];
    # create epoch-based timestamp in seconds
    $posts[$i][7] = timelocal(0, $m, $h, $day, $month, $year);
    # Populate dude summary
    # update timestamp; use latest of current or post timestamp.
    $dudes{$posts[$i][0]}[0] = ($posts[$i][7]>$dudes{$posts[$i][0]}[0])?
        $posts[$i][7]:$dudes{$posts[$i][0]}[0];
    # initialise index if needed; urls start at position 2
    $dudes{$posts[$i][0]}[1] = 1 if $dudes{$posts[$i][0]}[1] < 1;
    # add post url and increment index
    $dudes{$posts[$i][0]}[++$dudes{$posts[$i][0]}[1]] = $posts[$i][1];
};
# IV - Report the stuff
my $dude;
my $x; # unused scratch values
my %r; # report: [ alive => (list of alive), dead => (...), mod => (...) ]
my $s; # section of the report to use, "alive" by default
# build report in sections
foreach $dude ( sort keys %dudes ) {
    # What's up with dude? In which section of the report should dude go?
    $s = "alive";
    $s = "dead" if $dude =~ /dead/;
    $s = "mod" if $dude =~ /Moderator/ or $dude =~ /Nonplayer/;
    # get date from post timestamp
    ($x, $m, $h, $day, $month, $year, $x, $x, $x) = localtime($dudes{$dude}[0]);
    # print dude to report section ( i.e., $r{$s} )
    $r{$s} .= "[b]". $dude ."[/b] has [b]". ($dudes{$dude}[1]-1) ." posts[/b]; the last ";
    $r{$s} .= "[b]". int ((time + $TZOFF - $dudes{$dude}[0]) / 3600) ." hours ago.[/b] ";
    # dude's latest red text, if alive player
    $r{$s} .= "Red text: [color=red][url=". $votes{$dude}[1] ."]". $votes{$dude}[0]
        ."[/url][/color]. " if $s eq "alive";
    # list of urls
    $r{$s} .= "Posts: ";
    for $i (2 .. $dudes{$dude}[1]) {
        $r{$s} .= "[url=". $dudes{$dude}[$i] ."][". ($i-1) ."][/url]";
    };
    $r{$s} .= "\n";
    # Should dude be prodded? Only if alive player
    $prod .= $dude . ", " if (time + $TZOFF - $dudes{$dude}[0]) > $prodtime
        and ( $s eq "alive" );
};
# Print header, then the report in sections
print "[hr][center][b][i]LurkerTracker[/i][/b][/center][hr][font=courier][size=8pt]";
print $r{"alive"}, "\n", $r{"dead"}, "\n", $r{"mod"};
# V - Print deaths, replaces & prods.
# $replist skips corpses, mods, and kibitzers; corpses go to the $morgue.
my ($rep, $replist, $morgue);
foreach $rep ( keys %replace ) {
    $replist .= $rep . " => " . $replace{$rep} . "; " if
        not $replace{$rep} =~ /dead/
        and not $replace{$rep} =~ /Nonplayer/
        and not $replace{$rep} =~ /Moderator/ ;
    $morgue .= $replace{$rep} . "; " if $replace{$rep} =~ /dead/ ;
};
print "[hr][font=courier][size=8pt]" if $replist or $morgue or $prod;
print "[b]Replacements considered:[/b] ", $replist =~ /(.*); $/, ".\n" if $replist;
print "[b]The Morgue:[/b] ", $morgue =~ /(.*); $/, ".\n" if $morgue;
print "\n[b]Moderator: please prod ", $prod =~ /(.*), $/,
".[/b] They haven't posted in over ", $prodtime / 3600, " hours." if $prod;
# Print footer
print "[/size][/font][hr]\n";
# End
Update: this code now reflects the latest version, with
DreathplacementTracking, AutoProd, RedTextFinder, and sorting alive players first. See
here for what the new version looks like.
III - How do I use it? What are the caveats?
It is as easy as 1-2-3, just three little (cumbersome, but see point IV below for possible improvements) steps. At the moment, the process is as listed below. I'll use the running VM4 as an example, which will produce the summary at the start of this post.
1. Grab the pages. Go to the topic, and for each page since the game proper started (page one in this case, page four in P17, with the standard 15-posts-per-page setting), right click on the page and select "save page as...", with the HTML-only option. The file name will default to "index.php.html"; to this, I append the page number. I end up with index.php.html.1 through index.php.html.8 (eight pages in that game as of this writing; if your forum settings show more posts per page, this step will use fewer files; the script won't care either way).
Optional step: the game proper started with reply #13; if you don't want the "in" posts and other pre-game discussion to be included, simply edit the first file, search for "#13", and remove all text before the author line just above it:
<h4><a href="http://[...]" title="View the profile of Ottofar">Ottofar</a></h4>
It will get rid of about 850 lines of crap that the script doesn't care about. Be sure to leave the <h4> line, as that's the first thing we look for in the script.
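If hand-editing the file gets old, the trimming could be scripted. Here's a rough sketch, not part of the LurkerTracker itself: trim_before_reply is a made-up helper name, and it assumes each post's <h4> author line comes before its "Reply #N on:" line, which is what lurkertracker.pl already relies on. The sample lines are invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: keep only the lines from the <h4> author line of
# reply number $first onwards. Assumes the <h4> line of a post always
# appears before that post's "Reply #N on:" line.
sub trim_before_reply {
    my ($first, @lines) = @_;
    my $last_h4;                       # index of the most recent <h4> line
    for my $n (0 .. $#lines) {
        $last_h4 = $n if $lines[$n] =~ /<h4><a href=/;
        if ($lines[$n] =~ /Reply #\Q$first\E on:/ and defined $last_h4) {
            return @lines[$last_h4 .. $#lines];
        }
    }
    return @lines;                     # reply not found: leave input untouched
}

# Tiny invented sample standing in for a saved page.
my @page = (
    "<html>lots of header lines\n",
    "<h4><a href=\"x\" title=\"View the profile of A\">A</a></h4>\n",
    "<strong>Reply #12 on:</strong> November 07, 2010, 01:00:00 pm\n",
    "<h4><a href=\"y\" title=\"View the profile of Ottofar\">Ottofar</a></h4>\n",
    "<strong>Reply #13 on:</strong> November 07, 2010, 01:11:00 pm\n",
);
print trim_before_reply(13, @page);
```

To use it on a real saved page, feed it the lines read from the file instead of the sample list.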
2. Concatenate your pages into one large file. How you do this depends on your operating system. On Linux, one easy way is:
$ for i in 1 2 3 4 5 6 7 8 ; do cat index.php.html.$i >> vm4_pages ; done
You now have one file, vm4_pages, with all the posts in the game.
3. Now run the preprocessor, and feed the output to lurkertracker, and the output of that to your summary text file. A simple way to do this is piping them together, but you can also use intermediate files.
$ ./posturlfix.pl < vm4_pages | ./lurkertracker.pl > vm4_summary.txt
...and that's it. vm4_summary.txt will look like the report at the top of the page. In fact, steps two and three could be combined by just feeding "index.php.html.*" to the preprocessor, but the intermediate file lets you pause and make sure you have your ducks in a row.
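One catch with the wildcard shortcut: the shell sorts file names as text, so once a game reaches ten pages, index.php.html.10 gets fed in before index.php.html.2 and the post numbering in the summary comes out shuffled. A small sketch of a concatenator that orders pages by their numeric suffix instead; by_page_number is a made-up helper name, not something in the scripts above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: order saved pages by their numeric suffix, so that
# index.php.html.10 comes after index.php.html.9 rather than after .1.
# Files without a numeric suffix are skipped.
sub by_page_number {
    my @files = @_;
    my %num;
    for my $f (@files) { ($num{$f}) = $f =~ /\.(\d+)$/; }
    return sort { $num{$a} <=> $num{$b} } grep { defined $num{$_} } @files;
}

# Concatenate the pages named on the command line to STDOUT, in page order.
for my $file (by_page_number(@ARGV)) {
    open my $in, '<', $file or die "Cannot open $file: $!";
    print while <$in>;
    close $in;
}
```

Saved as, say, catpages.pl (the name is just an example), it would slot in front of the pipeline:
$ ./catpages.pl index.php.html.* | ./posturlfix.pl | ./lurkertracker.pl > vm4_summary.txt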
As for caveats or other points, a couple:
- Beware of "Today": the forum will use "Today" instead of the current date for posts less than 24hrs old. The script recognises this and changes it to your current local date. The flipside of this is that if you run the report a couple of days later, it will change it to today's today, not to a couple of days ago's today. So if you wish to keep current, be sure to re-save those pages after their proper date shows instead of "Today."
- Timezone: post times are shown in the forum's local time. For the "last post was so and so ago" part, the script needs the offset between your local time and forum time, in seconds. The variable $TZOFF holds this; it is currently set to 7200, as I'm two hours earlier than the forum, but be sure to change it to match your locale.
- Replacements: the top section, where variables are declared, includes the death/replacement tracking. This should be updated to reflect your current game. The format is pretty straightforward: "oldplayer" => "newplayer" for replacements, "player" => "player: dead, roleflip" for deaths. If confusing, all the entries can be removed (leaving an empty hash) with no impact to the result, but of course deaths/replacements won't be tracked then.
- Prods: there's a variable called $prodtime now, which holds the time from last post needed for a prod to be printed for the player; it needs to be in seconds, and is currently set at 36 hours. Edit if your prodding rules are different, or just add a couple of zeroes if you want no prods issued.
- Debug: the script includes a "debug print" mode. If the variable $DEBUG is changed to non-zero, it will produce a list of all posts it found, with author, reply number, forum post date and URL. This will make it easier for you to see if it's using the correct data. Sample debug output follows.
Author Post# Date URL
--------------------------
0: Ottofar #13 November 07, 2010, at 01:11 pm
http://www.bay12forums.com/smf/index.php?topic=69398.msg1697678#msg1697678
1: GlyphGryph #14 November 07, 2010, at 01:24 pm
http://www.bay12forums.com/smf/index.php?topic=69398.msg1697713#msg1697713
2: JanusTwoface #15 November 07, 2010, at 01:32 pm
http://www.bay12forums.com/smf/index.php?topic=69398.msg1697731#msg1697731
3: IronyOwl #16 November 07, 2010, at 02:51 pm
http://www.bay12forums.com/smf/index.php?topic=69398.msg1697917#msg1697917
4: webadict #17 November 07, 2010, at 03:51 pm
http://www.bay12forums.com/smf/index.php?topic=69398.msg1698111#msg1698111
5: webadict #18 November 07, 2010, at 03:57 pm
http://www.bay12forums.com/smf/index.php?topic=69398.msg1698136#msg1698136
6: Pandarsenic #19 November 07, 2010, at 05:06 pm
http://www.bay12forums.com/smf/index.php?topic=69398.msg1698358#msg1698358
7: IronyOwl #20 November 07, 2010, at 05:11 pm
http://www.bay12forums.com/smf/index.php?topic=69398.msg1698375#msg1698375
8: JanusTwoface #21 November 07, 2010, at 05:17 pm
http://www.bay12forums.com/smf/index.php?topic=69398.msg1698384#msg1698384
9: IronyOwl #22 November 07, 2010, at 05:20 pm
http://www.bay12forums.com/smf/index.php?topic=69398.msg1698393#msg1698393
10: JanusTwoface #23 November 07, 2010, at 05:30 pm
http://www.bay12forums.com/smf/index.php?topic=69398.msg1698423#msg1698423
[...trimmed...]
90: GlyphGryph #103 November 13, 2010, at 02:00 pm
http://www.bay12forums.com/smf/index.php?topic=69398.msg1713691#msg1713691
91: JanusTwoface #104 November 13, 2010, at 05:05 pm
http://www.bay12forums.com/smf/index.php?topic=69398.msg1714240#msg1714240
92: IronyOwl #105 November 13, 2010, at 06:26 pm
http://www.bay12forums.com/smf/index.php?topic=69398.msg1714565#msg1714565
93: webadict #106 <strong>Today</strong> at at 09:04 am
http://www.bay12forums.com/smf/index.php?topic=69398.msg1719983#msg1719983
Total posts: 94
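Going back to the "Today", timezone and prod caveats above, the arithmetic is easier to follow in isolation. This is a stripped-down sketch of the same idea as the script, not a copy of it: to_24h, post_timestamp and hours_since are made-up names, and "now" is passed in explicitly so the example gives the same answer whenever it is run.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local;   # timelocal(): broken-down local time to epoch seconds

my %months = ( January => 1, February => 2, March => 3, April => 4,
               May => 5, June => 6, July => 7, August => 8,
               September => 9, October => 10, November => 11, December => 12 );

# Convert a 12-hour clock reading to 24-hour: 12 am is 0, 1 pm is 13.
sub to_24h {
    my ($h, $ampm) = @_;
    return 0       if $ampm eq 'am' and $h == 12;
    return $h + 12 if $ampm eq 'pm' and $h < 12;
    return $h;
}

# Timestamp for a forum date such as "November 13, 2010," or, for recent
# posts, a string containing "Today" (which falls back to the local date,
# hence the caveat about re-saving pages).
sub post_timestamp {
    my ($date, $h, $m, $ampm) = @_;
    my ($day, $month, $year);
    if ($date =~ /Today/) {
        ($day, $month, $year) = (localtime)[3, 4, 5];
        $year += 1900;
    } else {
        my ($mname, $d, $y) = $date =~ /(\w+) (\d+), (\d{4})/;
        ($day, $month, $year) = ($d, $months{$mname} - 1, $y);
    }
    return timelocal(0, $m, to_24h($h, $ampm), $day, $month, $year);
}

# Whole hours between a post and "now", shifted by the timezone offset.
sub hours_since {
    my ($ts, $now, $tzoff) = @_;
    return int(($now + $tzoff - $ts) / 3600);
}

my $tzoff    = 7200;                 # same role as $TZOFF
my $prodtime = 129600;               # 36 hours, same role as $prodtime
my $ts  = post_timestamp('November 13, 2010,', '02', '00', 'pm');
my $now = $ts + 50 * 3600;           # pretend the report runs 50 hours later
print "last post ", hours_since($ts, $now, $tzoff), " hours ago\n";
print "prod needed\n" if $now + $tzoff - $ts > $prodtime;
```

With the two-hour offset added to the pretend 50-hour gap, the post counts as 52 hours old, which is past the 36-hour prod limit.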
IV - It sucks! I can do better!
Please do! My Perl is ten years rusty and my regexp-fu feeble; if people find it useful I'm sure many regulars can improve or rewrite it to make it much, much better. It could probably grab the forum pages dynamically instead of having to save them to a file first, be cleaner/smarter/faster, or have a number of new features. Knock yourselves out.
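As a starting point for one of those improvements, here's a rough sketch of finding post URLs without the preprocessing step. It assumes the saved pages keep the two-line layout described in the posturlfix.pl comments (an <h5 id="subject_..."> line, then a line with the post's <a href="...">); post_urls is a made-up name and the sample lines are invented.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: collect post URLs line by line, without posturlfix.pl.
# After an <h5 id="subject_..."> line, the first <a href="..."> that follows
# is taken as that post's URL; links seen elsewhere are ignored.
sub post_urls {
    my @lines = @_;
    my @urls;
    my $in_h5 = 0;                       # true between an <h5> and its link
    for my $line (@lines) {
        $in_h5 = 1 if $line =~ /<h5 id="subject_\d+"/;
        if ($in_h5 and $line =~ /<a href="(.*?)"/) {
            push @urls, $1;
            $in_h5 = 0;
        }
    }
    return @urls;
}

# Invented sample in the layout the preprocessor's comments describe.
my @page = (
    "<h4><a href=\"http://example.invalid/profile\">Ottofar</a></h4>\n",
    "<h5 id=\"subject_1702622\">\n",
    "<a href=\"http://example.invalid/post1\" rel=\"nofollow\">Re: game</a>\n",
    "<h5 id=\"subject_1702623\">\n",
    "<a href=\"http://example.invalid/post2\" rel=\"nofollow\">Re: game</a>\n",
);
print "$_\n" for post_urls(@page);
```

In the main script, the same flag could replace the /<h5.*url=(.*?)>/ match inside the reading loop, so the pages could be piped straight in.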