Oh jeez, I missed a few months.
General Update: No major news on this front. There was an unexpected spike in new topics on the 27th of March, along with an odd increase in online members around that time. Any ideas as to what that could be? Did the forums get hit by spammers?
In other news, I took a class on how to run statistical regressions and model building, so:
I'm proud to reveal the first model of B12 Posting Trends! Based on a regression of daily post counts from 1/1/2015 to 12/31/2016, the equation for how many posts are expected on a given day is:
Posts = 2506.71 + 2.18*(Days since last update)
  - 283.18*February - 576.60*March - 615.83*April - 676.63*May - 762.90*June - 928.52*July - 1077.16*August - 1415.67*September - 1341.61*October - 1552.25*November - 1115.57*December
  + 120.98*Monday - 101.43*Saturday - 11.37*Sunday + 65.69*Thursday + 121.15*Tuesday + 96.09*Wednesday
R² = 0.5074
All variables except Sunday, Thursday, and Wednesday are statistically significant at the 95% level (Wednesday just misses, at p = 0.056).
So what the actual fuck did you just read? I've created a statistical model to determine what factors affected the number of posts on a given day. I looked at a lot of factors, including ban records, update history, day of the week, and a dozen other variables that the math decided weren't significant. I settled on three variables that together explain 50.74% of the variation in this data set:
1. Days since the last update.
2. Month of the year.
3. Day of the week.
The last two are broken up into what are called fixed effects, or binary (dummy) variables: it is either Monday (Monday=1), or it is not (Monday=0). This is why there are 17 extra variables sitting around; each one is 1 or 0 depending on the day or month you are trying to look at. You may also notice that January and Friday are omitted. This is because Stata, the program I am using, drops one category from each set to serve as the baseline, and every other coefficient in that set is measured relative to it. Basically, if you're looking at a Friday in January, just look at the constant term plus the days-since-update term, and ignore the month/weekday variables.
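To make the 1-or-0 business concrete, here is a rough Python sketch of plugging a single day into the equation. The coefficients are copied from the equation in this post; the function name `expected_posts` and the example inputs (30 days since an update, a Tuesday in March) are made up for illustration.

```python
# Coefficients copied from the regression equation in this post.
CONSTANT = 2506.71   # baseline: a Friday in January
DAYS_COEF = 2.18     # per day since the last update

MONTH_COEF = {
    "January": 0.0,  # omitted baseline month
    "February": -283.18, "March": -576.60, "April": -615.83,
    "May": -676.63, "June": -762.90, "July": -928.52,
    "August": -1077.16, "September": -1415.67, "October": -1341.61,
    "November": -1552.25, "December": -1115.57,
}

WEEKDAY_COEF = {
    "Friday": 0.0,  # omitted baseline weekday
    "Monday": 120.98, "Saturday": -101.43, "Sunday": -11.37,
    "Thursday": 65.69, "Tuesday": 121.15, "Wednesday": 96.09,
}


def expected_posts(days_since_update, month, weekday):
    """Hypothetical helper: plug one day into the fitted equation.

    Only the matching month and weekday dummies equal 1, so only
    their coefficients get added; all the others contribute 0.
    """
    return (CONSTANT
            + DAYS_COEF * days_since_update
            + MONTH_COEF[month]
            + WEEKDAY_COEF[weekday])


# Baseline case: Friday in January, update released that day.
print(round(expected_posts(0, "January", "Friday"), 2))

# Made-up example: a Tuesday in March, 30 days after an update.
print(round(expected_posts(30, "March", "Tuesday"), 2))
```

The first call is just the constant term (2506.71). The second works out to 2506.71 + 2.18*30 - 576.60 + 121.15 = 2116.66 expected posts.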
THIS DOES NOT PREDICT THE RIGHT NUMBER WITH 50% ACCURACY!!! R² is a measure of how much of the variance in a data set a regression explains. If I add a very significant variable that explains a lot of the variation, R² will increase sharply. This model isn't a great one, because there are so many variables that I can't actually measure and include.
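If you want to see where that number comes from, here is a small Python sketch that recomputes it from the sums of squares in the Stata output included in this post. The variable names are my own; the formulas are the standard textbook definitions, not anything specific to Stata.

```python
# Values copied from the Stata output in this post.
SS_RESIDUAL = 93397910.4   # variation the model does NOT explain
SS_TOTAL = 189596736.0     # total variation in daily post counts
N_OBS = 731                # days in the sample
N_PREDICTORS = 18          # days + 11 month dummies + 6 weekday dummies

# R-squared: share of the total variation that the model explains.
r_squared = 1 - SS_RESIDUAL / SS_TOTAL

# Adjusted R-squared: penalizes R-squared for the number of predictors.
residual_df = N_OBS - N_PREDICTORS - 1
adj_r_squared = 1 - (1 - r_squared) * (N_OBS - 1) / residual_df

# Root MSE: roughly the typical size of the model's miss, in posts per day.
root_mse = (SS_RESIDUAL / residual_df) ** 0.5

print(round(r_squared, 4))
print(round(adj_r_squared, 4))
print(round(root_mse, 2))
```

These come out to 0.5074, 0.4949, and 362.18, matching the table. The Root MSE is the more useful number for the "accuracy" question: on a typical day the model is off by a few hundred posts.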
I also came up with a model for monthly totals, but it has poor predictive power, so I'm not going to talk about it until I've come up with a better one.
xi: reg posts days i.month i.week
i.month           _Imonth_1-12        (naturally coded; _Imonth_1 omitted)
i.week            _Iweek_1-7          (_Iweek_1 for week==Friday omitted)

      Source |         SS     df          MS      Number of obs =     731
-------------+------------------------------      F(18, 712)    =   40.74
       Model | 96198825.3     18  5344379.19      Prob > F      =  0.0000
    Residual | 93397910.4    712  131176.841      R-squared     =  0.5074
-------------+------------------------------      Adj R-squared =  0.4949
       Total |  189596736    730  259721.556      Root MSE      =  362.18

------------------------------------------------------------------------------
       posts |      Coef.  Std. Err.        t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
        days |   2.177757   .1950183    11.17    0.000    1.794877    2.560637
   _Imonth_2 |  -283.1795   66.53974    -4.26    0.000   -413.8171    -152.542
   _Imonth_3 |  -576.6013   65.56843    -8.79    0.000   -705.3319   -447.8708
   _Imonth_4 |  -615.8251   67.04985    -9.18    0.000   -747.4642    -484.186
   _Imonth_5 |  -676.6282   66.33213   -10.20    0.000   -806.8582   -546.3982
   _Imonth_6 |  -762.9003    67.2682   -11.34    0.000    -894.968   -630.8325
   _Imonth_7 |  -928.5204   67.41095   -13.77    0.000   -1060.868   -796.1724
   _Imonth_8 |  -1077.156    69.2055   -15.56    0.000   -1213.027   -941.2848
   _Imonth_9 |  -1415.672   71.93065   -19.68    0.000   -1556.893    -1274.45
  _Imonth_10 |  -1341.613   74.06899   -18.11    0.000   -1487.033   -1196.193
  _Imonth_11 |  -1552.247   77.57988   -20.01    0.000    -1704.56   -1399.934
  _Imonth_12 |  -1115.574   66.61788   -16.75    0.000   -1246.365   -984.7826
    _Iweek_2 |   120.9837   50.14381     2.41    0.016    22.53629    219.4311
    _Iweek_3 |  -101.4252   49.99382    -2.03    0.043   -199.5781   -3.272237
    _Iweek_4 |  -11.36799   50.12367    -0.23    0.821   -109.7759    87.03989
    _Iweek_5 |   65.69134   49.99832     1.31    0.189   -32.47042    163.8531
    _Iweek_6 |   121.1508   50.14273     2.42    0.016    22.70554    219.5961
    _Iweek_7 |   96.08893   50.13079     1.92    0.056   -2.332919    194.5108
       _cons |    2506.71   56.12025    44.67    0.000    2396.529    2616.891
------------------------------------------------------------------------------
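For anyone wondering how to read the t column, here is a quick Python sketch that recomputes it for the weekday rows as coefficient divided by standard error, then flags anything that falls short of the 95% cutoff. The weekday names are matched to `_Iweek_2` through `_Iweek_7` using the coefficients in the equation from this post. The cutoff of 1.96 is the usual large-sample approximation and is my assumption here; Stata works from the exact t distribution, so treat this as a sanity check rather than a replacement for the P>|t| column.

```python
# (weekday, coefficient, standard error) copied from the _Iweek_* rows above.
WEEKDAY_ROWS = [
    ("Monday", 120.9837, 50.14381),
    ("Saturday", -101.4252, 49.99382),
    ("Sunday", -11.36799, 50.12367),
    ("Thursday", 65.69134, 49.99832),
    ("Tuesday", 121.1508, 50.14273),
    ("Wednesday", 96.08893, 50.13079),
]

# Approximate two-sided 95% cutoff for a large sample (an assumption).
CRITICAL_T = 1.96

# t statistic = coefficient / standard error.
t_stats = {name: coef / se for name, coef, se in WEEKDAY_ROWS}

# A coefficient is not significant at the 95% level if |t| is below the cutoff.
not_significant = [name for name, t in t_stats.items() if abs(t) < CRITICAL_T]

for name, t in t_stats.items():
    print(f"{name:<10} t = {t:6.2f}")
print("Not significant at 95%:", not_significant)
```

Every month row and the days term has a |t| far above the cutoff, so only the weekday rows are worth checking this way.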
If you want to learn more, shoot me a PM, and I can walk you through it. Alternatively, you can check out the Excel sheets, Sandboxes 5 & 6, where I set up the models. I can also send over the raw Stata files if you want to take a look at them.