So there is definite utility to getting at online data like this. But there's definitely a lot of potential for abuse, especially this notion of "let us collect EVERYTHING and then we'll only go back and mine it after the fact". I've heard some pundits from the intel community pointing to the Tsarnaevs as proof of this and saying that they were able to go look at their Facebook, email, etc. and quickly determine their links. To which my response is: Big fucking whoop. Didn't do much good to stop them in Boston. And it still took locking down an entire major American city for two days and an old-fashioned manhunt to track them down. It's not Minority Report. No amount of intel is going to let you predict and prevent attacks because the level of detail you'd need would be unacceptably intrusive, and you'd be completely buried in that same level of detail for the 99.99% of the populace who AREN'T a threat.
This bit actually isn't really true. Case in point: Watson. It isn't people sifting through the data that you should worry about. They don't, and that's how they stay technically legal. "No one is listening to your phone calls" is an accurate statement, because listening would be inefficient in the highest degree. Computer programs are what deal with this data. It's a Big Data project, which is extremely evident both from how the NSA has been talking about it and from how the information is collected and handled. This data is quite clearly being used as training sets for learning algorithms, which, again, is in line with what the NSA has stated.
The program is almost certainly as large as or larger than Watson, which itself used a bit over 400 learning algorithms in combination with a meta-self-analysis of how those algorithms performed compared to one another in a variety of self-defined categories. If IBM did that as a small side project, you can damn well bet the NSA has a much better one for surveillance. After all, unlike at IBM, such a project falls smack dab in the middle of their mission goals.
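To make "many algorithms plus a meta-analysis of how they perform" concrete, here's a toy sketch. It is not Watson's or anyone's actual design; the names `accuracy_weights` and `combined_score` and the two silly scorers are made up for illustration. Each scorer returns a confidence between 0 and 1, each one is weighted by how often it was right on a small labelled set, and the final answer is the weighted average.

```python
from typing import Callable, Dict, List, Tuple

# A scorer maps an input to a confidence in [0, 1] that the input is a "hit".
Scorer = Callable[[str], float]


def accuracy_weights(
    scorers: Dict[str, Scorer], labelled: List[Tuple[str, bool]]
) -> Dict[str, float]:
    """Weight each scorer by its accuracy on labelled examples (threshold 0.5)."""
    weights = {}
    for name, scorer in scorers.items():
        correct = sum((scorer(x) >= 0.5) == label for x, label in labelled)
        weights[name] = correct / len(labelled)
    return weights


def combined_score(
    scorers: Dict[str, Scorer], weights: Dict[str, float], item: str
) -> float:
    """Weighted average of all scorers' confidences for one item."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    return sum(weights[n] * s(item) for n, s in scorers.items()) / total


# Hypothetical scorers and data, purely for demonstration.
scorers = {
    "long": lambda s: 1.0 if len(s) > 5 else 0.0,
    "has_x": lambda s: 1.0 if "x" in s else 0.0,
}
labelled = [("abcdefg", True), ("abc", False), ("x", False), ("xyzxyz1", True)]

weights = accuracy_weights(scorers, labelled)
score = combined_score(scorers, weights, "x")
```

The scorer that did better on the labelled data gets more say in the final score. Scale that idea up to hundreds of algorithms and per-category weights and you get the general shape of what I'm describing.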
Then there's a question of capability: what is it able to do, and what is its purpose? Quite frankly, I actually believe the NSA's answer that it is using this data for investigating 'foreign threats.' The reason I believe it is that the data they are looking at would be very useful for such a purpose, and for the reason they have stated (as a control group). In training such learning algorithms, that metadata would be very useful. A pattern recognition system capable of picking out such threats would be very low-hanging fruit indeed, and if they didn't have such a system, I would call them pants-on-head incompetent. Given that and the agency's current primary mission goals, I would say that yes, they do have a very powerful system for detecting foreign-based groups planning 'events' in the US.
I would also say they probably don't have a system for accurately detecting domestic threats -- yet. However, that is the next low-hanging fruit. If they don't have one 5 years from now, color me shocked. And when I say domestic threats, I mean both violent and activist groups. Lone wolves who keep to themselves would be a good deal harder to detect. That isn't to say they don't have some sort of system(s) for detecting domestic threats today; they do. But those systems are likely still quite primitive, requiring a rather large amount of manual labor to investigate leads. By the time the next president is sworn in, pretty much all the systems for a largely automated surveillance state will probably be in place. And that is what I'm more worried about.