A clock speed of 3200 MHz means you get 3.2 billion operations in a second
Sort of. The DDR (double data rate) frequency quoted is twice the actual clock speed, because it transfers data twice per clock cycle, so DDR-3200 is really 1.6 billion clock cycles per second. But the actual operations (activate a row, read a column, etc.) take several clock cycles each; let's say 15, because that's what mine takes. That's about 10 ns of latency just for the RAM chip to respond, which works out to around 33 clock cycles of my CPU unless I'm mistaken. So I'm thinking halving the memory clock would double the inherent latency on the chip to a full 66 CPU clock cycles.
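As a sanity check, here's that arithmetic as a quick sketch. The DDR-3200 and CL15 figures are the example values from above; the ~3.5 GHz CPU clock is an assumption chosen to match the "~33 cycles" estimate.

```python
# Back-of-envelope check of the latency numbers above.
# All figures are the example values from the post, not universal constants.
ddr_transfer_rate = 3200e6              # "DDR-3200": transfers per second
mem_clock = ddr_transfer_rate / 2       # actual clock: 1.6 GHz
cas_cycles = 15                         # example CAS latency (CL15)
cas_ns = cas_cycles / mem_clock * 1e9   # ~9.4 ns for the chip to respond

cpu_clock = 3.5e9                       # assumed ~3.5 GHz CPU
cpu_cycles_waited = cas_ns * cpu_clock / 1e9   # ~33 CPU cycles

# Halving the memory clock doubles the CAS delay in wall-clock time:
cas_ns_halved = cas_cycles / (mem_clock / 2) * 1e9   # ~18.75 ns

print(round(cas_ns, 2), round(cpu_cycles_waited), round(cas_ns_halved, 2))
```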
But anyway, neither the RAM chip's latency nor the delay waiting for the signal to travel through the little wires from RAM to CPU would be increased by underclocking the CPU, so I don't see how a highly memory-bound workload would show such a direct correlation with CPU clock speed.
Right, sorry: 3.2 billion distinct opportunities per second to issue or return an operation, which is not the same thing as operations per second. My bad.
But we're talking like 100 ns for a round trip here. It's not "one hundred cycles" as Putnam threw out as a random example, and it's certainly not ~30 cycles as in your example; it's MULTIPLE HUNDREDS of cycles.
Op time and data transfer windows are tiny fractions of the total time the CPU spends waiting. Nobody talks about this part because there isn't dick anyone can do about it: we haven't managed to make signal propagation any faster over those kinds of distances. But when you hear things like "accessing data in RAM is an order of magnitude slower than accessing data in the L2 cache," this is what they're talking about. Adjusting the RAM clock makes a fractional-percent difference here.
That kind of small optimization -really- adds up when you're performing tens of thousands of operations in series, and I'm not saying RAM timings don't matter for performance. But the moment each operation depends on the previous one completing, it becomes a very meager difference. Not all memory I/O problems are created equal, hence my talking about 'latency' vs. 'bandwidth' in network terms.
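A toy model makes the latency vs. bandwidth split concrete. The numbers here (100 ns round trip, one overlapped request every 2 ns) are illustrative assumptions, not measurements: a dependent chain pays the full round trip for every access, while independent requests can be kept in flight simultaneously.

```python
# Toy model: N memory requests, each with a 100 ns round-trip latency.
# All figures are illustrative assumptions, not measurements.
latency_ns = 100
n = 10_000

# Dependent chain (pointer chasing): each request must wait for the
# previous result, so round-trip latencies add up serially.
serial_ns = n * latency_ns                 # 1,000,000 ns total

# Independent requests: the memory controller can keep many in flight,
# so throughput is bandwidth-limited, say one request retired every 2 ns
# after the first one lands.
pipelined_ns = latency_ns + (n - 1) * 2    # ~20,000 ns total

print(serial_ns, pipelined_ns, round(serial_ns / pipelined_ns, 1))
```

The same number of accesses differs by roughly 50x depending on whether they depend on each other, which is why timing tweaks that shave a few cycles off each access barely register on the serial case.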
Here's how this relates to the CPU / RAM timing test:
Decrease CPU clock speed - program moves slower. But you have introduced other opportunities for a bottleneck: a program that was not CPU bound before can now become CPU bound.
We have proven nothing with this test, save that there is a point at which CPU speed can be a limiting factor for DF performance, but this shouldn't really be a surprise to anyone.
Turn the CPU clock speed back up, and adjust RAM timings - program stays the same speed. But as we've established, if the RAM I/O problem you are having is one of data latency, and not one of bandwidth, adjusting the timings will have near-zero effect.
We have also proven nothing with this test, other than that our problem probably wasn't one of RAM bandwidth.
Hence, we have not proven that the program is CPU bound or memory bound; those observations are consistent with a scenario in which your performance problems are a result of conditional execution against data in RAM. Given that Putnam has also talked a bit about how conditionals on data in RAM are a big problem for DF performance, I'm inclined to take that as the explanation until evidence to the contrary is presented.
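For concreteness, the shape of workload being described looks something like this pointer-chasing sketch. The table and traversal are entirely hypothetical, but they show why neither RAM timings nor CPU clock helps much here: every branch tests data that was just loaded, and the next address is unknown until that load returns, so nothing can be prefetched or overlapped.

```python
import random

# Hypothetical sketch of a latency-bound workload with conditionals on
# data in RAM. Each step loads a value from an unpredictable location,
# branches on it, and uses it as the next address; the CPU cannot
# prefetch or overlap these loads because each address depends on the
# result of the previous load.
random.seed(0)
n = 1 << 16
table = [random.randrange(n) for _ in range(n)]

def chase(steps):
    idx = 0
    acc = 0
    for _ in range(steps):
        val = table[idx]   # address of this load came from the last load
        if val % 2:        # conditional on just-loaded data
            acc += 1
        idx = val          # next address known only after this load lands
    return acc

result = chase(10_000)
print(result)
```

In a compiled program each `table[idx]` would typically be a cache miss, so the whole loop runs at roughly one round-trip latency per iteration regardless of how fast the CPU is clocked.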