Actually, there is a barrier for data structures where increasing their size will reduce performance. Once a data structure can no longer fit in available cache space, it takes hundreds of CPU cycles to retrieve it from main memory. It's called cache thrashing, and it's probably the biggest reason why AMD chips are less powerful than Intel: they just have smaller cache.
a) A number of AMDs have less cache than a number of Intels, more like. Unless Intel's chucked all the budget/low-cache systems recently and AMD hasn't bothered with top-end variants, but I haven't cared to check the current situation in detail recently, so I could be wrong and my real comment is:
b) The time taken to transfer data (raw data[1] blocks, not data structures) is limited by bandwidth to the processor, which has been a tighter bottleneck than cache size on every system I've worked with that has a cache that's anything more than just a bus-width register.
But I am a bit out of date with modern architectures and methodologies, so I suppose someone could have developed an efficient sub-caching system on a chip with an ultra-wide and/or ultra-fast pipeline into it, or even managed an L2/L1 staging system for data (and subsets of data) of differing priorities. And of course taking into account multi-core requirements where applicable.
Personally, based upon older phenomena such as "pagefile thrashing" and similar (except, I suppose, in reverse), I would have used the term "cache thrashing" to mean this: having loaded one or more complete sets of data into the cache, the computations then call for another bit of data, displacing data that (unbeknownst to the caching algorithm) is subsequently needed again by the new data; that in turn displaces another cache item, which (once back) facilitates computations that ask for the old item again, and so on. Whole pages (or the cache-level equivalent) of data get shoved in and out of the quick-but-small storage area (cache for memory, memory for pagefile), taking more time than it would have if the large-but-slow memory had just been read in situ in the first place.
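To illustrate, here's a minimal C sketch of that pattern. The sizes and pass counts are assumptions picked purely for illustration, not measurements of any particular chip; the point is only that both runs do the same total amount of work, but the too-big working set keeps evicting the lines it is about to ask for again, so every pass goes back out to main memory.

```c
/* Minimal sketch of the "cache thrashing" pattern described above.
   Sizes are illustrative assumptions; real cache sizes vary by chip. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Repeatedly sum a working set of 'n' ints.  If n * sizeof(int) fits in
   cache, every pass after the first is served from cache; if it is even
   slightly too big, each pass evicts the lines the next pass needs, so
   every pass pays main-memory latency again. */
static double repeat_sum(const int *data, size_t n, int passes)
{
    double total = 0.0;
    for (int p = 0; p < passes; p++)
        for (size_t i = 0; i < n; i++)
            total += data[i];
    return total;
}

static void time_it(const char *label, size_t n, int passes)
{
    int *data = malloc(n * sizeof *data);
    for (size_t i = 0; i < n; i++)
        data[i] = (int)i;

    clock_t t0 = clock();
    double total = repeat_sum(data, n, passes);
    clock_t t1 = clock();

    printf("%s: total=%.0f, %.3fs\n",
           label, total, (double)(t1 - t0) / CLOCKS_PER_SEC);
    free(data);
}

int main(void)
{
    /* Same number of element accesses in each case (128M), so the
       difference in wall time is down to where the data is coming from. */
    time_it("fits in cache  ", 64 * 1024, 2000);       /* 256 KB working set */
    time_it("exceeds cache  ", 16 * 1024 * 1024, 8);   /* 64 MB working set  */
    return 0;
}
```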
Well, those are my thoughts, for what it's worth. Probably worth nowt.
[1] As the processor won't have any idea what groupings it should cache, unless you've programmed at assembler level and deliberately worked within MMX-style arrayed data references (or whatever they use these days), or have a very good compiler that can manage to optimise that for you despite your being unaware of what's required. If you have a structure containing a rapidly changing/accessed flag/connectivity byte[2] next to a fairly static array[3] that is only occasionally queried, does the cache (and the data pipeline bringing it to the chip for caching) shove a block of memory across, or pick and choose bits and pieces (e.g. just the flag bytes) from a number of different cachable areas? Depending on the microcode, assembler and whatnot it could go either way, but my hunch is that whole blocks would be cached just for the sake of their flag bytes in this kind of circumstance. Less extreme variants are possible, but with the same idea; there's a rough sketch of the two layouts after these footnotes.
[2] Maybe saying "If trying to walk through here, you can go from $adjacent_area1 to $adjacent_area2 in $steps_for_1to2 cycles" in a DF context, and used to determine if it's worth pathing through that area.
[3] The internal routing details of a DF 'zone' which, not being the most promising route and so rejected as a possible path, isn't actually explicitly queried to find out how to pass from Adjacent Area 1 to Adjacent Area 2 through it.
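And here's the sketch promised in footnote [1]: a hypothetical C layout contrasting an interleaved zone record with a split hot/cold layout. The struct names, field sizes and counts are all made up for illustration; the point is just that the cache moves whole lines, so scanning the flags in the interleaved layout drags the cold routing data along for the ride.

```c
/* Hypothetical zone layouts for the footnote-[1] point.  Names and sizes
   are invented for illustration, not taken from DF. */
#include <stdint.h>
#include <stdio.h>

/* Interleaved layout: the hot connectivity byte sits next to ~128 bytes of
   routing detail that is rarely read.  A scan over just the flags still
   pulls one cache line (or more) per zone through the cache. */
struct zone_interleaved {
    uint8_t  walkable_flag;        /* hot: checked on every path query */
    uint16_t routing_detail[64];   /* cold: only read if the zone wins */
};

/* Split ("hot/cold") layout: the flags pack together, so one cache line
   now carries dozens of flags instead of one flag plus unrelated bytes. */
struct zone_table {
    uint8_t  walkable_flag[1024];
    uint16_t routing_detail[1024][64];
};

static int count_walkable_interleaved(const struct zone_interleaved *zones,
                                      int n_zones)
{
    int n = 0;
    for (int i = 0; i < n_zones; i++)
        n += zones[i].walkable_flag;   /* each iteration touches a fresh line */
    return n;
}

static int count_walkable_split(const struct zone_table *zt, int n_zones)
{
    int n = 0;
    for (int i = 0; i < n_zones; i++)
        n += zt->walkable_flag[i];     /* many flags per line, few lines total */
    return n;
}

int main(void)
{
    static struct zone_interleaved zones[1024];  /* static: zero-initialised */
    static struct zone_table table;

    zones[3].walkable_flag = 1;
    table.walkable_flag[3] = 1;

    printf("%d %d\n",
           count_walkable_interleaved(zones, 1024),
           count_walkable_split(&table, 1024));
    return 0;
}
```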