*snip*
In short, yep.
What people call a 2D array is actually this (addresses and values arbitrarily added for clarity):
Numbers over the first elements are the starting address of the allocated blocks of memory; numbers inside are the values.
So, when you create one of these things, you first allocate the first block of memory in that image (0xab78700). That block holds pointers to floats, not floats themselves.
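So the correct operation is actually
float **A = (float**)malloc(4*sizeof(float*));
as was pointed out by the earlier post. In a 32-bit program, yours would work by coincidence, since a float and a pointer are both 32 bits there. In a 64-bit program, pointers are 64 bits, so the block would be half the size it needs to be. Writing the row pointers into it then runs off the end of the allocation, which is undefined behavior and will typically corrupt the heap or crash.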
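To make the size difference concrete, here is a minimal sketch. The helper names (alloc_row_table, table_bytes_needed, table_bytes_buggy) are made up for illustration and are not from the original code; the point is only that the table has to be sized by the pointer type it stores.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: allocates the table of row pointers.
// Each slot stores a float*, so the size is rows * sizeof(float*).
float **alloc_row_table(std::size_t rows) {
    assert(rows > 0);
    return static_cast<float **>(std::malloc(rows * sizeof(float *)));
}

// Bytes the table actually needs.
std::size_t table_bytes_needed(std::size_t rows) {
    return rows * sizeof(float *);
}

// Bytes requested if sizeof(float) is used by mistake.
// On a typical 64-bit build this is half of what is needed.
std::size_t table_bytes_buggy(std::size_t rows) {
    return rows * sizeof(float);
}
```

Writing sizeof *A instead of sizeof(float*) is another common way to avoid this mistake, since the size then follows the pointer's own type.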
After that, you allocate the blocks of memory in which you are actually going to be putting your data. When you make your calls to malloc, it returns the pointer values telling you where those newly allocated blocks of memory are (0xbf749030, 0xbc974970, 0xac974310, 0x4683bc70 in this hypothetical case). So you then take those values, and store them in your first array, which essentially is serving as a ledger telling you where to find your other arrays. That's what you're doing with your mallocs in the for loop.
After that, you now have your "2D array" which contains the pointers locating your actual data blocks. You can now fill them with float values at your leisure by looking into your first array for an allocated block's pointer, then following that pointer to its block.
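The whole sequence described above could be sketched roughly like this. alloc_2d and free_2d are hypothetical names, and the error handling is just one reasonable choice, not something taken from the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: builds the pointer table first, then one block per row.
// Returns nullptr if any allocation fails.
float **alloc_2d(std::size_t rows, std::size_t cols) {
    assert(rows > 0 && cols > 0);

    // Step 1: the "ledger" block that holds one float* per row.
    float **A = static_cast<float **>(std::malloc(rows * sizeof(float *)));
    if (A == nullptr) return nullptr;

    // Step 2: one data block per row; its address goes into the ledger.
    for (std::size_t i = 0; i < rows; ++i) {
        A[i] = static_cast<float *>(std::malloc(cols * sizeof(float)));
        if (A[i] == nullptr) {
            // Undo the rows allocated so far, then the ledger itself.
            for (std::size_t j = 0; j < i; ++j) std::free(A[j]);
            std::free(A);
            return nullptr;
        }
    }
    return A;
}

// Frees in the reverse order: data blocks first, then the ledger.
void free_2d(float **A, std::size_t rows) {
    if (A == nullptr) return;
    for (std::size_t i = 0; i < rows; ++i) std::free(A[i]);
    std::free(A);
}
```

Note that the data blocks come back uninitialized from malloc, so each element should be written before it is read.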
A[2][3] essentially is just doing:
float* B = A[2];
float C = B[3];
composed in a single line.
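As a small check of that equivalence, the two forms can be put side by side. The function names here are invented for the example.

```cpp
#include <cassert>
#include <cstddef>

// The two-step version: fetch the row pointer, then index into the row.
float read_two_step(float **A, std::size_t row, std::size_t col) {
    assert(A != nullptr);
    float *B = A[row];
    float C = B[col];
    return C;
}

// The one-line version; the compiler performs the same two lookups.
// A[row][col] is also equivalent to *(*(A + row) + col).
float read_one_line(float **A, std::size_t row, std::size_t col) {
    assert(A != nullptr);
    return A[row][col];
}
```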
Something else to keep in mind: when using C++, you should generally avoid this sort of multidimensional array. Calls to malloc are expensive; doing a small number of large mallocs is almost always better than a large number of small mallocs. As implied by the diagram, the blocks it gives you can also be scattered around memory, which tends to make access slow because of cache misses. A cache miss happens when the data the CPU needs is not already in one of its caches and has to be fetched from slower memory, a bit like a small-scale version of loading from disk instead of from RAM. Code that uses data near other data it recently used gets far fewer of them. For these reasons, it is usually better to allocate a single 1D array of RowSize*ColSize elements and index into it with [x+y*width].
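A minimal sketch of the flat layout, assuming std::vector is acceptable in place of a raw malloc. The Grid type and its at() member are hypothetical names for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical wrapper: one contiguous allocation of width * height floats.
struct Grid {
    std::size_t width;
    std::size_t height;
    std::vector<float> data;

    Grid(std::size_t w, std::size_t h)
        : width(w), height(h), data(w * h, 0.0f) {}

    // Maps (x, y) onto the 1D array: rows are stored one after another.
    float &at(std::size_t x, std::size_t y) {
        assert(x < width && y < height);
        return data[x + y * width];
    }
};
```

With this layout, walking x in the inner loop and y in the outer loop touches memory in order, which is the access pattern caches handle best.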