A GPU can be used effectively as a specialized physics processing unit; the architecture is very amenable to that purpose. Most high end physics work boils down to a bunch of highly parallel, simple tasks like matrix multiplication.
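To make "highly parallel, simple tasks" concrete, here is a minimal CUDA matrix-multiply sketch (a toy example of my own, not anyone's production code): every element of the output matrix gets computed by its own thread, completely independently of all the others.

    #include <cstdio>
    #include <cstdlib>
    #include <cuda_runtime.h>

    // Each thread computes one element of C = A * B (n x n, row-major).
    // No thread depends on any other thread's result -- that independence
    // is what makes this kind of workload such a good fit for the GPU.
    __global__ void matMul(const float *A, const float *B, float *C, int n) {
        int row = blockIdx.y * blockDim.y + threadIdx.y;
        int col = blockIdx.x * blockDim.x + threadIdx.x;
        if (row < n && col < n) {
            float sum = 0.0f;
            for (int k = 0; k < n; k++)
                sum += A[row * n + k] * B[k * n + col];
            C[row * n + col] = sum;
        }
    }

    int main() {
        const int n = 256;
        size_t bytes = (size_t)n * n * sizeof(float);
        float *hA = (float *)malloc(bytes), *hB = (float *)malloc(bytes), *hC = (float *)malloc(bytes);
        for (int i = 0; i < n * n; i++) { hA[i] = 1.0f; hB[i] = 2.0f; }

        float *dA, *dB, *dC;
        cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dC, bytes);
        cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

        dim3 block(16, 16);
        dim3 grid((n + 15) / 16, (n + 15) / 16);
        matMul<<<grid, block>>>(dA, dB, dC, n);   // n*n threads, all in parallel
        cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);

        printf("C[0][0] = %.1f (expect %.1f)\n", hC[0], 2.0f * n);
        cudaFree(dA); cudaFree(dB); cudaFree(dC);
        free(hA); free(hB); free(hC);
        return 0;
    }

The point is only that the kernel body has no synchronization or inter-thread communication at all; a lot of physics code decomposes the same way.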
When it comes to specialized hardware, I can't see a legitimate use for much more than a CPU and GPU setup except in extreme cases. The CPU handles fast sequential operations; the GPU brings its OMGWTFBBQ raw parallel processing power.
The other thing about dedicated hardware is that it requires dedicated hardware (a tautology, I know), which means larger overall circuitry, and that added distance reduces the speed at which the computer can run. The speed of light is about 300 million m/s. A 1 megahertz computer goes through 1 million cycles per second, so a signal can travel at most 300 meters per cycle. A 1 gigahertz computer goes through 1 billion cycles per second, so information can travel at most 30 centimeters through circuitry per cycle. In a typical 3 gigahertz CPU, light can travel through approximately 10 centimeters of wire per cycle. As a result of this limitation, dedicated hardware like the GPU must run differently (after all, it takes around 3 CPU cycles just for light travelling in a straight line to get from one end of my 12-inch-long GPU to the other; actual GPU clock speeds are typically around 500 megahertz) and, unless the data being computed is truly massive, a large chunk of time will be spent as overhead.
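If you want to play with that arithmetic yourself, here is the same back-of-the-envelope calculation as a tiny host-side program (it just divides vacuum light speed by clock frequency; real signals in copper propagate slower, so actual distances are even shorter, and the clock speeds listed are simply the ones mentioned above):

    #include <cstdio>

    int main() {
        const double c = 3.0e8;   // speed of light in a vacuum, m/s (approx.)
        struct Clock { const char *name; double hz; };
        const Clock clocks[] = {
            { "1 MHz CPU",   1.0e6 },
            { "1 GHz CPU",   1.0e9 },
            { "3 GHz CPU",   3.0e9 },
            { "500 MHz GPU", 5.0e8 },
        };
        for (const Clock &clk : clocks) {
            // Distance covered during one clock period is c / f.
            printf("%-12s -> %8.4f m per cycle\n", clk.name, c / clk.hz);
        }
        return 0;
    }

That prints 300 m, 0.3 m, 0.1 m, and 0.6 m per cycle respectively, which is where the numbers above come from.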
For example, consider the notes from this particular CUDA lecture I attended:
http://www.cs.rit.edu/~ark/lectures/cuda01/

CPU vs. GPU computation time and total time comparison (msec) -- minimum of three runs:

             ----CPU----        ----GPU----
     N      Comp    Total      Comp    Total
  1024        25       32         0      156
  2048        74       88         0      284
  4096       209      224         1      224
  8192       711      748         5      564
 16384      2549     2656        22     1752
This was the result of computing the outer product of two vectors of length N. Note that while the CPU spends nearly all of its time doing computation, the GPU spends something on the order of 99% of its time merely transferring data. That is one of the real limitations of dedicated hardware external to the CPU: just telling it to do something takes an incredibly long time. If it were doing something much more computationally complex, the time spent computing would make up for the lost time, as shown by the roughly 100:1 compute advantage that particular GPU had over the CPU; but unless such massive computations need to be made, spending the time to ship the data elsewhere is a bit of a waste.
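For concreteness, here is roughly what the GPU side of that benchmark looks like. This is my own minimal sketch of an N-length outer product, not the lecture's actual code, but it shows where the transfer time goes: two small vectors go in, and an entire N x N matrix has to come back out.

    #include <cstdio>
    #include <cstdlib>
    #include <cuda_runtime.h>

    // Outer product: c[i][j] = a[i] * b[j], one independent multiply per thread.
    __global__ void outerProduct(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.y * blockDim.y + threadIdx.y;   // row index
        int j = blockIdx.x * blockDim.x + threadIdx.x;   // column index
        if (i < n && j < n)
            c[i * n + j] = a[i] * b[j];
    }

    int main() {
        const int n = 1024;                              // the N column in the table
        size_t vecBytes = n * sizeof(float);             // each input vector: 4 KB
        size_t matBytes = (size_t)n * n * sizeof(float); // the result matrix: 4 MB

        float *ha = (float *)malloc(vecBytes), *hb = (float *)malloc(vecBytes);
        float *hc = (float *)malloc(matBytes);
        for (int i = 0; i < n; i++) { ha[i] = 1.0f; hb[i] = 2.0f; }

        float *da, *db, *dc;
        cudaMalloc(&da, vecBytes); cudaMalloc(&db, vecBytes); cudaMalloc(&dc, matBytes);

        // Transfer in: small, but it still costs a trip across the bus.
        cudaMemcpy(da, ha, vecBytes, cudaMemcpyHostToDevice);
        cudaMemcpy(db, hb, vecBytes, cudaMemcpyHostToDevice);

        dim3 block(16, 16);
        dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
        outerProduct<<<grid, block>>>(da, db, dc, n);    // the "Comp" column: trivial work
        cudaDeviceSynchronize();

        // Transfer out: n*n floats -- far more data than went in. This copy,
        // plus setup, is where the GPU's "Total" column comes from.
        cudaMemcpy(hc, dc, matBytes, cudaMemcpyDeviceToHost);

        printf("c[0][0] = %.1f\n", hc[0]);
        cudaFree(da); cudaFree(db); cudaFree(dc);
        free(ha); free(hb); free(hc);
        return 0;
    }

For N = 1024 the two input vectors are 4 KB each, but the result matrix is 4 MB, so the copy back easily dwarfs the kernel itself, which is exactly the pattern the table shows.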
... That said, next quarter some others and I will be doing an independent study on good practices for programming on the GPU, and on good uses we find to put it to, which I will of course report here.