In the Gamasutra interview, Cerny states the following:

"[We] added another bus to the GPU that allows it to read directly from system memory or write directly to system memory. As a result, if the data that's being passed back and forth between CPU and GPU is small, you don't have issues with synchronization between them anymore. We can pass almost 20 gigabytes a second down that bus. That's not very small in today's terms; it's larger than the PCIe on most PCs!"

That sounds almost exactly like Garlic, with some additional HSA features baked in. Remember, the point of HSA is to allow the CPU and GPU to share a common set of pointers and swap data more efficiently. It suggests that the PS4 interconnect structure looks something like this:

This simplified structure shows the GPU with the lion's share of access to memory bandwidth. Both the Onion and Garlic interfaces are faster than they were in Llano, and they're tied to much faster memory, but they function in the same basic way. This is the most logical design based on what AMD has done before, it incorporates the direct memory bus that Cerny discusses, and it would be the easiest system for AMD to design given the firm's limited resources.

The disadvantage is that it's not particularly efficient. This table of available CPU-GPU bandwidth in Llano, based on the type of operation being conducted, indicates the problem:

Much of the anecdotal information on the PS4 suggests that the chip is designed for a much greater degree of sharing than Piledriver/Llano. We suspect that Option 1 integrated HSA-like features with an APU-style design. AMD would have had to make a number of improvements to bring these various capabilities up to more uniform bandwidth and latency, but these were improvements the company was planning to make with HSA in any case.

Option 2: Hearkening back to R600 and a modern ring bus

AMD could have opted for a ring bus. Ring buses are great for joining multiple components in a high-bandwidth, low-latency configuration where data is shared across multiple elements. Intel uses a ring bus for Sandy Bridge and Ivy Bridge, and AMD's first programmable GPU (R600) used one as well. The advantage of a ring bus is that it'd be simple. Not every component needs the same amount of memory bandwidth (the estimated 176GBps of memory bandwidth would be wasted on the CPUs), so you end up with 20GBps of bandwidth for the CPU cores and 176GBps of bandwidth for the GPU.
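To put those two figures in perspective, here is a rough back-of-the-envelope sketch in Python. The 20GBps and 176GBps values come from the discussion above; the 60 fps frame rate, the decimal definition of a gigabyte, and the function names are assumptions made purely for illustration, and the model ignores latency and contention entirely.

```python
# Back-of-the-envelope bandwidth budget. The bandwidth figures come from the
# article; the frame rate and helper names are illustrative assumptions.
CPU_BUS_GBPS = 20.0    # "almost 20 gigabytes a second", per Cerny
GPU_MEM_GBPS = 176.0   # estimated GPU memory bandwidth cited in the article
FRAME_RATE_HZ = 60     # assumption: a 60 fps target, not stated in the article


def per_frame_budget_mb(bandwidth_gbps, frame_rate_hz=FRAME_RATE_HZ):
    """Peak megabytes that could cross a link in one frame (1 GB = 1000 MB)."""
    return bandwidth_gbps * 1000.0 / frame_rate_hz


def transfer_time_ms(size_mb, bandwidth_gbps):
    """Ideal time in milliseconds to move size_mb over a link, ignoring latency."""
    seconds = size_mb / (bandwidth_gbps * 1000.0)
    return seconds * 1000.0


if __name__ == "__main__":
    for name, bw in (("CPU bus", CPU_BUS_GBPS), ("GPU memory", GPU_MEM_GBPS)):
        print(f"{name}: {per_frame_budget_mb(bw):.0f} MB per frame, "
              f"{transfer_time_ms(64, bw):.2f} ms to move 64 MB")
```

Under these assumptions the CPU-side path could move roughly 333MB per frame against about 2.9GB for the GPU, a ratio of 8.8 to 1, which is why sizing every link for the GPU's needs would be wasteful.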

Sony has some experience with ring buses (the PS3's Cell architecture used one to manage communication between the various processing elements), but we don't think this is a likely approach for the PS4. There's no particular problem that a ring bus would solve, and no specific use case that strongly suggests AMD would adopt one. Intel has used a ring bus in Sandy Bridge and Ivy Bridge, but these GPUs are tiny compared to the 18 CU design that's built into the PlayStation 4.

The PS3 with its 8-core Cell CPU was way ahead of anything that Intel or AMD had at the time.

[...] is the reason why it was so advanced. IBM had and still has more resources than Intel to deliver the most advanced CPUs on the market.

By what standard? Not lithography. IBM is part of the Common Platform Alliance, which is still working on 28nm tech, with 20nm and below scheduled for risk production around the same time Intel moves to 14nm.

Not manufacturing. IBM's total wafer start capacity is a fraction of Intel's.

Not EUV. No one has figured EUV out yet, but IBM isn't leading the pack.

IBM's memory processes have a small advantage over Intel's in some metrics at this time. Besides that, I'm aware of nothing. Yes, IBM still keeps a hand in research. But when it comes to manufacturing and volume, they have faded from the market. It's not an accident.

Urgh, your terminology is confusing. Are you referring to the SPUs or the PPU? If the SPUs, you are correct, quite advanced, except the programming paradigm was wildly different from standard programming. The PPU has penalties and isn't anything special. It has a 50-cycle LHS (Load-Hit-Store) associated with many operations that the 360 does not. This causes serious issues for intensely used code that requires optimization. The Cell was engineered in a manner that caused headaches for developers that the standard PPC cores did not. It's a give/take with the previous gen all over the place.

