Goodbye DDR, hello serial memory
16 October 2014
DDR4 is the last of the popular DDR line of memories that the majority of Xilinx customers use. Multiple contenders are vying for a chunk of that market share, leading Tamara I Schmitz, Xilinx, to speculate on its successor.
A seismic shift is shaking up the memory landscape, as the line of popular DDR memories will end with DDR4. This is not cause for immediate panic; DDR3 has a comfortable address on the majority of system boards and DDR4, though ramping slowly, will replace some of those sockets and serve them for years to come. Still, customers are eyeing the next crop of memories and mulling over trade-offs such as bandwidth, capacity or power reductions. The likely successor is LPDDR3/4, with certain application spaces preferring serial DRAM solutions such as Hybrid Memory Cube (HMC).
DDR3 enjoys almost 70 per cent of the DRAM market today, with a sharp uptick in adoption between 2009 and 2010. DDR4 has been slower in adoption, partly because of the incursions of Mobile DRAM, also known as LPDDR.
DDR4 is picking up momentum. Its advantages are a lower supply voltage, which saves power, and higher speed. It will eventually take over from DDR3 in almost every market, driven ultimately by the PC space. Although PCs no longer account for up to 70 per cent of DRAM consumption, they are still the largest commodity-device segment. For now, according to memory vendors, DDR4 usage is concentrated in the server space rather than in personal-electronics segments. Still, DDR4 is an excellent choice - it is a well-known memory type and will be available for a very long time, particularly as there is no successor.
Why is there no DDR5? Consumers have an insatiable demand for memory bandwidth for songs, pictures and video on smartphones. While these expectations typically mean more components and more board space, consumers don’t want their electronic devices to grow in proportion to their capacity or performance.
When memory is used with a Xilinx FPGA, there are specific guidelines about how to lay out the board to ensure proper margin and overall system success. Examples include trace lengths, termination resistors and routing layers. These rules limit how much the design can be compacted, or how close together the parts can be placed.
The alternative to the smallest board design would be some bleeding-edge form of packaging. Unfortunately, a new packaging technology that included die stacking with through-silicon vias (TSVs) would carry a significantly higher cost. Given the economies of scale of the industry's existing infrastructure, DDR memory is a low-cost commodity and could not absorb a radical departure in packaging or an increased price point.
Consumers also want more speed, and running a system at a higher speed has implications for the board design. DDR uses single-ended signals that need proper termination. The faster the system operates, the shorter the traces between the memory and the FPGA must be to ensure proper functionality, which means the devices must be placed closer to the FPGA. That, in turn, limits the number of memory devices a design can use. Many DDR4 designs will have approached this limit, packing as many devices around the FPGA as possible. Any speed increase in a hypothetical DDR5 would shrink the area available for memory devices, reducing the available capacity.
Trends show that the server market is adopting DDR4 while the lower cost of DDR3 continues for now to make it the predominant choice in the PC segment. There is no doubt that consumer appetite will continue to grow for more speed as well as more memory capacity, and eventually PCs will migrate to DDR4.
The most likely choice to replace DDR3 and DDR4 is LPDDR4. The LP stands for low power: LPDDR4 is a type of DDR memory that has been optimised for the wireless market. Its advantages are that it is well known, the specs are defined and it is available. The low-power optimisation makes LPDDR4 only a little more expensive than DDR, and it uses the same I/O pins and runs in the same frequency range as DDR, which makes for easy migration.
However, the biggest trade-off is its lifetime. Since the wireless market turns over its products every six to nine months, LPDDR memories change fast, too. If a big company sells products for 10 to 15 years, it is difficult to accommodate a memory that changes every six to nine months. Possibly a manufacturer could guarantee to deliver one version of those devices for that company for 10 to 15 years under a special agreement. Currently, that business model does not exist. Special arrangements could include preserving a process flow - an expensive endeavour that might be worth it only for the largest of opportunities.
Serial memory is emerging as a viable alternative and is a completely different way of looking at the memory space.
As far as an FPGA goes, memory is the last frontier, the last interface to go serial. The reason is latency: the time it took to serialise a parallel data stream, send it down the link and deserialise it at the other end was always too long. Now, the trade-offs of using a serial link are tolerable in some applications - those with many writes and few reads, such as a test-and-measurement system for a CT scanner or a set of telescopes scanning the sky. If, on the other hand, the measure of quality is writing data and immediately reading it back, then serial memory will not perform as well as any form of parallel memory. But if the measure of a good memory is high bandwidth - storing lots of video or sending loads of information over the Internet - then serial memory is tempting.
Latency aside, lifespan is not a problem: unlike LPDDR, with its short availability window, these products will be made for as long as there is an appetite for them. In fact, if demand for serial memory grows, multiple vendors are likely to join the business.
Instead of using parallel I/O pins, serial memory leverages SerDes technology. In FPGAs, serial transceivers already run at high rates, and vendors have more recently addressed latency concerns as well. This well-developed serial technology supports very high throughput - 15Gbit/s per transceiver lane - and the next generation (in the case of HMC, or Hybrid Memory Cube) is planned to reach 30Gbit/s.
The strongest serial-memory candidate to replace DDR DRAM, HMC, is being promoted by the HMC Consortium and spearheaded by Micron (illustrated). It is, though, just one type of serial memory. In addition to HMC, MoSys is developing its Bandwidth Engine, a sort of serial SRAM, and Broadcom offers a range of serial-interface TCAMs. Samsung and SK Hynix are promoting High-Bandwidth Memory (HBM), which is a TSV-based DRAM stack with a wide parallel interface. This choice might seem lower risk, since it uses a parallel interface.
At this point, HMC is the strongest contender to take market share from DDR3 and DDR4. It has four or eight DRAM die stacked on top of a logic layer and connected with TSV technology to create a 2Gbyte or 4Gbyte package. The logic layer provides a convenient interface.
Up to eight devices can be daisychained to add capacity. There is 256bit access and enormous throughput considering the one- to four-link capability (in steps of half a link). Each link comprises 16 transceivers (eight for a half link)—all capable of handling 15Gbit/s. That is an extraordinary amount of bandwidth previously unavailable to memory designers.
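Given the figures above, the raw link bandwidth is simple arithmetic. The short Python sketch below works it out for one direction of a link, using only the stated 15Gbit/s lane rate and 16 lanes per full link; protocol overhead and the links' full-duplex operation are ignored.

```python
# Back-of-the-envelope HMC link bandwidth from the stated figures.
# Assumes a raw lane rate of 15 Gbit/s; protocol overhead is ignored.

LANE_RATE_GBIT = 15       # Gbit/s per transceiver lane
LANES_PER_FULL_LINK = 16  # eight lanes for a half link

def link_bandwidth_gbit(links: float) -> float:
    """Raw one-direction bandwidth in Gbit/s for a given number of
    links (half links allowed, e.g. 0.5 or 1.5)."""
    return links * LANES_PER_FULL_LINK * LANE_RATE_GBIT

# One full link: 16 lanes x 15 Gbit/s = 240 Gbit/s
print(link_bandwidth_gbit(1))    # 240.0
# Maximum configuration of four links: 960 Gbit/s, i.e. 120 Gbyte/s
print(link_bandwidth_gbit(4))    # 960.0
```

Even a single half link (120Gbit/s raw) comfortably exceeds the peak transfer rate of a DDR3 or DDR4 interface, which is what makes the daisychained, multi-link configurations so striking.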
Figure 3 shows that the HMC solution reduces pin count by at least a factor of eight, cutting board complexity and routing. The high bandwidth of the SerDes links means fewer devices are needed: a single HMC device plus an FPGA delivers a reduction in board space of almost a factor of 20. Finally, the HMC solution consumes one third of the power per bit.
The Bandwidth Engine (BE2) from MoSys is akin to a serial SRAM rather than a serial DRAM, using transceivers to achieve 16Gbit/s. It is not, however, a likely replacement for DDR. Instead, with its 72bit access and lower latency, the technology targets applications currently served by QDR or RLDRAM: storage for packet headers or a lookup table, rather than the packet buffering handled by DDR.
TCAM (ternary content-addressable memory) is a high-speed memory that performs the wide pattern-matching searches found in high-performance routers and switches. That performance is paid for in cost, power and heat. It is also parallel in nature - it does not use SerDes to reach those speeds. Broadcom is offering serial versions so that the advantages of serial memories (low pin count and high speed) can be combined with a TCAM.
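To make the "ternary" part concrete, here is a minimal software sketch of the match a TCAM performs: each entry carries a value and a mask, and masked-out bits are "don't care". The entries and values are hypothetical, and where this loop checks entries one by one, a real TCAM compares the key against every entry simultaneously in hardware.

```python
# Minimal sketch of ternary matching as a TCAM performs it.
# Each entry is a (value, mask) pair; bits where mask is 0 are wildcards.

def tcam_lookup(entries, key):
    """Return the index of the first matching entry, or None.
    entries: list of (value, mask) pairs; key, value and mask are ints."""
    for i, (value, mask) in enumerate(entries):
        if (key & mask) == (value & mask):
            return i
    return None

# Hypothetical prefix-style routing entries, most specific first:
table = [
    (0b10110000, 0b11110000),  # matches any key starting 1011
    (0b10000000, 0b11000000),  # matches any key starting 10
]
print(tcam_lookup(table, 0b10111010))  # 0 (more specific entry wins)
print(tcam_lookup(table, 0b10001111))  # 1
```

Ordering entries most-specific-first is how longest-prefix matching is typically realised with a priority-encoded TCAM.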
The third memory type is HBM. What people may not realise is that you cannot buy an HBM device off the shelf. You would buy a die from, for example, SK Hynix, and mount it inside your own package on an interposer, or silicon substrate. The connections between the host device and the memory would need to be built into the interposer design to enable this high-bandwidth, parallel memory.
For this memory type to take over the market, companies would need to decide what they want to share in terms of trade secrets and would also have to agree on standards adoptions (interposer design, heights, interfaces, tolerances).
HBM’s latency will be small, since signals travel only the very short distances within the package. It’s a fantastic idea, but further out into the future.
The one option in production now is MoSys’ Bandwidth Engine, BE2. HMC is sampling and will be in full production by the end of the year. LPDDR4 will be sampling by the middle of this year. HBM is not available as a standalone package, though there is talk of serialising HBM into its own package. To buy a die and integrate HBM into a package, talk to Samsung, SK Hynix or other, smaller vendors - as some customers are doing right now.