Micron HMC Controller
What HMC specification does the Pico HMC controller implement?
The controller implements the full Hybrid Memory Cube Consortium Specification 1.1, which corresponds to the second-generation HMC.
What are the currently supported FPGA devices?
The HMC Controller currently works with Altera Stratix V and Arria 10 devices and with Xilinx Kintex UltraScale devices.
What kind of interface does the HMC controller have?
The HMC controller offers either an interface with five 128-bit ports or a 512-bit AXI-4 interface with one 128-bit port used for host accesses.
What are the benefits of using one interface over the other?
If you have many requesters and mostly random accesses, the controller with five 128-bit ports is the better choice. If you need a single wide port, the 512-bit AXI interface is more suitable.
What are the bandwidth numbers using the HMC?
Bandwidth numbers can be found in the HMC Controller User Guide. The controller itself is designed so that it never has to throttle traffic at any point. Any bandwidth you see in the performance model of the chip is therefore available to you through our controller.
What clock speed is the controller running at?
The clock speed is 187.5MHz for a x8 controller with 15Gbps links, 375MHz for x16 at 15Gbps, 156.25MHz for x8 at 12.5Gbps, 312.5MHz for x16 at 12.5Gbps, 125MHz for x8 at 10Gbps and 250MHz for x16 at 10Gbps.
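The clock figures above all fit one simple relationship: aggregate link rate divided by a 640-bit data path. That relationship is an inference from the quoted numbers, not a formula taken from the controller documentation, and the function name below is made up for illustration. A minimal sketch:

```python
# Assumption (inferred from the six figures quoted above, not from the
# controller documentation): user clock = lanes * lane rate / 640 bits.
DATAPATH_BITS = 640


def controller_clock_mhz(lanes: int, lane_rate_gbps: float) -> float:
    """Return the clock in MHz implied by the link width and lane rate."""
    aggregate_mbps = lanes * lane_rate_gbps * 1000  # Gbps -> Mbps
    return aggregate_mbps / DATAPATH_BITS


if __name__ == "__main__":
    for lanes in (8, 16):
        for rate in (10.0, 12.5, 15.0):
            mhz = controller_clock_mhz(lanes, rate)
            print(f"x{lanes} at {rate}Gbps -> {mhz}MHz")
```

Under this assumption, every configuration listed above falls out of the same division, which is a quick way to sanity-check a clock constraint against the link settings.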
What is the latency internal to the Controller?
The total round-trip latency through the HMC controller (RX and TX sides combined) can vary from 100ns to 700ns, depending on how the controller is configured and which features are used. For example, if you use the multiport interface, the controller takes care of creating well-formed packets according to the HMC protocol. The 512-bit AXI interface has read data reordering built in, so read data is always returned to the user in the order requested, but some packets may see a longer latency as a result. Another factor is the link retry feature and the link CRC. Link retry requires the controller to complete the CRC check on all incoming data before that data is delivered. This is one of the main features that increases the latency to ~300ns. Without it, the controller latency is ~140ns, or even as low as ~100ns. There are a few reasons why our customers might turn off the CRC checks on incoming data prior to delivery:
- Customers have an application architecture (sitting on top of the controller) that allows the error to be squashed downstream. In other words, the controller performs the CRC check in parallel with delivering the data and raises an error flag that is then handled within the application architecture itself. In this way, the controller does not have to hold data back until it is certain that the data was received correctly.
- Customers have hardware that is designed with enough margin that they are able to turn their retry features off, or keep on only the feature that throws an error flag but doesn’t actually retrain the link.
NOTE: In the rare event of a retry, a long tail will be added to the 300ns latency.
Why is the interface 640 bits wide rather than a binary multiple like 256/512/1024?
This came about as a natural compromise between a couple of different things. The transceivers for Xilinx and Altera have slightly different gearboxes in terms of how they take in 16 streams of data and turn them into 640 bits, and that had to be balanced against clock speed. Narrower is better, and 512 would have been a nice number in that it is a binary multiple, but we would have had to run at almost 450MHz to make that work, which would be pushing a bit too much. We are trying to get as narrow as possible without running the clock rate too fast. For example, we could go all the way up to 1024, which is what OpenSilicon was doing for a while, but that is too wide and too slow and causes more problems than it solves. Also, 512 sounds nice, but it doesn’t work out well with the packet sizes. For example, the biggest packet, with a 128-byte payload, is 8 flits of data plus one flit for the header and tail, for a total of 9 flits, which does not divide into 512 bits well.
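The packet-size argument can be checked with a little arithmetic. The sketch below uses 128-bit flits, consistent with a 128-byte payload being 8 flits. It also assumes, purely for illustration, that each packet starts on a fresh clock cycle; the real controller may pack packets differently. The function names are hypothetical.

```python
FLIT_BITS = 128  # one flit is 128 bits (16 bytes)


def packet_flits(payload_bytes: int) -> int:
    """Data flits plus one flit shared by the header and tail."""
    data_flits = payload_bytes * 8 // FLIT_BITS
    return data_flits + 1


def cycles_and_idle_slots(total_flits: int, width_bits: int) -> tuple[int, int]:
    """Cycles needed to move one packet, and flit slots left unused.

    Assumption: the packet starts at a cycle boundary and nothing else
    shares its last cycle.
    """
    flits_per_cycle = width_bits // FLIT_BITS
    cycles = -(-total_flits // flits_per_cycle)  # ceiling division
    idle = cycles * flits_per_cycle - total_flits
    return cycles, idle


if __name__ == "__main__":
    flits = packet_flits(128)  # largest packet: 9 flits
    for width in (512, 640, 1024):
        cycles, idle = cycles_and_idle_slots(flits, width)
        print(f"{width} bits: {cycles} cycles, {idle} idle flit slots")
```

Under these assumptions a 9-flit packet needs 3 cycles with 3 idle slots at 512 bits, but only 2 cycles with 1 idle slot at 640 bits.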
Is there a User Guide on how to best optimize the HMC?
See the hmc_gen2_user_guide.pdf document available on micron.com for information on this. See also the HMC Controller User Guide, available with the Pico release.
Is there actual command scheduling in the controller or is it all in order from the perspective of the user interface?
The memory cube itself may reschedule requests. It has the performance to do multiple things simultaneously, so it will let requests pass each other. This means that responses may come back to the controller out of order. We can optionally add logic to the controller that reorders the data if your application requires it. At that point, it is a matter of weighing your requirements for lowest latency against in-order transactions.
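To illustrate the trade-off between latency and ordering, here is a small software model of a reorder buffer. It is not the controller's logic. It assumes each response carries the tag of the request it answers, and the class and method names are hypothetical.

```python
class ReorderBuffer:
    """Releases responses in request order, holding back early arrivals."""

    def __init__(self) -> None:
        self._next_issue = 0    # sequence number for the next request
        self._next_release = 0  # sequence number owed to the user next
        self._order = {}        # tag -> sequence number
        self._pending = {}      # sequence number -> response data

    def issue(self, tag) -> None:
        """Record the order in which a tagged request was sent."""
        self._order[tag] = self._next_issue
        self._next_issue += 1

    def receive(self, tag, data) -> list:
        """Accept a response and return any data now releasable in order."""
        self._pending[self._order.pop(tag)] = data
        released = []
        while self._next_release in self._pending:
            released.append(self._pending.pop(self._next_release))
            self._next_release += 1
        return released


if __name__ == "__main__":
    rob = ReorderBuffer()
    for tag in ("t0", "t1", "t2"):
        rob.issue(tag)
    print(rob.receive("t2", "C"))  # held back, nothing released yet
    print(rob.receive("t0", "A"))
    print(rob.receive("t1", "B"))  # releases B and the waiting C
```

The response for "t2" arrives first but is only delivered after "t0" and "t1", which is the extra latency that in-order delivery costs.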
How much of the FPGA does the HMC controller use in terms of LUT, Block ram, interconnect, etc?
The controller uses approximately 32K ALMs/LUTs and 3Mb of memory in Altera/Xilinx FPGAs.
Are there any design examples using the controller?
GUPS has been implemented on all HMC modules (see ‘HMC Hardware and Systems’ for more information). This design is included with your purchase of the board. An AXI HMC memory test sample application that uses the 512-bit AXI interface is also provided.
More information and design examples are in the works.
What does the controller do to maximize throughput?
The HMC controller is a fully pipelined block designed to maximize throughput. While both read and write operations require multiple clock cycles to complete, the controller allows users to issue many read and/or write requests before the first response is returned by the HMC. This pipelining of read and write requests greatly improves the throughput of the memory for user applications.
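The benefit of pipelining can be estimated with Little's law: the data in flight equals bandwidth times latency. The sketch below is a back-of-the-envelope model, not a statement about the controller's limits. The 20 GB/s target and the function names are hypothetical, and the 300ns latency is the approximate figure quoted earlier for operation with CRC checks.

```python
import math


def outstanding_requests_needed(bandwidth_gb_per_s: float,
                                latency_ns: float,
                                request_bytes: int) -> int:
    """Requests that must be in flight to sustain the given bandwidth."""
    # GB/s times ns gives bytes: (1e9 B/s) * (1e-9 s) = 1 B
    bytes_in_flight = bandwidth_gb_per_s * latency_ns
    return math.ceil(bytes_in_flight / request_bytes)


def achievable_bandwidth_gb_per_s(outstanding: int,
                                  latency_ns: float,
                                  request_bytes: int) -> float:
    """Bandwidth reached with a fixed number of requests in flight."""
    return outstanding * request_bytes / latency_ns


if __name__ == "__main__":
    # Hypothetical target: 20 GB/s of 128-byte reads at 300ns round trip.
    print(outstanding_requests_needed(20.0, 300.0, 128))
    # One request at a time, waiting for each response:
    print(achievable_bandwidth_gb_per_s(1, 300.0, 128))
```

With these example numbers, a requester that waits for each response reaches well under 1 GB/s, while sustaining the hypothetical 20 GB/s target takes dozens of requests in flight.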
Is ECC done within the HMC or within the Controller?
CRC error detection is used on the SerDes links. The CRC is generated on TX packets and checked on RX packets in the HMC controller. An error will trigger a retry of the failed packet. The HMC memory itself uses ECC error detection and correction inside the memory arrays.
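The generate, check and retry flow described above can be modeled in a few lines. This is a conceptual sketch only. It uses Python's `zlib.crc32` as a stand-in, which is not claimed to be the CRC defined by the HMC specification, and the packet layout, function names and retry limit are all hypothetical.

```python
import zlib


def tx_packet(payload: bytes) -> bytes:
    """Append a 4-byte CRC to the payload (stand-in CRC, made-up layout)."""
    return payload + zlib.crc32(payload).to_bytes(4, "little")


def rx_check(packet: bytes) -> bool:
    """Recompute the CRC on the receive side and compare with the tail."""
    payload, tail = packet[:-4], packet[-4:]
    return zlib.crc32(payload) == int.from_bytes(tail, "little")


def send_with_retry(payload: bytes, channel, max_retries: int = 3):
    """Resend until the CRC check passes; return (payload, retries used)."""
    for attempt in range(max_retries + 1):
        received = channel(tx_packet(payload))
        if rx_check(received):
            return received[:-4], attempt
    raise RuntimeError("link retry limit exceeded")


def make_flaky_channel(corrupt_first: int):
    """Test channel that flips one bit in the first N packets it carries."""
    state = {"sent": 0}

    def channel(packet: bytes) -> bytes:
        state["sent"] += 1
        if state["sent"] <= corrupt_first:
            return bytes([packet[0] ^ 0x01]) + packet[1:]
        return packet

    return channel


if __name__ == "__main__":
    print(send_with_retry(b"hello flit", make_flaky_channel(1)))
```

In this model the first transmission is corrupted, fails the check, and is resent once. That resend is the kind of event that adds a long tail to the latency.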