Will read requests come back in order?
No, not necessarily. Each vault controller has a queue that buffers references for that vault’s memory. The vault controller may execute references within that queue based on need rather than order of arrival. Therefore, responses from vault operations back to the external serial I/O links can be out of order.
However, requests from a particular external serial link to the same vault/bank address are executed in order. Requests from different external serial links to the same vault/bank address are NOT guaranteed to be executed in a specific order and must be managed by the host controller. See page 12 of the attached specification document.
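As a rough illustration of what out-of-order responses mean for the host, the sketch below matches responses to requests by tag and hands them back in issue order. The class and method names are hypothetical and are not part of any HMC host controller API; this is one possible bookkeeping scheme, not a required one.

```python
from collections import OrderedDict


class ResponseTracker:
    """Hypothetical host-side bookkeeping: match responses to requests by tag."""

    def __init__(self):
        self._pending = OrderedDict()  # tag -> request, kept in issue order
        self._done = {}                # tag -> response payload

    def issue(self, tag, request):
        # A tag may only identify one in-flight request at a time.
        if tag in self._pending:
            raise ValueError(f"tag {tag} is already in flight")
        self._pending[tag] = request

    def complete(self, tag, payload):
        # Responses may arrive in any order; just record them.
        if tag not in self._pending:
            raise KeyError(f"unexpected response tag {tag}")
        self._done[tag] = payload

    def drain_in_order(self):
        """Return (request, payload) pairs in issue order, stopping at the first gap."""
        ready = []
        while self._pending:
            tag = next(iter(self._pending))  # oldest outstanding request
            if tag not in self._done:
                break
            ready.append((self._pending.pop(tag), self._done.pop(tag)))
        return ready
```

A host that does not need ordered delivery could skip drain_in_order and consume each response as soon as complete is called.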
What is the address mapping scheme? Can I customize the address mapping scheme for the HMC?
The user can select a specific address mapping scheme so that access of the HMC vaults and banks is optimized for the characteristics of the request address stream. If the request address stream is either random or generally sequential from the low-order address bits, the host can choose to use one of the default address map modes offered in the HMC. The default address mapping is based on the maximum block size chosen in the address map mode register. It maps the vault address toward the less significant address bits, followed by the bank address just above the vault address. This address mapping algorithm is referred to as “low interleave,” and forces sequential addressing to be spread across different vaults and then across different banks within a vault, thus avoiding bank conflicts.
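The low-interleave idea can be sketched as a simple bit-field decode. The field widths below (32-byte maximum block size, 16 vaults, 8 banks per vault) are assumptions chosen for illustration only; the actual widths depend on the device and on the address map mode register setting.

```python
# Assumed field widths, for illustration only.
BLOCK_BITS = 5  # assumed 32-byte maximum block size
VAULT_BITS = 4  # assumed 16 vaults
BANK_BITS = 3   # assumed 8 banks per vault


def decode(addr):
    """Split a byte address into (vault, bank, block_offset), low interleave."""
    offset = addr & ((1 << BLOCK_BITS) - 1)
    # The vault field sits in the less significant bits, just above the block offset.
    vault = (addr >> BLOCK_BITS) & ((1 << VAULT_BITS) - 1)
    # The bank field sits just above the vault field.
    bank = (addr >> (BLOCK_BITS + VAULT_BITS)) & ((1 << BANK_BITS) - 1)
    return vault, bank, offset


if __name__ == "__main__":
    # Sequential blocks walk across vaults first, then across banks.
    for block in range(18):
        vault, bank, _ = decode(block * (1 << BLOCK_BITS))
        print(f"block {block:2d} -> vault {vault:2d}, bank {bank}")
```

Under these assumed widths, the first 16 sequential blocks land in 16 different vaults, and the 17th block returns to vault 0 in the next bank.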
How many of the vaults can I access from one link?
Each link is associated with local vaults that are referred to as a quadrant. Access from a link to a local quadrant may have lower latency than to a vault outside the local quadrant. However, you may access any vault from any link. Bits within the vault address designate both the specific quadrant and vaults within that quadrant. Please see Memory Address-to-Link Mapping in the HMC Gen2 datasheet.
What are the current HMC configurations?
HMC Gen 2 supports 10, 12.5, or 15Gbps lane speed, up to 4GB of density and a maximum of 240GB/s of aggregate link bandwidth.
What is the deal with timing in the HMC?
All in-band communication across a link is packetized. There is no specific timing associated with memory requests, and responses can generally be returned in a different order than requests were made because there are multiple independent vaults, each executing independently of the others.
Can I implement my own logic in the HMC logic die?
If each read or write request requires a tag, how many tags do I have available at any one time?
Tag fields in packets are nine bits long, which is enough space for 512 tags. Other host links will use the same tag range of 512 tags, but they are uniquely identified by their association with each different host link. Additionally, in a multi-HMC topology, the host could choose to expand the number of usable tags by associating each tag set (based on host link) with the target cube of the request. Thus, the total maximum number of unique tags is 512 times the number of host links of one HMC, times the number of cubes in the chain. For example, if four links of one HMC are connected to the host, there are 2048 usable tags for requests to the HMC.
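The tag arithmetic above can be written out as a short calculation. The function name is hypothetical; the numbers come from the nine-bit tag field described in the answer.

```python
TAG_BITS = 9                    # width of the tag field in a packet
TAGS_PER_LINK = 1 << TAG_BITS   # 2**9 = 512 tags per host link


def max_unique_tags(host_links, cubes=1):
    """Upper bound on unique tags: 512 x host links x cubes in the chain."""
    return TAGS_PER_LINK * host_links * cubes


if __name__ == "__main__":
    print(max_unique_tags(host_links=4))           # one HMC, four host links
    print(max_unique_tags(host_links=4, cubes=2))  # same links, two chained cubes
```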
Does the HMC have CRC and ECC?
A CRC field is included in the tail of every packet. The CRC covers the entire packet (header, all data, and the non-CRC tail bits). The link retry logic retransmits packets across the link if link errors are detected. CRC provides command and data checking all the way to the destination vault controller. ECC provides error coverage of the data from the vault controller to and from the DRAM arrays. Thus, data is protected along the entire path to and from the memory device.
What kind of monitoring and/or testing equipment is currently used for the HMC?
We use the sidewinder to access information from I2C and JTAG. Please contact us for more information.
What is a FLIT?
Commands and data are transmitted in both directions across the link using a packet-based protocol where the packet consists of 128-bit flow units called FLITs. These FLITs are serialized, transmitted across the physical lanes of the link, and then reassembled at the receiving end of the link.
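As a minimal sketch of the FLIT concept, the code below splits a packet into 128-bit (16-byte) units and reassembles it. It models only the segmentation, not lane serialization or the packet header and tail layout, and the function names are hypothetical.

```python
FLIT_BYTES = 16  # one FLIT is 128 bits


def to_flits(packet: bytes):
    """Split a packet into 16-byte FLITs; a packet is a whole number of FLITs."""
    if len(packet) == 0 or len(packet) % FLIT_BYTES != 0:
        raise ValueError("packet length must be a non-zero multiple of 16 bytes")
    return [packet[i:i + FLIT_BYTES] for i in range(0, len(packet), FLIT_BYTES)]


def from_flits(flits):
    """Reassemble the packet at the receiving end by concatenating FLITs in order."""
    return b"".join(flits)


if __name__ == "__main__":
    packet = bytes(range(48))      # a 48-byte packet, i.e. three FLITs
    flits = to_flits(packet)
    print(len(flits), from_flits(flits) == packet)
```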
Can you tell me more about Atomic operations?
Atomic requests involve reading 16 bytes of data from DRAM (as determined by the request ADRS field), performing an operation on the data through the use of a 16-byte operand (either included in the request packet or pre-defined), and then writing the results back to the same location in DRAM with no intervening accesses allowed.
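The read-modify-write sequence can be modeled in software as follows. This is a behavioral sketch only: the VaultModel class, the lock standing in for "no intervening accesses," and the add128 operation are all hypothetical illustrations, not the HMC command set.

```python
import threading


class VaultModel:
    """Toy model of a memory region that supports 16-byte atomic updates."""

    def __init__(self, size):
        self._mem = bytearray(size)
        self._lock = threading.Lock()  # stands in for "no intervening accesses"

    def atomic(self, adrs, operand, op):
        """Read 16 bytes at adrs, apply op(data, operand), write the result back."""
        if len(operand) != 16:
            raise ValueError("operand must be 16 bytes")
        if adrs < 0 or adrs + 16 > len(self._mem):
            raise IndexError("address out of range")
        with self._lock:
            old = bytes(self._mem[adrs:adrs + 16])
            new = op(old, operand)
            if len(new) != 16:
                raise ValueError("operation must return 16 bytes")
            self._mem[adrs:adrs + 16] = new
        return old  # returning the old value is a convenience of this sketch

    def read16(self, adrs):
        with self._lock:
            return bytes(self._mem[adrs:adrs + 16])


def add128(data, operand):
    """Illustrative operation: 128-bit little-endian add that wraps on overflow."""
    total = int.from_bytes(data, "little") + int.from_bytes(operand, "little")
    return (total % (1 << 128)).to_bytes(16, "little")
```

For example, calling atomic(16, (1).to_bytes(16, "little"), add128) twice on a fresh VaultModel leaves the value 2 stored at address 16.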
What is the latency inside the memory? With the memory and the controller?
The unloaded latency for the HMC memory will be sub-100ns. A performance model can be provided to show what happens when you load the HMC with a particular access pattern.
Do you have any data on Latency Numbers with Daisy Chaining?
We do not have these yet. However, we can provide our thoughts:
“There is an important trade-off between the capacity and performance of the HMC. As more cubes are added to a network of memory devices, there are more overheads associated with moving data around the network as compared to a single cube. However, the trade-offs are reasonable in that adding capacity in the form of extra cubes only has a reasonably small impact on overall execution time. In addition there are techniques that can be applied to further hide the performance impact associated with cube chaining.”
In short, the latency of the controller with the FPGA is going to be higher than that within the HMC logic layer itself, so adding more cubes is only going to minimally increase the latency (i.e., adding an extra HMC to a chain will not increase latency linearly).
Power in the HMC
Can you talk about the Energy Efficiency of the HMC in pJ/bit, but also in terms of varied types of bandwidth?
Micron has a power calculator that can be obtained on micron.com or by request to HPC@micron.com.
Can I turn off a link I am not using to conserve energy?
Yes, you can. Each link can independently be set into a lower-power state through the use of the power state management pins.