Will read requests come back in order?
Not necessarily. Each vault controller will have a queue that is used to buffer references for that vault’s memory. The vault controller may execute references within that queue based on need rather than order of arrival. Therefore, responses from vault operations back to the external serial I/O links can be out of order.
However, requests from a particular external serial link to the same vault/bank address are executed in order. Requests from different external serial links to the same vault/bank address are NOT guaranteed to be executed in a specific order and must be managed by the host controller. See page 12 of the attached specification document.
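As an illustration of what host-side management of out-of-order responses could look like, here is a minimal Python sketch of a reorder buffer that hands read data back in issue order. The class and method names are hypothetical and are not part of any HMC interface; the default of 2048 tags simply matches the eleven-bit tag field described elsewhere in this FAQ.

```python
from collections import deque


class ReadReorderBuffer:
    """Hypothetical host-side helper that releases read data in issue order."""

    def __init__(self, num_tags=2048):
        self.num_tags = num_tags
        self.next_tag = 0
        self.in_flight = deque()  # tags in the order the requests were issued
        self.arrived = {}         # tag -> data for responses not yet released

    def issue(self):
        """Allocate a tag for a new read request."""
        if len(self.in_flight) >= self.num_tags:
            raise RuntimeError("all tags are in flight")
        tag = self.next_tag
        self.next_tag = (self.next_tag + 1) % self.num_tags
        self.in_flight.append(tag)
        return tag

    def complete(self, tag, data):
        """Record a response; return the data that can now be released in order."""
        self.arrived[tag] = data
        released = []
        # Release from the head of the issue queue for as long as data is present.
        while self.in_flight and self.in_flight[0] in self.arrived:
            released.append(self.arrived.pop(self.in_flight.popleft()))
        return released
```

With three reads issued as tags a, b, and c, a response for c that arrives first is held until a and b have both completed.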
What is the address mapping scheme? Can I customize the address mapping scheme for the HMC?
The user can select a specific address mapping scheme so that access of the HMC vaults and banks is optimized for the characteristics of the request address stream. If the request address stream is either random or generally sequential from the low-order address bits, the host can choose to use one of the default address map modes offered in the HMC. The default address mapping is based on the maximum block size chosen in the address map mode register. It maps the vault address toward the less significant address bits, followed by the bank address just above the vault address. This address mapping algorithm is referred to as “low interleave,” and forces sequential addressing to be spread across different vaults and then across different banks within a vault, thus avoiding bank conflicts. See Memory Addressing on page 43 of attached specification document.
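The effect of low interleave can be sketched in a few lines of Python. The block size, vault count, and bank count below are illustrative assumptions, not values taken from the address map mode register; the actual bit positions are defined in the specification.

```python
def decode_low_interleave(addr, block_bytes=32, num_vaults=32, num_banks=8):
    """Split a byte address into (vault, bank, upper, offset).

    Assumed layout, from least to most significant:
    byte offset within the block, then vault, then bank, then the rest.
    All three sizes are illustrative defaults.
    """
    offset = addr % block_bytes
    block = addr // block_bytes
    vault = block % num_vaults                        # vault bits sit lowest
    bank = (block // num_vaults) % num_banks          # bank bits sit just above
    upper = block // (num_vaults * num_banks)         # remaining address bits
    return vault, bank, upper, offset


# Sequential blocks land in successive vaults before any bank changes.
for i in range(4):
    print(decode_low_interleave(i * 32))
```

Under these assumptions, consecutive 32-byte blocks step through all vaults first, and the bank index only advances once every vault has been visited.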
How many of the vaults can I access from one link?
Each link is associated with eight local vaults that are referred to as a quadrant. Access from a link to its local quadrant may have lower latency than access to a vault outside that quadrant. However, you may access any vault from any link. Bits within the vault address designate both the specific quadrant and the vaults within that quadrant. Please see Memory Address-to-Link Mapping on page 44 of the attached specification document.
What are the current HMC configurations?
HMC Gen 2 only has the option for 10, 12.5, or 15 Gb/s lane speed, up to 4GB of density, and a maximum of 240GB/s of aggregate link bandwidth. Please see page 14 of the attached specification document for more information.
What is the deal with timing in the HMC?
All in-band communication across a link is packetized. There is no specific timing associated with memory requests, and responses can generally be returned in a different order than requests were made because there are multiple independent vaults, each executing independently of the others. See page 14 of the attached specification document.
Can I implement my own logic in the HMC logic die?
If each read or write request requires a tag, how many tags do I have available at any one time?
Tag fields in packets are eleven bits long, which is enough space for 2048 tags. Other host links will use the same tag range of 2048 tags, but they are uniquely identified by their association with each different host link. Additionally, in a multi-HMC topology, the host could choose to expand the number of usable tags by associating each tag set (based on host link) with the target cube of the request. Thus, the total maximum number of unique tags is 2048 times the number of host links of one HMC, times the number of cubes in the chain. For example, if four links of one HMC are connected to the host, there are 8192 usable tags for requests to the HMC. Please refer to page 54 of the attached specification document.
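The tag arithmetic above can be written out as a short Python sketch. The function name and parameters are illustrative only.

```python
TAG_BITS = 11  # width of the tag field, as stated above


def usable_tags(host_links, cubes=1, tag_bits=TAG_BITS):
    """Maximum unique tags: 2**tag_bits per host link, per cube in the chain."""
    return (1 << tag_bits) * host_links * cubes


print(usable_tags(host_links=4))           # four host links, one cube
print(usable_tags(host_links=4, cubes=2))  # same links, two chained cubes
```

The first call reproduces the 8192-tag example for four host links on a single HMC.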
Does the HMC have CRC and ECC?
A CRC field is included in the tail of every packet. The CRC covers the entire packet (header, all data, and the non-CRC tail bits). The link retry logic retransmits packets across the link if link errors are detected. CRC provides command and data checking all the way to the destination vault controller. ECC provides error coverage of the data from the vault controller to and from the DRAM arrays. Thus, data is protected along the entire path to and from the memory device. See page 42 of the attached specification document for more information on the CRC algorithm and calculation protocol.
What kind of monitoring and/or testing equipment is currently used for the HMC?
We use the sidewinder to access information from I2C and JTAG. Please contact us for more information.
What is a FLIT?
Commands and data are transmitted in both directions across the link using a packet-based protocol in which packets consist of 128-bit flow units called FLITs. These FLITs are serialized, transmitted across the physical lanes of the link, and then reassembled at the receiving end of the link. See page 18 of the attached specification document for more information.
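To make the FLIT granularity concrete, here is a small Python sketch that splits a packet into 128-bit (16-byte) units and joins them again. It only models the chunking; it does not model header or tail fields or how bits are striped across physical lanes, and the function names are hypothetical.

```python
FLIT_BYTES = 16  # one FLIT is 128 bits


def to_flits(packet: bytes):
    """Split a packet into 16-byte FLITs; the length must be a whole number of FLITs."""
    if len(packet) % FLIT_BYTES != 0:
        raise ValueError("packet length must be a multiple of 16 bytes")
    return [packet[i:i + FLIT_BYTES] for i in range(0, len(packet), FLIT_BYTES)]


def from_flits(flits):
    """Reassemble the original packet at the receiving end."""
    return b"".join(flits)


packet = bytes(range(48))  # a 48-byte example packet, i.e. three FLITs
flits = to_flits(packet)
print(len(flits), from_flits(flits) == packet)
```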
Can you tell me more about Atomic operations?
Atomic requests involve reading 16 bytes of data from DRAM (as determined by the request ADRS field), performing an operation on the data through the use of a 16-byte operand (either included in the request packet or pre-defined), and then writing the results back to the same location in DRAM. For allowed operations, please see page 70 of the attached specification document.
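The read, operate, and write-back sequence can be sketched in Python against a plain byte array standing in for DRAM. The 128-bit wrapping add and the little-endian byte order used here are assumptions chosen for illustration; the operations the device actually supports are listed in the specification.

```python
def atomic_rmw(memory: bytearray, addr: int, operand: bytes, op):
    """Read 16 bytes at addr, apply op with a 16-byte operand, write the result back."""
    if len(operand) != 16:
        raise ValueError("operand must be 16 bytes")
    if addr < 0 or addr + 16 > len(memory):
        raise ValueError("address out of range")
    old = bytes(memory[addr:addr + 16])  # read 16 bytes
    new = op(old, operand)               # perform the operation
    memory[addr:addr + 16] = new         # write back to the same location


def add128(a: bytes, b: bytes) -> bytes:
    """Illustrative operation: 128-bit wrapping add, little-endian (an assumption)."""
    total = (int.from_bytes(a, "little") + int.from_bytes(b, "little")) % (1 << 128)
    return total.to_bytes(16, "little")


mem = bytearray(32)
mem[16] = 0xFF
atomic_rmw(mem, 16, (1).to_bytes(16, "little"), add128)
print(mem[16:18].hex())
```

In the device, the whole sequence is carried out as a single request rather than as separate read and write requests from the host.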
What is the latency inside the memory? With the memory and the controller?
The unloaded latency for the HMC will be sub-100ns. The performance model that can be provided to you will show what happens when you load the HMC up with a certain access pattern.
What kind of testing has been implemented on the HMC? Are there temperature and vibration stress studies? Do you have the HMC in an extended temperature range?
Do you have any data on Latency Numbers with Daisy Chaining?
We do not have these yet. However, we can provide our thoughts:
“There is an important trade-off between the capacity and performance of the HMC. As more cubes are added to a network of memory devices, there are more overheads associated with moving data around the network as compared to a single cube. However, the trade-offs are reasonable in that adding capacity in the form of extra cubes only has a reasonably small impact on overall execution time. In addition there are techniques that can be applied to further hide the performance impact associated with cube chaining.”
In short, the latency of the controller in the FPGA is going to be higher than the latency within the HMC logic layer itself, so adding more cubes is only going to minimally increase the overall latency (i.e., latency will not grow linearly with each extra HMC added to a chain). It is obviously application specific, but if you have a latency of around 200 ns, then adding three more HMCs might put you at 300 ns.
Power in the HMC
Can you talk about the Energy Efficiency of the HMC in pJ/bit, but also in terms of varied types of bandwidth?
Micron has a power simulator that can be accessed by contacting HPC@micron.com.
Can I turn off a link I am not using to conserve energy?
Yes, you can. Each link can independently be set into a lower power state through the use of the power state management pins. See page 35 of the attached specification document for more information.