
SanDisk's High Bandwidth Flash (HBF)
Revolutionizing AI Memory Technology
Table of Contents
Introduction
What is High Bandwidth Flash (HBF)?
Technical Architecture
Advanced Stacking Design
Memory Interface Design
Performance and Capacity
Unprecedented Memory Density
Bandwidth Considerations
Applications in AI Computing
AI Inference Optimization
Hybrid Memory Configurations
Challenges and Limitations
Write Endurance Concerns
Latency Implications
Future Outlook
Conclusion
Works Cited
Introduction
AI computing has grown rapidly, but memory capacity and speed remain bottlenecks. SanDisk's new High Bandwidth Flash (HBF) aims to solve this problem. HBF offers similar speed to High Bandwidth Memory (HBM) but with much more storage at a lower cost. This breakthrough could change how AI models run, especially for large language models that currently need distributed computing.
What is High Bandwidth Flash (HBF)?
HBF is a new memory technology that combines NAND flash storage's high capacity with HBM-like speed. It offers 8 to 16 times more capacity than HBM at similar prices, addressing a major AI computing limitation.
HBF isn't meant to replace regular RAM or SSDs. Instead, it targets AI workloads that need massive memory for model storage and processing. SanDisk has redesigned NAND flash to allow fast parallel access, approaching DRAM-like performance for specific tasks while keeping NAND's density and cost benefits.
This approach differs from previous attempts to put SSDs on graphics cards. HBF uses the same interface as HBM, with 1024 data lines connecting each module to the GPU, though it needs some protocol changes to work with flash memory.
Technical Architecture
Advanced Stacking Design
SanDisk's HBF uses BiCS technology and CBA wafer bonding for efficient, high-density stacking. They've developed a way to stack 16 dies without warping, a significant advance in semiconductor packaging.
The HBF structure has multiple HBF core dies connected by through-silicon vias (TSVs) and micro bumps. These connect to a logic die and interface layer, which then links to a GPU, CPU, TPU, or SoC die. This stack sits on an interposer above a package substrate, similar to HBM but optimized for NAND flash.
Memory Interface Design
HBF's electrical interface matches existing HBM implementations, which should make it easier for hardware makers to adopt. However, HBF isn't a direct HBM replacement due to needed protocol changes for flash memory's unique traits.
SanDisk has worked with major AI industry players to develop HBF, suggesting they're designing it with practical use in mind.
Performance and Capacity
Unprecedented Memory Density
HBF's main advantage is its huge capacity compared to current HBM. Each HBF module can hold up to 512 GB, roughly 21 times the capacity of a current 24 GB HBM3e module. In a standard eight-stack setup, HBF could provide 4 TB of memory on a single GPU or AI processor.
To put this in context, modern large language models like GPT-4, estimated at around 1.8 trillion parameters, need about 3.6 TB of memory at 16-bit precision. With HBF, it might be possible to fit these huge models entirely in a single GPU's memory, potentially transforming AI processing.
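The capacity figures above are easy to sanity-check. The sketch below assumes 2-byte (16-bit) weights and uses the widely reported, unconfirmed 1.8-trillion-parameter estimate for GPT-4:

```python
# Back-of-the-envelope check of the capacity figures in the article.
# Assumes 2-byte (16-bit) weights; the 1.8T-parameter GPT-4 figure
# is an industry estimate, not a confirmed specification.

HBF_MODULE_GB = 512      # claimed per-module HBF capacity
HBM3E_MODULE_GB = 24     # typical current HBM3e module
STACKS_PER_GPU = 8       # standard eight-stack layout

def model_memory_gb(params: float, bytes_per_param: int = 2) -> float:
    """Memory needed to hold the raw weights, in decimal GB."""
    return params * bytes_per_param / 1e9

total_hbf_gb = HBF_MODULE_GB * STACKS_PER_GPU    # 4096 GB, i.e. 4 TB
total_hbm_gb = HBM3E_MODULE_GB * STACKS_PER_GPU  # 192 GB
gpt4_gb = model_memory_gb(1.8e12)                # 3600 GB, i.e. 3.6 TB

print(f"HBF per GPU:   {total_hbf_gb} GB")
print(f"HBM3e per GPU: {total_hbm_gb} GB")
print(f"1.8T params @ 2 B each: {gpt4_gb:.0f} GB")
print(f"Fits in an 8-stack HBF GPU? {gpt4_gb <= total_hbf_gb}")
```

At these numbers the full model fits in an eight-stack HBF configuration with room to spare, while an HBM3e-only GPU falls short by more than an order of magnitude.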
Bandwidth Considerations
While exact bandwidth figures aren't known, industry experts think the first HBF generation should achieve at least 128 GB/s bandwidth per stack. This matches early HBM implementations, though it's unclear if future versions might reach the 1 TB/s per stack of the latest HBM3E.
Even if HBF starts with lower bandwidth than current HBM, its massive capacity could enable new AI model deployment approaches that prioritize single-device model storage over distributed computing.
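To make the bandwidth gap concrete, the illustrative sketch below estimates how long one full pass over a 3.6 TB model would take at the 128 GB/s per-stack figure versus an HBM3E-class 1 TB/s per stack, assuming all eight stacks stream in parallel:

```python
# Illustrative only: time for one full sweep over model weights at the
# per-stack bandwidth estimates quoted in the article.

def sweep_time_s(model_gb: float, gb_per_s_per_stack: float,
                 stacks: int = 8) -> float:
    """Seconds to read every weight once, with all stacks in parallel."""
    return model_gb / (gb_per_s_per_stack * stacks)

MODEL_GB = 3600  # ~1.8T parameters at 2 bytes each

t_hbf_gen1 = sweep_time_s(MODEL_GB, 128)    # first-generation HBF estimate
t_hbm3e = sweep_time_s(MODEL_GB, 1000)      # HBM3E-class, if it had the capacity

print(f"HBF gen 1 (128 GB/s/stack):  {t_hbf_gen1:.2f} s per full sweep")
print(f"HBM3E-class (1 TB/s/stack):  {t_hbm3e:.2f} s per full sweep")
```

Even the slower figure may be acceptable for inference workloads that, like MoE models, touch only a fraction of the weights per token.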
Applications in AI Computing
AI Inference Optimization
SanDisk has positioned HBF for read-intensive AI inference tasks, aligning with current AI deployment trends. This focus makes sense for developments like test-time scaling, where models do extra computations to refine outputs without changing training parameters.
HBF's large capacity could enable new model deployment methods. For example, Mixture of Experts (MoE) architectures, which use only parts of a larger model for each task, could benefit from having the entire model accessible in fast memory.
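The MoE argument can be quantified: all experts must be resident, but only a fraction of the parameters are read per token, which favors a high-capacity, moderate-bandwidth memory. The numbers in this sketch are hypothetical, not taken from any specific model:

```python
# Hypothetical illustration of why MoE suits HBF: all weights must be
# resident, but only a fraction is touched per token.

def moe_active_fraction(num_experts: int, experts_per_token: int,
                        expert_share: float = 0.9) -> float:
    """Fraction of total parameters read per token.

    expert_share: fraction of parameters living in expert layers;
    the remainder (attention, embeddings) is always active.
    """
    always_active = 1.0 - expert_share
    return always_active + expert_share * experts_per_token / num_experts

frac = moe_active_fraction(num_experts=16, experts_per_token=2)
print(f"Active per token: {frac:.1%} of the weights")
```

In this example only about a fifth of the weights are streamed per token, so the bandwidth demand per token is far below what total capacity alone would suggest.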
Hybrid Memory Configurations
SanDisk has suggested combining HBF with traditional HBM for optimized AI workload memory setups. For instance, six HBF modules (3072 GB) could pair with two HBM modules (48 GB) on one accelerator. This would let the HBM handle frequently updated data such as attention key-value caches, while the larger HBF stores the mostly static model parameters.
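The tier split described above can be sketched as a simple placement policy. The capacities follow the example configuration; the greedy policy and tensor names are hypothetical illustrations, not SanDisk's design:

```python
# Minimal sketch of the hybrid placement idea: write-hot buffers claim
# the small HBM tier first, read-mostly weights fall through to HBF.
# Policy and tensor names are hypothetical illustrations.

HBM_GB = 48     # 2 x 24 GB HBM modules
HBF_GB = 3072   # 6 x 512 GB HBF modules

def place(tensors):
    """tensors: list of (name, size_gb, writes_per_token).
    Returns {'HBM': [...], 'HBF': [...]}."""
    placement = {"HBM": [], "HBF": []}
    hbm_free = HBM_GB
    # Most write-intensive tensors first, so caches win the fast tier.
    for name, size_gb, writes in sorted(tensors, key=lambda t: -t[2]):
        if writes > 0 and size_gb <= hbm_free:
            placement["HBM"].append(name)
            hbm_free -= size_gb
        else:
            placement["HBF"].append(name)
    return placement

tensors = [
    ("model_weights", 3000, 0),  # static during inference
    ("kv_cache",        32, 1),  # rewritten every token
    ("activations",      8, 1),  # rewritten every token
]
print(place(tensors))
```

With these figures the frequently rewritten caches land in HBM, keeping write traffic off the flash tier entirely, which also speaks to the endurance concern discussed below.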
Challenges and Limitations
Write Endurance Concerns
The main challenge for HBF adoption is flash memory's limited write endurance. Flash cells can only handle a finite number of write cycles before degrading. While SanDisk hasn't detailed its approach to this issue, using pseudo-Single Level Cell (pSLC) NAND might offer a good balance of durability and cost.
For AI inference tasks that mostly read from memory, this limitation may be less important. Model parameters typically load once and stay static during inference, which suits NAND's strengths.
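A rough lifetime estimate shows why read-mostly inference sidesteps the endurance problem. The 30,000-cycle rating below is a hypothetical pSLC-class figure, not a SanDisk specification:

```python
# Rough endurance estimate. The 30k program/erase cycle rating is a
# hypothetical pSLC-class figure, not a SanDisk specification.

def lifetime_years(pe_cycles: int, full_writes_per_day: float) -> float:
    """Years until rated P/E cycles are exhausted, assuming each
    event rewrites the full device once."""
    return pe_cycles / full_writes_per_day / 365

# Inference deployment: model weights refreshed at most twice a day.
print(f"Read-mostly inference: {lifetime_years(30_000, 2):.0f} years")

# Misuse as general swap: hundreds of full-device writes per day.
print(f"Write-heavy swap use:  {lifetime_years(30_000, 500):.2f} years")
```

The asymmetry is stark: under the read-mostly pattern inference imposes, even a conservative cycle rating outlives the hardware, while write-heavy use would wear the device out in months.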
Latency Implications
Even if HBF approaches HBM in raw bandwidth, flash memory inherently has higher latency than DRAM. This makes HBF potentially unsuitable for tasks that need immediate memory access with minimal delay. The impact will vary by workload: those with predictable, sequential access patterns may see little slowdown, while those needing fine-grained random access might experience noticeable delays.
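The access-pattern argument can be illustrated with a simple model: with a fixed per-request latency, large sequential reads approach peak bandwidth while small random reads collapse. The latency and bandwidth figures below are illustrative assumptions, not measured HBF numbers:

```python
# Illustrative model of effective bandwidth vs. access size when each
# request pays a fixed latency. Figures are assumptions, not HBF specs.

def effective_gbps(access_bytes: float, latency_us: float,
                   peak_gbps: float = 1024.0) -> float:
    """Achieved bandwidth with one outstanding request at a time."""
    transfer_s = access_bytes / (peak_gbps * 1e9)
    total_s = latency_us * 1e-6 + transfer_s
    return access_bytes / total_s / 1e9

# Flash-like latency (~50 us) vs. DRAM-like (~0.1 us):
for size in (4 * 1024, 1 * 1024**2, 64 * 1024**2):
    print(f"{size:>9} B reads: flash {effective_gbps(size, 50):7.2f} GB/s, "
          f"dram {effective_gbps(size, 0.1):7.2f} GB/s")
```

Under these assumptions, 4 KB random reads from a flash-latency tier achieve well under 1 GB/s, while 64 MB sequential reads recover most of the peak bandwidth, which is why streaming static model weights suits HBF far better than pointer-chasing workloads.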
Future Outlook
SanDisk plans three generations of HBF development, showing long-term commitment to this technology. By making HBF an open standard, they seem to be aiming for industry-wide adoption rather than keeping it proprietary. This approach could speed up integration into commercial systems if HBF delivers on its promises.
For DIY enthusiasts and system builders interested in AI, HBF might eventually create new possibilities for high-performance computing setups. While initial uses will likely target data center AI, the technology could influence consumer-grade hardware as it matures and costs decrease.
Conclusion
SanDisk's High Bandwidth Flash offers a promising solution to AI computing's memory bottleneck. By combining NAND flash's capacity with high-bandwidth access, HBF addresses the trade-off between memory capacity and speed.
While HBF faces challenges in write endurance, latency, and implementation complexity, its massive capacity makes it well-suited for specific AI inference tasks that current memory technologies struggle with. Its focus on read-intensive workloads aligns well with both NAND technology's strengths and modern AI inference patterns.
For the DIY computing community, technologies like HBF signal an exciting future where storage and memory boundaries continue to blur, potentially enabling new approaches to system design, especially for AI applications.
Works Cited
"HBF: 4 TByte Memory on One GPU." Heise Online, 13 Feb. 2025, www.heise.de/en/news/HBF-4-TByte-memory-on-one-GPU-10281621.html.
"SanDisk Develops HBM Killer: High-Bandwidth Flash (HBF) Allows 4 TB of VRAM for AI GPUs." TechPowerUp, 14 Feb. 2025, www.techpowerup.com/forums/threads/sandisk-develops-hbm-killer-high-bandwidth-flash-hbf-allows-4-tb-of-vram-for-ai-gpus.332516/.
"SanDisk's High Bandwidth Flash Memory Could Equip GPUs With 4TB Of VRAM." HotHardware, 14 Feb. 2025, hothardware.com/news/sandisk-hbf-memory.
