NextSilicon’s Maverick-2 Dataflow Engine Redefines Computational Efficiency with Novel Architecture
The Dawn of a New Computing Paradigm

After eight years of development and $303 million in funding, NextSilicon has officially launched its Maverick-2 dataflow engine, marking a significant departure from traditional CPU and GPU architectures. The company is simultaneously introducing Arbel, a custom RISC-V processor designed to complement Maverick-2 in creating powerful host-accelerator combinations. This innovative approach represents what could be the most substantial architectural shift in high-performance computing in decades, with Sandia National Laboratories expected to be among the first major deployments.

Breaking the Von Neumann Bottleneck

NextSilicon’s approach fundamentally challenges the computing status quo. According to Ilan Tayari, NextSilicon’s co-founder and VP of Architecture, traditional CPUs dedicate only about 2% of their silicon to actual computation through Arithmetic Logic Units (ALUs). The remaining 98% serves as overhead for managing instruction and data flow within the Von Neumann architecture that has dominated computing since the 1940s.

“Today’s high-end processors have become complicated and chunky, both physically and practically,” Tayari explained during the Maverick-2 launch. “They dedicate 98 percent of their silicon to overhead, traffic management, and data shuffling—not actual computation.”

Intelligent Computing Architecture: A Hardware Revolution

The Maverick-2 chip represents a monumental engineering achievement with 54 billion transistors fabricated on TSMC’s 5nm process. The monolithic die features four compute regions containing 224 compute blocks arranged in a grid pattern. Each compute block contains hundreds of interconnected ALUs, potentially totaling tens of thousands across the entire chip.

Unlike traditional architectures where compute units remain idle much of the time, NextSilicon’s Intelligent Computing Architecture (ICA) enables what the company calls “mill cores”—configurable logic blocks that can be dynamically mapped to application requirements. These mill cores can support hundreds of threads simultaneously, significantly outperforming traditional CPUs (typically 2 threads) and GPUs (32-64 threads).
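To make the dataflow idea concrete, here is a minimal sketch of how a dataflow machine executes: each operation "fires" as soon as its inputs are available, rather than following a sequential program counter. This toy interpreter is purely illustrative of dataflow execution in general; it is not NextSilicon's actual ICA or mill-core mechanism, and all names in it are invented for the example.

```python
from collections import deque

def run_dataflow(nodes, inputs):
    """Toy dataflow interpreter.

    nodes:  {name: (op, [input names])} -- the dataflow graph
    inputs: {name: value}               -- initially available values
    A node fires the moment all of its inputs exist, so independent
    nodes can conceptually fire in parallel on separate ALUs.
    """
    values = dict(inputs)
    ready = deque(n for n, (_, deps) in nodes.items()
                  if all(d in values for d in deps))
    while ready:
        name = ready.popleft()
        if name in values:          # already fired (duplicate entry)
            continue
        op, deps = nodes[name]
        values[name] = op(*(values[d] for d in deps))
        # Any node whose inputs are now all present becomes ready.
        for n, (_, ds) in nodes.items():
            if n not in values and all(d in values for d in ds):
                ready.append(n)
    return values

# (a + b) * (a - b): the add and subtract have no mutual dependency,
# so a dataflow fabric could execute them simultaneously.
graph = {
    "s": (lambda x, y: x + y, ["a", "b"]),
    "d": (lambda x, y: x - y, ["a", "b"]),
    "p": (lambda x, y: x * y, ["s", "d"]),
}
result = run_dataflow(graph, {"a": 7, "b": 3})
print(result["p"])  # (7+3) * (7-3) = 40
```

The key property the sketch shows is that scheduling falls out of data availability alone: there is no instruction pointer, branch predictor, or reorder buffer, which is the overhead the article says dominates conventional CPU silicon.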

The Software Magic Behind the Hardware

What makes NextSilicon’s approach truly revolutionary is how software interacts with hardware. Existing C, C++, or Fortran code can be compiled for the dataflow engine, where it’s literally mapped onto the ALU fabric. The compiler automatically optimizes how applications run across NextSilicon’s host processor, the embedded RISC-V cores, and the extensive ALU arrays.

Elad Raz, NextSilicon’s CEO, emphasizes that “mill cores can be loaded and deleted as needed in a matter of nanoseconds” according to workload demands. This dynamic reconfiguration enables unprecedented utilization rates, potentially reaching 75-80% across thousands of simultaneous threads—a dramatic improvement over traditional architectures.

Performance and Efficiency Implications

The architectural advantages translate into tangible benefits for HPC applications:

  • Reduced overhead: Minimal silicon dedicated to control logic and data management
  • Higher utilization: Compute units remain actively engaged rather than waiting for data
  • Energy efficiency: Less power wasted on speculative execution and branch prediction
  • Simplified cooling: Reduced thermal management requirements

Tayari notes that while GPUs improve on CPU efficiency by dedicating up to 30% of silicon to computation, their mutually exclusive compute blocks limit simultaneous operation. NextSilicon’s approach eliminates these limitations through its reconfigurable dataflow architecture.
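The stakes of these percentages are easy to quantify. The back-of-the-envelope arithmetic below treats the silicon-fraction and utilization figures quoted in the article (roughly 2% for CPUs, up to 30% for GPUs, and the claimed 75-80% for the dataflow design) as a rough proxy for how much of a fixed silicon budget does useful arithmetic. The peak figure is an invented placeholder to show the arithmetic, not a vendor spec.

```python
# Hypothetical peak ops/cycle for a fixed silicon budget -- a
# placeholder number, chosen only to make the ratios visible.
PEAK_OPS = 1_000

# Fractions of silicon/time doing useful arithmetic, per the article.
architectures = [("CPU", 0.02), ("GPU", 0.30), ("dataflow", 0.775)]

for name, fraction in architectures:
    useful = PEAK_OPS * fraction
    print(f"{name:>8}: ~{useful:.0f} useful ops/cycle")
```

Under these assumptions the dataflow design delivers well over twice the useful work of a GPU from the same silicon, which is the core of the efficiency argument the company is making.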

The Future of High-Performance Computing

NextSilicon’s timing couldn’t be more strategic. As traditional scaling approaches face diminishing returns, the computing industry desperately needs architectural innovation. The company’s HPC-first focus addresses the specific needs of scientific computing, engineering simulation, and research institutions that continue to rely heavily on 64-bit floating-point precision.

While AI applications can run on the architecture, NextSilicon’s primary mission remains serving the traditional HPC market. As Raz and Tayari have demonstrated, sometimes the most revolutionary advances come not from incremental improvements, but from fundamentally rethinking how computation should work from the ground up.

The computing industry will be watching closely as Sandia National Laboratories and other early adopters begin deploying Maverick-2 systems in production environments. If NextSilicon’s architectural bets pay off, we may be witnessing the beginning of a new era in computational efficiency and performance.

This article aggregates information from publicly available sources. All trademarks and copyrights belong to their respective owners.