VAST Data Powers the Storage for CoreWeave AI Infrastructure

The GPUs and accelerators at the foundation of our current AI-fueled moment are only as useful as the data that feeds them. Letting a thirty-thousand-dollar NVIDIA H100 sit idle while it waits for data is a tangible waste. Letting dozens, or even hundreds, of these accelerators sit idle while data makes its way from storage erodes the value of the entire system.

At the same time, the cost and complexity involved in building the machines that power generative AI have pushed the technology to the cloud. Specialty hyperscalers like CoreWeave have emerged to provide turnkey platforms for enterprises at every rung of the AI ladder.

Realizing that value requires providers to build infrastructure tuned for maximum efficiency. Systems must be balanced, minimizing the latency of every data transfer while maximizing bandwidth.

Traditional AI systems relied heavily on roll-your-own solutions built on parallel file systems such as WEKA or Lustre. This week, CoreWeave, perhaps the leading specialty cloud provider for AI, announced that it has selected VAST Data to provide the storage that underpins its offerings.

The VAST Data Difference

VAST Data made its name delivering high-performance, highly scalable storage based on a unique “disaggregated shared everything” (DASE) architecture. The VAST Data approach has led it to great success in markets where scalability and performance are paramount, including finance, health sciences, and many windowless buildings surrounding Washington, D.C.

VAST Data’s DASE architecture is central to the success of its data storage platform. It is designed to deliver high performance, scalability, and cost-effectiveness, and it targets high-performance workloads that require extensive data processing, such as AI and deep learning.

The primary features and components of the VAST Data DASE architecture include:

Disaggregated Architecture: DASE separates the data storage and compute resources, allowing for flexible scaling of each component independently. This disaggregation enables better resource allocation and optimization, making it suitable for modern, data-intensive workloads.
Shared Everything: DASE’s “shared everything” approach means that all data is accessible from every node, regardless of which storage node holds it. This shared access simplifies data management and ensures that data can be read from any node, which is crucial for distributed and parallel processing workloads.
Global Namespace: DASE offers a global namespace that provides a single, unified view of the data across all nodes. This makes it easier for users to access and manage their data, regardless of where it’s physically stored within the storage cluster.
Scalability: The DASE architecture is highly scalable, allowing seamless expansion of capacity and performance by adding more storage nodes. As data and workload requirements grow, customers can quickly scale their storage infrastructure to meet those demands.
Hardware and Software Integration: DASE is designed to work efficiently with enterprise-grade hardware and VAST Data’s software stack. This integration ensures optimal performance and resilience.
Data Reduction: VAST Data’s architecture includes data reduction techniques, such as data deduplication and compression, to minimize storage space usage and optimize performance.
Efficient Network Usage: DASE includes features like RDMA (Remote Direct Memory Access) to facilitate efficient network data transfers, reducing latency and improving overall system performance.

The VAST Data DASE architecture is a modern storage solution that addresses the needs of data-intensive applications and workloads. Its design principles of disaggregation, shared access, scalability, and a global namespace together deliver efficient, cost-effective, high-performance storage.

These same principles make it ideal for the unique demands of generative AI workloads.

Who is CoreWeave?

Renowned for its ability to provide large-scale GPU resources to meet the computational demands of many industries, including those that rely heavily on AI and GPU-accelerated workloads, CoreWeave has emerged over the past several years as the leading specialized cloud provider of high-performance cloud infrastructure.

CoreWeave offers specialized cloud solutions designed for compute-intensive use cases, including machine learning, large language models, and other GPU-intensive tasks. Its infrastructure is tailored to deliver the performance such demanding workloads require. And the company doesn’t appear to have trouble obtaining the NVIDIA GPUs needed to serve the current generative AI community.

CoreWeave’s expertise in providing GPU-centric cloud infrastructure and empowering AI and other data-intensive technologies has positioned it as a vital player in the AI cloud industry. Its commitment to delivering scalable, high-performance GPU computing resources has made it a trusted partner for a wide range of organizations and projects.

VAST Data and CoreWeave

VAST Data and CoreWeave announced a strategic partnership under which CoreWeave will use the VAST Data Platform to build a global, NVIDIA-powered accelerated computing cloud capable of managing and securing vast amounts of data for generative AI, high-performance computing (HPC), and visual effects (VFX) workloads. The partnership aims to provide highly secure, scalable, and efficient solutions for accelerated compute use cases such as machine learning, rendering, and batch processing.

CoreWeave selected VAST Data after rigorous research and testing, citing VAST’s scalability, performance, and multi-tenant enterprise AI cloud capabilities. VAST Data’s architecture simplifies large-scale AI workloads, making them faster and easier to manage. CoreWeave believes that the VAST Data Platform will seamlessly scale performance and capacity to meet the significant growth in the AI space.

There are several key elements of VAST Data’s architecture that CoreWeave wants to weave into its AI infrastructure: millions of IOPS and terabytes per second of performance from cost-effective NVMe infrastructure; secure tenant and workload isolation with QoS policies; non-stop operations that allow online upgrades and expansions; and multi-protocol capabilities, which allow data written through one protocol to be read through another, all on a single platform. The VAST Data solution is software-defined and runs on customer-acquired hardware, with software licensing based solely on actual capacity usage. It’s an ideal marriage.

The partnership between VAST Data and CoreWeave leverages NVIDIA technology to create a new data platform architecture for large-scale data pipelines and AI workloads. VAST Data’s platform, certified for use with NVIDIA DGX SuperPOD, eliminates infrastructure silos and simplifies AI at virtually limitless levels of scale and performance. This collaboration aims to redefine modern AI computing and provide a foundation for AI-powered advancements.

Analyst’s Take

CoreWeave and VAST Data are both in the midst of rapid growth fueled by the need for accelerated analytics and AI. For VAST, the CoreWeave deal is just the latest in an impressive, years-long run of growth and innovation.

VAST Data began the year licensing a subset of its technology to Hewlett Packard Enterprise to use as the basis for HPE’s Alletra file storage solutions. In May, the company was certified as a datastore for NVIDIA’s DGX SuperPOD, a certification that requires demonstrating rigorous, scalable performance. Just last month, VAST Data announced that it was selected by G42 Cloud to provide the storage architecture for G42’s global network of AI supercomputers.

Not to be lost in all this momentum is the announcement earlier this year of the VAST Data Platform, VAST’s global data infrastructure offering that unifies storage, database, and virtualized compute engine services in a scalable system built from the ground up for the future of AI.

Customers are clearly responding to VAST’s growing list of accomplishments and design wins. VAST Data, a private company, noted that in 2022 it experienced impressive financial growth, more than doubling its annual recurring revenue and growing by 2.5 times compared to the previous year.

CoreWeave didn’t choose VAST Data because of its momentum, however. It chose VAST Data because VAST has the right architecture to meet the demanding requirements of generative AI workloads. CoreWeave believes that VAST Data will keep its tens of thousands of GPUs from stalling while they wait on storage. I’m inclined to believe CoreWeave, as few companies understand AI infrastructure better. This is a big win for VAST Data, and one that anyone building high-performance AI infrastructure should pay attention to.

Disclosure: Steve McDowell is an industry analyst, and NAND Research is an industry analyst firm that engages in, or has engaged in, research, analysis, and advisory services with many technology companies, which may include those mentioned in this article. Mr. McDowell does not hold any equity positions with any company mentioned in this article.
