The promise of AI in the enterprise is huge—as in, unprecedentedly huge. The speed at which a company can get from concept to value with AI is unmatched. This is why, despite its perceived costs and complexity, AI, and especially generative AI, is a top priority for virtually every organization. It's also why the market has seen AI companies emerge from every corner, each trying to deliver easy AI solutions that help businesses large and small maximize AI's potential.
In this spirit of operationalizing AI, tech giant Nvidia has focused on delivering an end-to-end experience that addresses that potential along with the vectors of cost, complexity and time to implementation. Nvidia is, for obvious reasons, thought of as a semiconductor company, but its dominant position in AI also rests on deep expertise in the software needed to implement AI. Nvidia NeMo is the company's response to these challenges: a platform that enables developers to quickly bring data and large language models together and into the enterprise.
As part of enabling the AI ecosystem, Nvidia has just announced a partnership with Mistral AI, a popular LLM provider, to introduce the Mistral NeMo language model. What is this partnership, and how does it benefit enterprise IT? I’ll unpack these questions and more in this article.
Digging Deeper On Mistral NeMo Technical Details
As part of the Nvidia-Mistral partnership, the companies worked together to train and deliver Mistral NeMo, a 12-billion-parameter language model that uses the FP8 data format to balance accuracy, performance and portability. This low-precision format is extremely useful because it shrinks the model's memory footprint enough for Mistral NeMo to fit into the memory of a single Nvidia GPU. That reduced footprint, in turn, is what makes the model practical across a wide range of enterprise use cases and hardware.
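To make the memory benefit concrete, here's a back-of-the-envelope sketch (my own arithmetic, not a figure from Nvidia or Mistral) of the weight memory a 12-billion-parameter model needs at different precisions:

```python
# Rough weight-memory estimate for a 12B-parameter model at several
# precisions. This counts weights only; the KV cache, activations and
# runtime overhead add to these figures in practice.
PARAMS = 12e9  # Mistral NeMo's parameter count

for name, bytes_per_param in [("FP32", 4), ("FP16/BF16", 2), ("FP8", 1)]:
    gib = PARAMS * bytes_per_param / 2**30
    print(f"{name:>9}: ~{gib:.0f} GiB for weights")

# FP32: ~45 GiB, FP16/BF16: ~22 GiB, FP8: ~11 GiB. The FP8 figure is
# what allows the model to fit on a single GPU with headroom to spare.
```

The halving from FP16 to FP8 is the difference between needing a data-center accelerator and fitting comfortably on a single workstation-class GPU.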
Mistral NeMo features a 128,000-token context length, which enables a greater level of coherency, contextualization and accuracy. Consider a chatbot that provides online customer service: the 128,000-token window supports a longer, more complete interaction between customer and company without the model losing track of earlier turns. Or imagine an in-house security application that manages access to application data based on a user's privilege level. Mistral NeMo's context length allows the full access policy and the relevant data to be evaluated together in a single, automated pass.
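As a simple illustration of why the large window matters for multi-turn support conversations, here's a minimal sketch that tracks a chat history against the context budget. The four-characters-per-token heuristic is a rough stand-in for the model's real tokenizer:

```python
# Minimal sketch: accumulate chat turns and trim only when the rough
# token estimate exceeds the context window. With a 128,000-token
# budget, long support conversations rarely need trimming at all.
CONTEXT_LIMIT = 128_000  # Mistral NeMo's context length, in tokens

history: list[dict] = []

def add_turn(role: str, text: str) -> None:
    history.append({"role": role, "content": text})
    # Rough heuristic: about four characters per token.
    while sum(len(m["content"]) for m in history) // 4 > CONTEXT_LIMIT:
        history.pop(0)  # drop the oldest turn only when over budget

add_turn("user", "My order #1234 arrived damaged.")
add_turn("assistant", "Sorry to hear that. Could you share a photo?")
print(f"Turns retained: {len(history)}")
```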
The 12-billion-parameter size is worth noting because it speaks to something critical for many IT organizations: data locality. While enterprises need the power of AI and generative AI to drive business operations, considerations including cost, performance, risk and regulatory constraints often prevent them from doing this in the cloud. These considerations are why most enterprise data still sits on-premises, decades after cloud adoption began.
Many organizations prefer a deployment scenario that involves training a model with company data and then inferencing across the enterprise. Mistral NeMo's size enables this without substantial infrastructure costs (at FP8 precision, a 12-billion-parameter model can run on a single high-end laptop or workstation GPU). Combined with its FP8 format, this model size enables Mistral NeMo to run anywhere in the enterprise, from an access control point to the edge. I believe this portability and scalability will make the model quite attractive to many organizations.
Mistral NeMo was trained on the Nvidia DGX Cloud AI platform using Megatron-LM running on 3,072 Nvidia H100 80GB Tensor Core GPUs. Megatron-LM, part of the NeMo platform, is a training framework built around advanced model-parallelism techniques for scaling large language models. It reduces training times by splitting computation across GPUs while preserving model accuracy and scalability. This matters when you consider how broadly an LLM like this will be used within an organization in terms of function, language and deployment model.
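The sketch below is a toy NumPy illustration of the column-parallel matrix multiply at the heart of Megatron-LM-style tensor parallelism. The real framework implements this with CUDA kernels and collective communication across thousands of GPUs, so treat this purely as a conceptual model:

```python
import numpy as np

# Tensor (model) parallelism in miniature: split a layer's weight
# matrix column-wise across "devices" (plain arrays here), let each
# compute its slice of the output, then gather the slices back.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 512))      # a batch of activations
W = rng.standard_normal((512, 1024))   # the full layer weight

n_shards = 4                           # stand-in for 4 GPUs
shards = np.split(W, n_shards, axis=1) # column-parallel split
partials = [x @ w for w in shards]     # each device multiplies its shard
y = np.concatenate(partials, axis=1)   # all-gather the partial outputs

assert np.allclose(y, x @ W)           # identical to the unsplit layer
```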
It’s All About The Inference
When it comes to AI, the real value is realized in inferencing—in other words, where AI is operationalized in the business. This could be through a chatbot that can seamlessly and accurately support customers from around the globe in real time. Or it could be through a security mechanism that understands a healthcare worker’s privileged access level and allows them to see only the patient data that is relevant to their function.
To that end, Mistral NeMo has been engineered to deliver enterprise readiness out of the box, more easily and more quickly. The Mistral and Nvidia teams used Nvidia TensorRT-LLM to optimize Mistral NeMo for real-time inferencing and extract the best possible performance from the underlying hardware.
While it may seem obvious, the collaborative focus on the best, most scalable performance across any deployment scenario reflects both companies' understanding of how enterprises actually deploy AI. Mistral NeMo will run on servers, workstations, edge devices and even client devices, and in any such deployment, models tuned with company data must meet stringent requirements for scalable performance. This is precisely what Mistral NeMo delivers. In line with this, Mistral NeMo is packaged as an Nvidia NIM inference microservice, which makes it straightforward to deploy on any Nvidia-accelerated computing platform.
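As an illustration of how lightweight that deployment can be, NIM microservices expose an OpenAI-compatible API. The sketch below assumes a NIM container already running locally; the base URL, key handling and model identifier are illustrative assumptions, so check your deployment's documentation for the exact values:

```python
# Minimal sketch of querying a locally hosted NIM endpoint through its
# OpenAI-compatible API. Endpoint URL and model id are assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local NIM endpoint
    api_key="not-needed-locally",         # local NIM often requires no key
)

response = client.chat.completions.create(
    model="mistralai/mistral-nemo-12b-instruct",  # assumed model identifier
    messages=[{"role": "user", "content": "Summarize our returns policy."}],
    max_tokens=200,
)
print(response.choices[0].message.content)
```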
The Real Enterprise Value
I started this analysis by noting the enterprise AI challenges of cost and complexity. Security is an ever-present challenge too, and AI can open another attack vector that organizations must defend. With those challenges in mind, I see some clear benefits that Mistral NeMo, and the NeMo framework more broadly, can deliver for organizations.
- Operational Agility — With the Nvidia NeMo development platform, enterprises can quickly and easily build and customize AI models that drive efficiency. Whether improving internal processes through intelligent automation or developing new AI-driven products and services, NeMo can be the tool that makes AI real.
- Operational Efficiency — Mistral NeMo’s high accuracy and performance can maximize the efficiency of enterprise applications. For example, customer service chatbots powered by Mistral NeMo can handle complex, multi-turn conversations, providing precise and contextually relevant responses. This reduces the need for human intervention, streamlining customer support workflows and improving response times.
- Multilingual Capabilities — One of Mistral’s standout features is its depth of multilingual support. This support is critical in a world where the smallest of organizations can have customers from around the globe. In what seems like a recurring theme, Mistral NeMo enables organizations to achieve this level of support quickly, easily and cost-effectively.
- Security and Compliance — The most valuable data is often the most sensitive. Many enterprises operate under strict security and compliance regulations. Mistral NeMo, deployed via the Nvidia AI Enterprise framework, ensures enterprise-grade security and support. This includes dedicated feature branches, rigorous validation processes and comprehensive service-level agreement support.
- Cost-Effective Scalability — The ability to run Mistral NeMo on cost-effective hardware like the Nvidia L40S, Nvidia GeForce RTX 4090 or RTX 4500 GPUs makes it accessible to organizations of all sizes.
Closing Thoughts
As an ex-IT executive, I understand the challenge of adopting new technologies or aligning with technology trends. It is costly and complex and usually exposes a skills gap within an organization. As an analyst who speaks with many former colleagues and clients on a daily basis, I believe that AI is perhaps the biggest technology challenge enterprise IT organizations have ever faced.
Nvidia continues to build out its AI ecosystem with partnerships like this one with Mistral, making AI frictionless for any organization, whether a large government agency or a tiny start-up looking to create differentiated solutions. The company has demonstrated this commitment across the stack, from hardware to tools to frameworks to software.
The collaboration between Nvidia and Mistral AI is significant. Mistral NeMo can become a critical element of an enterprise's AI strategy because of its scalability, cost-effectiveness and ease of integration into the workflows and applications that are central to transformation.
While I expect this partnership to deliver real value to organizations of all sizes, I'll especially keep an eye on the adoption of Mistral NeMo across the small-enterprise market segment, where I believe the AI opportunity and challenge are perhaps the greatest.