IBM just announced a new collection of AI models, its third generation of Granite LLMs. The cornerstone models of the new collection are Granite 3.0 2B Instruct and Granite 3.0 8B Instruct ("Instruct" indicates the models are tuned to accurately understand and follow instructions). The models were trained on more than 12 trillion tokens spanning 12 human languages and 116 programming languages, and all are released under the Apache 2.0 open-source license. It is also important to note that IBM indemnifies its Granite models against legal claims over training data when they are used on the IBM watsonx AI platform.
Enterprise Uses For Smaller Granite Models
IBM designed the new 2B and 8B Granite models to handle a wide range of common enterprise tasks. Think of these models as go-to tools for everyday language jobs such as summarizing articles, extracting key information, writing code and creating explainer documents. The models also perform well on common language tasks such as entity extraction and retrieval-augmented generation (RAG), which grounds generated text in retrieved documents to improve its accuracy. According to IBM, by the end of 2024 Granite 3.0 models will be capable of understanding documents, interpreting charts, and answering questions about a GUI or product screen.
AI agents are rapidly becoming more important, and support for agentic use cases is a new capability in Granite 3.0 that was not available in previous IBM language models. In these use cases, an agent can proactively identify needs, call tools and initiate actions within predefined parameters, without human intervention. Typical examples include virtual assistants, customer service, decision support and recommendations, along with a variety of other complex tasks.
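The "use tools and initiate actions" pattern can be sketched as a simple control loop: a model decides on the next action, the runtime executes any tool it requests, and the result is fed back until the model can answer. The sketch below uses a hypothetical rule-based stand-in for the model and a toy tool; it is not an actual Granite 3.0 API.

```python
# Minimal agent-loop sketch. fake_model and TOOLS are illustrative
# stand-ins, not real Granite 3.0 interfaces.

def fake_model(state):
    # Decide the next action from the current state: either request a
    # tool call or produce a final answer.
    if "weather" in state["question"] and "weather_result" not in state:
        return ("call_tool", "get_weather")
    return ("answer", state.get("weather_result", "I don't know"))

TOOLS = {"get_weather": lambda: "sunny, 22C"}  # stand-in tool

def run_agent(question, max_steps=5):
    state = {"question": question}
    for _ in range(max_steps):
        action, arg = fake_model(state)
        if action == "call_tool":
            # The agent acts within predefined parameters (the TOOLS
            # registry) without human intervention.
            state["weather_result"] = TOOLS[arg]()
        else:
            return arg
    return "step limit reached"
```

In a real deployment, the rule-based stand-in would be replaced by an LLM call that emits structured tool requests, with the same loop shape around it.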
AI speculative decoders are also a new IBM offering. Speculative decoding speeds up an LLM's text generation by having a smaller draft model guess several upcoming tokens, which the larger model then verifies in a single pass instead of generating them one at a time. IBM's speculative decoder, called Granite 3.0 8B Accelerator, can speed up text generation by as much as 2x during inference.
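The draft-and-verify idea can be illustrated with toy deterministic "models" (simple next-token lookup tables, standing in for real neural networks — this is not IBM's Accelerator implementation): the draft model proposes several tokens, and the target model keeps the longest prefix it agrees with.

```python
# Toy speculative-decoding sketch with hypothetical stand-in models.

def draft_next(context):
    # Cheap draft model: a fixed bigram table.
    table = {"the": "cat", "cat": "sat", "sat": "on", "on": "the"}
    return table.get(context[-1], "the")

def target_next(context):
    # Expensive target model: agrees with the draft except after "sat".
    table = {"the": "cat", "cat": "sat", "sat": "down", "on": "the"}
    return table.get(context[-1], "the")

def speculative_step(context, k=4):
    """Propose k draft tokens, then keep the prefix the target accepts."""
    proposal, ctx = [], list(context)
    for _ in range(k):
        tok = draft_next(ctx)
        proposal.append(tok)
        ctx.append(tok)
    accepted, ctx = [], list(context)
    for tok in proposal:
        if target_next(ctx) == tok:
            accepted.append(tok)     # draft guess verified, kept for free
            ctx.append(tok)
        else:
            accepted.append(target_next(ctx))  # take the target's token
            break                               # and stop this round
    return context + accepted
```

When the draft model guesses well, several tokens are accepted per verification pass, which is where the speedup comes from.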
Granite 3.0 models will have another update in a few weeks. IBM will increase their context size from 4,000 to 128,000 tokens, which is a key enabler for longer conversations as well as the RAG tasks and agentic use cases mentioned above. By the end of the year, IBM plans to add vision input to the models, which will increase their versatility and allow their use in more applications.
Benchmarks For Performance And Cybersecurity
The Open LLM Leaderboard on Hugging Face evaluates and ranks open-source LLMs and chatbots according to benchmark performance. The chart above shows how the IBM Granite 3.0 8B Instruct model compares to Llama 3.1 8B Instruct and Mistral 7B Instruct. The Granite 3.0 2B Instruct model performs similarly well against other top models.
The IBM Research cybersecurity team helped identify high-quality data sources that were used to train the new Granite 3.0 models. IBM Research also helped develop the public and proprietary benchmarks needed to measure the model’s cybersecurity performance. As shown in the chart, the IBM Granite 3.0 8B Instruct model was the top performer in all three cybersecurity benchmarks against the same Llama and Mistral models mentioned above.
Future Granite Mixture-Of-Experts Models
IBM also plans to release several smaller and more efficient models: Granite 3.0 1B A400M, a 1-billion-parameter model, and Granite 3.0 3B A800M, a 3-billion-parameter model (the "A" figures refer to the parameters active during inference). Unlike the Granite 3.0 models discussed above, these future models will not be based on the dense transformer architecture but will instead use a mixture-of-experts (MoE) architecture.
The MoE architecture divides a model into several specialized expert sub-networks and routes each token to only a subset of them, so the model uses a small fraction of its total parameters during inference. That makes MoE models small and light while keeping a good balance between cost and capability. For example, the 3-billion-parameter MoE model uses only 800 million parameters during inference, and the 1-billion-parameter MoE model uses only 400 million. IBM developed them for applications such as edge and CPU server deployments.
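The total-versus-active parameter split can be made concrete with a small routing sketch. The expert count, sizes and hash-based router below are hypothetical stand-ins (real MoE layers use a learned gating network, often with top-2 selection), chosen only so the 800M-active-of-a-larger-total arithmetic is visible.

```python
# Illustrative MoE routing sketch with made-up sizes, not the actual
# Granite MoE architecture.

class Expert:
    def __init__(self, n_params):
        self.n_params = n_params  # stand-in for a feed-forward sub-network

class MoELayer:
    def __init__(self, n_experts, params_per_expert):
        self.experts = [Expert(params_per_expert) for _ in range(n_experts)]

    def total_params(self):
        return sum(e.n_params for e in self.experts)

    def route(self, token):
        # Toy router: hash the token to pick one expert. A real router
        # is a small learned network that scores every expert.
        return self.experts[hash(token) % len(self.experts)]

    def active_params(self, token):
        # Only the routed expert's parameters are exercised per token.
        return self.route(token).n_params

layer = MoELayer(n_experts=8, params_per_expert=100_000_000)
print(layer.total_params())          # all experts combined
print(layer.active_params("hello"))  # just the one routed expert
```

With eight 100M-parameter experts, the layer holds 800 million parameters in total but touches only 100 million per token, which is why MoE models suit cost-sensitive edge and CPU deployments.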
In 2025, IBM is planning to scale its biggest MoE architecture models upwards from 70 billion parameters to 200 billion parameters. Initially, the models will have language, code and multilingual capabilities. Vision and audio will be added later. All of these future Granite models will also be available under Apache 2.0.
Granite Guardian Models
Along with the Granite 3.0 2B and 8B models, IBM also announced a Granite Guardian 3.0 model, which acts as a guardrail for inputs and outputs of other Granite 3.0 models. When monitoring inputs, Granite Guardian looks for jailbreaking attacks and other potentially harmful prompts. To ensure safety standards are met, Granite Guardian also monitors LLM output for bias, fairness and violence.
These models also provide hallucination detection for grounded tasks that anchor model outputs to specific data sources. In a RAG workflow, Granite Guardian checks whether an answer is supported by the provided grounding context; if it isn't, the model flags the answer as an exception.
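The shape of such a groundedness check can be sketched with a simple lexical-overlap heuristic. To be clear, Granite Guardian is a trained LLM judge; the function below is only an illustrative stand-in, with a made-up overlap threshold, showing what "is this answer supported by the context?" means operationally.

```python
# Toy groundedness check: flag answers whose content words mostly do
# not appear in the retrieved context. Purely illustrative; not the
# Granite Guardian method.

def is_grounded(answer, context, threshold=0.6):
    stopwords = {"the", "a", "an", "is", "are", "of", "in", "to", "and"}
    answer_terms = {w for w in answer.lower().split() if w not in stopwords}
    if not answer_terms:
        return True  # nothing substantive to contradict the context
    context_terms = set(context.lower().split())
    overlap = len(answer_terms & context_terms) / len(answer_terms)
    return overlap >= threshold

context = "granite 3.0 models are released under the apache 2.0 license"
print(is_grounded("Granite models use the Apache 2.0 license", context))
print(is_grounded("Granite models cost five dollars per month", context))
```

A production guardrail replaces the word-overlap heuristic with a model that judges semantic support, but the pass/flag contract around it is the same.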
By 2025, IBM plans to reduce the size of Granite Guardian models to somewhere between 1 billion and 4 billion parameters. Reducing the model size makes them more versatile and accessible. It will also allow wider deployment across various industries and applications such as edge devices, healthcare, education and finance.
Ongoing Evolution Of IBM Granite Models
IBM’s Granite 3.0 models are high-performing open-source models with benchmarks to back up their performance and security. IBM plans to add new developer-friendly features such as structured JSON prompting, and, as with previous Granite models, to update the models regularly so they remain current. That means we can expect a steady stream of new features as they are developed. Unlike some competing open-source models released under custom licenses, the Granite models’ unrestricted Apache 2.0 license makes them adaptable for a wide variety of applications.
It appears that IBM has big plans for the future of the entire Granite 3.0 collection.