At Google Cloud Next, the search giant’s annual cloud computing conference, generative AI took center stage. From Sundar Pichai’s and Thomas Kurian’s keynotes to the breakout and partner-led sessions, nearly every announcement revolved around it.
Google is confident that its investments in generative AI will become the key differentiating factor that puts its cloud platform ahead of the competition.
Accelerated Computing for Generative AI
Foundation models and large language models demand high-end accelerators for training, fine-tuning, and inference. Google’s innovations in Tensor Processing Units (TPUs) give the company the accelerated computing required for generative AI.
At Cloud Next, Google announced the latest generation of its TPU, Cloud TPU v5e. It has a smaller footprint of 256 chips per pod and is optimized for state-of-the-art transformer-based neural networks.
Compared to Cloud TPU v4, the new Google Cloud TPU v5e has up to 2x higher training performance per dollar and up to 2.5x higher inference performance per dollar for LLMs and generative AI models.
Google is introducing Multislice technology in preview to make it easier to scale up training jobs, allowing users to quickly scale AI models beyond the boundaries of physical TPU pods—up to tens of thousands of Cloud TPU v5e or TPU v4 chips. Until now, TPU training jobs were limited to a single slice of TPU chips, with the most extensive jobs having a maximum slice size of 3,072 chips for TPU v4. Developers can use Multislice to scale workloads up to tens of thousands of chips over an inter-chip interconnect (ICI) within a single pod or across multiple pods over a data center network (DCN).
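From a framework’s perspective, a Multislice environment simply exposes a larger pool of accelerators. The following is a minimal sketch in Python with JAX, assuming a training job that has already been launched across multiple TPU slices; it only illustrates how the runtime reports the devices it can see.

# Hedged sketch: inside a Multislice training job, JAX reports the chips from all
# slices as one global pool of devices. Run on each host of the TPU environment.
import jax

print("Global devices visible to this job:", jax.device_count())
print("Devices attached to this host:", jax.local_device_count())
print("Process index:", jax.process_index(), "of", jax.process_count())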
Google has already used Multislice to train its large language model PaLM 2, and the technology is now available to customers for training their own custom models.
While TPUs reduce Google’s reliance on NVIDIA GPUs, Google Cloud also supports NVIDIA’s latest H100 GPUs through the availability of A3 VMs.
An Extensive Choice of Foundation Models
Google Cloud’s key differentiating factor is the choice of foundation models it offers to its customers. Backed by cutting-edge research from Google DeepMind, Google Cloud delivers foundation models such as PaLM, Imagen, Codey, and Chirp. These are the same models that power some of the core products of Google, including Search and Translate.
Having its own foundation models enables Google to iterate faster based on usage patterns and customer feedback. Since the announcement of PaLM 2 at Google I/O in April 2023, the company has enhanced the foundation model to support 32,000-token context windows and 38 new languages. Similarly, Codey, the foundation model for code completion, offers up to a 25% quality improvement in major supported languages for code generation and code chat.
The primary benefit of owning the foundation model is the ability to customize it for specific industries and use cases. Google builds on its PaLM 2 investments to deliver Med-PaLM 2 and Sec-PaLM 2, large language models fine-tuned for the medical and security domains.
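For illustration, here is a minimal sketch of how a developer might call PaLM 2 through the Vertex AI SDK for Python; the project ID and location are placeholders, and “text-bison-32k” is assumed to be the name of the 32,000-token-context variant.

# Hedged sketch: generate text with PaLM 2 via the Vertex AI SDK for Python.
# Project ID, location, and the 32k model name are illustrative assumptions.
import vertexai
from vertexai.language_models import TextGenerationModel

vertexai.init(project="my-gcp-project", location="us-central1")

model = TextGenerationModel.from_pretrained("text-bison-32k")
response = model.predict(
    "Summarize the key announcements from Google Cloud Next in three bullet points.",
    temperature=0.2,
    max_output_tokens=256,
)
print(response.text)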
Besides the home-grown foundation models, Google Cloud’s Vertex AI Model Garden hosts some of the most popular open source models, such as Meta’s Llama 2 and Code Llama, TII’s Falcon, and others.
Google Cloud will also support third-party models such as Anthropic’s Claude 2, Databricks’ Dolly V2, and Palmyra-Med-20b from Writer.
Google has the broadest spectrum of foundation models available to its customers. They can choose from the best-of-breed, state-of-the-art models offered by Google, its partners, or the open source community.
Generative AI Platform for Researchers and Practitioners
AI researchers who want to pre-train and fine-tune foundation models can use Google Cloud’s Vertex AI. At the same time, Vertex AI also appeals to developers who are not familiar with the inner workings of generative AI.
By bringing together Colab Enterprise and Vertex AI, Google enables researchers to create highly customized runtime environments to run Notebooks in a collaborative mode. This brings the best of both worlds – collaboration and customization. The Colab notebooks are launched within Compute Engine VMs with custom configurations. This enables enterprises to choose an appropriate GPU for running experiments.
Data scientists can use Colab Enterprise to accelerate AI workflows. It gives them access to all of the features of the Vertex AI platform, including integration with BigQuery for direct data access and even code completion and generation.
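For example, a data scientist working in a Colab Enterprise notebook can pull training data directly from BigQuery with the standard Python client; the project, dataset, and table names below are purely illustrative.

# Illustrative only: query BigQuery directly from a Colab Enterprise notebook.
# The project, dataset, and table names are hypothetical placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")
query = """
    SELECT review_text, rating
    FROM `my-gcp-project.support.product_reviews`
    LIMIT 1000
"""
df = client.query(query).to_dataframe()
print(df.head())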
The Generative AI Studio enables developers to quickly prototype applications that consume the foundation models without learning the nuts and bolts. From building simple chatbots to prompt engineering to fine-tuning models with custom datasets, Generative AI Studio reduces the learning curve for infusing GenAI into applications.
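A chat prompt prototyped in Generative AI Studio can be carried over into application code along roughly these lines; this is a hedged sketch, and the context string, example pair, and parameters are illustrative placeholders.

# Hedged sketch of a chat prompt prototyped in Generative AI Studio, expressed
# with the Vertex AI SDK for Python. Context and examples are placeholders.
import vertexai
from vertexai.language_models import ChatModel, InputOutputTextPair

vertexai.init(project="my-gcp-project", location="us-central1")

chat_model = ChatModel.from_pretrained("chat-bison")
chat = chat_model.start_chat(
    context="You are a support assistant for a cloud storage product.",
    examples=[
        InputOutputTextPair(
            input_text="How do I make a bucket public?",
            output_text="Grant the allUsers principal the Storage Object Viewer role on the bucket.",
        )
    ],
)
print(chat.send_message("How do I enable object versioning?", temperature=0.2).text)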
Vertex AI now comes with a dedicated vector database in the form of the Matching Engine service, which can be used for storing text embeddings and performing similarity searches. This service becomes an integral part of building LLM-powered applications that need contextual data access to deliver accurate responses.
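A retrieval step in such an application might look like the sketch below: embed the user’s query with the Vertex AI embeddings model, then look up similar documents in a previously built Matching Engine index. The endpoint resource name and deployed index ID are hypothetical, and the find_neighbors call assumes an index already deployed to a public endpoint.

# Hedged sketch: embed a query and search a pre-built Matching Engine index.
# The index endpoint resource name and deployed index ID are placeholders.
import vertexai
from vertexai.language_models import TextEmbeddingModel
from google.cloud import aiplatform

vertexai.init(project="my-gcp-project", location="us-central1")

embedding_model = TextEmbeddingModel.from_pretrained("textembedding-gecko")
query_vector = embedding_model.get_embeddings(["How do I rotate my API keys?"])[0].values

index_endpoint = aiplatform.MatchingEngineIndexEndpoint(
    index_endpoint_name="projects/123/locations/us-central1/indexEndpoints/456"
)
neighbors = index_endpoint.find_neighbors(
    deployed_index_id="support_docs_index",
    queries=[query_vector],
    num_neighbors=5,
)
print(neighbors)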
Vertex AI has a clean and straightforward interface and user experience aligned with the personas of a researcher, developer, or practitioner.
Building Search and Conversation AI Apps with No-Code
If Vertex AI is meant for technology professionals familiar with the MLOps workflow of training, serving, and fine-tuning foundation models, Google Cloud has also invested in no-code tools that put the power of large language models in the hands of developers.
Vertex AI Search and Conversation, formerly known as Gen App Builder, enables developers to build Google-style search and chatbot capabilities on top of various structured and unstructured data sources.
Vertex AI Search enables organizations to build Google Search-quality, multimodal, multi-turn search applications powered by foundation models, including the ability to ground outputs in enterprise data alone or use enterprise data to supplement the foundation model’s initial training. It will soon have enterprise access controls to ensure that only the right people can see the information. It will also have features like citations, relevance scores, and summarization to help people trust results and make them more useful.
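Programmatically, a query against a Vertex AI Search data store goes through the Discovery Engine client library; the sketch below is hedged, with the project, location, and data store IDs shown as placeholders.

# Hedged sketch: query a Vertex AI Search data store via the Discovery Engine
# client library. Project, location, and data store IDs are placeholders.
from google.cloud import discoveryengine_v1beta as discoveryengine

client = discoveryengine.SearchServiceClient()
serving_config = (
    "projects/my-gcp-project/locations/global/collections/default_collection/"
    "dataStores/my-datastore/servingConfigs/default_search"
)
request = discoveryengine.SearchRequest(
    serving_config=serving_config,
    query="What is our parental leave policy?",
    page_size=5,
)
for result in client.search(request=request):
    print(result.document.id)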
Vertex AI Conversation enables the development of natural-sounding, human-like chatbots and voicebots using foundation models that support audio and text. Developers can use it to quickly create a chatbot from a website or a collection of documents, and they can combine deterministic, rules-based workflows with generative outputs to build applications that are both engaging and reliable. For example, users can ask AI agents to book appointments or make purchases.
Google has also announced Vertex AI extensions, which can retrieve information in real time and act on behalf of users across Google and third-party applications like Datastax, MongoDB, and Redis, as well as Vertex AI data connectors, which ingest data from enterprise and third-party applications such as Salesforce, Confluence, and JIRA, connecting generative applications to commonly used enterprise systems.
One of the smartest moves from Google is the integration of Dialogflow with LLMs. By pointing an agent to the source, such as a website or a collection of documents, developers can quickly generate chatbot code that can be easily embedded into a web application.
Exploiting Generative AI Investments to Deliver Duet AI
Google’s AI assistant technology, branded as Duet AI, is firmly grounded in one of its foundation models, PaLM 2. The company is integrating the assistant across its products, including Google Cloud and Google Workspace.
Duet AI is available to cloud developers in services including the Google Cloud Console, Cloud Workstations, Cloud Code IDE, and Cloud Shell Editor. It is also available in third-party IDEs such as VS Code and JetBrains IDEs like CLion, GoLand, IntelliJ, PyCharm, Rider, and WebStorm.
Using Duet AI in Google Cloud integration services such as Apigee API Management and Application Integration, developers can design, create, and publish APIs using simple natural language prompts.
Google Cloud is one of the first hyperscalers to bring AI assistants to CloudOps and DevOps professionals. Duet AI can help operators automate deployments, ensure applications are configured correctly, quickly understand and debug issues, and create more secure and reliable applications.
Natural language prompts in Cloud Monitoring can be translated into PromQL queries to analyze time-series metrics such as CPU usage over time. Duet AI can also provide intuitive explanations of complex log entries in Logs Explorer and suggest resolutions for issues raised by Error Reporting, which is especially useful for root-cause analysis and incident post-mortems.
Google didn’t limit Duet AI to developers and operators. It has extended it to databases, including Cloud Spanner, BigQuery, and AlloyDB. Database professionals can even migrate legacy databases to Google Cloud SQL with the help of Duet AI, which assists in mapping the schema, syntax, and semantics of stored procedures and triggers.
For DevSecOps, Google has integrated Duet AI with security-related services, including Chronicle Security Operations, Mandiant Threat Intelligence, and Security Command Center. Duet AI can quickly summarize and categorize information about threats, turn natural language searches into queries, and suggest remediation steps. This can reduce the time it takes to detect and resolve issues and makes security professionals more productive.