
AI Updates

Google Cloud and NVIDIA Unveil Infrastructure to Cut AI Inference Costs by Tenfold

At the Google Cloud Next conference, Google and NVIDIA outlined a hardware roadmap designed to reduce the cost of AI inference at scale. The companies introduced new A5X bare-metal instances, which run on NVIDIA Vera Rubin NVL72 rack-scale systems. Through hardware-software co-design, the architecture aims to deliver up to ten times lower inference cost per token than previous generations, while achieving ten times higher token throughput per megawatt.
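The announcement gives only relative multipliers, not absolute figures. A minimal sketch of what the two claims mean in practice, using hypothetical baseline numbers invented purely for illustration:

```python
# Hypothetical baseline figures for illustration only; the announcement
# states 10x multipliers, not absolute costs or throughput.
baseline_cost_per_million_tokens = 2.00   # USD, assumed
baseline_tokens_per_sec_per_mw = 5_000_000  # tokens/s per megawatt, assumed

COST_IMPROVEMENT = 10        # "up to ten times lower inference cost per token"
THROUGHPUT_IMPROVEMENT = 10  # "ten times higher token throughput per megawatt"

new_cost = baseline_cost_per_million_tokens / COST_IMPROVEMENT
new_throughput = baseline_tokens_per_sec_per_mw * THROUGHPUT_IMPROVEMENT

print(f"Cost per 1M tokens: ${baseline_cost_per_million_tokens:.2f} -> ${new_cost:.2f}")
print(f"Tokens/s per MW:    {baseline_tokens_per_sec_per_mw:,} -> {new_throughput:,}")
```

Note that the two multipliers measure different things: one is an economic figure (dollars per token), the other an efficiency figure (tokens per unit of power), so they do not simply multiply into a single "100x" gain.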

Connecting thousands of processors requires massive bandwidth to prevent processing delays. The A5X instances address this by pairing NVIDIA ConnectX-9 SuperNICs with Google Virgo networking technology. This configuration scales to 80,000 NVIDIA Rubin GPUs within a single-site cluster, and up to 960,000 GPUs across a multi-site deployment.

Operating at this scale requires sophisticated workload management. Routing data across nearly a million parallel processors demands precise synchronization to avoid idle compute time.
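The cost of imperfect synchronization is easy to quantify: in a synchronous step, every GPU waits for the slowest worker, so even small per-device jitter erodes utilization as the cluster grows. A toy model with assumed timing numbers (not from the announcement):

```python
import random

random.seed(0)

def step_utilization(num_gpus: int, mean_ms: float, jitter_ms: float) -> float:
    """Fraction of GPU time spent on useful work in one synchronous step,
    when every GPU must wait for the slowest worker (the straggler)."""
    times = [random.gauss(mean_ms, jitter_ms) for _ in range(num_gpus)]
    slowest = max(times)
    return sum(times) / (num_gpus * slowest)

# As GPU count grows, the expected straggler gets slower, so the
# fraction of paid-for compute that does useful work drops.
for n in (8, 1_000, 100_000):
    print(n, round(step_utilization(n, mean_ms=100.0, jitter_ms=5.0), 3))
```

This is why the networking and scheduling layers matter as much as the raw FLOPS: at hundreds of thousands of GPUs, shaving tail latency directly recovers otherwise-idle compute.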

Data Governance and Security for Regulated Industries

Beyond raw processing capabilities, data governance remains a primary issue for enterprise deployments. Highly regulated sectors, including finance and healthcare, often stall machine learning initiatives due to data sovereignty requirements and the risks of exposing proprietary information.

To address these compliance mandates, Google Gemini models running on NVIDIA Blackwell and Blackwell Ultra GPUs are entering preview on Google Distributed Cloud. This deployment method allows organizations to retain frontier models entirely within their controlled environments, alongside their most sensitive data stores.

The architecture incorporates NVIDIA Confidential Computing, a hardware-level security feature that keeps models inside a protected environment where prompts and fine-tuning data remain encrypted even while in use. This prevents unauthorized parties, including the cloud infrastructure operators themselves, from viewing or altering the underlying data.

For multi-tenant public cloud environments, a preview of Confidential G4 VMs equipped with NVIDIA RTX PRO 6000 Blackwell GPUs introduces these same cryptographic protections. This release represents the first cloud-based confidential computing offering for NVIDIA Blackwell GPUs, giving regulated industries access to high-performance hardware without violating data privacy standards.

Reducing Operational Overhead in Agentic AI Training

Building multi-step agentic systems requires connecting large language models to complex application programming interfaces, maintaining continuous vector database synchronization, and actively mitigating algorithmic hallucinations during execution. To streamline this engineering requirement, NVIDIA Nemotron 3 Super is now available on the Gemini Enterprise Agent Platform. The platform provides developers with tools to customize and deploy reasoning and multimodal models designed for agentic tasks.

Training these models at scale introduces heavy operational overhead, particularly when managing cluster sizing and hardware failures during long reinforcement learning cycles. Google Cloud and NVIDIA introduced Managed Training Clusters on the Gemini Enterprise Agent Platform, which includes a managed reinforcement learning API built with NVIDIA NeMo RL. This system automates cluster sizing, failure recovery, and job execution, allowing data science teams to concentrate on model quality rather than low-level infrastructure management.
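The failure-recovery behavior such a managed service automates can be illustrated with a toy resume-from-checkpoint loop. This is a generic sketch with invented names, not the actual NeMo RL or Managed Training Clusters API:

```python
import random

def train_with_recovery(total_steps: int, checkpoint_every: int,
                        failure_rate: float, seed: int = 0) -> tuple[int, int]:
    """Toy training loop: on a simulated hardware failure, resume from the
    last checkpoint instead of from step zero. Returns the number of step
    attempts actually executed and the number of restarts."""
    rng = random.Random(seed)
    done, executed, restarts, checkpoint = 0, 0, 0, 0
    while done < total_steps:
        executed += 1
        if rng.random() < failure_rate:   # simulated node failure
            restarts += 1
            done = checkpoint             # roll back to the last checkpoint
            continue
        done += 1
        if done % checkpoint_every == 0:
            checkpoint = done             # persist progress
    return executed, restarts

executed, restarts = train_with_recovery(1_000, checkpoint_every=50, failure_rate=0.01)
print(f"{executed} step attempts and {restarts} restarts for 1000 useful steps")
```

The overhead (attempts beyond the 1,000 useful steps) is work redone after failures; frequent checkpointing bounds that rollback cost, which is exactly the kind of tuning a managed reinforcement learning service takes off the team's plate.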

CrowdStrike actively utilizes NVIDIA NeMo open libraries, including NeMo Data Designer and NeMo Megatron Bridge, to generate synthetic data and fine-tune models for domain-specific cybersecurity applications. Operating these models on Managed Training Clusters with Blackwell GPUs accelerates their automated threat detection and response capabilities.

Legacy Architecture Integration and Physical Simulations

The integration of machine learning into heavy industry and manufacturing presents a different class of engineering challenges. Connecting digital models to physical factory floors requires physically accurate simulations, massive compute power, and standardization across legacy data formats.

NVIDIA’s AI infrastructure and physical AI libraries are now available on Google Cloud, providing the foundation for organizations to simulate and automate real-world manufacturing workflows. Major industrial software providers, such as Cadence and Siemens, have made their solutions available on Google Cloud, accelerated by NVIDIA infrastructure. These tools power the engineering and manufacturing of heavy machinery, aerospace platforms, and autonomous vehicles.

Manufacturing firms often run on decades-old product lifecycle management systems, making the translation of geometry and physics data difficult. By utilizing NVIDIA Omniverse libraries and the open-source NVIDIA Isaac Sim framework via the Google Cloud Marketplace, developers can bypass some of these translation issues to construct physically accurate digital twins and train robotics simulation pipelines prior to physical deployment.
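The translation problem is concrete: legacy formats describe the same geometry under different conventions, such as millimetre units or 1-based face indexing, and each mismatch silently corrupts a simulation. A minimal hypothetical sketch of normalizing a legacy mesh record before it enters a pipeline (the record layout and target conventions are invented for illustration, not Omniverse's actual schema):

```python
# Hypothetical legacy record: millimetres, 1-based face indices,
# as often produced by older CAD and PLM exporters.
legacy_mesh = {
    "units": "mm",
    "vertices": [(0.0, 0.0, 0.0), (1000.0, 0.0, 0.0), (0.0, 1000.0, 0.0)],
    "faces": [(1, 2, 3)],
}

def normalize(mesh: dict) -> dict:
    """Convert to metres and 0-based indices, the conventions assumed
    (for illustration) by the downstream simulation tooling."""
    scale = {"mm": 0.001, "cm": 0.01, "m": 1.0}[mesh["units"]]
    return {
        "units": "m",
        "vertices": [tuple(c * scale for c in v) for v in mesh["vertices"]],
        "faces": [tuple(i - 1 for i in f) for f in mesh["faces"]],
    }

sim_mesh = normalize(legacy_mesh)
```

Frameworks built on a shared scene description absorb this kind of conversion once at import time, which is why a common format matters more than any single converter.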

Google Cloud and NVIDIA have not announced specific availability dates for all preview services. Further details on pricing and regional rollouts are expected in upcoming quarters as the infrastructure moves toward general availability.
