Key Takeaways from NVIDIA GTC 2024

Kent Noyes, Senior Director of AI Security

NVIDIA GTC was an energized, exciting event. I was happy to see that it provided a healthy number of cybersecurity sessions, and I was privileged to host one myself. One of the best presentations I attended was delivered by David Reber, the Chief Security Officer at NVIDIA. He did a wonderful job of describing the data challenges and human limitations we have in security, and the need to apply AI to make that space more effective. He also dove into cyber obstacles in building generative AI systems for general use. For example, he highlighted data access control challenges in which we now need to incorporate "access control on sentences," as opposed to documents. We are all on a journey toward this advanced form of Context-Based Access Control, but it is an evolving method and we're all learning together how to scale it.

This and other cyber presentations that I attended validated that, while a significant portion of securing AI systems is still traditional security, there now exist a number of modern, niche solution areas that will need to be utilized to reduce risk and achieve compliance in AI. NVIDIA Confidential Computing, LLM firewalls, attestation methods, prompt injection detection, generative AI usage detection, and LLM vulnerability scanners all fall into this category.

One concept stands firm: as any new attack surface evolves, as we see with the expansion of AI, governance must be applied as a foundation. Teamed with corresponding policies and standards, governance allows for an organized and consistent approach to AI security as AI use spreads across the enterprise.

Kent Noyes GTC Expo session **Secure Generative AI: Building Trustworthy Systems**

Jeff Fonke, Senior Practice Manager High-Performance Architecture

GTC 2024 was a marathon and a sprint all in four short days. The excitement was in the air throughout the whole week. On Monday, Jensen's fantastic keynote included announcements about the Blackwell AI processor, the B200 chip, the GB200 Superchip and the NIM (NVIDIA Inference Microservices) deployment model.

At WWT, we plan to take full advantage of the NIM model and leverage the AI Proving Ground to be a place where clients can check out how these NIM deployments work.

The partner forum illustrated the power of the NVIDIA ecosystem. The crowd on the showroom floor ranged from long-time, best-of-breed tech giants to small AI startups.

We had a great turnout at our booth, where we showcased our NVIDIA NeMo RAG demo. A proud sponsor and major award winner (NVIDIA's AI Enterprise Partner of the Year), WWT was very well represented on the showroom floor.

WWT's experts providing a demo at the booth

Yoni Malchi, Managing Director, AI and Data

Fantastic GTC this year, as usual. Great to be back in person seeing some unreal presentations, meeting great people and learning what's next! Some of my takeaways:

1. Companies are continuing to push AI out of the lab and into production. It's getting there, but there's still a lot of work to do.

2. A huge space for innovation right now is building a high-fidelity retrieval-augmented generation (RAG) system to include proprietary data. Don't be fooled by how easy it is to spin up a POC (again, back to the point above on pushing AI into production).

3. The bleeding edge is moving from text generation to agents that can reason and take action. We are researching this right now at WWT and it shows a ton of potential.

Derrick Monahan, Technical Solutions Architect, High Performance Architecture, AI and Data

This was my first NVIDIA GTC. Having attended many large industry conferences in the past, it exceeded my expectations. Excitement throughout the exhibitor hall was apparent. Jensen Huang's keynote address started with "I hope you realize this is not a concert." However, GTC 2024 felt like a concert for four days straight!

Four key highlights from GTC 2024:

New and enhanced software approaches

Generative AI with NIM

NVIDIA NIM (NVIDIA inference microservices) is an innovative approach to the delivery and packaging of AI-powered applications.

Key example: In Healthcare AI, NVIDIA launched more than two dozen new gen AI-powered microservices to enable healthcare and life sciences to leverage the latest advances in GenAI across the areas of drug discovery, med tech and digital health.

The new suite of NVIDIA healthcare microservices includes optimized NVIDIA NIM AI models and workflows with industry-standard application programming interfaces, functioning as building blocks for the development and deployment of cloud-native applications.

Industry Adoption and Strategic Collaborations

I spent some time in the exhibit hall with AWS exploring how NIM is being leveraged by cloud service providers. AWS and NVIDIA have collaborated on a healthcare-specific project to expand computer-aided drug discovery through the use of new NVIDIA BioNeMo™ FMs for generative chemistry, protein structure prediction, and insights into how drug molecules interact with targets. The AWS HealthOmics service, which assists healthcare and life sciences organizations in storing, querying, and analyzing genomic, transcriptomic, and other omics data, will soon include these new models.

In addition, the teams at AWS HealthOmics and NVIDIA Healthcare are launching generative AI microservices to further digital health, medtech, and drug discovery. They're doing this by providing a new catalog of GPU-accelerated cloud endpoints for biology, chemistry, imaging, and healthcare data so that healthcare organizations can benefit from the most recent developments in generative AI on AWS.

AWS + NVIDIA: Amazon SageMaker integration with NVIDIA NIM inference microservices helps customers further optimize the price performance of foundation models running on GPUs

Comprehensive Ecosystem: Boosting AI and Data Processing

NVIDIA also announced software enhancements across the Blackwell architecture, new switches, and BlueField-3 SuperNICs designed to expedite AI, data processing, cloud, and HPC (High-Performance Computing) tasks.

Accelerated Computing and Blackwell architecture

NVIDIA is aiming for a 4x gain in training performance and an even more significant 30x increase in inference performance when compared to the H100 at the cluster level, all while achieving 25x improved energy efficiency. Impressive!

Confidential Computing: Moreover, Blackwell introduces capabilities for confidential computing that ensure secure data processing on GPUs in trusted execution environments. More on this below.

Networking Capabilities and Platforms advancements:

The integration of Grace Blackwell Superchip (GB200) with Quantum-X800 InfiniBand and Spectrum-X800 Ethernet platforms underscores NVIDIA's strategy in creating comprehensive ecosystems. In the case of X800, NVIDIA enables networking speeds up to 800Gb/s. These new platforms are poised to revolutionize how data centers handle complex AI workloads, scientific simulations, and massive data-driven operations.

NVIDIA Quantum-X800 InfiniBand

Focuses on highest-performance AI-dedicated infrastructure with end-to-end 800Gb/s throughput. Optimized for massive-scale AI workloads.

NVIDIA Spectrum-X800 Ethernet

Focuses on optimizing AI performance and processes within data centers. It improves network performance, enabling faster AI workload processing and execution, while keeping the versatility of Ethernet in supporting diverse sets of application requirements.

NVIDIA NVLink (fifth gen)

Interconnect bandwidth is a critical ingredient for accelerator performance from a hardware perspective.

NVIDIA has expanded its bandwidth and scalability with Blackwell by introducing its new NVLink technology. At 1800GB/second per GPU, this is the biggest jump in NVLink bandwidth in the last several years.

The key enhancement with NVLink actually comes from the higher signaling rate of 200 Gbit/s per lane. So think NVLink 5.0: 200 Gbit/s (signaling rate) * 2 lanes/link = 400 Gbit/s (or 50 GB/s) in each direction. Counting both directions, this means 100 GB/s total bandwidth per link. With 18 links, NVIDIA achieves 18 * 100 GB/s = 1800 GB/s of bandwidth per chip!
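The arithmetic above can be checked in a few lines of Python. This is only a back-of-the-envelope sketch using the figures quoted in the text; the function name is made up for illustration, and the factor of two for send plus receive is an assumption about how the per-link "total" is counted.

```python
# Figures quoted in the text above.
SIGNALING_RATE_GBIT_S = 200   # signaling rate per lane, in Gbit/s
LANES_PER_LINK = 2            # "2 lanes/link"
LINKS_PER_GPU = 18
BITS_PER_BYTE = 8
# Assumption: the 100 GB/s "total" per link counts both directions.
DIRECTIONS = 2


def nvlink_bandwidth_gb_s(
    signaling_gbit_s=SIGNALING_RATE_GBIT_S,
    lanes=LANES_PER_LINK,
    directions=DIRECTIONS,
    links=LINKS_PER_GPU,
):
    """Return (per-direction, per-link, per-GPU) bandwidth in GB/s."""
    per_direction_gbit_s = signaling_gbit_s * lanes
    per_direction_gb_s = per_direction_gbit_s / BITS_PER_BYTE
    per_link_gb_s = per_direction_gb_s * directions
    per_gpu_gb_s = per_link_gb_s * links
    return per_direction_gb_s, per_link_gb_s, per_gpu_gb_s


per_direction, per_link, per_gpu = nvlink_bandwidth_gb_s()
print(f"per direction: {per_direction} GB/s")
print(f"per link:      {per_link} GB/s")
print(f"per GPU:       {per_gpu} GB/s")
```

Changing any one input (for example, the number of links) shows how directly the headline per-GPU figure depends on the per-lane signaling rate.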

Benefits of In-Network Computing:

The leap to in-network computing capabilities offered by the Quantum-X800 allows more processing tasks to be done within the network itself, reducing latency and improving performance. This also facilitates the development of more complex AI models and the processing of expansive data volumes by ensuring efficient, high-speed communication between multiple computing nodes.

AI security: Context Centric Security and Confidential Computing

We talk about secure, responsible and trustworthy AI at WWT on a daily basis. One area of the conference that didn't get as much attention, yet that I believe will become more important to our customers' security strategy, is the concept of Context Centric security. In addition, NVIDIA announced new Confidential Computing features and Hardware Attestation to address security gaps that exist in data protection.

During GTC, NVIDIA mentioned a potential for half a trillion dollars of cyber crime expected in CY2024. Here are some of the ways NVIDIA proposes to address these threats.

The Era of Context Centric Security

We are seeing a pivot in the industry. LLMs are now driving context centric security, but the industry as a whole has not figured this out. What I learned at GTC is that, as the security paradigm evolves, NVIDIA is taking steps to help create new strategies that address this shift.

Context centric security weighs three things: the context of the data, the context of the user, and the context in which the information is used at any given time. It gets complicated quickly because data changes: it is synthesized, it is summarized, and it ends up in a different form from the one where controls were originally applied.

Many organizations are still in an RBAC (role-based access control) world, but with GenAI, how do you know what your intended use is? How do you put more variables into access control decisions? The problem, as NVIDIA calls it out: systems must make these decisions with all the information available, and do it at runtime. NVIDIA is helping customers mitigate these new risks by looking at the problem across five tiers built on a new secure foundation.
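To make the contrast with RBAC concrete, here is a minimal Python sketch of a runtime decision that looks at the data, the user, and the intended use together, applied per sentence rather than per document. Everything in it is hypothetical and for illustration only: the labels, the `purpose_policy` mapping, and the function names are assumptions, not anything NVIDIA presented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sentence:
    text: str
    labels: frozenset  # hypothetical sensitivity labels attached to each sentence


@dataclass(frozen=True)
class RequestContext:
    role: str              # context of the user (what RBAC alone would check)
    clearances: frozenset  # labels this user is allowed to see
    purpose: str           # context of use, supplied at request time


def rbac_allows(ctx, allowed_roles):
    """Role-only check: one decision for the whole document."""
    return ctx.role in allowed_roles


def context_allows(sentence, ctx, purpose_policy):
    """Per-sentence check: every label must be cleared for the user
    AND permitted for the stated purpose."""
    permitted_for_purpose = purpose_policy.get(ctx.purpose, frozenset())
    return sentence.labels <= (ctx.clearances & permitted_for_purpose)


def filter_sentences(sentences, ctx, purpose_policy):
    """Return only the sentence texts this request may see."""
    return [s.text for s in sentences if context_allows(s, ctx, purpose_policy)]


# Hypothetical document, split into labeled sentences.
document = [
    Sentence("The product launched in March.", frozenset({"public"})),
    Sentence("Quarterly revenue rose 12%.", frozenset({"finance"})),
    Sentence("Two engineers are on a performance plan.", frozenset({"hr"})),
]
purpose_policy = {"quarterly_report": frozenset({"public", "finance"})}
ctx = RequestContext(
    role="analyst",
    clearances=frozenset({"public", "finance", "hr"}),
    purpose="quarterly_report",
)

document_visible = rbac_allows(ctx, allowed_roles={"analyst"})
visible = filter_sentences(document, ctx, purpose_policy)
print(document_visible)
print(visible)
```

The role check would release the whole document, while the context check withholds the HR sentence because it does not fit the stated purpose, even though this user is cleared for HR data. The hard part in practice is the one the talk pointed to: keeping such labels accurate once data has been summarized or synthesized.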

Confidential Computing

NVIDIA announced support for new Confidential Computing features and new platforms, including the HGX protected PCIe architecture. Confidential Computing enables a holistic data security model that aims to protect data while it is in use, so that it can be processed without being compromised.

Threats addressed by Confidential Computing:

Data and Code Confidentiality
Data and Code Integrity
Physical attacks with everyday tools (interposers on buses such as PCIe and DDR memory cannot leak data or code)

Confidential Computing is important because it secures your data while it's being processed. But equally important is enabling users to verify the state of the environment if and when they want to run a secure workload. This is where Attestation comes in.

Attestation

NVIDIA also announced general availability of a full suite of attestation software. Attestation ensures the integrity and trustworthiness of an environment. Healthcare organizations, which deal with sensitive data, would not want to transfer that data to a system without validating the state of the system. Example: declining to transfer EHRs to an unverified system.

Financial organizations that want to run an ML algorithm on a GPU want to validate the security state of the GPU before they run their workloads. Example: refusing to execute high-frequency trading algorithms on non-trustworthy GPUs.
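The "verify first, then send" pattern in both examples can be sketched in plain Python. This is a simplified illustration, not NVIDIA's attestation software: the component names, the `measure` helper, and the `send` callback are all hypothetical, and a real attestation flow would also verify a signature from a hardware root of trust over the reported measurements, which this sketch omits.

```python
import hashlib
import hmac


def measure(component_bytes: bytes) -> str:
    """Hypothetical stand-in for a measurement: a SHA-256 digest of a component."""
    return hashlib.sha256(component_bytes).hexdigest()


def environment_is_trusted(reported: dict, expected: dict) -> bool:
    """True only if every expected component is reported with a matching digest."""
    for name, want in expected.items():
        got = reported.get(name)
        # compare_digest does a constant-time comparison of the two hex strings
        if got is None or not hmac.compare_digest(got, want):
            return False
    return True


def transfer_if_trusted(records, reported, expected, send) -> bool:
    """Release the records via `send` only after the environment checks out."""
    if not environment_is_trusted(reported, expected):
        return False
    send(records)
    return True


# Hypothetical reference values the data owner trusts.
expected = {
    "gpu_firmware": measure(b"firmware-v1"),
    "driver": measure(b"driver-v1"),
}
good_report = dict(expected)
tampered_report = {**expected, "driver": measure(b"driver-modified")}

sent = []
ok = transfer_if_trusted(["ehr-001"], good_report, expected, sent.extend)
blocked = transfer_if_trusted(["ehr-002"], tampered_report, expected, sent.extend)
print(ok, blocked, sent)
```

The design point is that the data owner, not the remote system, holds the reference values and makes the decision, so nothing leaves until the check passes.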

Phillip Hendrickson, Technical Solutions Architect

Jensen Huang views AI as an industrial revolution. In the last industrial revolution, the discovery of electricity and the ability to reliably generate it transformed the world. Today, AI generates not electricity but content – text, video, audio, etc. – and will similarly transform the world, in his opinion.

NVIDIA is a company interested in accelerated computing, not GPUs. They make GPUs, sure, but they're about much more than just GPUs. Today, the picture that comes to Jensen's mind when he thinks about a "GPU" is a rack-scale system with many GPUs that look like a single large GPU, many CPUs, and a large, unified pool of memory.

NVIDIA understands the power and importance of ecosystem. Their practice is to understand how a particular domain works (healthcare, drug discovery, manufacturing, automotive, etc.) by building stuff specific to that domain. They then open it up and make it available to everyone. The importance of domain-specific foundation models & software was also very apparent in the conference sessions, which ranged from financial services to medicine to robotics to earthquake modeling.

Another thing NVIDIA understands is the importance of lowering barriers to entry. While they've supported and encouraged containerized workloads for a number of years, they're leveraging that support to create a repository of domain-specific NIMs (NVIDIA Inference Microservices), containers that contain a full-stack suite of software + model + APIs optimized for NVIDIA hardware. Employees across the company frequently and regularly use NIMs in their own work.

A consistent and pervasive theme running throughout the conference was the importance of systems-level thinking. The company understands the importance of matching compute, memory, networking, and storage performance. They also understand the importance of designing their hardware to match the intended algorithms/workloads. The importance of systems-level thinking also showed up in sessions related to workload design.

My quote from the conference is "The easy: building a RAG application. The complex: system design for a RAG application." To effectively leverage AI, people and organizations need to similarly think at the systems level, matching infrastructure to models to use cases.

It was clear from the conference that the AI landscape is rapidly evolving. New foundation models are being regularly released and the authors of the original transformer model paper generally expressed hope that neural networks would both get more efficient and evolve past transformers. Jensen also expressed a desire to move past tokens as the currency models generate.

It's unclear how the model landscape is going to evolve. With the Blackwell platform, NVIDIA is preparing to support larger and larger models, and some have expressed the opinion that models will improve by getting larger while training data gets relatively smaller. However, others have suggested that models perform better when they become smaller but are trained on relatively more data. Multi-modal models will probably drive the trend to larger models before efficiencies that make them smaller are discovered/developed.

Adding a couple of things about Blackwell:

The platform will initially be available only to CSPs (Azure, AWS, OCI, GCP). It'll likely be a couple of years before hardware becomes available for on-prem deployments.
The platform will come in a form factor that lets it be swapped with current DGX/HGX H100 hardware. However, it will also come in a form factor with built-in water cooling as the only option for heat dissipation. Everyone should be thinking about and doing capacity planning now for data center power and cooling if they're not already.

Bobby Baker, Business Development Manager

GTC was one of the largest and busiest conferences I have ever attended and was engaging from start to finish. Excellent booth placement, an incredible selection of sessions, and ample customer and partner meetups made for a well-rounded conference for WWT. Jensen Huang's keynote was remarkable as usual; the technical depth and dry humor combined to entertain the audience. Few, if any, can match Jensen's presence during the keynote. WWT's utilization of the offsite restaurant proved to be an ideal setting for several very productive meetings.

Additional highlights include:

Healthcare Keynote by Kimberly Powell – The AI-powered solutions, including an AI healthcare agent demo, were motivating, and the breadth of problems NVIDIA and partners are addressing with AI solutions was impressive.

Digital Twin Roundtable — Led by an NVIDIA moderator, several planned updates were discussed. Additionally, four use cases were discussed by federal contractors and agencies.

Pleasantly, each use case presented was very similar to the Unmanned Undersea Vehicle Omniverse Digital Twin demo that Jason Craig created and that was displayed in the WWT booth. There is significant digital twin activity in the federal segment, as evidenced by the 35+ attendees for this private session.

Mezcal, a San Jose restaurant where WWT hosted meetings and networking events