Today's AI/ML headlines are brought to you by ThreatPerspective

Digital Event Horizon

NVIDIA NIM Microservices Accelerate Generative AI Inference on AWS with Enhanced Security and Performance




Amazon Web Services (AWS) has announced that it is expanding its collaboration with NVIDIA by making NVIDIA NIM microservices available across key AWS AI services. This move aims to accelerate generative AI applications by providing developers with a broad catalog of NVIDIA-optimized inference for commonly used models at scale, while maintaining security and control over their generative AI applications and data. With this expansion, developers and enterprises can achieve high-performance AI with reduced latency and lower costs.



  • NVIDIA NIM microservices are now available across key AWS AI services, including Amazon Elastic Compute Cloud (EC2), Amazon Elastic Kubernetes Service (EKS), and Amazon SageMaker.
  • The expanded availability provides over 100 prebuilt containers built from commonly used models and model families, including Meta’s Llama 3 and NVIDIA’s Nemotron.
  • NVIDIA NIM microservices can be deployed on AWS services for self-hosting or optimized to run on NVIDIA accelerated computing instances on AWS.
  • The expansion aims to accelerate generative AI applications by providing developers with a broad catalog of NVIDIA-optimized inference for commonly used models at scale.
  • NVIDIA NIM microservices can be accessed directly from the AWS Marketplace, Amazon Bedrock Marketplace, and Amazon SageMaker JumpStart.



    Generative AI has become a transformative force across various industries, driving demand for secure, high-performance inference solutions that scale increasingly complex models efficiently and cost-effectively. To address this growing need, Amazon Web Services (AWS) has extended its collaboration with NVIDIA by expanding the availability of NVIDIA NIM microservices across key AWS AI services.

    NVIDIA NIM, a part of the NVIDIA AI Enterprise software platform available in the AWS Marketplace, provides developers with a set of easy-to-use microservices designed for secure, reliable deployment of high-performance, enterprise-grade AI model inference across clouds, data centers and workstations. These prebuilt containers are built on robust inference engines, such as NVIDIA Triton Inference Server, NVIDIA TensorRT, NVIDIA TensorRT-LLM and PyTorch, and support a broad spectrum of AI models — from open-source community ones to NVIDIA AI Foundation models and custom ones.
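    Once deployed, a NIM container exposes an OpenAI-compatible HTTP API. The sketch below shows how a client might build and send a chat-completions request to a self-hosted NIM endpoint; the endpoint URL, port, and model name are illustrative assumptions, not values from this article.

```python
import json
import urllib.request

# Assumed local endpoint: NIM containers conventionally serve an
# OpenAI-compatible API on port 8000 when self-hosted.
NIM_URL = "http://localhost:8000/v1/chat/completions"


def build_chat_request(model: str, prompt: str, max_tokens: int = 128) -> dict:
    """Build an OpenAI-style chat-completions payload for a NIM endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def query_nim(payload: dict, url: str = NIM_URL) -> str:
    """POST the payload to the NIM microservice and return the reply text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Build (but do not send) a sample request; the model ID is hypothetical.
    payload = build_chat_request("meta/llama3-8b-instruct", "What is NIM?")
    print(json.dumps(payload, indent=2))
```

    Because the API shape follows the OpenAI convention, existing client code can often be pointed at a NIM endpoint by changing only the base URL and model name.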

    The expanded availability of NVIDIA NIM microservices across key AWS AI services includes Amazon Elastic Compute Cloud (EC2), Amazon Elastic Kubernetes Service (EKS) and Amazon SageMaker. Developers can now preview over 100 NIM microservices built from commonly used models and model families, including Meta’s Llama 3, Mistral AI’s Mistral and Mixtral, NVIDIA’s Nemotron, Stability AI’s SDXL and many more on the NVIDIA API catalog.

    The most commonly used microservices are available for self-hosting on AWS services and are optimized to run on NVIDIA-accelerated computing instances. This expansion gives developers a broad catalog of NVIDIA-optimized inference for commonly used models at scale, while maintaining security and control over their generative AI applications and data.
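    In practice, the self-hosting path typically means pulling a NIM container from NVIDIA's registry and running it on a GPU instance, such as an NVIDIA-accelerated EC2 instance. The helper below assembles such a `docker run` invocation; the image path and credential name follow NVIDIA's documented conventions but are assumptions here, not details from this article.

```python
# Hypothetical NIM container image; actual images live on nvcr.io and
# require an NGC API key to pull.
NIM_IMAGE = "nvcr.io/nim/meta/llama3-8b-instruct:latest"


def docker_run_args(image: str, port: int = 8000) -> list:
    """Assemble a docker invocation for self-hosting a NIM container
    on a GPU host."""
    return [
        "docker", "run", "--rm",
        "--gpus", "all",      # expose all GPUs to the container
        "-e", "NGC_API_KEY",  # forward the NGC credential from the host env
        "-p", f"{port}:8000", # NIM serves its API on port 8000 internally
        image,
    ]


if __name__ == "__main__":
    # Print the command rather than executing it, since running it
    # requires a GPU host and NGC credentials.
    print(" ".join(docker_run_args(NIM_IMAGE)))
```

    Mapping a different host port (for example `docker_run_args(NIM_IMAGE, port=9000)`) lets several NIM containers run side by side on one instance.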

    NVIDIA NIM microservices, now available directly from AWS Marketplace as well as Amazon Bedrock Marketplace and Amazon SageMaker JumpStart, let developers deploy NVIDIA-optimized inference containers on AWS according to their unique needs and requirements, achieving high-performance AI across a range of AWS services.

    NVIDIA NIM microservices have been integrated into several use cases such as SoftServe’s Generative AI solutions fully deployed on AWS and accelerated by NVIDIA NIM and AWS services. The solutions, available on AWS Marketplace, include SoftServe Gen AI Drug Discovery, SoftServe Gen AI Industrial Assistant, Digital Concierge, Multimodal RAG System, Content Creator and Speech Recognition Platform.

    These solutions are based on NVIDIA AI Blueprints, comprehensive reference workflows that accelerate AI application development and deployment, featuring NVIDIA acceleration libraries, software development kits and NIM microservices for AI agents, digital twins and more. By leveraging NVIDIA NIM microservices, developers can achieve high-performance generative AI with reduced latency and lower cost.

    The expanded availability of NVIDIA NIM microservices on AWS further solidifies the partnership between NVIDIA and AWS, enabling a seamless integration of these powerful tools to support the growing demand for secure, high-performance inference solutions in the rapidly evolving field of generative AI.



    Related Information:

  • https://blogs.nvidia.com/blog/nim-microservices-aws-inference/


  • Published: Wed Dec 4 19:24:56 2024 by llama3.2 3B Q4_K_M
    © Digital Event Horizon . All rights reserved.
