Digital Event Horizon

Llama 3.1 models are now available in Amazon SageMaker JumpStart

Today, we are excited to announce that the state-of-the-art Llama 3.1 collection of multilingual large language models (LLMs), which includes pre-trained and instruction tuned generative AI models in 8B, 70B, and 405B sizes, is available through Amazon SageMaker JumpStart to deploy for inference. Llama is a publicly accessible LLM designed for developers, researchers, and businesses to build, experiment, and responsibly scale their generative artificial intelligence (AI) ideas. In this post, we walk through how to discover and deploy Llama 3.1 models using SageMaker JumpStart. The following table lists all the Llama models available in SageMaker JumpStart along with the model_ids, default instance types, and the maximum number of total tokens (sum of number of input tokens and number of generated tokens) supported for each of these models. For increased context length, customers can modify the default instance type in the SageMaker JumpStart UI. Model Name Model ID Default instance type Supported instance types Meta-Llama-3.1-8B meta-llama-3-1-8b ml.g5.4xlarge (2,000 context length ) ml.g5.4xlarge, ml.g5.12xlarge, ml.g5.24xlarge, ml.g5.48xlarge, ml.g5.4xlarge, ml.g5.8xlarge, ml.g6.12xlarge, ml.p4d.24xlarge, ml.p5.48xlarge Meta-Llama-3.1-8B-Instruct meta-llama-3-1-8b-instruct ml.g5.4xlarge (2,000 context length ) Same as Llama-3.1-8B Meta-Llama-3.1-70B meta-llama-3-1-70b ml.p4d.24xlarge (12,000 context length on 8 A100s) ml.g5.48xlarge, ml.g6.48xlarge, ml.p4d.24xlarge, ml.p5.48xlarge Meta-Llama-3.1-70B-Instruct meta-llama-3-1-70b-instruct ml.p4d.24xlarge (12,000 context length on 8 A100s) Same as Llama-3.1-70B Meta-Llama-3.1-405B meta-llama-3-1-405b ml.p5.48xlarge 2x ml.p5.48xlarge Meta-Llama-3.1-405B-Instruct meta-llama-3-1-405b-instruct ml.p5.48xlarge 2x ml.p5.48xlarge Meta-Llama-3.1-405B-FP8 meta-llama-3-1-405b-fp8 ml.p5.48xlarge (8,000 context length on 8 H100s) ml.p5.48xlarge Meta-Llama-3.1-405B-Instruct-FP8 meta-llama-3-1-405-instruct-fp8 ml.p5.48xlarge (8,000 context length on 8 H100s) ml.p5.48xlarge Inference and example prompts for Llama-3.1-405B-Instruct You can use Llama models for text completion for any piece of text. Through text generation, you can perform a variety of tasks, such as question answering, language translation, sentiment analysis, and more. Input payload to the endpoint looks like the following code: <|begin_of_text|><|start_header_id|>[ROLE_1]<|end_header_id|> [ROLE_1_MESSAGE]<|eot_id|><|begin_of_text|><|start_header_id|>[ROLE_2]<|end_header_id|> [ROLE_2_MESSAGE]<|eot_id|><|begin_of_text|><|start_header_id|>[ROLE_3]<|end_header_id|> ... The roles should alternate between user and assistant while optionally starting with a system role. In the next example, we show how to use Llama Instruct models within a conversational context, where a multi-turn chat is occurring between a user and an assistant. The first few rounds of the conversation are provided as input to the model: <|begin_of_text|><|start_header_id|>user<|end_header_id|> I am going to Paris, what should I see?<|eot_id|><|start_header_id|>assistant<|end_header_id|> Paris, the capital of France, is known for its stunning architecture, art museums, historical landmarks, and romantic atmosphere. Here are some of the top attractions to see in Paris: 1. The Eiffel Tower: The iconic Eiffel Tower is one of the most recognizable landmarks in the world and offers breathtaking views of the city. 2. The Louvre Museum: The Louvre is one of the world's largest and most famous museums, housing an impressive collection of art and artifacts, including the Mona Lisa. 3. Notre-Dame Cathedral: This beautiful cathedral is one of the most famous landmarks in Paris and is known for its Gothic architecture and stunning stained glass windows. These are just a few of the many attractions that Paris has to offer. With so much to see and do, it's no wonder that Paris is one of the most popular tourist destinations in the world.<|eot_id|><|start_header_id|>user<|end_header_id|> What is so great about #1?<|eot_id|><|start_header_id|>assistant<|end_header_id|> This produces the following response: The Eiffel Tower! It's an iconic symbol of Paris and one of the most recognizable landmarks in the world. Here are some reasons why it's so great: 1. **Engineering marvel**: When it was built for the 1889 World's Fair, the Eiffel Tower was a groundbreaking feat of engineering. It was the tallest structure in the world at the time, standing at 324 meters (1,063 feet). 2. **Breathtaking views**: The Eiffel Tower offers stunning views of the city from its observation decks. On a clear day, you can see for miles in every direction, taking in the beautiful Parisian landscape. 3. **Romantic atmosphere**: The Eiffel Tower is often associated with romance and love. It's a popular spot for proposals, weddings, and romantic getaways. 4. **Historical significance**: The Eiffel Tower has played a significant role in history, serving as a communication tower during World War I and a symbol of French resistance during World War II. 5. **Iconic design**: The Eiffel Tower's lattice-like design is instantly recognizable and has been imitated and parodied countless times in art, architecture, and popular culture. Llama Guard You can also use the Llama Guard model to help add guardrails for these models. Llama Guard provides input and output guardrails for LLM deployments. Llama Guard is a publicly available model that performs competitively on common open benchmarks and provides developers with a pre-trained model to help defend against generating potentially risky outputs. This model has been trained on a mix of publicly available datasets to enable detection of common types of potentially risky or violating content that may be relevant to a number of developer use cases. You can use Llama Guard as a supplemental tool for developers to integrate into their own mitigation strategies, such as for chatbots, content moderation, customer service, social media monitoring, and education. By passing user-generated content through Llama Guard before publishing or responding to it, developers can flag unsafe or inappropriate language and take action to maintain a safe and respectful environment. Llama Guard is available on SageMaker JumpStart. Conclusion In this post, we explored how SageMaker JumpStart empowers data scientists and ML engineers to discover, access, and run a wide range of pre-trained FMs for inference, including Meta’s most advanced and capable models to date. Llama 3.1 models are available today in SageMaker JumpStart initially in the US East (N. Virginia), US East (Ohio), and US West (Oregon) AWS Regions. Get started with SageMaker JumpStart and Llama 3.1 models today. Resources For additional resources, refer to the following: SageMaker JumpStart model catalog Reduce model deployment costs by 50% on average using the latest features of Amazon SageMaker Train, deploy, and evaluate pretrained models with SageMaker JumpStart About the Authors Saurabh Trikande is a Senior Product Manager for Amazon SageMaker Inference. He is passionate about working with customers and is motivated by the goal of democratizing machine learning. He focuses on core challenges related to deploying complex ML applications, multi-tenant ML models, cost optimizations, and making deployment of deep learning models more accessible. In his spare time, Saurabh enjoys hiking, learning about innovative technologies, following TechCrunch, and spending time with his family. James Park is a Solutions Architect at Amazon Web Services. He works with Amazon.com to design, build, and deploy technology solutions on AWS, and has a particular interest in AI and machine learning. In his spare time he enjoys seeking out new cultures, new experiences, and staying up to date with the latest technology trends.You can find him on LinkedIn. Dr. Kyle Ulrich is an Applied Scientist with the Amazon SageMaker built-in algorithms team. His research interests include scalable machine learning algorithms, computer vision, time series, Bayesian non-parametrics, and Gaussian processes. His PhD is from Duke University and he has published papers in NeurIPS, Cell, and Neuron. Jonathan Guinegagne is a Senior Software Engineer with Amazon SageMaker JumpStart at AWS. He got his master’s degree from Columbia University. His interests span machine learning, distributed systems, and cloud computing, as well as democratizing the use of AI. Jonathan is originally from France and now lives in Brooklyn, NY. Eissa Jamil is a Partner Engineer in AI at Meta, working with the Llama family of models. He’s been an engineer supporting Llama since helping to launch Llama 2 and has been working closely with AWS to build out support for Llama across their services like SageMaker JumpStart and Amazon Bedrock. Outside of work, he enjoys spending time with his family, building prototype projects, and weightlifting or competing in sports. Helen Suk is a Product Marketing Manager for AI at Meta, where she oversees the Llama family growth, building on her legacy as a founding team member who launched Llama 2 in 2023. Previously, she led marketing efforts for PyTorch, where she developed a passion for open-source innovation and AI. Outside of work, Helen enjoys serving up tennis aces and exploring the Pacific Northwest with her family.

Published: 2024-07-23T16:16:39

Today's AI/ML headlines are brought to you by ThreatPerspective

Llama 3.1 models are now available in Amazon SageMaker JumpStart