Digital Event Horizon
Researchers at Hugging Face have benchmarked the performance of two representative agentic AI workload components - text embedding and text generation - on Google Cloud Compute Engine CPU instances, demonstrating significant performance gains from Intel's Advanced Matrix Extensions (AMX). The study highlights the potential of deploying lightweight agentic AI solutions wholly on CPUs, offering promising prospects for enhancing productivity and operations across industries.
Key findings:
- Researchers from Hugging Face benchmarked the performance of text embedding and text generation on Google Cloud Compute Engine CPU instances.
- The study compared two instances, C4 (5th gen Intel Xeon processors) and N2 (3rd gen Intel Xeon processors), demonstrating the benefits of AMX.
- C4 showed significant performance gains in text embedding (10x to 24x higher throughput) and text generation (13x faster).
- C4 held a 7x ~ 19x TCO advantage over N2 for text embedding and a 1.7x ~ 2.9x TCO advantage for text generation.
- Agentic AI systems have the potential to autonomously solve complex problems, going far beyond chatbots.
In a study published recently, researchers from Hugging Face benchmarked the performance of two representative agentic AI workload components - text embedding and text generation - on Google Cloud Compute Engine CPU instances. The findings have significant implications for deploying lightweight agentic AI solutions wholly on CPUs, thereby avoiding host-accelerator traffic overheads.
The researchers used two Google Cloud Compute Engine CPU instances - N2 and C4 - to compare text embedding and text generation performance. The C4 instance is powered by 5th generation Intel Xeon processors (code-named Emerald Rapids), which integrate Intel AMX to boost AI performance, while the N2 instance is powered by 3rd generation Intel Xeon processors (code-named Ice Lake), which predate AMX. This pairing let the researchers isolate the benefits of AMX.
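As a quick sanity check (not part of the original study), one can verify whether a given instance exposes AMX by inspecting the CPU feature flags; on Linux, AMX-capable Xeons report amx_tile, amx_int8, and amx_bf16 in /proc/cpuinfo. A minimal sketch:

```python
# Minimal sketch (not from the study): detect Intel AMX on Linux by reading
# the CPU feature flags in /proc/cpuinfo. AMX-capable Xeons (4th gen and
# later, e.g. C4's Emerald Rapids) expose amx_tile, amx_int8 and amx_bf16.
AMX_FLAGS = {"amx_tile", "amx_int8", "amx_bf16"}

def has_amx() -> bool:
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                return AMX_FLAGS.issubset(line.split(":", 1)[1].split())
    return False

if __name__ == "__main__":
    print("AMX available:", has_amx())
```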
The study employed optimum-benchmark, Hugging Face's unified benchmarking library for multiple backends and devices, which measures performance across architectures. The optimum-intel backend was used, leveraging Intel's acceleration libraries to speed up end-to-end pipelines on Intel architectures (CPU, GPU).
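For illustration, a benchmark run via optimum-benchmark's Python API might look like the sketch below. This is a minimal sketch based on the library's documented API (Benchmark, BenchmarkConfig, ProcessConfig, InferenceConfig, PyTorchConfig); the model name and settings are placeholders, not the study's exact configuration.

```python
# Hedged sketch of an optimum-benchmark run on CPU. Class names follow the
# library's documented Python API; the model and settings are placeholders,
# not the study's exact configuration.
from optimum_benchmark import (
    Benchmark,
    BenchmarkConfig,
    InferenceConfig,
    ProcessConfig,
    PyTorchConfig,
)

if __name__ == "__main__":
    config = BenchmarkConfig(
        name="text-generation-cpu",
        launcher=ProcessConfig(),                          # run in an isolated process
        scenario=InferenceConfig(latency=True, memory=True),
        backend=PyTorchConfig(model="gpt2", device="cpu"),  # placeholder model
    )
    report = Benchmark.launch(config)
    print(report)
```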
In the text embedding benchmark, the GCP C4 instance delivered approximately 10x to 24x higher throughput than N2, a significant gain attributable to AMX. C4 was likewise consistently faster in the text generation benchmark, sustaining roughly 13x the throughput of N2 across batch sizes of 1 to 16 without compromising latency.
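To make the two workloads concrete, the sketch below runs a bfloat16 embedding pass and a short generation on CPU with plain transformers; on AMX-capable Xeons, PyTorch routes bf16 matrix multiplications through oneDNN, which uses the AMX tile units. The model names are illustrative, not the ones used in the study.

```python
# Illustrative sketch of the two benchmarked workloads on CPU (model names are
# placeholders, not the study's). In bfloat16, PyTorch dispatches matmuls to
# oneDNN, which uses AMX tiles on 4th/5th gen Xeons.
import torch
from transformers import AutoModel, AutoModelForCausalLM, AutoTokenizer

# Text embedding: encode a sentence and pool the [CLS] token.
emb_tok = AutoTokenizer.from_pretrained("BAAI/bge-base-en-v1.5")
emb_model = AutoModel.from_pretrained(
    "BAAI/bge-base-en-v1.5", torch_dtype=torch.bfloat16
).eval()
batch = emb_tok(["Agentic AI on CPUs"], return_tensors="pt")
with torch.no_grad():
    cls = emb_model(**batch).last_hidden_state[:, 0]
embedding = torch.nn.functional.normalize(cls, dim=-1)

# Text generation: greedy-decode a short continuation.
gen_tok = AutoTokenizer.from_pretrained("gpt2")
gen_model = AutoModelForCausalLM.from_pretrained(
    "gpt2", torch_dtype=torch.bfloat16
).eval()
prompt = gen_tok("CPUs can serve lightweight agents because", return_tensors="pt")
with torch.no_grad():
    out = gen_model.generate(**prompt, max_new_tokens=32)
print(gen_tok.decode(out[0], skip_special_tokens=True))
```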
C4's hourly price is about 1.3x that of N2, so C4 retains a 7x ~ 19x Total Cost of Ownership (TCO) advantage over N2 in text embedding and a 1.7x ~ 2.9x TCO advantage in text generation. The researchers concluded that these findings demonstrate the viability of deploying lightweight agentic AI solutions wholly on CPUs, with promising prospects for enhancing productivity and operations across industries.
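The embedding TCO figures follow directly from dividing the throughput gain by the price premium: 10x/1.3 ≈ 7.7x and 24x/1.3 ≈ 18.5x, roughly matching the reported 7x ~ 19x range. A one-liner reproduces the arithmetic:

```python
# Worked arithmetic behind the TCO claim: performance-per-dollar advantage
# equals the throughput gain divided by the hourly price premium (~1.3x).
price_premium = 1.3  # C4 hourly price relative to N2
for label, speedup in [("embedding low", 10), ("embedding high", 24)]:
    print(f"{label}: {speedup / price_premium:.1f}x TCO advantage")
# -> roughly 7.7x and 18.5x, matching the reported 7x ~ 19x range
```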
Moreover, this study aligns with the view that the next frontier of artificial intelligence lies in agentic AI, which combines LLMs' sophisticated reasoning and iterative planning capabilities with strong contextual understanding. Agentic AI systems can autonomously solve complex multi-step problems and take actions directly through tool calling, going far beyond chatbots.
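As a purely illustrative sketch (nothing here comes from the study), the core of such a system is a loop in which the model either emits a final answer or requests a tool call whose result is fed back into the context:

```python
# Purely illustrative agent loop (not from the study): the model either answers
# or requests a tool; tool results are appended to the context and the loop
# repeats. call_llm and the tool registry are hypothetical placeholders.
import json

TOOLS = {"add": lambda a, b: a + b}  # hypothetical tool registry

def call_llm(messages):
    """Hypothetical placeholder for an LLM call that returns either
    {'answer': str} or {'tool': name, 'args': {...}}."""
    raise NotImplementedError

def run_agent(task: str, max_steps: int = 5):
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        step = call_llm(messages)
        if "answer" in step:
            return step["answer"]
        result = TOOLS[step["tool"]](**step["args"])  # take an action
        messages.append({"role": "tool", "content": json.dumps(result)})
    return "step budget exhausted"
```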
Building on these results, the researchers plan to explore deploying lightweight agentic AI solutions wholly on CPUs once Google Cloud Compute Engine makes its new Granite Rapids instances available. The study is a strong example of how benchmarking research can drive innovation and advancement in AI technology.
Related Information:
https://huggingface.co/blog/intel-gcp-c4
Published: Tue Dec 17 02:48:14 2024 by llama3.2 3B Q4_K_M