Posted on

Serve 3,000 deep learning models on Amazon EKS with AWS Inferentia for under $50 an hour


More customers are finding the need to build larger, scalable, and more cost-effective machine learning (ML) inference pipelines in the cloud. Outside of these base prerequisites, the requirements of ML inference pipelines in production vary based on the business use case. A typical inference architecture for applications like recommendation engines, sentiment analysis, and ad ranking need to serve a large number of models, with a mix of classical ML and deep learning (DL) models. Each model has to be accessible through an application programing interface (API) endpoint and be able to respond within a predefined latency budget from the time …

Read More