Headquartered in São Paulo, Brazil, iFood is a private company and the food-tech leader in Latin America, processing millions of orders monthly. iFood has stood out for its strategy of incorporating cutting-edge technology into its operations. With the support of AWS, iFood has developed a robust machine learning (ML) inference infrastructure, using services such as Amazon SageMaker to efficiently create and deploy ML models. This partnership has allowed iFood not only to optimize its internal processes, but also to offer innovative solutions to its delivery partners and restaurants.
iFood’s ML platform comprises a set of tools, processes, and workflows developed with the following objectives:
- Accelerate the development and training of AI/ML models, making them more reliable and reproducible
- Make sure that deploying these models to production is reliable, scalable, and traceable
- Facilitate the testing, monitoring, and evaluation of models in production in a transparent, accessible, and standardized manner
To achieve these objectives, iFood uses SageMaker, which simplifies the training and deployment of models. Additionally, the integration of SageMaker features in iFood’s infrastructure automates critical processes, such as generating training datasets, training models, deploying models to production, and continuously monitoring their performance.
In this post, we show how iFood uses SageMaker to revolutionize its ML operations. By harnessing the power of SageMaker, iFood streamlines the entire ML lifecycle, from model training to deployment. This integration not only simplifies complex processes but also automates critical tasks.
AI inference at iFood
iFood has harnessed the power of a robust AI/ML platform to elevate the customer experience across its diverse touchpoints. Using cutting-edge AI/ML capabilities, the company has developed a suite of transformative solutions to address a multitude of customer use cases:
- Personalized recommendations – At iFood, AI-powered recommendation models analyze a customer’s past order history, preferences, and contextual factors to suggest the most relevant restaurants and menu items. This personalized approach makes sure customers discover new cuisines and dishes tailored to their tastes, improving satisfaction and driving increased order volumes.
- Intelligent order tracking – iFood’s AI systems track orders in real time, predicting delivery times with a high degree of accuracy. By understanding factors like traffic patterns, restaurant preparation times, and courier locations, the AI can proactively notify customers of their order status and expected arrival, reducing uncertainty and anxiety during the delivery process.
- Automated customer service – To handle the thousands of daily customer inquiries, iFood has developed an AI-powered chatbot that can quickly resolve common issues and questions. This intelligent virtual agent understands natural language, accesses relevant data, and provides personalized responses, delivering fast and consistent support without overburdening the human customer service team.
- Grocery shopping assistance – Integrating advanced language models, iFood’s app allows customers to simply speak or type their recipe needs or grocery list, and the AI will automatically generate a detailed shopping list. This voice-enabled grocery planning feature saves customers time and effort, enhancing their overall shopping experience.
Through these diverse AI-powered initiatives, iFood is able to anticipate customer needs, streamline key processes, and deliver a consistently exceptional experience—further strengthening its position as the leading food-tech platform in Latin America.
Solution overview
The following diagram illustrates iFood’s legacy architecture, which had separate workflows for data science and engineering teams, creating challenges in efficiently deploying accurate, real-time machine learning models into production systems.
In the past, the data science and engineering teams at iFood operated independently. Data scientists would build models using notebooks, adjust weights, and publish them onto services. Engineering teams would then struggle to integrate these models into production systems. This disconnection between the two teams made it challenging to deploy accurate real-time ML models.
To overcome this challenge, iFood built an internal ML platform that helped bridge this gap. This platform has streamlined the workflow, providing a seamless experience for creating, training, and delivering models for inference. It gives data scientists a centralized environment where they can build, train, and deploy models in a way that fits the teams’ development workflow. Engineering teams, in turn, can consume these models and integrate them into applications for both online and offline use, enabling a more efficient and streamlined workflow.
By breaking down the barriers between data science and engineering, AWS AI platforms empowered iFood to use the full potential of their data and accelerate the development of AI applications. The automated deployment and scalable inference capabilities provided by SageMaker made sure that models were readily available to power intelligent applications and provide accurate predictions on demand. This centralization of ML services as a product has been a game changer for iFood, allowing them to focus on building high-performing models rather than the intricate details of inference.
One of the core capabilities of iFood’s ML platform is the ability to provide the infrastructure to serve predictions. Several use cases are supported by the inference made available through ML Go!, responsible for deploying SageMaker pipelines and endpoints. The former are used to schedule offline prediction jobs, and the latter are employed to create model services that are consumed by the application services. The following diagram illustrates iFood’s updated architecture, which incorporates an internal ML platform built to streamline workflows between data science and engineering teams, enabling efficient deployment of machine learning models into production systems.
Integrating model deployment into the service development process was a key initiative to enable data scientists and ML engineers to deploy and maintain those models. The ML platform empowers the building and evolution of ML systems. Several integrations with other important platforms, like the feature platform and data platform, were delivered to improve the experience for users as a whole. The process of consuming ML-based decisions was streamlined, but it doesn’t end there. iFood’s ML platform, ML Go!, is now focusing on new inference capabilities, supported by recent SageMaker features whose ideation and development the iFood team helped drive. The following diagram illustrates the final architecture of iFood’s ML platform, showcasing how model deployment is integrated into the service development process, the platform’s connections with feature and data platforms, and its focus on new inference capabilities.
One of the biggest changes is the creation of a single abstraction for connecting with SageMaker endpoints and jobs, called the ML Go! Gateway, along with a separation of concerns within the endpoints through the use of the inference components feature, making serving faster and more efficient. In this new inference structure, the endpoints are also managed by the ML Go! CI/CD, leaving the pipelines to deal only with model promotions rather than the infrastructure itself. This reduces both the lead time for changes and the change failure rate of deployments.
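ML Go! Gateway is an internal iFood abstraction, but the underlying SageMaker mechanism is public. As a hedged sketch of what attaching a model to a shared endpoint looks like with inference components, the following boto3 call is illustrative only; the endpoint, variant, and model names are hypothetical placeholders, not iFood’s actual resources.

```python
import boto3

sm = boto3.client("sagemaker")

# Attach a model to an existing shared endpoint as an inference component.
# All resource names below are hypothetical placeholders.
sm.create_inference_component(
    InferenceComponentName="ranking-model-ic",
    EndpointName="ml-go-shared-endpoint",
    VariantName="AllTraffic",
    Specification={
        "ModelName": "ranking-model-v3",
        "ComputeResourceRequirements": {
            # Each copy reserves its own slice of the endpoint's hardware,
            # so several models can share the same instances.
            "NumberOfAcceleratorDevicesRequired": 1,
            "MinMemoryRequiredInMb": 4096,
        },
    },
    RuntimeConfig={"CopyCount": 2},
)
```

Because the inference component, not the endpoint, owns the model, promotions can swap model versions without touching the serving infrastructure, which is the separation of concerns described above.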
Using SageMaker Inference model serving containers
One of the key features of modern ML platforms is the standardization of ML and AI services. By encapsulating models and dependencies as Docker containers, these platforms provide consistency and portability across different environments and stages of the ML lifecycle. With SageMaker, data scientists and developers can use pre-built Docker containers, making it straightforward to deploy and manage ML services. As a project progresses, they can spin up new instances and configure them according to their specific requirements. These containers provide a standardized and scalable environment for running ML workloads on SageMaker.
SageMaker provides a set of pre-built containers for popular ML frameworks and algorithms, such as TensorFlow, PyTorch, XGBoost, and many others. These containers are optimized for performance and include all the necessary dependencies and libraries pre-installed, making it straightforward to get started with your ML projects. In addition to the pre-built containers, SageMaker provides options to bring your own custom containers, which include your specific ML code, dependencies, and libraries. This can be particularly useful if you’re using a less common framework or have specific requirements that aren’t met by the pre-built containers.
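For example, the SageMaker Python SDK can look up the URI of an AWS-managed framework container; the region and version below are illustrative.

```python
from sagemaker import image_uris

# Resolve the AWS-managed XGBoost image for a given region and version.
container_uri = image_uris.retrieve(
    framework="xgboost",
    region="us-east-1",
    version="1.7-1",
)
print(container_uri)  # ECR URI of the pre-built container
```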
iFood focused heavily on using custom containers for the training and deployment of ML workloads, providing a consistent and reproducible environment for ML experiments and making it effortless to track and replicate results. The first step in this journey was to standardize the custom ML code, which is the piece that data scientists should focus on. Moving away from notebooks, and with BruceML, the way teams create the code to train and serve models changed: it is encapsulated from the start as container images. BruceML was responsible for creating the scaffolding required to seamlessly integrate with the SageMaker platform, allowing the teams to take advantage of its various features, such as hyperparameter tuning, model deployment, and monitoring. By standardizing ML services and using containerization, modern platforms democratize ML, enabling iFood to rapidly build, deploy, and scale intelligent applications.
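BruceML is internal to iFood, so the following is only a sketch of the general pattern it builds on: launching a SageMaker training job from a custom container image with the SageMaker Python SDK. The ECR image URI, IAM role, and S3 paths are hypothetical placeholders.

```python
from sagemaker.estimator import Estimator

# Train using a custom image pushed to Amazon ECR; all identifiers
# below are hypothetical placeholders, not iFood's actual resources.
estimator = Estimator(
    image_uri="<account-id>.dkr.ecr.us-east-1.amazonaws.com/custom-train:latest",
    role="arn:aws:iam::<account-id>:role/SageMakerExecutionRole",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://example-bucket/model-artifacts/",
)

# SageMaker mounts the "train" channel inside the container at
# /opt/ml/input/data/train.
estimator.fit({"train": "s3://example-bucket/datasets/train/"})
```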
Automating model deployment and ML system retraining
When running ML models in production, it’s critical to have a robust and automated process for deploying and recalibrating those models across different use cases. This helps make sure the models remain accurate and performant over time. The team at iFood understood this challenge well; deploying a model is only the beginning. To keep everything running well, they rely on another concept: ML pipelines.
Using Amazon SageMaker Pipelines, they were able to build a CI/CD system for ML that delivers automated retraining and model deployment. They also integrated this entire system with the company’s existing CI/CD pipeline, keeping it efficient and maintaining the good DevOps practices used at iFood. The flow starts with the ML Go! CI/CD pipeline pushing the latest code artifacts, which contain the model training and deployment logic. The training process uses different containers to implement the stages of the pipeline. When training is complete, the inference pipeline can be run to begin the model deployment. This can be an entirely new model, or the promotion of a new version to improve the performance of an existing one. Every model available for deployment is also secured and registered automatically by ML Go! in Amazon SageMaker Model Registry, providing versioning and tracking capabilities.
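As a hedged sketch of this pattern (not iFood’s actual pipeline), a minimal SageMaker pipeline can chain a training step to a Model Registry registration step; the image URIs, role, bucket, and model package group below are hypothetical placeholders.

```python
from sagemaker.estimator import Estimator
from sagemaker.model import Model
from sagemaker.workflow.model_step import ModelStep
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.workflow.steps import TrainingStep

session = PipelineSession()
role = "arn:aws:iam::<account-id>:role/SageMakerExecutionRole"

# Training step: runs the training container and emits model artifacts.
estimator = Estimator(
    image_uri="<account-id>.dkr.ecr.us-east-1.amazonaws.com/custom-train:latest",
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    sagemaker_session=session,
)
train_step = TrainingStep(
    name="TrainModel",
    step_args=estimator.fit({"train": "s3://example-bucket/datasets/train/"}),
)

# Registration step: versions the trained model in the Model Registry.
model = Model(
    image_uri="<account-id>.dkr.ecr.us-east-1.amazonaws.com/custom-serve:latest",
    model_data=train_step.properties.ModelArtifacts.S3ModelArtifacts,
    role=role,
    sagemaker_session=session,
)
register_step = ModelStep(
    name="RegisterModel",
    step_args=model.register(
        content_types=["application/json"],
        response_types=["application/json"],
        inference_instances=["ml.m5.large"],
        model_package_group_name="example-model-group",
    ),
)

pipeline = Pipeline(
    name="example-retrain-pipeline",
    steps=[train_step, register_step],
    sagemaker_session=session,
)
pipeline.upsert(role_arn=role)  # created or updated by CI/CD, then started
```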
The final step depends on the intended inference requirements. For batch prediction use cases, the pipeline creates a SageMaker batch transform job to run large-scale predictions. For real-time inference, the pipeline deploys the model to a SageMaker endpoint, carefully selecting the appropriate container variant and instance type to handle the expected production traffic and latency needs. This end-to-end automation has been a game changer for iFood, allowing them to rapidly iterate on their ML models and deploy updates and recalibrations quickly and confidently across their various use cases. SageMaker Pipelines has provided a streamlined way to orchestrate these complex workflows, making sure model operationalization is efficient and reliable.
Running inference in different SLA formats
iFood uses the inference capabilities of SageMaker to power its intelligent applications and deliver accurate predictions to its customers. By integrating the robust inference options available in SageMaker, iFood has been able to seamlessly deploy ML models and make them available for real-time and batch predictions. For iFood’s online, real-time prediction use cases, the company uses SageMaker hosted endpoints to deploy their models. These endpoints are integrated into iFood’s customer-facing applications, allowing for immediate inference on incoming data from users. SageMaker handles the scaling and management of these endpoints, making sure that iFood’s models are readily available to provide accurate predictions and enhance the user experience.
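From the application side, calling a hosted endpoint is a single runtime API call; the endpoint name and payload below are hypothetical placeholders.

```python
import json

import boto3

runtime = boto3.client("sagemaker-runtime")

# Request a real-time prediction; the endpoint name and payload
# shape are hypothetical placeholders.
response = runtime.invoke_endpoint(
    EndpointName="recommendations-endpoint",
    ContentType="application/json",
    Body=json.dumps({"customer_id": "123", "context": {"hour_of_day": 20}}),
)
prediction = json.loads(response["Body"].read())
print(prediction)
```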
In addition to real-time predictions, iFood also uses SageMaker batch transform to perform large-scale, asynchronous inference on datasets. This is particularly useful for iFood’s data preprocessing and batch prediction requirements, such as generating recommendations or insights for their restaurant partners. SageMaker batch transform jobs enable iFood to efficiently process vast amounts of data, further enhancing their data-driven decision-making.
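A batch transform job of this kind takes only a few lines with the SageMaker Python SDK; the model name and S3 locations below are hypothetical placeholders.

```python
from sagemaker.transformer import Transformer

# Run large-scale offline predictions against an already-created model.
# The model name and S3 paths are hypothetical placeholders.
transformer = Transformer(
    model_name="restaurant-insights-model",
    instance_count=2,
    instance_type="ml.m5.xlarge",
    output_path="s3://example-bucket/batch-output/",
)
transformer.transform(
    data="s3://example-bucket/batch-input/",
    content_type="application/json",
    split_type="Line",  # one JSON record per line
)
transformer.wait()  # results land in the S3 output path
```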
Building upon the success of standardization on SageMaker Inference, iFood has been instrumental in partnering with the SageMaker Inference team to build and enhance key AI inference capabilities within the SageMaker platform. Since the early days of its ML journey, iFood has provided the SageMaker Inference team with valuable inputs and expertise, enabling the introduction of several new features and optimizations:
- Cost and performance optimizations for generative AI inference – iFood helped the SageMaker Inference team develop innovative techniques to optimize the use of accelerators, enabling SageMaker Inference to reduce foundation model (FM) deployment costs by 50% on average and latency by 20% on average with inference components. This breakthrough delivers significant cost savings and performance improvements for customers running generative AI workloads on SageMaker.
- Scaling improvements for AI inference – iFood’s expertise in distributed systems and auto scaling has also helped the SageMaker team develop advanced capabilities to better handle the scaling requirements of generative AI models. These improvements reduce auto scaling times by up to 40% and make auto scaling detection six times faster, making sure that customers can rapidly scale their inference workloads on SageMaker to meet spikes in demand without compromising performance.
- Streamlined generative AI model deployment for inference – Recognizing the need for simplified model deployment, iFood collaborated with AWS to introduce the ability to deploy open source large language models (LLMs) and FMs with just a few clicks. This user-friendly functionality removes the complexity traditionally associated with deploying these advanced models, empowering more customers to harness the power of AI.
- Scale-to-zero for inference endpoints – iFood played a crucial role in collaborating with SageMaker Inference to develop and launch the scale-to-zero feature for SageMaker inference endpoints. This innovative capability allows inference endpoints to automatically shut down when not in use and rapidly spin up on demand when new requests arrive. This feature is particularly beneficial for dev/test environments, low-traffic applications, and use cases with intermittent inference demand, because it eliminates idle resource costs while maintaining the ability to quickly serve requests when needed (see the configuration sketch after this list). The scale-to-zero functionality represents a major advancement in cost-efficiency for AI inference, making it more accessible and economically viable for a wider range of use cases.
- Packaging AI model inference more efficiently – To further simplify the AI model lifecycle, iFood worked with AWS to enhance SageMaker’s capabilities for packaging LLMs and models for deployment. These improvements make it straightforward to prepare and deploy these AI models, accelerating their adoption and integration.
- Multi-model endpoints for GPU – iFood collaborated with the SageMaker Inference team to launch multi-model endpoints for GPU-based instances. This enhancement allows you to deploy multiple AI models on a single GPU-enabled endpoint, significantly improving resource utilization and cost-efficiency. By taking advantage of iFood’s expertise in GPU optimization and model serving, SageMaker now offers a solution that can dynamically load and unload models on GPUs, reducing infrastructure costs by up to 75% for customers with multiple models and varying traffic patterns.
- Asynchronous inference – Recognizing the need for handling long-running inference requests, the team at iFood worked closely with the SageMaker Inference team to develop and launch Asynchronous Inference in SageMaker. This feature enables you to process large payloads or time-consuming inference requests without the constraints of real-time API calls. iFood’s experience with large-scale distributed systems helped shape this solution, which now allows for better management of resource-intensive inference tasks, and the ability to handle inference requests that might take several minutes to complete. This capability has opened up new use cases for AI inference, particularly in industries dealing with complex data processing tasks such as genomics, video analysis, and financial modeling.
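As referenced in the scale-to-zero item above, the following is a hedged configuration sketch of the general setup: the endpoint config enables managed instance scaling with a minimum of zero instances, and Application Auto Scaling lets the inference component’s copy count drop to zero when idle. All resource names are hypothetical placeholders, and the scaling policy that drives copies back up as traffic arrives is omitted.

```python
import boto3

sm = boto3.client("sagemaker")
autoscaling = boto3.client("application-autoscaling")

# Endpoint config whose instance fleet may scale down to zero when idle.
# All names are hypothetical placeholders.
sm.create_endpoint_config(
    EndpointConfigName="scale-to-zero-config",
    ExecutionRoleArn="arn:aws:iam::<account-id>:role/SageMakerExecutionRole",
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "InstanceType": "ml.g5.xlarge",
            "InitialInstanceCount": 1,
            "ManagedInstanceScaling": {
                "Status": "ENABLED",
                "MinInstanceCount": 0,  # allow the fleet to reach zero
                "MaxInstanceCount": 2,
            },
            "RoutingConfig": {"RoutingStrategy": "LEAST_OUTSTANDING_REQUESTS"},
        }
    ],
)

# Let the inference component's copy count scale between 0 and 2.
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId="inference-component/llm-assistant-ic",
    ScalableDimension="sagemaker:inference-component:DesiredCopyCount",
    MinCapacity=0,
    MaxCapacity=2,
)
```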
By closely partnering with the SageMaker Inference team, iFood has played a pivotal role in driving the rapid evolution of AI inference and generative AI inference capabilities in SageMaker. The features and optimizations introduced through this collaboration are empowering AWS customers to unlock the transformative potential of inference with greater ease, cost-effectiveness, and performance.
“At iFood, we were at the forefront of adopting transformative machine learning and AI technologies, and our partnership with the SageMaker Inference product team has been instrumental in shaping the future of AI applications. Together, we’ve developed strategies to efficiently manage inference workloads, allowing us to run models with speed and price-performance. The lessons we’ve learned supported us in the creation of our internal platform, which can serve as a blueprint for other organizations looking to harness the power of AI inference. We believe the features we have built in collaboration will broadly help other enterprises who run inference workloads on SageMaker, unlocking new frontiers of innovation and business transformation, by solving recurring and important problems in the universe of machine learning engineering.”
– Daniel Vieira, ML Platform Manager at iFood
Conclusion
Using the capabilities of SageMaker, iFood transformed its approach to ML and AI, unleashing new possibilities for enhancing the customer experience. By building a robust and centralized ML platform, iFood has bridged the gap between its data science and engineering teams, streamlining the model lifecycle from development to deployment. The integration of SageMaker features has enabled iFood to deploy ML models for both real-time and batch-oriented use cases. For real-time, customer-facing applications, iFood uses SageMaker hosted endpoints to provide immediate predictions and enhance the user experience. Additionally, the company uses SageMaker batch transform to efficiently process large datasets and generate insights for its restaurant partners. This flexibility in inference options has been key to iFood’s ability to power a diverse range of intelligent applications.
The automation of deployment and retraining through ML Go!, supported by SageMaker Pipelines and SageMaker Inference, has been a game changer for iFood. This has enabled the company to rapidly iterate on its ML models, deploy updates with confidence, and maintain the ongoing performance and reliability of its intelligent applications. Moreover, iFood’s strategic partnership with the SageMaker Inference team has been instrumental in driving the evolution of AI inference capabilities within the platform. Through this collaboration, iFood has helped shape cost and performance optimizations, scaling improvements, and simplified model deployment features, all of which are now benefiting a wider range of AWS customers.
By taking advantage of the capabilities SageMaker offers, iFood has been able to unlock the transformative potential of AI and ML, delivering innovative solutions that enhance the customer experience and strengthen its position as the leading food-tech platform in Latin America. This journey serves as a testament to the power of cloud-based AI infrastructure and the value of strategic partnerships in driving technology-driven business transformation.
By following iFood’s example, you can unlock the full potential of SageMaker for your business, driving innovation and staying ahead in your industry.
About the Authors
Daniel Vieira is a seasoned Machine Learning Engineering Manager at iFood, with a strong academic background in computer science, holding both a bachelor’s and a master’s degree from the Federal University of Minas Gerais (UFMG). With over a decade of experience in software engineering and platform development, Daniel leads iFood’s ML platform, building a robust, scalable ecosystem that drives impactful ML solutions across the company. In his spare time, Daniel Vieira enjoys music, philosophy, and learning about new things while drinking a good cup of coffee.
Debora Fanin serves as a Senior Customer Solutions Manager at AWS for the Digital Native Business segment in Brazil. In this role, Debora manages customer transformations, creating cloud adoption strategies to support cost-effective, timely deployments. Her responsibilities include designing change management plans, guiding solution-focused decisions, and addressing potential risks to align with customer objectives. Debora’s academic path includes a Master’s degree in Administration at FEI and certifications such as AWS Solutions Architect Associate and Agile credentials. Her professional history spans IT and project management roles across diverse sectors, where she developed expertise in cloud technologies, data science, and customer relations.
Saurabh Trikande is a Senior Product Manager for Amazon Bedrock and Amazon SageMaker Inference. He is passionate about working with customers and partners, motivated by the goal of democratizing AI. He focuses on core challenges related to deploying complex AI applications, inference with multi-tenant models, cost optimizations, and making the deployment of generative AI models more accessible. In his spare time, Saurabh enjoys hiking, learning about innovative technologies, following TechCrunch, and spending time with his family.
Gopi Mudiyala is a Senior Technical Account Manager at AWS. He helps customers in the financial services industry with their operations in AWS. As a machine learning enthusiast, Gopi works to help customers succeed in their ML journey. In his spare time, he likes to play badminton, spend time with family, and travel.