
    Introducing auto scaling on Amazon SageMaker HyperPod

    By admin | August 29, 2025


    Today, we’re excited to announce that Amazon SageMaker HyperPod now supports managed node automatic scaling with Karpenter, so you can efficiently scale your SageMaker HyperPod clusters to meet your inference and training demands. Real-time inference workloads require automatic scaling to address unpredictable traffic patterns and maintain service level agreements (SLAs). As demand spikes, organizations must rapidly adapt their GPU compute without compromising response times or cost-efficiency. Unlike self-managed Karpenter deployments, this service-managed solution alleviates the operational overhead of installing, configuring, and maintaining Karpenter controllers, while providing tighter integration with the resilience capabilities of SageMaker HyperPod. This managed approach supports scale to zero, reducing the need for dedicated compute resources to run the Karpenter controller itself, improving cost-efficiency.

    SageMaker HyperPod offers a resilient, high-performance infrastructure, observability, and tooling optimized for large-scale model training and deployment. Companies like Perplexity, HippocraticAI, H.AI, and Articul8 are already using SageMaker HyperPod for training and deploying models. As more customers transition from training foundation models (FMs) to running inference at scale, they require the ability to automatically scale their GPU nodes to handle real production traffic by scaling up during high demand and scaling down during periods of lower utilization. This capability necessitates a powerful cluster auto scaler. Karpenter, an open source Kubernetes node lifecycle manager created by AWS, is a popular choice among Kubernetes users for cluster auto scaling due to its powerful capabilities that optimize scaling times and reduce costs.

    This launch provides a managed Karpenter-based solution for automatic scaling that is installed and maintained by SageMaker HyperPod, removing the undifferentiated heavy lifting of setup and management from customers. The feature is available for SageMaker HyperPod EKS clusters, and you can enable auto scaling to transform your SageMaker HyperPod cluster from static capacity to a dynamic, cost-optimized infrastructure that scales with demand. This combines Karpenter’s proven node lifecycle management with the purpose-built and resilient infrastructure of SageMaker HyperPod, designed for large-scale machine learning (ML) workloads. In this post, we dive into the benefits of Karpenter, and provide details on enabling and configuring Karpenter in your SageMaker HyperPod EKS clusters.

    New features and benefits

    Karpenter-based auto scaling in your SageMaker HyperPod clusters provides the following capabilities:

    • Service-managed lifecycle – SageMaker HyperPod handles Karpenter installation, updates, and maintenance, alleviating operational overhead
    • Just-in-time provisioning – Karpenter observes your pending pods and provisions the required compute for your workloads from an on-demand pool
    • Scale to zero – You can scale down to zero nodes without maintaining dedicated controller infrastructure
    • Workload-aware node selection – Karpenter chooses optimal instance types based on pod requirements, Availability Zones, and pricing to minimize costs
    • Automatic node consolidation – Karpenter regularly evaluates clusters for optimization opportunities, shifting workloads to avoid underutilized nodes
    • Integrated resilience – Karpenter uses the built-in fault tolerance and node recovery mechanisms of SageMaker HyperPod

    These capabilities are built on top of the recently launched continuous provisioning capability, which enables SageMaker HyperPod to automatically provision remaining capacity in the background while workloads start immediately on available instances. When node provisioning encounters failures due to capacity constraints or other issues, SageMaker HyperPod automatically retries in the background until clusters reach their desired scale, so your auto scaling operations remain resilient and non-blocking.
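    The retry behavior described above can be illustrated with a minimal sketch. All names and backoff values here are illustrative assumptions, not the actual HyperPod internals:

```python
import time

def provision_with_retry(request_nodes, desired, max_attempts=5, base_delay=1.0):
    """Illustrative sketch of resilient, non-blocking provisioning:
    keep retrying in the background until the desired scale is reached."""
    provisioned = 0
    for attempt in range(max_attempts):
        # request_nodes returns how many instances it managed to acquire
        provisioned += request_nodes(desired - provisioned)
        if provisioned >= desired:
            return provisioned
        # exponential backoff between retries (illustrative values)
        time.sleep(base_delay * (2 ** attempt))
    return provisioned

# Simulated capacity pool: the first call gets nothing (capacity
# constraint), later calls succeed as capacity frees up
pool = iter([0, 2, 3])
result = provision_with_retry(lambda n: min(n, next(pool)), desired=5, base_delay=0.0)
print(result)  # 5
```

    The key property is that a transient capacity failure does not abort the scale-out; it simply defers part of it to a later attempt.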

    Solution overview

    The following diagram illustrates the solution architecture.

    Karpenter works as a controller in the cluster and operates in the following steps:

    • Watching – Karpenter watches for un-schedulable pods in the cluster through the Kubernetes API server. These could be pods that go into pending state when deployed or automatically scaled to increase the replica count.
    • Evaluating – When Karpenter finds such pods, it computes the shape and size of a NodeClaim to fit the pods' requirements (GPU, CPU, memory) and topology constraints, and checks whether it can pair them with an existing NodePool. For each NodePool, it queries the SageMaker HyperPod APIs to get the instance types the NodePool supports, then uses the instance type metadata (hardware capabilities, zone, capacity type) to find a matching NodePool.
    • Provisioning – If Karpenter finds a matching NodePool, it creates a NodeClaim and tries to provision a new instance to be used as the new node. Karpenter internally uses the sagemaker:UpdateCluster API to increase the capacity of the selected instance group.
    • Disrupting – Karpenter periodically checks if a new node is needed or not. If it’s not needed, Karpenter deletes it, which internally translates to a delete node request to the SageMaker HyperPod cluster.
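    The watch/evaluate/provision loop above can be sketched as follows. This is a simplified illustration: the names, data shapes, and matching logic are assumptions for exposition, not the actual Karpenter implementation (which also performs bin-packing and cost optimization):

```python
def fits(pod, instance):
    """Check whether a pod's resource requests fit an instance type."""
    return (pod["gpu"] <= instance["gpu"]
            and pod["cpu"] <= instance["cpu"]
            and pod["memory_gib"] <= instance["memory_gib"])

def evaluate(pending_pods, nodepools):
    """For each pending pod, find the first NodePool with an instance
    type that satisfies the pod's requirements (no bin-packing here)."""
    claims = []
    for pod in pending_pods:
        for pool in nodepools:
            match = next((i for i in pool["instance_types"] if fits(pod, i)), None)
            if match:
                claims.append({"pod": pod["name"], "pool": pool["name"],
                               "instance_type": match["name"]})
                break
    return claims

# Illustrative NodePool backed by ml.g6.xlarge (1 GPU, 4 vCPU, 16 GiB)
nodepools = [{"name": "gpunodepool",
              "instance_types": [{"name": "ml.g6.xlarge",
                                  "gpu": 1, "cpu": 4, "memory_gib": 16}]}]
pending = [{"name": "infer-0", "gpu": 1, "cpu": 2, "memory_gib": 8}]
print(evaluate(pending, nodepools))
```

    A matched claim leads to the provisioning step; a pod that fits no NodePool stays pending.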

    Prerequisites

    Verify you have the required quotas for the instances you will create in the SageMaker HyperPod cluster. To review your quotas, on the Service Quotas console, choose AWS services in the navigation pane, then choose SageMaker. For example, the following screenshot shows the available quota for g5.12xlarge instances (three).

    To update the cluster, you must first create AWS Identity and Access Management (IAM) permissions for Karpenter. For instructions, see Create an IAM role for HyperPod autoscaling with Karpenter.

    Create and configure a SageMaker HyperPod cluster

    To begin, launch and configure your SageMaker HyperPod EKS cluster and verify that continuous provisioning mode is enabled on cluster creation. Complete the following steps:

    1. On the SageMaker AI console, choose HyperPod clusters in the navigation pane.
    2. Choose Create HyperPod cluster and Orchestrated on Amazon EKS.
    3. For Setup options, select Custom setup.
    4. For Name, enter a name.
    5. For Instance recovery, select Automatic.
    6. For Instance provisioning mode, select Use continuous provisioning.
    7. Choose Submit.

    This setup creates the necessary configuration such as virtual private cloud (VPC), subnets, security groups, and EKS cluster, and installs operators in the cluster. You can also provide existing resources such as an EKS cluster if you want to use an existing cluster instead of creating a new one. This setup will take around 20 minutes.

    Verify that each InstanceGroup is limited to one Availability Zone by opting for OverrideVpcConfig and selecting only one subnet per InstanceGroup.

    After you create the cluster, you must update it to enable Karpenter. You can do this using Boto3 or the AWS Command Line Interface (AWS CLI) using the UpdateCluster API command (after configuring the AWS CLI to connect to your AWS account).

    The following code uses Python Boto3:

    import boto3

    client = boto3.client('sagemaker')
    response = client.update_cluster(
        ClusterName='<your-cluster-name>',      # replace with your cluster name
        AutoScaling={'Mode': 'Enable', 'AutoScalerType': 'Karpenter'},
        ClusterRole='<your-cluster-role-arn>',  # IAM role ARN from the prerequisites
    )

    The following code uses the AWS CLI:

    aws sagemaker update-cluster \
        --cluster-name <your-cluster-name> \
        --auto-scaling '{"Mode": "Enable", "AutoScalerType": "Karpenter"}' \
        --cluster-role <your-cluster-role-arn>

    After you run this command and update the cluster, you can verify that Karpenter has been enabled by running the DescribeCluster API.

    The following code uses Python:

    import boto3

    client = boto3.client('sagemaker')
    print(client.describe_cluster(ClusterName='<your-cluster-name>').get('AutoScaling'))

    The following code uses the AWS CLI:

    aws sagemaker describe-cluster --cluster-name <your-cluster-name> --query AutoScaling

    The following code shows our output:

    {'Mode': 'Enable',
     'AutoScalerType': 'Karpenter',
     'Status': 'Enabled'}

    Now you have a working cluster. The next step is to set up some custom resources in your cluster for Karpenter.

    Create HyperpodNodeClass

    HyperpodNodeClass is a custom resource that maps to pre-created instance groups in SageMaker HyperPod, defining constraints around which instance types and Availability Zones are supported for Karpenter's auto scaling decisions. To use HyperpodNodeClass, specify the names of the InstanceGroups in your SageMaker HyperPod cluster that Karpenter should draw compute from when scaling up nodes for your NodePools.

    The HyperpodNodeClass name that you use here is carried over to the NodePool in the next section where you reference it. This tells the NodePool which HyperpodNodeClass to draw resources from. To create a HyperpodNodeClass, complete the following steps:

    1. Create a YAML file (for example, nodeclass.yaml) similar to the following code. Add InstanceGroup names that you used at the time of the SageMaker HyperPod cluster creation. You can also add new instance groups to an existing SageMaker HyperPod EKS cluster.
    2. Reference the HyperpodNodeClass name in your NodePool configuration.

    The following is a sample HyperpodNodeClass that uses ml.g6.xlarge and ml.g6.4xlarge instance types:

    apiVersion: karpenter.sagemaker.amazonaws.com/v1
    kind: HyperpodNodeClass
    metadata:
      name: multiazg6
    spec:
      instanceGroups:
        # Name of an InstanceGroup in the HyperPod cluster. The InstanceGroup
        # must be pre-created before this step can be completed.
        # MaxItems: 10
        - auto-g6-az1
        - auto-g6-4xaz2

    3. Apply the configuration to your EKS cluster using kubectl:
    kubectl apply -f nodeclass.yaml

    4. Monitor the HyperpodNodeClass status and verify that the Ready condition in the status is set to True to ensure it was successfully created:
    kubectl get hyperpodnodeclass multiazg6 -oyaml

    The SageMaker HyperPod cluster must have AutoScaling enabled and the AutoScaling status must change to InService before the HyperpodNodeClass can be applied.

    For more information and key considerations, see Autoscaling on SageMaker HyperPod EKS.

    Create NodePool

    The NodePool sets constraints on the nodes that can be created by Karpenter and the pods that can run on those nodes. The NodePool can be set to perform various actions, such as:

    • Define labels and taints to limit the pods that can run on nodes Karpenter creates
    • Limit node creation to certain zones, instance types, CPU architectures, and so on

    For more information about NodePool, refer to NodePools. SageMaker HyperPod managed Karpenter supports a limited set of well-known Kubernetes and Karpenter requirements, which we explain in this post.

    To create a NodePool, complete the following steps:

    1. Create a YAML file named nodepool.yaml with your desired NodePool configuration.

    The following sample configuration creates a NodePool restricted to the ml.g6.xlarge SageMaker instance type in a single Availability Zone. Refer to NodePools for more customizations.

    apiVersion: karpenter.sh/v1
    kind: NodePool
    metadata:
      name: gpunodepool
    spec:
      template:
        spec:
          nodeClassRef:
            group: karpenter.sagemaker.amazonaws.com
            kind: HyperpodNodeClass
            name: multiazg6
          expireAfter: Never
          requirements:
            - key: "node.kubernetes.io/instance-type"
              operator: In
              values: ["ml.g6.xlarge"]
            - key: "topology.kubernetes.io/zone"
              operator: In
              values: ["us-west-2a"]

    2. Apply the NodePool to your cluster:
    kubectl apply -f nodepool.yaml

    3. Monitor the NodePool status to ensure the Ready condition in the status is set to True:
    kubectl get nodepool gpunodepool -oyaml

    This example shows how a NodePool can be used to specify the hardware (instance type) and placement (Availability Zone) for pods.

    Launch a simple workload

    The following workload runs a Kubernetes Deployment whose pods each request 1 CPU and 256 MB of memory. No pods have been spun up yet.

    kubectl apply -f https://raw.githubusercontent.com/aws/karpenter-provider-aws/refs/heads/main/examples/workloads/inflate.yaml

    When we apply this, we can see a deployment and a single node launch in our cluster, as shown in the following screenshot.

    To scale this component, use the following command:

    kubectl scale deployment inflate --replicas 10

    Within a few minutes, we can see Karpenter add the requested nodes to the cluster.
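    You can estimate a lower bound on how many nodes Karpenter will add for a given scale-out with simple arithmetic. The numbers below are illustrative and ignore daemonsets and system reservations, which real packing must also account for:

```python
import math

def nodes_needed(replicas, cpu_per_pod, mem_per_pod_gib, node_cpu, node_mem_gib):
    """Lower-bound node count implied by aggregate CPU and memory requests."""
    by_cpu = math.ceil(replicas * cpu_per_pod / node_cpu)
    by_mem = math.ceil(replicas * mem_per_pod_gib / node_mem_gib)
    # whichever resource is the bottleneck determines the node count
    return max(by_cpu, by_mem)

# 10 inflate replicas at 1 CPU / 0.25 GiB each on 4-vCPU, 16-GiB nodes
print(nodes_needed(10, 1, 0.25, 4, 16))  # 3
```

    Here CPU is the bottleneck: 10 CPUs of requests on 4-vCPU nodes requires at least three nodes, regardless of the small memory footprint.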

    Implement advanced auto scaling for inference with KEDA and Karpenter

    To implement an end-to-end auto scaling solution on SageMaker HyperPod, you can set up Kubernetes Event-driven Autoscaling (KEDA) alongside Karpenter. KEDA enables pod-level auto scaling based on a wide range of metrics, including Amazon CloudWatch metrics, Amazon Simple Queue Service (Amazon SQS) queue lengths, Prometheus queries, and resource utilization patterns. By configuring KEDA ScaledObject resources to target your model deployments, KEDA can dynamically adjust the number of inference pods based on real-time demand signals.

    Integrating KEDA with Karpenter creates a powerful two-tier auto scaling architecture. As KEDA scales your pods up or down based on workload metrics, Karpenter automatically provisions or deletes nodes in response to the changing resource requirements. This integration delivers optimal performance while controlling costs by making sure your cluster has precisely the right amount of compute available at all times. For effective implementation, consider the following key factors:

    • Set appropriate buffer thresholds in KEDA to accommodate Karpenter’s node provisioning time
    • Configure cooldown periods carefully to prevent scaling oscillations
    • Define clear resource requests and limits to help Karpenter make optimal node selections
    • Create specialized NodePools tailored to specific workload characteristics
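    The first factor, buffer thresholds that absorb Karpenter's node provisioning time, amounts to simple arithmetic: if a node takes T seconds to become ready and traffic can grow at r requests per second, you need headroom for roughly r × T additional requests. A sketch with illustrative numbers (the model and values are assumptions, not KEDA defaults):

```python
import math

def replicas_for_load(current_rps, growth_rps_per_s, node_ready_s, rps_per_pod):
    """Provision for current load plus the traffic that can arrive while
    a new node is still being provisioned (illustrative model)."""
    projected_rps = current_rps + growth_rps_per_s * node_ready_s
    return math.ceil(projected_rps / rps_per_pod)

# 50 rps now, growing 0.5 rps/s, ~120 s node startup, 20 rps per pod
print(replicas_for_load(50, 0.5, 120, 20))  # 6
```

    Without the buffer, ceil(50 / 20) = 3 replicas would be enough for current traffic, but the extra headroom covers demand that arrives before a new node can join.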

    The following is a sample spec of a KEDA ScaledObject file that scales the number of pods based on CloudWatch metrics of Application Load Balancer (ALB) request count:

    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      name: nd-deepseek-llm-scaler
      namespace: default
    spec:
      scaleTargetRef:
        name: nd-deepseek-llm-r1-distill-qwen-1-5b
        apiVersion: apps/v1
        kind: Deployment
      minReplicaCount: 1
      maxReplicaCount: 3
      pollingInterval: 30     # seconds between checks
      cooldownPeriod: 300     # seconds before scaling down
      triggers:
        - type: aws-cloudwatch
          metadata:
            namespace: AWS/ApplicationELB        # or your metric namespace
            metricName: RequestCount              # or your metric name
            dimensionName: LoadBalancer           # or your dimension key
            dimensionValue: app/k8s-default-albnddee-cc02b67f20/0991dc457b6e8447
            statistic: Sum
            threshold: "3"                        # change to your desired threshold
            minMetricValue: "0"                   # optional floor
            region: us-east-2                     # your AWS region
            identityOwner: operator               # use the IRSA SA bound to keda-operator

    Clean up

    To clean up your resources to avoid incurring more charges, delete your SageMaker HyperPod cluster.

    Conclusion

    With the launch of Karpenter node auto scaling on SageMaker HyperPod, ML workloads can automatically adapt to changing workload requirements, optimize resource utilization, and help control costs by scaling precisely when needed. You can also integrate it with event-driven pod auto scalers such as KEDA to scale based on custom metrics.

    To experience these benefits for your ML workloads, enable Karpenter in your SageMaker HyperPod clusters. For detailed implementation guidance and best practices, refer to Autoscaling on SageMaker HyperPod EKS.


    About the authors

    Vivek Gangasani is a Worldwide Lead GenAI Specialist Solutions Architect for SageMaker Inference. He drives Go-to-Market (GTM) and Outbound Product strategy for SageMaker Inference. He also helps enterprises and startups deploy, manage, and scale their GenAI models with SageMaker and GPUs. Currently, he is focused on developing strategies and content for optimizing inference performance and GPU efficiency for hosting Large Language Models. In his free time, Vivek enjoys hiking, watching movies, and trying different cuisines.

    Adam Stanley is a Solution Architect for Software, Internet and Model Provider customers at Amazon Web Services (AWS). He supports customers adopting all AWS services, but focuses primarily on Machine Learning training and inference infrastructure. Prior to AWS, Adam went to the University of New South Wales and graduated with degrees in Mathematics and Accounting. You can connect with him on LinkedIn.

    Kunal Jha is a Principal Product Manager at AWS, where he focuses on building Amazon SageMaker HyperPod to enable scalable distributed training and fine-tuning of foundation models. In his spare time, Kunal enjoys skiing and exploring the Pacific Northwest. You can connect with him on LinkedIn.

    Ty Bergstrom is a Software Engineer at Amazon Web Services. He works on the HyperPod Clusters platform for Amazon SageMaker.


