AI agents have reached a critical inflection point where their ability to generate sophisticated code exceeds the capacity to execute it safely in production environments. Organizations deploying agentic AI face a fundamental dilemma: although large language models (LLMs) can produce complex code scripts, mathematical analyses, and data visualizations, executing this AI-generated code introduces significant security vulnerabilities and operational complexity.

In this post, we introduce the Amazon Bedrock AgentCore Code Interpreter, a fully managed service that enables AI agents to securely execute code in isolated sandbox environments. We discuss how the AgentCore Code Interpreter helps solve challenges around security, scalability, and infrastructure management when deploying AI agents that need computational capabilities. We walk through the service’s key features, demonstrate how it works with practical examples, and show you how to get started with building your own agents using popular frameworks like Strands, LangChain, and LangGraph.

Security and scalability challenges with AI-generated code

Consider an example where an AI agent needs to perform analysis on multi-year sales projection data for a product to understand anomalies, trends, and seasonality. The analysis should be grounded in logic, repeatable, secure in its handling of data, and scalable over large datasets and multiple iterations, if needed. Although LLMs excel at understanding and explaining concepts, they lack the ability to directly manipulate data or perform consistent mathematical operations at scale. LLMs alone are often inadequate for complex data analysis tasks like these, due to their inherent limitations in processing large datasets, performing precise calculations, and generating visualizations. This is where code interpretation and execution tools become essential, providing the capability to execute precise calculations, handle large datasets efficiently, and create reproducible analyses through programming languages and specialized libraries. However, implementing code interpretation capabilities comes with significant considerations. Organizations must maintain secure sandbox environments to help prevent malicious code execution, manage resource allocation, and maintain data privacy. The infrastructure requires regular updates, robust monitoring, and careful scaling strategies to handle increasing demand.

Traditional approaches to code execution in AI systems suffer from several limitations:

  • Security vulnerabilities – Executing untrusted AI-generated code in production environments exposes organizations to code injection threats, unauthorized system access, and potential data breaches. Without proper sandboxing, malicious or poorly constructed code can compromise entire infrastructure stacks.
  • Infrastructure overhead – Building secure execution environments requires extensive DevOps expertise, including container orchestration, network isolation, resource monitoring, and security hardening. Many organizations lack the specialized knowledge to implement these systems correctly.
  • Scalability bottlenecks – Traditional code execution environments struggle with the dynamic, unpredictable workloads generated by AI agents. Peak demand can overwhelm static infrastructure, and idle periods waste computational resources.
  • Integration complexity – Connecting secure code execution capabilities with existing AI frameworks often requires custom development, creating maintenance overhead and limiting adoption across development teams.
  • Compliance challenges – Enterprise environments demand comprehensive audit trails, access controls, and compliance certifications that are difficult to implement and maintain in custom solutions.

These barriers have prevented organizations from fully using the computational capabilities of AI agents, limiting their applications to simple, deterministic tasks rather than the complex, code-dependent workflows that could maximize business value.

Introducing the Amazon Bedrock AgentCore Code Interpreter

With the AgentCore Code Interpreter, AI agents can write and execute code securely in sandbox environments, enhancing their accuracy and expanding their ability to solve complex end-to-end tasks. This purpose-built service minimizes the security, scalability, and integration challenges that have hindered AI agent deployment by providing a fully managed, enterprise-grade code execution system specifically designed for agentic AI workloads. The AgentCore Code Interpreter is designed and built from the ground up for AI-generated code, with built-in safeguards, dynamic resource allocation, and seamless integration with popular AI frameworks. It also offers advanced configuration support, so developers can build powerful agents for complex workflows and data analysis while meeting enterprise security requirements.

Transforming AI agent capabilities

The AgentCore Code Interpreter powers advanced use cases by addressing several critical enterprise requirements:

  • Enhanced security posture – Configurable network access options range from fully isolated environments, which provide enhanced security by helping prevent AI-generated code from accessing external systems, to controlled network connectivity that provides flexibility for specific development needs and use cases.
  • Zero infrastructure management – The fully managed service minimizes the need for specialized DevOps resources, reducing time-to-market from months to days while maintaining enterprise-grade reliability and security.
  • Dynamic scalability – Automatic resource allocation handles varying AI agent workloads without manual intervention, providing low-latency session start-up times during peak demand while optimizing costs during idle periods.
  • Framework-agnostic integration – It integrates with Amazon Bedrock AgentCore Runtime, with native support for popular AI frameworks including Strands, LangChain, LangGraph, and CrewAI, so teams can use existing investments while maintaining development velocity.
  • Enterprise compliance – Built-in access controls and comprehensive audit trails facilitate regulatory compliance without additional development overhead.

Purpose-built for AI agent code execution

The AgentCore Code Interpreter represents a shift in how AI agents interact with computational resources. The service takes the agent-generated code, runs it in a secure environment, and returns the execution results, including output, errors, and generated visualizations. It operates as a secure, isolated execution environment where AI agents can run code (Python, JavaScript, and TypeScript), perform complex data analysis, generate visualizations, and execute mathematical computations without compromising system security. Each execution occurs within a dedicated sandbox environment that provides complete isolation from other workloads and the broader AWS infrastructure. What distinguishes the AgentCore Code Interpreter from traditional execution environments is its optimization for AI-generated workloads. The service handles the unpredictable nature of AI-generated code through intelligent resource management, automatic error handling, and built-in security safeguards specifically designed for untrusted code execution.

Key features and capabilities of AgentCore Code Interpreter include:

  • Secure sandbox architecture:
    • Low-latency session start-up time and compute-based session isolation facilitating complete workload separation
    • Configurable network access policies supporting both isolated sandbox and controlled public network modes
    • Resource constraints that set maximum limits on memory and CPU usage per session, helping to prevent excessive consumption (see AgentCore Code Interpreter Service Quotas)
  • Advanced session management:
    • Persistent session state allowing multi-step code execution workflows
    • Session-based file storage for complex data processing pipelines
    • Automatic session and resource cleanup
    • Support for long-running computational tasks with configurable timeouts
  • Comprehensive Python runtime environment:
    • Pre-installed data science libraries, including pandas, numpy, matplotlib, scikit-learn, and scipy
    • Support for popular visualization libraries, including seaborn and bokeh
    • Mathematical computing capabilities with sympy and statsmodels
    • Custom package installation within sandbox boundaries for specialized requirements
  • File operations and data management:
    • Upload data files, process them with code, and retrieve the results
    • Secure file transfer mechanisms with automatic encryption
    • Support for upload and download of files directly within the sandbox from Amazon Simple Storage Service (Amazon S3)
    • Support for multiple file formats, including CSV, JSON, Excel, and images
    • Temporary storage with automatic cleanup for enhanced security
    • Support for running AWS Command Line Interface (AWS CLI) commands directly within the sandbox, using the Amazon Bedrock AgentCore SDK and API
  • Enterprise integration features:

How the AgentCore Code Interpreter works

To understand the functionality of the AgentCore Code Interpreter, let’s examine the orchestrated flow of a typical data analysis request from an AI agent, as illustrated in the following diagram.

The workflow consists of the following key components:

  • Deployment and invocation – An agent is built and deployed (for instance, on the AgentCore Runtime) using a framework like Strands, LangChain, LangGraph, or CrewAI. When a user sends a prompt (for example, “Analyze this sales data and show me the trend by sales region”), the AgentCore Runtime initiates a secure, isolated session.
  • Reasoning and tool selection – The agent’s underlying LLM analyzes the prompt and determines that it needs to perform a computation. It then selects the AgentCore Code Interpreter as the appropriate tool.
  • Secure code execution – The agent generates a code snippet, for instance using the pandas library, to read a data file and matplotlib to create a plot. This code is passed to the AgentCore Code Interpreter, which executes it within its dedicated, sandboxed session. The agent can read from and write files to the session-specific file system.
  • Observation and iteration – The AgentCore Code Interpreter returns the result of the execution—such as a calculated value, a dataset, an image file of a graph, or an error message—to the agent. This feedback loop allows the agent to engage in iterative problem-solving by debugging its own code and refining its approach.
  • Context and memory – The agent maintains context for subsequent turns in the conversation for the duration of the session. Alternatively, the entire interaction can be persisted in Amazon Bedrock AgentCore Memory for long-term storage and retrieval.
  • Monitoring and observability – Throughout this process, a detailed trace of the agent’s execution, providing visibility into agent behavior, performance metrics, and logs, is available for debugging and auditing purposes.
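
The execute, observe, and iterate steps in this workflow can be sketched locally. The following is a minimal illustration only: run_locally is a hypothetical stand-in that uses Python's exec and provides no isolation at all, and the list of attempts plays the role of the LLM, which would normally write the next snippet after reading the previous error.

```python
import contextlib
import io
import traceback


def run_locally(code: str, namespace: dict) -> dict:
    """Hypothetical stand-in for a sandbox call. NOT isolated; illustration only."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)  # a real agent would send this to the managed sandbox
        return {"ok": True, "output": buffer.getvalue()}
    except Exception:
        # Hand back the final traceback line so the agent can repair its code
        return {"ok": False, "output": traceback.format_exc().strip().splitlines()[-1]}


def solve_with_retries(attempts: list, max_turns: int = 3) -> dict:
    """Run candidate snippets in order until one executes cleanly."""
    namespace: dict = {}  # shared across turns, like a stateful session
    history = []
    for code in attempts[:max_turns]:
        result = run_locally(code, namespace)
        history.append(result)
        if result["ok"]:
            break
    return {"final": history[-1], "turns": len(history), "history": history}


outcome = solve_with_retries([
    "print(total / 2)",              # first try fails: `total` is undefined
    "total = 10\nprint(total / 2)",  # corrected after observing the NameError
])
print(outcome["turns"], outcome["final"]["output"].strip())
```

The first snippet fails, its error message becomes the observation, and the second snippet succeeds in the same namespace. This mirrors how the agent debugs its own code within a session.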

Practical real-world applications and use cases

The AgentCore Code Interpreter can be applied to real-world business problems that are difficult to solve with LLMs alone.

Use case 1: Automated financial analysis

An agent can be tasked with performing on-demand analysis of financial data. For this example, a user provides a CSV file of billing data within the following prompt and asks for analysis and visualization: “Using the billing data provided below, create a bar graph that shows the total spend by product category… After generating the graph, provide a brief interpretation of the results…” The agent takes the following actions:

  1. The agent receives the prompt and the data file containing the raw data.
  2. It invokes the AgentCore Code Interpreter, generating Python code with the pandas library to parse the data into a DataFrame. The agent then generates another code block to group the data by category and sum the costs, and asks the AgentCore Code Interpreter to execute it.
  3. The agent uses matplotlib to generate a bar chart and the AgentCore Code Interpreter saves it as an image file.
  4. The agent returns both a textual summary of the findings and the generated PNG image of the graph.
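
As a rough illustration of the grouping and charting steps, the sketch below uses only the Python standard library in place of pandas and matplotlib so it runs anywhere. The billing rows, the column names category and cost, and the text-based chart are all assumptions made for the example, not the data or code from the original scenario.

```python
import csv
import io
from collections import defaultdict

# Hypothetical billing rows; the column names are assumptions for illustration.
BILLING_CSV = """category,cost
Compute,120.50
Storage,30.25
Compute,79.50
Networking,12.00
Storage,19.75
"""


def total_spend_by_category(csv_text: str) -> dict:
    """Sum the `cost` column per `category`, largest total first."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["category"]] += float(row["cost"])
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


def text_bar_chart(totals: dict, width: int = 40) -> str:
    """Render a plain-text bar chart as a stand-in for a matplotlib figure."""
    peak = max(totals.values())
    lines = []
    for category, amount in totals.items():
        bar = "#" * round(width * amount / peak)
        lines.append(f"{category:<12}{bar} {amount:.2f}")
    return "\n".join(lines)


totals = total_spend_by_category(BILLING_CSV)
print(text_bar_chart(totals))
```

With pandas available in the sandbox, the aggregation step would typically collapse to a single groupby and sum over the same two columns.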

Use case 2: Interactive data science assistant

The AgentCore Code Interpreter’s stateful session supports a conversational and iterative workflow for data analysis. For this example, a data scientist uses an agent for exploratory data analysis. The workflow is as follows:

  1. The user provides a prompt: “Load dataset.csv and provide descriptive statistics.”
  2. The agent generates and executes pandas.read_csv('dataset.csv') followed by .describe() and returns the statistics table.
  3. The user prompts, “Plot a scatter plot of column A versus column B.”
  4. The agent, using the dataset already loaded in its session, generates code with matplotlib.pyplot.scatter() and returns the plot.
  5. The user prompts, “Run a simple linear regression and provide the R^2 value.”
  6. The agent generates code using the scikit-learn library to fit a model and calculate the R^2 metric.

This demonstrates iterative code execution capabilities, which allow agents to work through complex data science problems in a turn-by-turn manner with the user.
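
To make the final step concrete, here is a small sketch of what a simple linear regression and its R^2 value compute, written with the standard library instead of scikit-learn. The two columns are made-up sample values, and the function names are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def r_squared(xs, ys, slope, intercept):
    """R^2 = 1 - (residual sum of squares / total sum of squares)."""
    y_bar = mean(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up values standing in for column A and column B of dataset.csv
column_a = [1.0, 2.0, 3.0, 4.0, 5.0]
column_b = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(column_a, column_b)
r2 = r_squared(column_a, column_b, slope, intercept)
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r2:.4f}")
```

In a stateful session, the column values would already be in memory from the earlier turns, so only the fitting code needs to be generated at this step.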

Solution overview

To get started with the AgentCore Code Interpreter, clone the GitHub repo:

git clone https://github.com/awslabs/amazon-bedrock-agentcore-samples.git

In the following sections, we show how to create a question answering agent that validates answers through code and reasoning. We build it using the Strands SDK, but you can use a framework of your choice.

Prerequisites

Make sure you have the following prerequisites:

  • An AWS account with AgentCore Code Interpreter access
  • The necessary IAM permissions to create and manage AgentCore Code Interpreter resources and invoke models on Amazon Bedrock
  • The required Python packages installed (including boto3, bedrock-agentcore, and strands-agents, which is imported as strands)
  • Access to Anthropic’s Claude 4 Sonnet model in the us-west-2 AWS Region (Anthropic’s Claude 4 is the default model for Strands SDK, but you can override and use your preferred model as described in the Strands SDK documentation)

Configure your IAM role

Your IAM role should have appropriate permissions to use the AgentCore Code Interpreter:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "bedrock-agentcore:CreateCodeInterpreter",
                "bedrock-agentcore:StartCodeInterpreterSession",
                "bedrock-agentcore:InvokeCodeInterpreter",
                "bedrock-agentcore:StopCodeInterpreterSession",
                "bedrock-agentcore:DeleteCodeInterpreter",
                "bedrock-agentcore:ListCodeInterpreters",
                "bedrock-agentcore:GetCodeInterpreter"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:log-group:/aws/bedrock-agentcore/code-interpreter*"
        }
    ]
}

Set up and configure the AgentCore Code Interpreter

Complete the following setup and configuration steps:

  1. Install the bedrock-agentcore Python SDK:
pip install bedrock-agentcore
  2. Import the AgentCore Code Interpreter and other libraries:
from bedrock_agentcore.tools.code_interpreter_client import code_session
from strands import Agent, tool
import json
  3. Define the system prompt:
SYSTEM_PROMPT = """You are a helpful AI assistant that validates all answers through code execution.

TOOL AVAILABLE:
- execute_python: Run Python code and see output"""
  4. Define the code execution tool for the agent. Within the tool definition, we use the invoke method to execute the Python code generated by the LLM-powered agent. It automatically starts a serverless AgentCore Code Interpreter session if one doesn’t exist.
@tool
def execute_python(code: str, description: str = "") -> str:
    """Execute Python code in the sandbox."""

    if description:
        code = f"# {description}\n{code}"

    print(f"\n Generated Code: {code}")

    # Assumed call shape: open a session in us-west-2 and run the code via invoke.
    # Adjust the Region and request fields to match your setup and SDK version.
    with code_session("us-west-2") as code_client:
        response = code_client.invoke("executeCode", {
            "code": code,
            "language": "python",
            "clearContext": False
        })

    # Return the first streamed result to the agent as a JSON string
    for event in response["stream"]:
        return json.dumps(event["result"])
  5. Configure the agent:
agent = Agent(
    tools=[execute_python],
    system_prompt=SYSTEM_PROMPT,
    callback_handler=None  # None suppresses the default printing handler
)
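
The tool returns from inside the loop over response["stream"], so an empty stream would silently hand None back to the agent. The sketch below shows one way to make that case explicit. It assumes only what the tool itself relies on (a "stream" iterable whose result events carry a "result" key). The helper name and the sample payloads are hypothetical and do not show the real response format.

```python
import json


def first_result_as_json(response: dict) -> str:
    """Return the first streamed result as a JSON string.

    Assumption: `response["stream"]` is an iterable of event dicts and
    result events carry a "result" key, as in the tool definition.
    """
    for event in response.get("stream", []):
        if "result" in event:
            return json.dumps(event["result"])
    # Make the empty-stream case visible to the agent instead of returning None
    return json.dumps({"error": "no result event received"})


# Hypothetical response objects standing in for a real sandbox reply
fake_response = {"stream": [{"status": "started"}, {"result": {"stdout": "4\n"}}]}
print(first_result_as_json(fake_response))
print(first_result_as_json({"stream": []}))
```

Returning a structured error string gives the model something to reason about on its next turn, which fits the observe-and-iterate loop described earlier.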

Invoke the agent

Test the AgentCore Code Interpreter powered agent with a simple prompt:

import asyncio

query = "Tell me the largest random prime number between 1 and 100, which is less than 84 and more than 9"

async def main():
    try:
        response_text = ""
        # Print each text chunk as it streams back from the agent
        async for event in agent.stream_async(query):
            if "data" in event:
                chunk = event["data"]
                response_text += chunk
                print(chunk, end="")
    except Exception as e:
        print(f"Error occurred: {str(e)}")

# In a notebook, run `await main()` instead
asyncio.run(main())
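
The streaming pattern can be tried without an AWS account by substituting a fake agent. In this sketch, FakeAgent is a hypothetical stand-in that copies only the event shape used in the snippet above (dictionaries that sometimes contain a "data" key). It does not reproduce the behavior of a real Strands agent.

```python
import asyncio


class FakeAgent:
    """Hypothetical stand-in that mimics the streamed event shape."""

    async def stream_async(self, query: str):
        for event in ({"init": True}, {"data": "The answer "}, {"data": "is 83."}):
            await asyncio.sleep(0)  # yield control, as a network stream would
            yield event


async def collect_text(agent, query: str) -> str:
    """Concatenate only the events that carry a "data" chunk."""
    parts = []
    async for event in agent.stream_async(query):
        if "data" in event:
            parts.append(event["data"])
    return "".join(parts)


text = asyncio.run(collect_text(FakeAgent(), "largest prime below 84?"))
print(text)
```

Events without a "data" key are skipped, which is why the membership check matters when a stream mixes text chunks with other event types.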

We get the following result:

I'll find the largest random prime number between 1 and 100 that is less than 84 and more than 9. To do this, I'll write code to:

1. Generate all prime numbers in the specified range
2. Filter to keep only those > 9 and < 84
3. Find the largest one

Let me implement this:
 Generated Code: import random

def is_prime(n):
    """Check if a number is prime"""
    if n <= 1:
        return False
    if n <= 3:
        return True
    if n % 2 == 0 or n % 3 == 0:
        return False
    i = 5
    while i * i <= n:
        if n % i == 0 or n % (i + 2) == 0:
            return False
        i += 6
    return True

# Find all primes in the range
primes_in_range = [n for n in range(10, 84) if is_prime(n)]

print("All prime numbers between 10 and 83:")
print(primes_in_range)

# Get the largest prime in the range
largest_prime = max(primes_in_range)
print(f"\nThe largest prime number between 10 and 83 is: {largest_prime}")

# For verification, let's check that it's actually prime
print(f"Verification - is {largest_prime} prime? {is_prime(largest_prime)}")
Based on the code execution, I can tell you that the largest prime number between 1 and 100, which is less than 84 and more than 9, is **83**.

I verified this by:
1. Writing a function to check if a number is prime
2. Generating all prime numbers in the range 10-83
3. Finding the maximum value in that list

The complete list of primes in your specified range is: 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, and 83.

Since 83 is the largest among these primes, it is the answer to your question.
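
The agent's answer can be cross-checked independently. The following sketch uses a different method from the generated code, a sieve of Eratosthenes, to list the primes above 9 and below 84.

```python
def primes_below(limit: int) -> list:
    """Sieve of Eratosthenes: all primes strictly less than `limit`."""
    if limit < 2:
        return []
    is_prime = [True] * limit
    is_prime[0] = is_prime[1] = False
    for n in range(2, int(limit ** 0.5) + 1):
        if is_prime[n]:
            # Cross out every multiple of n, starting from n squared
            for multiple in range(n * n, limit, n):
                is_prime[multiple] = False
    return [n for n, flag in enumerate(is_prime) if flag]


candidates = [p for p in primes_below(84) if p > 9]
print(len(candidates), max(candidates))
```

The sieve yields the same 19 primes the agent listed, with 83 as the largest.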

Pricing and availability

Amazon Bedrock AgentCore is available in multiple Regions and uses a consumption-based pricing model with no upfront commitments or minimum fees. Billing for the AgentCore Code Interpreter is calculated per second and is based on the highest watermark of CPU and memory resources consumed during that second, with a 1-second minimum charge.
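
As a rough illustration of per-second, high-watermark billing, the sketch below sums the peak CPU and peak memory observed in each second. The two rates are placeholders and not actual AWS prices. The formula is one assumed reading of the description above, so consult the pricing page for the authoritative calculation.

```python
# Placeholder rates, NOT actual AWS prices.
HYPOTHETICAL_RATE_PER_VCPU_SECOND = 0.00002
HYPOTHETICAL_RATE_PER_GB_SECOND = 0.000003


def session_cost(samples_by_second: list) -> float:
    """Estimate cost from per-second usage samples.

    Each inner list holds (vcpu, memory_gb) readings taken within one second.
    Assumption: the peak of each resource within that second is what is billed,
    and any partial second is represented as one full entry (1-second minimum).
    """
    total = 0.0
    for readings in samples_by_second:
        peak_vcpu = max(vcpu for vcpu, _ in readings)
        peak_gb = max(gb for _, gb in readings)
        total += (peak_vcpu * HYPOTHETICAL_RATE_PER_VCPU_SECOND
                  + peak_gb * HYPOTHETICAL_RATE_PER_GB_SECOND)
    return total


# Three seconds of made-up usage readings
usage = [
    [(0.5, 1.0), (1.0, 1.2)],  # peaks: 1.0 vCPU, 1.2 GB
    [(0.2, 1.2)],              # peaks: 0.2 vCPU, 1.2 GB
    [(2.0, 1.5), (1.0, 2.0)],  # peaks: 2.0 vCPU, 2.0 GB
]
print(f"{session_cost(usage):.7f}")
```

Under this reading, a brief spike raises the charge only for the second in which it occurs, so idle stretches of a session cost less than busy ones.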

Conclusion

The AgentCore Code Interpreter transforms the landscape of AI agent development by solving the critical challenge of secure, scalable code execution in production environments. This purpose-built service minimizes the complex infrastructure requirements, security vulnerabilities, and operational overhead that have historically prevented organizations from deploying sophisticated AI agents capable of complex computational tasks. The service’s architecture—featuring isolated sandbox environments, enterprise-grade security controls, and seamless framework integration—helps development teams focus on agent logic and business value rather than infrastructure complexity.

To learn more, refer to the following resources:

Try it out today or reach out to your AWS account team for a demo!


About the authors

Veda Raman is a Senior Specialist Solutions Architect for generative AI and machine learning at AWS. Veda works with customers to help them architect efficient, secure, and scalable machine learning applications. Veda specializes in generative AI services like Amazon Bedrock and Amazon SageMaker.

Rahul Sharma is a Senior Specialist Solutions Architect at AWS, helping AWS customers build and deploy scalable agentic AI solutions. Prior to joining AWS, Rahul spent more than a decade in technical consulting, engineering, and architecture, helping companies build digital products powered by data and machine learning. In his free time, Rahul enjoys exploring cuisines, traveling, reading books (biographies and humor), and binging on investigative documentaries, in no particular order.

Kishor Aher is a Principal Product Manager at AWS, leading the Agentic AI team responsible for developing first-party tools such as Browser Tool and Code Interpreter. As a founding member of Amazon Bedrock, he spearheaded the vision and successful launch of the service, driving key features including Converse API, Managed Model Customization, and Model Evaluation capabilities. Kishor regularly shares his expertise through speaking engagements at AWS events, including re:Invent and AWS Summits. Outside of work, he pursues his passion for aviation as a general aviation pilot and enjoys playing volleyball.


