    Process multi-page documents with human review using Amazon Bedrock Data Automation and Amazon SageMaker AI



    Organizations across industries face challenges with high volumes of multi-page documents that require intelligent processing to extract accurate information. Although automation has improved this process, human expertise is still needed in specific scenarios to verify data accuracy and quality.

    In March 2025, AWS launched Amazon Bedrock Data Automation, which enables developers to automate the generation of valuable insights from unstructured multimodal content, including documents, images, video, and audio. Amazon Bedrock Data Automation streamlines document processing workflows by automating extraction, transformation, and generation of insights from unstructured content. It minimizes time-consuming tasks like data preparation, model management, fine-tuning, prompt engineering, and orchestration through a unified, multimodal inference API, delivering industry-leading accuracy at lower cost than alternative solutions.

    Amazon Bedrock Data Automation simplifies complex document processing tasks, including document splitting, classification, extraction, normalization, and validation. It incorporates visual grounding with confidence scores for explainability and built-in hallucination mitigation, providing trustworthy insights from unstructured data sources. Even with this level of automation, there remain scenarios where human judgment is invaluable, and this is where the integration with Amazon SageMaker AI creates a powerful end-to-end solution. By incorporating human review loops into the document processing workflow, organizations can keep accuracy high without sacrificing processing efficiency. With a human review loop, organizations can:

    • Validate AI predictions when confidence is low
    • Handle edge cases and exceptions effectively
    • Maintain regulatory compliance through appropriate oversight
    • Maintain high accuracy while maximizing automation
    • Create feedback loops to improve model performance over time

    By implementing human loops strategically, organizations can focus human attention on uncertain portions of documents while allowing automated systems to handle routine extractions, creating an optimal balance between efficiency and accuracy. In this post, we show how to process multi-page documents with a human review loop using Amazon Bedrock Data Automation and SageMaker AI.

    Understanding confidence scores

    Confidence scores are crucial in determining when to invoke human review. A confidence score is the percentage of certainty that Amazon Bedrock Data Automation assigns to an extraction, indicating how likely it is that the extracted value is accurate.

    Our goal is to simplify intelligent document processing (IDP) by handling the heavy lifting of accuracy calculation within Amazon Bedrock Data Automation. This helps customers focus on solving their business challenges with Amazon Bedrock Data Automation rather than dealing with complex scoring mechanisms. Amazon Bedrock Data Automation optimizes its models for Expected Calibration Error (ECE), a metric that facilitates better calibration, leading to more reliable and accurate confidence scores.

    In document processing workflows, confidence scores are generally interpreted as:

    • High confidence (90–100%) – High certainty about the extraction
    • Medium confidence (70–89%) – Reasonable certainty with some potential for error
    • Low confidence (<70%) – High uncertainty, likely requiring human verification

    We recommend testing Amazon Bedrock Data Automation on your own specific datasets to determine the confidence threshold that triggers a human review workflow.
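
    To make the thresholding concrete, the routing decision can be as simple as comparing each extracted field's confidence to a cutoff. The following is a minimal sketch in Python; the payload shape and field names are illustrative assumptions, not the exact Amazon Bedrock Data Automation output format.

    # Minimal sketch of threshold-based routing. The payload shape and field
    # names are illustrative assumptions, not the exact service output format.
    CONFIDENCE_THRESHOLD = 0.70  # 70%, expressed as a fraction

    def fields_needing_review(extracted_fields: dict) -> list:
        """Return the names of fields whose confidence falls below the threshold."""
        return [
            name
            for name, field in extracted_fields.items()
            if field.get("confidence", 0.0) < CONFIDENCE_THRESHOLD
        ]

    # Example: two fields clear the bar, one is routed to human review.
    sample = {
        "invoice_number": {"value": "INV-1042", "confidence": 0.98},
        "total_amount": {"value": "1250.00", "confidence": 0.93},
        "po_number": {"value": "PO-77", "confidence": 0.55},
    }
    print(fields_needing_review(sample))  # ['po_number']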

    Solution overview

    The following architecture provides a serverless solution for processing multi-page documents with human review loops using Amazon Bedrock Data Automation and SageMaker AI.

    The workflow consists of the following steps:

    1. Documents are uploaded to an Amazon Simple Storage Service (Amazon S3) input bucket, which serves as the entry point for documents processed through Amazon Bedrock Data Automation.
    2. An Amazon EventBridge rule automatically detects new objects in the S3 bucket and triggers the AWS Step Functions workflow that orchestrates the document processing pipeline.
    3. Within the Step Functions workflow, the bda-document-processor AWS Lambda function is executed, which invokes Amazon Bedrock Data Automation with the appropriate blueprint. Amazon Bedrock Data Automation uses these preconfigured instructions to extract and process information from the document.
    4. Amazon Bedrock Data Automation analyzes the document, extracts key fields with associated confidence scores, and stores the processed output in another S3 bucket. This output contains the extracted information and corresponding confidence levels.
    5. The Step Functions workflow invokes the bda-classifier Lambda function, which retrieves the Amazon Bedrock Data Automation output from Amazon S3. This function evaluates the confidence scores against predefined thresholds for the extracted fields (a sketch of this check follows the list).
    6. For fields with confidence scores below the threshold, the workflow routes the document to SageMaker AI for human review. Using the custom UI, humans review the tasks and validate the fields from the pages. Reviewers can correct fields that were incorrectly extracted by the automated process.
    7. The validated and corrected form data from human review is stored in an S3 bucket.
    8. Once the SageMaker AI output is written to Amazon S3, the bda-a2i-aggregator AWS Lambda function runs and updates the Amazon Bedrock Data Automation output payload with the human-reviewed values. The aggregated output is stored in Amazon S3 and provides the final, high-confidence output ready for downstream systems.
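
    To make step 5 more concrete, the following is a minimal sketch of the confidence-check logic a classifier Lambda function could run. The event shape, environment variable, and JSON structure are assumptions for illustration; the actual bda-classifier function in the repository may differ.

    import json
    import os

    import boto3

    s3 = boto3.client("s3")

    # Assumed: the threshold is supplied through an environment variable.
    THRESHOLD = float(os.environ.get("CONFIDENCE_THRESHOLD", "0.70"))

    def handler(event, context):
        """Read the Amazon Bedrock Data Automation output from S3 and flag low-confidence fields."""
        # Assumed event shape: Step Functions passes the S3 location of the processed output.
        bucket = event["outputBucket"]
        key = event["outputKey"]

        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        result = json.loads(body)

        # Illustrative structure: a flat list of fields with confidence values.
        low_confidence = [
            f for f in result.get("fields", [])
            if f.get("confidence", 0.0) < THRESHOLD
        ]

        # Step Functions uses this flag to decide whether to start a human review loop.
        return {
            "needsHumanReview": len(low_confidence) > 0,
            "lowConfidenceFields": low_confidence,
            "outputBucket": bucket,
            "outputKey": key,
        }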

    Prerequisites

    To deploy this solution, you need the AWS Cloud Development Kit (AWS CDK), Node.js, and Docker installed on your deployment machine. A build script performs the packaging and deployment of the solution.

    Deploy the solution

    Complete the following steps to deploy the solution:

    1. Clone the solution repository to your deployment machine.
    2. Navigate to the project directory and run the build script:

    ./build.sh

    The deployment creates the following resources in your AWS account:

    • Two new S3 buckets: one for the initial upload of documents and one for the output of documents
    • An Amazon Bedrock Data Automation project and five blueprints used to process the test document
    • An Amazon Cognito user pool for the private workforce that Amazon SageMaker Ground Truth provides to SageMaker AI for reviewing data that falls below the confidence threshold
    • Two Lambda functions and a Step Functions workflow used to process the test documents
    • Two Amazon Elastic Container Registry (Amazon ECR) container images used for the Lambda functions to process the test documents

    Add a new worker to the private workforce

    After the build is complete, you must add a worker to the private workforce in SageMaker Ground Truth. Complete the following steps:

    1. On the SageMaker AI console, under Ground Truth in the navigation pane, choose Labeling workforces, then choose the Private tab.

    2. In the Workers section, choose Invite new workers.

    3. For Email addresses, enter the email addresses of the workers you want to invite. For this example, use an email you have access to.
    4. Choose Invite new workers.

    After the worker has been added, they will receive an email with a temporary password. It might take up to 5 minutes for the email to arrive.

    5. On the Labeling workforces page, in the Private workforce summary section, choose the link for Labeling portal sign-in URL.

    6. In the prompt, enter the email address you used earlier to set up a worker and provide the temporary password from the email, then choose Sign In.

    7. Provide a new password when prompted.

    You will be redirected to a job queue page for the private labeling workforce. At the top of the page, a notice states that you are not yet a member of a work team. Complete the remaining steps to make sure jobs are properly assigned to you.

    8. On the Labeling workforces page, open the private team (for this post, bda-workforce).

    9. On the Workers tab, choose Add workers to team.

    10. Add the recently verified worker to the team.
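
    If you prefer to script the invitation instead of using the console, the private workforce is backed by the Amazon Cognito user pool created by the stack, so the same steps can in principle be performed with the Cognito APIs. The user pool ID, group name, and email below are placeholders you would look up in your own account; treat this as a hedged sketch rather than part of the provided solution.

    import boto3

    cognito = boto3.client("cognito-idp")

    # Placeholders: the stack creates the user pool and the work-team group;
    # the exact identifiers depend on your deployment.
    USER_POOL_ID = "us-east-1_EXAMPLE"
    WORKTEAM_GROUP = "bda-workforce-group"
    WORKER_EMAIL = "worker@example.com"

    # Create the worker; Cognito emails a temporary password, mirroring the console flow.
    cognito.admin_create_user(
        UserPoolId=USER_POOL_ID,
        Username=WORKER_EMAIL,
        UserAttributes=[{"Name": "email", "Value": WORKER_EMAIL}],
        DesiredDeliveryMediums=["EMAIL"],
    )

    # Add the worker to the group associated with the private work team.
    cognito.admin_add_user_to_group(
        UserPoolId=USER_POOL_ID,
        Username=WORKER_EMAIL,
        GroupName=WORKTEAM_GROUP,
    )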

    Test the solution

    To test the solution, upload the test document located in the assets folder of the project to the S3 bucket used for incoming documents. You can monitor the progress of the system on the Step Functions console or by reviewing the logs through Amazon CloudWatch. After the document is processed, you can see a new job queued up for the user in SageMaker AI. To view this job, navigate back to the Labeling workforces page and choose the link for Labeling portal sign-in URL.
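
    A few lines of boto3 (or an aws s3 cp command) are enough to kick off a run; the bucket name, key, and file path below are placeholders for the values from your deployment.

    import boto3

    s3 = boto3.client("s3")

    # Placeholders: substitute the input bucket created by the stack and the
    # test document from the project's assets folder.
    INPUT_BUCKET = "your-bda-input-bucket"
    LOCAL_DOCUMENT = "assets/test-document.pdf"

    # Uploading the object triggers the EventBridge rule and starts the Step Functions workflow.
    s3.upload_file(LOCAL_DOCUMENT, INPUT_BUCKET, "incoming/test-document.pdf")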

    Log in using the email address and updated password from earlier. You will see a page that displays the jobs to be reviewed. Select the job and choose Start working.

    In the UI, you can review each item whose confidence score fell below the threshold (70% by default) for the processed document.

    On this page, you can correct values that were extracted incorrectly. The updated data is saved in the S3 output bucket in the a2i-output/bda-review-flow-definition//review-loop-/output.json file. This data can then be processed downstream to supply the corrected values for information retrieved from the document.
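
    Downstream systems can pick up the reviewed results directly from the output bucket. The sketch below lists and loads the JSON files under the a2i-output/ prefix; the bucket name is a placeholder, and the exact key layout follows the pattern shown above.

    import json

    import boto3

    s3 = boto3.client("s3")

    # Placeholder: the output bucket created by the stack.
    OUTPUT_BUCKET = "your-bda-output-bucket"

    # Walk the human-review output prefix and load each result file.
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=OUTPUT_BUCKET, Prefix="a2i-output/"):
        for obj in page.get("Contents", []):
            if not obj["Key"].endswith("output.json"):
                continue
            body = s3.get_object(Bucket=OUTPUT_BUCKET, Key=obj["Key"])["Body"].read()
            result = json.loads(body)
            # The JSON structure depends on the custom review template; inspect it
            # before wiring the corrected values into downstream systems.
            print(obj["Key"], list(result.keys()))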

    Clean up

    To terminate all resources created in this solution, run the following command from the project root directory:

    cdk destroy

    Conclusion

    In this post, we demonstrated how the combination of Amazon Bedrock Data Automation and SageMaker AI delivers automation efficiency and human-level accuracy for both single-page and multi-page document processing.

    We encourage you to explore this pattern with your own document processing challenges. The solution is designed to be adaptable across various document types and can be customized to meet specific business requirements. Try out the complete implementation available in our GitHub repository, where you’ll find all the code and configuration needed to get started.

    To learn more about document intelligence solutions on AWS, visit the Amazon Bedrock Data Automation documentation and SageMaker AI documentation.

    Please share your experiences in the comments or reach out to the authors with questions. Happy building!


    About the authors

    Joe Morotti is a Solutions Architect at Amazon Web Services (AWS), working with Financial Services customers across the US. He has held a wide range of technical roles and enjoys showing customers the art of the possible. He is an active member of the AWS Technical Field Communities for Generative AI and Amazon Connect. In his free time, he enjoys spending quality time with his family, exploring new places, and overanalyzing his sports teams’ performance.

    Prashanth Ramanathan is a Senior Solutions Architect at AWS, passionate about Generative AI, Serverless and Database technologies. He is a former Senior Principal Engineer at a major financial services firm and has led large-scale cloud migrations and modernization efforts.

    Andy Hall is a Senior Solutions Architect with AWS and is focused on helping Financial Services customers with their digital transformation to AWS. Andy has helped companies to architect, migrate, and modernize large-scale applications to AWS. Over the past 30 years, Andy has led efforts around Software Development, System Architecture, Data Processing, and Development Workflows for large enterprises.

    Vikas Shah is a Solutions Architect at Amazon Web Services who specializes in document intelligence and AI-powered solutions. A technology enthusiast, he combines his expertise in document processing, intelligent search, and generative AI to help enterprises modernize their operations. His innovative approach to solving complex business challenges spans across document management, robotics, and emerging technologies.


