
    Empowering air quality research with secure, ML-driven predictive analytics

By admin | August 31, 2025


    Air pollution remains one of Africa’s most pressing environmental health crises, causing widespread illness across the continent. Organizations like sensors.AFRICA have deployed hundreds of air quality sensors to address this challenge, but face a critical data problem: significant gaps in PM2.5 (particulate matter with diameter less than or equal to 2.5 micrometers) measurement records because of power instability and connectivity issues in high-risk regions where physical maintenance is limited. Missing data in PM2.5 datasets reduces statistical power and introduces bias into parameter estimates, leading to unreliable trend detection and flawed conclusions about air quality patterns. These data gaps ultimately compromise evidence-based decision-making for pollution control strategies, health impact assessments, and regulatory compliance.

In this post, we demonstrate the time-series forecasting capability of Amazon SageMaker Canvas, a low-code no-code (LCNC) machine learning (ML) platform, to predict PM2.5 from incomplete datasets. PM2.5 exposure contributes to millions of premature deaths globally through cardiovascular disease, respiratory illness, and systemic health effects, making accurate air quality forecasting a critical public health tool. A key advantage of the forecasting capability of SageMaker Canvas is its robust handling of incomplete data. Traditional air quality monitoring systems often require complete datasets to function properly, meaning they can’t be relied on when sensors malfunction or require maintenance. In contrast, SageMaker Canvas can generate reliable predictions even when faced with gaps in sensor data. This resilience enables continuous operation of air quality monitoring networks despite inevitable sensor failures or maintenance periods, eliminating costly downtime and data gaps. Environmental agencies and public health officials benefit from uninterrupted access to critical air quality information, enabling timely pollution alerts and more comprehensive long-term analysis of air quality trends. By maintaining operational continuity even with imperfect data inputs, SageMaker Canvas significantly enhances the reliability and practical utility of environmental monitoring systems.

In this post, we provide a data imputation solution using Amazon SageMaker AI, AWS Lambda, and AWS Step Functions. This solution is designed for environmental analysts, public health officials, and business intelligence professionals who need reliable PM2.5 data for trend analysis, reporting, and decision-making. We sourced our sample training dataset from openAFRICA. Our solution predicts PM2.5 values using time-series forecasting. The sample training dataset contained over 15 million records from March 2022 to October 2022 in various parts of Kenya and Nigeria—data coming from 23 sensor devices at 15 unique locations. The sample code and workflows can be adapted to create prediction models for your PM2.5 datasets. See our solution’s README for detailed instructions.

    Solution overview

    The solution consists of two main ML components: a training workflow and an inference workflow. These workflows are built using the following services:

    • SageMaker Canvas is used to prepare data and train the prediction model through its no-code interface
    • SageMaker AI batch transform is used for inference, processing the dataset in bulk to generate predictions
    • Step Functions orchestrates the inference process by coordinating the workflow between data retrieval, batch transforming, and database updates, managing workflow state transitions, and making sure that data flows properly through each processing stage
    • Lambda functions perform critical operations at each workflow step: retrieving sensor data from the database in the required format, transforming data for model input, sending batches to SageMaker for inference, and updating the database with prediction results after processing is complete

    At a high level, the solution works by taking a set of PM2.5 data with gaps and predicts the missing values within the range of plus or minus 4.875 micrograms per cubic meter of the actual PM2.5 concentration. It does this by first training a model on the data using inputs for the specific schema and a historical set of values from the user to guide the training process, which is completed with SageMaker Canvas. After the model is trained on a representative dataset and schema, SageMaker Canvas exports the model for use with batch processing. The Step Functions orchestration calls a Lambda function every 24 hours that takes a dataset of new sensor data that has gaps and initiates a SageMaker batch transform job to predict the missing values. The batch transform job processes the entire dataset at once, and the Lambda function then updates the existing dataset with the results. The new completed dataset with predicted values can now be distributed to public health decision-makers who need complete datasets to effectively analyze the patterns of PM2.5 data.
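The batch transform step described above can be sketched with the AWS SDK for Python. The following is a minimal sketch, not the solution's actual code: the job name, model name, S3 URIs, and instance type are placeholder assumptions you would replace with the values from your own deployment. The helper builds the request that a Lambda function could pass to `create_transform_job`.

```python
def build_transform_job_request(model_name: str, input_s3_uri: str, output_s3_uri: str) -> dict:
    """Build a create_transform_job request for a line-delimited CSV batch transform.

    Every name and URI here is a placeholder -- substitute the values produced
    by your own CDK deployment and registered SageMaker Canvas model.
    """
    return {
        "TransformJobName": f"{model_name}-pm25-gap-fill",
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_s3_uri}
            },
            "ContentType": "text/csv",
            "SplitType": "Line",  # each CSV row becomes one inference record
        },
        "TransformOutput": {"S3OutputPath": output_s3_uri, "AssembleWith": "Line"},
        "TransformResources": {"InstanceType": "ml.m5.large", "InstanceCount": 1},
    }

# Inside the Lambda handler you would submit the job, for example:
# import boto3
# sagemaker = boto3.client("sagemaker")
# sagemaker.create_transform_job(**build_transform_job_request(
#     "canvas-pm25-model", "s3://my-bucket/gaps.csv", "s3://my-bucket/predictions/"))
```

SageMaker then writes the prediction results to the configured S3 output path, where a follow-up Lambda function can pick them up.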

    We dive into each of these steps in later sections of this post.

    Solution walkthrough

    The following diagram shows our solution architecture:

    Architecture diagram of the solution

    Let’s explore the architecture step by step:

    1. To systematically collect, identify, and fill PM2.5 data gaps caused by sensor limitations and connectivity issues, Amazon EventBridge Scheduler invokes a Step Functions state machine every 24 hours. Step Functions orchestrates the calling of various Lambda functions to perform different steps, so you don’t have to handle the complexities of error handling, retries, and state management, providing a serverless workflow that seamlessly coordinates the PM2.5 data imputation process.
    2. The State Machine invokes a Lambda function in your Amazon Virtual Private Cloud (Amazon VPC) that retrieves records containing missing air quality values from the user’s air quality database on Amazon Aurora PostgreSQL-Compatible Edition and stores the records in a CSV file in an Amazon Simple Storage Service (Amazon S3) bucket.
    3. The State Machine then runs a Lambda function that retrieves the records from Amazon S3 and initiates the SageMaker batch transform job in your VPC using your SageMaker model created from your SageMaker Canvas predictive model trained on historical PM2.5 data.
    4. To streamline the batch transform workflow, this solution uses an event-driven approach with EventBridge and Step Functions. EventBridge captures completion events from SageMaker batch transform jobs, while the task token functionality of Step Functions enables extended waiting periods beyond the time limits of Lambda. After processing completes, SageMaker writes the prediction results directly to an S3 bucket.
    5. The final step in the state machine retrieves the predicted values from the S3 bucket and then updates the database in Aurora PostgreSQL-Compatible with the values including a predicted label set to true.
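Step 2 above has a Lambda function serialize the gap records to CSV before uploading them to S3. The following sketch shows one way that serialization could look; the column names are illustrative assumptions, so match them to the fixed schema described in the solution's README.

```python
import csv
import io


def rows_to_csv(rows: list[dict]) -> str:
    """Serialize gap records into the CSV layout the batch transform job reads.

    The column names below are illustrative placeholders; align them with the
    schema in the solution's README before use.
    """
    fieldnames = ["device_id", "timestamp", "value"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# In the Lambda function you would then upload the CSV to the S3 bucket, e.g.:
# import boto3
# boto3.client("s3").put_object(Bucket="my-bucket", Key="gaps.csv",
#                               Body=rows_to_csv(missing_rows))
```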

    Prerequisites

    To implement the PM2.5 data imputation solution, you must have the following:

    1. An AWS account with AWS Identity and Access Management (IAM) permissions sufficient to deploy the solution and interact with the database.
    2. The following AWS services:
    3. A local desktop set up with AWS Command Line Interface (AWS CLI) version 2, Python 3.10, AWS Cloud Development Kit (AWS CDK) v2.x, and Git version 2.x.
    4. The AWS CLI set up with the necessary credentials in the desired AWS Region.
    5. Historical air quality sensor data. Note that our solution requires a fixed schema described in the GitHub repo’s README.

    Deploy the solution

    You will run the following steps to complete the deployment:

    1. Prepare your environment by building Python modules locally for Lambda layers, deploying infrastructure using the AWS CDK, and initializing your Aurora PostgreSQL database with sensor data.
    2. Perform steps in the Build your air quality prediction model section to configure a SageMaker Canvas application, followed by training and registering your model in Amazon SageMaker Model Registry.
    3. Create a SageMaker model from your registered SageMaker Canvas model by updating the infrastructure using the AWS CDK.
    4. Manage future configuration changes using the AWS CDK.

    Step 1: Deploy AWS infrastructure and upload air quality sensor data

    Complete the following steps to deploy the PM2.5 data imputation solution AWS infrastructure and upload air quality sensor data to Aurora PostgreSQL-Compatible:

    1. Clone the repository to your local desktop environment using the following command:

    git clone git@github.com:aws-samples/sample-empowering-air-quality-research-secure-machine-learning-predictive-analytics.git

    2. Change to the project directory:

    cd

    3. Follow the deployment steps in the README file up to Model Setup for Batch Transform Inference.

    Step 2: Build your air quality prediction model

    After you create the SageMaker AI domain and the SageMaker AI user profile as part of the CDK deployment steps, follow these steps to build your air quality prediction model:

    Configure your SageMaker Canvas application

    1. On the AWS Management Console, go to the SageMaker AI console and, under Admin configurations, Domains, select the domain and the user profile that were created.
    2. Choose the App Configurations tab, scroll down to the Canvas section, and select Edit.
    3. In Canvas storage configuration, select Encryption and choose aws/s3 from the dropdown.
    4. In the ML Ops Configuration, turn on the option to Enable Model Registry registration permissions for this user profile.
      • Optionally, in the Local file upload configuration section in your domain’s Canvas App Configuration, you can turn on Enable local file upload.
    5. Choose Submit to save your configuration choices.
    6. In your Amazon SageMaker AI home page, go to the Applications and IDEs section and select Canvas.
    7. Select the SageMaker AI user profile that was created for you by the CDK deployment and choose Open Canvas.
    8. In a new tab, SageMaker Canvas will start creating your application. This takes a few minutes.

    Create and register your prediction model

    In this phase, you develop a prediction model using your historical air quality sensor data.

    Machine learning model training workflow

    The preceding architecture diagram illustrates the end-to-end process for training the SageMaker Canvas prediction model, registering that model, and creating a SageMaker model for running inference on newly found PM2.5 data gaps. The training process starts by extracting the air quality sensor dataset from the database. The dataset is imported into SageMaker Canvas for predictive analysis. This training dataset is then transformed and prepared through data wrangling steps implemented in SageMaker Canvas for building and training ML models.

    Prepare data

    Our solution supports a SageMaker Canvas model trained for a single-target variable prediction based on historical data and performs corresponding data imputation for PM2.5 data gaps. To train your model for predictive analysis, follow the comprehensive End to End Machine Learning workflow in the AWS Canvas Immersion Day workshop, adapting each step to prepare your air quality sensor dataset. Begin with the standard workflow until you reach the data preparation section. Here, you can make several customizations:

    • Filter dataset for single-target value prediction: Your air quality dataset might contain multiple sensor parameters. For single-target value prediction using this solution, filter the dataset to include only PM2.5 measurements.
    • Clean sensor data: Remove records containing sensor fault values. For example, we filtered out values that equal 65535, because 65535 is a common error code for malfunctioning sensors. Adjust this filtering based on the specific error codes your air quality monitoring equipment produces.
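The two filtering steps above can be expressed as a small cleaning routine. This is a sketch under assumptions: the field names (`value_type`, `value`) and the `P2` PM2.5 type code are placeholders to adapt to your sensor schema, and 65535 stands in for whatever fault codes your equipment produces.

```python
def clean_pm25_records(records: list[dict], fault_value: float = 65535) -> list[dict]:
    """Keep only PM2.5 rows and drop known sensor fault codes.

    The field names ('value_type', 'value'), the 'P2' PM2.5 type code, and the
    65535 fault value are illustrative assumptions -- adapt them to the schema
    and error codes of your own air quality monitoring equipment.
    """
    return [
        r for r in records
        if r.get("value_type") == "P2" and float(r["value"]) != fault_value
    ]
```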

    The following image shows our data wrangling data flow implemented using the preceding guidance:

    Data Wrangler data flow

    • Review generated insights and remove irrelevant data: Review the insights and analyses that SageMaker Canvas generates, evaluating them in terms of time-series forecasting and geospatial-temporal air quality patterns and the relationships between columns of impact. See the chosen columns of impact in the GitHub repo for guidance. Analyze your dataset to identify the rows and columns that influence the prediction, and remove data that can reduce prediction accuracy.

    The following image shows the data wrangling analyses obtained by following the preceding guidance:

    Data Wrangler analyses from the feature engineering report

    Training your prediction model

    After completing your data preparation, proceed to the Train the Model section of the workshop and continue with these specifications:

    1. Select problem type: Select Predictive Analysis as your ML approach. Because our dataset is tabular and contains a timestamp, a target column that has values we’re using to forecast future values, and a device ID column, SageMaker Canvas will choose time series forecasting.
    2. Define target column: Set Value as your target column for predicting PM2.5 values.
    3. Build configuration: Use the Standard Build option for model training because it generally yields higher accuracy. See What happens when you build a model in How custom models work for more information.

    By following these steps, you can create a model optimized for PM2.5 dataset predictive analysis, capable of generating valuable insights. Note that SageMaker Canvas supports retraining the ML model for updated PM2.5 datasets.

    Evaluate the model

    After training your model, proceed to Evaluate the model and review the column impact, root mean square error (RMSE) score, and other advanced metrics to understand your model’s performance in generating predictions for PM2.5.
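SageMaker Canvas reports these metrics for you, but it can help to know what they measure. The following is a minimal, from-scratch sketch of the two headline metrics used in this post, RMSE and R-squared, for a list of actual and predicted PM2.5 values:

```python
import math


def rmse(actual: list[float], predicted: list[float]) -> float:
    """Root mean square error between actual and predicted PM2.5 values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r_squared(actual: list[float], predicted: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot.

    A value near 1 means the model explains most of the variance in PM2.5.
    """
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot
```

For instance, a perfect prediction gives an RMSE of 0 and an R-squared of 1; the closer your model's scores get to those bounds, the tighter its PM2.5 estimates.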

    The following image shows our model evaluation statistics achieved.

    A graph showing predicted and actual value correlation

    Add the model to the registry

    Once you are satisfied with your model performance, follow the steps in Register a model version to the SageMaker AI model registry. Make sure to change the approval status to Approved before continuing to run this solution. At the time of this post’s publication, the approval must be updated in Amazon SageMaker Studio.

    Log out of SageMaker Canvas

    After completing your work in SageMaker Canvas, you can log out or configure your application to automatically terminate the workspace instance. A workspace instance is dedicated for your use every time you launch a Canvas application, and you are billed for as long as the instance runs. Logging out or terminating the workspace instance stops the workspace instance billing. For more information, see billing and cost in SageMaker Canvas.

    Step 3: Create a SageMaker model using your registered SageMaker Canvas model

    In the previous steps, you created a SageMaker domain and user profile through CDK deployment (Step 1) and successfully registered your model (Step 2). Now, it’s time to create the SageMaker model in your VPC using the SageMaker Canvas model you registered. Follow the Model Setup for Batch Inference and Re-Deploy with Updated Configuration sections in the code README to create the SageMaker model.

    Step 4: Manage future configuration changes

    The same deployment pattern applies to any future configuration modifications you might require, including:

    • Batch transform instance type optimizations
    • Transform job scheduling changes

    Update the relevant parameters in your configuration and run cdk deploy to propagate these changes throughout your solution architecture.

    For a comprehensive list of configurable parameters and their default values, see the configuration file in the repository.

    Run cdk deploy again to update your infrastructure stack with your model ID for batch transform operations, replacing the placeholder value initially deployed. This infrastructure-as-code approach helps ensure consistent, version-controlled updates to your data imputation workflow.

    Security best practices

    Security and compliance are a shared responsibility between AWS and the customer, as outlined in the Shared Responsibility Model. We encourage you to review this model for a comprehensive understanding of the respective responsibilities.

    In this solution, we enhanced security by implementing encryption at rest for Amazon S3, Aurora PostgreSQL-Compatible database, and the SageMaker Canvas application. We also enabled encryption in transit by requiring SSL/TLS for all connections from the Lambda functions. We implemented secure database access by providing temporary dynamic credentials through IAM authentication for Amazon RDS, eliminating the need for static passwords. Each Lambda function operates with least privilege access, receiving only the minimal permissions required for its specific function. Finally, we deployed the Lambda functions, Aurora PostgreSQL-Compatible instance, and SageMaker Batch Transform jobs in private subnets of the VPC that do not traverse the public internet. This private network architecture is enabled through VPC endpoints for Amazon S3, SageMaker AI, and AWS Secrets Manager.
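The IAM database authentication mentioned above replaces a stored password with a short-lived token. The sketch below illustrates the pattern; the host, user, and database names are placeholders, and the token call is shown in a comment because it requires live AWS credentials.

```python
def iam_auth_connect_kwargs(host: str, user: str, token: str, db: str = "postgres") -> dict:
    """Connection arguments for a psycopg2-style client using an IAM auth token
    in place of a static password, with TLS required.

    The host, user, and database names here are placeholders for the values
    from your own Aurora PostgreSQL-Compatible deployment.
    """
    return {
        "host": host,
        "port": 5432,
        "user": user,
        "password": token,    # the short-lived IAM token, not a stored secret
        "dbname": db,
        "sslmode": "require",  # encryption in transit is mandatory for IAM auth
    }


# The token itself comes from the RDS API, for example:
# import boto3
# token = boto3.client("rds").generate_db_auth_token(
#     DBHostname=host, Port=5432, DBUsername=user)
```

Because the token expires after a short window, each Lambda invocation requests a fresh one, eliminating long-lived database credentials entirely.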

    Results

    As shown in the following image, our model, developed using SageMaker Canvas, predicts PM2.5 values with an R-squared of 0.921. Because ML models for PM2.5 prediction frequently achieve R-squared values between 0.80 and 0.98 (see this example from ScienceDirect), our solution is within the range of higher-performing PM2.5 prediction models available today. SageMaker Canvas delivers this performance through its no-code experience, automatically handling model training and optimization without requiring ML expertise from users.

    Machine learning model performance results

    Clean up

    Complete the following steps to clean up your resources:

    1. SageMaker Canvas application cleanup:
      1. Go to the SageMaker AI console and, under Admin configurations, Domains, select the domain that was created.
      2. Select the user created under User Profiles for that domain.
      3. On the User Details page, navigate to Spaces and Apps, and choose Delete to manually delete your SageMaker AI canvas application and clean up resources.
    2. SageMaker Domain EFS storage cleanup:
      1. Open Amazon EFS and, under File systems, delete the file system tagged as ManagedByAmazonSageMakerResource.
      2. Open VPC and under Security, navigate to Security groups.
      3. On Security groups, select security-group-for-inbound-nfs- and delete all Inbound rules associated with that group.
      4. On Security groups, select security-group-for-outbound-nfs- and delete all associated Outbound rules.
      5. Finally, delete both the security groups: security-group-for-inbound-nfs- and security-group-for-outbound-nfs-.
    3. Use the AWS CDK to clean up the remaining AWS resources:
      1. After the preceding steps are complete, return to your local desktop environment where the GitHub repo was cloned, and change to the project’s infra directory:
        cd /infra
      2. Destroy the resources created with AWS CloudFormation using the AWS CDK:
        cdk destroy
      3. Monitor the AWS CDK process deleting resources created by the solution. If there are any errors, troubleshoot using the CloudFormation console and then retry deletion.

    Conclusion

    The development of accurate PM2.5 prediction models has traditionally required extensive technical expertise, presenting significant challenges for public health researchers studying air pollution’s impact on disease outcomes. From data preprocessing and feature engineering to model selection and hyperparameter tuning, these technical requirements diverted substantial time and effort away from researchers’ core work of analyzing health outcomes and developing evidence-based interventions.

    SageMaker Canvas transforms this landscape by dramatically reducing the effort required to develop high-performing PM2.5 prediction models. Public health researchers can now generate accurate predictions without mastering complex ML algorithms, iterate quickly through an intuitive interface, and validate models across regions without manual hyperparameter tuning. With this shift to streamlined, accessible prediction capabilities, researchers can dedicate more time to interpreting results, understanding air pollution’s impact on community health, and developing protective interventions for vulnerable populations. The result is more efficient research that responds quickly to emerging air quality challenges and informs timely public health decisions.

    We invite you to implement this solution for your air quality research or ML-based predictive analytics projects. Our comprehensive deployment steps and customization guidance will help you launch quickly and efficiently. As we continue enhancing this solution, your feedback is invaluable for improving its capabilities and maximizing its impact.


    About the authors

    Nehal Sangoi is a Senior Technical Account Manager at Amazon Web Services. She provides strategic technical guidance to help independent software vendors plan and build solutions using AWS best practices. Connect with Nehal on LinkedIn.

    Ben Peterson is a Senior Technical Account Manager with AWS. He is passionate about enhancing the developer experience and driving customer success. In his role, he provides strategic guidance on using the comprehensive AWS suite of services to modernize legacy systems, optimize performance, and unlock new capabilities. Connect with Ben on LinkedIn.

    Shashank Shrivastava is a Senior Delivery Consultant and Serverless TFC member at AWS. He is passionate about helping customers and developers build modern applications on serverless architecture. As a pragmatic developer and blogger, he promotes community-driven learning and sharing of technology. His interests are software architecture, developer tools, GenAI, and serverless computing. Connect with Shashank on LinkedIn.

    Akshay Singhal is a Senior Technical Account Manager at Amazon Web Services supporting Enterprise Support customers focusing on the Security ISV segment. He provides technical guidance for customers to implement AWS solutions, with expertise spanning serverless architectures and cost optimization. Outside of work, Akshay enjoys traveling, Formula 1, making short movies, and exploring new cuisines. Connect with Akshay on LinkedIn.


