    Map Earth’s vegetation in under 20 minutes with Amazon SageMaker

    October 21, 2024


    In today’s rapidly changing world, monitoring the health of our planet’s vegetation is more critical than ever. Vegetation plays a crucial role in maintaining ecological balance, providing sustenance, and acting as a carbon sink. Traditionally, monitoring vegetation health has been a daunting task. Methods such as field surveys and manual satellite data analysis are not only time-consuming, but also require significant resources and domain expertise. These cumbersome approaches often lead to delays in data collection and analysis, making it difficult to track and respond swiftly to environmental changes. Furthermore, the high costs associated with these methods limit their accessibility and frequency, hindering comprehensive and ongoing vegetation monitoring efforts at a planetary scale. In light of these challenges, we have developed an innovative solution to streamline and enhance the efficiency of vegetation monitoring on a global scale.

    Transitioning from the traditional, labor-intensive methods of monitoring vegetation health, Amazon SageMaker geospatial capabilities offer a streamlined, cost-effective solution. Amazon SageMaker supports geospatial machine learning (ML) capabilities, allowing data scientists and ML engineers to build, train, and deploy ML models using geospatial data. These geospatial capabilities open up a new world of possibilities for environmental monitoring. With SageMaker, users can access a wide array of geospatial datasets, efficiently process and enrich this data, and accelerate their development timelines. Tasks that previously took days or even weeks to accomplish can now be done in a fraction of the time.

    In this post, we demonstrate the power of SageMaker geospatial capabilities by mapping the world’s vegetation in under 20 minutes. This example not only highlights the efficiency of SageMaker, but also shows how geospatial ML can be used to monitor the environment for sustainability and conservation purposes.

    Identify areas of interest

    We begin by illustrating how SageMaker can be applied to analyze geospatial data at a global scale. To get started, we follow the steps outlined in Getting Started with Amazon SageMaker geospatial capabilities. We start with the specification of the geographical coordinates that define a bounding box covering the areas of interest. This bounding box acts as a filter to select only the relevant satellite images that cover the Earth’s land masses.

    import os
    import json
    import time
    import boto3
    import geopandas
    from shapely.geometry import Polygon
    import leafmap.foliumap as leafmap
    import sagemaker
    import sagemaker_geospatial_map
    
    session = boto3.Session()
    execution_role = sagemaker.get_execution_role()
    sg_client = session.client(service_name="sagemaker-geospatial")
    # global bounding box covering Earth's land masses (longitude, latitude pairs)
    coordinates = [
        [-179.034845, -55.973798],
        [179.371094, -55.973798],
        [179.371094, 83.780085],
        [-179.034845, 83.780085],
        [-179.034845, -55.973798]
    ]           
    polygon = Polygon(coordinates)
    world_gdf = geopandas.GeoDataFrame(index=[0], crs="epsg:4326", geometry=[polygon])
    m = leafmap.Map(center=[37, -119], zoom=4)
    m.add_basemap('Esri.WorldImagery')
    m.add_gdf(world_gdf, layer_name="AOI", style={"color": "red"})
    m

    Sentinel 2 coverage of Earth's land mass

    Data acquisition

    SageMaker geospatial capabilities provide access to a wide range of public geospatial datasets, including Sentinel-2, Landsat 8, Copernicus DEM, and NAIP. For our vegetation mapping project, we selected Sentinel-2 for its global coverage and update frequency. Sentinel-2 captures images of Earth’s land surface at a resolution of 10 meters every 5 days. For this example, we pick the first week of December 2023. To make sure we cover most of the visible Earth surface, we filter for images with less than 10% cloud coverage, so that our analysis is based on clear and reliable imagery.

    search_rdc_args = {
        "Arn": "arn:aws:sagemaker-geospatial:us-west-2:378778860802:raster-data-collection/public/nmqj48dcu3g7ayw8", # sentinel-2 L2A
        "RasterDataCollectionQuery": {
            "AreaOfInterest": {
                "AreaOfInterestGeometry": {
                    "PolygonGeometry": {
                        "Coordinates": [
                            [
                                [-179.034845, -55.973798],
                                [179.371094, -55.973798],
                                [179.371094, 83.780085],
                                [-179.034845, 83.780085],
                                [-179.034845, -55.973798]
                            ]
                        ]
                    }
                }
            },
            "TimeRangeFilter": {
                "StartTime": "2023-12-01T00:00:00Z",
                "EndTime": "2023-12-07T23:59:59Z",
            },
            "PropertyFilters": {
                "Properties": [{"Property": {"EoCloudCover": {"LowerBound": 0, "UpperBound": 10}}}],
                "LogicalOperator": "AND",
            },
        }
    }
    
    s2_items = []
    s2_tile_ids = []
    s2_geometries = {
        'id': [],
        'geometry': [],
    }
    while search_rdc_args.get("NextToken", True):
        search_result = sg_client.search_raster_data_collection(**search_rdc_args)
        for item in search_result["Items"]:
            s2_id = item['Id']
            s2_tile_id = s2_id.split('_')[1]
            # filter out tiles that cover the same area
            if s2_tile_id not in s2_tile_ids:
                s2_tile_ids.append(s2_tile_id)
                s2_geometries['id'].append(s2_id)
                s2_geometries['geometry'].append(Polygon(item['Geometry']['Coordinates'][0]))
                del item['DateTime']
                s2_items.append(item)  
    
        search_rdc_args["NextToken"] = search_result.get("NextToken")
    
    print(f"{len(s2_items)} unique Sentinel-2 images found.")

    By using the search_raster_data_collection function from SageMaker geospatial, we identified 8,581 unique Sentinel-2 images taken in the first week of December 2023. To validate the accuracy of our selection, we plotted the footprints of these images on a map, confirming that we had the correct images for our analysis.

    s2_gdf = geopandas.GeoDataFrame(s2_geometries)
    m = leafmap.Map(center=[37, -119], zoom=4)
    m.add_basemap('OpenStreetMap')
    m.add_gdf(s2_gdf, layer_name="Sentinel-2 Tiles", style={"color": "blue"})
    m

    Sentinel 2 image footprints

    SageMaker geospatial processing jobs

    When querying data with SageMaker geospatial capabilities, we received comprehensive details about our target images, including the data footprint, properties of the spectral bands, and hyperlinks for direct access. With these hyperlinks, we can bypass the traditional memory- and storage-intensive approach of first downloading and then processing images locally, a task made even more daunting by the size and scale of our dataset, which spans over 4 TB. Each of the more than 8,000 images has multiple channels and is approximately 500 MB in size. Processing multiple terabytes of data on a single machine would be time-prohibitive. Although setting up a processing cluster is an alternative, it introduces its own set of complexities, from data distribution to infrastructure management. SageMaker geospatial streamlines this with Amazon SageMaker Processing. We use the purpose-built geospatial container with SageMaker Processing jobs for a simplified, managed experience to create and run a cluster. With just a few lines of code, you can scale out your geospatial workloads with SageMaker Processing jobs: you specify a script that defines your workload, the location of your geospatial data on Amazon Simple Storage Service (Amazon S3), and the geospatial container. SageMaker Processing provisions the cluster resources for you to run city-, country-, or continent-scale geospatial ML workloads.

    For our project, we use 25 clusters, each comprising 20 instances, to scale out the geospatial workload. We divide the 8,581 images into 25 batches of approximately 340 images each, and the batches are distributed evenly across the machines in each cluster. All batch manifests are uploaded to Amazon S3, ready for the processing jobs, so each segment is processed swiftly and efficiently.

    def s2_item_to_relative_metadata_url(item):
        parts = item["Assets"]["visual"]["Href"].split("/")
        tile_prefix = parts[4:-1]
        return "{}/{}.json".format("/".join(tile_prefix), item["Id"])
    
    
    num_jobs = 25
    num_instances_per_job = 20 # maximum 20
    
    manifest_list = {}
    for idx in range(num_jobs):
        manifest = [{"prefix": "s3://sentinel-cogs/sentinel-s2-l2a-cogs/"}]
        manifest_list[idx] = manifest
    # split the manifest for N processing jobs
    for idx, item in enumerate(s2_items):
        job_idx = idx%num_jobs
        manifest_list[job_idx].append(s2_item_to_relative_metadata_url(item))
        
    # upload the manifest to S3
    sagemaker_session = sagemaker.Session()
    s3_bucket_name = sagemaker_session.default_bucket()
    s3_prefix = 'processing_job_demo'
    s3_client = boto3.client("s3")
    s3 = boto3.resource("s3")
    
    manifest_dir = "manifests"
    os.makedirs(manifest_dir, exist_ok=True)
    
    for job_idx, manifest in manifest_list.items():
        manifest_file = f"{manifest_dir}/manifest{job_idx}.json"
        s3_manifest_key = s3_prefix + "/" + manifest_file
        with open(manifest_file, "w") as f:
            json.dump(manifest, f)
    
        s3_client.upload_file(manifest_file, s3_bucket_name, s3_manifest_key)
        print("Uploaded {} to {}".format(manifest_file, s3_manifest_key))

    With our input data ready, we now turn to the core analysis that will reveal insights into vegetation health through the Normalized Difference Vegetation Index (NDVI). NDVI is calculated as the difference between near-infrared (NIR) and red reflectances, divided by their sum, yielding values that range from -1 to 1. Higher NDVI values signal dense, healthy vegetation, a value of zero indicates no vegetation, and negative values usually point to water bodies. This index serves as a critical tool for assessing vegetation health and distribution. The following is an example of what NDVI looks like.

    Sentinel 2 true color image and NDVI
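
    As a quick sanity check of the formula before scaling it out, the following is a minimal sketch on a few hypothetical reflectance values (using NumPy, which is not part of the processing script below):

    import numpy as np

    # hypothetical NIR and red reflectances for three pixels (not real Sentinel-2 values)
    nir = np.array([0.45, 0.30, 0.05])
    red = np.array([0.10, 0.15, 0.04])

    ndvi = (nir - red) / (nir + red)
    print(ndvi)  # approximately [0.64, 0.33, 0.11]; denser vegetation yields higher NDVI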

    %%writefile scripts/compute_vi.py
    
    import os
    import rioxarray
    import json
    import gc
    import warnings
    
    warnings.filterwarnings("ignore")
    
    if __name__ == "__main__":
        print("Starting processing")
    
        input_path = "/opt/ml/processing/input"
        output_path = "/opt/ml/processing/output"
        input_files = []
        items = []
        for current_path, sub_dirs, files in os.walk(input_path):
            for file in files:
                if file.endswith(".json"):
                    full_file_path = os.path.join(input_path, current_path, file)
                    input_files.append(full_file_path)
                    with open(full_file_path, "r") as f:
                        items.append(json.load(f))
    
        print("Received {} input files".format(len(input_files)))
    
        for item in items:
            print("Computing NDVI for {}".format(item["id"]))
            red_band_url = item["assets"]["red"]["href"]
            nir_band_url = item["assets"]["nir"]["href"]
            scl_mask_url = item["assets"]["scl"]["href"]
            red = rioxarray.open_rasterio(red_band_url, masked=True)
            nir = rioxarray.open_rasterio(nir_band_url, masked=True)
            scl = rioxarray.open_rasterio(scl_mask_url, masked=True)
            scl_interp = scl.interp(
                x=red["x"], y=red["y"]
            )  # interpolate SCL to the same resolution as Red and NIR bands
    
            # mask out cloudy pixels using SCL (https://sentinels.copernicus.eu/web/sentinel/technical-guides/sentinel-2-msi/level-2a/algorithm-overview)
            # class 8: cloud medium probability
            # class 9: cloud high probability
            # class 10: thin cirrus
            red_cloud_masked = red.where((scl_interp != 8) & (scl_interp != 9) & (scl_interp != 10))
            nir_cloud_masked = nir.where((scl_interp != 8) & (scl_interp != 9) & (scl_interp != 10))
    
            ndvi = (nir_cloud_masked - red_cloud_masked) / (nir_cloud_masked + red_cloud_masked)
            # save the ndvi as geotiff
            s2_tile_id = red_band_url.split("/")[-2]
            file_name = f"{s2_tile_id}_ndvi.tif"
            output_file_path = f"{output_path}/{file_name}"
            ndvi.rio.to_raster(output_file_path)
            print("Written output: {}".format(output_file_path))
    
            # keep memory usage low
            del red
            del nir
            del scl
            del scl_interp
            del red_cloud_masked
            del nir_cloud_masked
            del ndvi
    
            gc.collect()

    Now that we have the compute logic defined, we’re ready to start the geospatial SageMaker Processing job. This involves a straightforward three-step process: setting up the compute cluster, defining the computation specifics, and organizing the input and output details.

    First, to set up the cluster, we decide on the number and type of instances required for the job, making sure they’re well-suited for geospatial data processing. The compute environment itself is prepared by selecting a geospatial image that comes with all commonly used packages for processing geospatial data.

    Next, for the input, we use the previously created manifest that lists all image hyperlinks. We also designate an S3 location to save our results.

    With these elements configured, we’re able to initiate multiple processing jobs at once, allowing them to operate concurrently for efficiency.

    from multiprocessing import Process
    import sagemaker
    import boto3 
    from botocore.config import Config
    from sagemaker import get_execution_role
    from sagemaker.sklearn.processing import ScriptProcessor
    from sagemaker.processing import ProcessingInput, ProcessingOutput
    
    role = get_execution_role()
    geospatial_image_uri = '081189585635.dkr.ecr.us-west-2.amazonaws.com/sagemaker-geospatial-v1-0:latest'
    # use the retry behaviour of boto3 to avoid throttling issue
    sm_boto = boto3.client('sagemaker', config=Config(connect_timeout=5, read_timeout=60, retries={'max_attempts': 20}))
    sagemaker_session = sagemaker.Session(sagemaker_client = sm_boto)
    
    def run_job(job_idx):
        s3_manifest = f"s3://{s3_bucket_name}/{s3_prefix}/{manifest_dir}/manifest{job_idx}.json"
        s3_output = f"s3://{s3_bucket_name}/{s3_prefix}/output"
        script_processor = ScriptProcessor(
            command=['python3'],
            image_uri=geospatial_image_uri,
            role=role,
            instance_count=num_instances_per_job,
            instance_type="ml.m5.xlarge",
            base_job_name=f'ca-s2-ndvi-{job_idx}',
            sagemaker_session=sagemaker_session,
        )
    
        script_processor.run(
            code="scripts/compute_vi.py",
            inputs=[
                ProcessingInput(
                    source=s3_manifest,
                    destination='/opt/ml/processing/input/',
                    s3_data_type="ManifestFile",
                    s3_data_distribution_type="ShardedByS3Key"
                ),
            ],
            outputs=[
                ProcessingOutput(
                    source="/opt/ml/processing/output/",
                    destination=s3_output,
                    s3_upload_mode="Continuous"
                )
            ],
        )
        time.sleep(2)
    
    processes = []
    for idx in range(num_jobs):
        p = Process(target=run_job, args=(idx,))
        processes.append(p)
        p.start()
        
    for p in processes:
        p.join()

    After you launch the job, SageMaker automatically spins up the required instances and configures the cluster to process the images listed in your input manifest. This entire setup operates seamlessly, without needing your hands-on management. To monitor and manage the processing jobs, you can use the SageMaker console. It offers real-time updates on the status and completion of your processing tasks. In our example, it took under 20 minutes to process all 8,581 images with 500 instances. The scalability of SageMaker allows for faster processing times if needed, simply by increasing the number of instances.

    SageMaker Processing jobs in the console
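
    The console works well for interactive monitoring. If you prefer to poll job status programmatically, the following is a minimal sketch using boto3; the NameContains filter assumes the base_job_name prefix ('ca-s2-ndvi-') used in run_job above:

    import boto3

    sm = boto3.client("sagemaker")
    # list processing jobs whose names contain our prefix and print their status
    paginator = sm.get_paginator("list_processing_jobs")
    for page in paginator.paginate(NameContains="ca-s2-ndvi-"):
        for job in page["ProcessingJobSummaries"]:
            print(job["ProcessingJobName"], job["ProcessingJobStatus"])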

    Conclusion

    The power and efficiency of SageMaker geospatial capabilities have opened new doors for environmental monitoring, particularly in the realm of vegetation mapping. Through this example, we showcased how to process over 8,500 satellite images in less than 20 minutes. We not only demonstrated the technical feasibility, but also showcased the efficiency gains from using the cloud for environmental analysis. This approach illustrates a significant leap from traditional, resource-intensive methods to a more agile, scalable, and cost-effective one. The flexibility to scale processing resources up or down as needed, combined with the ease of accessing and analyzing vast datasets, positions SageMaker as a transformative tool in the field of geospatial analysis. By simplifying the complexities associated with large-scale data processing, SageMaker enables scientists, researchers, and business stakeholders to focus more on deriving insights and less on infrastructure and data management.

    As we look to the future, the integration of ML and geospatial analytics promises to further enhance our understanding of the planet’s ecological systems. The potential to monitor changes in real time, predict future trends, and respond with more informed decisions can significantly contribute to global conservation efforts. This example of vegetation mapping is just the beginning for running planetary-scale ML. See Amazon SageMaker geospatial capabilities to learn more.


    About the Authors

    Xiong Zhou is a Senior Applied Scientist at AWS. He leads the science team for Amazon SageMaker geospatial capabilities. His current area of research includes LLM evaluation and data generation. In his spare time, he enjoys running, playing basketball and spending time with his family.

    Anirudh Viswanathan is a Sr Product Manager, Technical – External Services with the SageMaker geospatial ML team. He holds a Masters in Robotics from Carnegie Mellon University, an MBA from the Wharton School of Business, and is named inventor on over 40 patents. He enjoys long-distance running, visiting art galleries and Broadway shows.

    Janosch Woschitz is a Senior Solutions Architect at AWS, specializing in AI/ML. With over 15 years of experience, he supports customers globally in leveraging AI and ML for innovative solutions and building ML platforms on AWS. His expertise spans machine learning, data engineering, and scalable distributed systems, augmented by a strong background in software engineering and industry expertise in domains such as autonomous driving.

    Li Erran Li is the applied science manager at human-in-the-loop services, AWS AI, Amazon. His research interests are 3D deep learning and vision and language representation learning. Previously he was a senior scientist at Alexa AI, the head of machine learning at Scale AI, and the chief scientist at Pony.ai. Before that, he was with the perception team at Uber ATG and the machine learning platform team at Uber, working on machine learning for autonomous driving, machine learning systems, and strategic initiatives of AI. He started his career at Bell Labs and was an adjunct professor at Columbia University. He co-taught tutorials at ICML’17 and ICCV’19, and co-organized several workshops at NeurIPS, ICML, CVPR, and ICCV on machine learning for autonomous driving, 3D vision and robotics, machine learning systems, and adversarial machine learning. He has a PhD in computer science from Cornell University. He is an ACM Fellow and IEEE Fellow.

    Amit Modi is the product leader for SageMaker MLOps, ML Governance, and Responsible AI at AWS. With over a decade of B2B experience, he builds scalable products and teams that drive innovation and deliver value to customers globally.

    Kris Efland is a visionary technology leader with a successful track record in driving product innovation and growth for over 20 years. Kris has helped create new products including consumer electronics and enterprise software across many industries, at both startups and large companies. In his current role at Amazon Web Services (AWS), Kris leads the Geospatial AI/ML category. He works at the forefront of Amazon’s fastest-growing ML service, Amazon SageMaker, which serves over 100,000 customers worldwide. He recently led the launch of Amazon SageMaker’s new geospatial capabilities, a powerful set of tools that allow data scientists and machine learning engineers to build, train, and deploy ML models using satellite imagery, maps, and location data. Before joining AWS, Kris was the Head of Autonomous Vehicle (AV) Tools and AV Maps for Lyft, where he led the company’s autonomous mapping efforts and toolchain used to build and operate Lyft’s fleet of autonomous vehicles. He also served as the Director of Engineering at HERE Technologies and Nokia and has co-founded several startups.


