Close Menu

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    What's Hot

    How AI could speed the development of RNA vaccines and other RNA therapies | MIT News

    August 15, 2025

    2 easyJet planes clip wings at Manchester Airport in UK

    August 15, 2025

    Trump and Putin’s changing relationship to take center stage in Alaska

    August 15, 2025
    Facebook X (Twitter) Instagram
    • Demos
    • Buy Now
    Facebook X (Twitter) Instagram YouTube
    14 Trends14 Trends
    Demo
    • Home
    • Features
      • View All On Demos
    • Buy Now
    14 Trends14 Trends
    Home » NVIDIA Releases Open Dataset, Models for Multilingual Speech AI
    AI Trends

    NVIDIA Releases Open Dataset, Models for Multilingual Speech AI

    adminBy adminAugust 15, 2025No Comments4 Mins Read0 Views
    Facebook Twitter Pinterest LinkedIn Telegram Tumblr Email
    Share
    Facebook Twitter LinkedIn Pinterest Email



    Of around 7,000 languages in the world, a tiny fraction are supported by AI language models. NVIDIA is tackling the problem with a new dataset and models that support the development of high-quality speech recognition and translation AI for 25 European languages — including languages with limited available data like Croatian, Estonian and Maltese.

    These tools will enable developers to more easily scale AI applications to support global users with fast, accurate speech technology for production-scale use cases such as multilingual chatbots, customer service voice agents and near-real-time translation services. They include:

    • Granary, a massive, open-source corpus of multilingual speech datasets that contains around a million hours of audio, including nearly 650,000 hours for speech recognition and over 350,000 hours for speech translation.
    • NVIDIA Canary-1b-v2, a billion-parameter model trained on Granary for high-quality transcription of European languages, plus translation between English and two dozen supported languages.
    • NVIDIA Parakeet-tdt-0.6b-v3, a streamlined, 600-million-parameter model designed for real-time or large-volume transcription of Granary’s supported languages.

    The paper behind Granary will be presented at Interspeech, a language processing conference taking place in the Netherlands, Aug. 17-21. The dataset, as well as the new Canary and Parakeet models, are now available on Hugging Face.

    How Granary Addresses Data Scarcity

    To develop the Granary dataset, the NVIDIA speech AI team collaborated with researchers from Carnegie Mellon University and Fondazione Bruno Kessler. The team passed unlabeled audio through an innovative processing pipeline powered by NVIDIA NeMo Speech Data Processor toolkit that turned it into structured, high-quality data.

    This pipeline allowed the researchers to enhance public speech data into a usable format for AI training, without the need for resource-intensive human annotation. It’s available in open source on GitHub.

    With Granary’s clean, ready-to-use data, developers can get a head start building models that tackle transcription and translation tasks in nearly all of the European Union’s 24 official languages, plus Russian and Ukrainian.

    For European languages underrepresented in human-annotated datasets, Granary provides a critical resource to develop more inclusive speech technologies that better reflect the linguistic diversity of the continent — all while using less training data.

    The team demonstrated in their Interspeech paper that, compared to other popular datasets, it takes around half as much Granary training data to achieve a target accuracy level for automatic speech recognition (ASR) and automatic speech translation (AST).

    Tapping NVIDIA NeMo to Turbocharge Transcription

    The new Canary and Parakeet models offer examples of the kinds of models developers can build with Granary, customized to their target applications. Canary-1b-v2 is optimized for accuracy on complex tasks, while parakeet-tdt-0.6b-v3 is designed for high-speed, low-latency tasks.

    By sharing the methodology behind the Granary dataset and these two models, NVIDIA is enabling the global speech AI developer community to adapt this data processing workflow to other ASR or AST models or additional languages, accelerating speech AI innovation.

    Canary-1b-v2, available under a permissive license, expands the Canary family’s supported languages from four to 25. It offers transcription and translation quality comparable to models 3x larger while running inference up to 10x faster.

    NVIDIA NeMo, a modular software suite for managing the AI agent lifecycle, accelerated speech AI model development. NeMo Curator, part of the software suite, enabled the team to filter out synthetic examples from the source data so that only high-quality samples were used for model training. The team also harnessed the NeMo Speech Data Processor toolkit for tasks like aligning transcripts with audio files and converting data into the required formats.

    Parakeet-tdt-0.6b-v3 prioritizes high throughput and is capable of transcribing 24-minute audio segments in a single inference pass. The model automatically detects the input audio language and transcribes without additional prompting steps.

    Both Canary and Parakeet models provide accurate punctuation, capitalization and word-level timestamps in their outputs.

    Read more on GitHub and get started with Granary on Hugging Face.



    Source link

    Share. Facebook Twitter Pinterest LinkedIn Tumblr Email
    admin
    • Website

    Related Posts

    Applications Now Open for $60,000 NVIDIA Graduate Fellowship Awards

    August 13, 2025

    Applications Now Open for $60,000 NVIDIA Graduate Fellowship Awards

    August 13, 2025

    FLUX.1 Kontext NIM Microservice Now Available

    August 13, 2025

    Amazon Devices & Services Achieves Major Step Toward Zero-Touch Manufacturing With NVIDIA AI and Digital Twins

    August 11, 2025

    NVIDIA Blackwell Architecture Powers AI Acceleration in Compact Workstations

    August 11, 2025

    CrowdStrike, Uber, Zoom Among Industry Pioneers Building Smarter Agents With NVIDIA Nemotron and Cosmos Reasoning Models for Enterprise and Physical AI Applications

    August 11, 2025
    Leave A Reply Cancel Reply

    Demo
    Top Posts

    ChatGPT’s viral Studio Ghibli-style images highlight AI copyright concerns

    March 28, 20254 Views

    Best Cyber Forensics Software in 2025: Top Tools for Windows Forensics and Beyond

    February 28, 20253 Views

    An ex-politician faces at least 20 years in prison in killing of Las Vegas reporter

    October 16, 20243 Views

    Laws, norms, and ethics for AI in health

    May 1, 20252 Views
    Don't Miss

    How AI could speed the development of RNA vaccines and other RNA therapies | MIT News

    August 15, 2025

    Using artificial intelligence, MIT researchers have come up with a new way to design nanoparticles…

    2 easyJet planes clip wings at Manchester Airport in UK

    August 15, 2025

    Trump and Putin’s changing relationship to take center stage in Alaska

    August 15, 2025

    NVIDIA Releases Open Dataset, Models for Multilingual Speech AI

    August 15, 2025
    Stay In Touch
    • Facebook
    • Twitter
    • Pinterest
    • Instagram
    • YouTube
    • Vimeo

    Subscribe to Updates

    Get the latest creative news from SmartMag about art & design.

    Demo
    Top Posts

    ChatGPT’s viral Studio Ghibli-style images highlight AI copyright concerns

    March 28, 20254 Views

    Best Cyber Forensics Software in 2025: Top Tools for Windows Forensics and Beyond

    February 28, 20253 Views

    An ex-politician faces at least 20 years in prison in killing of Las Vegas reporter

    October 16, 20243 Views
    Stay In Touch
    • Facebook
    • YouTube
    • TikTok
    • WhatsApp
    • Twitter
    • Instagram
    Latest Reviews
    Demo
    About Us
    About Us

    Your source for the lifestyle news. This demo is crafted specifically to exhibit the use of the theme as a lifestyle site. Visit our main page for more demos.

    We're accepting new partnerships right now.

    Email Us: info@example.com
    Contact: +1-320-0123-451

    Facebook X (Twitter) Pinterest YouTube WhatsApp
    Our Picks

    How AI could speed the development of RNA vaccines and other RNA therapies | MIT News

    August 15, 2025

    2 easyJet planes clip wings at Manchester Airport in UK

    August 15, 2025

    Trump and Putin’s changing relationship to take center stage in Alaska

    August 15, 2025
    Most Popular

    ChatGPT’s viral Studio Ghibli-style images highlight AI copyright concerns

    March 28, 20254 Views

    Best Cyber Forensics Software in 2025: Top Tools for Windows Forensics and Beyond

    February 28, 20253 Views

    An ex-politician faces at least 20 years in prison in killing of Las Vegas reporter

    October 16, 20243 Views

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    14 Trends
    Facebook X (Twitter) Instagram Pinterest YouTube Dribbble
    • Home
    • Buy Now
    © 2025 ThemeSphere. Designed by ThemeSphere.

    Type above and press Enter to search. Press Esc to cancel.