
    AWS and DXC collaborate to deliver customizable, near real-time voice-to-voice translation capabilities for Amazon Connect

    February 22, 2025


    Providing effective multilingual customer support in global businesses presents significant operational challenges. Through a collaboration between AWS and DXC Technology, we’ve developed a scalable voice-to-voice (V2V) translation prototype that transforms how contact centers handle multilingual customer interactions.

    In this post, we discuss how AWS and DXC used Amazon Connect and other AWS AI services to deliver near real-time V2V translation capabilities.

    Challenge: Serving customers in multiple languages

    In Q3 2024, DXC Technology approached AWS with a critical business challenge: their global contact centers needed to serve customers in multiple languages without the exponential cost of hiring language-specific agents for lower-volume languages. DXC had previously explored several existing alternatives, but found limitations in each approach, ranging from communication constraints to infrastructure requirements that impacted reliability, scalability, and operational costs. DXC and AWS decided to organize a focused hackathon where DXC and AWS Solutions Architects collaborated to:

    • Define essential requirements for real-time translation
    • Establish latency and accuracy benchmarks
    • Create seamless integration paths with existing systems
    • Develop a phased implementation strategy
    • Prepare and test an initial proof of concept setup

    Business impact

    For DXC, the prototype serves as an enabler, helping the company maximize technical talent, transform operations, and reduce costs through:

    • Best technical expertise delivery – Hiring and matching agents based on technical knowledge rather than spoken language, making sure customers get top technical support regardless of language barriers
    • Global operational flexibility – Removing geographical and language constraints in hiring, placement, and support delivery while maintaining consistent service quality across all languages
    • Cost reduction – Eliminating multi-language expertise premiums, specialized language training, and infrastructure costs through a pay-per-use translation model
    • Similar experience to native speakers – Maintaining natural conversation flow with near real-time translation and audio feedback, while delivering premium technical support in the customer’s preferred language

    Solution overview

    The Amazon Connect V2V translation prototype uses AWS advanced speech recognition and machine translation technologies to enable real-time conversation translation between agents and customers, allowing them to speak in their preferred languages while having natural conversations. It consists of the following key components:

    • Speech recognition – The customer’s spoken language is captured and converted into text using Amazon Transcribe, which serves as the speech recognition engine. The transcript (text) is then fed into the machine translation engine.
    • Machine translation – Amazon Translate, the machine translation engine, translates the customer’s transcript into the agent’s preferred language in near real time. The translated transcript is converted back into speech using Amazon Polly, which serves as the text-to-speech engine.
    • Bidirectional translation – The process is reversed for the agent’s response, translating their speech into the customer’s language and delivering the translated audio to the customer.
    • Seamless integration – The V2V translation sample project integrates with Amazon Connect, enabling agents to handle customer interactions in multiple languages without additional effort or training, using the Amazon Connect Streams JS and Amazon Connect RTC JS libraries (a minimal embedding sketch follows this list)
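    To make that integration concrete, the following is a minimal sketch of embedding the Contact Control Panel with Amazon Connect Streams JS; the instance URL and container id are placeholders, and this is not the sample project’s actual code.

```typescript
// Hedged sketch: embed the Amazon Connect Contact Control Panel (CCP) so the
// agent web application can attach translation add-ons to the softphone session.
import "amazon-connect-streams";

const container = document.getElementById("ccp-container"); // placeholder element id
if (container) {
  connect.core.initCCP(container, {
    ccpUrl: "https://example-instance.my.connect.aws/ccp-v2/", // placeholder instance URL
    loginPopup: true, // open the Amazon Connect login in a popup window
    region: "us-east-1",
    softphone: { allowFramedSoftphone: true }, // keep softphone audio in the embedded CCP
  });
}
```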

    The prototype can be extended with other AWS AI services to further customize the translation capabilities. It’s open source and ready for customization to meet your specific needs.
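    As a rough, hedged illustration of the translate-then-synthesize half of that pipeline, the sketch below chains Amazon Translate and Amazon Polly with the AWS SDK for JavaScript v3, assuming the customer’s utterance has already been transcribed (for example, by Amazon Transcribe streaming). The function name, Region, and voice are illustrative and not taken from the sample project.

```typescript
// Hedged sketch: translate a transcribed utterance and synthesize the result.
import { TranslateClient, TranslateTextCommand } from "@aws-sdk/client-translate";
import { PollyClient, SynthesizeSpeechCommand } from "@aws-sdk/client-polly";

const translate = new TranslateClient({ region: "us-east-1" });
const polly = new PollyClient({ region: "us-east-1" });

async function translateAndSynthesize(
  transcript: string,     // text produced by the speech recognition step
  sourceLanguage: string, // e.g. "es" for a Spanish-speaking customer
  targetLanguage: string  // e.g. "en" for an English-speaking agent
): Promise<Uint8Array | undefined> {
  // 1. Translate the transcript into the listener's preferred language.
  const { TranslatedText } = await translate.send(
    new TranslateTextCommand({
      Text: transcript,
      SourceLanguageCode: sourceLanguage,
      TargetLanguageCode: targetLanguage,
    })
  );

  // 2. Convert the translated text back into speech with Amazon Polly.
  const { AudioStream } = await polly.send(
    new SynthesizeSpeechCommand({
      Text: TranslatedText ?? "",
      OutputFormat: "mp3",
      VoiceId: "Joanna", // assumed voice; pick one matching the target language
    })
  );

  // The resulting audio can then be played back to the listening party.
  return AudioStream ? await AudioStream.transformToByteArray() : undefined;
}
```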

    The following diagram illustrates the solution architecture.

    The following screenshot illustrates a sample agent web application.

    The user interface consists of three sections:

    • Contact Control Panel – A softphone client using Amazon Connect
    • Customer Controls – Customer-to-agent interaction controls, including Transcribe Customer Voice, Translate Customer Voice, and Synthesize Customer Voice
    • Agent Controls – Agent-to-customer interaction controls, including Transcribe Agent Voice, Translate Agent Voice, and Synthesize Agent Voice

    Challenges when implementing near real-time voice translation

    The Amazon Connect V2V sample project was designed to minimize the audio processing time from the moment the customer or agent finishes speaking until the translated audio stream starts. However, even with the shortest audio processing time, the user experience still doesn’t match that of a real conversation in which both parties speak the same language. This is because of the specific pattern in which the customer only hears the agent’s translated speech, and the agent only hears the customer’s translated speech. The following diagram displays that pattern.

    The example workflow consists of the following steps:

    1. The customer starts speaking in their own language, and speaks for 10 seconds.
    2. Because the agent only hears the customer’s translated speech, the agent first hears 10 seconds of silence.
    3. When the customer finishes speaking, the audio processing takes 1–2 seconds, during which time both the customer and agent hear silence.
    4. The customer’s translated speech is streamed to the agent. During that time, the customer hears silence.
    5. When the customer’s translated speech playback is complete, the agent starts speaking, and speaks for 10 seconds.
    6. Because the customer only hears the agent’s translated speech, the customer hears 10 seconds of silence.
    7. When the agent finishes speaking, the audio processing time takes 1–2 seconds, during which time both the customer and agent hear silence.
    8. The agent’s translated speech is streamed to the customer. During that time, the agent hears silence.

    In this scenario, the customer hears a single block of 22–24 seconds of complete silence, from the moment they finish speaking until they hear the agent’s translated voice: 1–2 seconds of audio processing, roughly 10 seconds while their translated speech plays to the agent, 10 seconds of the agent’s reply, and another 1–2 seconds of audio processing. This creates a suboptimal experience, because the customer can’t be certain what is happening during these 22–24 seconds, for instance whether the agent could hear them or whether there was a technical issue.

    Audio streaming add-ons

    In a face-to-face conversation between two people who don’t speak the same language, a third person often acts as a translator or interpreter. An example workflow consists of the following steps:

    1. Person A speaks in their own language, which is heard by Person B and the translator.
    2. The translator translates what Person A said to Person B’s language. The translation is heard by Person B and Person A.

    Essentially, Person A and Person B hear each other speaking their own language, and they also hear the translation (from the translator). There’s no waiting in silence, which is even more important in non-face-to-face conversations (such as contact center interactions).

    To optimize the customer/agent experience, the Amazon Connect V2V sample project implements audio streaming add-ons to simulate a more natural conversation experience. The following diagram illustrates an example workflow.

    The workflow consists of the following steps:

    1. The customer starts speaking in their own language, and speaks for 10 seconds.
    2. The agent hears the customer’s original voice, at a lower volume (“Stream Customer Mic to Agent” enabled).
    3. When the customer finishes speaking, the audio processing time takes 1–2 seconds. During that time, the customer and agent hear subtle audio feedback—contact center background noise—at a very low volume (“Audio Feedback” enabled).
    4. The customer’s translated speech is then streamed to the agent. During that time, the customer hears their translated speech, at a lower volume (“Stream Customer Translation to Customer” enabled).
    5. When the customer’s translated speech playback is complete, the agent starts speaking, and speaks for 10 seconds.
    6. The customer hears the agent’s original voice, at a lower volume (“Stream Agent Mic to Customer” enabled).
    7. When the agent finishes speaking, the audio processing time takes 1–2 seconds. During that time, the customer and agent hear subtle audio feedback—contact center background noise—at a very low volume (“Audio Feedback” enabled).
    8. The agent’s translated speech is then streamed to the customer. During that time, the agent hears their translated speech, at a lower volume (“Stream Agent Translation to Agent” enabled).

    In this scenario, the customer hears two short blocks (1–2 seconds) of subtle audio feedback, instead of a single block of 22–24 seconds of complete silence. This pattern is much closer to a face-to-face conversation that includes a translator.
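    As a minimal browser-side sketch (not the sample project’s actual implementation), the following shows one way a “Stream Customer Mic to Agent” style add-on could mix the customer’s original audio into the agent’s playback at reduced volume, using the standard Web Audio API; the gain value and function name are assumptions.

```typescript
// Hedged sketch: attenuate the customer's original audio with a gain node so it
// can be mixed, at lower volume, with the translated speech played to the agent.
function mixOriginalVoiceAtLowerVolume(
  customerStream: MediaStream,
  originalVoiceGain = 0.25 // assumed "lower volume" level; tune to taste
): MediaStream {
  const audioContext = new AudioContext();

  // Route the customer's original microphone audio through a gain node.
  const source = audioContext.createMediaStreamSource(customerStream);
  const gain = audioContext.createGain();
  gain.gain.value = originalVoiceGain;

  // Collect the attenuated audio into a stream that can be combined with the
  // translated speech before playback to the agent.
  const destination = audioContext.createMediaStreamDestination();
  source.connect(gain);
  gain.connect(destination);

  return destination.stream;
}
```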

    The audio streaming add-ons provide additional benefits, including:

    • Voice characteristics – When the agent and customer only hear their translated and synthesized speech, the actual voice characteristics are lost. For instance, the agent can’t hear whether the customer was speaking slowly or quickly, or whether the customer sounded upset or calm; the translated and synthesized speech doesn’t carry over that information.
    • Quality assurance – When call recording is enabled, only the customer’s original voice and the agent’s synthesized speech are recorded, because the translation and synthesis are done on the agent (client) side. This makes it difficult for QA teams to properly evaluate and audit the conversations, including the many silent blocks within them. When the audio streaming add-ons are enabled instead, there are no silent blocks, and the QA team can hear the agent’s original voice, the customer’s original voice, and their respective translated and synthesized speech, all in a single audio file.
    • Transcription and translation accuracy – Having both the original and translated speech available in the call recording makes it straightforward to detect specific words that would improve transcription accuracy (using Amazon Transcribe custom vocabularies) or translation accuracy (using Amazon Translate custom terminologies), to make sure that your brand names, character names, model names, and other unique content are transcribed and translated as desired (see the sketch after this list).
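    As a hedged illustration of that point, the sketch below applies an Amazon Translate custom terminology during the translation step. It assumes a terminology named brand-terms has already been created (for example, with the ImportTerminology API); the name is illustrative. On the transcription side, Amazon Transcribe streaming similarly accepts a VocabularyName parameter to bias recognition toward domain-specific terms.

```typescript
// Hedged sketch: applying a custom terminology so that brand and product names
// translate consistently. The terminology name "brand-terms" is an assumption.
import { TranslateClient, TranslateTextCommand } from "@aws-sdk/client-translate";

const translateClient = new TranslateClient({ region: "us-east-1" });

async function translateWithTerminology(text: string): Promise<string | undefined> {
  const { TranslatedText } = await translateClient.send(
    new TranslateTextCommand({
      Text: text,
      SourceLanguageCode: "en",
      TargetLanguageCode: "es",
      TerminologyNames: ["brand-terms"], // keeps unique names consistent in the output
    })
  );
  return TranslatedText;
}
```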

    Get started with Amazon Connect V2V

    Ready to transform your contact center’s communication? Our Amazon Connect V2V sample project is now available on GitHub. We invite you to explore, deploy, and experiment with this powerful prototype. You can use it as a foundation for developing innovative multilingual communication solutions in your own contact center, through the following key steps:

    1. Clone the GitHub repository.
    2. Test different configurations for audio streaming add-ons.
    3. Review the sample project’s limitations in the README.
    4. Develop your implementation strategy:
      1. Implement robust security and compliance controls that meet your organization’s standards.
      2. Collaborate with your customer experience team to define your specific use case requirements.
      3. Balance automation against the agent’s manual controls (for example, use an Amazon Connect contact flow to automatically set contact attributes for preferred languages and audio streaming add-ons, as sketched after this list).
      4. Use your preferred transcription, translation, and text-to-speech engines, based on specific language support requirements and business, legal, and regional preferences.
      5. Plan a phased rollout, starting with a pilot group, then iteratively optimize your transcription custom vocabularies and translation custom terminologies.
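    The following is a minimal, hedged sketch of that automation, reading contact attributes set by a contact flow through the Amazon Connect Streams JS library; the attribute names and the configuration helper are hypothetical and not part of the sample project.

```typescript
// Hedged sketch: read per-contact language preferences that a contact flow has
// stored as contact attributes, and use them to configure the translation
// pipeline automatically. Attribute names and the helper below are assumptions.
import "amazon-connect-streams";

// Placeholder for whatever wires up the transcribe/translate/synthesize pipeline.
function configureTranslationPipeline(config: {
  customerLanguage: string;
  agentLanguage: string;
}): void {
  console.log("Configuring translation for contact:", config);
}

connect.contact((contact) => {
  contact.onConnecting(() => {
    // getAttributes() returns a map of { name, value } pairs set in the contact flow.
    const attributes = contact.getAttributes();
    const customerLanguage = attributes["customer_language"]?.value ?? "en-US";
    const agentLanguage = attributes["agent_language"]?.value ?? "en-US";
    configureTranslationPipeline({ customerLanguage, agentLanguage });
  });
});
```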

    Conclusion

    The Amazon Connect V2V sample project demonstrates how Amazon Connect and advanced AWS AI services can break down language barriers, enhance operational flexibility, and reduce support costs. Get started now and revolutionize how your contact center communicates across language barriers!


    About the Authors

    Milos Cosic is a Principal Solutions Architect at AWS.

    EJ Ferrell is a Senior Solutions Architect at AWS.

    Adam El Tanbouli is a Technical Program Manager for Prototyping and Support Services at DXC Modern Workplace.


