    Research Focus: Week of October 28, 2024

    By admin | November 1, 2024 | 7 min read


    Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.

    Research Focus: October 28, 2024

    NEW RESEARCH

    FLASH: A Workflow Automation Agent for Diagnosing Recurring Incidents

    Cloud incidents such as unplanned interruptions or performance degradation can reduce customer satisfaction and revenue. Recurring incidents, typically raised by system monitors, allow for timely resolution, but also demand significant human effort for troubleshooting. Automating the diagnosis of recurring incidents would help minimize service downtime, reduce customer impact, and decrease manual labor.

    In a recent paper, FLASH: A Workflow Automation Agent for Diagnosing Recurring Incidents, researchers from Microsoft present an approach that significantly improves diagnostic accuracy. LLM-based agents have proven effective at complex tasks that require multiple logical steps, but they remain unreliable for diagnosis because they lack specific diagnostic knowledge. FLASH incorporates status supervision to break complex instructions into manageable pieces aligned with the identified status. The researchers also use LLMs to generate hindsight from past failures, progressively improving diagnostic reliability for subsequent incidents. An extensive study of more than 250 production incidents at Microsoft, across five workflow automation scenarios, shows that FLASH outperforms state-of-the-art agent models by an average of 13.2% in accuracy. This underscores the viability of automating diagnosis for recurring incidents.


    NEW RESEARCH

    METAREFLECTION: Learning Instructions for Language Agents using Past Reflections

    Language agents are AI systems that can understand, reason and respond in natural language to complete various tasks. While the latest LLMs are capable enough to power reasonably good language agents, the closed-API model makes it hard to improve them when they perform sub-optimally. Recent studies have explored using techniques like self-reflection and prompt optimization to improve performance. Unfortunately, self-reflection can be used only during the agent’s current run, while contemporary prompt optimization techniques are designed and tested to work on simple single-step agents.

    In a recent paper, METAREFLECTION: Learning Instructions for Language Agents using Past Reflections, researchers from Microsoft introduce a novel offline reinforcement learning technique that improves language agents by augmenting a semantic memory with experiential learnings from past trials. They demonstrate the efficacy of METAREFLECTION across multiple domains and agent designs, including complex logical reasoning, biomedical semantic similarity, open-world question answering, and vulnerability threat detection in Infrastructure-as-Code. METAREFLECTION boosts language agents' performance by 4% to 16.82% over the baseline agent implementations and performs on par with existing state-of-the-art prompt optimization techniques while requiring fewer LLM calls.
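    The idea of a semantic memory built offline from past trials can be illustrated with a toy sketch. This is not the paper's algorithm: the `SemanticMemory` class, the `learn_offline` function, the trial dictionary layout, and the `reflect` callback are all hypothetical names chosen for illustration, and a real system would use an LLM to produce and refine the reflections.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticMemory:
    """Hypothetical store of instructions distilled from past trials."""
    instructions: list = field(default_factory=list)

    def add(self, instruction: str) -> None:
        # Skip duplicates so the prompt prefix stays short.
        if instruction not in self.instructions:
            self.instructions.append(instruction)

    def as_prompt_prefix(self) -> str:
        # Render the memory as a bullet list to prepend to the agent prompt.
        return "\n".join(f"- {text}" for text in self.instructions)


def learn_offline(trials, reflect):
    """Build a memory from failed trials only (assumed trial layout)."""
    memory = SemanticMemory()
    for trial in trials:
        if not trial["success"]:
            # `reflect` stands in for an LLM call that turns a failure
            # into a reusable instruction.
            memory.add(reflect(trial))
    return memory


# Toy usage with a stand-in reflection function.
trials = [
    {"task": "q1", "success": True},
    {"task": "q2", "success": False},
    {"task": "q2", "success": False},
]
memory = learn_offline(trials, lambda t: f"Double-check task {t['task']}")
prefix = memory.as_prompt_prefix()
```

    Because the memory is built before deployment, the learned instructions carry over to later runs, unlike self-reflection that only applies within the current run.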

    Spotlight: Event Series

    Microsoft Research Forum

    Join us for a continuous exchange of ideas about research in the era of general AI. Watch the first four episodes on demand.



    NEW RESEARCH

    Domino: Eliminating Communication in LLM Training via Generic Tensor Slicing and Overlapping

    Generative AI applications rely on large foundation models, particularly LLMs. LLMs often have tens to hundreds of billions of parameters, making them too large for a single graphics processing unit (GPU) to handle in terms of both memory and computation. Training these models therefore requires distributing the workload across hundreds or even thousands of GPUs. This can lead to significant communication overhead, which arises whenever data must be shared between GPUs.

    In a recent paper, Domino: Eliminating Communication in LLM Training via Generic Tensor Slicing and Overlapping, researchers from Microsoft introduce a system designed to make LLM training more efficient by reducing the time lost to communication between GPUs.

    Domino breaks down data dependencies in a single batch of training into smaller, independent pieces. These smaller pieces are processed in parallel, and communication between GPUs happens simultaneously with computation, minimizing delays. 

    Test results comparing Domino to Megatron-LM show that Domino speeds up the training process by up to 1.3x on Nvidia DGX-H100 GPUs. 
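    The overlap of communication and computation described above can be sketched in plain Python. This is only an illustration of the scheduling pattern, not Domino's implementation: `compute` and `communicate` are hypothetical stand-ins for a per-chunk layer computation and a GPU collective, and a background thread plays the role of the communication stream.

```python
from concurrent.futures import ThreadPoolExecutor


def compute(chunk):
    # Hypothetical stand-in for the computation on one slice of the batch.
    return [x * 2 for x in chunk]


def communicate(partial):
    # Hypothetical stand-in for a collective such as an all-reduce.
    return sum(partial)


def split(batch, num_chunks):
    # Slice the batch into roughly equal, independent pieces.
    size = max(1, (len(batch) + num_chunks - 1) // num_chunks)
    return [batch[i:i + size] for i in range(0, len(batch), size)]


def pipelined(batch, num_chunks):
    results = []
    pending = None  # communication of the previous chunk, if any
    with ThreadPoolExecutor(max_workers=1) as comm:
        for chunk in split(batch, num_chunks):
            # Compute this chunk while the previous chunk's
            # communication is still in flight on the other thread.
            partial = compute(chunk)
            if pending is not None:
                results.append(pending.result())
            pending = comm.submit(communicate, partial)
        if pending is not None:
            results.append(pending.result())
    return results


reduced = pipelined(list(range(8)), num_chunks=4)
```

    Without slicing, the single communication step could start only after the whole batch had been computed; with slicing, each piece's communication is hidden behind the next piece's computation.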


    NEW RESEARCH

    Improving Steering and Verification in AI-Assisted Data Analysis with Interactive Task Decomposition

    Data science involves large datasets, source code, domain expertise, and unwritten assumptions. Data scientists describe the need to “have a conversation” with their data to extract information from it. The natural language processing and code generation capabilities of large language models (LLMs) could help tackle the challenging task of data analysis, which requires expertise in data processing, programming, and statistics. AI chat interfaces for data analysis have grown in popularity. However, in a recent paper, Improving Steering and Verification in AI-Assisted Data Analysis with Interactive Task Decomposition, researchers from Microsoft and the University of Toronto identify serious challenges in verifying AI-generated results and in guiding AI systems to produce the desired output.

    The researchers developed two contrasting approaches to address these challenges. The first (Stepwise) decomposes the problem into step-by-step subgoals with pairs of editable assumptions and code until task completion. The second approach (Phasewise) decomposes the entire problem into three editable, logical phases: structured input/output assumptions, execution plan, and code. A controlled, within-subjects experiment compared these systems against a conversational baseline. Users reported significantly greater control with the Stepwise and Phasewise systems, and found intervention, correction, and verification easier, compared to the baseline. The results suggest design guidelines and trade-offs for AI-assisted data analysis tools. 


    NEW RESEARCH

    OmniParser for pure vision-based GUI agent

    Large vision-language models (VLMs) such as GPT-4V and GPT-4o show promise in driving intelligent agent systems that operate within user interfaces (UI). However, VLMs' full potential remains underexplored in real-world applications, particularly when it comes to acting as general agents across diverse operating systems and applications with only vision input. One limiting factor is the absence of a robust screen parsing technique that can 1) reliably identify interactable icons within the user interface, and 2) understand the semantics of various elements in a screenshot and accurately associate the intended action with the corresponding region on the screen.

    In a recent article, OmniParser for pure vision-based GUI agent, researchers from Microsoft present a compact screen parsing module that can convert UI screenshots into structured elements. OmniParser can be used with a variety of models to create agents capable of taking actions on UIs. When used with GPT-4V, OmniParser significantly improves the agent's ability to generate precisely grounded actions for interface regions.

    OmniParser with a GPT-4V agent achieved the best performance on the recently released WindowsAgentArena benchmark.
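
    To make "structured elements" and "grounded actions" concrete, here is a minimal sketch of how an agent might consume a parsed screen. The `UIElement` fields, the `click_point` and `ground_action` helpers, and the action dictionary are assumptions made for illustration; they are not OmniParser's actual output schema or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UIElement:
    """Hypothetical structured element produced by a screen parser."""
    element_id: int
    description: str
    box: tuple  # (left, top, right, bottom) in pixels
    interactable: bool


def click_point(element):
    # Ground the element to a pixel location: the center of its box.
    left, top, right, bottom = element.box
    return ((left + right) // 2, (top + bottom) // 2)


def ground_action(elements, chosen_id):
    # Map the element id chosen by a model to a concrete click action.
    for element in elements:
        if element.element_id == chosen_id and element.interactable:
            return {"action": "click", "at": click_point(element)}
    raise ValueError(f"no interactable element with id {chosen_id}")


screen = [
    UIElement(0, "window title", (0, 0, 800, 30), False),
    UIElement(1, "Save button", (10, 20, 110, 60), True),
]
action = ground_action(screen, chosen_id=1)
```

    In this arrangement the model only has to pick an element id from the parsed list, and the pixel coordinates come from the parser's bounding box rather than from the model.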

    Microsoft Research in the news


    AI Dreams: Microsoft @ 50, Chapter 1 

    GeekWire | October 16, 2024

    Since the early 1990s, the promise of AI has been a driving force at Microsoft Research, which has a track record of breakthroughs in speech recognition, computer vision, machine learning, and other research that continues to advance the state of the art in AI. 


    Podcast: What’s next for AI, with Peter Lee 

    GeekWire | October 19, 2024

    The weekly GeekWire Podcast features comments from Microsoft Research President Peter Lee on what’s next in AI, including the top three technical challenges. It’s a bonus feature that came from the AI Dreams: Microsoft @ 50 series. Peter’s comments begin at 29:20.


    AI-powered productivity tools that can make life harder 

    Financial Times | October 22, 2024

    Technology used to summarize notes or generate transcripts does not always work for deaf employees. The problem is compounded by a historic lack of input from disabled people into AI products, even some that are marketed as assistive technologies.


    Prompts are Programs 

    ACM SIGPLAN Blog | October 22, 2024

    The challenges and effective strategies for creating robust prompts are not well understood and will evolve as rapidly as the underlying LLM models and systems evolve. The programming languages and software engineering communities must be agile and eager to bring the decades of research and experience building languages and tools for robust software development to this new and important domain.


    Edge 440: Interested in AI Evaluation? Meet Microsoft’s EUREKA 

    TheSequence | October 17, 2024

    This podcast explores EUREKA, a reusable, open evaluation framework designed to standardize evaluations of large foundation models (LFMs). The framework goes beyond single-score reporting and rankings to offer a more comprehensive analysis of LFM capabilities.

