    Microsoft Research Forum Episode 4: The future of multimodal models, a new “small” language model, and other AI updates

    By admin | October 7, 2024


    Microsoft Research Forum is a continuous exchange of ideas about science and technology research in the era of general AI. In the latest episode, researchers discussed the latest multimodal AI models, advanced benchmarks for AI evaluation and model self-improvement, and an entirely new kind of computer for AI inference and hard optimization. Researchers at Microsoft are working to explore breakthrough technology that can help advance everything from weather prediction to materials design.

    Below is a brief recap of the event, including select quotes from the presentations. Register to join future Research Forum episodes and view previous sessions. Transcripts and additional resources can be found in the Research Forum briefing book.

    Keynote

    Phi-3-Vision: A highly capable and “small” language vision model

    Research Forum | Episode 4 Keynote | Jianfeng Gao

    Jianfeng Gao introduced Phi-3-Vision, an advanced and economical open-source multimodal model. As a member of the Phi-3 model family, Phi-3-Vision enhances language models by integrating multisensory skills, seamlessly combining language and vision capabilities.

    “Phi-3-Vision is the first multimodal model in the Phi small model family. It matches and sometimes exceeds some of the capabilities of much larger models … at a much lower cost. And to help everyone build more affordable and accessible AI systems, we have released the model weights into the open-source community.”

    — Jianfeng Gao, Distinguished Scientist and Vice President, Microsoft Research Redmond


    Panel Discussion

    Beyond language: The future of multimodal models in healthcare, gaming, and AI

    Research Forum | Episode 4 Panel | John Langford, Hoifung Poon, Katja Hofmann, Jianwei Yang

    This discussion examined the transformative potential and core challenges of multimodal models across various domains, including precision health, game intelligence, and foundation models. Microsoft researchers John Langford, Hoifung Poon, Katja Hofmann, and Jianwei Yang shared their thoughts on future directions, bridging gaps, and fostering synergies within the field. 

    “One of the really cutting-edge treatments for cancer these days is immunotherapy. That works by mobilizing the immune system to fight the cancer. And then one of the blockbuster drugs is a KEYTRUDA, that really can work miracles for some of the late-stage cancers … Unfortunately, only 20 to 30 percent of the patients actually respond. So that’s … a marquee example of what are the growth opportunity in precision health.”
    — Hoifung Poon, General Manager, Microsoft Research Health Futures

    “We experience the world through vision, touch, and all our other senses before we start to make sense of any of the language that is spoken around us. So, it’s really, really interesting to think through the implications of that, and potentially, as we start to understand more about the different modalities that we can model and the different ways in which we combine them.”
    — Katja Hofmann, Senior Principal Researcher, Microsoft Research

    “To really have a capable multimodal model, we need to encode different information from different modalities, for example, from vision, from language, from even audio, speech, etc. We need to develop a very capable encoder for each of these domains and then … tokenize each of these raw data.”
    — Jianwei Yang, Principal Researcher, Microsoft Research Redmond


    Lightning Talks

    Analog optical computing for sustainable AI and beyond

    Research Forum | Episode 4 Talk 1 | Francesca Parmigiani and Jiaqi Chu

    This talk presented a new kind of computer—an analog optical computer—that has the potential to accelerate AI inference and hard optimization workloads by 100x, leveraging hardware-software co-design to improve the efficiency and sustainability of real-world applications. 

    “Most likely, you or your loved ones have been inside an MRI scan — not really a great place to be in. Imagine if you can reduce that amount of time from 20 to 40 minutes to less than five minutes.”
    — Francesca Parmigiani, Principal Researcher, Microsoft Research Cambridge 

    “I’m really excited to share that we have just completed the second generation of [this] computer. It is much smaller in physical size, and this is a world first in that exactly the same computer is simultaneously solving hard optimization problems and accelerating machine learning inference. Looking ahead, we estimate that at scale, this computer can achieve around 450 tera operations per second per watt, which is a 100-times improvement as compared to state-of-the-art GPUs.”
    — Jiaqi Chu, Principal Researcher, Microsoft Research Cambridge
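
The two figures quoted above (about 450 tera operations per second per watt, described as a 100-times improvement) imply a GPU baseline that the talk does not state directly. The short sketch below works through that arithmetic. The 900 tera-operation workload in the usage note and the helper `energy_joules` are hypothetical, chosen only for illustration.

```python
# Figures quoted in the talk.
projected_tops_per_watt = 450.0  # estimated efficiency of the optical computer at scale
improvement_factor = 100.0       # stated gain over state-of-the-art GPUs

# GPU efficiency implied by dividing one quoted figure by the other.
# This baseline is derived here, not stated in the talk.
implied_gpu_tops_per_watt = projected_tops_per_watt / improvement_factor


def energy_joules(tera_ops: float, tops_per_watt: float) -> float:
    """Energy in joules to run a workload of `tera_ops` tera-operations.

    One TOPS/W is one tera-operation per joule, so energy = work / efficiency.
    """
    return tera_ops / tops_per_watt
```

Under these assumptions, a hypothetical workload of 900 tera-operations would need `energy_joules(900.0, projected_tops_per_watt)` joules on the projected optical computer versus `energy_joules(900.0, implied_gpu_tops_per_watt)` joules on the implied GPU baseline, a ratio equal to the quoted factor of 100.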


    Direct Nash Optimization: Teaching language models to self-improve with general preferences

    Research Forum | Episode 4 Talk 2 | Corby Rosset

    This talk explored teaching language models to self-improve using AI preference feedback, challenging the model to play against itself and a powerful teacher until it arrives at a Nash equilibrium, resulting in state-of-the-art win rates against GPT-4 Turbo on benchmarks such as AlpacaEval and MT-Bench. 

    “The traditional way to fine-tune an LLM for post-training … basically tells the model to emulate good behaviors, but it does not target or correct any mistakes or bad behaviors that it makes explicitly. … Self-improving post-training explicitly identifies and tries to correct bad behaviors or mistakes that the model makes.”
    — Corby Rosset, Senior Researcher, Microsoft Research AI Frontiers
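
As a rough illustration of learning from preference feedback, the sketch below ranks a pool of candidate responses (which could include the model's own samples and a teacher's answer) by their average win rate under a pairwise preference judge, then picks the strongest and weakest as a training pair. This is a toy sketch under stated assumptions, not the procedure from the talk: `win_rates`, `preference_pair`, and `toy_prefer` are hypothetical names, and the length-based judge merely stands in for an AI preference model.

```python
from typing import Callable, List, Tuple

# A judge returns the probability that response `a` is preferred over `b`.
Judge = Callable[[str, str], float]


def win_rates(candidates: List[str], prefer: Judge) -> List[float]:
    """Average probability that each candidate beats every other candidate."""
    rates = []
    for i, a in enumerate(candidates):
        scores = [prefer(a, b) for j, b in enumerate(candidates) if j != i]
        rates.append(sum(scores) / len(scores))
    return rates


def preference_pair(candidates: List[str], prefer: Judge) -> Tuple[str, str]:
    """Return (preferred, dispreferred): the highest and lowest win-rate candidates."""
    if len(candidates) < 2:
        raise ValueError("need at least two candidates to form a pair")
    rates = win_rates(candidates, prefer)
    best = max(range(len(candidates)), key=rates.__getitem__)
    worst = min(range(len(candidates)), key=rates.__getitem__)
    return candidates[best], candidates[worst]


def toy_prefer(a: str, b: str) -> float:
    """Hypothetical judge for demonstration only: prefers the longer response."""
    if len(a) == len(b):
        return 0.5
    return 1.0 if len(a) > len(b) else 0.0


pool = ["ok", "a fuller answer", "mid one"]
chosen, rejected = preference_pair(pool, toy_prefer)
```

In a real pipeline, a pair like `(chosen, rejected)` would feed a contrastive fine-tuning step, and the sample-judge-update loop would repeat. Both the training step and the stopping criterion are outside the scope of this sketch.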


    Project Aurora: The first large-scale foundation model of the atmosphere

    Research Forum | Episode 4 Talk 3 | Megan Stanley

    This talk presented Aurora, a cutting-edge foundation model that offers a new approach to weather forecasting that could transform our ability to predict and mitigate the impacts of extreme events, air pollution, and the changing climate.

    “If we look at Aurora’s ability to predict pollutants such as nitrogen dioxide that are strongly related to emissions from human activity, we can see that the model has learned to make these predictions with no emissions data provided. It’s learned the implicit patterns that cause the gas concentrations, which is very impressive.”
    — Megan Stanley, Senior Researcher, Microsoft Research AI for Science


    A generative model of biology for in-silico experimentation and discovery

    Research Forum | Episode 4 Talk 4 | Kevin Yang

    This talk explored how deep learning enables generation of novel and useful biomolecules, allowing researchers and practitioners to better understand biology. This includes EvoDiff, a general-purpose diffusion framework that combines evolutionary-scale data with the distinct conditioning capabilities of diffusion models to generate new proteins, given a protein sequence.

    “Often, protein engineers want proteins that perform a similar function to a natural protein, or they want to produce a protein that performs the same function but has other desirable properties, such as stability. By conditioning EvoDiff with a family of related sequences, we can generate new proteins that are very different in sequence space to the natural proteins but are predicted to fold into similar three-dimensional structures. These may be good starting points for finding new functions or for discovering versions of a protein with desirable properties.”
    — Kevin Yang, Senior Researcher, Microsoft Research New England


    Fostering appropriate reliance on AI

    Research Forum | Episode 4 Talk 5 | Mihaela Vorvoreanu

    Because AI systems are probabilistic, they can make mistakes. One of the main challenges in human-AI interaction is to avoid overreliance on AI and to empower people to decide when to accept an AI system’s recommendation and when to reject it. This talk explored Microsoft’s work in this area.

    “This is where I think it is our responsibility as people working in UX disciplines—as people researching UX and human-computer interaction—to really, really step up to the front and see how it is our moment to shine and to address this problem.”
    — Mihaela Vorvoreanu, Director UX Research and Responsible AI Education, Microsoft AI Ethics and Effects in Engineering and Research (Aether)
