
    Technique improves the reasoning capabilities of large language models | MIT News

    By admin | October 12, 2024



    Large language models like those that power ChatGPT have shown impressive performance on tasks like drafting legal briefs, analyzing the sentiment of customer reviews, or translating documents into different languages.

    These machine-learning models typically use only natural language to process information and answer queries, which can make it difficult for them to perform tasks that require numerical or symbolic reasoning.

    For instance, a large language model might be able to memorize and recite a list of recent U.S. presidents and their birthdays, but that same model could fail if asked the question “Which U.S. presidents elected after 1950 were born on a Wednesday?” (The answer is Jimmy Carter.)

    Researchers from MIT and elsewhere have proposed a new technique that enables large language models to solve natural language, math and data analysis, and symbolic reasoning tasks by generating programs.

    Their approach, called natural language embedded programs (NLEPs), involves prompting a language model to create and execute a Python program to solve a user’s query, and then output the solution as natural language.
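The generate-then-execute loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: no model is called, `generated_program` is a hand-written stand-in for text a model might return, and `run_program` is a hypothetical helper name. Running untrusted generated code with `exec` is unsafe outside a sandbox.

```python
import contextlib
import io

# Hypothetical stand-in for the program text a language model would return.
generated_program = '''
import math

value = math.factorial(17) // math.factorial(15)
print(f"17! divided by 15! equals {value}.")
'''


def run_program(source: str) -> str:
    """Execute program text and return whatever it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        # A fresh globals dict keeps the generated code away from our own names.
        exec(source, {"__name__": "__nlep__"})
    return buffer.getvalue().strip()


answer = run_program(generated_program)
print(answer)
```

The natural-language answer is simply the program's printed output, so the arithmetic is done by the Python interpreter rather than by the model.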

    They found that NLEPs enabled large language models to achieve higher accuracy on a wide range of reasoning tasks. The approach is also generalizable, which means one NLEP prompt can be reused for multiple tasks.

    NLEPs also improve transparency, since a user could check the program to see exactly how the model reasoned about the query and fix the program if the model gave a wrong answer.

    “We want AI to perform complex reasoning in a way that is transparent and trustworthy. There is still a long way to go, but we have shown that combining the capabilities of programming and natural language in large language models is a very good potential first step toward a future where people can fully understand and trust what is going on inside their AI model,” says Hongyin Luo PhD ’22, an MIT postdoc and co-lead author of a paper on NLEPs.

    Luo is joined on the paper by co-lead authors Tianhua Zhang, a graduate student at the Chinese University of Hong Kong, and Jiaxin Ge, an undergraduate at Peking University; Yoon Kim, an assistant professor in MIT’s Department of Electrical Engineering and Computer Science and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); senior author James Glass, senior research scientist and head of the Spoken Language Systems Group in CSAIL; and others. The research will be presented at the Annual Conference of the North American Chapter of the Association for Computational Linguistics.

    Problem-solving with programs

    Many popular large language models work by predicting the next word, or token, given some natural language input. While models like GPT-4 can be used to write programs, they embed those programs within natural language, which can lead to errors in the program reasoning or results.

    With NLEPs, the MIT researchers took the opposite approach. They prompt the model to generate a step-by-step program entirely in Python code, and then embed the necessary natural language inside the program.

    An NLEP is a problem-solving template with four steps. First, the model calls the necessary packages, or functions, it will need to solve the task. Step two involves importing natural language representations of the knowledge the task requires (like a list of U.S. presidents’ birthdays). For step three, the model implements a function that calculates the answer. And for the final step, the model outputs the result as a line of natural language with an automatic data visualization, if needed.
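The four steps can be sketched for the presidents question mentioned earlier. This is an illustrative program written by hand, not output from the researchers' system. The table of names, election years, and birth dates is typed in as an assumption and should be checked against a reliable source before reuse; `born_on` is a hypothetical function name.

```python
# Step 1: import the packages the task needs.
from datetime import date

# Step 2: embed the required knowledge as structured data.
# Each row is (name, year first elected president, birth date).
# Hand-entered for illustration; Gerald Ford is omitted because he was
# never elected president.
PRESIDENTS = [
    ("Harry S. Truman", 1948, date(1884, 5, 8)),
    ("Dwight D. Eisenhower", 1952, date(1890, 10, 14)),
    ("John F. Kennedy", 1960, date(1917, 5, 29)),
    ("Lyndon B. Johnson", 1964, date(1908, 8, 27)),
    ("Richard Nixon", 1968, date(1913, 1, 9)),
    ("Jimmy Carter", 1976, date(1924, 10, 1)),
    ("Ronald Reagan", 1980, date(1911, 2, 6)),
    ("George H. W. Bush", 1988, date(1924, 6, 12)),
    ("Bill Clinton", 1992, date(1946, 8, 19)),
    ("George W. Bush", 2000, date(1946, 7, 6)),
    ("Barack Obama", 2008, date(1961, 8, 4)),
    ("Donald Trump", 2016, date(1946, 6, 14)),
    ("Joe Biden", 2020, date(1942, 11, 20)),
]
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


# Step 3: implement a function that calculates the answer.
def born_on(weekday: str, elected_after: int) -> list[str]:
    target = WEEKDAYS.index(weekday)  # date.weekday() counts Monday as 0
    return [
        name
        for name, elected, born in PRESIDENTS
        if elected > elected_after and born.weekday() == target
    ]


# Step 4: output the result as a line of natural language.
names = born_on("Wednesday", 1950)
print("Presidents elected after 1950 and born on a Wednesday: "
      + ", ".join(names) + ".")
```

With the table above, the program names Jimmy Carter, in agreement with the answer given in the article. The weekday lookup is done by the `datetime` module, so the only thing to audit is the data table and the filter condition.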

    “It is like a digital calculator that always gives you the correct computation result as long as the program is correct,” Luo says.

    The user can easily investigate the program and fix any errors in the code directly rather than needing to rerun the entire model to troubleshoot.

    The approach also offers greater efficiency than some other methods. If a user has many similar questions, they can generate one core program and then replace certain variables without needing to run the model repeatedly.
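A minimal sketch of that reuse, assuming the core program has been written as a function whose inputs are ordinary variables. The function name `weekday_answer` and the sample dates are illustrative choices, not taken from the paper.

```python
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def weekday_answer(day: date) -> str:
    """Core program: answer 'what day of the week was this date?'."""
    name = WEEKDAYS[day.weekday()]  # Monday is index 0
    return f"{day.isoformat()} fell on a {name}."


# Similar questions only swap the variable; no model call is needed.
for question_date in [date(1924, 10, 1), date(2000, 1, 1)]:
    print(weekday_answer(question_date))
```

Once the program exists, each additional question costs one function call instead of another round of text generation.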

    To prompt the model to generate an NLEP, the researchers give it an overall instruction to write a Python program, two NLEP examples (one with math and one with natural language), and one test question.
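The shape of such a prompt can be sketched as plain string assembly. Everything below is a placeholder: the instruction wording, the two example programs, and the name `build_nlep_prompt` are assumptions made for illustration and are not the prompt used in the paper.

```python
# Placeholder instruction; the researchers' actual wording is not given here.
INSTRUCTION = (
    "Write a Python program that solves the question and prints "
    "the answer as a sentence."
)

# Hypothetical math example.
MATH_EXAMPLE = '''Question: What is the sum of the first 10 positive integers?
Program:
total = sum(range(1, 11))
print(f"The sum of the first 10 positive integers is {total}.")'''

# Hypothetical natural-language example.
LANGUAGE_EXAMPLE = '''Question: Which of these words are palindromes: level, python, radar?
Program:
words = ["level", "python", "radar"]
palindromes = [w for w in words if w == w[::-1]]
print("The palindromes are " + ", ".join(palindromes) + ".")'''


def build_nlep_prompt(question: str) -> str:
    """Join the instruction, two fixed examples, and one test question."""
    parts = [
        INSTRUCTION,
        MATH_EXAMPLE,
        LANGUAGE_EXAMPLE,
        f"Question: {question}\nProgram:",
    ]
    return "\n\n".join(parts)


prompt = build_nlep_prompt(
    "Which U.S. presidents elected after 1950 were born on a Wednesday?"
)
print(prompt)
```

Only the final question changes between tasks; the instruction and the two examples stay fixed.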

    “Usually, when people do this kind of few-shot prompting, they still have to design prompts for every task. We found that we can have one prompt for many tasks because it is not a prompt that teaches LLMs to solve one problem, but a prompt that teaches LLMs to solve many problems by writing a program,” says Luo.

    “Having language models reason with code unlocks many opportunities for tool use, output validation, more structured understanding into model’s capabilities and way of thinking, and more,” says Leonid Karlinsky, principal scientist at the MIT-IBM Watson AI Lab.

    “No magic here”

    NLEPs achieved greater than 90 percent accuracy when prompting GPT-4 to solve a range of symbolic reasoning tasks, like tracking shuffled objects or playing a game of 24, as well as instruction-following and text classification tasks. The researchers found that NLEPs even exhibited 30 percent greater accuracy than task-specific prompting methods. The method also showed improvements over open-source LLMs. 
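To make the symbolic tasks concrete, here is a hand-written sketch of the kind of program that settles a game of 24 (combine four numbers with +, -, * and / to reach 24) by exhaustive search. It is an assumption about what such a program could look like, not code from the paper, and `solve_24` is a hypothetical name.

```python
from fractions import Fraction
from itertools import combinations


def solve_24(numbers, target=24):
    """Return an expression using every number once that equals target, or None."""
    items = [(Fraction(n), str(n)) for n in numbers]
    return _search(items, Fraction(target))


def _search(items, target):
    # Base case: one value left, so check whether it hits the target.
    if len(items) == 1:
        value, expr = items[0]
        return expr if value == target else None
    # Pick any two values, combine them, and recurse on the smaller list.
    for i, j in combinations(range(len(items)), 2):
        (a, ea), (b, eb) = items[i], items[j]
        rest = [items[k] for k in range(len(items)) if k not in (i, j)]
        candidates = [
            (a + b, f"({ea} + {eb})"),
            (a - b, f"({ea} - {eb})"),
            (b - a, f"({eb} - {ea})"),
            (a * b, f"({ea} * {eb})"),
        ]
        # Exact fractions avoid rounding errors; skip division by zero.
        if b != 0:
            candidates.append((a / b, f"({ea} / {eb})"))
        if a != 0:
            candidates.append((b / a, f"({eb} / {ea})"))
        for value, expr in candidates:
            found = _search(rest + [(value, expr)], target)
            if found is not None:
                return found
    return None


print(solve_24([4, 7, 8, 8]))
print(solve_24([1, 1, 1, 1]))  # no solution exists, so this prints None
```

Because the search is exhaustive and uses exact arithmetic, a returned expression is guaranteed to evaluate to 24, which is the sense in which a correct program gives a correct result.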

    Along with boosting the accuracy of large language models, NLEPs could also improve data privacy. Since NLEP programs are run locally, sensitive user data do not need to be sent to a company like OpenAI or Google to be processed by a model.

    In addition, NLEPs can enable small language models to perform better without the need to retrain a model for a certain task, which can be a costly process.

    “There is no magic here. We do not have a more expensive or fancy language model. All we do is use program generation instead of natural language generation, and we can make it perform significantly better,” Luo says.

    However, an NLEP relies on the program generation capability of the model, so the technique does not work as well for smaller models which have been trained on limited datasets. In the future, the researchers plan to study methods that could make smaller language models generate more effective NLEPs. In addition, they want to investigate the impact of prompt variations on NLEPs to enhance the robustness of the model’s reasoning processes.

    This research was supported, in part, by the Center for Perceptual and Interactive Intelligence of Hong Kong. 


