Here, the prompt automatically generated from the sample content in Code Sample 2 identifies fourteen entity examples across six entity types (person, location, group, concept, field, and geography), along with eight relationship examples.
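To make the shape of these examples concrete, here is a minimal Python sketch that tallies entity and relationship records of the general form GraphRAG's extraction prompts emit. The delimiters, field order, and sample records are illustrative assumptions, not the actual contents of Code Sample 2.

```python
# Minimal sketch: tally GraphRAG-style tuple-delimited extraction records.
# The delimiters ("<|>" between fields, "##" between records) and the
# sample records below are assumptions for illustration only.
from collections import Counter

RECORD_DELIM = "##"
TUPLE_DELIM = "<|>"

raw = (
    '("entity"<|>ASHLEY LLORENS<|>PERSON<|>Host of the podcast)##'
    '("entity"<|>CHICAGO<|>LOCATION<|>City where the host grew up)##'
    '("relationship"<|>ASHLEY LLORENS<|>CHICAGO<|>Grew up in<|>8)'
)

entity_types = Counter()
relationship_count = 0

for record in raw.split(RECORD_DELIM):
    fields = record.strip().strip("()").split(TUPLE_DELIM)
    if fields[0] == '"entity"':
        entity_types[fields[2].lower()] += 1
    elif fields[0] == '"relationship"':
        relationship_count += 1

print(dict(entity_types))   # e.g. {'person': 1, 'location': 1}
print(relationship_count)   # e.g. 1
```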

To assess how this affects extraction across the entire dataset, we ran both the default and the auto-tuned prompts to generate entity and relationship outputs. Before discussing the results, let's review the default prompt's output, which produced seven entities and six relationships, as shown in Code Sample 4.

    Code Sample 4: Default extraction output


Using the auto-tuned, domain-specific prompt, we achieved a deeper extraction, producing nine entities and eight relationships, as shown in Code Sample 5.

    Code Sample 5: Auto-tuned extraction output


Compared with the default prompt, the auto-tuned prompt extracts more entities and more relationships, providing a more comprehensive view of our data. One key difference between the two outputs is the expansion in the entity types being extracted. The default prompt is limited to three example types: organization, geography, and person. The auto-tuned prompt, however, expands to example types derived from the sample input text: organization, person, location, and music genre.
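The shift in example types can be seen as a simple set comparison; this trivial sketch just restates the type names given above:

```python
# Entity example types used by each prompt, per the outputs above.
default_types = {"organization", "geography", "person"}
auto_tuned_types = {"organization", "person", "location", "music genre"}

print(sorted(auto_tuned_types - default_types))  # ['location', 'music genre']
print(sorted(default_types - auto_tuned_types))  # ['geography']
```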

    Putting it all together 

We can observe a clear difference in the final outputs after using these auto-tuned prompts to index the podcast source data. To measure this difference, we compared the sizes of the knowledge graphs produced with the default and the auto-tuned prompts. The following results were obtained with GPT-4 Turbo, keeping all other parameters constant between the two runs:

|                   | Entities | Relationships | Communities |
|-------------------|----------|---------------|-------------|
| Default prompt    | 1796     | 2851          | 352         |
| Auto-tuned prompt | 4896     | 8210          | 1027        |
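As a quick sanity check, the growth factors can be computed directly from the table's figures:

```python
# Growth of the knowledge graph from default to auto-tuned prompts,
# using the figures from the table above.
counts = {
    "entities": (1796, 4896),
    "relationships": (2851, 8210),
    "communities": (352, 1027),
}

for name, (default, tuned) in counts.items():
    print(f"{name}: {tuned / default:.1f}x larger")
# entities: 2.7x larger
# relationships: 2.9x larger
# communities: 2.9x larger
```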

As shown, auto-tuning yields a significantly larger knowledge graph, because the prompt determines what gets extracted: for example, a prompt that looks for molecules will extract far more from a chemistry dataset than one that looks for people and places. More communities in the knowledge graph also means it can better serve global search queries. While the size of the knowledge graph can be a proxy for its utility, the true measure of utility is observed in end-to-end queries. To demonstrate this, we provide the following side-by-side comparison of the same question answered using the default and the auto-tuned prompts.


    We asked GraphRAG the following question: “What’s the relationship between Ashley Llorens and Chicago?” 
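A question like this can be issued against each index from the command line. The sketch below shells out to the GraphRAG CLI for two hypothetical project roots; the command name and flags follow recent GraphRAG releases but may differ in your version, so treat this as an assumption-laden outline rather than the exact invocation we used.

```python
# Sketch: run the same question against two hypothetical GraphRAG project
# roots, one indexed with default prompts and one with auto-tuned prompts.
# CLI name and flags are assumptions and may vary across GraphRAG versions.
import subprocess

QUESTION = "What's the relationship between Ashley Llorens and Chicago?"

for root in ("./index-default", "./index-auto-tuned"):
    result = subprocess.run(
        ["graphrag", "query", "--root", root, "--method", "local",
         "--query", QUESTION],
        capture_output=True,
        text=True,
    )
    print(f"--- {root} ---")
    print(result.stdout)
```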

First, we got the following results from the knowledge graph built with the default prompt; we then repeated the query against the knowledge graph built with our auto-tuned prompt:

    Default query results 

    I am sorry but I am unable to answer this question given the provided data.

    Auto-tuned query results 

    **Origin and Influence**: Ashley Llorens has a significant connection to Chicago, which has deeply influenced his personal and professional life. Growing up in a city known for its vibrant music scene, particularly hip-hop, has shaped his musical pursuits and artistic expression.

**Professional Impact**: His background and experiences in Chicago have carried over into his professional environment, notably at the Applied Physics Laboratory, where he has integrated elements of hip-hop culture [Data: Reports (940)].

    With auto-tuned indexing prompts, our knowledge graph became more representative of the dataset’s entities and relationships, enabling it to yield a valid response to the query. 

Advancing auto-tuning and expanding GraphRAG's capabilities

    Looking ahead, we’re exploring other methods to build on this auto-tuning work. We’re excited to investigate new approaches for creating the core GraphRAG knowledge graph and are also studying ways to measure and evaluate the quality of these graph structures. Additionally, we’re researching methods to better assess performance so that we can identify the types of queries where GraphRAG provides unique value. This includes evaluating human-generated versus auto-tuned prompts, as well as exploring potential improvements to the auto-tuner. 

Overall, these new auto-tuner developments make GraphRAG much more accessible and turnkey. We hope this auto-tuning work removes many of the challenges involved in working with new datasets. We invite you to try these capabilities yourself using GraphRAG's core library and our Azure-based solution accelerator, available on GitHub.




