Follow @DigEventHorizon |
GraphRAG uses large language models (LLMs) to create a comprehensive knowledge graph that details entities and their relationships from any collection of text documents. This graph enables GraphRAG to leverage the semantic structure of the data and generate responses to complex queries that require a broad understanding of the entire text. In previous blog posts, we […] The post GraphRAG auto-tuning provides rapid adaptation to new domains appeared first on Microsoft Research. Code Sample 3: Auto-tuned prompt Goal Given a text document that is potentially relevant to this activity, first identify all entities needed from the text in order to capture the information and ideas in the text. Next, report all relationships among the identified entities. Steps Identify all entities. For each identified entity, extract the following information: entity_name: Name of the entity, capitalized entity_type: Suggest several labels or categories for the entity. The categories should not be specific, but should be as general as possible. entity_description: Comprehensive description of the entity’s attributes and activities Format each entity as (““entity”“, , , ) From the entities identified in step 1, identify all pairs of (source_entity, target_entity) that are *clearly related* to each other. For each pair of related entities, extract the following information: source_entity: name of the source entity, as identified in step 1 target_entity: name of the target entity, as identified in step 1 relationship_description: explanation as to why you think the source entity and the target entity are related to each other relationship_strength: a numeric score indicating strength of the relationship between the source entity and target entity Format each relationship as (““relationship”“, , , , ) Return output in English as a single list of all the entities and relationships identified in steps 1 and 2. Use {record_delimiter} as the list delimiter. When finished, output Examples ############################# Example 1: Text: CHRIS URMSON: Yeah, no, and it is, right? I think one of the things that people outside of Silicon Valley who haven’t been here don’t realize is that it’s not really. That, like, you know, people talk about Silicon Valley engineers being risk-takers. I think it’s actually the opposite. It’s the realization that if you go and try one of these things and you’re actually good at what you do, if it fails, it fails. You’ll have a job the next day at somewhere else, right? And you’ll have this wealth of experience that people will value. And I think that is something that it’s hard, you know, I’ll categorize this as you know east coast people but, you know, kind of more conventional business folks haven’t — don’t kind of have that sense of the opportunities that are around. And maybe we’ve just been here during a particularly Output: (“entity”, CHRIS URMSON, PERSON, Chris Urmson is a speaker discussing the culture and dynamics of Silicon Valley, particularly the attitude towards risk and failure) (“entity”, SILICON VALLEY, LOCATION, A region in California known for its technology industry and innovative environment, where engineers are perceived as risk-takers) (“entity”, SILICON VALLEY ENGINEERS, GROUP, Engineers working in Silicon Valley, characterized by a culture that values risk-taking and resilience in the face of failure) (“entity”, EAST COAST PEOPLE, GROUP, People from the East Coast of the United States, implied to have a more conventional and less risk-tolerant approach to business compared to those in Silicon Valley) (“relationship”, CHRIS URMSON – SILICON VALLEY, Chris Urmson discusses the culture and dynamics of Silicon Valley, emphasizing the local attitude towards risk and failure, 8) (“relationship”, SILICON VALLEY ENGINEERS – SILICON VALLEY, Silicon Valley Engineers are part of the Silicon Valley ecosystem, embodying the local culture of risk-taking and resilience, 9) (“relationship”, EAST COAST PEOPLE – SILICON VALLEY, East Coast People are contrasted with Silicon Valley individuals in terms of business culture and risk tolerance, 7) Example 2: Text: to ask Dr. Jemison that I think for her, and for me, space was this idea that really inspired us, I think, to go explore new frontiers. You know, it was this imagination of this thing that, you know, for me at least, like made me want to study computer science, because like that was the most interesting terrestrial frontier I could go explore. And like you know, the thing that I wonder about is like what that frontier is, like what that inspiration will be for the next generation of scientists, and engineers and explorers. You know, like maybe it’s synthetic biology, but it’s going to be interesting to see whatever it is. [MUSIC] CHRISTINA WARREN: I couldn’t agree more. I look forward to watching and learning from all of that. All right, well, that’s a wrap. Thank you so much to Mae for joining us today. And to our listeners. Thank you for joining us and Output: (“entity”, SPACE, CONCEPT, Space is described as an inspiring concept that motivates exploration and study in new frontiers, particularly in science and technology) (“entity”, COMPUTER SCIENCE, FIELD, Computer science is highlighted as an interesting terrestrial frontier that the speaker was motivated to explore due to the inspiration from space) (“entity”, SYNTHETIC BIOLOGY, FIELD, Synthetic biology is mentioned as a potential inspiring frontier for the next generation of scientists, engineers, and explorers) (“entity”, CHRISTINA WARREN, PERSON, Christina Warren is the speaker who expresses agreement and looks forward to learning from the developments in new scientific frontiers) (“entity”, MAE, PERSON, Mae is mentioned as a guest who joined Christina Warren in the discussion about future scientific frontiers) (“relationship”, SPACE – COMPUTER SCIENCE, Space as a concept inspired the speaker to study computer science, 8) (“relationship”, CHRISTINA WARREN – MAE, Christina Warren thanks Mae for joining the discussion, 7) Example 3: Text: educational outcomes for kids. And if you look at the children of immigrants in East San Jose or East Palo Alto here in the Silicon Valley, like often, the parents are working two, three jobs. Like, they’re so busy that they have a hard time being engaged with their kids. And sometimes they don’t speak English. And so, like, they don’t even have the linguistic ability. And you can just imagine what a technology like this could do, where it really doesn’t care what language you speak. It can bridge that gap between the parents and the teacher, and it can be there to help the parent understand where the roadblocks are for the child and to even potentially get very personalized to the child’s needs and sort of help them on the things that they’re struggling with. I think it’s really, really very exciting. BILL GATES: Yeah, just the language barriers, we often forget about that. And that comes up in the developing world. India has Output: (“entity”, EAST SAN JOSE, GEO, A region in Silicon Valley where many immigrant families reside, and parents often work multiple jobs) (“entity”, EAST PALO ALTO, GEO, A region in Silicon Valley known for its significant immigrant population and economic challenges) (“entity”, SILICON VALLEY, GEO, A major hub for technology and innovation in California, USA) (“entity”, BILL GATES, PERSON, Prominent technology leader and philanthropist who discusses the impact of technology on overcoming language barriers) (“entity”, TECHNOLOGY, CONCEPT, Refers to new technological solutions that can assist in bridging language gaps between parents and teachers, and provide personalized support to children) (“relationship”, EAST SAN JOSE – SILICON VALLEY, East San Jose is a part of Silicon Valley, 9) (“relationship”, EAST PALO ALTO – SILICON VALLEY, East Palo Alto is located within Silicon Valley, 9) (“relationship”, TECHNOLOGY – BILL GATES, Bill Gates discusses the potential of technology to solve language barriers and educational challenges, 8) Real Data ###################### Text: {input_text} Output: Here, the automatically generated prompt using the sample content from Code Sample 2 identifies fourteen entity examples across six different entity types (person, location, group, concept, field, and geography) and eight relationship examples. To assess how this impacts the extraction of the entire dataset, we used both the default and the auto-tuned prompt to generate the entity and relationship outputs. Before we explain the results, let’s review the default prompt’s outputs, which produced seven entities and six relationships, as shown in Code Sample 4. Code Sample 4: Default extraction output Entities and Relationships: (“entity”, ASHLEY LLORENS, PERSON, Ashley Llorens is a scientist, engineer, hip-hop artist, and vice president at Microsoft. He has worked at Johns Hopkins Applied Physics Laboratory and was nominated by the White House to serve as an AI expert for the Global Partnership on AI. He is also a voting member of the Recording Academy for the Grammy Awards.) (“entity”, JOHNS HOPKINS APPLIED PHYSICS LABORATORY, ORGANIZATION, An institution where Ashley Llorens worked for two decades developing novel AI technologies and served as the founding chief of the intelligent systems center.) (“entity”, WHITE HOUSE OFFICE OF SCIENCE AND TECHNOLOGY POLICY, ORGANIZATION, A U.S. government office that nominated Ashley Llorens to serve as an AI expert for the Global Partnership on AI.) (“entity”, GLOBAL PARTNERSHIP ON AI, ORGANIZATION, An international initiative focused on AI where Ashley Llorens serves as an expert.) (“entity”, RECORDING ACADEMY, ORGANIZATION, An organization responsible for the Grammy Awards, of which Ashley Llorens is a voting member.) (“entity”, MICROSOFT, ORGANIZATION, A multinational technology company where Ashley Llorens joined as a vice president, distinguished scientist, and managing director for Microsoft Research.) (“entity”, KEVIN SCOTT, PERSON, The host of the podcast where Ashley Llorens was interviewed and welcomed to Microsoft.) (“relationship”, ASHLEY LLORENS – JOHNS HOPKINS APPLIED PHYSICS LABORATORY, Ashley Llorens worked at Johns Hopkins Applied Physics Laboratory for two decades, 9) (“relationship”, ASHLEY LLORENS – WHITE HOUSE OFFICE OF SCIENCE AND TECHNOLOGY POLICY, Ashley Llorens was nominated by the White House Office of Science and Technology Policy to serve as an AI expert, 8) (“relationship”, ASHLEY LLORENS – GLOBAL PARTNERSHIP ON AI, Ashley Llorens serves as an AI expert for the Global Partnership on AI, 8) (“relationship”, ASHLEY LLORENS – RECORDING ACADEMY, Ashley Llorens is a voting member of the Recording Academy for the Grammy Awards, 7) (“relationship”, ASHLEY LLORENS – MICROSOFT, Ashley Llorens joined Microsoft as a vice president and managing director for Microsoft Research, 9) (“relationship”, KEVIN SCOTT – ASHLEY LLORENS, Kevin Scott interviewed Ashley Llorens on the podcast and welcomed him to Microsoft, 8) Using the auto-tuned, domain-specific, automatically generated prompt, we achieved a deeper extraction, producing nine entities and eight relationships, as shown below in Code Sample 5. Code Sample 5: Auto-tuned extraction output Entities and Relationships: (“entity”, ASHLEY LLORENS, PERSON, Ashley Llorens is a scientist, engineer, hip-hop artist, and executive at Microsoft. He has a background in AI technology development and has been involved in the music industry as a hip-hop artist and Grammy voting member) (“entity”, KEVIN SCOTT, PERSON, Kevin Scott is the host of the podcast where Ashley Llorens is a guest. He engages with Ashley about his background and career) (“entity”, JOHNS HOPKINS APPLIED PHYSICS LABORATORY, ORGANIZATION, An institution where Ashley Llorens worked for two decades, developing novel AI technologies and leading the intelligent systems center) (“entity”, WHITE HOUSE OFFICE OF SCIENCE AND TECHNOLOGY POLICY, ORGANIZATION, A governmental office that nominated Ashley Llorens to serve as an AI expert for the Global Partnership on AI) (“entity”, GLOBAL PARTNERSHIP ON AI, ORGANIZATION, An international initiative focused on AI where Ashley Llorens serves as an expert.) (“entity”, RECORDING ACADEMY, ORGANIZATION, An organization responsible for the Grammy Awards, of which Ashley Llorens is a voting member.) (“entity”, MICROSOFT, ORGANIZATION, A major technology company where Ashley Llorens recently joined as a vice president, distinguished scientist, and managing director for Microsoft Research) (“entity”, CHICAGO, LOCATION, The city where Ashley Llorens grew up, specifically mentioned as the south side and south suburbs, which influenced his interest in music and technology) (“entity”, HIP-HOP, MUSIC GENRE, A music genre that significantly influenced Ashley Llorens during his childhood in Chicago, leading him to pursue a career in music alongside his technical career) (“relationship”, ASHLEY LLORENS – JOHNS HOPKINS APPLIED PHYSICS LABORATORY, Ashley Llorens worked at Johns Hopkins Applied Physics Laboratory for two decades, developing AI technologies, 9) (“relationship”, ASHLEY LLORENS – WHITE HOUSE OFFICE OF SCIENCE AND TECHNOLOGY POLICY, Ashley Llorens was nominated by the White House Office of Science and Technology Policy to serve as an AI expert, 9) (“relationship”, ASHLEY LLORENS – GLOBAL PARTNERSHIP ON AI, Ashley Llorens serves as an AI expert for the Global Partnership on AI, 9) (“relationship”, ASHLEY LLORENS – RECORDING ACADEMY, Ashley Llorens is a voting member of the Recording Academy for the Grammy Awards, 7) (“relationship”, ASHLEY LLORENS – MICROSOFT, Ashley Llorens recently joined Microsoft as a vice president and managing director for Microsoft Research, 9) (“relationship”, ASHLEY LLORENS – CHICAGO, Ashley Llorens grew up in Chicago, which influenced his early interest in music, particularly hip-hop, 7) (“relationship”, ASHLEY LLORENS – HIP-HOP, Ashley Llorens was deeply influenced by hip-hop music during his upbringing in Chicago, leading him to pursue a career in music, 8) (“relationship”, KEVIN SCOTT – ASHLEY LLORENS, Kevin Scott hosts Ashley Llorens on the podcast, discussing his background and career transitions, 7) Compared with the default prompt, the auto-tuned prompt is an improvement, with more entities and more relationships, providing a more comprehensive view of our data. One key difference between this output and the output from the default prompt is the expansion in entity types being extracted. The default prompt is limited to three example types: organization, geography, and person. However, the auto-tuned prompt expands to more example types derived from the sample input text: organization, person, location, and music genre. Putting it all together We can observe a clear difference in the final outputs after using these auto-tuned prompts for indexing the podcast source data. To measure this difference, we compared the size of the resulting knowledge graphs using default with auto-tuned prompts. The following results were achieved while keeping all parameters constant between both runs and using GPT4-Turbo: Entities Relationships Communities Default prompt 1796 2851 352 Auto-tuned prompt 4896 8210 1027 As shown, the use of auto-tuning yields a significantly larger knowledge graph. For example, a prompt that looks for molecules will extract much more from a chemistry dataset than one that looks for people and places. More communities in the knowledge graph means that it can better serve global search queries. While the size of the knowledge graph can be a proxy for its utility, the true measure of utility is observed in end-to-end queries. To demonstrate this, we provide the following side-by-side comparison of a question using the default and auto-tuned prompts. We asked GraphRAG the following question: “What’s the relationship between Ashley Llorens and Chicago?” First, we got the following results using the knowledge graph built with the default prompt. We then used the knowledge graph built with our auto-tuned prompt: Default query results “I am sorry but I am unable to answer this question given the provided data.“ Auto-tuned query results “**Origin and Influence**: Ashley Llorens has a significant connection to Chicago, which has deeply influenced his personal and professional life. Growing up in a city known for its vibrant music scene, particularly hip-hop, has shaped his musical pursuits and artistic expression.“ “**Professional Impact**: His background and experiences in Chicago have carried over into his professional environment, notably at the Applied Physics Laboratory, where he has integrated elements of hip-hop culture [Data: Reports (940)].“ With auto-tuned indexing prompts, our knowledge graph became more representative of the dataset’s entities and relationships, enabling it to yield a valid response to the query. Advancing auto-tuning and expanding GraphRag’s capabilities Looking ahead, we’re exploring other methods to build on this auto-tuning work. We’re excited to investigate new approaches for creating the core GraphRAG knowledge graph and are also studying ways to measure and evaluate the quality of these graph structures. Additionally, we’re researching methods to better assess performance so that we can identify the types of queries where GraphRAG provides unique value. This includes evaluating human-generated versus auto-tuned prompts, as well as exploring potential improvements to the auto-tuner. Overall, these new auto-tuner developments make GraphRAG much more accessible and turnkey. We hope this auto-tuning work removes many of the challenges involved when working with new datasets. We invite you to try out these capabilities yourself using GraphRAG’s core library (opens in new tab) and our Azure-based solution accelerator, available on GitHub (opens in new tab). Try out GraphRAG Try out GraphRAG Accelerator Opens in a new tabThe post GraphRAG auto-tuning provides rapid adaptation to new domains appeared first on Microsoft Research.
Published: 2024-09-09T16:15:55
Follow @DigEventHorizon |