@natoverse
Collaborator

Adds guards to prevent transient failures when a given text unit has no entities or relationships. Also cleans up the graph extraction code to remove significant redundancy.

Fixes #1881

@natoverse natoverse requested a review from a team as a code owner November 11, 2025 00:08
@natoverse natoverse changed the base branch from main to v3/main November 11, 2025 00:08
@dworthen dworthen requested a review from Copilot November 11, 2025 14:22
Copilot finished reviewing on behalf of dworthen November 11, 2025 14:24
Contributor

Copilot AI left a comment

Pull Request Overview

This PR refactors the graph extraction code to improve robustness and reduce redundancy. It adds guards to prevent transient failures when text units have no entities or relationships, and simplifies the extraction flow by removing NetworkX graph intermediates in favor of direct DataFrame output.

Key changes:

  • Graph extraction now returns DataFrames directly instead of NetworkX graphs, simplifying the data flow
  • Empty entity/relationship guards added to prevent failures on text units with no extracted data
  • Error handling simplified from errors: list[BaseException] | None to error: BaseException | None
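
A minimal sketch of the first two changes, not the actual graphrag code: the column lists and the `records_to_frames` helper are hypothetical names chosen for illustration. It shows parsed records going straight into DataFrames, with an empty input producing an empty frame that still carries the expected columns.

```python
import pandas as pd

# Hypothetical schemas for illustration; the real column names may differ.
ENTITY_COLUMNS = ["title", "type", "description", "source_id"]
RELATIONSHIP_COLUMNS = ["source", "target", "description", "weight", "source_id"]


def records_to_frames(
    entities: list[dict], relationships: list[dict]
) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Build entity and relationship DataFrames directly, with no graph intermediate."""
    # Passing columns= keeps the schema stable even when a record list is empty,
    # so downstream code can select columns without checking for emptiness first.
    entity_df = pd.DataFrame(entities, columns=ENTITY_COLUMNS)
    relationship_df = pd.DataFrame(relationships, columns=RELATIONSHIP_COLUMNS)
    return entity_df, relationship_df
```

Under these assumptions, a text unit with no extracted data yields two zero-row frames rather than an exception, and callers can treat it like any other unit.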

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| test_gi_entity_extraction.py | Simplified tests to use single document extraction with shared test data constant |
| pipeline_run_result.py | Changed error field from list to single exception |
| run_pipeline.py | Updated to use singular error field instead of errors list |
| summarize_communities/typing.py | Removed unused type aliases |
| extract_graph/typing.py | Removed entire file as types are no longer needed |
| graph_extractor.py | Refactored to return DataFrames directly with empty guards for no entities/relationships |
| extract_graph.py | Simplified to work with DataFrames instead of NetworkX graphs |
| index.py | Updated error checking to use singular error field |
| workflow_callbacks*.py | Added pipeline_error callback method across callback implementations |
| api/index.py | Updated to use singular error field and call new error callback |

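
The singular error field and the new callback described above might fit together roughly as follows. This is a sketch under stated assumptions: `RunResult`, `Callbacks`, `CollectingCallbacks`, and `report` are hypothetical stand-ins, and the `pipeline_error(error)` signature is a guess, since the PR text only names the method.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


class Callbacks(Protocol):
    # Assumed signature; the source only says a pipeline_error method was added.
    def pipeline_error(self, error: BaseException) -> None: ...


@dataclass
class RunResult:
    """Hypothetical stand-in for a pipeline run result with a single error field."""

    workflow: str
    error: BaseException | None = None


class CollectingCallbacks:
    """Example callback implementation that records every reported error."""

    def __init__(self) -> None:
        self.seen: list[BaseException] = []

    def pipeline_error(self, error: BaseException) -> None:
        self.seen.append(error)


def report(result: RunResult, callbacks: Callbacks) -> bool:
    """Return True on success; otherwise notify the callback and return False."""
    if result.error is not None:
        callbacks.pipeline_error(result.error)
        return False
    return True
```

With a single optional exception, callers check `result.error is not None` instead of testing whether a list is both present and non-empty.
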
Comments suppressed due to low confidence (2)

packages/graphrag/graphrag/index/operations/extract_graph/extract_graph.py:100

  • The pd.concat() call will raise a ValueError if entity_dfs is an empty list. This can occur if all text units fail to extract any entities. Add a guard to return an empty DataFrame with the correct schema when the list is empty.
def _merge_entities(entity_dfs) -> pd.DataFrame:
    all_entities = pd.concat(entity_dfs, ignore_index=True)

packages/graphrag/graphrag/index/operations/extract_graph/extract_graph.py:113

  • The pd.concat() call will raise a ValueError if relationship_dfs is an empty list. This can occur if all text units fail to extract any relationships. Add a guard to return an empty DataFrame with the correct schema when the list is empty.
def _merge_relationships(relationship_dfs) -> pd.DataFrame:
    all_relationships = pd.concat(relationship_dfs, ignore_index=False)
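
The guard suggested in the two suppressed comments could look something like the sketch below. `concat_or_empty` and its `columns` parameter are hypothetical and do not appear in the PR; the only library behavior relied on is that `pd.concat` raises `ValueError` when given an empty list.

```python
import pandas as pd


def concat_or_empty(
    frames: list[pd.DataFrame], columns: list[str], ignore_index: bool = True
) -> pd.DataFrame:
    """Concatenate frames, or return an empty frame with `columns` if there are none."""
    if not frames:
        # pd.concat([]) raises ValueError ("No objects to concatenate"),
        # so return a zero-row frame with the expected schema instead.
        return pd.DataFrame(columns=columns)
    return pd.concat(frames, ignore_index=ignore_index)
```

Both merge functions could route through a helper like this, each passing its own schema. Whether the merged PR needs it depends on whether the caller can ever pass an empty list.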

@natoverse natoverse merged commit 4512ce0 into v3/main Nov 11, 2025
24 checks passed
@natoverse natoverse deleted the empty-graph-guards branch November 11, 2025 18:11

Development

Successfully merging this pull request may close these issues.

[Bug]: Handling text without any entities and relationships