
Conversation

@ayomide2021 ayomide2021 commented Nov 6, 2025

Description

A test has been added for calculate_njklm_values in test_matrix_a_star.py.

This is a feature change and does not break any existing functionality.

Type of change

You can delete options that are not relevant.

  • Bug fix - non-breaking change
  • New feature - non-breaking change
  • Breaking change - backwards incompatible change, changes expected behaviour
  • Non-user facing change, structural change, dev functionality, docs ...

Checklist:

  • I have performed a self-review of my own code.
  • I have commented my code appropriately, focusing on explaining my design decisions (explain why, not how).
  • I have made corresponding changes to the documentation (comments, docstrings, etc.).
  • I have added tests that prove my fix is effective or that my feature works.
  • New and existing unit tests pass locally with my changes.
  • I have updated the change log.

Peer review

Any new code includes all the following:

  • Documentation: docstrings and comments have been added/updated.
  • Style guidelines: New code conforms to the project's contribution guidelines.
  • Functionality: The code works as expected, handles expected edge cases, and handles exceptions appropriately.
  • Complexity: The code is not overly complex; logic has been split into appropriately sized functions.
  • Test coverage: Unit tests cover essential functions for a reasonable range of inputs and conditions. Added and existing tests pass on my machine.

Review comments

Suggestions should be tailored to the code that you are reviewing. Provide context.
Be critical and clear, but not mean. Ask questions and set actions.

These might include:
  • bugs that need fixing (does it work as expected? and does it work with other code
    that it is likely to interact with?)
  • alternative methods (could it be written more efficiently or with more clarity?)
  • documentation improvements (does the documentation reflect how the code actually works?)
  • additional tests that should be implemented
    • Do the tests effectively assure that it works correctly? Are there
      additional edge cases/negative tests to be considered?
  • code style improvements (could the code be written more clearly?)

Further reading: code review best practices

@ayomide2021 ayomide2021 requested a review from a team as a code owner November 6, 2025 14:50
@ayomide2021 ayomide2021 added the enhancement New feature or request label Nov 6, 2025
@ayomide2021 ayomide2021 added the python Pull requests that update python code label Nov 6, 2025
@ayomide2021 ayomide2021 linked an issue Nov 6, 2025 that may be closed by this pull request
Comment on lines 71 to 78
expected_output = spark.createDataFrame(
    [
        (1.0, 5.0, 4.0, 0.0),
    ],
    ["col1", "col2", "col3", "col4"],
)

expected_output = expected_output.toPandas()
Contributor

The test will be much more efficient if this is defined directly in Pandas rather than defined in PySpark and converted to Pandas.

To hard-code a dataset in Pandas, do this:

expected_output = pd.DataFrame(
    [
        list_of_row_1_values,
        list_of_row_2_values,
        # etc.
    ],
    columns=list_of_column_names,
)
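As a concrete, runnable sketch of this pattern (reusing the values from the Spark-based version above; the column names are the same placeholders used in the test):

```python
import pandas as pd

# Hard-coded expected output built directly in Pandas, mirroring the
# single row used in the original Spark-based test.
expected_output = pd.DataFrame(
    [
        [1.0, 5.0, 4.0, 0.0],
    ],
    columns=["col1", "col2", "col3", "col4"],
)
```

This avoids spinning up a Spark job just to produce a one-row Pandas frame.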

Author

I saw in the docstring/comments that the function takes a Spark DataFrame, processes it, and then returns a Pandas DataFrame?

Contributor

Yes, but you don't have to make the test inputs and outputs the way that the function makes them. :)

Comment on lines 128 to 142
def test_get_scaled_labelled_x_star():
    """
    Tests that get_scaled_labelled_x_star returns the correct output
    with appropriate inputs.
    """

    # Arrange

    # Act

    # Assert


Contributor

Why is this here? It shouldn't be in this branch.

Author

New branch created for this test.

Contributor

I think you removed this test entirely when tidying this up. It should remain as a test shell in this branch.

    Tests that label_x_star() gives the correct output when provided with
    appropriate inputs.
    """

Contributor

This has been altered but shouldn't have been.

Author

Oh yes. I need to create another branch for the label_x_star tests.

Contributor

Please make sure that on this branch this test is just a test shell.

Comment on lines 61 to 69
test_input = spark.createDataFrame(
    [
        (1.0, 2.0, 1.0, 0.0),
        (0.0, 1.0, 1.0, 0.0),
        (0.0, 1.0, 1.0, 0.0),
        (0.0, 1.0, 1.0, 0.0),
    ],
    ["col1", "col2", "col3", "col4"],
)
Contributor

Any test input should reflect a real (though small and artificial) test case. This one doesn't, for two reasons:

  1. The column names are uninformative and don't reflect what real column names would look like in the input.
  2. This function takes a table of delta comparisons as its input. It's impossible for a delta comparison table to contain 2.0, since it's an indicator matrix (of an indicator matrix). So the test_input here is actually impossible.

Please take a look at the output of the compare_deltas function in .../src/indicator_matrix/indicator_matrix.py to see what the input of this function should look like.
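To illustrate, a valid delta-comparison input contains only 0.0/1.0 indicator values. A minimal sketch in Pandas, where the column names are invented placeholders (the real names should match what compare_deltas actually produces):

```python
import pandas as pd

# Hypothetical delta-comparison test input: an indicator matrix, so
# every cell must be 0.0 or 1.0. Column names are placeholders only.
test_input = pd.DataFrame(
    [
        [1.0, 0.0, 1.0, 0.0],
        [0.0, 1.0, 1.0, 0.0],
    ],
    columns=["delta_ab", "delta_ac", "delta_bc", "delta_bd"],
)

# Sanity check: an indicator matrix may only contain 0.0 and 1.0.
assert test_input.isin([0.0, 1.0]).all().all()
```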

Contributor

This is much better!

If this is exactly the same dataset as that used as the test output for test_compare_deltas, it could be defined once as a fixture in conftest.py and re-used.
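A minimal sketch of that fixture, assuming pytest; the dataset contents and all names below are hypothetical placeholders, not the project's real data:

```python
# conftest.py (sketch): define the shared dataset once and reuse it in
# both test_compare_deltas and this test.
import pandas as pd
import pytest


def make_delta_comparison_df():
    # Plain helper so the data can also be built outside of pytest.
    # Hypothetical 0/1 indicator values and placeholder column names.
    return pd.DataFrame(
        [
            [1.0, 0.0],
            [0.0, 1.0],
        ],
        columns=["delta_ab", "delta_ac"],
    )


@pytest.fixture
def delta_comparison_df():
    return make_delta_comparison_df()
```

Any test that declares a `delta_comparison_df` parameter then receives a fresh copy of the frame without repeating the hard-coded data.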

Contributor

@mary-cleaton mary-cleaton left a comment

Please see code-specific comments and address these before resubmitting for review. Thanks!

@mary-cleaton mary-cleaton requested review from a team November 14, 2025 15:11

Labels

enhancement New feature or request python Pull requests that update python code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add unit test: matrix_a_star: calculate_njklm_values

3 participants