Skip to content

Reduce complexity of CSVParser.createHeaders() #477

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

sigee
Copy link
Contributor

@sigee sigee commented Sep 19, 2024

  • Refactor Headers class to encapsulate header mapping logic
  • Extract headerRecord creation

@sigee sigee force-pushed the reduce-complexity branch from 1eaceb1 to 9868b24 Compare June 20, 2025 13:26
@ppkarwasz
Copy link

Hi @sigee,

Thanks for the PR. After reviewing the proposed change, I'm not convinced it simplifies the current implementation. The method name buildHeaderNames doesn't clearly convey its purpose—namely, analyzing and processing the header record. Additionally, its design is somewhat ambiguous: it returns a value like a pure function but also mutates one of its parameters, which makes the method harder to reason about.

Overall, this change appears to increase the cognitive complexity instead of reducing it. It feels more like refactoring for refactoring's sake rather than a meaningful improvement to readability or maintainability.

@sigee
Copy link
Contributor Author

sigee commented Jun 21, 2025

Hi @ppkarwasz ,

Thank you for taking the time to review my PR. I'd like to clarify my approach and reasoning.
My primary goal was to reduce cognitive complexity as defined in SonarSource's Cognitive Complexity whitepaper. The original method contained multiple levels of nested blocks, which significantly increases cognitive load according to this metric.
I extracted the entire nested if-statement block that follows the comment // build the name to index mappings into a separate method. I chose the name buildHeaderNames() based on that existing comment, though I'm open to suggestions for a more descriptive name.
While I acknowledge that I haven't yet simplified the extracted logic itself, I believe this refactoring does provide measurable benefits:

  • The original createHeaders() method now has reduced nesting levels, making the high-level flow easier to follow
  • The extracted buildHeaderNames() method, while containing the same logic, has fewer nested levels than the original method had overall
  • This separation makes the complex header processing logic more focused

I understand your concerns about the method's dual responsibility (returning a value while mutating a parameter).
I appreciate your feedback and am happy to iterate on this approach.

@garydgregory
Copy link
Member

garydgregory commented Jun 21, 2025

I agree with @ppkarwasz, spreading this code around does not help understand or maintain it IMO.

@ppkarwasz
Copy link

Hi @sigee,

My primary goal was to reduce cognitive complexity as defined in SonarSource's Cognitive Complexity whitepaper. The original method contained multiple levels of nested blocks, which significantly increases cognitive load according to this metric.

Metrics like SonarSource's can help highlight areas of higher complexity. However, I think it's more important to assess the resulting code from a human perspective: Does it become easier to understand and maintain? Personally, I found the refactored version a bit harder to follow and would likely need more time to understand it.

While I acknowledge that I haven't yet simplified the extracted logic itself, I believe this refactoring does provide measurable benefits:

  • The original createHeaders() method now has reduced nesting levels, making the high-level flow easier to follow
  • The extracted buildHeaderNames() method, while containing the same logic, has fewer nested levels than the original method had overall
  • This separation makes the complex header processing logic more focused

I don’t see the nesting as the primary challenge in understanding the method. The more significant cognitive load comes from unpacking the semantic steps of the process—what each block is doing and why.

Here’s an approach I believe could make the method clearer:

  1. Header Retrieval – Depending on the CSVFormat, the method extracts raw header names in different ways. This logic could be encapsulated in a getHeaderNames() method.

  2. Header Validation – The duplicate header name handling could live in a method like:

    private static Headers validateHeaderNames(String[] headerNames);

    This would cleanly isolate the logic for checking and preparing headers.

  3. Headers Construction – The current Headers class requires the manual creation of a list and a map that is then passed to the class constructor. It might be more intuitive and object-oriented to let the class build itself with an add(String name) method, abstracting that construction logic internally.

Let me know what you think—I’m happy to help iterate on this if you agree with some of these ideas.

@sigee
Copy link
Contributor Author

sigee commented Jun 21, 2025

Hi @ppkarwasz ,

I initially wanted to consolidate the dual header data structures - the headerNames List and headerMap Map - since maintaining two separate structures for the same information seemed redundant and confusing. While extracting the logic was a step toward this goal, I've realized that neither structure can be completely eliminated. Your third suggestion about refactoring into a Headers class that encapsulates both structures makes sense, and I'll implement that approach tomorrow to see how it works out.

I also explored your second suggestion, but the extraction isn't feasible due to the numerous dependencies involved - the resulting validate method would need around 6 parameters, making it unwieldy.

The first suggestion might be viable, though there's already a getHeaderNames method that serves a different purpose. Perhaps something like getHeaderRecord(final String[] formatHeader) would be more appropriate.

@sigee sigee force-pushed the reduce-complexity branch from 9868b24 to df0dd4b Compare June 22, 2025 10:36
@garydgregory
Copy link
Member

Instead of spinning our wheels on this "how many angels can dance on the head of a pin" PR which I think is a -1, I offer you all https://issues.apache.org/jira/browse/CSV-257 and org.apache.commons.csv.issues.JiraCsv257Test which is an actual issue worth thinking about IMO. What's the right thing to do here? Please comment in the Jira.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants