Skip to content

feat: add wrapper for MEGAHIT #4121

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
May 23, 2025
Merged

Conversation

alienzj
Copy link
Contributor

@alienzj alienzj commented May 16, 2025

add wrapper for MEGAHIT, which is an ultra-fast and memory-efficient NGS assembler. It is optimized for metagenomes, but also works well on generic single genome assembly (small or mammalian size) and single-cell assembly.

QC

While the contributions guidelines are more extensive, please particularly ensure that:

  • test.py was updated to call any added or updated example rules in a Snakefile
  • input: and output: file paths in the rules can be chosen arbitrarily
  • wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in input: or output:)
  • temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function tempfile.gettempdir() points to
  • the meta.yaml contains a link to the documentation of the respective tool or command under url:
  • conda environments use a minimal amount of channels and packages, in recommended ordering

Summary by CodeRabbit

  • New Features

    • Added support for the MEGAHIT assembler, including environment setup and metadata.
    • Introduced a Snakemake workflow to automate downloading test data and running MEGAHIT assembly.
    • Provided a wrapper script for seamless integration of MEGAHIT within Snakemake workflows.
  • Tests

    • Added automated tests to verify MEGAHIT wrapper functionality.

Copy link
Contributor

coderabbitai bot commented May 16, 2025

📝 Walkthrough

"""

Walkthrough

This change introduces a new Snakemake wrapper for the MEGAHIT assembler, including its environment and metadata configuration, a test workflow for validating the wrapper, and an associated automated test. The update adds all necessary files for reproducible deployment and testing of MEGAHIT within a Snakemake-based workflow.

Changes

File(s) Change Summary
bio/megahit/environment.yaml Added a Conda environment YAML specifying the megahit package (version 1.2.9) and required channels (conda-forge, bioconda, nodefaults).
bio/megahit/environment.linux-64.pin.txt Added a pinned Conda environment file for linux-64 platform listing exact package versions and URLs for reproducible environment creation including megahit 1.2.9 and dependencies.
bio/megahit/meta.yaml Added a metadata file describing the MEGAHIT package, including its purpose, input/output specifications, authors, and project URL.
bio/megahit/test/Snakefile Introduced a Snakemake test workflow with rules for downloading test reads and running MEGAHIT, including resource and container specifications, logging, and benchmarking.
bio/megahit/wrapper.py Added a Snakemake wrapper script for MEGAHIT. Handles input/output management, flexible input read formats, parameter parsing, memory handling, execution in a temporary directory, and output file organization according to Snakemake conventions.
test_wrappers.py Added a new test function test_megahit to run the MEGAHIT wrapper workflow with specified cores, conda enabled, and forced execution.

Sequence Diagram(s)

sequenceDiagram
    participant Tester as test_megahit (pytest)
    participant Snakemake as Snakemake Engine
    participant Wrapper as MEGAHIT Wrapper (wrapper.py)
    participant MEGAHIT as MEGAHIT Binary

    Tester->>Snakemake: Run workflow (target: run_megahit)
    Snakemake->>Wrapper: Execute wrapper with inputs, params, resources
    Wrapper->>MEGAHIT: Run MEGAHIT (fresh or --continue)
    MEGAHIT-->>Wrapper: Output contigs.fasta and logs
    Wrapper->>Snakemake: Organize outputs, signal completion
    Snakemake-->>Tester: Report test result
Loading

Suggested reviewers

  • fgvieira
  • johanneskoester
    """

Suggested labels

bio/igv-reports

✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🔭 Outside diff range comments (1)
bio/megahit/meta.yaml (1)

1-10: 🛠️ Refactor suggestion

Add critical wrapper metadata.
The meta.yaml is missing essential metadata fields such as url linking to MEGAHIT’s documentation (checked off in the QC checklist), maintainers, and license. Please include these to complete the wrapper metadata.

Apply this diff:

 name: "megahit"
+
+url: https://github.com/voutcn/megahit
+
 description: |
   MEGAHIT is an ultra-fast and memory-efficient NGS assembler. It is optimized...
 authors:
   - Jie Zhu
+
+maintainers:
+  - alienzj
+
+license: MIT
🧹 Nitpick comments (8)
bio/megahit/environment.yaml (2)

4-4: Review disabling of default channels.
Listing nodefaults as a channel may not disable default channels as intended. If the goal is to prevent usage of default Conda channels, consider adding a top-level nodefaults: true key or instructing users to run with --no-default-channels.


6-6: Fix dependency version syntax.
Conda YAML expects no spaces around the = in version pinning. Update to:

-dependencies:
-  - megahit =1.2.9
+dependencies:
+  - megahit=1.2.9
bio/megahit/test/Snakefile (1)

24-31: Consider adding error handling to the download process.

The current implementation downloads and extracts the test data but doesn't handle potential failures or verify the downloaded file.

 rule download_test_reads:
     output:
         ["test_reads/sample1_R1.fastq.gz", "test_reads/sample1_R2.fastq.gz"],
     log:
         "logs/download.log",
     shell:
-        " wget https://zenodo.org/record/3992790/files/test_reads.tar.gz >> {log} 2>&1 ; "
-        " tar -xzf test_reads.tar.gz >> {log} 2>&1"
+        " (wget -q --no-check-certificate https://zenodo.org/record/3992790/files/test_reads.tar.gz && "
+        " tar -xzf test_reads.tar.gz && "
+        " rm test_reads.tar.gz) >> {log} 2>&1 || "
+        " (echo 'Download or extraction failed' >> {log} 2>&1 && exit 1)"
bio/megahit/wrapper.py (5)

4-4: Update copyright year to current year.

The copyright year is set to 2025, which is in the future.

-__copyright__ = "Copyright 2025, Jie Zhu"
+__copyright__ = "Copyright 2024, Jie Zhu"

53-53: Check if output directory exists before removing.

The current implementation unconditionally removes the output directory, which could be risky.

-    shell("rm -rf {output_dir}")
+    if os.path.exists(output_dir):
+        shell("rm -rf {output_dir}")

89-90: Add checks for existing destination files.

The code should check if destination files already exist to avoid potential conflicts.

     if file_produced != file_renamed:
+        if os.path.exists(file_renamed):
+            os.remove(file_renamed)
         shutil.move(file_produced, file_renamed)

Similar changes should be applied at lines 97-98.


36-36: Clarify the default memory requirement.

The default value "0.9" is unclear - document whether this is a fraction of available memory.

-    memory_requirements = "0.9"
+    memory_requirements = "0.9"  # 0.9 fraction of total memory as per MEGAHIT defaults

71-71: Fix typo in restart message.

There's a typo in the word "pipeline" in the restart message.

-        "echo '\n\nRestart MEGAHIT \n Remove pipline_state file copy files to force copy files if necessary.' >> {log[0]}"
+        "echo '\n\nRestart MEGAHIT \n Remove pipeline_state file copy files to force copy files if necessary.' >> {log[0]}"
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 63f5e87 and d607766.

📒 Files selected for processing (5)
  • bio/megahit/environment.yaml (1 hunks)
  • bio/megahit/meta.yaml (1 hunks)
  • bio/megahit/test/Snakefile (1 hunks)
  • bio/megahit/wrapper.py (1 hunks)
  • test_wrappers.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
`**/*.py`: Do not try to improve formatting. Do not suggest type annotations for functions that are defined inside of functions or methods. Do not suggest type annotation of the `s...

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

  • bio/megahit/wrapper.py
  • test_wrappers.py
`**/wrapper.py`: Do not complain about use of undefined variable called `snakemake`.

**/wrapper.py: Do not complain about use of undefined variable called snakemake.

  • bio/megahit/wrapper.py
🪛 Ruff (0.8.2)
bio/megahit/wrapper.py

14-14: Undefined name snakemake

(F821)


15-15: Undefined name snakemake

(F821)


19-19: Undefined name snakemake

(F821)


20-20: Undefined name snakemake

(F821)


22-22: Undefined name snakemake

(F821)


28-28: Undefined name snakemake

(F821)


29-29: Undefined name snakemake

(F821)


30-30: Undefined name snakemake

(F821)


32-32: Undefined name snakemake

(F821)


33-33: Undefined name snakemake

(F821)


40-40: Undefined name snakemake

(F821)


41-41: Undefined name snakemake

(F821)


43-43: Undefined name snakemake

(F821)


84-84: Undefined name snakemake

(F821)


87-87: Undefined name snakemake

(F821)


95-95: Undefined name snakemake

(F821)

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: testing
  • GitHub Check: Summary
🔇 Additional comments (4)
bio/megahit/environment.yaml (1)

1-3: Channels ordering is appropriate.
Using conda-forge followed by bioconda correctly prioritizes the channels for Bioconda installations.

test_wrappers.py (1)

6401-6417: LGTM! The MEGAHIT test function is well structured.

The implementation correctly follows the established pattern for wrapper tests in this file, appropriately testing the new MEGAHIT wrapper with the necessary parameters (cores, conda environment, memory and time resources). The resemblance to the similar test_metaspades function is logical since both are metagenomic assemblers.

bio/megahit/test/Snakefile (1)

4-21: LGTM! Well-structured rule for MEGAHIT execution.

The rule properly configures inputs, outputs, benchmarking, parameters, logs, and resources. The k-mer list is comprehensive and the memory allocation is appropriate for metagenome assembly.

bio/megahit/wrapper.py (1)

45-51: ⚠️ Potential issue

Fix potential indexing issues in format strings.

The format strings use indices that may not align with the position of arguments in the reads list.

-    input_arg = " -1 {0} -2 {1} ".format(*reads)
+    input_arg = " -1 {} -2 {} ".format(reads[0], reads[1])

     if len(reads) >= 3:
-        input_arg += " --12 {2}".format(*reads)
+        input_arg += " --12 {}".format(reads[2])

         if len(reads) >= 4:
-            input_arg += " --read {3}".format(*reads)
+            input_arg += " --read {}".format(reads[3])

Likely an incorrect or invalid review comment.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (2)
bio/megahit/wrapper.py (2)

101-106: ⚠️ Potential issue

Fix inconsistency in contigs file name.

The Output_key_mapping dictionary specifies MEGAHIT's output file as "final.contigs.fa" (line 82), but here you're using "contigs.fasta". This inconsistency could cause file not found errors.

 if not has_named_output:
-    file_produced = os.path.join(output_dir, "contigs.fasta")
+    file_produced = os.path.join(output_dir, "final.contigs.fa")
     file_renamed = snakemake.output[0]

     if file_produced != file_renamed:
         shutil.move(file_produced, file_renamed)
🧰 Tools
🪛 Ruff (0.8.2)

103-103: Undefined name snakemake

(F821)


90-106: 🛠️ Refactor suggestion

Refactor output file handling for clarity and consistency.

The output file handling code is somewhat complex and has the potential for errors. Consider refactoring for clarity and consistency, ensuring that the same file name is used throughout.

 has_named_output = False
 for key in Output_key_mapping:
     if hasattr(snakemake.output, key):
         has_named_output = True
         file_produced = os.path.join(output_dir, Output_key_mapping[key])
         file_renamed = getattr(snakemake.output, key)

         if file_produced != file_renamed:
             shutil.move(file_produced, file_renamed)


 if not has_named_output:
-    file_produced = os.path.join(output_dir, "contigs.fasta")
+    file_produced = os.path.join(output_dir, "final.contigs.fa")  # Use consistent filename with Output_key_mapping
     file_renamed = snakemake.output[0]

     if file_produced != file_renamed:
         shutil.move(file_produced, file_renamed)
🧰 Tools
🪛 Ruff (0.8.2)

92-92: Undefined name snakemake

(F821)


95-95: Undefined name snakemake

(F821)


103-103: Undefined name snakemake

(F821)

🧹 Nitpick comments (5)
bio/megahit/wrapper.py (5)

78-79: Remove unnecessary file path comment.

This comment appears to be a remnant from a previous review or edit and does not provide any useful information to the code.

-# bio/megahit/wrapper.py

80-80: Remove comment referring to line numbers.

This comment appears to reference line numbers which will change as the code evolves, making the comment misleading over time.

-# … around line 78 …

85-85: Remove comment referring to line numbers.

Similar to the previous comment, this references specific line numbers which will become inaccurate as the code evolves.

-# … later, around line 93 …

31-36: Improve memory handling clarity.

The current code uses "0.9" as a default value when mem_mb is not specified, but this is ambiguous - is it 0.9MB, 0.9GB, or a percentage? Consider adding a comment explaining what this default value means.

 if hasattr(snakemake.resources, "mem_mb"):
     mem_bp = snakemake.resources.mem_mb * 1000000
     memory_requirements = f"{mem_bp}"
 else:
-    memory_requirements = "0.9"
+    memory_requirements = "0.9"  # Default to 90% of available memory as per MEGAHIT documentation
🧰 Tools
🪛 Ruff (0.8.2)

32-32: Undefined name snakemake

(F821)


33-33: Undefined name snakemake

(F821)


71-71: Fix typo in restart message.

The word "pipeline" is misspelled as "pipline" in the restart message.

-    "echo '\n\nRestart MEGAHIT \n Remove pipline_state file copy files to force copy files if necessary.' >> {log[0]}"
+    "echo '\n\nRestart MEGAHIT \n Remove pipeline_state file copy files to force copy files if necessary.' >> {log[0]}"
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d607766 and b5bafeb.

📒 Files selected for processing (1)
  • bio/megahit/wrapper.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
`**/*.py`: Do not try to improve formatting. Do not suggest type annotations for functions that are defined inside of functions or methods. Do not suggest type annotation of the `s...

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

  • bio/megahit/wrapper.py
`**/wrapper.py`: Do not complain about use of undefined variable called `snakemake`.

**/wrapper.py: Do not complain about use of undefined variable called snakemake.

  • bio/megahit/wrapper.py
🪛 Ruff (0.8.2)
bio/megahit/wrapper.py

14-14: Undefined name snakemake

(F821)


15-15: Undefined name snakemake

(F821)


19-19: Undefined name snakemake

(F821)


20-20: Undefined name snakemake

(F821)


22-22: Undefined name snakemake

(F821)


28-28: Undefined name snakemake

(F821)


29-29: Undefined name snakemake

(F821)


30-30: Undefined name snakemake

(F821)


32-32: Undefined name snakemake

(F821)


33-33: Undefined name snakemake

(F821)


40-40: Undefined name snakemake

(F821)


41-41: Undefined name snakemake

(F821)


43-43: Undefined name snakemake

(F821)


86-86: Undefined name has_named_output

(F821)


88-88: Undefined name snakemake

(F821)


92-92: Undefined name snakemake

(F821)


95-95: Undefined name snakemake

(F821)


103-103: Undefined name snakemake

(F821)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: docs
  • GitHub Check: testing
  • GitHub Check: Summary
🔇 Additional comments (1)
bio/megahit/wrapper.py (1)

47-52:

❌ Incorrect review comment

Ensure correct input formatting for additional reads.

The code attempts to handle 3+ reads, but there's a potential issue with how arguments are formatted. The string format index doesn't reset between lines, so it will try to access the same index multiple times.


🏁 Script executed:

#!/bin/bash
# Verify that MEGAHIT documentation supports the input format used for additional reads
curl -s https://github.com/voutcn/megahit | grep -A 20 "For paired-end libraries"

Length of output: 83


Formatting logic is correct; no changes needed

The two .format(*reads) calls each apply to their own string literal, and the preceding len(reads) checks guarantee that reads[2] and reads[3] exist before use. There’s no carry-over of format indices between calls, so the original concern does not apply.

Likely an incorrect or invalid review comment.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🔭 Outside diff range comments (1)
bio/megahit/wrapper.py (1)

80-96: 🛠️ Refactor suggestion

Improve file management with better error handling.

The current implementation for handling output files doesn't check if the source file exists before moving it, which could lead to errors if MEGAHIT fails to produce the expected file.

 has_named_output = False
 for key in Output_key_mapping:
     if hasattr(snakemake.output, key):
         has_named_output = True
         file_produced = os.path.join(output_dir, Output_key_mapping[key])
         file_renamed = getattr(snakemake.output, key)
 
+        if not os.path.exists(file_produced):
+            raise FileNotFoundError(f"Expected output file {file_produced} not found. MEGAHIT may have failed.")
+
         if file_produced != file_renamed:
             shutil.move(file_produced, file_renamed)
 
 if not has_named_output:
     file_produced = os.path.join(output_dir, "final.contigs.fa")
     file_renamed = snakemake.output[0]
 
+    if not os.path.exists(file_produced):
+        raise FileNotFoundError(f"Expected output file {file_produced} not found. MEGAHIT may have failed.")
+
     if file_produced != file_renamed:
         shutil.move(file_produced, file_renamed)
🧰 Tools
🪛 Ruff (0.11.9)

82-82: Undefined name snakemake

(F821)


85-85: Undefined name snakemake

(F821)


92-92: Undefined name snakemake

(F821)

♻️ Duplicate comments (1)
bio/megahit/wrapper.py (1)

29-29: Use the log formatter consistently throughout the wrapper.

The log formatter is correctly defined but not properly used in shell commands. The shell commands are using manual redirections instead.

-        " > {snakemake.log[0]} 2>&1 "
+        " {log} "
-        "echo '\n\nRestart MEGAHIT \n Remove pipline_state file copy files to force copy files if necessary.' >> {log[0]}"
+        "echo '\n\nRestart MEGAHIT \n Remove pipeline_state file copy files if necessary.' {log}"
-    shell("megahit " " --continue " " -o {output_dir} " " >> {snakemake.log[0]} 2>&1 ")
+    shell("megahit " " --continue " " -o {output_dir} " " {log}")

Also applies to: 61-61, 69-69, 72-72

🧰 Tools
🪛 Ruff (0.11.9)

29-29: Undefined name snakemake

(F821)

🧹 Nitpick comments (2)
bio/megahit/wrapper.py (2)

4-4: Update the copyright year to current year.

The copyright year is set to 2025, which is in the future. Consider changing it to the current year (2024).

-__copyright__ = "Copyright 2025, Jie Zhu"
+__copyright__ = "Copyright 2024, Jie Zhu"

69-69: Fix typo in restart message.

There's a typo in the restart message: "pipline_state" should be "pipeline_state".

-        "echo '\n\nRestart MEGAHIT \n Remove pipline_state file copy files to force copy files if necessary.' >> {log[0]}"
+        "echo '\n\nRestart MEGAHIT \n Remove pipeline_state file copy files to force copy files if necessary.' >> {log[0]}"
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b5bafeb and 2da4319.

📒 Files selected for processing (4)
  • bio/megahit/meta.yaml (1 hunks)
  • bio/megahit/test/Snakefile (1 hunks)
  • bio/megahit/wrapper.py (1 hunks)
  • test_wrappers.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • bio/megahit/meta.yaml
  • bio/megahit/test/Snakefile
  • test_wrappers.py
🧰 Additional context used
📓 Path-based instructions (2)
`**/*.py`: Do not try to improve formatting. Do not suggest type annotations for functions that are defined inside of functions or methods. Do not suggest type annotation of the `s...

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

  • bio/megahit/wrapper.py
`**/wrapper.py`: Do not complain about use of undefined variable called `snakemake`.

**/wrapper.py: Do not complain about use of undefined variable called snakemake.

  • bio/megahit/wrapper.py
🪛 Ruff (0.11.9)
bio/megahit/wrapper.py

14-14: Undefined name snakemake

(F821)


15-15: Undefined name snakemake

(F821)


19-19: Undefined name snakemake

(F821)


20-20: Undefined name snakemake

(F821)


22-22: Undefined name snakemake

(F821)


28-28: Undefined name snakemake

(F821)


29-29: Undefined name snakemake

(F821)


31-31: Undefined name snakemake

(F821)


32-32: Undefined name snakemake

(F821)


39-39: Undefined name snakemake

(F821)


40-40: Undefined name snakemake

(F821)


42-42: Undefined name snakemake

(F821)


82-82: Undefined name snakemake

(F821)


85-85: Undefined name snakemake

(F821)


92-92: Undefined name snakemake

(F821)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Summary
🔇 Additional comments (1)
bio/megahit/wrapper.py (1)

60-60:

❌ Incorrect review comment

Add k-mer list parameter to MEGAHIT command.

MEGAHIT typically requires a k-mer list parameter (--k-list), which is missing from the command. The variable klist is referenced in the AI summary but not defined or used in the code.

+    # Define k-mer list, use default if not specified
+    klist = snakemake.params.get("k_list", "21,29,39,59,79,99,119,141")
+
     shell(
         "megahit "
         " -t {snakemake.threads} "
         " -m {memory_requirements} "
         " -o {output_dir} "
+        " --k-list {klist} "
         " {input_arg} "
         " {extra} "
         " > {snakemake.log[0]} 2>&1 "
     )

Please verify if MEGAHIT requires the k-mer list parameter:


🌐 Web query:

Does MEGAHIT require a k-mer list parameter?

💡 Result:

MEGAHIT does not require a k-mer list parameter as it has well-defined default settings. The assembler provides multiple ways to specify k-mer sizes according to your needs, but if you don't specify them, it will use its built-in default values.

K-mer Configuration Options in MEGAHIT

MEGAHIT offers several options for controlling k-mer sizes:

  1. Default k-mer settings: If no k-mer parameters are specified, MEGAHIT will use a default list of k-mer sizes: 21, 29, 39, 59, 79, 99, 119, 141[7].

  2. Direct k-mer list specification: You can explicitly set the k-mer sizes using the --k-list parameter with a comma-separated list of values. For example: --k-list 21,33,55,77,99,121[4]. All values must be odd numbers in the range of 15-255, with increments ≤ 28[5][7].

  3. Range-based k-mer specification: Alternatively, you can define a range using three parameters:

    • --k-min: Minimum k-mer size (≤ 255), must be an odd number, default is 21[4][5]
    • --k-max: Maximum k-mer size (≤ 255), must be an odd number, default is 141[4][7]
    • --k-step: Increment between k-mer sizes (≤ 28), must be an even number, default is 12[7]
  4. Preset configurations: MEGAHIT also provides predefined parameter presets for different types of data:

    • meta: --min-count 2 --k-list 21,41,61,81,99 (for generic metagenomes, default)
    • meta-sensitive: --min-count 2 --k-list 21,31,41,51,61,71,81,91,99 (more sensitive but slower)
    • meta-large: --min-count 2 --k-list 27,37,47,57,67,77,87 (for large & complex metagenomes)
    • bulk: --min-count 3 --k-list 31,51,71,91,99 --no-mercy (for standard bulk sequencing with ≥ 30x depth)
    • single-cell: --min-count 3 --k-list 21,33,55,77,99,121 --merge_level 20,0.96 (for single cell data)[4]

MEGAHIT uses a multiple k-mer strategy, which is part of what makes it efficient for metagenomic assembly[2]. By iterating through different k-mer sizes, it can better resolve complex genomic regions and repeats[10].

Note that when selecting k-mer sizes, it's generally recommended to use values that are approximately 1/2 to 2/3 of the read length, though this may vary depending on your specific dataset[9].

Citations:


No explicit k-list parameter required

MEGAHIT will automatically use its built-in default k-mer list (21,29,39,59,79,99,119,141) when no --k-list is supplied. Therefore, you can leave the command in bio/megahit/wrapper.py as-is. Adding an explicit --k-list is only necessary if you need to override the defaults via snakemake.params.k_list.

Likely an incorrect or invalid review comment.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (6)
bio/megahit/wrapper.py (6)

22-22: Add error handling for empty output.

This code assumes snakemake.output has at least one element, which could cause an IndexError if it's empty.

-        output_file = snakemake.output[0]
+        if not snakemake.output:
+            raise ValueError("No output files specified.")
+        output_file = snakemake.output[0]
🧰 Tools
🪛 Ruff (0.11.9)

22-22: Undefined name snakemake

(F821)


29-29: Use snakemake.log_fmt_shell in your shell calls instead of manual redirections.

The log formatting helper is defined but not used in the subsequent shell commands.

The log variable should be embedded in shell commands rather than using manual redirections like > {snakemake.log[0]} 2>&1.

🧰 Tools
🪛 Ruff (0.11.9)

29-29: Undefined name snakemake

(F821)


31-35: Improve memory handling.

MEGAHIT expects memory in GB or a fraction of total memory, but the current code:

  1. Converts MB to bytes without proper unit specification
  2. Uses "0.9" without units for the default case
 if hasattr(snakemake.resources, "mem_mb"):
-    mem_bp = snakemake.resources.mem_mb * 1000000
-    memory_requirements = f"{mem_bp}"
+    # Convert MB to GB
+    mem_gb = snakemake.resources.mem_mb / 1024
+    memory_requirements = f"{mem_gb:.2f}"
 else:
-    memory_requirements = "0.9"
+    memory_requirements = "0.9"  # Fraction of total memory
🧰 Tools
🪛 Ruff (0.11.9)

31-31: Undefined name snakemake

(F821)


32-32: Undefined name snakemake

(F821)


73-73: Use the log variable for output redirection.

Replace the manual redirection with the log variable to ensure consistent logging.

-" > {snakemake.log[0]} 2>&1 "
+" {log} "

81-81: Fix log reference in restart message.

There's an inconsistency in how log is referenced. The restart message uses {log[0]} but the MEGAHIT command uses {snakemake.log[0]}.

-"echo '\n\nRestart MEGAHIT \n Remove pipline_state file copy files to force copy files if necessary.' >> {log[0]}"
+"echo '\n\nRestart MEGAHIT \n Remove pipline_state file copy files to force copy files if necessary.' {log}"

84-84: Use the log variable for output redirection.

Replace the manual redirection with the log variable to ensure consistent logging.

-" >> {snakemake.log[0]} 2>&1 "
+" {log} "
🧹 Nitpick comments (3)
bio/megahit/wrapper.py (3)

8-8: Use separate import statements for clarity.

For better readability and following Python style guidelines, use separate import statements for each module.

-import os, shutil
+import os
+import shutil

81-81: Fix typo in restart message.

There's a typo in the restart message: "pipline_state" should be "pipeline_state".

-"echo '\n\nRestart MEGAHIT \n Remove pipline_state file copy files to force copy files if necessary.' >> {log[0]}"
+"echo '\n\nRestart MEGAHIT \n Remove pipeline_state file copy files to force copy files if necessary.' >> {log[0]}"

88-90: Use lowercase for variable names to follow Python naming conventions.

Python conventions suggest using lowercase for variable names.

-Output_key_mapping = {
+output_key_mapping = {
     "contigs": "final.contigs.fa",  # MEGAHIT's output file name
 }

And update references to this variable at line 96:

-        file_produced = os.path.join(output_dir, Output_key_mapping[key])
+        file_produced = os.path.join(output_dir, output_key_mapping[key])
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2da4319 and e7f9723.

📒 Files selected for processing (3)
  • bio/megahit/meta.yaml (1 hunks)
  • bio/megahit/test/Snakefile (1 hunks)
  • bio/megahit/wrapper.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • bio/megahit/meta.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
  • bio/megahit/test/Snakefile
🧰 Additional context used
📓 Path-based instructions (2)
`**/*.py`: Do not try to improve formatting. Do not suggest type annotations for functions that are defined inside of functions or methods. Do not suggest type annotation of the `s...

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

  • bio/megahit/wrapper.py
`**/wrapper.py`: Do not complain about use of undefined variable called `snakemake`.

**/wrapper.py: Do not complain about use of undefined variable called snakemake.

  • bio/megahit/wrapper.py
🪛 Ruff (0.11.9)
bio/megahit/wrapper.py

14-14: Undefined name snakemake

(F821)


15-15: Undefined name snakemake

(F821)


19-19: Undefined name snakemake

(F821)


20-20: Undefined name snakemake

(F821)


22-22: Undefined name snakemake

(F821)


28-28: Undefined name snakemake

(F821)


29-29: Undefined name snakemake

(F821)


31-31: Undefined name snakemake

(F821)


32-32: Undefined name snakemake

(F821)


39-39: Undefined name snakemake

(F821)


40-40: Undefined name snakemake

(F821)


42-42: Undefined name snakemake

(F821)


47-47: Undefined name snakemake

(F821)


47-47: Undefined name snakemake

(F821)


48-48: Undefined name snakemake

(F821)


48-48: Undefined name snakemake

(F821)


53-53: Undefined name snakemake

(F821)


54-54: Undefined name snakemake

(F821)


55-55: Undefined name snakemake

(F821)


59-59: Undefined name snakemake

(F821)


60-60: Undefined name snakemake

(F821)


61-61: Undefined name snakemake

(F821)


94-94: Undefined name snakemake

(F821)


97-97: Undefined name snakemake

(F821)


104-104: Undefined name snakemake

(F821)

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: testing
  • GitHub Check: Summary

@alienzj alienzj requested a review from fgvieira May 19, 2025 03:19
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (4)
bio/megahit/wrapper.py (4)

21-21: 🛠️ Refactor suggestion

Use log variable in shell commands instead of manual redirection.

The log variable is correctly defined but not used in the shell commands, where manual redirection is still being used.

This issue was previously identified. The log variable should be used directly in the shell commands rather than manually redirecting stdout and stderr.

🧰 Tools
🪛 Ruff (0.11.9)

21-21: Undefined name snakemake

(F821)


59-59: 🛠️ Refactor suggestion

Add k-mer list parameter to MEGAHIT command.

The --k-list parameter is missing from the MEGAHIT command, which was identified in a previous review comment.

 " -m {memory_requirements} "
 " -o {output_temp_dir} "
+" --k-list {klist} "
 " {input_arg} "

You'll also need to define the klist variable before the command:

klist = snakemake.params.get("k_list", "21,29,39,59,79,99,119,141")

51-62: 🛠️ Refactor suggestion

Shell command needs to use the log variable.

The shell command uses manual redirection (> {snakemake.log[0]} 2>&1) instead of the log variable defined earlier.

 shell(
     "megahit "
     " -t {snakemake.threads} "
     " -m {memory_requirements} "
     " -o {output_temp_dir} "
     " {input_arg} "
     " {extra} "
-    " > {snakemake.log[0]} 2>&1 "
+    " {log} "
 )

12-18: 🛠️ Refactor suggestion

Add error handling for empty output list.

The code assumes that snakemake.output has at least one element. This could lead to an IndexError if the output list is empty.

 # get output_dir and files from output
+if not snakemake.output:
+    raise ValueError("No output files specified.")
 output_dir = os.path.split(snakemake.output[0])[0]
 contigs_file = snakemake.output.get("contigs", os.path.join(output_dir, "contigs.fa"))
 contigs_file_original = os.path.join(output_dir, "final.contigs.fa")
 options_file = snakemake.output.get("options", os.path.join(output_dir, "options.json"))
 log_file = snakemake.output.get("log", os.path.join(output_dir, "log"))
🧰 Tools
🪛 Ruff (0.11.9)

13-13: Undefined name snakemake

(F821)


14-14: Undefined name snakemake

(F821)


16-16: Undefined name snakemake

(F821)


17-17: Undefined name snakemake

(F821)

🧹 Nitpick comments (1)
bio/megahit/wrapper.py (1)

25-28: Consider using a ternary operator for cleaner code.

The if-else block for reads assignment could be simplified for better readability.

-if hasattr(snakemake.input, "reads"):
-    reads = snakemake.input.reads
-else:
-    reads = snakemake.input
+reads = snakemake.input.reads if hasattr(snakemake.input, "reads") else snakemake.input
🧰 Tools
🪛 Ruff (0.11.9)

25-28: Use ternary operator reads = snakemake.input.reads if hasattr(snakemake.input, "reads") else snakemake.input instead of if-else-block

Replace if-else-block with reads = snakemake.input.reads if hasattr(snakemake.input, "reads") else snakemake.input

(SIM108)


25-25: Undefined name snakemake

(F821)


26-26: Undefined name snakemake

(F821)


28-28: Undefined name snakemake

(F821)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f95dcbb and bc89166.

📒 Files selected for processing (5)
  • bio/megahit/environment.yaml (1 hunks)
  • bio/megahit/meta.yaml (1 hunks)
  • bio/megahit/test/Snakefile (1 hunks)
  • bio/megahit/wrapper.py (1 hunks)
  • test_wrappers.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • bio/megahit/environment.yaml
  • test_wrappers.py
  • bio/megahit/test/Snakefile
🧰 Additional context used
📓 Path-based instructions (2)
`**/*.py`: Do not try to improve formatting. Do not suggest type annotations for functions that are defined inside of functions or methods. Do not suggest type annotation of the `s...

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

  • bio/megahit/wrapper.py
`**/wrapper.py`: Do not complain about use of undefined variable called `snakemake`.

**/wrapper.py: Do not complain about use of undefined variable called snakemake.

  • bio/megahit/wrapper.py
🧠 Learnings (1)
bio/megahit/wrapper.py (1)
Learnt from: johanneskoester
PR: snakemake/snakemake-wrappers#3478
File: bio/varlociraptor/estimate-alignment-properties/test/Snakefile:7-10
Timestamp: 2024-11-15T18:31:15.447Z
Learning: In the Snakemake wrappers repository, avoid suggesting refactoring that involves using `tempfile.gettempdir()` or changing output paths to temporary directories.
🪛 Ruff (0.11.9)
bio/megahit/wrapper.py

13-13: Undefined name snakemake

(F821)


14-14: Undefined name snakemake

(F821)


16-16: Undefined name snakemake

(F821)


17-17: Undefined name snakemake

(F821)


20-20: Undefined name snakemake

(F821)


21-21: Undefined name snakemake

(F821)


22-22: Undefined name snakemake

(F821)


25-28: Use ternary operator reads = snakemake.input.reads if hasattr(snakemake.input, "reads") else snakemake.input instead of if-else-block

Replace if-else-block with reads = snakemake.input.reads if hasattr(snakemake.input, "reads") else snakemake.input

(SIM108)


25-25: Undefined name snakemake

(F821)


26-26: Undefined name snakemake

(F821)


28-28: Undefined name snakemake

(F821)


33-33: Undefined name snakemake

(F821)


33-33: Undefined name snakemake

(F821)


34-34: Undefined name snakemake

(F821)


34-34: Undefined name snakemake

(F821)


39-39: Undefined name snakemake

(F821)


40-40: Undefined name snakemake

(F821)


41-41: Undefined name snakemake

(F821)


45-45: Undefined name snakemake

(F821)


46-46: Undefined name snakemake

(F821)


47-47: Undefined name snakemake

(F821)

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: testing
  • GitHub Check: Summary
🔇 Additional comments (10)
bio/megahit/meta.yaml (5)

2-3: The URL formatting matches what was suggested earlier.

The GitHub repository URL is correctly provided, which is important for users to find documentation and source code.


5-7: Comprehensive description that accurately reflects MEGAHIT's capabilities.

The description clearly explains MEGAHIT's primary purpose (metagenome assembly) while also noting its applicability to other scenarios (single genome and single-cell assembly).


9-14: Input specification is well-structured and comprehensive.

The input section properly defines all supported input types including lists of reads, paired-end reads (r1, r2), interleaved reads, and unpaired reads, which aligns with MEGAHIT's capabilities.


16-19: Output specification matches wrapper implementation.

The output specification correctly lists the three main output files produced by MEGAHIT (contigs, log, and json), which corresponds to the file handling in the wrapper implementation.


21-23: Authors are correctly listed.

The authors section properly credits both the original contributor and additional contributor, which matches the previous suggestion in the review comments.

bio/megahit/wrapper.py (5)

8-10: Import statement is correct and includes necessary utilities.

The imports include the proper Snakemake shell functionality and memory utility function, which was previously suggested in review comments.


22-22: Memory handling is now correctly implemented.

The memory requirement calculation correctly uses the get_mem utility with appropriate unit conversion for MEGAHIT's expectations.

🧰 Tools
🪛 Ruff (0.11.9)

22-22: Undefined name snakemake

(F821)


32-48: Input handling is now flexible and comprehensive.

The code now properly handles different types of input reads with explicit checks for named inputs, which is more flexible than the previous positional approach.

🧰 Tools
🪛 Ruff (0.11.9)

33-33: Undefined name snakemake

(F821)


33-33: Undefined name snakemake

(F821)


34-34: Undefined name snakemake

(F821)


34-34: Undefined name snakemake

(F821)


39-39: Undefined name snakemake

(F821)


40-40: Undefined name snakemake

(F821)


41-41: Undefined name snakemake

(F821)


45-45: Undefined name snakemake

(F821)


46-46: Undefined name snakemake

(F821)


47-47: Undefined name snakemake

(F821)


51-52: Using a temporary directory as recommended.

Using a temporary directory in the same parent directory as the output is a good practice that avoids potential issues with space limitations in the system's default temporary directory.


64-74: Output file handling is correct.

The code correctly checks for the presence of the "done" file to confirm successful completion, removes any existing output directory, and moves the temporary directory and files to their final locations.

@alienzj alienzj requested a review from fgvieira May 22, 2025 06:20
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
bio/megahit/wrapper.py (2)

54-62: ⚠️ Potential issue

MEGAHIT command is missing k-mer list parameter and should use the log variable.

The MEGAHIT command is missing the k-mer list parameter and doesn't use the log variable for redirection.

 shell(
     "megahit "
     " -t {snakemake.threads} "
     " -m {memory_requirements} "
     " -o {output_temp_dir} "
+    " --k-list {klist} "
     " {input_arg} "
     " {extra} "
-    " > {snakemake.log[0]} 2>&1 "
+    " {log} "
 )

And add this before the shell call:

klist = snakemake.params.get("k_list", "21,29,39,59,79,99,119,141")

13-18: 🛠️ Refactor suggestion

Validate output handling and consider default file management.

The output directory and file paths are properly derived from snakemake.output, with appropriate fallbacks. However, there's no handling for when snakemake.output is empty.

-output_dir = os.path.split(snakemake.output[0])[0]
+if not snakemake.output:
+    raise ValueError("No output files specified for MEGAHIT.")
+output_dir = os.path.split(snakemake.output[0])[0]

Also note that contigs_file_original should use "final.contigs.fa" to match MEGAHIT's actual output filename.

🧰 Tools
🪛 Ruff (0.11.9)

13-13: Undefined name snakemake

(F821)


14-14: Undefined name snakemake

(F821)


16-16: Undefined name snakemake

(F821)


17-17: Undefined name snakemake

(F821)

🧹 Nitpick comments (1)
bio/megahit/wrapper.py (1)

68-74: File moving logic looks good but could be more robust.

The code correctly checks that all expected files exist before moving the contigs file. However, you should ensure that the destination directory exists before moving files.

 if (
     os.path.exists(contigs_file_original)
     and os.path.exists(options_file)
     and os.path.exists(log_file)
 ):
+    os.makedirs(os.path.dirname(contigs_file), exist_ok=True)
     shutil.move(contigs_file_original, contigs_file)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bc89166 and c5f43a0.

📒 Files selected for processing (5)
  • bio/megahit/environment.yaml (1 hunks)
  • bio/megahit/meta.yaml (1 hunks)
  • bio/megahit/test/Snakefile (1 hunks)
  • bio/megahit/wrapper.py (1 hunks)
  • test_wrappers.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • test_wrappers.py
  • bio/megahit/environment.yaml
  • bio/megahit/test/Snakefile
  • bio/megahit/meta.yaml
🧰 Additional context used
📓 Path-based instructions (2)
`**/*.py`: Do not try to improve formatting. Do not suggest type annotations for functions that are defined inside of functions or methods. Do not suggest type annotation of the `s...

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

  • bio/megahit/wrapper.py
`**/wrapper.py`: Do not complain about use of undefined variable called `snakemake`.

**/wrapper.py: Do not complain about use of undefined variable called snakemake.

  • bio/megahit/wrapper.py
🧠 Learnings (1)
bio/megahit/wrapper.py (1)
Learnt from: johanneskoester
PR: snakemake/snakemake-wrappers#3478
File: bio/varlociraptor/estimate-alignment-properties/test/Snakefile:7-10
Timestamp: 2024-11-15T18:31:15.447Z
Learning: In the Snakemake wrappers repository, avoid suggesting refactoring that involves using `tempfile.gettempdir()` or changing output paths to temporary directories.
🪛 Ruff (0.11.9)
bio/megahit/wrapper.py

13-13: Undefined name snakemake

(F821)


14-14: Undefined name snakemake

(F821)


16-16: Undefined name snakemake

(F821)


17-17: Undefined name snakemake

(F821)


20-20: Undefined name snakemake

(F821)


21-21: Undefined name snakemake

(F821)


22-22: Undefined name snakemake

(F821)


25-28: Use ternary operator reads = snakemake.input.reads if hasattr(snakemake.input, "reads") else snakemake.input instead of if-else-block

Replace if-else-block with reads = snakemake.input.reads if hasattr(snakemake.input, "reads") else snakemake.input

(SIM108)


25-25: Undefined name snakemake

(F821)


26-26: Undefined name snakemake

(F821)


28-28: Undefined name snakemake

(F821)


33-33: Undefined name snakemake

(F821)


33-33: Undefined name snakemake

(F821)


34-34: Undefined name snakemake

(F821)


34-34: Undefined name snakemake

(F821)


39-39: Undefined name snakemake

(F821)


40-40: Undefined name snakemake

(F821)


41-41: Undefined name snakemake

(F821)


45-45: Undefined name snakemake

(F821)


46-46: Undefined name snakemake

(F821)


47-47: Undefined name snakemake

(F821)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Summary
🔇 Additional comments (3)
bio/megahit/wrapper.py (3)

8-10: Correctly import and use Snakemake wrapper utilities.

The imports look good overall, with the appropriate utility functions imported from snakemake_wrapper_utils.snakemake.


24-49: Good handling of input read configurations.

The input read handling is well-structured, supporting both named inputs (r1, r2, interleaved, unpaired) and positional inputs with appropriate fallbacks. This approach provides good flexibility for different input configurations.

🧰 Tools
🪛 Ruff (0.11.9)

25-28: Use ternary operator reads = snakemake.input.reads if hasattr(snakemake.input, "reads") else snakemake.input instead of if-else-block

Replace if-else-block with reads = snakemake.input.reads if hasattr(snakemake.input, "reads") else snakemake.input

(SIM108)


25-25: Undefined name snakemake

(F821)


26-26: Undefined name snakemake

(F821)


28-28: Undefined name snakemake

(F821)


33-33: Undefined name snakemake

(F821)


33-33: Undefined name snakemake

(F821)


34-34: Undefined name snakemake

(F821)


34-34: Undefined name snakemake

(F821)


39-39: Undefined name snakemake

(F821)


40-40: Undefined name snakemake

(F821)


41-41: Undefined name snakemake

(F821)


45-45: Undefined name snakemake

(F821)


46-46: Undefined name snakemake

(F821)


47-47: Undefined name snakemake

(F821)


51-53: Good use of temporary directory for atomic output.

Using a temporary directory in the same parent directory as the output is a good approach for ensuring atomic operations and avoiding partial results if the assembly fails.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🔭 Outside diff range comments (1)
bio/megahit/wrapper.py (1)

68-74: 🛠️ Refactor suggestion

Fix final output file move logic.

The current code only moves the contigs file if all three output files exist, but this may be too restrictive. Each output file should be handled separately.

-        if (
-            os.path.exists(contigs_file_original)
-            and os.path.exists(options_file)
-            and os.path.exists(log_file)
-        ):
-            shutil.move(contigs_file_original, contigs_file)
+        # Move output files if they exist and are different from the default locations
+        if os.path.exists(contigs_file_original) and contigs_file_original != contigs_file:
+            shutil.move(contigs_file_original, contigs_file)
+
+        # If options file is explicitly specified and different from default
+        if options_file != os.path.join(output_dir, "options.json") and os.path.exists(os.path.join(output_dir, "options.json")):
+            shutil.move(os.path.join(output_dir, "options.json"), options_file)
+
+        # If log file is explicitly specified and different from default
+        if log_file != os.path.join(output_dir, "log") and os.path.exists(os.path.join(output_dir, "log")):
+            shutil.move(os.path.join(output_dir, "log"), log_file)
♻️ Duplicate comments (4)
bio/megahit/wrapper.py (4)

51-62: Add error handling for missing k-mer list parameter in MEGAHIT command.

The MEGAHIT command is missing the --k-list parameter as noted earlier, and there's no error handling if the command fails.


64-67: 🛠️ Refactor suggestion

Add error handling if MEGAHIT fails.

The code checks if the "done" file exists but does not handle the case where MEGAHIT fails and the file doesn't exist.

 if os.path.exists(os.path.join(output_temp_dir, "done")):
     shell("rm -rf {output_dir}")
     shutil.move(output_temp_dir, output_dir)
+else:
+    raise ValueError("MEGAHIT assembly failed. Check the log file for details.")

20-22: ⚠️ Potential issue

Missing k-mer list parameter.

MEGAHIT requires a k-mer list parameter, but it's not being extracted from parameters nor provided in the command.

 # parse params
 extra = snakemake.params.get("extra", "")
+klist = snakemake.params.get("k_list", "21,29,39,59,79,99,119,141")
 log = snakemake.log_fmt_shell(stdout=True, stderr=True)
 memory_requirements = get_mem(snakemake, out_unit="KiB") * 1024

And then add the parameter to the MEGAHIT command at line 58:

 " -m {memory_requirements} "
 " -o {output_temp_dir} "
+" --k-list {klist} "
 " {input_arg} "
🧰 Tools
🪛 Ruff (0.11.9)

20-20: Undefined name snakemake

(F821)


21-21: Undefined name snakemake

(F821)


22-22: Undefined name snakemake

(F821)


12-17: 🛠️ Refactor suggestion

Add error handling for empty output list.

The code assumes that snakemake.output has at least one element without verification. If it's empty, line 13 will cause an IndexError.

 # get output_dir and files from output
+if not snakemake.output:
+    raise ValueError("No output files specified in the Snakemake rule.")
 output_dir = os.path.split(snakemake.output[0])[0]
 contigs_file = snakemake.output.get("contigs", os.path.join(output_dir, "contigs.fa"))
 contigs_file_original = os.path.join(output_dir, "final.contigs.fa")
 options_file = snakemake.output.get("options", os.path.join(output_dir, "options.json"))
 log_file = snakemake.output.get("log", os.path.join(output_dir, "log"))
🧰 Tools
🪛 Ruff (0.11.9)

13-13: Undefined name snakemake

(F821)


14-14: Undefined name snakemake

(F821)


16-16: Undefined name snakemake

(F821)


17-17: Undefined name snakemake

(F821)

🧹 Nitpick comments (1)
bio/megahit/wrapper.py (1)

8-10: Imports can be improved for readability and efficiency.

The imports could be reorganized to follow standard Python style: separate standard library imports from third-party imports, and use separate lines for better readability.

-import os, tempfile, shutil
-from snakemake.shell import shell
-from snakemake_wrapper_utils.snakemake import get_mem
+import os
+import shutil
+import tempfile
+
+from snakemake.shell import shell
+from snakemake_wrapper_utils.snakemake import get_mem
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c5f43a0 and f0b0dc0.

📒 Files selected for processing (2)
  • bio/megahit/wrapper.py (1 hunks)
  • test_wrappers.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • test_wrappers.py
🧰 Additional context used
📓 Path-based instructions (2)
`**/*.py`: Do not try to improve formatting. Do not suggest type annotations for functions that are defined inside of functions or methods. Do not suggest type annotation of the `s...

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

  • bio/megahit/wrapper.py
`**/wrapper.py`: Do not complain about use of undefined variable called `snakemake`.

**/wrapper.py: Do not complain about use of undefined variable called snakemake.

  • bio/megahit/wrapper.py
🧠 Learnings (1)
bio/megahit/wrapper.py (1)
Learnt from: johanneskoester
PR: snakemake/snakemake-wrappers#3478
File: bio/varlociraptor/estimate-alignment-properties/test/Snakefile:7-10
Timestamp: 2024-11-15T18:31:15.447Z
Learning: In the Snakemake wrappers repository, avoid suggesting refactoring that involves using `tempfile.gettempdir()` or changing output paths to temporary directories.
🪛 Ruff (0.11.9)
bio/megahit/wrapper.py

13-13: Undefined name snakemake

(F821)


14-14: Undefined name snakemake

(F821)


16-16: Undefined name snakemake

(F821)


17-17: Undefined name snakemake

(F821)


20-20: Undefined name snakemake

(F821)


21-21: Undefined name snakemake

(F821)


22-22: Undefined name snakemake

(F821)


25-28: Use ternary operator reads = snakemake.input.reads if hasattr(snakemake.input, "reads") else snakemake.input instead of if-else-block

Replace if-else-block with reads = snakemake.input.reads if hasattr(snakemake.input, "reads") else snakemake.input

(SIM108)


25-25: Undefined name snakemake

(F821)


26-26: Undefined name snakemake

(F821)


28-28: Undefined name snakemake

(F821)


33-33: Undefined name snakemake

(F821)


33-33: Undefined name snakemake

(F821)


34-34: Undefined name snakemake

(F821)


34-34: Undefined name snakemake

(F821)


39-39: Undefined name snakemake

(F821)


40-40: Undefined name snakemake

(F821)


41-41: Undefined name snakemake

(F821)


45-45: Undefined name snakemake

(F821)


46-46: Undefined name snakemake

(F821)


47-47: Undefined name snakemake

(F821)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Summary

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
bio/megahit/wrapper.py (3)

64-67: 🛠️ Refactor suggestion

Add error handling for MEGAHIT failure.

The code properly checks if the "done" file exists, but doesn't handle the case where MEGAHIT fails to complete successfully.

 if os.path.exists(os.path.join(output_temp_dir, "done")):
     shell("rm -rf {output_dir}")
     shutil.move(output_temp_dir, output_dir)
+else:
+    raise ValueError("MEGAHIT assembly failed. Check the log file for details.")

30-48: 🛠️ Refactor suggestion

Add validation for empty input arguments.

The code handles different input types well, but doesn't verify if any input reads were provided at all, which could result in an invalid MEGAHIT command.

Add this check after processing all possible inputs:

 # handle additional reads if specified
 if hasattr(snakemake.input, "unpaired"):
     input_arg += " --read {} ".format(snakemake.input.unpaired)
 elif len(reads) >= 4 and not hasattr(snakemake.input, "r1"):
     input_arg += " --read {} ".format(reads[3])

+# Ensure we have at least one input
+if not input_arg.strip():
+    raise ValueError("No input reads were provided. MEGAHIT requires at least one input file.")
🧰 Tools
🪛 Ruff (0.11.9)

33-33: Undefined name snakemake

(F821)


33-33: Undefined name snakemake

(F821)


34-34: Undefined name snakemake

(F821)


34-34: Undefined name snakemake

(F821)


39-39: Undefined name snakemake

(F821)


40-40: Undefined name snakemake

(F821)


41-41: Undefined name snakemake

(F821)


45-45: Undefined name snakemake

(F821)


46-46: Undefined name snakemake

(F821)


47-47: Undefined name snakemake

(F821)


13-17: 🛠️ Refactor suggestion

Fix inconsistency in contigs file name.

There's an inconsistency between the default contigs_file name ("contigs.fa") and MEGAHIT's actual output file name ("final.contigs.fa") which could cause confusion or errors.

-contigs_file = snakemake.output.get("contigs", os.path.join(output_dir, "contigs.fa"))
+contigs_file = snakemake.output.get("contigs", os.path.join(output_dir, "final.contigs.fa"))
 contigs_file_original = os.path.join(output_dir, "final.contigs.fa")
🧰 Tools
🪛 Ruff (0.11.9)

13-13: Undefined name snakemake

(F821)


14-14: Undefined name snakemake

(F821)


16-16: Undefined name snakemake

(F821)


17-17: Undefined name snakemake

(F821)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f0b0dc0 and 51c93f3.

📒 Files selected for processing (1)
  • bio/megahit/wrapper.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
`**/*.py`: Do not try to improve formatting. Do not suggest type annotations for functions that are defined inside of functions or methods. Do not suggest type annotation of the `s...

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

  • bio/megahit/wrapper.py
`**/wrapper.py`: Do not complain about use of undefined variable called `snakemake`.

**/wrapper.py: Do not complain about use of undefined variable called snakemake.

  • bio/megahit/wrapper.py
🧠 Learnings (1)
bio/megahit/wrapper.py (1)
Learnt from: johanneskoester
PR: snakemake/snakemake-wrappers#3478
File: bio/varlociraptor/estimate-alignment-properties/test/Snakefile:7-10
Timestamp: 2024-11-15T18:31:15.447Z
Learning: In the Snakemake wrappers repository, avoid suggesting refactoring that involves using `tempfile.gettempdir()` or changing output paths to temporary directories.
🪛 Ruff (0.11.9)
bio/megahit/wrapper.py

13-13: Undefined name snakemake

(F821)


14-14: Undefined name snakemake

(F821)


16-16: Undefined name snakemake

(F821)


17-17: Undefined name snakemake

(F821)


20-20: Undefined name snakemake

(F821)


21-21: Undefined name snakemake

(F821)


22-22: Undefined name snakemake

(F821)


25-28: Use ternary operator reads = snakemake.input.reads if hasattr(snakemake.input, "reads") else snakemake.input instead of if-else-block

Replace if-else-block with reads = snakemake.input.reads if hasattr(snakemake.input, "reads") else snakemake.input

(SIM108)


25-25: Undefined name snakemake

(F821)


26-26: Undefined name snakemake

(F821)


28-28: Undefined name snakemake

(F821)


33-33: Undefined name snakemake

(F821)


33-33: Undefined name snakemake

(F821)


34-34: Undefined name snakemake

(F821)


34-34: Undefined name snakemake

(F821)


39-39: Undefined name snakemake

(F821)


40-40: Undefined name snakemake

(F821)


41-41: Undefined name snakemake

(F821)


45-45: Undefined name snakemake

(F821)


46-46: Undefined name snakemake

(F821)


47-47: Undefined name snakemake

(F821)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: linting
  • GitHub Check: docs
  • GitHub Check: Summary
🔇 Additional comments (4)
bio/megahit/wrapper.py (4)

8-10: Good use of standard libraries and wrapper utilities.

The imports are well-organized, including essential modules for file operations and the appropriate Snakemake wrapper utilities. The use of get_mem from snakemake_wrapper_utils is a good practice for standardized memory handling.


20-22: Use the log variable in shell commands.

The log variable is correctly defined using snakemake.log_fmt_shell(), but it needs to be used consistently in all shell commands to properly capture stdout and stderr.

🧰 Tools
🪛 Ruff (0.11.9)

20-20: Undefined name snakemake

(F821)


21-21: Undefined name snakemake

(F821)


22-22: Undefined name snakemake

(F821)


51-53: Good use of temporary directory management.

Using Python's tempfile.TemporaryDirectory() in a context manager is an excellent practice for managing temporary files and ensuring proper cleanup.


68-74: Well-structured file verification and movement.

The code properly checks for the existence of all required files before attempting to move them, which is a good practice to prevent errors.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
bio/megahit/wrapper.py (2)

23-41: 🛠️ Refactor suggestion

Validate that input arguments are not empty.

The code correctly handles various input scenarios, but there's no validation to ensure input_arg isn't empty after processing all input sources. MEGAHIT requires at least one input source to run properly.

Add the following validation after the input processing:

elif len(reads) >= 4 and not hasattr(snakemake.input, "r1"):
    input_arg += " --read {} ".format(reads[3])

+# Ensure we have at least one input source
+if not input_arg.strip():
+    raise ValueError("No input reads were provided. MEGAHIT requires at least one input file.")
🧰 Tools
🪛 Ruff (0.11.9)

26-26: Undefined name snakemake

(F821)


26-26: Undefined name snakemake

(F821)


27-27: Undefined name snakemake

(F821)


27-27: Undefined name snakemake

(F821)


32-32: Undefined name snakemake

(F821)


33-33: Undefined name snakemake

(F821)


34-34: Undefined name snakemake

(F821)


38-38: Undefined name snakemake

(F821)


39-39: Undefined name snakemake

(F821)


40-40: Undefined name snakemake

(F821)


56-58: 🛠️ Refactor suggestion

Add error handling for MEGAHIT failure.

The code doesn't check if MEGAHIT completed successfully before proceeding to move output files.

Add validation to check for MEGAHIT's "done" file which indicates successful completion:

    )

+    # Check if MEGAHIT completed successfully
+    if not os.path.exists(os.path.join(output_temp_dir, "done")):
+        raise ValueError("MEGAHIT did not complete successfully. Check the log file for details.")

    # Ensure user can name each file according to their need
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 51c93f3 and b893f9a.

📒 Files selected for processing (2)
  • bio/megahit/test/Snakefile (1 hunks)
  • bio/megahit/wrapper.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • bio/megahit/test/Snakefile
🧰 Additional context used
📓 Path-based instructions (2)
`**/*.py`: Do not try to improve formatting. Do not suggest type annotations for functions that are defined inside of functions or methods. Do not suggest type annotation of the `s...

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

  • bio/megahit/wrapper.py
`**/wrapper.py`: Do not complain about use of undefined variable called `snakemake`.

**/wrapper.py: Do not complain about use of undefined variable called snakemake.

  • bio/megahit/wrapper.py
🧠 Learnings (1)
bio/megahit/wrapper.py (1)
Learnt from: johanneskoester
PR: snakemake/snakemake-wrappers#3478
File: bio/varlociraptor/estimate-alignment-properties/test/Snakefile:7-10
Timestamp: 2024-11-15T18:31:15.447Z
Learning: In the Snakemake wrappers repository, avoid suggesting refactoring that involves using `tempfile.gettempdir()` or changing output paths to temporary directories.
🪛 Ruff (0.11.9)
bio/megahit/wrapper.py

13-13: Undefined name snakemake

(F821)


14-14: Undefined name snakemake

(F821)


15-15: Undefined name snakemake

(F821)


18-21: Use ternary operator reads = snakemake.input.reads if hasattr(snakemake.input, "reads") else snakemake.input instead of if-else-block

Replace if-else-block with reads = snakemake.input.reads if hasattr(snakemake.input, "reads") else snakemake.input

(SIM108)


18-18: Undefined name snakemake

(F821)


19-19: Undefined name snakemake

(F821)


21-21: Undefined name snakemake

(F821)


26-26: Undefined name snakemake

(F821)


26-26: Undefined name snakemake

(F821)


27-27: Undefined name snakemake

(F821)


27-27: Undefined name snakemake

(F821)


32-32: Undefined name snakemake

(F821)


33-33: Undefined name snakemake

(F821)


34-34: Undefined name snakemake

(F821)


38-38: Undefined name snakemake

(F821)


39-39: Undefined name snakemake

(F821)


40-40: Undefined name snakemake

(F821)


44-44: Undefined name snakemake

(F821)


64-64: Loop control variable temp_file not used within loop body

Rename unused temp_file to _temp_file

(B007)


65-65: Undefined name snakemake

(F821)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: docs
  • GitHub Check: testing
  • GitHub Check: Summary
🔇 Additional comments (4)
bio/megahit/wrapper.py (4)

8-10: Good approach using snakemake_wrapper_utils.

Using the get_mem utility from snakemake_wrapper_utils ensures proper memory calculations, consistent with best practices for Snakemake wrappers.


13-15: Properly configured logging and memory handling.

The code correctly:

  • Uses append=True for the log formatting, important for capturing multiple shell commands
  • Properly converts memory to bytes as required by MEGAHIT
🧰 Tools
🪛 Ruff (0.11.9)

13-13: Undefined name snakemake

(F821)


14-14: Undefined name snakemake

(F821)


15-15: Undefined name snakemake

(F821)


45-46: Appropriate temporary directory placement.

Creating the temporary directory adjacent to the output directory is a good practice for metagenomic assembly, which often generates large intermediate files that might exceed capacity in the system's default temporary location.


58-67: Well-structured output file handling.

The mapping approach for output files is clean and flexible, allowing users to specify output locations for different MEGAHIT result files while providing sensible defaults.

🧰 Tools
🪛 Ruff (0.11.9)

64-64: Loop control variable temp_file not used within loop body

Rename unused temp_file to _temp_file

(B007)


65-65: Undefined name snakemake

(F821)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🔭 Outside diff range comments (1)
bio/megahit/wrapper.py (1)

57-67: 🛠️ Refactor suggestion

Add error handling for MEGAHIT failure.

The code moves output files from the temporary directory but doesn't check if MEGAHIT completed successfully. Consider adding validation before moving files.

    )

+    # Verify MEGAHIT completed successfully
+    megahit_done_file = os.path.join(output_temp_dir, "done")
+    if not os.path.exists(megahit_done_file):
+        raise ValueError("MEGAHIT failed to complete. Check the log file for details.")
+
    # Ensure user can name each file according to their need
    output_mapping = {

Also, temp_file is flagged as unused by the linter but is actually used in the shell command through string formatting. This is a false positive that can be safely ignored.

🧰 Tools
🪛 Ruff (0.11.9)

63-63: Loop control variable temp_file not used within loop body

Rename unused temp_file to _temp_file

(B007)


64-64: Undefined name snakemake

(F821)

♻️ Duplicate comments (1)
bio/megahit/wrapper.py (1)

23-41: 🛠️ Refactor suggestion

Add validation for empty input arguments.

The wrapper correctly handles different input types (paired, interleaved, and unpaired reads), but doesn't verify that at least one input was provided.

# handle additional reads if specified
if hasattr(snakemake.input, "unpaired"):
    input_arg += " --read {} ".format(snakemake.input.unpaired)
elif len(reads) >= 4 and not hasattr(snakemake.input, "r1"):
    input_arg += " --read {} ".format(reads[3])

+# Verify we have at least one input argument
+if not input_arg.strip():
+    raise ValueError("No input reads provided. MEGAHIT requires at least one input file.")
🧰 Tools
🪛 Ruff (0.11.9)

26-26: Undefined name snakemake

(F821)


26-26: Undefined name snakemake

(F821)


27-27: Undefined name snakemake

(F821)


27-27: Undefined name snakemake

(F821)


32-32: Undefined name snakemake

(F821)


33-33: Undefined name snakemake

(F821)


34-34: Undefined name snakemake

(F821)


38-38: Undefined name snakemake

(F821)


39-39: Undefined name snakemake

(F821)


40-40: Undefined name snakemake

(F821)

🧹 Nitpick comments (1)
bio/megahit/wrapper.py (1)

17-22: Consider using a ternary operator for conciseness.

While the current implementation works correctly, consider using a ternary operator for more concise code.

-if hasattr(snakemake.input, "reads"):
-    reads = snakemake.input.reads
-else:
-    reads = snakemake.input
+reads = snakemake.input.reads if hasattr(snakemake.input, "reads") else snakemake.input
🧰 Tools
🪛 Ruff (0.11.9)

18-21: Use ternary operator reads = snakemake.input.reads if hasattr(snakemake.input, "reads") else snakemake.input instead of if-else-block

Replace if-else-block with reads = snakemake.input.reads if hasattr(snakemake.input, "reads") else snakemake.input

(SIM108)


18-18: Undefined name snakemake

(F821)


19-19: Undefined name snakemake

(F821)


21-21: Undefined name snakemake

(F821)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b893f9a and 77a14a6.

📒 Files selected for processing (1)
  • bio/megahit/wrapper.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
`**/*.py`: Do not try to improve formatting. Do not suggest type annotations for functions that are defined inside of functions or methods. Do not suggest type annotation of the `s...

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

  • bio/megahit/wrapper.py
`**/wrapper.py`: Do not complain about use of undefined variable called `snakemake`.

**/wrapper.py: Do not complain about use of undefined variable called snakemake.

  • bio/megahit/wrapper.py
🧠 Learnings (1)
bio/megahit/wrapper.py (1)
Learnt from: johanneskoester
PR: snakemake/snakemake-wrappers#3478
File: bio/varlociraptor/estimate-alignment-properties/test/Snakefile:7-10
Timestamp: 2024-11-15T18:31:15.447Z
Learning: In the Snakemake wrappers repository, avoid suggesting refactoring that involves using `tempfile.gettempdir()` or changing output paths to temporary directories.
🪛 Ruff (0.11.9)
bio/megahit/wrapper.py

13-13: Undefined name snakemake

(F821)


14-14: Undefined name snakemake

(F821)


15-15: Undefined name snakemake

(F821)


18-21: Use ternary operator reads = snakemake.input.reads if hasattr(snakemake.input, "reads") else snakemake.input instead of if-else-block

Replace if-else-block with reads = snakemake.input.reads if hasattr(snakemake.input, "reads") else snakemake.input

(SIM108)


18-18: Undefined name snakemake

(F821)


19-19: Undefined name snakemake

(F821)


21-21: Undefined name snakemake

(F821)


26-26: Undefined name snakemake

(F821)


26-26: Undefined name snakemake

(F821)


27-27: Undefined name snakemake

(F821)


27-27: Undefined name snakemake

(F821)


32-32: Undefined name snakemake

(F821)


33-33: Undefined name snakemake

(F821)


34-34: Undefined name snakemake

(F821)


38-38: Undefined name snakemake

(F821)


39-39: Undefined name snakemake

(F821)


40-40: Undefined name snakemake

(F821)


63-63: Loop control variable temp_file not used within loop body

Rename unused temp_file to _temp_file

(B007)


64-64: Undefined name snakemake

(F821)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: docs
  • GitHub Check: testing
  • GitHub Check: Summary
🔇 Additional comments (4)
bio/megahit/wrapper.py (4)

1-7: LGTM! Well-structured file header.

The file header contains all necessary metadata including author information, copyright, email, and license.


8-10: LGTM! Proper imports.

The imports look correct, bringing in the necessary modules for file operations, shell execution, and memory resource handling.


12-15: Excellent memory handling implementation.

The wrapper correctly:

  1. Uses the utility function get_mem to handle memory requirements
  2. Applies proper unit conversion (KiB * 1024 = bytes)
  3. Uses Snakemake's log formatting with append=True

This addresses previous suggestions about memory handling.

🧰 Tools
🪛 Ruff (0.11.9)

13-13: Undefined name snakemake

(F821)


14-14: Undefined name snakemake

(F821)


15-15: Undefined name snakemake

(F821)


43-55: LGTM! Proper temporary directory handling and MEGAHIT execution.

The implementation:

  1. Uses Python's tempfile.TemporaryDirectory() for safe temporary storage
  2. Properly sets all MEGAHIT parameters (-t for threads, -m for memory, etc.)
  3. Uses the log formatter correctly

This addresses previous suggestions regarding temporary directory and log handling.

@alienzj alienzj requested a review from fgvieira May 23, 2025 01:31
@fgvieira
Copy link
Collaborator

Great work @alienzj !

Can you also add the environment.linux-64.pin.txt file? You can generate it by running snakedeploy pin-conda-envs environment.yaml.

@alienzj
Copy link
Contributor Author

alienzj commented May 23, 2025

Thanks. Added.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
bio/megahit/wrapper.py (1)

21-40: 🛠️ Refactor suggestion

Add input validation to prevent empty input arguments.

The input handling logic is comprehensive, but there's no validation to ensure that input_arg contains valid input after processing all possible input sources. If no valid inputs are found, MEGAHIT will fail with an unclear error.

Add validation after the input processing:

 elif len(reads) >= 4 and not hasattr(snakemake.input, "r1"):
     input_arg += f" --read {reads[3]}"

+# Ensure we have at least one input
+if not input_arg.strip():
+    raise ValueError("No input reads were provided. MEGAHIT requires at least one input file.")
🧰 Tools
🪛 Ruff (0.11.9)

24-24: Undefined name snakemake

(F821)


24-24: Undefined name snakemake

(F821)


25-25: Undefined name snakemake

(F821)


25-25: Undefined name snakemake

(F821)


30-30: Undefined name snakemake

(F821)


31-31: Undefined name snakemake

(F821)


32-32: Undefined name snakemake

(F821)


36-36: Undefined name snakemake

(F821)


37-37: Undefined name snakemake

(F821)


38-38: Undefined name snakemake

(F821)

🧹 Nitpick comments (1)
bio/megahit/wrapper.py (1)

55-64: Fix unused loop variable for better code clarity.

The output handling logic is sound and supports flexible file naming. However, the loop variable temp_file is not used within the loop body since the file path is constructed within the dictionary.

Consider restructuring to make the code clearer:

     # Ensure user can name each file according to their need
-    output_mapping = {
-        "contigs": f"{output_tmpdir}/final.contigs.fa",
-        "json": f"{output_tmpdir}/options.json",
-        "log": f"{output_tmpdir}/log",
-    }
-    for output_key, temp_file in output_mapping.items():
+    output_files = {
+        "contigs": "final.contigs.fa",
+        "json": "options.json", 
+        "log": "log",
+    }
+    for output_key, filename in output_files.items():
         output_path = snakemake.output.get(output_key)
         if output_path:
-            shell("cp --verbose {temp_file:q} {output_path:q} {log}")
+            temp_file = f"{output_tmpdir}/{filename}"
+            shell("cp --verbose {temp_file:q} {output_path:q} {log}")
🧰 Tools
🪛 Ruff (0.11.9)

61-61: Loop control variable temp_file not used within loop body

Rename unused temp_file to _temp_file

(B007)


62-62: Undefined name snakemake

(F821)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 521e109 and e7a9174.

📒 Files selected for processing (1)
  • bio/megahit/wrapper.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
`**/*.py`: Do not try to improve formatting. Do not suggest type annotations for functions that are defined inside of functions or methods. Do not suggest type annotation of the `s...

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

  • bio/megahit/wrapper.py
`**/wrapper.py`: Do not complain about use of undefined variable called `snakemake`.

**/wrapper.py: Do not complain about use of undefined variable called snakemake.

  • bio/megahit/wrapper.py
🧠 Learnings (1)
bio/megahit/wrapper.py (1)
Learnt from: johanneskoester
PR: snakemake/snakemake-wrappers#3478
File: bio/varlociraptor/estimate-alignment-properties/test/Snakefile:7-10
Timestamp: 2024-11-15T18:31:15.447Z
Learning: In the Snakemake wrappers repository, avoid suggesting refactoring that involves using `tempfile.gettempdir()` or changing output paths to temporary directories.
🪛 Ruff (0.11.9)
bio/megahit/wrapper.py

14-14: Undefined name snakemake

(F821)


15-15: Undefined name snakemake

(F821)


16-16: Undefined name snakemake

(F821)


19-19: Undefined name snakemake

(F821)


19-19: Undefined name snakemake

(F821)


19-19: Undefined name snakemake

(F821)


24-24: Undefined name snakemake

(F821)


24-24: Undefined name snakemake

(F821)


25-25: Undefined name snakemake

(F821)


25-25: Undefined name snakemake

(F821)


30-30: Undefined name snakemake

(F821)


31-31: Undefined name snakemake

(F821)


32-32: Undefined name snakemake

(F821)


36-36: Undefined name snakemake

(F821)


37-37: Undefined name snakemake

(F821)


38-38: Undefined name snakemake

(F821)


61-61: Loop control variable temp_file not used within loop body

Rename unused temp_file to _temp_file

(B007)


62-62: Undefined name snakemake

(F821)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: docs
  • GitHub Check: testing
  • GitHub Check: Summary
🔇 Additional comments (2)
bio/megahit/wrapper.py (2)

13-17: Parameter parsing looks good.

The parameter parsing properly addresses previous review feedback:

  • Correct use of snakemake.log_fmt_shell() with append=True for logging
  • Proper memory conversion from KiB to bytes for MEGAHIT's -m parameter
  • Clean extraction of extra parameters
🧰 Tools
🪛 Ruff (0.11.9)

14-14: Undefined name snakemake

(F821)


15-15: Undefined name snakemake

(F821)


16-16: Undefined name snakemake

(F821)


41-53: MEGAHIT execution is well implemented.

The execution section properly addresses previous review feedback:

  • Correct use of tempfile.TemporaryDirectory() for temporary storage
  • Proper use of the log variable for output redirection (instead of manual > {snakemake.log[0]} 2>&1)
  • Clean shell command construction with all necessary parameters

@fgvieira fgvieira merged commit be71b0b into snakemake:master May 23, 2025
7 checks passed
@fgvieira
Copy link
Collaborator

Thanks again for contributing to the wrapper! 👍

johanneskoester pushed a commit that referenced this pull request Jun 11, 2025
🤖 I have created a release *beep* *boop*
---


##
[7.0.0](v6.2.0...v7.0.0)
(2025-06-11)


### ⚠ BREAKING CHANGES

* trimmomatic wrappers: remove parallel gzip, merge wrappers, added
docs, and code refactoring.
([#4158](#4158))

### Features

* add param for mem_overhead
([#4122](#4122))
([50f8acb](50f8acb))
* add wrapper for MEGAHIT
([#4121](#4121))
([be71b0b](be71b0b))
* Add wrapper for Tandem Repeat Finder (TRF) utility
([#4160](#4160))
([be31100](be31100))
* enable gzip output for bedtools wrappers
([#3642](#3642))
([fc6df02](fc6df02))


### Bug Fixes

* fix 4118
([#4119](#4119))
([00f7aaf](00f7aaf))
* GATK variantrecalibrator wrapper path
([#4126](#4126))
([965df4d](965df4d))
* remove hardcoded shrink type, and fix typos
([#4123](#4123))
([7832d9f](7832d9f))
* trimmomatic wrappers: remove parallel gzip, merge wrappers, added
docs, and code refactoring.
([#4158](#4158))
([c914d68](c914d68))


### Performance Improvements

* autobump bio/bamtools/filter
([#4127](#4127))
([a5f375c](a5f375c))
* autobump bio/bamtools/filter_json
([#4131](#4131))
([da6cc46](da6cc46))
* autobump bio/bamtools/split
([#4130](#4130))
([bcb48e4](bcb48e4))
* autobump bio/bamtools/stats
([#4128](#4128))
([cab52d7](cab52d7))
* autobump bio/bbtools
([#4137](#4137))
([095c7cc](095c7cc))
* autobump bio/bcftools/call
([#4168](#4168))
([49a7824](49a7824))
* autobump bio/bcftools/concat
([#4187](#4187))
([c9e2d7a](c9e2d7a))
* autobump bio/bcftools/filter
([#4197](#4197))
([34f008f](34f008f))
* autobump bio/bcftools/index
([#4192](#4192))
([b9959e9](b9959e9))
* autobump bio/bcftools/merge
([#4174](#4174))
([bbb3c6c](bbb3c6c))
* autobump bio/bcftools/mpileup
([#4178](#4178))
([0c8a6d1](0c8a6d1))
* autobump bio/bcftools/norm
([#4175](#4175))
([7305a0b](7305a0b))
* autobump bio/bcftools/reheader
([#4177](#4177))
([a468bca](a468bca))
* autobump bio/bcftools/sort
([#4183](#4183))
([250b4bc](250b4bc))
* autobump bio/bcftools/stats
([#4194](#4194))
([7525cf1](7525cf1))
* autobump bio/bcftools/view
([#4162](#4162))
([879af38](879af38))
* autobump bio/bedtools/bamtobed
([#4188](#4188))
([7ef5b5d](7ef5b5d))
* autobump bio/bedtools/complement
([#4189](#4189))
([fbea34a](fbea34a))
* autobump bio/bedtools/coveragebed
([#4191](#4191))
([38f8b93](38f8b93))
* autobump bio/bedtools/genomecov
([#4163](#4163))
([923dc25](923dc25))
* autobump bio/bedtools/intersect
([#4198](#4198))
([48042ac](48042ac))
* autobump bio/bedtools/merge
([#4169](#4169))
([46bb04e](46bb04e))
* autobump bio/bedtools/slop
([#4170](#4170))
([c314c49](c314c49))
* autobump bio/bedtools/sort
([#4173](#4173))
([e72641d](e72641d))
* autobump bio/bedtools/split
([#4185](#4185))
([f305aeb](f305aeb))
* autobump bio/bellerophon
([#4180](#4180))
([1064bf7](1064bf7))
* autobump bio/benchmark/chm-eval-sample
([#4179](#4179))
([8b3ecfd](8b3ecfd))
* autobump bio/bgzip
([#4166](#4166))
([a5dcdad](a5dcdad))
* autobump bio/bismark/bam2nuc
([#4181](#4181))
([f88f3f8](f88f3f8))
* autobump bio/bismark/bismark
([#4193](#4193))
([868363e](868363e))
* autobump bio/bismark/bismark_genome_preparation
([#4182](#4182))
([6177626](6177626))
* autobump bio/bismark/bismark_methylation_extractor
([#4165](#4165))
([e509dea](e509dea))
* autobump bio/bismark/bismark2bedGraph
([#4195](#4195))
([1d17cae](1d17cae))
* autobump bio/bismark/bismark2report
([#4196](#4196))
([e884149](e884149))
* autobump bio/bismark/bismark2summary
([#4190](#4190))
([85e4883](85e4883))
* autobump bio/bismark/deduplicate_bismark
([#4161](#4161))
([6de357f](6de357f))
* autobump bio/bowtie2/align
([#4186](#4186))
([36ec656](36ec656))
* autobump bio/bustools/count
([#4138](#4138))
([60fd30a](60fd30a))
* autobump bio/bustools/sort
([#4139](#4139))
([1f5cf71](1f5cf71))
* autobump bio/bustools/text
([#4140](#4140))
([b53856a](b53856a))
* autobump bio/bwa-mem2/mem
([#4172](#4172))
([f77bd3b](f77bd3b))
* autobump bio/bwa/mem
([#4184](#4184))
([2212b96](2212b96))
* autobump bio/bwa/sampe
([#4164](#4164))
([2463d6e](2463d6e))
* autobump bio/bwa/samse
([#4176](#4176))
([3f13ef9](3f13ef9))
* autobump bio/bwa/samxe
([#4171](#4171))
([244efbd](244efbd))
* autobump bio/bwameth/memx
([#4129](#4129))
([0c8667d](0c8667d))
* autobump bio/bwameth/memx
([#4167](#4167))
([978cd77](978cd77))
* autobump bio/cnvkit/export
([#4200](#4200))
([e4b3659](e4b3659))
* autobump bio/cutadapt/pe
([#4142](#4142))
([e764619](e764619))
* autobump bio/cutadapt/se
([#4141](#4141))
([735bfda](735bfda))
* autobump bio/deepvariant
([#4143](#4143))
([4ad6bc7](4ad6bc7))
* autobump bio/delly
([#4203](#4203))
([1bd71f5](1bd71f5))
* autobump bio/diamond/blastp
([#4205](#4205))
([0a4b009](0a4b009))
* autobump bio/diamond/blastx
([#4201](#4201))
([0b8cd8f](0b8cd8f))
* autobump bio/diamond/makedb
([#4202](#4202))
([1bcad8a](1bcad8a))
* autobump bio/entrez/efetch
([#4144](#4144))
([2d09ee0](2d09ee0))
* autobump bio/fastp
([#4145](#4145))
([1e38402](1e38402))
* autobump bio/fastp
([#4207](#4207))
([6cd7e64](6cd7e64))
* autobump bio/freebayes
([#4132](#4132))
([c9749da](c9749da))
* autobump bio/freebayes
([#4206](#4206))
([a2650e2](a2650e2))
* autobump bio/gatk/applybqsr
([#4208](#4208))
([647384e](647384e))
* autobump bio/gatk/applybqsrspark
([#4209](#4209))
([df09bb5](df09bb5))
* autobump bio/gdc-api/bam-slicing
([#4146](#4146))
([68cbc01](68cbc01))
* autobump bio/gdc-api/bam-slicing
([#4211](#4211))
([b2b3643](b2b3643))
* autobump bio/gseapy/gsea
([#4210](#4210))
([5f38932](5f38932))
* autobump bio/hap.py/hap.py
([#4147](#4147))
([86efd6d](86efd6d))
* autobump bio/hisat2/align
([#4213](#4213))
([b349a8c](b349a8c))
* autobump bio/homer/makeTagDirectory
([#4212](#4212))
([fd9c4ff](fd9c4ff))
* autobump bio/lofreq/call
([#4216](#4216))
([d838abd](d838abd))
* autobump bio/lofreq/indelqual
([#4215](#4215))
([53e6cac](53e6cac))
* autobump bio/mapdamage2
([#4148](#4148))
([87d0a10](87d0a10))
* autobump bio/mapdamage2
([#4218](#4218))
([420835f](420835f))
* autobump bio/minimap2/aligner
([#4217](#4217))
([2eb12e1](2eb12e1))
* autobump bio/multiqc
([#4133](#4133))
([9ebdbe9](9ebdbe9))
* autobump bio/ngs-disambiguate
([#4134](#4134))
([015d8ef](015d8ef))
* autobump bio/ngsbits/sampleancestry
([#4149](#4149))
([dff8a93](dff8a93))
* autobump bio/ngsbits/samplesimilarity
([#4150](#4150))
([1362ea7](1362ea7))
* autobump bio/open-cravat/module
([#4219](#4219))
([14f3d59](14f3d59))
* autobump bio/open-cravat/run
([#4220](#4220))
([f68c4dc](f68c4dc))
* autobump bio/paladin/align
([#4224](#4224))
([60a4073](60a4073))
* autobump bio/paladin/index
([#4222](#4222))
([9a49585](9a49585))
* autobump bio/paladin/prepare
([#4226](#4226))
([5ffeda7](5ffeda7))
* autobump bio/picard/markduplicates
([#4221](#4221))
([67ca39e](67ca39e))
* autobump bio/pretext/map
([#4227](#4227))
([d9fef0b](d9fef0b))
* autobump bio/pretext/snapshot
([#4223](#4223))
([6287d9a](6287d9a))
* autobump bio/prinseq-plus-plus
([#4225](#4225))
([d720752](d720752))
* autobump bio/samtools/calmd
([#4244](#4244))
([d7ee375](d7ee375))
* autobump bio/samtools/collate
([#4240](#4240))
([a682845](a682845))
* autobump bio/samtools/depth
([#4231](#4231))
([a7a7101](a7a7101))
* autobump bio/samtools/faidx
([#4243](#4243))
([f7fc7f3](f7fc7f3))
* autobump bio/samtools/fastx
([#4238](#4238))
([17ec481](17ec481))
* autobump bio/samtools/fixmate
([#4248](#4248))
([be8b325](be8b325))
* autobump bio/samtools/flagstat
([#4237](#4237))
([917a816](917a816))
* autobump bio/samtools/idxstats
([#4235](#4235))
([b22e851](b22e851))
* autobump bio/samtools/index
([#4233](#4233))
([3f128ef](3f128ef))
* autobump bio/samtools/markdup
([#4242](#4242))
([58e7715](58e7715))
* autobump bio/samtools/merge
([#4246](#4246))
([a1b731a](a1b731a))
* autobump bio/samtools/mpileup
([#4251](#4251))
([9575f39](9575f39))
* autobump bio/samtools/sort
([#4228](#4228))
([3ce39f0](3ce39f0))
* autobump bio/samtools/stats
([#4229](#4229))
([a390efc](a390efc))
* autobump bio/samtools/view
([#4249](#4249))
([da8e2ad](da8e2ad))
* autobump bio/seqkit
([#4230](#4230))
([731cfb0](731cfb0))
* autobump bio/seqtk
([#4234](#4234))
([d83a202](d83a202))
* autobump bio/snpeff/annotate
([#4250](#4250))
([0d55cf9](0d55cf9))
* autobump bio/snpeff/download
([#4245](#4245))
([9ee55b8](9ee55b8))
* autobump bio/snpsift/annotate
([#4247](#4247))
([0d827da](0d827da))
* autobump bio/snpsift/dbnsfp
([#4232](#4232))
([266def8](266def8))
* autobump bio/snpsift/genesets
([#4241](#4241))
([6476dcc](6476dcc))
* autobump bio/snpsift/gwascat
([#4239](#4239))
([a3b9e06](a3b9e06))
* autobump bio/sortmerna
([#2985](#2985))
([574b4ac](574b4ac))
* autobump bio/sourmash/compute
([#4236](#4236))
([f3abaf2](f3abaf2))
* autobump bio/tabix/query
([#4253](#4253))
([c2937be](c2937be))
* autobump bio/ucsc/twoBitInfo
([#4135](#4135))
([c3cad4f](c3cad4f))
* autobump bio/umis/bamtag
([#4255](#4255))
([d5c415c](d5c415c))
* autobump bio/unicycler
([#4254](#4254))
([2085374](2085374))
* autobump bio/varlociraptor/call-variants
([#4156](#4156))
([7d3f8ef](7d3f8ef))
* autobump bio/varlociraptor/control-fdr
([#4153](#4153))
([ca393f5](ca393f5))
* autobump bio/varlociraptor/estimate-alignment-properties
([#4151](#4151))
([469076e](469076e))
* autobump bio/varlociraptor/preprocess-variants
([#4154](#4154))
([4217362](4217362))
* autobump bio/vembrane/filter
([#4258](#4258))
([be1d8ae](be1d8ae))
* autobump bio/vembrane/table
([#4257](#4257))
([93d47bb](93d47bb))
* autobump bio/vep/annotate
([#4155](#4155))
([e5240f1](e5240f1))
* autobump bio/vep/cache
([#4152](#4152))
([f8af0af](f8af0af))
* autobump bio/vg/giraffe
([#4256](#4256))
([60522de](60522de))
* autobump bio/whatshap/haplotag
([#4157](#4157))
([d55da33](d55da33))
* autobump utils/csvtk
([#4199](#4199))
([c61588a](c61588a))
* autobump utils/datavzrd
([#4204](#4204))
([24ff485](24ff485))
* bump deepvariant version
([#4136](#4136))
([2b488c6](2b488c6))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants