Skip to content

Conversation

@cebtenzzre
Copy link
Member

@cebtenzzre cebtenzzre commented Nov 11, 2025

Important

Enhance upload_file() in nomic/client.py to support more file types using filetype, update tests and CI configuration accordingly.

  • Behavior:
    • upload_file() in nomic/client.py now supports more file types (Office documents, images) using _detect_content_type().
    • _detect_content_type() added to nomic/client.py for MIME type detection using filetype library.
  • Testing:
    • New test_ci target in makefile excludes tests marked with atlas.
    • Added atlas marker to multiple tests in tests/test_atlas_client.py.
  • CI/CD:
    • .circleci/config.yml updated to use make test_ci for unit tests.
  • Dependencies:
    • Added filetype to install_requires in setup.py.

This description was created by Ellipsis for 115095a. You can customize this summary. It will automatically update as commits are pushed.

Copy link
Member Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

@cebtenzzre cebtenzzre changed the title document client: more filetypes feat: document client: more filetypes Nov 11, 2025
@cebtenzzre cebtenzzre marked this pull request as ready for review November 11, 2025 00:31
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important

Looks good to me! 👍

Reviewed everything up to bd231ae in 1 minute and 55 seconds. Click for details.
  • Reviewed 113 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 5 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. nomic/client.py:160
  • Draft comment:
    Ensure filetype.guess_mime() handles pathlib.Path; consider using str(path) for compatibility.
  • Reason this comment was not posted:
    Confidence changes required: 66% <= threshold 70% None
2. nomic/client.py:167
  • Draft comment:
    Clarify ZIP archive handling: non-Office ZIP files trigger a ValueError. Update the docstring if this behavior is intentional.
  • Reason this comment was not posted:
    Confidence changes required: 66% <= threshold 70% None
3. nomic/client.py:201
  • Draft comment:
    Rename variable 'pdf_file' to a more generic name (e.g., file_obj) since the file may not be a PDF.
  • Reason this comment was not posted:
    Comment was on unchanged code.
4. setup.py:41
  • Draft comment:
    Good explicit version constraint for 'filetype'. Consider adding a comment explaining the rationale behind the version range.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 70% None
5. nomic/client.py:145
  • Draft comment:
    Typo: "docling's approach" might be a misspelling. Please confirm if it should be "docile's approach" or another term.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 5% vs. threshold = 70% "Docling" is likely a reference to a specific library or tool used for document processing. A quick search would reveal that "docling" is indeed a real document processing library. The automated tool appears to have incorrectly flagged this as a typo when it's actually a proper noun/technical term. The suggestion to change it to "docile's approach" would be nonsensical. This comment is incorrect and should be deleted. I might be wrong if "docling" is truly a typo and the author meant something else. However, given the technical context and the fact that the code is implementing MIME type detection with specific handling for Office formats, it's highly likely this is a reference to an actual library or methodology. Even if I'm uncertain about whether "docling" is a real library, the suggested correction to "docile's approach" makes no sense in this technical context. The comment is not providing value and is likely incorrect. Additionally, this is just a comment/documentation change, not a code logic issue, and the rules state not to make comments that are obvious or unimportant. This comment should be deleted. It incorrectly flags "docling" as a typo when it appears to be a legitimate technical term or library reference. The suggested correction "docile's approach" would be nonsensical in this context.

Workflow ID: wflow_oBTF6zrEX7DZ2A1k

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important

Looks good to me! 👍

Reviewed 115095a in 1 minute and 57 seconds. Click for details.
  • Reviewed 188 lines of code in 4 files
  • Skipped 0 files when reviewing.
  • Skipped posting 5 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. .circleci/config.yml:42
  • Draft comment:
    Changed test command from 'make test' to 'make test_ci', which filters out tests marked with 'atlas'. Please confirm that skipping integration tests on CI is intentional and consider documenting this decision.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 30% <= threshold 70% The comment is asking the PR author to confirm their intention regarding skipping integration tests on CI. This violates the rule against asking for confirmation of intention. However, it also suggests documenting the decision, which is a specific and actionable suggestion. The comment is partially useful but mostly violates the rules.
2. makefile:40
  • Draft comment:
    New 'test_ci' target excludes 'atlas' integration tests using '-m "not atlas"'. It would be helpful to add a comment explaining that these tests depend on external services and are excluded from CI by design.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 70% This comment is asking the author to add a comment/documentation explaining the purpose of the test exclusion. However, the rules explicitly state "Do NOT comment unless there is clearly a code change required. Do not make purely informative comments." The code itself is fairly self-explanatory - it's a CI test target that excludes atlas tests. The naming convention (test_ci vs test) and the pytest marker usage make the intent reasonably clear. Additionally, looking at the full file context, I can see other CI-related targets like black_ci, isort_ci, pyright_ci which follow a similar naming pattern, suggesting this is a consistent pattern in the codebase. The comment is essentially asking for documentation/explanation rather than pointing out a functional issue. This violates the rule about not making purely informative comments. While the code might benefit from a comment explaining why atlas tests are excluded, the naming and context make it reasonably clear. The comment could be considered helpful for future maintainers, and the rule about "Extremely Complicated Code Needs Comments" might apply if this is confusing. The code is not "extremely complicated" - it's a simple pytest command with a marker exclusion. The naming convention (test_ci) and the marker usage are standard pytest practices. This is more about documentation preference than a necessary code change. The rules prioritize actionable code changes over documentation requests. This comment should be deleted because it's requesting documentation/explanation rather than pointing out a functional issue or necessary code change. The code is self-explanatory enough through naming conventions and standard pytest practices.
3. pyproject.toml:13
  • Draft comment:
    The addition of the 'atlas' marker in pytest config is good. Consider expanding the documentation (e.g., in the developer guide) on when to use the '@pytest.mark.atlas' marker.
  • Reason this comment was not posted:
    Confidence changes required: 20% <= threshold 70% None
4. tests/test_atlas_client.py:30
  • Draft comment:
    Integration test functions now include '@pytest.mark.atlas'. If the majority of tests in this file are integration tests, consider applying a module-level marker (e.g., 'pytestmark = pytest.mark.atlas') to reduce repetition and improve maintainability. Also, ensure consistent naming (e.g., differentiate between integration tests and mocked tests).
  • Reason this comment was not posted:
    Confidence changes required: 70% <= threshold 70% None
5. tests/test_atlas_client.py:49
  • Draft comment:
    Tests using 'pytest.raises(Exception)' might catch too broad an exception. If possible, consider using a more specific exception type to ensure that only the expected error is caught.
  • Reason this comment was not posted:
    Confidence changes required: 70% <= threshold 70% None

Workflow ID: wflow_F90OyLRY96tJKN1z

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants