-
Notifications
You must be signed in to change notification settings - Fork 394
[importer] Introduce new Importer module with separate Configs, API Endpoints, and Dependencies #4089
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
✅ Test files were modified. Ensure that the tests cover all relevant changes. ✅ |
85a9c35
to
49f1b25
Compare
0923520
to
25527a7
Compare
5ed73c2
to
04bcf00
Compare
7771b36
to
f0d3ecb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Adds a new importer module with REST endpoints for file upload, metadata guessing, and previewing, while disabling legacy indexer format logic and updating dependencies.
- Comments out deprecated
table
,query
,rdbms
,stream
, andconnector
handling inapi3.py
. - Introduces
LocalFileUploadSerializer
,GuessFileMetadataSerializer
, andPreviewFileSerializer
plus corresponding endpoints and URL routes. - Pins new requirements
polars[calamine]
andpython-magic
.
Reviewed Changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.
Show a summary per file
File | Description |
---|---|
desktop/libs/indexer/src/indexer/api3.py | Commented out large blocks of legacy format-handling logic |
desktop/core/src/desktop/lib/importer/serializers.py | Added serializers for upload, metadata guess, and preview |
desktop/core/src/desktop/lib/importer/api.py | Added upload_file , guess_file_metadata , and preview_file APIs |
desktop/core/src/desktop/lib/importer/init.py | Initialized new importer package |
desktop/core/src/desktop/api_public_urls_v1.py | Registered new importer API routes |
desktop/core/generate_requirements.py | Added polars[calamine] and python-magic to requirements |
Files not reviewed (1)
- desktop/core/base_requirements.txt: Language not supported
Comments suppressed due to low confidence (2)
desktop/core/src/desktop/lib/importer/api.py:77
- [nitpick] The variable name
upload_file
shadows the function name and may be confusing. Consider renaming it to something likefile_obj
oruploaded_file
.
upload_file = serializer.validated_data['file']
desktop/core/src/desktop/lib/importer/api.py:51
- New endpoints (
upload_file
,guess_file_metadata
,preview_file
) currently lack automated unit or integration tests. Adding tests will help catch regressions and validate behavior.
@api_view(['POST'])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Adds a new importer
package with serializers, API endpoints, and URL routes to support local file upload, metadata guessing, and content preview. Key changes include:
- New serializers (
LocalFileUploadSerializer
,GuessFileMetadataSerializer
,PreviewFileSerializer
) for validating file operations. - API views (
upload_file
,guess_file_metadata
,preview_file
) with error handling and routing. - Updated
generate_requirements.py
to includepolars[calamine]
andpython-magic
.
Reviewed Changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 4 comments.
Show a summary per file
File | Description |
---|---|
desktop/core/src/desktop/lib/importer/serializers.py | Adds serializers for upload, metadata guess, preview |
desktop/core/src/desktop/lib/importer/api.py | Implements API endpoints and common error handler |
desktop/core/src/desktop/lib/importer/init.py | Initializes importer package |
desktop/core/src/desktop/api_public_urls_v1.py | Registers new importer routes |
desktop/core/generate_requirements.py | Adds new dependencies (polars[calamine] , python-magic ) |
Files not reviewed (1)
- desktop/core/base_requirements.txt: Language not supported
Comments suppressed due to low confidence (2)
desktop/core/src/desktop/lib/importer/serializers.py:20
- There are no automated tests covering these new serializers. Consider adding unit tests for file-size validation, metadata guessing, and preview logic to ensure behavior remains correct.
class LocalFileUploadSerializer(serializers.Serializer):
desktop/core/src/desktop/lib/importer/api.py:1
- [nitpick] Consider adding a module-level docstring below the license header to describe the purpose of this API module.
#!/usr/bin/env python
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Unit tests are work in progress |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work, see questions and comments. Can you also add unit tests to this PR?
Enhances file type detection reliability by: - Adding proper error handling for python-magic import - Removing dependency on file extensions for type detection - Simplifying delimiter-based file format detection logic - Improving error messaging when magic lib is unavailable Makes file type detection more robust and maintainable by focusing on content analysis rather than extensions.
Implements new preview endpoint to support file imports with: - Excel file preview with sheet selection - Delimited file preview (CSV, TSV) with configurable separators - SQL type mapping for multiple dialects (Hive, Impala, Trino, Phoenix, SparkSQL) - Automatic header detection - Error handling improvements for file processing Enhances the importer API with robust data preview capabilities before import
Ensures has_header parameter is properly coerced to boolean when explicitly provided for both Excel and delimited file previews. Previously, boolean coercion was only done when auto-detecting headers through csv.Sniffer. This change provides consistent boolean handling across all file preview scenarios.
…oding and error logging
…lizer and updating operations for improved validation and error handling
…zer for improved validation and error handling, and enhance operations for better file type detection and previewing.
- Introduces a new configuration section for the data file importer, allowing admins to enable or disable the importer, restrict certain file extensions, and set a maximum upload size. - Updates the API and serializer logic to enforce these settings, improving security and flexibility for file uploads.
Uses a lightweight stdlib-based approach to list sheet names from .xlsx files by reading workbook metadata directly, reducing memory usage and avoiding unnecessary data loading. Falls back to the previous method for non-standard formats, improving robustness and efficiency when handling Excel files.
- Introduces a new endpoint and supporting logic to determine if uploaded files (including Excel and delimited formats) contain a header row. - Refactors file preview APIs to require explicit header row indication, improving reliability and user control. - Enhances error handling and cleans up temporary files on failure to ensure robustness.
- Introduces a new endpoint to retrieve mappings from Polars data types to SQL types for various SQL dialects, improving support for type-aware file imports and downstream table creation. - Refactors type mapping logic for reusability and maintainability, and adds serializer validation for dialect selection.
583a86d
to
708e75a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Introduces a new Importer module to support local file uploads, metadata/header detection, previews, and SQL type mappings.
- Adds importer configuration (
IMPORTER
) tohue.ini
, pseudo-distributed template, andconf.py
. - Implements serializers, unit tests, and API endpoints (
upload
,guess_metadata
,guess_header
,preview
,sql_type_mapping
). - Updates requirements to include
polars[calamine]
andpython-magic
, and integrates the importer into URL routing and theget_config
response.
Reviewed Changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.
Show a summary per file
File | Description |
---|---|
desktop/core/src/desktop/lib/importer/serializers.py | New serializers for file upload, metadata guessing, preview, header guessing, and SQL type mapping |
desktop/core/src/desktop/lib/importer/serializers_tests.py | Unit tests covering valid/invalid cases for all importer serializers |
desktop/core/src/desktop/lib/importer/api.py | API views for importer operations with error handling |
desktop/core/src/desktop/lib/importer/init.py | Initializer for the importer module |
desktop/core/src/desktop/conf.py | Programmatic IMPORTER config section definition |
desktop/core/src/desktop/api_public_urls_v1.py | Routes for new importer endpoints |
desktop/core/src/desktop/api2.py | Includes importer settings in the application config response |
desktop/core/generate_requirements.py | Added polars[calamine] , python-magic , and moved setuptools |
desktop/core/base_requirements.txt | Added new dependencies for file processing and type detection |
desktop/conf/pseudo-distributed.ini.tmpl | Documentation for importer config options |
desktop/conf.dist/hue.ini | Documentation for importer config options |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice and clean! Great work!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, nice work
This pull request introduces a new "importer" feature, along with its configuration, API endpoints, and dependencies. The changes include adding the necessary configuration options, implementing the importer functionality, and updating related files to integrate the feature.
Importer Feature Implementation:
Configuration Options:
importer
section in configuration files (desktop/conf.dist/hue.ini
,desktop/conf/pseudo-distributed.ini.tmpl
) to enable the importer, set file size limits, and restrict file extensions for uploads. [1] [2]IMPORTER
configuration section indesktop/core/src/desktop/conf.py
to manage importer settings programmatically.API Endpoints:
desktop/core/src/desktop/api_public_urls_v1.py
to support importer operations, including file upload, metadata guessing, header detection, file preview, and SQL type mapping.desktop/core/src/desktop/lib/importer/api.py
for handling these operations.Dependency Updates:
polars[calamine]==1.8.2
andpython-magic==0.4.27
todesktop/core/base_requirements.txt
anddesktop/core/generate_requirements.py
to support file processing and type detection. [1] [2] [3]Integration and Refactoring:
API Integration:
desktop/core/src/desktop/api2.py
to include importer settings in the application's configuration response.desktop/core/src/desktop/api_public_urls_v1.py
.Codebase Organization:
__init__.py
todesktop/core/src/desktop/lib/importer
to initialize the importer module.