[ui-importer] Public API integration #4137

Open: wants to merge 54 commits into base: master

Changes from 1 commit

Commits (54):
cb8883d
[importer] Add new component and API endpoint with new directory stru…
Harshg999 Apr 1, 2025
e763d4c
[importer] Implement file upload API for CSV and Excel formats with v…
Harshg999 Apr 8, 2025
da13af9
Refactor importer API: remove unused import and delete obsolete templ…
Harshg999 Apr 8, 2025
25527a7
Refactors file format detection and metadata extraction
Harshg999 Apr 25, 2025
0b53a51
Add file metadata detection and update dependencies
Harshg999 Apr 25, 2025
355b7a4
Refactors file upload API for better separation of concerns
Harshg999 Apr 29, 2025
3ec9c7b
Refactor file metadata detection API and improve efficiency
Harshg999 Apr 29, 2025
5298263
Improves file metadata extraction and error handling
Harshg999 Apr 30, 2025
4749bc8
Improves file type detection with graceful magic lib fallback
Harshg999 Apr 30, 2025
92bb7d1
Adds file preview API for data import functionality
Harshg999 May 5, 2025
eb1e590
Merge branch 'master' of github.com:cloudera/hue into new-importer-wo…
ramprasadagarwal May 6, 2025
58a40d7
[ui-importer] Public API integration
ramprasadagarwal May 6, 2025
fb34811
[importer] Add new component and API endpoint with new directory stru…
Harshg999 Apr 1, 2025
bad25d1
[importer] Implement file upload API for CSV and Excel formats with v…
Harshg999 Apr 8, 2025
45cb6b7
Refactor importer API: remove unused import and delete obsolete templ…
Harshg999 Apr 8, 2025
5000db1
Refactors file format detection and metadata extraction
Harshg999 Apr 25, 2025
fd931d2
Add file metadata detection and update dependencies
Harshg999 Apr 25, 2025
8ddcea0
Refactors file upload API for better separation of concerns
Harshg999 Apr 29, 2025
a72b3b8
Refactor file metadata detection API and improve efficiency
Harshg999 Apr 29, 2025
30235e4
Improves file metadata extraction and error handling
Harshg999 Apr 30, 2025
d2552ad
Improves file type detection with graceful magic lib fallback
Harshg999 Apr 30, 2025
5ed73c2
Adds file preview API for data import functionality
Harshg999 May 5, 2025
649b789
Merge branch 'new-importer-working-dir' of github.com:cloudera/hue in…
ramprasadagarwal May 6, 2025
7b35fc8
[importer] Add new component and API endpoint with new directory stru…
Harshg999 Apr 1, 2025
f562dc9
[importer] Implement file upload API for CSV and Excel formats with v…
Harshg999 Apr 8, 2025
d7a3037
Refactor importer API: remove unused import and delete obsolete templ…
Harshg999 Apr 8, 2025
faf36c5
Refactors file format detection and metadata extraction
Harshg999 Apr 25, 2025
d2d3c81
Add file metadata detection and update dependencies
Harshg999 Apr 25, 2025
ad8ba89
Refactors file upload API for better separation of concerns
Harshg999 Apr 29, 2025
31a75f1
Refactor file metadata detection API and improve efficiency
Harshg999 Apr 29, 2025
938cee7
Improves file metadata extraction and error handling
Harshg999 Apr 30, 2025
6fceede
Improves file type detection with graceful magic lib fallback
Harshg999 Apr 30, 2025
04bcf00
Adds file preview API for data import functionality
Harshg999 May 5, 2025
f6438be
fix the api integration
ramprasadagarwal May 12, 2025
e316224
Merge branch 'new-importer-working-dir' of github.com:cloudera/hue in…
ramprasadagarwal May 12, 2025
22a4865
Merge branch 'master' of github.com:cloudera/hue into feat/importer-6
ramprasadagarwal Jun 6, 2025
8f843fb
revert extra changes
ramprasadagarwal Jun 6, 2025
cb8e35c
[importer] Refactor file format handling and add support for guessing…
ramprasadagarwal Jun 7, 2025
a37558e
[importer] Update API constants for file guessing and preview URLs
ramprasadagarwal Jun 10, 2025
68a2df2
Merge branch 'master' of github.com:cloudera/hue into feat/importer-6
ramprasadagarwal Jun 10, 2025
e1e2aa8
[importer] Enhance file format handling and update tests for EXCEL su…
ramprasadagarwal Jun 12, 2025
1ab4200
[test] Update test description for non-EXCEL file type in SourceConfi…
ramprasadagarwal Jun 12, 2025
55ba6fa
[test] Enhance tests for ImporterFilePreview and SourceConfiguration …
ramprasadagarwal Jun 24, 2025
324cd43
Merge branch 'master' of github.com:cloudera/hue into feat/importer-6
ramprasadagarwal Jun 24, 2025
457ec6d
[importer] fix the getDefaultTableName function
ramprasadagarwal Jun 25, 2025
a6b28be
[test] Refactor ImporterFilePreview tests to use act for rendering an…
ramprasadagarwal Jun 25, 2025
2725311
Merge branch 'master' of github.com:cloudera/hue into feat/importer-6
ramprasadagarwal Jun 25, 2025
38e1083
lint fix
ramprasadagarwal Jun 25, 2025
bfacaee
remove hardcoded defaultDialect
ramprasadagarwal Jun 26, 2025
5cf00fb
fix the tests mocked url
ramprasadagarwal Jun 26, 2025
e897946
Merge branch 'master' of github.com:cloudera/hue into feat/importer-6
ramprasadagarwal Jun 26, 2025
6884980
Merge branch 'master' of github.com:cloudera/hue into feat/importer-6
ramprasadagarwal Jun 27, 2025
44ae88d
fix test
ramprasadagarwal Jun 27, 2025
14f9248
Merge branch 'master' of github.com:cloudera/hue into feat/importer-6
ramprasadagarwal Jun 27, 2025
Refactors file upload API for better separation of concerns
Extracts file processing logic into a separate operation module
Adds TSV to supported file formats
Updates API documentation with clearer descriptions

The change simplifies the upload endpoint by moving file handling logic
to a dedicated operation, making the code more maintainable and
easier to test.
Harshg999 committed Apr 29, 2025
commit 355b7a4019e748e409c404f19fce2a294366079c

desktop/core/src/desktop/api_public_urls_v1.py (2 changes: 1 addition & 1 deletion)
@@ -160,7 +160,7 @@
 ]

 urlpatterns += [
-  re_path(r'^importer/upload/file/?$', importer_api.upload_local_file, name='importer_upload_local_file'),
+  re_path(r'^importer/upload/file/?$', importer_api.upload_file, name='importer_upload_file'),
   re_path(r'^importer/file/detect_metadata/?$', importer_api.detect_file_metadata, name='importer_detect_metadata_file'),
 ]

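The route regex is the same on both sides of this hunk; what changes is the view it dispatches to (`upload_local_file` becomes `upload_file`) and the URL name (`importer_upload_local_file` becomes `importer_upload_file`), so anything that resolves the route by name has to switch to the new name. As a small standalone check, the snippet below shows which relative paths the pattern accepts. The paths are relative to wherever `api_public_urls_v1` is included, which this diff does not show.

```python
import re

# Pattern copied from the urlpatterns entry above; "/?" makes the trailing slash optional.
UPLOAD_ROUTE = re.compile(r'^importer/upload/file/?$')

for path in ['importer/upload/file', 'importer/upload/file/', 'importer/upload/files/', 'importer/upload/file//']:
    print(f"{path!r}: {'matches' if UPLOAD_ROUTE.match(path) else 'no match'}")
```

Both the slash and no-slash forms reach the same view, while a doubled slash or a different path segment does not.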
desktop/core/src/desktop/lib/importer/api.py (61 changes: 14 additions & 47 deletions)
@@ -33,6 +33,7 @@
 from rest_framework.request import Request
 from rest_framework.response import Response

+from desktop.lib.importer.operations import local_file_upload
 from desktop.lib.importer.serializers import LocalFileUploadSerializer

 LOG = logging.getLogger()
@@ -58,18 +59,21 @@ def decorator(*args, **kwargs):
 @api_view(['POST'])
 @parser_classes([MultiPartParser])
 @api_error_handler
-def upload_local_file(request: Request) -> Response:
-  """
-  Upload a local file and process it.
+def upload_file(request: Request) -> Response:
+  """Handle the local file upload operation.

   This endpoint allows users to upload a file from their local system.
+  Uploaded file is validated using LocalFileUploadSerializer and processed using local_file_upload operation.

   Args:
-    request: The request object containing the file to be uploaded.
+    request: Request object containing the file to upload

   Returns:
-    Response: A response object containing the result of the upload.
+    Response containing the result of the upload operation

-  Raises:
-    ValidationError: If the file is not valid or if there are issues processing it.
+  Note:
+    - File size limits apply based on server configuration
+    - Supported file types: CSV, TSV, Excel
   """

   # Validate the request data using the serializer
@@ -80,47 +84,10 @@ def upload_local_file(request: Request) -> Response:

   upload_file = serializer.validated_data['file']

-  # Generate a unique filename
-  username = request.user.username
-  safe_original_name = re.sub(r'[^0-9a-zA-Z]+', '_', upload_file.name)
-  unique_id = uuid.uuid4().hex[:8]
-
-  filename = f"{username}_{unique_id}_{safe_original_name}"
-
-  # Process the file based on its type
-  result = process_uploaded_file(upload_file, filename)
-
-  return Response(result, status=status.HTTP_201_CREATED)
+  LOG.info(f'User {request.user.username} is uploading a local file: {upload_file.name}')
+  res = local_file_upload(upload_file, request.user.username)

-
-def process_uploaded_file(upload_file, filename: str) -> Dict[str, Any]:
-  """
-  Process the uploaded file and save it to a temporary location.
-
-  Args:
-    upload_file: The uploaded file object
-    filename: The base filename to use
-
-  Returns:
-    A dictionary containing the filename and file path
-  """
-  # Create a temporary file with our generated filename
-  temp_dir = tempfile.gettempdir()
-  destination_path = os.path.join(temp_dir, filename)
-
-  try:
-    # Simply write the file content to temporary location
-    with open(destination_path, 'wb') as destination:
-      for chunk in upload_file.chunks():
-        destination.write(chunk)
-
-    return {'filename': filename, 'file_path': destination_path}
-
-  except Exception as e:
-    # Clean up the file if there was an error
-    if os.path.exists(destination_path):
-      os.remove(destination_path)
-    raise e
+  return Response(res, status=status.HTTP_201_CREATED)


 @api_view(['POST'])

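After this change the endpoint only validates the `file` field and delegates to `local_file_upload`, returning HTTP 201 with that operation's `filename`/`file_path` dictionary. Below is a minimal client-side sketch that uses only the standard library and builds the multipart request without sending it. The `/api/v1` mount point and Bearer-token authentication are assumptions (neither appears in this diff), and `build_upload_request` is a hypothetical helper name.

```python
import uuid
import urllib.request


def build_upload_request(base_url: str, token: str, filename: str, content: bytes,
                         api_prefix: str = "/api/v1") -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST to the importer upload route.

    Assumptions, not confirmed by the diff: the public API is mounted at
    `api_prefix` and authenticates with a Bearer token. The form field is
    named "file" because the view reads serializer.validated_data['file'].
    """
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        content,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    url = f"{base_url.rstrip('/')}{api_prefix}/importer/upload/file/"
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


req = build_upload_request("https://hue.example.com", "<token>", "sales.csv", b"id,amount\n1,9.99\n")
print(req.get_method(), req.full_url)
# Sending it would be urllib.request.urlopen(req); a successful upload answers with status 201.
```

Building the body by hand keeps the sketch dependency-free; an HTTP client library with multipart support would do the same with less code.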
desktop/core/src/desktop/lib/importer/operations.py (91 changes: 91 additions & 0 deletions)
@@ -0,0 +1,91 @@
+#!/usr/bin/env python
+# Licensed to Cloudera, Inc. under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. Cloudera, Inc. licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import os
+import re
+import csv
+import uuid
+import logging
+import tempfile
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple, Union
+
+# TODO: Check if we need try/except for python-magic import because of libmagic
+import magic
+import polars as pl
+from rest_framework import status
+from rest_framework.decorators import api_view, parser_classes
+from rest_framework.parsers import JSONParser, MultiPartParser
+from rest_framework.request import Request
+from rest_framework.response import Response
+
+LOG = logging.getLogger()
+
+
+def local_file_upload(upload_file, username: str) -> Dict[str, str]:
+  """Uploads a local file to a temporary directory with a unique filename.
+
+  This function takes an uploaded file and username, generates a unique filename,
+  and saves the file to a temporary directory. The filename is created using
+  the username, a unique ID, and a sanitized version of the original filename.
+
+  Args:
+    upload_file: The uploaded file object from Django's file upload handling.
+    username: The username of the user uploading the file.
+
+  Returns:
+    Dict[str, str]: A dictionary containing:
+      - filename: The generated unique filename
+      - file_path: The full path where the file was saved
+
+  Raises:
+    ValueError: If upload_file or username is None/empty
+    Exception: If there are issues with file operations
+
+  Example:
+    >>> result = upload_local_file(request.FILES['file'], 'hue_user')
+    >>> print(result)
+    {'filename': 'hue_user_a1b2c3d4_myfile.txt', 'file_path': '/tmp/hue_user_a1b2c3d4_myfile.txt'}
+  """
+  if not upload_file:
+    raise ValueError("Upload file cannot be None or empty")
+
+  if not username:
+    raise ValueError("Username cannot be None or empty")
+
+  # Generate a unique filename
+  safe_original_name = re.sub(r'[^0-9a-zA-Z]+', '_', upload_file.name)
+  unique_id = uuid.uuid4().hex[:8]
+
+  filename = f"{username}_{unique_id}_{safe_original_name}"
+
+  # Create a temporary file with our generated filename
+  temp_dir = tempfile.gettempdir()
+  destination_path = os.path.join(temp_dir, filename)
+
+  try:
+    # Simply write the file content to temporary location
+    with open(destination_path, 'wb') as destination:
+      for chunk in upload_file.chunks():
+        destination.write(chunk)
+
+    return {'filename': filename, 'file_path': destination_path}
+
+  except Exception as e:
+    # Clean up the file if there was an error
+    if os.path.exists(destination_path):
+      os.remove(destination_path)
+    raise e

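Because `local_file_upload` only touches `upload_file.name` and `upload_file.chunks()`, it can be exercised without an HTTP request, which is the testability the commit message refers to. The sketch below mirrors its logic with one change for testing: the target directory is a parameter instead of `tempfile.gettempdir()`. `FakeUpload` and `save_upload` are hypothetical names. Note that the sanitizing regex also replaces the `.` before the extension, so `my data.csv` ends up stored as `..._my_data_csv`, without an extension.

```python
import os
import re
import tempfile
import uuid
from typing import Dict, Iterator, List


class FakeUpload:
    """Hypothetical stand-in for Django's UploadedFile; only .name and .chunks() are used."""

    def __init__(self, name: str, parts: List[bytes], fail_at: int = -1):
        self.name = name
        self._parts = parts
        self._fail_at = fail_at  # index of the chunk that raises instead of being yielded (-1 = never)

    def chunks(self) -> Iterator[bytes]:
        for index, part in enumerate(self._parts):
            if index == self._fail_at:
                raise OSError("simulated read failure")
            yield part


def save_upload(upload_file, username: str, target_dir: str) -> Dict[str, str]:
    """Sketch mirroring local_file_upload, with the target directory injectable for tests."""
    if not upload_file:
        raise ValueError("Upload file cannot be None or empty")
    if not username:
        raise ValueError("Username cannot be None or empty")

    # Every run of non-alphanumeric characters (including the extension dot) becomes "_".
    safe_name = re.sub(r'[^0-9a-zA-Z]+', '_', upload_file.name)
    filename = f"{username}_{uuid.uuid4().hex[:8]}_{safe_name}"
    path = os.path.join(target_dir, filename)

    try:
        with open(path, 'wb') as dest:
            for chunk in upload_file.chunks():
                dest.write(chunk)
        return {'filename': filename, 'file_path': path}
    except Exception:
        # Remove the partially written file, then re-raise the original error.
        if os.path.exists(path):
            os.remove(path)
        raise


with tempfile.TemporaryDirectory() as tmp:
    saved = save_upload(FakeUpload("my data.csv", [b"a,b\n", b"1,2\n"]), "hue_user", tmp)
    print(saved['filename'])  # hue_user_<8 hex chars>_my_data_csv
    try:
        save_upload(FakeUpload("broken.csv", [b"x\n", b"y\n"], fail_at=1), "hue_user", tmp)
    except OSError:
        print(os.listdir(tmp))  # only the first upload is left
```

Using a bare `raise` in the handler re-raises the active exception with its original traceback, which is equivalent to `raise e` here.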
desktop/core/src/desktop/lib/importer/serializers.py (5 changes: 3 additions & 2 deletions)
@@ -31,9 +31,10 @@ class LocalFileUploadSerializer(serializers.Serializer):

   def validate_file(self, value):
     # Add file format validation
+    # TODO: To remove and allow all file formats?
     extension = value.name.split('.')[-1].lower()
-    if extension not in ['csv', 'xlsx', 'xls']:
-      raise serializers.ValidationError("Unsupported file format. Please upload a CSV or Excel file.")
+    if extension not in ['csv', 'tsv', 'xlsx', 'xls']:
+      raise serializers.ValidationError("Unsupported file format. Please upload a CSV, TSV or Excel file.")

     # TODO: Check upper limit for file size
     # Add file size validation (e.g., limit to 150 MiB)
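The serializer takes the extension from the text after the last `.` and lowercases it, so `Sales.TSV` passes, while `data.csv.gz` (extension `gz`) and a name with no dot at all (the whole name is treated as the extension) are rejected. The file-size check is still only a TODO. The sketch below pairs the extension rule with a size cap, using the 150 MiB figure from the comment as a placeholder, and returns messages instead of raising DRF's `serializers.ValidationError` so it runs without Django. `validate_upload` and `MAX_UPLOAD_BYTES` are hypothetical names.

```python
from typing import Optional

ALLOWED_EXTENSIONS = {'csv', 'tsv', 'xlsx', 'xls'}
# Placeholder cap from the TODO's example (150 MiB = 157,286,400 bytes); the real limit is still open.
MAX_UPLOAD_BYTES = 150 * 1024 * 1024


def validate_upload(name: str, size: int) -> Optional[str]:
    """Return an error message, or None if the name and size are acceptable."""
    # Same rule as validate_file: text after the last '.', lowercased.
    extension = name.split('.')[-1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        return "Unsupported file format. Please upload a CSV, TSV or Excel file."
    if size > MAX_UPLOAD_BYTES:
        return f"File is {size} bytes; the limit is {MAX_UPLOAD_BYTES} bytes."
    return None


for name, size in [("Sales.TSV", 2048), ("data.csv.gz", 2048), ("README", 10), ("big.xlsx", 200 * 1024 * 1024)]:
    print(name, "->", validate_upload(name, size) or "ok")
```

Checking the extension first means an unsupported file is reported as such regardless of its size.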