Insights: apache/spark
Overview
- 0 Active issues
- 0 Merged pull requests
- 76 Open pull requests
- 0 Closed issues
- 0 New issues
76 Pull requests opened by 48 people
- [SPARK-42746][SQL][FIXUP] Fix optimizer failure for SortOrder in the LISTAGG function (#51117 opened Jun 8, 2025)
- [SPARK-51168][BUILD] Test Hadoop 3.4.2 (#51127 opened Jun 9, 2025)
- Quiet "Unable to load native-hadoop library for your platform" message (#51136 opened Jun 9, 2025)
- Better do idle python workers cleanup (#51143 opened Jun 10, 2025)
- [SPARK-52439][SQL] Support creating check constraint with NULL (#51146 opened Jun 10, 2025)
- [WIP][INFRA] Retry SBT compilation when OOM (#51149 opened Jun 11, 2025)
- [SPARK-49110][SQL] Fix reading metadata columns for tables with CHAR columns (#51154 opened Jun 11, 2025)
- [SPARK-52461] [SQL] Collapse inner Cast from DecimalType to DecimalType (#51169 opened Jun 12, 2025)
- [WIP][SPARK-52394][PS] Fix autocorr divide-by-zero error under ANSI mode (#51192 opened Jun 16, 2025)
- [SPARK-52509][K8S] Cleanup shuffles from fallback storage (#51199 opened Jun 17, 2025)
- [SPARK-52508][CORE] Fallback storage retries FileNotFoundExceptions (#51200 opened Jun 17, 2025)
- [SPARK-52506][CORE] Allow migrating to fallback storage only (#51201 opened Jun 17, 2025)
- [SPARK-52507][CORE] Attempt to read missing block from fallback storage (#51202 opened Jun 17, 2025)
- [SPARK-52505][K8S] Allow to create executor kubernetes service (#51203 opened Jun 17, 2025)
- [SPARK-52495][SQL] Allow including partition columns in the single variant column (#51206 opened Jun 17, 2025)
- Fabric Spark - EH Connector (#51211 opened Jun 18, 2025)
- [SPARK-52444][SQL][CONNECT] Add support for Variant/Char/Varchar Literal (#51215 opened Jun 18, 2025)
- [Test] (#51221 opened Jun 19, 2025)
- [DRAFT][PYTHON] Improve Python UDF Arrow Serializer Performance (#51225 opened Jun 19, 2025)
- [WIP][SPARK-51224][BUILD] Test Maven 4 (#51230 opened Jun 20, 2025)
- [SPARK-52544][SQL] Allow configuring Json datasource string length limit through SQLConf (#51235 opened Jun 20, 2025)
- Initial commit (#51238 opened Jun 21, 2025)
- [SPARK-51035][BUILD] Upgrade Janino to 3.1.12 (#51239 opened Jun 21, 2025)
- [SPARK-52401][SQL] Fix DataFrame.collect() cache invalidation after saveAsTable append; add regression test (#51240 opened Jun 21, 2025)
- [SPARK-52401][SQL] Fix DataFrame.collect() cache invalidation after saveAsTable append; add regression test (#51241 opened Jun 21, 2025)
- [SPARK-52563][PS] Fix var naming bug in _assert_pandas_almost_equal (#51253 opened Jun 23, 2025)
- SPARK-52564 configuration changes not require deleting the checkpoint (#51264 opened Jun 24, 2025)
- [SPARK-50603][SQL] Respect user-provided basePath for streaming file source reads without glob (#51267 opened Jun 24, 2025)
- [SPARK-52565] [SQL] Enforce ordinal resolution before other sort order expressions (#51268 opened Jun 24, 2025)
- [SPARK-37467][SQL] Consolidate subexpression elimination code for whole stage and non-whole stage (#51269 opened Jun 24, 2025)
- [SPARK-37466][SQL] Support subexpression elimination in higher order functions (#51272 opened Jun 24, 2025)
- [SPARK-51885][SQL] Change AnalysisContext.outerPlan from Option[LogicalPlan] to Seq[LogicalPlan] (#51274 opened Jun 24, 2025)
- [SPARK-52575][SQL] Introduce contextIndependentFoldable attribute for Expressions (#51282 opened Jun 25, 2025)
- [SDP] [SPARK-52577] Add tests for Declarative Pipelines DatasetManager with Hive catalog (#51283 opened Jun 25, 2025)
- [CORE] Let LocalSparkContext clear active context in beforeAll (#51284 opened Jun 25, 2025)
- [SPARK-52582][SQL] Improve the memory usage of XML parser (#51287 opened Jun 26, 2025)
- [SPARK-50686][SQL] Hash to sort aggregation fallback - memory usage optimization (#51290 opened Jun 26, 2025)
- [WIP][PYTHON] Arrow UDF for aggregation (#51292 opened Jun 26, 2025)
- Fix-AQE-OOM (#51295 opened Jun 26, 2025)
- [SPARK-52580][PS] Avoid CAST_INVALID_INPUT of `replace` in ANSI mode (#51297 opened Jun 26, 2025)
- [SPARK-52407][SQL] Add support for Theta Sketch (#51298 opened Jun 27, 2025)
- [SPARK-52592][PS] Prevent error when creating ps.Series from ps.Series (#51300 opened Jun 27, 2025)
- [SPARK-52598][DOCS] Reorganize Spark Connect programming guide (#51305 opened Jun 27, 2025)
- [SPARK-52588][SQL] Approx_top_k: accumulate and estimate (#51308 opened Jun 27, 2025)
- [SPARK-52593][PS] Avoid CAST_INVALID_INPUT of `Series.dot` and `DataFrame.dot` in ANSI mode (#51310 opened Jun 27, 2025)
- [SPARK-52601][SQL] Support primitive types in TransformingEncoder (#51313 opened Jun 28, 2025)
- [SPARK-46912][CORE] Using correct environment variables on workers of StandAlone cluster (#51314 opened Jun 28, 2025)
- [MINOR][DOCS] Updated the docstring of DataStreamWriter.foreach() method (#51316 opened Jun 29, 2025)
- [SPARK-52614][SQL] Support RowEncoder inside Product Encoder (#51319 opened Jun 30, 2025)
- [WIP][SPARK-52622][PS] Avoid CAST_INVALID_INPUT of `DataFrame.melt` in ANSI mode (#51326 opened Jul 1, 2025)
- [SPARK-52706][SQL] Fix inconsistencies and refactor primitive types in parser (#51335 opened Jul 1, 2025)
- [WIP][SQL][TESTS] Disable stable column aliases in tests if assumed (#51337 opened Jul 1, 2025)
- [SS][SPARK-52637] Fix version ID mismatch issue for RocksDB compaction leading to incorrect file mapping (#51340 opened Jul 1, 2025)
- [SPARK-52638][SQL] Allow preserving Hive-style column order to be configurable (#51342 opened Jul 1, 2025)
- [SPARK-52640][SDP] Propagate Python Source Code Location (#51344 opened Jul 1, 2025)
- [SPARK-52409][SDP] Only use PipelineRunEventBuffer in tests (#51352 opened Jul 2, 2025)
- [SPARK-52669][PySpark]Improvement PySpark choose pythonExec in cluster yarn client mode (#51357 opened Jul 3, 2025)
- [SPARK-52673][CONNECT][CLIENT] Add grpc RetryInfo handling to Spark Connect retry policies (#51363 opened Jul 3, 2025)
- [WIP][SPARK-52646][PS] Avoid CAST_INVALID_INPUT of `__eq__` in ANSI mode (#51370 opened Jul 4, 2025)
- [SPARK-52686][SQL] `Union` should be resolved only if there are no duplicates (#51376 opened Jul 4, 2025)
- [WIP] [SPARK-52689][SQL] Send DML Metrics to V2Write (#51377 opened Jul 4, 2025)
- [SPARK-52659][SQL]Misleading modulo error message in ansi mode (#51378 opened Jul 5, 2025)
- [SPARK-52545][SQL][DOCS] Update string literal docs for quote escaping rules (#51379 opened Jul 5, 2025)
- [SPARK-52617][SQL]Cast TIME to/from TIMESTAMP_NTZ (#51381 opened Jul 5, 2025)
- [SPARK-52691][BUILD] Upgrade ORC to 2.1.3 (#51382 opened Jul 5, 2025)
- [SPARK-52696][SQL] Strip `__is_duplicate` metadata after analysis (#51389 opened Jul 7, 2025)
- [SPARK-52705][SQL] Refactor deterministic check for grouping expressions (#51391 opened Jul 7, 2025)
- approx_top_k_combine (#51393 opened Jul 7, 2025)
- [SPARK-52701][PS] Fix float32 type widening in `mod` with bool under ANSI (#51394 opened Jul 7, 2025)
- [SPARK-52699][SQL] Support aggregating TIME type in interpreted mode (#51395 opened Jul 8, 2025)
- [IN PROGRESS] Support getting pod state using Informers/Listers (#51396 opened Jul 8, 2025)
- [SPARK-52703][INFRA][PS] Upgrade minimum python version of pandas api to 3.10 (#51397 opened Jul 8, 2025)
36 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- [SPARK-52187][SQL] Introduce Join pushdown for DSv2 (#50921 commented on Jul 7, 2025 • 43 new comments)
- [SPARK-47547] BloomFilter fpp degradation (#50933 commented on Jul 7, 2025 • 23 new comments)
- [SPARK-48359][SQL] Built-in functions for Zstd compression and decompression (#46672 commented on Jul 8, 2025 • 8 new comments)
- [SPARK-42329][SQL]Assign name to _LEGACY_ERROR_TEMP_2256 (#51113 commented on Jun 17, 2025 • 6 new comments)
- [SPARK-51554][SQL] Add the time_trunc() function for TIME datatype (#50607 commented on Jun 11, 2025 • 5 new comments)
- [SPARK-42841][SQL]Assign a name to the error class _LEGACY_ERROR_TEMP_2003 (#51111 commented on Jun 27, 2025 • 1 new comment)
- [SPARK-51069][SQL] Add big-endian support to UnsafeRowUtils.validateStructuralIntegrityWithReasonImpl (#49773 commented on Jun 27, 2025 • 1 new comment)
- [SPARK-51955] Adding release() to ReadStateStore interface and reusing ReadStore for Streaming Aggregations (#50742 commented on Jun 27, 2025 • 1 new comment)
- [SPARK-51831][SQL] Column pruning with existsJoin for Datasource V2 (#51046 commented on Jun 10, 2025 • 0 new comments)
- [SPARK-52327][Core] Glob based provider for history server (#51045 commented on Jun 23, 2025 • 0 new comments)
- [SPARK-47404][SQL] Add configurable size limits for ANTLR DFA cache (#51069 commented on Jun 12, 2025 • 0 new comments)
- [SPARK-52334][CORE][K8S] update all files, jars, and pyFiles to reference the working directory after they are downloaded (#51037 commented on Jun 21, 2025 • 0 new comments)
- [SPARK-48660][SQL] Fix explain result for CreateTableAsSelect (#51013 commented on Jun 17, 2025 • 0 new comments)
- Increase report interval of spaming logs to 10 seconds (#51012 commented on Jun 9, 2025 • 0 new comments)
- [SPARK-52104][CONNECT][SCALA] Validate column name eagerly in Spark Connect Scala Client (#50873 commented on Jun 10, 2025 • 0 new comments)
- [SPARK-52024][SQL] Support cancel ShuffleQueryStage when propagate empty relations (#50814 commented on Jun 24, 2025 • 0 new comments)
- [SPARK-52012][CORE][SQL] Restore IDE Index with type annotations (#50798 commented on Jul 8, 2025 • 0 new comments)
- [WIP][SPARK-52011][SQL] Reduce HDFS NameNode RPC on vectorized Parquet reader (#50765 commented on Jul 4, 2025 • 0 new comments)
- [SPARK-51699][BUILD] Upgrade to Apache parent pom 34 (#50627 commented on Jun 9, 2025 • 0 new comments)
- [SPARK-35564][SQL] Support subexpression elimination for conditionally evaluated expressions (#32987 commented on Jun 24, 2025 • 0 new comments)
- [SPARK-37019][SQL] Add codegen support to array higher-order functions (#34558 commented on Jun 24, 2025 • 0 new comments)
- [SPARK-44639][SS][YARN] Use Java tmp dir for local RocksDB state storage on Yarn (#42301 commented on Jun 25, 2025 • 0 new comments)
- [WIP][SPARK-24815] [CORE] Trigger Interval based DRA for Structured Streaming (#42352 commented on Jul 8, 2025 • 0 new comments)
- [SPARK-22876][YARN] Respect YARN AM failure validity interval (#42570 commented on Jul 3, 2025 • 0 new comments)
- [SPARK-49984][CORE] Fix duplicate JVM options (#48488 commented on Jun 24, 2025 • 0 new comments)
- [SPARK-50292] Add MapStatus RowCount optimize skewed job (#48825 commented on Jun 30, 2025 • 0 new comments)
- [SPARK-49547][SQL][PYTHON] Add iterator of `RecordBatch` API to `applyInArrow` (#49005 commented on Jul 4, 2025 • 0 new comments)
- [BUILD] Upgrade `RoaringBitmap` to 1.4.1 (#49710 commented on Jun 24, 2025 • 0 new comments)
- [SPARK-51332][SQL] DS V2 supports push down BIT_AND, BIT_OR, BIT_XOR, BIT_COUNT and BIT_GET (#50097 commented on Jun 27, 2025 • 0 new comments)
- [SPARK-51400] Replace ArrayContains nodes to InSet (#50170 commented on Jun 19, 2025 • 0 new comments)
- [WIP][SPARK-51348][BUILD][SQL] Upgrade Hive to 4.0 (#50213 commented on Jun 20, 2025 • 0 new comments)
- [SPARK-51359][CORE][SQL] Set INT64 as the default timestamp type for Parquet files (#50215 commented on Jul 7, 2025 • 0 new comments)
- Enable -Xsource:3 compiler flag (#50474 commented on Jun 28, 2025 • 0 new comments)
- [SPARK-51745] Enforce State Machine for RocksDBStateStore (#50497 commented on Jun 14, 2025 • 0 new comments)
- [SPARK-51519][SQL] MERGE INTO/UPDATE/DELETE support join hint (#50524 commented on Jun 16, 2025 • 0 new comments)
- [SPARK-51765][DOCS] Docs for SQL Scripting (#50592 commented on Jun 17, 2025 • 0 new comments)