
Releases: dimajix/flowman

0.23.1

29 Mar 04:52
  • github-154: Fix failing migration when a primary key must change due to a data type change
  • github-156: Recreate indexes when data type of column changes
  • github-155: Project level configs are used outside job
  • github-157: Fix UPSERT operations for SQL Server
  • github-158: Improve non-nullability of primary key column
  • github-160: Use sensible defaults for default documenter
  • github-161: Improve schema caching during execution
  • github-162: ExpressionColumnCheck does not work when results contain NULL values
  • github-163: Implement new column length quality check

0.23.0

18 Mar 17:12

The main feature of this version is a significant improvement to the new documentation system, which now also includes column-level lineage. The automatically generated documentation is a valuable artifact for both developers and business experts, helping them understand the data models and transformations. Flowman projects can also specify quality checks (such as NOT NULL conditions, foreign key relationships or arbitrary SQL expressions), which are not only included in the documentation but also executed on the real data.

Moreover, support for SQL databases has been improved further by introducing temporary staging tables, which allow updates to be performed within a single transactional commit.
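The general staging-table pattern can be sketched as follows: incoming records are first loaded into a temporary table, the target table is then updated and extended from it, and everything is committed as one transaction so that readers never see a half-applied update. This is a generic illustration using Python's `sqlite3` module; the `target` table and the `upsert_via_staging` helper are hypothetical, and the SQL that Flowman's JDBC relations actually emit depends on the database and is not reproduced here.

```python
import sqlite3

def upsert_via_staging(conn, rows):
    """Upsert (id, name) rows into `target` through a temporary staging table.

    Assumes `conn` was opened with isolation_level=None so that the explicit
    BEGIN/COMMIT below control the transaction. Illustrative only.
    """
    conn.execute("BEGIN")
    try:
        # 1. Load the incoming records into a temporary staging table
        conn.execute("CREATE TEMP TABLE staging (id INTEGER PRIMARY KEY, name TEXT)")
        conn.executemany("INSERT INTO staging (id, name) VALUES (?, ?)", rows)
        # 2. Update target rows whose key already exists
        conn.execute(
            "UPDATE target SET name = (SELECT s.name FROM staging s WHERE s.id = target.id) "
            "WHERE id IN (SELECT id FROM staging)")
        # 3. Insert rows whose key is new
        conn.execute(
            "INSERT INTO target (id, name) "
            "SELECT s.id, s.name FROM staging s "
            "WHERE s.id NOT IN (SELECT id FROM target)")
        conn.execute("DROP TABLE staging")
        # 4. Make all changes visible at once
        conn.execute("COMMIT")
    except Exception:
        # Undo everything, including the creation of the staging table
        conn.execute("ROLLBACK")
        raise

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO target VALUES (1, 'old'), (2, 'keep')")
    upsert_via_staging(conn, [(1, "new"), (3, "added")])
    print(conn.execute("SELECT id, name FROM target ORDER BY id").fetchall())
    # [(1, 'new'), (2, 'keep'), (3, 'added')]
```

The UPDATE followed by `INSERT ... WHERE ... NOT IN` is a portable stand-in for a single MERGE statement; databases that support MERGE or `INSERT ... ON CONFLICT` could perform steps 2 and 3 in one statement. If loading the staging table fails (for example on a duplicate key), the rollback leaves the target table untouched.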

Detailed Changes

  • github-148: Support staging table for all JDBC relations
  • github-120: Use staging tables for UPSERT and MERGE operations in JDBC relations
  • github-147: Add support for PostgreSQL
  • github-151: Implement column level lineage in documentation
  • github-121: Correctly apply documentation, before/after and other common attributes to templates
  • github-152: Implement new 'cast' mapping

0.22.0

01 Mar 15:01
  • Add new sqlserver relation
  • Implement new documentation subsystem
  • Change default build to Spark 3.2.1 and Hadoop 3.3.1
  • Add new drop target for removing tables
  • Speed up project loading by reusing Jackson mapper
  • Implement new jdbc metric sink
  • Implement schema cache in Executor to speed up documentation and similar tasks
  • Add new config variables flowman.execution.mapping.schemaCache and flowman.execution.relation.schemaCache
  • Add new config variable flowman.default.target.verifyPolicy to ignore empty tables during VERIFY phase
  • Implement initial support for indexes in JDBC relations

0.21.2

24 Feb 16:53

Fix importing projects

0.21.1

24 Feb 12:33
  • flowexec now returns different exit codes depending on the processing result

0.21.0

26 Jan 14:48

This is a minor release with only a few noticeable changes, but with some internal refactorings.

  • Fix wrong dependencies in Swagger plugin
  • Implement basic schema inference for local CSV files
  • Implement new stack mapping
  • Improve error messages of local CSV parser

0.20.1

07 Jan 06:16
  • Implement detection of dependencies introduced by schema

0.20.0

05 Jan 16:32
  • Fix detection of Derby metastore to truncate comment lengths.
  • Add new config variable flowman.default.relation.input.columnMismatchPolicy (default is IGNORE)
  • Add new config variable flowman.default.relation.input.typeMismatchPolicy (default is IGNORE)
  • Add new config variable flowman.default.relation.output.columnMismatchPolicy (default is ADD_REMOVE_COLUMNS)
  • Add new config variable flowman.default.relation.output.typeMismatchPolicy (default is CAST_ALWAYS)
  • Improve handling of _SUCCESS files for detecting (non-)dirty directories
  • Implement new merge target
  • Implement merge operation for Delta relations
  • Implement merge operation for JDBC relations (only for some databases, e.g. MS SQL)
  • Add new config variable flowman.execution.target.useHistory (default is false)
  • Change the semantics of config variable flowman.execution.target.forceDirty (default is false)
  • Add new -d / --dirty option for explicitly marking individual targets as dirty

0.19.0

14 Dec 11:18
  • Add build profile for Hadoop 3.3
  • Add build profile for Spark 3.2
  • Allow SQL expressions as dimensions in aggregate mapping
  • Update Hive views when the resulting schema would change
  • Add new mapping cache command to FlowShell
  • Support embedded connection definitions
  • Much improved Flowman History Server
  • Fix wrong metric names with TemplateTarget
  • Implement more template types for connection, schema, dataset, assertion and measure
  • Implement new measure target for creating custom metrics for measuring data quality
  • Add new config option flowman.execution.mapping.parallelism

0.18.0

13 Oct 17:37
  • Improve automatic schema migration for Hive and JDBC relations
  • Improve support of CHAR(n) and VARCHAR(n) types. Those types will now be propagated to Hive with newer Spark versions
  • Support writing to dynamic partitions for file relations, Hive tables, JDBC relations and Delta tables
  • Fix the name of some config variables (floman.* => flowman.*)
  • Add new config variables flowman.default.relation.migrationPolicy and flowman.default.relation.migrationStrategy
  • Add plugin for supporting DeltaLake (https://delta.io), which provides deltaTable and deltaFile relation types
  • Fix non-deterministic column order in schema mapping, values mapping and values relation
  • Mark Hive dependencies as 'provided', which reduces the size of dist packages
  • Significantly reduce size of AWS dependencies in AWS plugin
  • Add new build profile for Cloudera CDP-7.1
  • Improve Spark configuration of LocalSparkSession and TestRunner
  • Update Spark 3.0 build profile to Spark 3.0.3
  • Upgrade Impala JDBC driver from 2.6.17.1020 to 2.6.23.1028
  • Upgrade MySQL JDBC driver from 8.0.20 to 8.0.25
  • Upgrade MariaDB JDBC driver from 2.2.4 to 2.7.3
  • Upgrade several Maven plugins to latest versions
  • Add new config option flowman.workaround.analyze_partition to work around CDP 7.1 issues
  • Fix migrating Hive views to tables and vice-versa
  • Add new option "-j " to allow running multiple job instances in parallel
  • Add new option "-j " to allow running multiple tests in parallel
  • Add new uniqueKey assertion
  • Add new schema assertion
  • Update Swagger libraries for Swagger schema
  • Implement new openapi plugin to support OpenAPI 3.0 schemas
  • Add new readHive mapping
  • Add new simpleReport and report hooks
  • Implement new templates