Releases: dimajix/flowman
0.23.1
- github-154: Fix failing migration when PK requires change due to data type
- github-156: Recreate indexes when data type of column changes
- github-155: Project level configs are used outside job
- github-157: Fix UPSERT operations for SQL Server
- github-158: Improve non-nullability of primary key column
- github-160: Use sensible defaults for default documenter
- github-161: Improve schema caching during execution
- github-162: ExpressionColumnCheck does not work when results contain NULL values
- github-163: Implement new column length quality check
0.23.0
The main feature of this version is a significant improvement of the new documentation system, which now also includes column level lineage. The automatically generated documentation is a valuable artifact that helps both developers and business experts understand the data models and transformations. Flowman projects can also specify quality checks (such as NOT NULL conditions, foreign key relationships or arbitrary SQL expressions), which are not only included in the documentation but also executed on the real data.
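To make the idea of executable quality checks concrete, here is a minimal sketch in plain Python with SQLite. It is not Flowman's syntax or implementation: the tables (`customers`, `orders`), the check names and the `run_checks` helper are all hypothetical, and each check is simply a query that counts violating rows.

```python
import sqlite3

# Hypothetical checks: a readable name mapped to a query counting violating rows.
CHECKS = {
    "orders.customer_id is NOT NULL":
        "SELECT COUNT(*) FROM orders WHERE customer_id IS NULL",
    "orders.customer_id references customers.id":
        """SELECT COUNT(*)
           FROM orders o LEFT JOIN customers c ON o.customer_id = c.id
           WHERE o.customer_id IS NOT NULL AND c.id IS NULL""",
    # Assumption of this sketch: a NULL amount counts as a violation.
    "orders.amount > 0":
        "SELECT COUNT(*) FROM orders WHERE amount IS NULL OR NOT (amount > 0)",
}


def run_checks(conn, checks):
    """Run every check against the real data; return {name: violation count}."""
    return {name: conn.execute(sql).fetchone()[0] for name, sql in checks.items()}


def build_demo_db():
    """Create a small in-memory database with deliberately dirty rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
    conn.execute("INSERT INTO customers (id) VALUES (1)")
    conn.executemany(
        "INSERT INTO orders (id, customer_id, amount) VALUES (?, ?, ?)",
        [(1, 1, 10.0), (2, None, 5.0), (3, 99, -1.0), (4, 1, None)],
    )
    return conn


results = run_checks(build_demo_db(), CHECKS)
for name, violations in results.items():
    print(f"{'PASS' if violations == 0 else 'FAIL'}  {name}: {violations} violating row(s)")
```

How an expression check should treat NULL results is a design decision; this sketch counts them as failures.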
Moreover, support for SQL databases has been improved again with the introduction of temporary staging tables, which allow updates to be performed within a single transactional commit.
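The general staging-table pattern can be sketched as follows. This is an illustration in plain Python with SQLite, not Flowman's implementation: the `customers` table, the `customers_staging` table and the `upsert_via_staging` helper are hypothetical. New rows are first loaded into a temporary table and then merged into the target inside one transaction, so a failure leaves the target untouched.

```python
import sqlite3


def upsert_via_staging(conn, rows):
    """Merge (id, name) rows into 'customers' through a temporary staging table.

    Assumes 'conn' was opened with isolation_level=None, so that the
    transaction is controlled explicitly with BEGIN / COMMIT / ROLLBACK.
    """
    conn.execute("BEGIN")
    try:
        conn.execute(
            "CREATE TEMP TABLE customers_staging "
            "(id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )
        conn.executemany(
            "INSERT INTO customers_staging (id, name) VALUES (?, ?)", rows
        )
        # Update rows that already exist in the target ...
        conn.execute(
            "UPDATE customers "
            "SET name = (SELECT s.name FROM customers_staging s WHERE s.id = customers.id) "
            "WHERE id IN (SELECT id FROM customers_staging)"
        )
        # ... then insert the rows that are new.
        conn.execute(
            "INSERT INTO customers (id, name) "
            "SELECT id, name FROM customers_staging "
            "WHERE id NOT IN (SELECT id FROM customers)"
        )
        conn.execute("DROP TABLE customers_staging")
        conn.execute("COMMIT")
    except Exception:
        # Undo everything, including the staging table itself.
        conn.execute("ROLLBACK")
        raise


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Alice'), (2, 'Bob')")

upsert_via_staging(conn, [(2, "Robert"), (3, "Carol")])
print(conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall())
```

The sketch uses a separate UPDATE and INSERT rather than a database-specific MERGE or UPSERT statement only to stay portable; a real database would typically use its native statement for the final step.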
Detailed Changes
- github-148: Support staging table for all JDBC relations
- github-120: Use staging tables for UPSERT and MERGE operations in JDBC relations
- github-147: Add support for PostgreSQL
- github-151: Implement column level lineage in documentation
- github-121: Correctly apply documentation, before/after and other common attributes to templates
- github-152: Implement new 'cast' mapping
0.22.0
- Add new `sqlserver` relation
- Implement new documentation subsystem
- Change default build to Spark 3.2.1 and Hadoop 3.3.1
- Add new `drop` target for removing tables
- Speed up project loading by reusing Jackson mapper
- Implement new `jdbc` metric sink
- Implement schema cache in Executor to speed up documentation and similar tasks
- Add new config variables `flowman.execution.mapping.schemaCache` and `flowman.execution.relation.schemaCache`
- Add new config variable `flowman.default.target.verifyPolicy` to ignore empty tables during VERIFY phase
- Implement initial support for indexes in JDBC relations
0.21.2
0.21.1
0.21.0
0.20.1
0.20.0
- Fix detection of Derby metastore to truncate comment lengths.
- Add new config variable `flowman.default.relation.input.columnMismatchPolicy` (default is `IGNORE`)
- Add new config variable `flowman.default.relation.input.typeMismatchPolicy` (default is `IGNORE`)
- Add new config variable `flowman.default.relation.output.columnMismatchPolicy` (default is `ADD_REMOVE_COLUMNS`)
- Add new config variable `flowman.default.relation.output.typeMismatchPolicy` (default is `CAST_ALWAYS`)
- Improve handling of `_SUCCESS` files for detecting (non-)dirty directories
- Implement new `merge` target
- Implement merge operation for Delta relations
- Implement merge operation for JDBC relations (only for some databases, i.e. MS SQL)
- Add new config variable `flowman.execution.target.useHistory` (default is `false`)
- Change the semantics of config variable `flowman.execution.target.forceDirty` (default is `false`)
- Add new `-d` / `--dirty` option for explicitly marking individual targets as dirty
0.19.0
- Add build profile for Hadoop 3.3
- Add build profile for Spark 3.2
- Allow SQL expressions as dimensions in `aggregate` mapping
- Update Hive views when the resulting schema would change
- Add new `mapping cache` command to FlowShell
- Support embedded connection definitions
- Much improved Flowman History Server
- Fix wrong metric names with TemplateTarget
- Implement more `template` types for `connection`, `schema`, `dataset`, `assertion` and `measure`
- Implement new `measure` target for creating custom metrics for measuring data quality
- Add new config option `flowman.execution.mapping.parallelism`
0.18.0
- Improve automatic schema migration for Hive and JDBC relations
- Improve support of CHAR(n) and VARCHAR(n) types. Those types will now be propagated to Hive with newer Spark versions
- Support writing to dynamic partitions for file relations, Hive tables, JDBC relations and Delta tables
- Fix the name of some config variables (floman.* => flowman.*)
- Added new config variables `flowman.default.relation.migrationPolicy` and `flowman.default.relation.migrationStrategy`
- Add plugin for supporting DeltaLake (https://delta.io), which provides `deltaTable` and `deltaFile` relation types
- Fix non-deterministic column order in `schema` mapping, `values` mapping and `values` relation
relation - Mark Hive dependencies has 'provided', which reduces the size of dist packages
- Significantly reduce size of AWS dependencies in AWS plugin
- Add new build profile for Cloudera CDP-7.1
- Improve Spark configuration of `LocalSparkSession` and `TestRunner`
- Update Spark 3.0 build profile to Spark 3.0.3
- Upgrade Impala JDBC driver from 2.6.17.1020 to 2.6.23.1028
- Upgrade MySQL JDBC driver from 8.0.20 to 8.0.25
- Upgrade MariaDB JDBC driver from 2.2.4 to 2.7.3
- Upgrade several Maven plugins to latest versions
- Add new config option `flowman.workaround.analyze_partition` to work around CDP 7.1 issues
- Fix migrating Hive views to tables and vice-versa
- Add new option "-j " to allow running multiple job instances in parallel
- Add new option "-j " to allow running multiple tests in parallel
- Add new `uniqueKey` assertion
- Add new `schema` assertion
- Update Swagger libraries for `swagger` schema
- Implement new `openapi` plugin to support OpenAPI 3.0 schemas
- Add new `readHive` mapping
- Add new `simpleReport` and `report` hook
- Implement new templates