Description
A checklist of items that I plan to tackle. Please feel free to edit it and open issues related to a specific topic.
udfs to support
- `variant_to_json` - returns a JSON string from a `VariantArray`
- `json_to_variant` - returns a `VariantArray` from a JSON string
- `cast_to_variant` - returns a `VariantArray` from a column
- `variant_get(VariantArray, path)` - returns the extracted type dictated by the `path` from the `VariantArray`
- `is_variant_null(VariantArray)` - tests whether elements in `VariantArray` are `Variant::Null`
- `variant_pretty(VariantArray)` - returns a human-readable version of `VariantArray`
- `variant_schema(VariantArray)` - returns the schema of a `VariantArray`
- `variant_object_construct(key1, value1, [keyN, valueN])`
- `variant_object_delete(VariantArray, VariantPath)`
- `variant_object_insert(key, value)`
- `variant_list_construct(value1, [valueN])`
- `variant_list_insert(VariantArray, value)`
- `variant_explode`
- `variant_explode_outer`
Databricks supports the following variant-related udfs: https://docs.databricks.com/gcp/en/sql/language-manual/sql-ref-functions-builtin#variant-functions. I think it would be very cool to achieve 1:1 parity with their functionality.
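To make the intended semantics of `variant_get` and `is_variant_null` a bit more concrete, here is a minimal sketch on a toy model. Everything in it is an assumption made for illustration: `ToyVariant`, `PathStep`, `toy_variant_get` and `toy_is_variant_null` are hypothetical names, not the `parquet-variant` types or this project's API, and a plain slice stands in for a `VariantArray`.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for a variant value (hypothetical, NOT the parquet-variant `Variant`).
#[derive(Debug, Clone, PartialEq)]
enum ToyVariant {
    Null,
    Int(i64),
    Str(String),
    List(Vec<ToyVariant>),
    Object(BTreeMap<String, ToyVariant>),
}

/// One step of a path: an object field name or a list index (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum PathStep {
    Field(String),
    Index(usize),
}

/// Sketch of `variant_get` for a single value: walk the path step by step and
/// return `None` as soon as a step does not apply to the current value.
fn toy_variant_get<'a>(value: &'a ToyVariant, path: &[PathStep]) -> Option<&'a ToyVariant> {
    let mut current = value;
    for step in path {
        current = match (current, step) {
            (ToyVariant::Object(fields), PathStep::Field(name)) => fields.get(name)?,
            (ToyVariant::List(items), PathStep::Index(i)) => items.get(*i)?,
            // Field access on a non-object, or indexing a non-list.
            _ => return None,
        };
    }
    Some(current)
}

/// Sketch of `is_variant_null` over a column: one boolean per row.
fn toy_is_variant_null(column: &[ToyVariant]) -> Vec<bool> {
    column.iter().map(|v| matches!(v, ToyVariant::Null)).collect()
}
```

In this sketch a missing field or out-of-range index yields `None`; whether the real udf should return SQL NULL or an error in that case is a design decision this issue does not settle.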
misc
- Add an `examples/` directory that lists examples for every udf we support; we can make use of a sample JSONL dataset like: https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page
- Have integration tests, preferably using slt tests
- Add more unit tests
- Write documentation
- Write the README
- Have better error messages, especially when instructing users which arguments to pass
- Pick a license
questions
It would also be good to compile a list of questions/ideas/limitations that arise from interfacing with arrow's `parquet-variant` libraries:
- Why does `VariantArray` not implement `IntoIter`?
- `variant_to_json` should accept optional format configuration. Currently the library dictates how to map specialized Variant types to JSON (e.g. timestamps are always formatted as a string)
- There is a lot of ceremony to go from a `Variant` to a `VariantArray`. Maybe `VariantArray` should `impl From<IntoIterator<Item = Variant>>`?
- When checking if 2 `VariantArray`s are equal, it is a bit odd that it will panic when calling `.value(i)` when the ith position has empty metadata and value?
- Why doesn't `VariantArray` impl `PartialEq`?
- Is there a reason why `VariantArray` doesn't impl `Array`? Seems this would be super nice
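As a rough illustration of the ergonomics asked about above, here is a sketch on toy types. `SketchVariant` and `SketchVariantArray` are hypothetical stand-ins, not the `parquet-variant` types, and nothing here says how the real `VariantArray` could or should implement these traits. In the standard library, the trait for building a collection from an iterator is `FromIterator` (which is what `.collect()` uses), so the sketch uses that rather than a `From` impl.

```rust
/// Toy stand-in for a variant value (hypothetical, NOT the parquet-variant `Variant`).
#[derive(Debug, Clone, PartialEq)]
enum SketchVariant {
    Null,
    Int(i64),
}

/// Toy stand-in for a variant column (hypothetical). Deriving `PartialEq`
/// gives element-wise equality without calling any accessor that could panic.
#[derive(Debug, Clone, PartialEq)]
struct SketchVariantArray {
    values: Vec<SketchVariant>,
}

/// Lets callers write `iter.collect::<SketchVariantArray>()`.
impl FromIterator<SketchVariant> for SketchVariantArray {
    fn from_iter<I: IntoIterator<Item = SketchVariant>>(iter: I) -> Self {
        SketchVariantArray {
            values: iter.into_iter().collect(),
        }
    }
}

/// Lets callers write `for v in array { ... }`, consuming the array.
impl IntoIterator for SketchVariantArray {
    type Item = SketchVariant;
    type IntoIter = std::vec::IntoIter<SketchVariant>;

    fn into_iter(self) -> Self::IntoIter {
        self.values.into_iter()
    }
}
```

With impls like these, going from individual values to an array and back would be a single `.collect()` in each direction, which is the kind of reduction in ceremony the questions above are after.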