Support UDF that returns RowType #376

Merged 38 commits on Jan 25, 2020

@elvaliuliuliu (Contributor) commented Dec 19, 2019

This PR exposes a new API to support UDFs that return RowType. It fixes #117.

For example:

var spark = SparkSession.Builder().GetOrCreate();
var df = spark.Range(0, 5);

Func<Column, Column> udf = Udf<int>(
    r => new GenericRow(new object[] { r + 100 }),
    new StructType(new[]
    {
        new StructField("id", new IntegerType())
    }));
          
// Assume that df contains:
// +---+
// | id|
// +---+
// |  0|
// |  1|
// |  2|
// |  3|
// |  4|
// +---+

var udfDf = df.Select(udf(df["id"]).As("udf_col"));

// PrintSchema() prints:
// root
//  |-- udf_col: struct (nullable = true)
//  |    |-- id: integer (nullable = true)
udfDf.PrintSchema();

// Show() prints:
// +-------+
// |udf_col|
// +-------+
// |  [100]|
// |  [101]|
// |  [102]|
// |  [103]|
// |  [104]|
// +-------+
udfDf.Show();

The Udf overload that returns a GenericRow looks like the following:

public static Func<Column, Column> Udf<T>(Func<T, GenericRow> udf, StructType returnType)
{
    return CreateUdf<GenericRow>(udf.Method.ToString(), UdfUtils.CreateUdfWrapper(udf), returnType).Apply1;
}
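
As a further illustration, here is a minimal sketch of how this overload might be used to return a row with more than one field and then read the nested fields back. It is not taken from this PR's tests. The two-field schema, the "label" field name, and the dotted-path Select at the end are assumptions made for illustration; the Udf<int> call over spark.Range mirrors the example above. Running it requires a working .NET for Apache Spark setup.

```csharp
using System;
using Microsoft.Spark.Sql;
using Microsoft.Spark.Sql.Types;
using static Microsoft.Spark.Sql.Functions;

class Program
{
    static void Main()
    {
        SparkSession spark = SparkSession.Builder().GetOrCreate();
        DataFrame df = spark.Range(0, 5);

        // Hypothetical two-field return schema (illustration only).
        var schema = new StructType(new[]
        {
            new StructField("id", new IntegerType()),
            new StructField("label", new StringType())
        });

        // The values in the GenericRow must follow the order and
        // types of the fields declared in the schema.
        Func<Column, Column> udf = Udf<int>(
            r => new GenericRow(new object[] { r + 100, $"row-{r}" }),
            schema);

        DataFrame udfDf = df.Select(udf(df["id"]).As("udf_col"));

        // Read the struct's fields back out using dotted column paths.
        udfDf.Select("udf_col.id", "udf_col.label").Show();
    }
}
```

Declaring the schema separately from the lambda keeps the field order visible in one place, which matters because the row values are positional.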

@elvaliuliuliu (Contributor, Author) commented:

@imback82 @suhsteve: In this WIP PR, RunSimpleWorker() finishes successfully, but I then get an error that I think happens when sending bytes to C# (java.lang.Object cannot be cast to net.razorvine.pickle.IObjectConstructor). Any idea what could cause this error? Thanks!

@elvaliuliuliu changed the title from "[WIP] Support UDF that returns RowType" to "Support UDF that returns RowType" on Dec 24, 2019
@imback82 requested a review from suhsteve on January 6, 2020 22:15
@imback82 added the enhancement (New feature or request) label on Jan 6, 2020

@suhsteve (Member) left a comment:

You need to run git add /path/to/GenericRowPickler.cs. The file wasn't included in your latest commits.

@imback82 (Contributor) left a comment:

LGTM. Thanks @elvaliuliuliu (and @suhsteve for review)!

Labels: enhancement (New feature or request)

Linked issue: [FEATURE REQUEST]: UDF to return custom business objects

5 participants