-
Notifications
You must be signed in to change notification settings - Fork 17
TableFunctions
Signature:
Forecast(GroovyExpr DriverNames, // as list of Strings GroovyExpr BaseDrivers, // as a list of numeric values GroovyExpr measureExpr, // as a map of String -> [numeric vals] // specifies the named points to forecast at and // the driver values at these points GroovyExpr driverForecast )
Returns the forecasted Msr for each point specified as a Table of Name, Measure.
Signature:
Feed(String openingColumn, String inColumn, String outColumn, String closingColumn )
Feed calculates the closing balance and “feeds” it to the opening balance of the next time period. Assumes input to function is partitioned by something other than Time, and data within each partitioned by a Time period: Qtr, Month etc. Opening data for the 1st period in each partition is provided. For e.g.
Jan | Feb | Mar | April | |
Opening | 5000 | |||
In | 1000 | 2000 | 5000 | 2000 |
Out | 3000 | 2000 | 7000 | 2000 |
Closing |
Feed calculates the Opening(except the first) & Closing values for each period.
Signature: NPath(String Pattern, GroovyExpr Symbols, GroovyExpr Results)
Returns rows that meet a specified pattern. Use Symbols to specify a list of expressions to match. Pattern is used to specify a Path. The results list can contain expressions based on the input columns and also the matched Path.
- Pattern
- pattern for the Path. Path is ‘dot’ separated list of symbols. Each element is treated as a groovy expression. Elements that end in ‘*’ or ‘+’ are interpreted with the usual meaning of zero or more, one or more respectively. For e.g. “LATE.EARLY*.(ONTIME || EARLY)” implies a sequence of flights where the first occurence was LATE, followed by zero or more EARLY flights, followed by a ONTIME or EARLY flight.
- Symbols
- specify a map from names to expresssions. For e.g. <[LATE: “arrival\_delay > 0”, EARLY: “arrival\_delay < 0” , ONTIME : “arrival\_delay == 0”]> The names for symbols don’t need to be quoted as long as they are valid groovy names. When in doubt add quotes, for eg: <LATE: “arrival\_delay > 0 and arrival\_delay < 60”, “LATE*2”: “arrival\_delay >= 60” >
- Results
- specified as a list. Each entry can be just a string, or a list of 3 elems: [expr, type, name]. If an element is just a string, it is interpreted as a reference to a column in the input to this function or as a Symbol. When specified as a list the first element is interepreted as a groovy expression; the second is interpreted as a typename, and the third is the expression’s alias. For eg <[“weight”, [“2*weight”, “double”, ‘doubleWeight”]>. The expressions are evaluated in the context where all the input columns are available, plus the attributes “path”, “count”, “first”, and “last” are available. Path is a collection of nodes that represents the matching Path, count, first, last are convenience fns about the Path. Each node in the path exposes all the attributes of the corresponding input row.
list incidents where a Flight(to NY) has been more than 15 minutes late 5 or more times in a row.
from npath(<select origin_city_name, year, month, day_of_month, arr_delay, fl_num from flightsdata where dest_city_name = 'New York' and dep_time != ''> partition by fl_num order by year, month, day_of_month, 'LATE.LATE.LATE.LATE.LATE+', <[LATE : "arr_delay \\> 15"]>, <["origin_city_name", "fl_num", "year", "month", "day_of_month", ["(path.sum() { it.arr_delay})/((double)count)", "double", "avgDelay"], ["count", "int", "numOfDelays"] ]>) select origin_city_name, fl_num, year, month, day_of_month, avgDelay, numOfDelays into path='/tmp/wout' serde 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' with serdeproperties('field.delim'=',') format 'org.apache.hadoop.mapred.TextOutputFormat'
It is possible to treat the rows within a parition as a Table representation of a Level Hierarchy and to provide Navigation functions on the hierarchy. See Hierarchy Evaluation for details.
Signature:
Allocate( // represents the amount to spread GroovyExpr allocateAmt, // specifies the expr to use to evaluate // allocations at each point. GroovyExpr ratioExpr, )
Returns the allocated amount at each point. Assumes input to function is partitioned based on how amount needs to be spread. For eg a Budget over States in a Country based on Sales figures.
Special allocation algorithm, user can specify the ‘Sum Of Years’ or ‘Straight-Line’ method of depreciation. Can be used for Yearly & Quarterly depreiciation.
Signature:
Depreciate( // represents the amount to depreciate GroovyExprbaseAmt, String yearColumn, String qtrColumn, // 'sum of years' or 'straight line' String depreciationMethod )
Returns the depriciation Amount for each year and quarter. Assumes input to function contains rows for each Year and Qtr, data is partitioned by Year, and data is sorted by Year and Qtr.
- using partition right join
- to a table function, provide a set of dimension lists
- output original facts plus any missing facts.
- for e.g. output sales for all weeks, even there are no sales for some products in certain weeks
Frequent Itemsets involve counting how often multiple events occur together. for e.g. how often someone has purchased milk & cereal. Input to itemsets is a collection of items, an itemset.(like the products a given customer has purchased). This is also a iterative algorithm where a n-item frequent itemset can be efficientlycomputed from a table of n-1 frquent itemsets. See Market Basket Analysis for a writeup on how we plan to provide this functionality.
Graph Algorithms like Shortest Path, Transitive closure can be exposed as Table Functions in SQL, and are also highly parallelizable. But these functions are iterative in that they run to a Fixed Point. This requires us to add an additional mechanics into the Engine to orchestrate computations to a fixed point. Stay tuned for support for these algorithms.