TableFunctions

Table Functions.

Time Series Analysis

Forecast

Signature:

Forecast(GroovyExpr DriverNames, // as list of Strings
         GroovyExpr BaseDrivers, // as a list of numeric values
         GroovyExpr measureExpr,
         // as a map of  String -> [numeric vals]
         // specifies the named points to forecast at and
         // the driver values at these points
         GroovyExpr driverForecast
        )

Returns the forecasted Msr for each point specified as a Table of Name, Measure.

Feed

Signature:

Feed(String openingColumn,
     String inColumn,
     String outColumn,
     String closingColumn
    )

Feed calculates the closing balance and “feeds” it to the opening balance of the next time period. Assumes input to function is partitioned by something other than Time, and data within each partitioned by a Time period: Qtr, Month etc. Opening data for the 1st period in each partition is provided. For e.g.

	Jan	Feb	Mar	April
Opening	5000
In	1000	2000	5000	2000
Out	3000	2000	7000	2000
Closing

Feed calculates the Opening(except the first) & Closing values for each period.

NPath

Signature: NPath(String Pattern, GroovyExpr Symbols, GroovyExpr Results)

Returns rows that meet a specified pattern. Use Symbols to specify a list of expressions to match. Pattern is used to specify a Path. The results list can contain expressions based on the input columns and also the matched Path.

Pattern: pattern for the Path. Path is ‘dot’ separated list of symbols. Each element is treated as a groovy expression. Elements that end in ‘*’ or ‘+’ are interpreted with the usual meaning of zero or more, one or more respectively. For e.g. “LATE.EARLY*.(ONTIME || EARLY)” implies a sequence of flights where the first occurence was LATE, followed by zero or more EARLY flights, followed by a ONTIME or EARLY flight.
Symbols: specify a map from names to expresssions. For e.g. <[LATE: “arrival\_delay > 0”, EARLY: “arrival\_delay < 0” , ONTIME : “arrival\_delay == 0”]> The names for symbols don’t need to be quoted as long as they are valid groovy names. When in doubt add quotes, for eg: <LATE: “arrival\_delay > 0 and arrival\_delay < 60”, “LATE*2”: “arrival\_delay >= 60” >
Results: specified as a list. Each entry can be just a string, or a list of 3 elems: [expr, type, name]. If an element is just a string, it is interpreted as a reference to a column in the input to this function or as a Symbol. When specified as a list the first element is interepreted as a groovy expression; the second is interpreted as a typename, and the third is the expression’s alias. For eg <[“weight”, [“2*weight”, “double”, ‘doubleWeight”]>. The expressions are evaluated in the context where all the input columns are available, plus the attributes “path”, “count”, “first”, and “last” are available. Path is a collection of nodes that represents the matching Path, count, first, last are convenience fns about the Path. Each node in the path exposes all the attributes of the corresponding input row.

Examples

Late Flights

list incidents where a Flight(to NY) has been more than 15 minutes late 5 or more times in a row.

	  from npath(<select origin_city_name, year, month, day_of_month, arr_delay, fl_num
			from flightsdata
			where dest_city_name = 'New York' and dep_time != ''>
	  		partition by fl_num
	  		order by year, month, day_of_month,
	  	'LATE.LATE.LATE.LATE.LATE+',
		<[LATE : "arr_delay \\> 15"]>,
		<["origin_city_name", "fl_num", "year", "month", "day_of_month",
				["(path.sum() { it.arr_delay})/((double)count)", "double", "avgDelay"],
				["count", "int", "numOfDelays"]
		]>)
	  select origin_city_name, fl_num, year, month, day_of_month, avgDelay, numOfDelays
	  into path='/tmp/wout'
	  serde 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
	  with serdeproperties('field.delim'=',')
	  format 'org.apache.hadoop.mapred.TextOutputFormat'

Hierarchical navigation

It is possible to treat the rows within a parition as a Table representation of a Level Hierarchy and to provide Navigation functions on the hierarchy. See Hierarchy Evaluation for details.

Allocations

Allocate

Signature:

Allocate( // represents the  amount to spread
          GroovyExpr allocateAmt, 
          // specifies the expr to use to evaluate
          // allocations at each point.
          GroovyExpr ratioExpr,
         )

Returns the allocated amount at each point. Assumes input to function is partitioned based on how amount needs to be spread. For eg a Budget over States in a Country based on Sales figures.

Depreciation

Special allocation algorithm, user can specify the ‘Sum Of Years’ or ‘Straight-Line’ method of depreciation. Can be used for Yearly & Quarterly depreiciation.

Signature:

Depreciate( // represents the  amount to depreciate
            GroovyExprbaseAmt, 
            String yearColumn,
            String qtrColumn,
            // 'sum of years' or 'straight line'
            String depreciationMethod 
           )

Returns the depriciation Amount for each year and quarter. Assumes input to function contains rows for each Year and Qtr, data is partitioned by Year, and data is sorted by Year and Qtr.

Misc. Functions

Data densification for Reporting

using partition right join
to a table function, provide a set of dimension lists
output original facts plus any missing facts.
for e.g. output sales for all weeks, even there are no sales for some products in certain weeks

Iterative Algorithms

Frequent Itemsets involve counting how often multiple events occur together. for e.g. how often someone has purchased milk & cereal. Input to itemsets is a collection of items, an itemset.(like the products a given customer has purchased). This is also a iterative algorithm where a n-item frequent itemset can be efficientlycomputed from a table of n-1 frquent itemsets. See Market Basket Analysis for a writeup on how we plan to provide this functionality.

Graph Algorithms like Shortest Path, Transitive closure can be exposed as Table Functions in SQL, and are also highly parallelizable. But these functions are iterative in that they run to a Fixed Point. This requires us to add an additional mechanics into the Engine to orchestrate computations to a fixed point. Stay tuned for support for these algorithms.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

TableFunctions

Table Functions.

Time Series Analysis

Forecast

Feed

NPath

Examples

Late Flights

Hierarchical navigation

Allocations

Allocate

Depreciation

Misc. Functions

Data densification for Reporting

Iterative Algorithms

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally