Analytics functions in Hive
Hive provides the following set of analytical functions:
RANKDENSE_RANKROW_NUMBERPERCENT_RANKCUME_DISTNTILE
Common and useful sets of analytical functions are ranking functions where rows from resultset are ranked according to a scheme.
How to do it…
Let's analyze each function in detail. We will be using the same sales dataset and applying analytical functions to it:
ROW_NUMBER: This function will provide a unique number to each row in resultset based on theORDER BYclause within thePARTITION. For example, if we want to assignrow_numberto eachfname, which is also partitioned by IP address in thesalesdataset, the query would be:hive> select fname,ip,ROW_NUMBER() OVER (ORDER BY ip ) as rownum from sales;
RANK: It is similar toROW_NUMBER, but the equal rows are ranked with the same number. For example, if we useRANKin the previous query instead ofROW_NUM:hive> select fname,ip,RANK() OVER (ORDER BY ip) as ranknum, RANK() OVER (PARTITION BY ip order...