DS_BDA_QB_UG24
(Autonomous)
Dundigal, Hyderabad - 500 043
QUESTION BANK
COURSE OBJECTIVES:
The students will try to learn:
COURSE OUTCOMES:
After successful completion of the course, students should be able to:
CO 1 Explain the evolution of big data and big data analytics along with Understand
its characteristics and challenges included in traditional business
intelligence.
CO 2 Make use of appropriate components for processing, scheduling Apply
and knowledge extraction from large volumes in applications for
handling huge volumes of data.
CO 3 Develop a MapReduce application for optimizing the jobs. Apply
CO 4 Develop the applications for handling huge volume of data using Apply
Pig Latin.
CO 5 Explain the importance of the big data framework Hive and its Understand
built-in functions, data types and services like DDL in the Hadoop
distributed file system.
CO 6 Extend the big data technologies used to process and query Analyze
big data in Hadoop, MapReduce, Pig and Hive.
QUESTION BANK:
Q.No QUESTION Taxonomy How does this subsume the level COs
MODULE I
INTRODUCTION TO BIGDATA
PART A-PROBLEM SOLVING AND CRITICAL THINKING QUESTIONS
1 Enumerate you are a senior Apply The learner to recall and CO1
faculty at a reputed understand the different
institute. The HOD has data types and types of
asked you to make a list of Digital Data in depth and
the unstructured data that then identify the
gets generated on the appropriate data type for
institution website that can each source.
then be stored and analyzed
to improve the website to
facilitate and enhance the
students' learning. (Features:
PDF and DOC
files, forums, blogs, links, .xls
sheets, .txt files, .wav files,
log files.) Identify
appropriate data type for
each source of students
learning resources.
2 Interpret you have just got Understand The learner to recall the CO1
a book issued from the concept of RDBMS data
library. What are the details and interpret what kind of
about the book that can be data we do analysis as
placed in an RDBMS table? result.
3 Explain the process of Data Understand The learner to recall the CO1
Preparation in Big Data to preprocessing steps and
enhance the better business explain the steps to
decision value. enhance the insight and
decision making.
4 Indicate which V’s are Apply The learner to recall CO1
satisfied by real time big the definition of big data
data case study – Amazon characteristics-3V’s, relate
Click Stream with to real time case study.
justification.
5 Find out whether the same Understand The learner to recall and CO1
visualization tool that we relate the visualization tools
run over a conventional data for traditional BI and Big
warehouse can be used in a Big Data.
Data environment.
6 Compare the traditional Understand The learner to recall the CO 1
analytics architecture and different architectural
modern database components of data
architecture? analytics and compare with
database architecture.
7 Interpret stock market Apply The learner to recall the CO1
predictions as a case study and predictive analytics
elaborate on the Real Time concepts and relate to the
Analytics Platform (RTAP). given case study with
Present the assumptions appropriate assumptions.
made.
8 Explain in detail about the Understand a) The learner to recall the CO 1
following: a) Multivariate different analysis approaches
analysis performed in Big in big data and explain
Data. b) Methods of multivariate analysis. b)
Stochastic search. The learner to recall the
different analysis approaches
in big data and explain the
stochastic search method
9 Identify the outliers in the Understand The learner to recall the CO 1
given data and write the outlier identification
different issues and techniques and explain
challenges associated in data different issues and
stream query processing. challenges in query
processing.
10 Represent in detail about a Apply The learner to recall and CO 1
cloud-based analytical tools summarize the cloud-based
for specific big data solutions available in market
processing. for handling big data.
Identify whether cloud for Big Data
Development is a good
choice with available service
providers for specific
applications.
PART-B LONG ANSWER QUESTIONS
1 Explain the ETL(Extract- Understand The learner to recall all the CO1
Transform-Load) process ETL operations and tools
concerning Big Data with and then describe its
neat sketch? functionality towards Big
Data.
2 Classify the types of Digital Understand The learner to define all CO1
Data and explain the the three types of Digital
sources of Digital Data with Data and outline its
a neat sketch. sources clearly.
3 Illustrate the Evolution of Understand The learner to recall the CO1
Big Data in detail? In definition of Big Data and
perspective of Doug Laney summarize the evolution
and a Gartner analyst of big data from primitive
coined the term “Big Data”. levels.
4 Summarize the challenges of Understand The learner to recall the CO1
Big Data in various phases different phases in big data
of process with a neat process and explain the
diagram? challenges included in each
phase.
5 Illustrate the basic Understand The learner to recall the CO2
characteristics and sources characteristics of big data
of Big Data? and show the 5V’s of big
data characteristics along
with sources.
6 Annotate your comments. Remember — CO1
Why do we need Big Data?
7 Recognize how the traditional Remember — CO1
Business Intelligence (BI)
environment differs from
the Big Data environment.
8 Summarize a typical data Understand The learner to define forces CO1
warehouse environment on data warehouse concept
with the Big Data? with the big data sources
examples with a sketch.
9 Describe the term Big Data Understand The learner to recall and CO1
Analytics and what is explain the concept of
changing in the realms of analytics with its changing
Big Data? scenarios of data.
10 Explain the various Understand The learner to list and CO1
applications of Big Data explain the different
analytics and why this analytical applications in
sudden hype around Big the sudden hype.
Data Analytics?
11 Classify the Big Data Understand The learner to recall the CO2
entails with first and second big data terms and relate
school of thoughts. to the thoughts of analytics.
12 Classify the different Understand The learner to recall the CO1
analytics types such as various analytics types in
Analytics 1.0, Analytics terms of hindsight, insight
2.0, Analytics 3.0 with a and foresight and
neat diagram? illustrate in neat
diagram.
13 List the top challenges Remember — CO1
facing Big Data in the
present scenario along with
Hadoop solutions.
14 Describe what kind of Understand The learner to recall and CO1
technologies are we looking explain about various
forward to meeting the technologies included to
challenges posed by Big meet the Big Data
Data? challenges.
15 Outline the various Understand The learner to recall the CO1
terminologies used in Big basic key terminology in big
Data environments with a data and explain in detail
neat diagram? with diagram.
16 Identify the various types of Apply The learner to recall and CO1
Analytics along with its summarize the different
impact. types of analytics with
respect to predictive and
prescriptive
analytics. Recognize the
impact of each in big data
processing
17 Outline the key questions to Understand The learner to recall and CO 2
be answered by all the explain a few key
organizations stepping onto questions to transform
the analytics? from the storage of data to
the insights from these
analytics.
18 Recall the CAP theorem Remember — CO1
and how it is different from
ACID properties in a
distributed computing
environment?
19 Explain how Big Data Understand The learner to recall the CO1
Analytics can be useful in concepts of big data
the development of smart analytics and explain the
cities and explain the landscape of technology
landscape of Big Data included in real time
Technology? applications related to
smart city development.
20 Compare SQL, NoSQL and Understand The learner to recall SQL CO1
NewSQL in detail? databases and compare
with two important
technologies, NoSQL and
NewSQL databases.
PART-C SHORT ANSWER QUESTIONS
1 Recall the term data and Remember — CO 1
show its importance in
various data sets.
2 Define the term information Remember — CO 1
for data analysis.
3 Describe “Big Data” in Understand The learner to recall the CO2
simple terms along with concept of data and
its significance. information clearly and
explain the importance in
terms of GB, TB and PB.
4 List out various data Remember — CO1
formats that come under
Big Data?
5 Compare structured and Understand The learner to recall and CO2
unstructured data. Compare the different data
formats i.e., structured and
unstructured data in terms
of sources, size, speed etc.
6 Relate the different sources Understand The learner to recall the sources CO2
of Big Data, which leads to of big data and understand
huge volumes. how it leads to the high and
huge volume of data.
7 Illustrate the characteristics Understand The learner to list the CO2
of data and its sensitivity characteristics of the data
for further enhancements? and explain the sensitivity
characteristics for future
enhancement.
8 Explain about the different Understand The learner to recall and CO1
approaches to deal with Big explain different
Data? approaches in big data
processing.
9 State few examples of Remember — CO1
human generated and
machine-generated data.
Mention in which category
your examples belong to?
10 Identify the benefits and Remember — CO1
importance of Big Data in
this modern world.
11 How does Big Data assist in Remember — CO1
Business Decision making?
12 Which programming Remember — CO1
language is preferred for
specific Big Data Processing
among R, Python or other
languages?
13 Define the term Big Data Remember — CO1
Analytics. What’s the need
to store Data for Business
Analytics?
14 Define the kinds of Remember — CO1
projects that are better suited
to Big Data. Name the top 3
domains where Big Data
projects are applicable.
15 Extend how the adoption of Big Understand The learner to recall CO2
Data impacts day-to- big data use cases and
day business operations relate the power of Big Data
with different use cases. in various domains and at
various levels.
16 Define Big Data insight? Remember — CO1
How are Big Data and Data
Science related?
17 List the several Remember — CO1
methodologies to avoid
overfitting.
18 Compare the importance of Understand The learner to recall the CO1,CO2
business analysis and definitions of analysis and
analytics? analytics in comparison
with importance of business
decision making.
19 How do outliers skew the Remember — CO1
result in the input data,
which may affect the
behavior of the model?
20 List the tools for Big Data Remember — CO1
Visualization?
MODULE II
INTRODUCTION TO HADOOP
PART-A PROBLEM SOLVING AND CRITICAL THINKING QUESTIONS
1 Describe the concept of Understand The learner to recall and CO4
Distributed and parallel explain the big data
computing challenges with a challenges in distributed
neat diagram? environment.
2 Compare between the Understand The learner to recall the CO4
Hadoop1.0 and Hadoop 2.0 basic features of different
architectures in detail? versions and compare at
architecture level.
3 Identify which technology is Apply The learner to recall and CO5
used to import or export relate the different file
the data from RDBMS to systems, data import and
file systems? export tools and identify
the appropriate tool for
translating from one to
another file systems.
4 Explain the four modules Understand The learner to recall and CO4
that make up the Apache explain the different modules
Hadoop framework? to the specified framework.
5 Describe the architecture of Understand The learner to recall the CO 4
Hadoop technology and components of Hadoop
Justify how it satisfies the framework and its
business insights functionality of each and
now-a-days? discuss how those are
helpful for describing
business insights.
6 Illustrate “Big Data is a Understand The learner to recall the CO5
buzz word!” and list a few various statistics related to
statistics to explore big data big data creation and show
which is generated every in diagrammatic way.
day?
7 Explain the flow of data Understand The learner to recall CO4
generated from the IoT different data types and
devices towards Big Data explain the streamed data
through cloud computing process in importance of
services. current technologies.
8 Accommodate the 500GB of Apply The learner to recall and CO5
data file within a cluster of explain the Hadoop storage
commodity hardware and structure and block size
how the Hadoop can constraints and construct a
overcome the challenge of cluster for given data
storing and processing the hardware requirement.
data?
9 Explain in detail about the Understand The learner to recall the CO5
Hadoop YARN with an Hadoop controlling
example? components and explain the
functionalities of YARN
components with example.
10 Interpret the integrated Understand The learner to recall and CO4
Hadoop systems offered by explain a glimpse of the
leading market vendors with leading market vendors
Cloud-based Hadoop offering integrated Hadoop
solutions. system.
PART-B LONG ANSWER QUESTIONS
1 Explain Hadoop Understand The learner to recall the CO4
architecture and its components and explain
components with diagram. Hadoop architecture clearly.
2 Summarize the Hadoop Understand The learner to recall the CO4
Ecosystem role in different different components in
use cases. Hadoop and explain the role
in the solutions of specific
use cases efficiently.
3 With the help of Hadoop Understand The learner to recall the CO4
explain the processing of hadoop ecosystem and
Big Data and challenges in relate each component to
distributed and parallel process the big data.
computing environment?
4 Explain interacting process Understand The learner to recall the CO4
with Hadoop Ecosystem in big data technologies and
terms of various big data relate with the identified
processing technologies. interactions to process the
data.
5 Find out the 5 basic Remember — CO4
problems facing in Big Data
and how to overcome the
challenges in Hadoop
through HDFS.
6 Illustrate with neat diagram Understand The learner to recall the CO4
about Hadoop and its hadoop features and show
features? with diagram.
7 Recall the concept of Divide Remember — CO1
and conquer philosophy to
enrich the jobs efficiently.
8 Demonstrate that distributed Understand The learner to recall and CO4
processing is non-trivial. explain about distributed
environment.
9 Find out the Big Data Remember — CO1
storage as a challenge and
find a solution to overcome
through Hadoop system.
10 Demonstrate in detail about Understand The learner to recall the CO4
the history of Hadoop with hadoop evolutions and show
a neat sketch. in diagram.
11 How Apache Hadoop Remember — CO4
Ecosystem technologies
draw a distributed efficient
responsibility.
12 Explain in detail about the Understand The learner to recall the CO4
Animal planet for Hadoop hadoop versions and explain
and list out the reasons for the history behind logos.
the specific animal.
13 Recall all the Apache Remember — CO4
Hadoop Ecosystem
technologies to map each
other with a neat sketch.
14 Recall all the Big Data Remember — CO4
storage and processing
elements and justify
whether Hadoop tackles
these challenges.
15 Extend the core components Understand The learner to recall the CO4
of Hadoop with workflows core components and
in detail? explain the components
briefly.
16 What are the different types Understand The learner to recall the CO4
of database management core components and
systems? explain the components
briefly.
17 Explain what a default Understand The learner to explain what a CO4
constraint is. default constraint is.
18 Explain how would you find Understand The learner to explain how CO4
the second highest salary to find the second
from the below table? highest salary from the
below table.
19 Explain how can we handle Understand The learner to explain how CO4
exceptions in SQL Server? we can handle exceptions
in SQL Server briefly.
20 Find out how many Understand The learner to Find out CO4
authentication modes are how many authentication
there? And what are they? modes are there? And what
are they?
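Illustrative sketch for Part B, question 18 (second highest salary). The table the question refers to is not reproduced in this question bank, so the `employee` table, its columns and its rows below are hypothetical, and SQLite (through Python's standard `sqlite3` module) stands in for whichever database the course uses. The query takes the highest salary that is strictly below the overall maximum, so ties at the top are skipped.

```python
import sqlite3

# Hypothetical data: the table referenced by the question is not shown.
rows = [("Asha", 50000), ("Ravi", 72000), ("Meena", 72000), ("John", 61000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?)", rows)

# Highest salary strictly below the overall maximum.
query = """
SELECT MAX(salary)
FROM employee
WHERE salary < (SELECT MAX(salary) FROM employee)
"""
second_highest = conn.execute(query).fetchone()[0]
conn.close()

print(second_highest)  # 61000
```

If every row has the same salary, the subquery filter leaves no rows and the result is NULL (`None` in Python), which is one reasonable way to report that no second highest value exists.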
PART-C SHORT ANSWER QUESTIONS
1 Memorize why Hadoop is Remember — CO1
called a Big Data
technology. How does it
support Big Data?
2 Identify how Big Data is Understand The learner to recall CO2
encountered as a problem in the term Big
the real time scenario. Data and relate the reasons
for this phenomenon.
3 List out various technologies Remember — CO1
that came into existence for
processing Big Data.
4 Recall the introduction to Remember — CO1
NoSQL.
5 Summarize the challenges of Understand The learner to recall and CO2,CO3
Big Data with Hadoop relate the challenges of big
environment. data in Hadoop environment
6 List out the advantages of Remember — CO2,CO3
NoSQL
7 Correlate the situation Remember — CO1
necessity for Hadoop arises
and why do we need
Hadoop.
8 Name some of the uses of Remember — CO2
NoSQL in industry
9 Compare the traditional Understand The learner to recall and CO2
RDBMS and Hadoop data compare the differences
bases? between traditional DBMS
system and Hadoop.
10 Recall the basic Remember — CO1
requirements that are to be
fulfilled with the structured
and unstructured data?
11 Recall the basic core Remember — CO2
components of Hadoop for
analyzing the data.
12 Explain in detail about the Understand The learner to recall and CO3
Hadoop Cluster? explain the Hadoop Cluster
components.
13 List the difference between Remember — CO4
SQL and NoSQL.
14 Name the Distributed Remember — CO3
Computing challenges over
Big Data in Hadoop?
15 What are the various Remember — CO4
Hadoop Distributors for
processing Big Data?
16 What are the various Remember — CO4
Hadoop Distributors for
processing Big Data?
17 What is CRUD in Remember — CO4
MongoDB? Explain with an
example
18 What is the usage of profiler Remember — CO4
in MongoDB?
19 List out the difference Remember — CO4
between MongoDB and
Redis database?
20 List out the difference Remember — CO4
between MongoDB and
Cassandra?
MODULE III
THE HADOOP DISTRIBUTED FILESYSTEM
PART A-PROBLEM SOLVING AND CRITICAL THINKING QUESTIONS
1 Demonstrate as per the Understand The learner to recall the CO6
configuration, HDFS is in automatic
high availability mode with failover maintenance and
automatic failover. Explain explain configuration
in brief about the daemon details.
which will take care of the
failover.
2 Compare the setup of Understand The learner to know the CO6
YARN cluster where the resource allocation within
application memory the queues is controlled
available is 30GB with two separately. Compare with
companies Wipro and TCS. different schedules.
Wipro queue has 15GB
allocated and TCS queue
has 5GB allocated. Each
map task requires 25 GB
allocation. How does the
fair scheduler assign the
available memory resources
under the Dominant
Resource Fairness (DRF)
scheduler?
3 Illustrate the usual block Apply The learner to recall the CO2
size on an HDFS? Can we concept of block size in
make it much larger say HDFS and explain the
1GB and what are the limitations with the size
advantages that a block variations. Illustrate the
provides over a file system? advantages over distributed
file system.
4 Demonstrate what do you Understand The learner to recall the CO6
mean by High Availability single point of failure
in HDFS? What are failover without any manual
and fencing, and what role intervention and
do they play in making the demonstrate its features
system highly available. in the available system.
5 Examine the number of Apply The learner to recall the CO 2
spilled records from map total amount of memory
tasks far exceeds the size in a buffer and explain
number of map output the concept of heaps.
records. The child heap size Apply to tune io.sort.mb to
is 1GB and your io.sort.mb maximize the heap size in
value is set to 1000MB. How memory while sorting files.
would you tune your
io.sort.mb value to
achieve maximum memory
to I/O ratio?
6 What are the components Remember — CO4
and characteristics of
HDFS?
7 Illustrate the anatomy of Understand The learner to recall the CO 4
File Read in HDFS with a architecture of Hadoop and
neat sketch and elaborate illustrate the workflow of
the workflow from client to Hadoop and client
Hadoop framework. communication.
8 Illustrate the anatomy of Understand The learner to recall the CO 3
File Write in HDFS with a architecture of Hadoop and
neat sketch and elaborate Illustrate the workflow of
the File Writes processing Hadoop and client
methodology in the communication for file
distributed file system. writing.
9 Identify the mechanism Apply The learner to recall the CO 6
of heartbeat in HDFS and concept of heartbeat in the
justify how the Name Node cluster and explain the
handles data node failures. data storage nodes in the
framework. Identify the
heartbeat signals to the
data nodes in a regular time
stamp.
10 Examine if we want to copy Apply The learner to recall the CO4
10 blocks from one machine concept of block replication
to another, but another and explain the fail over
machine can copy only 8.5 management and then apply
blocks, can the blocks be the given blocks to the
broken at the time of framework.
replication?
PART-B LONG ANSWER QUESTIONS
1 Explain in brief about the Understand The learner to recall the CO6
Hadoop’s rack topology concept of rack and show
with the following terms: the topology with the
• Rack Awareness collection of multiple servers
• Fault Tolerance based on the requirements.
2 Outline the different ways Understand The learner should recall CO6
to overwrite the replication the Hadoop file system
factors in HDFS? commands and relate to file
writing.
3 Explain the importance of Understand The learner to recall CO6
Input Format and Record the input formats and relate
Reader in Hadoop? What the record readers and
are the various Input different formats of Hadoop.
Formats in Hadoop?
4 Discuss the HDFS Understand The learner to define and CO6
Architecture and HDFS discuss the architecture of
Commands in brief. Write HDFS.
down the goals of HDFS.
5 How does HDFS ensure Understand The learner to recall and CO6
data Integrity in a Hadoop explain the data integrity in
Cluster? Hadoop.
6 Discuss racks in Hadoop Understand The learner to define the CO6
Cluster? Explain how concept of racks in a cluster
Hadoop Clusters are and explain the cluster
arranged in several racks arrangement.
with a real time example?
7 Create a file in Understand The learner to recall the CO6
HDFS. Explain the Anatomy anatomy of file read and
of a File Read and Write? write and explain the
workflow in creating file.
8 Explain the following terms Understand The learner to recall and CO6
in detail: explain the important nodes
a) Name Node in the Hadoop cluster.
b) Secondary Name Node
c) Data Node
d) Job Tracker
e) Task Tracker
9 Demonstrate the Streaming Understand The learner to name and CO6
access pattern of HDFS explain the different access
Hadoop Cluster? patterns and parameters.
10 Differentiate between the Understand The learner to recall CO6
basic File System and the basic file system and
HDFS? explain with other file
systems.
11 Explain in detail about Understand The learner to define the CO6
Hadoop Cluster and the important nodes in the
Master – Slave architecture? cluster. Explain each.
12 Describe in detail about the Understand The learner to recall the CO6
two types of “writes” in concept of HDFS and
HDFS? explain the anatomy of file
read/write.
13 Which modes can Hadoop Understand The learner to recall the CO6
be run in? List out the few different modes of Hadoop
features for each mode. and explain feature of each
14 The default block size is Apply The learner to recall the CO4
64MB and the replication block and replication
factor is 3. Calculate no. concepts in Hadoop and
of blocks allocated for a file explain the block allocation
having the size of 300MB? process. Apply on the given
file.
15 What is Name Node and Remember — CO1
Data Node? Explain how
many Name Nodes and
Data Nodes can run on a
single Hadoop cluster?
16 Define metadata and Understand The learner to recall the CO6
commodity hardware? Does basic terminologies of
commodity hardware Hadoop cluster and explain
include RAM? Is Name metadata and commodity
Node also commodity? hardware.
17 Explain how the NameNode Understand The learner to recall CO5
gets to know all the various nodes in hadoop and
available data node in the explain all the available
Hadoop cluster? nodes in the cluster
18 Explain HDFS Name Node Understand The learner to define and CO4
Federation, NFS Gateway, explain the various terms
Snapshots, Checkpoint and of HDFS.
Backups.
19 Bring out the concepts of Understand The learner to recall the CO6
HDFS block replication, Hadoop Cluster block
with an example? replications and explain
with example
20 Illustrate for each YARN Understand The learner to recall and CO5
job, the Hadoop framework determine the container log
generates a task log file, to be stored in the nodes.
where are Hadoop task log
files stored?
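Illustrative sketch for Part B, question 14 (block allocation). The arithmetic uses the figures stated in the question (64 MB block size, replication factor 3, 300 MB file); the function name `hdfs_block_layout` is a hypothetical helper, not part of any Hadoop API. The file is split into ceil(300 / 64) blocks, the last block holds only the remainder, and each block is stored once per replica.

```python
import math

def hdfs_block_layout(file_mb, block_mb=64, replication=3):
    """Return (logical blocks, size of last block in MB, total stored replicas)."""
    blocks = math.ceil(file_mb / block_mb)
    last_block_mb = file_mb - (blocks - 1) * block_mb
    return blocks, last_block_mb, blocks * replication

blocks, last_mb, replicas = hdfs_block_layout(300)
print(blocks, last_mb, replicas)  # 5 44 15
```

Four blocks are full (4 x 64 MB = 256 MB) and the fifth holds the remaining 44 MB, so the cluster stores 5 x 3 = 15 block replicas, about 900 MB of raw data.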
PART-C SHORT ANSWER QUESTIONS
1 On what concept the Remember – CO 4
Hadoop framework works?
2 What are the main Remember – CO 4
components of a Hadoop
Application?
3 What is Hadoop streaming? Remember – CO 5
4 What are the most Remember – CO 4
commonly defined input
formats in Hadoop?
5 Define Hadoop and mention Remember – CO 2
its component?
6 Compare HDFS with Remember – CO 4
Network Attached Storage
(NAS).
7 How is HDFS fault tolerant? Remember – CO 4
8 Why do we use HDFS for Remember – CO 4
applications having large
data sets and not when
there are a lot of small files?
9 How do you define Rack Remember – CO 4
Awareness in Hadoop?
10 What is the difference Remember – CO 6
between an HDFS Block and
an Input Split?
11 Choose the block size and Remember — CO1
replication factor to
configure HDFS?
12 What are the benefits of Remember — CO6
block transfer?
13 Recall the term daemon and Understand The learner to recall the CO6
mention the 5 daemons in daemon and list out the
the Hadoop cluster? various daemons in the
cluster.
14 Define various modes of Remember — CO6
Hadoop?
15 Illustrate the client Understand The learner to recall the CO6
communication with HDFS? workflow concept and show
the communications with
the Hadoop cluster.
16 Explain about the file Understand The learner to recall the CO6
permissions and data file permissions of HDFS.
integrity in HDFS?
17 What mechanism does Remember — CO1
Hadoop framework provide
to synchronize changes
made in Distribution Cache
during runtime of the
application?
18 Suppose Hadoop spawned Apply The learner to recall the CO4
100 tasks for a job and one Hadoop features and relate
of the tasks failed. What to the replication feature
will Hadoop do? and apply on task
monitoring.
19 What is an Input Split and Remember — CO4
HDFS block?
20 What is the difference Remember — CO3
between MapReduce engine
and HDFS cluster? What is
“Key-Value pair” in HDFS?
MODULE IV
UNDERSTANDING MAPREDUCE FUNDAMENTALS
PART A- PROBLEM SOLVING AND CRITICAL THINKING QUESTIONS
1 Discuss briefly about the job Understand The learner to define CO3
or application ID. How job derivative and explain the
history server is handling formula on log files.
the job details and brief
about logging and log files.
2 Explain the role of a Understand The learner to recall the CO3
combiner and partitioner in different derivatives and
a Map-Reduce job? Is the describe its jobs.
combiner triggered first or
the partitioner?
3 Discuss: MapReduce runs on Understand The learner to define and CO3
top of YARN and utilizes explain how much
YARN containers to maximum memory each
schedule and execute its map and reduce task will
map and reduce tasks. take.
When configuring
MapReduce resource
utilization on YARN, what
are the aspects to be
considered?
4 Examine every hour Hadoop Apply The learner to recall the CO3
runs 100 jobs in parallel. Hadoop cluster and relate
Now currently, single job is the scheduler when the
running. How much of the single application is running
resource capacity of the may request entire
cluster will be used by this cluster. Identify the
running single job? resource capacity for
running single job.
5 Construct the MapReduce Apply The learner to recall and CO 3
job, under what scenario relate the concept of
does a combiner get MapReduce jobs and
triggered? What are the options for the MapReduce
various options to reduce jobs in minimizing. Identify
the shuffling of data in a the optimal scenario.
map – reduce job?
6 Examine: in a MapReduce job Apply The learner to recall and CO3
you consistently see that relate the MapReduce jobs
map tasks on your cluster and make consistently see
are running slowly because that MapReduce map tasks
of excessive garbage on your cluster.
collection of JVM. How do
you increase JVM heap size
property to 3GB to
optimize performance?
7 Explain the concept of joins Understand The learner to recall the concept of CO 3
in MR jobs? Compare the joins in MapReduce jobs
various join processing and explain the mapper
methods? and reducer methods
8 Summarize the reason why Understand The learner to recall the CO3
we can’t perform basic idea of mapper and
“aggregation” (addition) in reducer and explain the
mapper? Why do we need analytical process.
the “reducer” for this?
9 Write a MapReduce Apply The learner to recall the CO 3
program that mines weather basic constructs of
data. Hint: Weather sensors MapReduce program and
collecting data every hour Summarize the data and
at many locations across the identify suitable map and
globe gather a large volume reduce functions to perform
of log data, which is a good analysis.
candidate for analysis with
MapReduce, since it is semi
structured and record
oriented.
10 Make use of Hadoop Apply The learner to recall the CO 3
MapReduce functions for MapReduce programming
implementing matrix functions and explain the
multiplication. applicability and develop
map and reduce functions to
perform
matrix multiplication.
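Illustrative sketch for Part A, question 9 (mining weather data). This is a single-process Python simulation of the map, shuffle and reduce phases, not a Hadoop job. The record layout `station_id,year,temperature`, the sample readings and the function names are all assumptions made for the example; the question does not specify a format. The job computes the maximum temperature per year.

```python
from collections import defaultdict

# Hypothetical record layout: "station_id,year,temperature_celsius"
records = [
    "S1,1949,11.1",
    "S2,1949,7.8",
    "S1,1950,0.0",
    "S2,1950,2.2",
    "S3,1950,-1.1",
    "S3,1949,bad",  # malformed reading, skipped by the mapper
]

def map_record(line):
    """Map phase: emit (year, temperature) for each well-formed record."""
    station, year, temp = line.split(",")
    try:
        yield year, float(temp)
    except ValueError:
        return  # drop records whose temperature cannot be parsed

def shuffle(pairs):
    """Shuffle phase: group all values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_max(year, temps):
    """Reduce phase: keep the maximum temperature seen for a year."""
    return year, max(temps)

mapped = [pair for line in records for pair in map_record(line)]
result = dict(reduce_max(y, t) for y, t in sorted(shuffle(mapped).items()))
print(result)  # {'1949': 11.1, '1950': 2.2}
```

Because taking a maximum is associative and commutative, the same `reduce_max` logic could also serve as a combiner on each mapper's local output, which is one way to cut the data shuffled across the network.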
PART-B LONG ANSWER QUESTIONS
1 Explain Map-reduce Understand The learner to recall the CO3
framework in brief and architecture of cluster and
Draw the architectural explain the framework of
diagram for Physical MapReduce.
Organization of Compute
Nodes.
2 Infer out the main features Understand The learner to recall the CO6
of MapReduce and its concepts of MapReduce and
significance? explain the features and
its significance.
3 Describe the working of the Understand The learner to recall and CO3
MapReduce algorithm? relate the working principles
of MapReduce.
4 Explain working of following Understand The learner to recall all the CO3
phases of MapReduce with definitions of MapReduce
one common example. and explain in detail.
(i) Map Phase
(ii) Combiner Phase
(iii) Shuffle and Sort Phase
(iv)Reducer Phase.
5 Estimate the entire process Understand The learner to know the CO3
of data analysis conducted process of analytics and
in the MapReduce understand the
programming model? programming model.
6 Explain the description of Understand The learner to recall and CO3
MapReduce process for a explain the process for
specific case? analyzing and understand in
specific case.
7 Describe the uses of Understand The learner to recall and CO3
MapReduce? Define what explain the uses and
conditions must be met to conditions in MR jobs.
implement MapReduce
application?
8 Extend the MapReduce be Understand The learner to recall the CO3
used to solve any kind of MapReduce framework and
computational problems? if solve the computational
not, explain the cases where problems.
MapReduce is not
applicable?
9 Discuss some techniques to Understand The learner to define the CO3
optimize MapReduce jobs optimized techniques and
and the points you need to explain the designing of a
consider while designing a file.
file system in MapReduce?
10 Illustrate a short note on Understand The learner to recall Input CO3
Input Split and Explain the Split concepts and relate to
MapReduce application? the applications of
MapReduce.
11 Classify a short note on Understand The learner to recall the CO4
Input Format and the input file formats and
FileInputFormat class? demonstrate different types
for specific needs.
12 Explain the anatomy of a Understand The learner to recall and CO3
map-reduce job run? explain clear assumptions
of data transformations.
13 Illustrate with diagram Understand The learner to recall the CO3
about how Hadoop uses concept of HDFS and state
HDFS staging directory as the directories in MR
well as local directory jobs.
during a job run?
14 Demonstrate the map side Understand The learner to define the CO3
join by comparing with a map side join and explain in
reduce side join in detail while comparing with
MapReduce programming? reduce side join in fulfilling
a job.
15 Explain in detail about the Understand The learner to recall and CO3
few interesting facts about relate the basic facts of
MapReduce. MapReduce and understand
the applications.
16 Illustrate how MapReduce Understand The learner to recall the CO3
Engine Works in a step by MR framework and relate
step procedure? the step by step procedure
of MapReduce.
17 Explain how MapReduce Understand The learner to recall the CO3
Works on Parallel Parallel Programming and
Programming Concept? explain MR phases.
18 Describe in detail about the Understand The learner to recall the 3 CO3
Driver class, map and classes and relate them to
reducer phases with a real the phases in the
time example? MapReduce.
19 Interpret the Data Locality Apply The learner to recall and CO3
Optimization in MR jobs? explain Data Locality
Optimization features and
apply them in the cluster.
20 Discuss the workflow in a Understand The learner to recall the CO3
basic word count MR framework and explain
MapReduce program to steps to implement the
understand MapReduce MapReduce job.
Paradigm.
PART-C SHORT ANSWER QUESTIONS
1 Recall the term Remember — CO3
MapReduce? Explain about
life cycle of MapReduce?
2 Visualize the terms Map Remember — CO3
Phase and Reducer Phase
and Differentiate the
measures in Sort and
shuffle?
3 Show the differences Remember — CO4
between Block and Input
split?
4 Tabulate what are the main Remember — CO3
classes of MR Job?
5 What are the basic Remember — CO3
parameters of a mapper and
reducer?
6 Summarize the naming Understand The learner to recall the CO3
conventions for output files MapReduce phases and
from Map phase and explain the naming
Reduce Phase? conventions in different
phases.
7 Recall the terms identity Remember — CO3
Mapper and Reducer and
state its computation?
8 Illustrate in detail is it Understand The learner to recall the CO3
mandatory to set input and Map and Reduce jobs and
output type/format in infer the input and output
MapReduce? formats.
9 What do you understand by Remember — CO3
TextInputFormat,
KeyValueTextInputFormat
and NLineInputFormat?
10 What is RecordReader in Remember — CO3
MapReduce?
11 Describe the term Remember — CO3
Combiner?
12 Define the NullWritable and Remember — CO3
how does it differ from other
Writable data types?
13 Describe where the Mapper Remember — CO3
output (intermediate
key-value data) is stored.
14 What does a MapReduce Remember — CO3
partitioner do?
15 Generalize the use of Understand The learner to recall the CO6
Context object? concept of containers and
name the use cases of
Context object.
16 What is the role of distributed Remember — CO3
Cache in MapReduce
Framework?
17 Define Custom Remember — CO3
Writable? What is a
Writable in Hadoop?
18 Recall about Data Locality Remember — CO3
in MapReduce?
19 Explain in what scenario Understand The learner to recall the CO3
can the container be killed concept of container and
by the node manager? explain the node manager
responsibility.
20 Express how does a map task Understand The learner to recall the CO6
partition the output in the partitions and explain map
case of multiple reducers? tasks in the concept of
reducers.
MODULE V
INTRODUCTION TO PIG AND HIVE
PART A-PROBLEM SOLVING AND CRITICAL THINKING QUESTIONS
1 In what scenarios will Analyze The learner to recall the CO5
MapReduce jobs be concept of PIG Latin and
more useful than PIG? MapReduce, relate
Categorize the problems different scenarios and
which can only be solved by categorize each problem
MapReduce and cannot be based on the appropriate
solved by PIG. Hadoop component to
solve it.
2 I have already registered my Understand The learner to recall the CO5
LoadFunc / StoreFunc jars functions of PIG and relate
in a "register" statement, but to the given situation.
why do I still get a "Class Not
Found" exception? Explain
the situation briefly.
3 A file employee.txt in the Apply The learner to recall and CO5
HDFS directory has 100 relate the PIG commands
records. To see only the and apply on the given file
first 10 records from the to get the result data.
employee.txt file, illustrate
the results with an appropriate
command.
4 Solve a statistical problem Apply The learner to recall the CO5
by calculating percentage concepts of PIG and
(partial aggregate / total explain the aggregations
aggregate) in PIG? and solve a given proven
problem.
5 Illustrate different types of Apply The learner to recall the CO5
joins in Pig Latin with types of joins and relate the
examples on different data PIG Latin joins and apply
types. on different data types.
6 Discuss the Hive commands Understand The learner to recall CO5
to create a table with four the commands of Hive and
columns: First name, last extend with creation of
name, age, and income? table.
7 A start-up company wants Understand The Learner to recall the CO 6
to use Hive for storing its Hive concepts in detail and
data. Discuss a shell explain the storage
command in Hive to list all capabilities.
the files in the current
directory?
8 Explain a shell command in Understand The learner to recall the CO5
Hive to list all the files in commands in hive and
the current directory? relate to find the list of files.
9 Develop a PIG Latin Analyze The learner to recall and CO5
program for an application relate the programming in
of word count in a given file. PIG and develop an
application for word count.
10 Describe the importance of Understand The Learner to recall the CO5
partitions in Hive with an concept of partitions and
example? explain the importance of
partitions.
PART-B LONG ANSWER QUESTIONS
1 Explain briefly the Understand The learner to recall the CO5
difference between concept of PIG and relate
MapReduce and PIG? the parameters and
functions of both
frameworks.
2 Discuss and explain PIG Understand The learner to define and CO5
structure and architecture discuss the automatic
in brief? optimizations.
3 Compare logical and Understand The learner to recall the CO 5
physical plans in Pig Latin? plans and compare logical
and physical plans in Pig
Latin.
4 Compare PIG and SQL for Understand The learner to recall the CO5
query optimization and query optimization concepts
significance? and compare PIG and
SQL in query optimization
and significance.
5 Outline the conditions and Understand The learner to recall the CO 5
Data Types in PIG? list of data types and
explain the conditions.
6 Explain Pig features for Understand The learner to recall and CO5
allowing grouping on relate the concepts of
expressions? grouping and the pig
expressions.
7 Describe in detail about the Understand The learner to recall the CO 4
scalar and complex data data types in specific ways
types in PIG? and explain the data types
in PIG.
8 Explain multi query Understand The learner to recall all the CO5
execution in PIG and its the operations and functions
operations? and explain the operations.
9 Describe the Functions that Understand The learner to recall and CO5
can be used in PIG and PIG relate the functions and
Latin schemas? schemas used in PIG and
PIG Latin.
10 Explain the UDF functions Understand The learner to recall the CO5
used in PIG with its functions in PIG and
description? explain the UDF functions
and its descriptions.
11 Explain in brief the Understand The learner to recall the CO5
architecture of Hive? architecture of Hive and
explain its components.
12 Discuss the various Hive Understand The learner to recall the CO5
services with an example? Hive services and explain
with examples.
13 Describe the various Hive Understand The learner to recall the CO5
Data types? data types in HIVE and
explain each in detail.
14 Explain the Built-in Understand The learner to recall the CO5
Functions in Hive? basic built in functions and
explain with examples.
15 Discuss the user defined Understand The learner to recall and CO5
functions in hive? relate the user parameters
and understand its
functions.
16 Explain about Collection Understand The learner to recall the CO5
data types in hive? hive data types and
explain collection data
types in detail.
17 Compare HIVE and PIG in Understand The learner to recall the CO5
detail? concepts of PIG and HIVE
and compare in detail.
18 Explain the procedure to Understand The learner to recall the CO5
load data in managed tables? tables and explain the data
transformations clearly.
19 Explain architecture of Understand The learner to recall the CO5
Apache Hive and various Apache Hive architecture and
data insertion techniques in outline the various data
Hive with example. insertion techniques.
20 Describe Hive SQL Data Understand The learner to recall the CO5
Definition Language. DDL concepts and explain
queries in HIVE SQL DDL.
PART-C SHORT ANSWER QUESTIONS
1 List the advantages and Remember — CO5
uses of PIG.
2 List out the features of PIG Remember — CO5
and different modes of
execution in PIG.
3 What is the need of Remember — CO5
MapReduce during PIG
programming?
4 Why should we use ‘distinct’ Remember — CO5
keyword in PIG scripts?
5 What is the importance of Remember — CO6
PIG use cases?
6 List the custom Data types Remember — CO6
in PIG and define briefly.
7 Describe the term inner bag Understand The learner to recall the CO6
and PIG in embedded scripts use in PIG and
mode. explain them in embedded
mode.
8 Illustrate the co-group Understand The learner to know the CO5
representations in PIG? groups concept and show
the elements in the field.
9 Discuss the keyword Understand The learner to recall the CO5
‘DEFINE’ like a function functions and explain the
name? parameters of the function.
10 Discuss the keyword Understand The learner to recall the CO5
‘FUNCTIONAL’ a User user’s perspectives and
Defined Function (UDF)? explain the keywords in
UDF.
11 Illustrate whether PIG Latin Understand The learner to recall the CO5
language is case-sensitive or semantics of PIG Latin and
not? What does FOREACH infer for all applications.
do?
12 List out the relational Remember — CO6
operations in PIG Latin?
13 Recall the importance of Remember — CO6
partitioning and bucketing in
Hive?
14 Illustrate the OrderBy and Understand The learner to recall the CO5
SortBy with an example in functions in hive and relate
Hive? the orders in examples.
15 Explain the different kinds Understand The learner to recall the CO6
of tables in Hive? concept of tables in hive
and explain the tables.
16 How to create an external table Remember — CO6
in hive?
17 In Hive, explain the term Understand The learner to recall the CO6
‘aggregation’ and its uses? concept of DDL and explain
the aggregation and its uses.
18 Interpret joins with an Understand The learner to recall the CO6
example? joins in hive and explain
with example.
19 List out the Data types in Remember — CO6
Hive?
20 List out the Hive services Remember — CO6
with a neat sketch?