
SIH-PROJECT

Problem statement

The MGNREGA program houses a large volume of data (in excess of 50 TB) on various parameters across the country, consisting of year-on-year data from FY 2005-06 to date. As part of its reporting and monitoring activities, the program maintains a large and complex reporting framework of more than 600 reports. A solution is required that streamlines the reporting process (report generation), highlights and eliminates duplicate reports, properly categorizes reports, and flags reports as high/medium/low importance. An on-the-fly facility for dynamically generating reports by selecting the required filters/parameters may also be conceptualized, developed, and implemented with minimal gaps and errors. Sample data required: yes (reports available in the public domain).

Objective

Python Data Pre-processing using Spark DataFrames

  1. Loading data (loading the CSV file into HDFS, then from HDFS into Spark; see the sketch after this list)
  2. Exploring data
     2.1 Understanding the DataFrame schema
     2.2 Obtaining summary statistics
     2.3 GroupBy and aggregation
     2.4 Visualizing data
  3. Cleaning data
     3.1 Filtering data
  4. Streamlining the reporting data
  5. Eliminating duplicate reports / categorizing reports
  6. Highlighting high/medium/low importance reports
  7. Generating reports by selecting the required filters/parameters
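
A minimal PySpark sketch of steps 1-3, assuming the CSV has already been copied to HDFS at /data/fulldata.csv; the path and the column names 'State Name' and 'Job card Holders' are illustrative assumptions, not the project's actual schema:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("mgnrega-preprocessing").getOrCreate()

    # 1. Load the CSV from HDFS into a Spark DataFrame (path is an assumption)
    df = spark.read.csv("hdfs:///data/fulldata.csv", header=True, inferSchema=True)

    # 2.1 Understand the DataFrame schema
    df.printSchema()

    # 2.2 Obtain summary statistics for the numeric columns
    df.describe().show()

    # 2.3 GroupBy and aggregation: total job card holders per state
    df.groupBy("State Name").agg(F.sum("Job card Holders").alias("total_job_cards")).show()

    # 3.1 Filter the data, e.g. keep only rows with at least one job card holder
    df.filter(F.col("Job card Holders") > 0).show()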

Architecture Diagram

Output

To start all the daemon services
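
Assuming a standard Hadoop installation, the HDFS and YARN daemons are started with the bundled scripts, and jps confirms they are running:

    start-dfs.sh    # starts NameNode, DataNode(s), SecondaryNameNode
    start-yarn.sh   # starts ResourceManager, NodeManager(s)
    jps             # lists running JVM processes to verify the daemons are up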


PySpark

MapReduce

MapReduce Task

To Streamline the Data

spark-submit spark_stream_main.py localhost 9999

spark_stream_main.py - map-reduce style processing of live data streamed from localhost port 9999
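
The repository's spark_stream_main.py is not reproduced here; the following is a minimal sketch of the same pattern using the DStream API (available in Spark 2.x/3.x): a streaming job that map-reduces text arriving on a socket. Feed it test data first, e.g. with nc -lk 9999:

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="spark_stream_sketch")
    ssc = StreamingContext(sc, batchDuration=5)  # 5-second micro-batches

    # Receive live data from the host/port given on the command line (hard-coded here)
    lines = ssc.socketTextStream("localhost", 9999)

    # Classic map-reduce over each micro-batch: count occurrences per field
    counts = (lines.flatMap(lambda line: line.split(","))
                   .map(lambda field: (field, 1))
                   .reduceByKey(lambda a, b: a + b))
    counts.pprint()

    ssc.start()
    ssc.awaitTermination()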

Hadoop Web UI

Moving data from the local file system to HDFS.

Once the data is in HDFS, it is replicated and can be accessed from anywhere on the cluster.
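
Assuming the dataset sits in the local working directory, copying it into HDFS is a one-time step (the /data target directory is an illustrative choice):

    hdfs dfs -mkdir -p /data
    hdfs dfs -put fulldata.csv /data/
    hdfs dfs -ls /data    # verify the upload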

Dealing with the 50 TB dataset using functions

process_data.py implements the following operations (sketched after the list):

  1. Remove duplicate records.
  2. Remove duplicates based on a given column.
  3. Sort states by maximum job card holders.
  4. Sort states by minimum job card holders.
  5. Sort states by maximum job cards held by SC households.
  6. Sort states by minimum job cards held by SC households.
  7. Sort by state name.
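
A minimal sketch of those operations with the Spark DataFrame API, again under illustrative column-name assumptions ('State Name', 'Job card Holders', 'SC Job card Holders'):

    from pyspark.sql import functions as F

    # 1. Remove fully duplicate records
    df = df.dropDuplicates()

    # 2. Remove duplicates based on a given column
    df = df.dropDuplicates(subset=["State Name"])

    # 3/4. States sorted by job card holders, descending and ascending
    df.orderBy(F.col("Job card Holders").desc()).show()
    df.orderBy(F.col("Job card Holders").asc()).show()

    # 5/6. The same sorts on the SC job card column (column name is an assumption)
    df.orderBy(F.col("SC Job card Holders").desc()).show()
    df.orderBy(F.col("SC Job card Holders").asc()).show()

    # 7. Sort alphabetically by state name
    df.orderBy("State Name").show()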

Create a horizontal bar plot of job card holders per state (df_pandas is the aggregated result converted to pandas; see the sketch below):

    import matplotlib.pyplot as plt

    df_pandas.plot(kind='barh', x='State Name', y='Job card Holders', colormap='winter_r')
    plt.show()
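
df_pandas can be produced by aggregating in Spark first, so that only the small per-state summary is brought to the driver; a sketch under the same column-name assumptions:

    # Aggregate in Spark, then convert only the summary to pandas for plotting
    df_pandas = (df.groupBy("State Name")
                   .agg(F.sum("Job card Holders").alias("Job card Holders"))
                   .toPandas())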

After processing is complete, the data can be stored back into HDFS, where it is kept securely and can be accessed from anywhere on the cluster. When retrieving the data, the filters above can be applied again.
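
A sketch of writing the processed data back to HDFS and re-applying a filter on read; the output path is an assumption, and Parquet is one reasonable format choice:

    # Store the processed data back into HDFS
    df.write.mode("overwrite").parquet("hdfs:///data/processed/")

    # Later: read it back and apply the same filters again
    df2 = spark.read.parquet("hdfs:///data/processed/")
    df2.filter(F.col("Job card Holders") > 0).show()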

Predictive analytics

We have a large amount of historical data, and based on it we can predict what will happen next. For example, if 20,000 job cards were issued this year, what will the figure be next year? Here we performed predictive analytics using fulldata.csv.

prediction analysis.ipynb - this notebook contains the predictive-analytics code, and the output is plotted as a graph to visualize the prediction. Click Here! to see the predictive analytics report and graph.
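
The notebook itself is not reproduced here; a minimal sketch of the idea is a linear trend fit on yearly totals, assuming fulldata.csv has 'Year' and 'Job card Holders' columns (illustrative names):

    import numpy as np
    import pandas as pd

    data = pd.read_csv("fulldata.csv")
    yearly = data.groupby("Year")["Job card Holders"].sum()

    # Fit a straight line through the yearly totals and extrapolate one year ahead
    slope, intercept = np.polyfit(yearly.index, yearly.values, deg=1)
    next_year = yearly.index.max() + 1
    print(f"Predicted job cards for {next_year}: {slope * next_year + intercept:.0f}")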
