Satya Sandeep Chintu

609-297-8787
[email protected]

Professional Summary:
● Over 12 years of professional IT experience, including work as a Big Data Engineer with the Apache Hadoop ecosystem (HDFS, MapReduce, Hive, Sqoop, Oozie, HBase, Spark/Scala, Kafka) and Big Data analytics.
● Experience designing and implementing large-scale data pipelines for data curation using Spark/Databricks along with Python and Scala.
● Experience in Hadoop architecture and its components, such as JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
● Highly experienced in developing Hive Query Language and Pig Latin Script.
● Experienced with distributed computing architectures such as AWS services (EC2, Redshift, EMR, Elasticsearch, Athena, and Lambda), Hadoop, Python, and Spark, with effective use of MapReduce, SQL, and Cassandra to solve big data problems.
● Strong experience writing, troubleshooting, and optimizing Spark scripts using Python and Scala.
● Experienced in using Kafka as a distributed publisher-subscriber messaging system.
● Strong knowledge of performance tuning of Hive queries and troubleshooting issues related to joins and memory exceptions in Hive.
● Exceptionally good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive (a brief sketch follows this summary).
● Experience in importing and exporting data between HDFS and Relational Databases using Sqoop.
● Experience in real time analytics with Spark Streaming, Kafka and implementation of batch processing
using Hadoop, Map Reduce, Pig and Hive.
● Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
● Experienced in building highly scalable Big-data solutions using NoSQL column-oriented databases like
Cassandra, MongoDB and HBase by integrating them with Hadoop Cluster.
● Extensive work on ETL consisting of data transformation, data sourcing, mapping, conversion and
loading data from heterogeneous systems like flat files, Excel, Oracle, Teradata, MSSQL Server.
● Experience building production ETL pipelines using Informatica PowerCenter, SSIS, SSAS, and SSRS.
● Proficient at writing MapReduce jobs and UDFs to gather, analyse, transform, and deliver data per business requirements, and at optimizing existing algorithms for best results.
● Experience in working with Data warehousing concepts like Star Schema, Snowflake Schema, Data
Marts, Kimball Methodology used in Relational and Multidimensional data modelling.
● Strong experience leveraging different file formats like Avro, ORC, Parquet, JSON and Flat files.
● Sound knowledge on Normalization and Denormalization techniques on OLAP and OLTP systems.
● Good experience with Version Control tools Bitbucket, GitHub, GIT.
● Experience with Jira, Confluence, and Rally for project management and Oozie, Airflow scheduling tools.
● Monitor portfolio performance, attribution, and benchmarks, making adjustments as needed to achieve
targeted returns.
● Conduct thorough research and analysis of financial markets, macroeconomic trends, and industry
sectors to provide actionable insights for trading decisions.
● Monitor news, economic indicators, and geopolitical events to identify potential market opportunities
and risks.
● Integrate data and workflows into the Quantexa platform, ensuring seamless connectivity and data
synchronization for advanced analytics and entity resolution.
● Configure and manage connectors, APIs, and data loaders within the Quantexa environment
● Strong scripting skills in Python, Scala, and UNIX shell.
● Involved in writing Python and Java APIs for AWS Lambda functions to manage AWS services.
● Experience in design, development and testing of Distributed Client/Server and Database applications
using Java, Spring, Hibernate, JSP, JDBC, REST services on Apache Tomcat Servers.
● Hands on working experience with RESTful API’s, API life cycle management and consuming RESTful
services.
● Good working experience with Agile/Scrum methodologies, communicating in scrum calls on project analysis and development.
● Worked with Google Cloud Platform (GCP) services such as Compute Engine, Cloud Functions, Cloud DNS, Cloud Storage, and Cloud Deployment Manager, along with the SaaS, PaaS, and IaaS concepts of cloud computing and their implementation on GCP.
● Quick to learn and apply new skills; a good communicator with a team-spirited mindset.
● Excellent interpersonal skills; a driven, hardworking, results-oriented team player.
● Good understanding of Business workflow, Business logic and Business methods for further
implementation of user requirement in a distributed application environment.
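
As a brief, illustrative sketch of the Hive table design noted above (partitioning, bucketing, managed vs. external tables), the snippet below shows the general pattern via spark.sql; the database names, columns, and HDFS path are hypothetical placeholders rather than details from any project listed here.

```python
# Minimal, hypothetical sketch of Hive managed vs. external tables with partitioning
# and bucketing, issued through Spark SQL. All names and paths are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

spark.sql("CREATE DATABASE IF NOT EXISTS curated")
spark.sql("CREATE DATABASE IF NOT EXISTS raw")

# Managed table: Hive owns data and metadata; partitioned by load_date and
# bucketed by customer_id to speed up joins and sampling.
spark.sql("""
    CREATE TABLE IF NOT EXISTS curated.orders (
        order_id    BIGINT,
        customer_id BIGINT,
        amount      DECIMAL(12, 2)
    )
    PARTITIONED BY (load_date DATE)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS
    STORED AS PARQUET
""")

# External table: Hive tracks only metadata; dropping the table leaves the files in place.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS raw.orders_landing (
        order_id    BIGINT,
        customer_id BIGINT,
        amount      DECIMAL(12, 2),
        load_date   DATE
    )
    STORED AS PARQUET
    LOCATION 'hdfs:///data/landing/orders'
""")
```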
Technical Skills:

Programming Languages: Python, Scala, SQL, Java, C/C++, Shell Scripting
Web Technologies: HTML, CSS, XML, AJAX, JSP, Servlets, JavaScript
Big Data Stack: Hadoop, Spark, MapReduce, Hive, Pig, YARN, Sqoop, Flume, Oozie, Kafka, Impala, Storm
Cloud Platforms: Amazon Web Services (AWS), Google Cloud Platform (GCP), Azure
Relational Databases: Oracle, MySQL, SQL Server, DB2, PostgreSQL, Teradata, Snowflake
NoSQL Databases: MongoDB, Cassandra, HBase
Version Control Systems: Bitbucket, GIT, SVN, GitHub
IDEs: PyCharm, IntelliJ IDEA, Jupyter Notebooks, Google Colab, Eclipse
Operating Systems: Unix, Linux, Windows

Professional experience:

Client: Ross Stores, Inc - Dublin, CA


Feb 2024 – Till Date
Role: Lead Data Engineer
Responsibilities:
 Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data from Azure SQL, Blob Storage, and Azure SQL Data Warehouse.
 Designed and implemented end-to-end data pipelines using Azure Data Factory to facilitate efficient data ingestion,
transformation, and loading (ETL) from diverse data sources into Snowflake data warehouse.
 Orchestrated robust data processing workflows utilizing Azure Databricks and Apache Spark for seamless large-scale data transformations and advanced analytics, improving data processing speed by 14%.
 Developed real-time data streaming capabilities into Snowflake by seamlessly integrating Azure Event Hubs and Azure
Functions, enabling prompt and reliable data ingestion.
 Created pipelines to run the notebooks using ADF on a scheduled basis.
 Stored streaming data in Azure Data Lake Storage to decouple ingestion from processing.
 Employed Azure Blob Storage for optimized data file storage and retrieval, implementing advanced techniques like
compression and encryption to bolster data security and streamline storage costs.
 Designed and implemented data processing workflows using Azure Databricks, leveraging Spark for large-scale data
transformations.
 Worked on Spreadsheet dataset to create a data load template for Workday integration.
 Demonstrated expert-level technical capabilities in Azure Batch and interactive solutions, and in operationalizing Azure Data Lake Storage (ADLS), Azure IoT, Copilot end-to-end, MS Fabric, and Azure cloud analytics solutions.
 Experience developing CI/CD (continuous integration and continuous deployment) and automation using Jenkins, Git, Docker, and Kubernetes for ML model deployment.
 Integrated ADF with Azure Logic Apps for sending email notifications.
 Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, Databricks, PySpark, Spark SQL, U-SQL, the Azure CLI, and Azure Data Lake Analytics.
 Understanding of structured data sets, data pipelines, ETL tools, and data reduction, transformation, and aggregation techniques; knowledge of tools such as dbt and DataStage.
 Created and managed different types of tables in Snowflake, such as transient, temporary, and persistent tables.
 Experience as an Azure Data Engineer with Azure Cloud, Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Azure Analysis Services, Redis, Azure Cosmos DB, Copilot, Databricks, MS Fabric, NoSQL databases, Big Data technologies (Hadoop and Apache Spark), Azure IoT, and migrations.
 Develop ELT data pipeline to migrate applications using DBT and Snowflake framework.
 Built scalable and optimized Snowflake schemas, tables, and views to support complex analytics queries and reporting requirements.
 Designed and implemented data pipelines for ingesting, transforming, and loading large datasets into BigID.
 Developed data ingestion pipelines using Azure CLI, Azure Event Hubs, and Azure Functions to enable real-time data
streaming into Snowflake.
 Utilized AWS SQS to decouple data ingestion from processing for scalability and reliability.
 Worked with Azure BLOB and Data Lake storage and loading data into Azure SQL Synapse analytics (DW).
 Developed real-time data streaming capabilities into Snowflake, enabling prompt and reliable data ingestion.
 Integrated Snowflake seamlessly with Power BI and Azure Analysis Services to deliver interactive dashboards and reports,
empowering business users with self-service analytics capabilities.
 Designed and implemented a real-time data streaming solution using Azure EventHub.
 Leveraged Snowflake's Time Travel feature, ensuring optimal data management and regulatory compliance.
 Proficient in Snowflake integration, integrated with different data connectors, Rest API’s and Spark.
 Conducted performance tuning and optimization activities to ensure optimal performance of Azure Logic Apps
 Developed a Spark Streaming application to process real-time data from sources such as Kafka and Azure Event Hubs (a sketch follows this list).
 Demonstrated hands-on experience in Azure Cloud Services, including Azure Synapse Analytics, SQL Azure, Data Factory,
Azure Analysis Services, Application Insights, Azure Monitoring, Key Vault, and Azure Data Lake.
 Created batch and streaming pipelines in Azure Data Factory (ADF) using Linked Services, Datasets, and Pipelines to
efficiently extract, transform, and load data.
 Built a Spark Streaming application for real-time analytics on streaming data, leveraging Spark SQL to query and aggregate
data in real-time and visualize the results in Power BI or Azure Data Studio.
 Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process data using the SQL activity; performed data flow transformations using the Data Flow activity; implemented the Azure self-hosted integration runtime in ADF; and developed streaming pipelines using Python.
 Developed and deployed various data pipelines using C# .NET, MS Fabric, and Azure Data Factory.
 Worked as a Kubernetes administrator, involved in configuration for web apps, Azure App Services, Azure Application Insights, Azure Application Gateway, Azure DNS, and Azure Traffic Manager.
 Transformed and copied data from JSON files stored in Data Lake Storage into Azure Synapse Analytics tables using Azure
Databricks, ensuring accurate and efficient data migration.
 Leveraged Azure Databricks, Azure Storage Account, and other relevant technologies for source stream extraction, cleansing,
consumption, and publishing across multiple user bases.
 Knowledge of PDI's features for data cleansing, validation, and enrichment.
 Familiarity with scheduling and monitoring ETL jobs in Pentaho Data Integration.
 Built streaming ETL pipelines using Spark Streaming to extract data from various sources, transform it in real time, and load it into a data warehouse such as Azure Synapse Analytics.
 Used tools such as the Azure CLI, Azure Databricks, and HDInsight to scale out the Spark Streaming cluster as needed.
 Developed Spark API to import data into HDFS from Teradata and created Hive tables.
 Familiarity with data security and privacy practices, especially relevant to BigID focus on data governance.
 Created Partitioned and Bucketed Hive tables in Parquet File Formats with Snappy compression.
 Involved in running all the Hive scripts through Hive on Spark and some through Spark SQL.
 Used JSON and XML SerDes for serialization and deserialization to load JSON and XML data into Hive.
 Involved in complete Big Data flow of the application starting from data ingestion from upstream to HDFS, processing and
analyzing the data in HDFS.
 Used dbt to debug complex chains of queries by splitting them into multiple models and macros that can be tested separately.
 Developed Spark core and Spark SQL scripts using Scala for faster data processing.
 Performed data profiling and transformation on the raw data using Python.
 Mentored and guided analysts on building purposeful analytics tables in dbt for cleaner schemas.
 Implemented RDD/Dataset/DataFrame transformations in Scala through SparkContext and HiveContext.
 Used Jira for bug tracking and Bitbucket to check in and check out code changes.
 Designed and implemented data archiving and retention strategies using the Azure CLI, Azure Blob Storage, and Snowflake's Time Travel feature.
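
The following is a minimal, hypothetical sketch of the Event Hubs-to-Snowflake streaming pattern referenced in the bullets above, assuming the Spark Kafka source and the Snowflake Spark connector are available on the cluster; the namespace, topic, table, credentials, and checkpoint path are placeholders.

```python
# Hypothetical sketch: stream JSON events from an Azure Event Hubs Kafka endpoint into
# Snowflake with PySpark Structured Streaming. All connection values are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("eventhubs-to-snowflake").getOrCreate()

event_schema = (StructType()
                .add("store_id", StringType())
                .add("amount", DoubleType())
                .add("event_ts", TimestampType()))

raw = (spark.readStream.format("kafka")
       .option("kafka.bootstrap.servers", "<namespace>.servicebus.windows.net:9093")
       .option("subscribe", "sales-events")
       .option("kafka.security.protocol", "SASL_SSL")
       .option("kafka.sasl.mechanism", "PLAIN")
       .option("kafka.sasl.jaas.config",
               'org.apache.kafka.common.security.plain.PlainLoginModule required '
               'username="$ConnectionString" password="<event-hubs-connection-string>";')
       .load())

events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), event_schema).alias("e"))
          .select("e.*"))

sf_options = {  # Snowflake connector options (placeholders)
    "sfURL": "<account>.snowflakecomputing.com",
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "ANALYTICS",
    "sfSchema": "RAW",
    "sfWarehouse": "LOAD_WH",
}

def write_batch(batch_df, batch_id):
    # Append each micro-batch to a Snowflake table via the Spark-Snowflake connector.
    (batch_df.write.format("net.snowflake.spark.snowflake")
     .options(**sf_options)
     .option("dbtable", "SALES_EVENTS")
     .mode("append")
     .save())

(events.writeStream
 .foreachBatch(write_batch)
 .option("checkpointLocation", "abfss://checkpoints@<storage-account>.dfs.core.windows.net/sales")
 .start()
 .awaitTermination())
```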
Environment: Azure Data Factory, Azure Databricks, Snowflake data warehouse, Azure Event Hubs, Azure Functions, Azure Logic Apps, Azure Data Lake Storage, Azure Blob Storage, dbt (data build tool), Power BI, Kubernetes, ARM Templates, Terraform, Azure Purview, Apache Atlas, AWS Glue, AWS SageMaker.

Client: ADP Private Ltd- India


Feb 2021 – Jul 2023
Role: Senior Data Engineer
Responsibilities:
 Developed data pipeline using Spark, Hive and HBase to ingest customer behavioural data and
financial histories into Hadoop cluster for analysis.
 Working Experience on Azure Databricks cloud to organize the data into notebooks and make it
easy to visualize data using dashboards.
 Performed ETL on data from different source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and Azure Data Lake Analytics; ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure Synapse) and processed the data in Azure Databricks.
 Worked on managing the Spark Databricks by proper troubleshooting, estimation, and monitoring of
the clusters.
 Performed data aggregation and validation on Azure HDInsight using Spark scripts written in Python.
 Implemented Azure Stream Analytics for processing the real-time Geo-Spatial data for location-
based targeted sales campaigns.
 Performed monitoring and management of the Hadoop cluster by using Azure HDInsight.
 Created partitioned tables in Databricks using Spark and designed a data model using the Snowflake data warehouse on Azure.
 Used Hive, Impala, and Sqoop utilities and Oozie workflows for data extraction and data
loading.
 Extensively worked on Shell scripts for running SAS programs in batch mode on UNIX.
 Wrote Python scripts to parse XML documents and load the data into the database (a sketch follows this list).
 Integrated Nifi with Snowflake to optimize the client session running.
 Optimized query performance by converting T-SQL queries to Snowflake SQL, establishing joins, and creating clustered indexes.
 Created stored procedures to import data into the Elasticsearch engine.
 Worked with Data Governance, Data Quality, and Metadata Management team to understand
the project.
 Used Spark SQL to process a huge amount of structured data to aid in better analysis for our
business teams.
 Created HBase tables to store various data formats of data coming from different sources.
 Responsible for importing log files from various sources into HDFS using Flume.
 Development of routines to capture and report data quality issues and exceptional scenarios.
 Installed and configured Hadoop and was responsible for maintaining cluster and managing and
reviewing Hadoop log files.
 Involved in troubleshooting at database levels, error handling, and performance tuning of queries
and procedures.
 Worked on SAS Visual Analytics & SAS Web Report Studio for data presentation and reporting.
 Extensively used SAS/Macros to parameterize the reports so that the user could choose the
summary and sub-setting variables to be used from the web application.
 Developed data warehouse models in Snowflake for various datasets using WhereScape.
 Responsible for translating business and data requirements into logical data models in support of
Enterprise data models, ODS, OLAP, OLTP and Operational data structures.
 Provided thought leadership for the architecture and design of Big Data analytics solutions for customers, actively driving Proof of Concept (POC) and Proof of Technology (POT) evaluations and implementing Azure data solutions.
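
As an illustration of the XML-parsing scripts mentioned above, here is a minimal sketch assuming pyodbc as the database driver; the file path, staging table, and column names are hypothetical.

```python
# Hypothetical sketch: parse an XML export and bulk-insert the rows into a SQL table
# with pyodbc. File, table, and column names are illustrative placeholders.
import xml.etree.ElementTree as ET
import pyodbc

def parse_transactions(xml_path):
    """Yield (customer_id, amount, txn_date) tuples from <transaction> elements."""
    tree = ET.parse(xml_path)
    for txn in tree.getroot().iter("transaction"):
        yield (
            txn.findtext("customerId"),
            float(txn.findtext("amount", default="0")),
            txn.findtext("date"),
        )

def load_rows(rows, conn_str):
    with pyodbc.connect(conn_str) as conn:
        cursor = conn.cursor()
        cursor.fast_executemany = True  # batch the parameterized inserts
        cursor.executemany(
            "INSERT INTO staging.transactions (customer_id, amount, txn_date) VALUES (?, ?, ?)",
            list(rows),
        )
        conn.commit()

if __name__ == "__main__":
    conn_str = ("DRIVER={ODBC Driver 18 for SQL Server};"
                "SERVER=<server>;DATABASE=<db>;UID=<user>;PWD=<password>")
    load_rows(parse_transactions("transactions.xml"), conn_str)
```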
Environment: Big Data, Spark, Azure, T-SQL, SQL, Metadata, Unix

Client: Universal Electronics - India


Oct 2019 – Feb 2021
Role: Senior Data Engineer
Responsibilities:
 Extensive experience in working with AWS cloud Platform (EC2, S3, EMR, Redshift, Lambda
and Glue).
Satya Sandeep Chintu
609-297-8787
[email protected]

 Working knowledge of Spark RDD, Data frame API, Data set API, Data Source API, Spark SQL
and Spark Streaming.
 Developed Spark Applications by using Python and Implemented Apache Spark data
processing Project to handle data from various RDBMS and Streaming sources.
 Worked with Spark for improving performance and optimization of the existing algorithms in
Hadoop.
 Using Spark Context, Spark-SQL, Spark MLlib, Data Frame, Pair RDD and Spark YARN.
 Used Spark Streaming APIs to perform transformations and actions on the fly for building a common learner data model that gets data from Kafka in real time and persists it to Cassandra.
 Developed a Kafka consumer API in Python for consuming data from Kafka topics (a sketch follows this list).
 Consumed Extensible Markup Language (XML) messages using Kafka and processed the
XML file using Spark Streaming to capture User Interface (UI) updates.
 Developed Preprocessing job using Spark Data frames to flatten JSON documents to flat file.
 Loaded DStream data into Spark RDDs and performed in-memory computation to generate the output response.
 Experienced in writing live Real-time Processing and core jobs using Spark Streaming with Kafka
as a Data pipeline system.
 Migrated an existing on-premises application to AWS. Used AWS services like EC2 and S3 for data
sets processing and storage.
 Experienced in Maintaining the Hadoop cluster on AWS EMR.
 Loaded data into S3 buckets using AWS Glue and PySpark. Involved in filtering data stored in S3
buckets using Elasticsearch and loaded data into Hive external tables.
 Configured Snowpipe to pull data from S3 buckets into Snowflake tables.
 Stored incoming data in the Snowflake staging area.
 Created numerous ODI interfaces and loaded data into Snowflake DB.
 Worked on Amazon Redshift for shifting all Data warehouses into one Data warehouse.
 Good understanding of Cassandra architecture, replication strategy, gossip, snitches etc.
 Designed column families in Cassandra, ingested data from RDBMS, performed data transformations, and then exported the transformed data to Cassandra as per the business requirement.
 Used the Spark Cassandra Connector to load data to and from Cassandra.
 Worked from scratch on Kafka configuration, including managers and brokers.
 Experienced in creating data models for client transactional logs and analyzed data from Cassandra tables for quick searching, sorting, and grouping using the Cassandra Query Language.
 Tested the cluster performance using Cassandra-stress tool to measure and improve the
Read/Writes.
 Used HiveQL to analyse the partitioned and bucketed data and executed Hive queries on Parquet tables.
 Used Apache Kafka to aggregate web log data from multiple servers and make them available
in downstream systems for Data analysis and engineering type of roles.
 Worked in Implementing Kafka Security and boosting its performance.
 Experience in using Avro, Parquet, RC file and JSON file formats, developed UDF in Hive.
 Developed Custom UDF in Python and used UDFs for sorting and preparing the data.
 Worked on custom loaders and storage classes in Pig to handle data formats such as JSON, XML, and CSV, and generated bags for processing in Pig.
 Developed Oozie coordinators to schedule Hive scripts to create Data pipelines.
 Wrote several MapReduce jobs using PySpark and NumPy, and used Jenkins for continuous integration.
 Worked on the cluster and tested HDFS, Hive, Pig, and MapReduce cluster access for new users.
 Continuous monitoring and managing the Hadoop cluster through Cloudera Manager.
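
Below is a minimal sketch of a Python Kafka consumer of the kind referenced above, assuming the kafka-python package; the broker address, topic, and consumer group are placeholders.

```python
# Hypothetical sketch of a Kafka consumer using kafka-python. Broker, topic, and
# group names are placeholders; downstream handling is only indicated by a comment.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "user-events",                      # topic (placeholder)
    bootstrap_servers=["broker1:9092"],
    group_id="analytics-loader",
    auto_offset_reset="earliest",
    enable_auto_commit=False,           # commit manually after processing
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

for message in consumer:
    record = message.value
    # Downstream handling would go here (e.g., write to Cassandra or HDFS).
    print(message.topic, message.partition, message.offset, record)
    consumer.commit()                   # commit offsets once the record is handled
```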
Environment: Spark, Spark Streaming, Spark SQL, AWS EMR, MapR, HDFS, Hive, Pig, Apache Kafka, Sqoop, Python, PySpark, Shell scripting, Linux, MySQL, Oracle Enterprise DB, SOLR, Jenkins, Eclipse, Oracle, Git, Oozie, Tableau, SOAP, Cassandra, and Agile methodologies.

Client: Revalsys Technologies - India


Apr 2018 – Oct 2019
Satya Sandeep Chintu
609-297-8787
[email protected]

Role: Data Engineer


Responsibilities:
 Participated in requirement grooming meetings, which involved understanding functional requirements from a business perspective and providing estimates to convert those requirements into software solutions (designing, developing, and delivering code to IT/UAT/PROD, and validating and managing data pipelines from multiple applications in a fast-paced Agile development methodology using sprints with the JIRA management tool).
 Responsible for checking data in DynamoDB tables and verifying that EC2 instances are up and running for the DEV, QA, CERT, and PROD environments in AWS (a sketch follows this list).
 Analyzed existing data flows and created high-level/low-level technical design documents for business stakeholders to confirm that the technical design aligns with business requirements.
 Created and deployed Spark jobs in different environments and loaded data to NoSQL databases (Cassandra), Hive, and HDFS; secured the data by implementing encryption-based authentication/authorization.
 Implemented AWS solutions using EC2, S3, RDS, EBS, Elastic Load Balancer, and Auto Scaling groups; optimized volumes and EC2 instances; and created monitors, alarms, and notifications for EC2 hosts using CloudWatch.
 Developed code using Apache Spark and Scala, IntelliJ, NoSQL databases (Cassandra), Jenkins, Docker pipelines, GitHub, Kubernetes, the HDFS file system, Hive, Kafka for real-time streaming data, and Kibana for monitoring logs; responsible for deployments to DEV, QA, PRE-PROD (CERT), and PROD using AWS.
 Scheduled Informatica Jobs through Autosys scheduling tool.
 Created quick filters and customized calculations with SOQL for SFDC queries; used Data Loader for ad hoc data loads into Salesforce.
 Extensively worked on Informatica PowerCenter mappings, mapping parameters, workflows, variables, and session parameters.
 Responsible for facilitating load data pipelines and benchmarking the developed product with the set
performance standards.
 Used Debugger within the Mapping Designer to test the data flow between source and target and to
troubleshoot the invalid mappings.
 Worked on SQL tools like TOAD and SQL Developer to run SQL Queries and validate the
data.
 Study the existing system and conduct reviews to provide a unified review on jobs.
 Involved in Onsite & Offshore coordination to ensure the deliverables.
 Involved in testing the database using complex SQL scripts and handling performance issues effectively.
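
A minimal, hypothetical sketch of the kind of environment check described above, using boto3 to confirm tagged EC2 instances are running and to inspect a DynamoDB table; the tag key, environment names, and table name are placeholders.

```python
# Hypothetical sketch: verify EC2 instances tagged per environment are running and
# report a DynamoDB table's item count with boto3. Names are placeholders.
import boto3

def running_instances(env):
    ec2 = boto3.client("ec2")
    resp = ec2.describe_instances(
        Filters=[{"Name": "tag:Environment", "Values": [env]},
                 {"Name": "instance-state-name", "Values": ["running"]}])
    return [i["InstanceId"]
            for r in resp["Reservations"]
            for i in r["Instances"]]

def table_item_count(table_name):
    # Note: ItemCount is an approximate value that DynamoDB refreshes periodically.
    dynamodb = boto3.client("dynamodb")
    return dynamodb.describe_table(TableName=table_name)["Table"]["ItemCount"]

if __name__ == "__main__":
    for env in ("DEV", "QA", "CERT", "PROD"):
        print(env, "running EC2 instances:", running_instances(env))
    print("orders table item count:", table_item_count("orders"))
```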
Environment: Apache Spark, Scala, Cassandra, HDFS, Hive, GitHub, Jenkins, Kafka, SQL Server 2008, Salesforce Cloud, Visio, TOAD, PuTTY, Autosys Scheduler, UNIX, AWS, WinSCP, Salesforce Data Loader, SFDC Developer Console.

Client: OMICS International- India


Nov 2016 – Mar 2018
Role: Data Engineer
Responsibilities:
 Involved in the implementation of a project that went through several phases: data set analysis, data set preprocessing, user-generated data extraction, and modeling.
 Participated in Data Acquisition with the Data Engineer team to extract historical and real-time data
by using Sqoop, Pig, Flume, Hive, MapReduce, and HDFS.
 Wrote user-defined functions (UDFs) in Hive to manipulate strings, dates, and other data.
 Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
 Process improvement: analyzed error data of recurrent programs using Python and devised a new process that reduced problem-resolution turnaround time by 60%.
 Worked on production data fixes by creating and testing SQL scripts.
 Performed deep dives into complex data sets to analyze trends using linear regression, logistic regression, and decision trees.
 Prepared reports using SQL and Excel to track the performance of websites and apps.
 Visualized data using Tableau to highlight abstract information.
 Applied clustering algorithms (hierarchical, K-means) using scikit-learn and SciPy (a sketch follows this list).

 Performed Data Collection, Data Cleaning, Data Visualization, and Feature Engineering using Python
libraries such as Pandas, Numpy, matplotlib, and Seaborn.
 Optimized SQL queries for transforming raw data into MySQL with Informatica to prepare structured
data for machine learning.
 Used Tableau for data visualization and interactive statistical analysis.
 Worked with Business Analysts to understand the user requirements, layout, and look of the
interactive dashboard.
 Used SSIS to create ETL packages to Validate, Extract, Transform, and Load data into a Data
Warehouse and Data Mart.
 Classified customer lifetime values based on the RFM model using an XGBoost classifier.
 Maintained and developed complex SQL queries, stored procedures, views, functions, and reports
that meet customer requirements using Microsoft SQL Server
 Participated in building machine learning models using Python.
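
For illustration, a minimal sketch of the cleaning/scaling/clustering workflow described above using pandas and scikit-learn; the input file and column names are hypothetical.

```python
# Hypothetical sketch: basic feature scaling and K-means clustering with pandas and
# scikit-learn. The CSV file and feature columns are illustrative placeholders.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

df = pd.read_csv("web_metrics.csv")                               # placeholder input
features = df[["sessions", "bounce_rate", "avg_order_value"]].dropna()

scaled = StandardScaler().fit_transform(features)                 # feature scaling

kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
features["cluster"] = kmeans.fit_predict(scaled)

print(features.groupby("cluster").mean())                         # profile each segment
```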
Environment: Python, PL/SQL scripts, Oracle Apps, Excel, IBM SPSS, Tableau, Big Data,
HDFS, Sqoop, Pig, Flume, Hive, MapReduce, HDFS, SQL, Pandas, NumPy, Matplotlib,
Seaborn, ETL, SSIS, SQL Server, Windows.

Client: People Tech Group Inc- India


Aug 2015 – Nov 2016
Role: Data Engineer
Responsibilities:
 Imported data in various formats such as JSON, SequenceFile, text, CSV, Avro, and Parquet into the HDFS cluster with compression for optimization.
 Worked on ingesting data from RDBMS sources like - Oracle, SQL Server and Teradata into
HDFS using Sqoop.
 Loaded all datasets into Hive and Cassandra from source CSV files using Spark (a sketch follows this list).
 Created an environment to access the loaded data via Spark SQL through JDBC/ODBC (via the Spark Thrift Server).
 Developed real time data ingestion/ analysis using Kafka / Spark-streaming.
 Configured Hive, wrote Hive UDFs and UDAFs, and created static and dynamic partitions with bucketing as required.
 Worked on writing Scala programs using Spark on Yarn for analysing data.
 Managing and scheduling Jobs on a Hadoop cluster using Oozie.
 Created Hive external tables, loaded data into them, and queried the data using HQL.
 Written Hive jobs to parse the logs and structure them in tabular format to facilitate effective
querying on the log data.
 Developed Oozie workflow for scheduling and orchestrating the ETL process and worked on Oozie
workflow engine for job scheduling.
 Managed and reviewed the Hadoop log files using Shell scripts.
 Migrated ETL jobs to Pig scripts to do transformations, even joins and some pre-aggregations before
storing the data onto HDFS.
 Used Hive join queries to join multiple tables of a source system and loaded the results into Elasticsearch.
 Real time streaming, performing transformations on the data using Kafka and Kafka Streams.
 Built NiFi dataflow to consume data from Kafka, make transformations on data, place in HDFS &
exposed port to run Spark streaming job.
 Developed Spark Streaming Jobs in Scala to consume data from Kafka topics, made transformations
on data and inserted to HBase.
 Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
 Experience in managing and reviewing huge Hadoop log files.
 Collected the logs data from web servers and integrated in to HDFS using Flume.
 Expertise in designing and creating various analytical reports and Automated Dashboards to help
users to identify critical KPIs and facilitate strategic planning in the organization.
 Involved in Cluster maintenance, Cluster Monitoring and Troubleshooting.
 Worked with Avro Data Serialization system to work with JSON data formats.
 Used Amazon Web Services (AWS) S3 to store large amounts of data in identical/similar
repositories.
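
The snippet below is a minimal, hypothetical sketch of the CSV-to-Hive loading pattern described above: reading source CSV files with Spark and writing them as a partitioned, Snappy-compressed Parquet table; the paths, database, and partition column are placeholders.

```python
# Hypothetical sketch: load source CSV files with Spark and persist them as a
# partitioned, Snappy-compressed Parquet table. Paths and names are placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("csv-to-hive")
         .enableHiveSupport()
         .getOrCreate())

sales = (spark.read
         .option("header", "true")
         .option("inferSchema", "true")
         .csv("hdfs:///landing/sales/*.csv"))

spark.sql("CREATE DATABASE IF NOT EXISTS retail")

(sales.write
 .mode("overwrite")
 .format("parquet")
 .option("compression", "snappy")
 .partitionBy("sale_date")              # partition column assumed to exist in the CSVs
 .saveAsTable("retail.sales"))

# Query the table back through Spark SQL / HiveQL.
spark.sql("SELECT sale_date, COUNT(*) AS row_count FROM retail.sales GROUP BY sale_date").show()
```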

Environment: Spark, Spark SQL, Spark Streaming, Scala, Kafka, Hadoop, HDFS, Hive, Oozie,
Pig, Nifi, Sqoop, AWS (EC2, S3, EMR), Shell Scripting, HBase, Jenkins, Tableau, Oracle,
MySQL, Teradata and AWS.

Client: iPrism Technologies - India Oct 2011 – Aug 2015


Role: Data Analyst

Responsibilities:
 Effectively led client projects involving heavy use of Python, SQL, Tableau, and data modelling.
 Performed data merging, cleaning, and quality control procedures by programming data object rules into a
database management system.
 Created detailed reports for management.
 Reported daily on returned survey data and thoroughly communicated survey progress statistics, data issues, and
their resolution.
 Involved in Data analysis and quality check.
 Extracted data from source files, transformed it, and loaded it to generate CSV data files with Python programming and SQL queries (a sketch follows this list).
 Stored and retrieved data from data-warehouses.
 Created the source to target mapping spreadsheet detailing the source, target data structure and transformation
rule around it.
 Wrote Python scripts to parse files and load the data into the database; used Python to extract weekly information from the files; developed Python scripts to clean the raw data.
 Worked extensively with Tableau Business Intelligence tool to develop various dashboards.
 Worked on datasets of various file types including HTML, Excel, PDF, Word and its conversions.
 Analysed data from company databases to drive optimization and improvement of product development, marketing techniques, and business strategies.
 Performed Database and ETL development per new requirements as well as actively involved in improving overall
system performance by optimizing slow running/resource intensive queries.
 Developed data mapping documentation to establish relationships between source and target tables including
transformation processes using SQL.
 Participated in data modelling discussion and provided inputs on both logical and physical data modelling.
 Reviewed the Performance Test results to ensure all the test results meet requirement needs.
 Created master Data workbook which represents the ETL requirements such as mapping rules, physical Data
element structure and their description.
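
As an illustrative sketch of the file-to-CSV extraction described above, here is a small pandas example; the input file and column names are hypothetical.

```python
# Hypothetical sketch: clean a raw survey export with pandas and emit a CSV for
# downstream SQL loads. File and column names are illustrative placeholders.
import pandas as pd

raw = pd.read_csv("survey_returns_raw.csv")                # placeholder source file

cleaned = (raw.rename(columns=str.lower)
              .dropna(subset=["respondent_id"])            # drop rows missing the key
              .drop_duplicates(subset=["respondent_id"])
              .assign(returned_on=lambda d: pd.to_datetime(d["returned_on"]).dt.date))

cleaned.to_csv("survey_returns_clean.csv", index=False)
print(f"wrote {len(cleaned)} cleaned survey rows")
```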
Environment: Oracle 10g, UNIX Shell Scripts, MS Excel, MS Power Point, Python, SQL.
