
Abhilash G

Big Data Engineer


E: [email protected]
Ph: 913-802-2191
LinkedIn: https://www.linkedin.com/in/abhilash-g-a7027b18b/

Summary:
 Over 9 years of IT experience in the analysis, design, development, and implementation of applications running on various platforms.
 Good hands-on experience developing big data projects using Hadoop, Hive, Spark, and MapReduce open-source tools/technologies.
 Hands-on experience with Spark 3.0.2.
 Hands-on experience in Python/PySpark programming on Cloudera, Hortonworks, and Azure Databricks Hadoop clusters, AWS EMR clusters, AWS Lambda functions, and CloudFormation templates (CFTs).
 Wrote Airflow DAGs in Python to schedule and orchestrate jobs.
 Created clusters in Azure Databricks to run Python notebooks.
 Implemented AWS solutions using EC2, S3, RDS, EBS, Elastic Load Balancing, Auto Scaling groups, and the AWS CLI.
 Managed scalable Hadoop clusters, including cluster design, provisioning, custom configuration, monitoring, and maintenance, across different distributions: Cloudera CDH, Hortonworks HDP, and Databricks.
 Good working experience using Python to develop a custom framework for generating rules (similar to a rules engine). Developed Hadoop Streaming jobs in Python to integrate applications with Python API support.
 Good experience with AWS Elastic Block Store (EBS), its different volume types, and choosing the appropriate volume type based on requirements.
 Configured Databricks clusters to auto-terminate after a set period of inactivity.
 Created Python scripts to start and stop clusters based on cluster usage (see the sketch after this list).
 Implemented a variety of AWS compute and networking services to meet application needs.
 Excellent working knowledge of object-oriented programming (OOP) principles, design, and development, with a good understanding of concepts such as data abstraction, concurrency, synchronization, multithreading and thread communication, networking, and security.
 Developed ETL processes with slowly changing dimensions (SCDs), caches, and complex joins using optimized SQL queries.
 Knowledge of relational databases (DB2, MS SQL Server, Teradata, Oracle 8i/9i/10g/11i).
 Experience working in Agile environments.
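Below is a minimal sketch of the kind of cluster start/stop script mentioned above, written against the Databricks Clusters REST API 2.0; the workspace URL, token environment variables, and cluster ID are illustrative placeholders rather than values from any client environment.

# Sketch: start or terminate a Databricks cluster through the Clusters REST API 2.0.
# DATABRICKS_HOST and DATABRICKS_TOKEN are assumed environment variable names.
import os
import requests

HOST = os.environ["DATABRICKS_HOST"]      # e.g. https://<workspace>.azuredatabricks.net
TOKEN = os.environ["DATABRICKS_TOKEN"]    # personal access token
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

def start_cluster(cluster_id):
    # POST /api/2.0/clusters/start brings a terminated cluster back up.
    resp = requests.post(f"{HOST}/api/2.0/clusters/start",
                         headers=HEADERS, json={"cluster_id": cluster_id})
    resp.raise_for_status()

def stop_cluster(cluster_id):
    # POST /api/2.0/clusters/delete terminates the cluster (it is not permanently deleted).
    resp = requests.post(f"{HOST}/api/2.0/clusters/delete",
                         headers=HEADERS, json={"cluster_id": cluster_id})
    resp.raise_for_status()

if __name__ == "__main__":
    stop_cluster("0123-456789-abcd123")   # hypothetical cluster ID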

Technical Skills:

Programming Technologies: C, C++, Java 8, Scala

Frameworks: Java, Spring, Jersey, JavaScript, CSS, LESS, HTML5, jQuery, Apache CXF, AngularJS, Jasmine

Big Data Technologies: Apache Spark 1.6/2.2, HDFS, Amazon S3, YARN, Apache Oozie, Apache Hive, Cloudera Impala, Apache Cassandra

Markups: HTML, CSS, XML, XSL

Storage Technologies: SQL, PL/SQL, Stored Procedures, Triggers, CQL, HiveQL, Parquet

Operating Systems: Microsoft Windows 2000/XP/Vista/7, Unix, Linux, OS X

Professional Experience
Client: Amgen, Thousand Oaks, CA January 2021 – Present
Lead Big Data Engineer
Responsibilities:

 Developed and executed PySpark-based data ingestion framework jobs covering diverse source systems.
 Orchestrated data pipeline executions through Databricks Scheduler.
 Engaged in business requirement collection, analysis, and the conceptualization of data products.
 Crafted, validated, and managed data pipelines, integrating sources such as Smartsheet, Excel, and databases to produce end data products.
 Leveraged Spark with Python, employing DataFrames, Datasets, and Spark SQL API for expedited data
processing.
 Authored scripts for secure password management of service accounts via AWS Secrets Manager.
 Streamlined real-time data ingestion from various systems utilizing AWS Data Migration Service and
Kinesis.
 Implemented file-based and batch job ingestion scripts.
 Managed cloud operations on Azure platforms, including Data Lake, Databricks, and Blob storage.
 Executed Spark SQL queries for application data validation.
 Created and maintained Databricks Notebooks with PySpark for data cleansing post-ingestion.
 Ran VACUUM jobs to remove data files no longer referenced by Delta tables.
 Automated cluster termination scripts triggered by inactivity.
 Devised data ingestion scripts that pull files from Box locations, orchestrated with Airflow DAGs.
 Ingested, transformed, and stored batch files and database tables in Parquet and Delta formats (see the sketch after this list).
 Integrated CI/CD pipelines for deploying Python code from Git repositories to Databricks.
 Retrieved data from third-party websites through API calls.
 Led multiple project deliveries as a Team Lead.
 Automated Databricks notebooks using Spark SQL and Python for pipeline executions.
 Configured Spark clusters and optimized high concurrency clusters in Azure Databricks for efficient
data preparation.
 Utilized SQL, Python/PySpark, and relational databases for data querying and management across
various database systems.
 Integrated Python code with Plotly Dash apps for hosting in the Tableau environment.
 Composed Python scripts for Plotly Dash applications.
 Validated application data with Spark SQL queries.
 Wrote Spark code in PySpark to improve data processing speeds.
 Operated Azure cloud systems, handling data ingestion, transformation, and storage in Azure Data
Lake.
 Processed data from AWS S3 into Databricks Notebooks.
 Developed utility code for AWS S3 data ingestion using boto3 functions.
 Employed AWS EC2 and S3 services for handling smaller datasets.
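A minimal sketch of the batch ingestion pattern referenced above (an S3 landing file cleansed with PySpark and stored in Delta format); the bucket names, paths, and columns are hypothetical placeholders, not client systems.

# Sketch: ingest a batch CSV drop from S3, apply light cleansing, and store it as Delta.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("batch_ingest_sketch").getOrCreate()

raw = (spark.read
       .option("header", "true")
       .option("inferSchema", "true")
       .csv("s3a://example-landing-bucket/sales/2021-06-01/"))   # placeholder path

cleansed = (raw.dropDuplicates()
               .withColumn("ingest_ts", F.current_timestamp())
               .withColumn("ingest_date", F.current_date()))

# Append to a Delta location partitioned by load date (Delta Lake is available on Databricks).
(cleansed.write
 .format("delta")
 .mode("append")
 .partitionBy("ingest_date")
 .save("s3a://example-curated-bucket/delta/sales"))               # placeholder path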

Client: Evergy, Kansas City, MO October 2017 – December 2020


Hadoop Developer

Responsibilities:

 Migrated complex MapReduce programs and Hive scripts into Spark RDD transformations and actions.
 Used Spark API over Cloudera Hadoop Yarn to perform analytics on data in Hive.
 Exported batch files into AWS S3 using MapReduce jobs.
 Developed Spark applications using the Spark RDD, Spark SQL, and DataFrame APIs.
 Analyzed HBase data in Hive by creating external partitioned and bucketed tables.
 Used Apache Hue web interface to monitor the Hadoop cluster and run the jobs.
 Used Oozie scheduler to submit workflows.
 Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Spark, Hive, and Sqoop) as well as system-specific jobs (such as Python programs and shell scripts).
 Performance-tuned and debugged existing ETL processes.
 Wrote Python scripts to process semi-structured data in formats such as JSON.
 Worked with UNIX shell scripting for enhancing the job performance.
 Spun up different AWS instances, including EC2-Classic and EC2-VPC, using CloudFormation templates.
 Created stored procedures and packages in Oracle as part of the pre- and post-ETL process.
 Exported and imported data into HDFS and Hive using Sqoop.
 Designed and developed ETL code using Informatica mappings to load data from heterogeneous source systems (flat files, XML, MS Access files, Oracle) into an Oracle staging area, then into the data warehouse, and finally into data mart tables for reporting.
 Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them (see the sketch after this list).
 Worked on data processing, transformations, and actions in Spark using Python (PySpark).
 Loaded data into Hive partitioned tables.
 Created reports for the BI team by using Sqoop to export data into HDFS and Hive.
 Managed and reviewed Hadoop log files.
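A minimal sketch of the S3-to-Spark-RDD flow described above, ending with a load into a Hive partitioned table; the bucket, record layout, and table names are assumed for illustration only.

# Sketch: read raw text files from S3 into an RDD, apply transformations and an action,
# then load the parsed data into a partitioned table registered in the Hive metastore.
from pyspark.sql import SparkSession, Row

spark = (SparkSession.builder
         .appName("s3_rdd_sketch")
         .enableHiveSupport()
         .getOrCreate())
sc = spark.sparkContext

lines = sc.textFile("s3a://example-bucket/meter-readings/2019/*.txt")   # placeholder path

# Transformations: split pipe-delimited records and keep only well-formed rows.
parsed = (lines.map(lambda line: line.split("|"))
               .filter(lambda fields: len(fields) == 3)
               .map(lambda f: Row(meter_id=f[0], reading=float(f[1]), read_date=f[2])))

print("valid records:", parsed.count())   # action that triggers the computation

# Convert the RDD to a DataFrame and append it to a table partitioned by read_date.
df = spark.createDataFrame(parsed)
df.write.mode("append").partitionBy("read_date").saveAsTable("staging.meter_readings")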

Environment: Apache Spark 1.6.0/2.2.0, Apache Hive, Cloudera Impala, Amazon S3, AWS, Aurora, REST, MySQL 5.6, JUnit, Mockito, Linux, Cloudera 5.x, HBase, Apache Kafka 0.9.x/0.10.x, Swagger, Parquet, Git, IntelliJ IDEA, Apache Oozie, Agile/Scrum, Beeline

Client: CNA, Chicago, IL October 2015 – September 2017
Hadoop Developer

Responsibilities:

 Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
 Developed Spark code in Python to analyze data received from different sources.
 Wrote Spark SQL scripts to optimize query performance.
 Contributed towards building Apache Spark applications using Python.
 Wrote UDFs and MapReduce jobs depending on the specific requirement.
 Created Hive schemas using performance techniques such as partitioning and bucketing (see the sketch after this list).
 Enhanced and optimized production Spark code to aggregate, group, and run data mining tasks using the Spark framework.
 Worked on migrating MapReduce programs into Spark transformations using Spark and PySpark.
 Used PL/SQL to write scripts that performed batch updates to the database and generated reports.
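A minimal sketch of the partitioning and bucketing approach mentioned above, using the DataFrameWriter bucketing API available in Spark 2.x rather than raw HiveQL; the table, column, and path names are illustrative only.

# Sketch: persist claims data as a partitioned, bucketed table to speed up pruning and joins.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("partition_bucket_sketch")
         .enableHiveSupport()
         .getOrCreate())

claims = spark.read.parquet("/data/staging/claims")   # placeholder input path

# Partitioning by claim_year enables directory-level pruning; bucketing by policy_id
# co-locates rows that share a key, which helps joins and aggregations on that column.
(claims.write
 .mode("overwrite")
 .partitionBy("claim_year")
 .bucketBy(16, "policy_id")
 .sortBy("policy_id")
 .saveAsTable("analytics.claims_bucketed"))
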
Environment: Hadoop, HDFS, Hive, Oozie, Sqoop, HBase, Flume, Spark, Scala, SQL Server, Eclipse, PyCharm, Maven, JIRA, GitHub, JUnit, Mockito, Linux, Cloudera 5.x, Tomcat, Jenkins, XSLT, XML, JMS, Swagger, Parquet, Git, IntelliJ IDEA

Client: Legacy Health, Portland, OR January 2015 – September 2015


SQL Developer

Responsibilities:

 Involved in Business requirement gathering, Technical Design Documents, Business use cases and Data
mapping.
 Scheduled the SQL jobs and SSIS Packages using Tidal (Enterprise Scheduler).
 Designed a complex SSIS package to transfer data from three different source systems into a single destination, SQL Server 2008.
 Created packages using transformations such as Pivot, Conditional Split, Fuzzy Lookup, and Aggregate, along with Execute SQL and Data Flow tasks, to extract data from different databases and flat files.
 Used SSIS to create ETL packages (.dtsx files) to validate, extract, transform, and load data into data warehouse databases.
 Wrote SQL scripts, executed via Script tasks, to insert, update, and delete data in the SQL database, and created configuration packages using C# scripting.
 Deployed SSIS packages from test to production servers using package configurations.
 Automated SSIS jobs as SQL Server Agent jobs for daily, weekly, and monthly loads.
 Generated a report which identifies the performance efficiency of every component within the modules of
payroll.
 Deployed code into the PROD environment through JIRA (an Atlassian product) based on assigned tasks (business requests).
 Designed and developed applications based on .NET and MS SQL Server.

Environment: MS SQL Server 2005/2008, MS SQL Server Integration Services (SSIS), Visual Studio 2008/2010,
SQL Server Reporting Services (SSRS).

