- I hold a university degree in a technical field.
- I specialize in data analysis with a focus on supporting informed decision-making.
- By extracting insights from complex data sets, I help organizations make data-driven decisions that drive business growth.
- Programming Languages: Python, SQL (PostgreSQL, MySQL, ClickHouse), NoSQL (MongoDB).
- Data Analysis & Visualization:
- Libraries: Pandas, NumPy, SciPy, Statsmodels, Pingouin, Plotly, Matplotlib, Seaborn.
- Tools & Frameworks: Dash, Power BI, Tableau, Redash, DataLens, Superset.
- Big Data & Distributed Computing: Apache Spark, Apache Airflow.
- Machine Learning & AI: Scikit-learn, MLlib.
- Time Series Forecasting: Facebook Prophet, Uber Orbit.
- Natural Language Processing: NLTK, SpaCy, TextBlob.
- Web Scraping: BeautifulSoup, Selenium, Scrapy.
- DevOps: Linux, Git, Docker.
- IDEs: VS Code, Google Colab, Jupyter Notebook, Zeppelin, PyCharm.
- Deep data analysis:
- Preprocessing, cleaning, and identifying patterns using visualization to support decision-making.
- Writing complex SQL queries:
- Working with nested queries, window functions, CASE expressions, and WITH (CTE) clauses for data extraction and analysis (a query sketch follows this list).
- Understanding product strategy:
- Knowledge of product development and improvement principles, including analyzing user needs and formulating recommendations for product growth.
- Product metrics analysis:
- LTV, retention rate (RR), conversion rate (CR), ARPU, ARPPU, MAU, DAU, and other key performance indicators.
- Conducting A/B testing:
- Analyzing results with statistical methods to evaluate the effectiveness of changes (a significance-test example appears after this list).
- Cohort analysis and RFM segmentation:
- Identifying user behavior patterns to optimize marketing strategies (an RFM scoring sketch follows this list).
- End-to-End Data Pipelines:
- Building automated ETL processes from databases to dashboards with Airflow orchestration (a DAG skeleton appears after this list).
- Data visualization and dashboard development:
- Creating interactive reports in Tableau, Redash, Power BI, and other tools for presenting analytics.
- Web scraping:
- Extracting data from websites with libraries such as BeautifulSoup, Scrapy, and Selenium for information gathering and analysis (a scraping example follows this list).
- Working with big data:
- Experience with tools and technologies for processing large volumes of data (e.g., Hadoop, Spark).
- Machine Learning Applications:
- Capable of building and applying machine learning models for data analysis tasks, including forecasting, classification, and clustering, to uncover deeper insights and enhance decision-making processes.
- Business and Metric Forecasting:
- Building and interpreting time series forecasts for key business metrics with libraries such as Uber Orbit and Facebook Prophet to support strategic planning and goal-setting (a forecasting sketch appears after this list).
- Working with APIs:
- Integrating and extracting data from various sources via APIs.
- Process Automation:
- Automating data workflows and routine tasks using Linux scripting, Apache Airflow, and other DevOps tools.
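
A query sketch in the style described above, run from Python with pandas and SQLAlchemy: a WITH (CTE) clause, window functions, and a CASE expression. The `orders` table, its columns, and the connection string are hypothetical placeholders.

```python
# Sketch: CTE + window functions + CASE, executed from Python.
# Table, columns, and credentials are hypothetical placeholders.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@localhost:5432/shop")

query = """
WITH ranked AS (
    SELECT
        user_id,
        order_id,
        amount,
        ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at) AS order_n,
        SUM(amount)  OVER (PARTITION BY user_id)                     AS user_total
    FROM orders
)
SELECT
    user_id,
    CASE WHEN order_n = 1 THEN 'first' ELSE 'repeat' END AS order_type,
    amount,
    user_total
FROM ranked;
"""

df = pd.read_sql(query, engine)  # results land in a DataFrame for further analysis
```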
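
A minimal significance check for the A/B testing item, using `proportions_ztest` from statsmodels; the counts are invented illustration data, and a real analysis would also account for power, effect size, and multiple comparisons.

```python
# Sketch: two-sample z-test on conversion rates; the numbers are made up.
from statsmodels.stats.proportion import proportions_ztest

conversions = [420, 465]     # converted users in control / treatment
visitors = [10_000, 10_000]  # total users per variant

stat, p_value = proportions_ztest(count=conversions, nobs=visitors)
print(f"z = {stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("The difference is statistically significant at alpha = 0.05")
```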
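
A compact RFM scoring sketch in pandas for the segmentation item, assuming a transactions file with hypothetical `user_id`, `order_date`, and `amount` columns.

```python
# Sketch: quartile-based RFM scores; input file and columns are assumptions.
import pandas as pd

tx = pd.read_csv("transactions.csv", parse_dates=["order_date"])
snapshot = tx["order_date"].max() + pd.Timedelta(days=1)

rfm = tx.groupby("user_id").agg(
    recency=("order_date", lambda s: (snapshot - s.max()).days),
    frequency=("order_date", "count"),
    monetary=("amount", "sum"),
)

# Lower recency is better; higher frequency and monetary are better.
rfm["R"] = pd.qcut(rfm["recency"], 4, labels=[4, 3, 2, 1]).astype(int)
rfm["F"] = pd.qcut(rfm["frequency"].rank(method="first"), 4, labels=[1, 2, 3, 4]).astype(int)
rfm["M"] = pd.qcut(rfm["monetary"], 4, labels=[1, 2, 3, 4]).astype(int)
rfm["segment"] = rfm["R"].astype(str) + rfm["F"].astype(str) + rfm["M"].astype(str)
```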
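
A skeletal Airflow DAG for the pipeline item, assuming the Airflow 2.x TaskFlow API; the task bodies are stubs, since the real extract/transform/load logic depends on the source systems.

```python
# Sketch: daily ETL DAG skeleton (Airflow 2.x TaskFlow API); bodies are stubs.
from datetime import datetime
from airflow.decorators import dag, task

@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def sales_etl():
    @task
    def extract():
        ...  # pull raw rows from the source database

    @task
    def transform(rows):
        ...  # clean, join, and aggregate

    @task
    def load(rows):
        ...  # write to the data mart feeding the dashboards

    load(transform(extract()))

sales_etl()
```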
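
A minimal requests + BeautifulSoup example for the scraping item; the URL and CSS selectors are hypothetical.

```python
# Sketch: fetch a page and extract items; URL and selectors are placeholders.
import requests
from bs4 import BeautifulSoup

resp = requests.get("https://example.com/catalog", timeout=10)
resp.raise_for_status()

soup = BeautifulSoup(resp.text, "html.parser")
items = [
    {
        "title": card.select_one("h2").get_text(strip=True),
        "price": card.select_one(".price").get_text(strip=True),
    }
    for card in soup.select(".product-card")
]
```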
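
A minimal forecasting sketch with Prophet (the `prophet` package); the input CSV with Prophet's expected `ds`/`y` columns is a hypothetical placeholder.

```python
# Sketch: 90-day forecast of a daily metric; the CSV is a placeholder.
import pandas as pd
from prophet import Prophet

history = pd.read_csv("daily_revenue.csv")  # columns: ds (date), y (value)

model = Prophet(weekly_seasonality=True, yearly_seasonality=True)
model.fit(history)

future = model.make_future_dataframe(periods=90)
forecast = model.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())
```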
Key Methods: Knowledge Management | Critical Thinking | Research | Content Curation | Information Architecture
A curated knowledge hub demonstrating a systematic approach to data analysis, reflecting expertise in structuring complex information and evaluating technical content.
- Systematized 500+ resources into logical learning paths
- Implemented quality control: selected materials based on accuracy and relevance
- Optimized for usability: structured content for quick navigation
- Enhanced user experience: developed a web version for easy access and navigation
- Synthesized fragmented knowledge into unified framework
- Covered full analytics pipeline from fundamentals to deployment
Stack: Python | ClickHouse | Apache Airflow | Superset | Yandex DataLens | StatsModels | Uber Orbit | Telegram API
Key Methods: A/B Testing | Time Series Forecasting | Anomaly Detection | Cohort Analysis | ETL Pipelines | Dashboard Design
Building the analytics function for a startup: infrastructure, dashboards, A/B testing, forecasting, automated reports, and anomaly detection.
- Built complete data infrastructure from raw events to automated business intelligence
- Designed interactive dashboards for real-time monitoring of user engagement and retention
- Implemented rigorous A/B testing pipeline with statistical validation of feature experiments
- Developed forecasting models for server load prediction and capacity planning
- Created automated reporting system with daily Telegram delivery to stakeholders
- Established real-time anomaly detection for proactive issue resolution (a simplified sketch follows this list)
- Enabled data-driven product decisions through comprehensive analytics ecosystem
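
A simplified sketch of one common approach to the anomaly detection mentioned above, a rolling z-score; the metric name, window, and threshold below are illustrative assumptions, not the project's actual configuration.

```python
# Sketch: rolling z-score anomaly flagging; names and thresholds are assumptions.
import pandas as pd

metric = (
    pd.read_csv("events_per_minute.csv", parse_dates=["ts"])
    .set_index("ts")["count"]
)

window = 60  # minutes of history behind each point
z = (metric - metric.rolling(window).mean()) / metric.rolling(window).std()

anomalies = metric[z.abs() > 3]  # points more than 3 sigma from recent behavior
print(anomalies.tail())
```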
Stack: Python | Pandas | Plotly | Tableau | StatsModels | SciPy | NLTK | TextBlob | Scikit-learn | Pingouin
Key Methods: Time-Series | Anomaly Detection | Custom Metrics | RFM/Cohorts | NLP | Clustering
Comprehensive analysis of Brazilian e-commerce data, uncovering key insights and actionable business recommendations.
- Time-series analysis of sales dynamics, seasonality, and trend decomposition
- Anomaly detection in orders, payments, and delivery times
- Customer profiling (RFM segmentation, clustering, geo-analysis)
- Cohort analysis to track retention and lifetime value (LTV)
- NLP processing of customer reviews (sentiment analysis; a sketch follows this list)
- Hypothesis validation through statistical testing of data-driven assumptions
- Strategic, data-backed recommendations to optimize logistics, enhance customer retention, and drive sales growth
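
A minimal illustration of the sentiment step with TextBlob; the review strings are invented, and since the source reviews are from a Brazilian marketplace, a real pipeline would likely translate them or use a Portuguese-capable model first.

```python
# Sketch: polarity scoring with TextBlob; example reviews are invented.
from textblob import TextBlob

reviews = [
    "Fast delivery, great product!",
    "Package arrived damaged and support never replied.",
]
for text in reviews:
    polarity = TextBlob(text).sentiment.polarity  # -1 (negative) .. +1 (positive)
    label = "positive" if polarity > 0 else "negative" if polarity < 0 else "neutral"
    print(f"{label:>8} ({polarity:+.2f}): {text}")
```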
Stack: Python | PostgreSQL | Airflow | Yandex DataLens | SQLAlchemy | DBLink
Key Methods: ETL Pipelines | Star Schema | Data Warehousing | Dashboard Design | SQL Optimization
End-to-end data pipeline and business intelligence solution for global distributor Wide World Importers.
- Built an automated ETL pipeline transforming OLTP data into an optimized star-schema data mart (one transform step is sketched below)
- Designed and implemented interactive dashboard for sales, logistics, and customer analytics
- Developed daily automated data updates with Airflow DAG orchestration
- Enabled data-driven decision making across sales, procurement, and logistics departments
- Reduced manual reporting time through automated data consolidation and visualization
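
One transform step of the kind described above, sketched with pandas and SQLAlchemy: reshaping OLTP order rows into a fact table keyed to a date dimension. Connection strings, tables, and columns are hypothetical, not the project's actual schema.

```python
# Sketch: OLTP orders -> fact table with a surrogate date key; names are placeholders.
import pandas as pd
from sqlalchemy import create_engine

oltp = create_engine("postgresql://user:password@localhost:5432/wwi_oltp")
dwh = create_engine("postgresql://user:password@localhost:5432/wwi_dwh")

orders = pd.read_sql(
    "SELECT order_id, customer_id, order_date, total FROM orders", oltp
)

# Surrogate YYYYMMDD key joining fact_sales to dim_date in the star schema.
orders["date_key"] = (
    pd.to_datetime(orders["order_date"]).dt.strftime("%Y%m%d").astype(int)
)

fact_sales = orders[["order_id", "customer_id", "date_key", "total"]]
fact_sales.to_sql("fact_sales", dwh, if_exists="append", index=False)
```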
Stack: Python | Pandas | NumPy | SciPy | Plotly | Statsmodels | Scikit-learn | Pingouin | TextBlob | Sphinx
Key Methods: Data Exploration | Statistical Testing | Cohort Analysis | Automated Visualization | Feature Analysis | Machine Learning
Powerful pandas extension that enhances DataFrames with production-ready analytics while maintaining native functionality.
- Seamlessly integrates exploratory analysis, statistical testing and visualization into pandas workflows
- Provides instant insights through automated data profiling and quality checks
- Enables cohort analysis with flexible periodization and metric customization
- Offers built-in statistical methods (bootstrap, effect sizes, group comparisons)
- Generates interactive visualizations with single-command access
- Supports both DataFrame-level and column-specific analysis
- Modular architecture allows extending with domain-specific methods
- Preserves all native pandas functionality for backward compatibility (the accessor pattern it builds on is sketched below)
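
A sketch of the extension mechanism pandas provides for libraries like this one: a registered DataFrame accessor adds namespaced methods while leaving native behavior untouched. The accessor name (`quick`) and method (`profile`) are illustrative, not the package's actual API.

```python
# Sketch: the pandas accessor pattern; `quick` and `profile` are invented names.
import pandas as pd

@pd.api.extensions.register_dataframe_accessor("quick")
class QuickAccessor:
    def __init__(self, df: pd.DataFrame):
        self._df = df

    def profile(self) -> pd.DataFrame:
        """Tiny data-quality summary: dtype, missing share, and cardinality."""
        return pd.DataFrame({
            "dtype": self._df.dtypes.astype(str),
            "missing_pct": self._df.isna().mean() * 100,
            "unique": self._df.nunique(),
        })

df = pd.DataFrame({"a": [1, 2, None], "b": ["x", "y", "y"]})
print(df.quick.profile())  # native pandas methods on df keep working unchanged
```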



