devonfw-forge/python-data-driven-decisions

Data Driven Decisions

Use Python, Pandas, Spark, etc. to demonstrate that correlation can be used as a basis for decision making.

This project consists of finding the correlation between GDP (Gross Domestic Product) and social and economic indicators, such as population growth, fertility rates, investment in specific sectors, or prices.
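As a minimal sketch of the core measurement, assuming the indicators have already been gathered into one pandas DataFrame with one row per country and year, the Pearson correlation of each indicator with GDP can be read off `DataFrame.corr`. The column names and values below are made up for illustration and are not the project's data.

```python
import pandas as pd

# Hypothetical country-year table; every value here is invented for illustration.
data = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "year": [2000, 2010, 2020, 2000, 2010, 2020],
    "gdp": [1.0, 1.5, 2.0, 0.5, 0.8, 1.2],
    "fertility_rate": [3.0, 2.6, 2.1, 5.0, 4.4, 3.9],
    "life_expectancy": [65.0, 69.0, 73.0, 55.0, 58.0, 62.0],
})


def gdp_correlations(df: pd.DataFrame, target: str = "gdp") -> pd.Series:
    """Pearson correlation of every numeric indicator with the target column."""
    indicators = df.drop(columns=["country", "year"])
    # corr() computes pairwise Pearson coefficients, ignoring missing values pairwise.
    return indicators.corr(method="pearson")[target].drop(target)


print(gdp_correlations(data))
```

Pooling all countries and years like this mixes differences between countries with changes over time within a country; grouping by country first is an alternative if only within-country trends matter.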

The project will be developed by 2 teams in parallel. You can find more information in their main branches:


Execute the project

Execute the notebooks in the following order:

  1. Data_load

  2. Data_normalization

  3. Data_outliers, Data_filling, Data_visualization

This creates a series of output DataFrames, saved as .csv files.
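The notebooks themselves are not reproduced here. As a hypothetical sketch of what the outlier and filling stages could look like in pandas, the code below masks values outside Tukey's IQR fences within each country, linearly interpolates the resulting gaps, and writes the result to a .csv file. The function names, the IQR rule, the interpolation choice and the output file name are all assumptions, not necessarily what `Data_outliers` and `Data_filling` do.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd


def mask_outliers_iqr(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Return s with values outside [Q1 - k*IQR, Q3 + k*IQR] replaced by NaN."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s.where(s.between(q1 - k * iqr, q3 + k * iqr))


def clean_indicator(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Mask per-country outliers in `column`, then interpolate interior gaps."""
    df = df.sort_values(["country", "year"]).copy()
    df[column] = df.groupby("country")[column].transform(lambda s: mask_outliers_iqr(s, k))
    # Linear interpolation treats rows as evenly spaced (true for annual data without
    # missing years); limit_area="inside" leaves leading/trailing NaN untouched.
    df[column] = df.groupby("country")[column].transform(
        lambda s: s.interpolate(method="linear", limit_area="inside")
    )
    return df


# Made-up input: country A has an obvious outlier (50.0), country B a missing year.
raw = pd.DataFrame({
    "country": ["A"] * 5 + ["B"] * 5,
    "year": list(range(2000, 2005)) * 2,
    "fertility_rate": [3.0, 2.9, 50.0, 2.7, 2.6, 5.0, 4.8, np.nan, 4.4, 4.2],
})

clean = clean_indicator(raw, "fertility_rate")
out_path = Path(tempfile.gettempdir()) / "fertility_rate_clean.csv"  # hypothetical name
clean.to_csv(out_path, index=False)
print(clean)
```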

Explanation of the process followed

The hypothesis: we assume that there is a correlation between economic growth and indicators such as infant mortality, access to education, etc. We want to demonstrate the validity of this assumption using the available datasets.
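One way to check whether an observed correlation is more than chance is a permutation test: shuffle one series many times and count how often the shuffled correlation is at least as strong as the real one. The sketch below uses only the standard library; the series are invented, and the function names are hypothetical rather than taken from the project.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for H0: x and y are unrelated."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # destroy any real pairing between x and y
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction: count the observed pairing as one of the permutations.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical, made-up series for one country over eight years.
gdp = [1.0, 1.4, 2.1, 2.9, 3.5, 4.2, 5.0, 6.1]
child_mortality = [120, 110, 95, 80, 70, 55, 45, 30]

r = pearson(gdp, child_mortality)
p = permutation_p_value(gdp, child_mortality)
print(f"r = {r:.3f}, p = {p:.4f}")
```

The test assumes the pairs are exchangeable. Yearly values within one country are autocorrelated and two series can both simply trend over time, so a small p-value on raw time series is weaker evidence than it looks.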

To test this hypothesis, we follow these steps:

First step: Choose the indicators

To study the correlation between economic and socio-demographic indicators, we first choose which indicators to use:

  1. GDP from 1850 to 2020, in pounds

  2. Mortality of children under 5 years old

  3. Percentage of the population aged 15+ with tertiary schooling

  4. Fertility rate

  5. Gender inequality

  6. Life expectancy
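Each indicator typically comes as its own table. A minimal sketch for putting them side by side, assuming every table has already been reduced to `country`, `year` and `value` columns (a hypothetical layout), is an outer merge on country and year, so that no observation is lost when the indicators cover different year ranges.

```python
from functools import reduce

import pandas as pd


def combine_indicators(frames):
    """Outer-join {name: DataFrame[country, year, value]} into one wide table.

    An outer join keeps every (country, year) pair present in any source and
    leaves NaN where an indicator has no observation for that pair.
    """
    renamed = [
        frame.rename(columns={"value": name})[["country", "year", name]]
        for name, frame in frames.items()
    ]
    merged = reduce(
        lambda left, right: left.merge(right, on=["country", "year"], how="outer"),
        renamed,
    )
    return merged.sort_values(["country", "year"]).reset_index(drop=True)


# Made-up example tables; real ones would come from the chosen datasets.
gdp = pd.DataFrame({"country": ["A"] * 3, "year": [2000, 2001, 2002], "value": [1.0, 1.1, 1.2]})
fertility = pd.DataFrame({"country": ["A"] * 3, "year": [2001, 2002, 2003], "value": [2.5, 2.4, 2.3]})
life_exp = pd.DataFrame({"country": ["A"] * 2, "year": [2000, 2003], "value": [70.0, 71.5]})

wide = combine_indicators({"gdp": gdp, "fertility_rate": fertility, "life_expectancy": life_exp})
print(wide)
```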

I chose to measure economic growth through the country's GDP and to compare each of the other indicators against it.
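If growth rather than GDP level is the quantity to compare, a year-over-year growth rate can be derived per country. This is a sketch under assumptions: the `country`, `year` and `gdp` column names are hypothetical, and the previous row of the same country is taken as the previous year, so gaps in the year sequence would be treated as a single step.

```python
import pandas as pd


def add_gdp_growth(df: pd.DataFrame) -> pd.DataFrame:
    """Add year-over-year GDP growth (%) computed separately for each country."""
    df = df.sort_values(["country", "year"]).copy()
    previous = df.groupby("country")["gdp"].shift(1)  # prior row within the same country
    df["gdp_growth_pct"] = (df["gdp"] / previous - 1) * 100
    return df


# Made-up GDP figures for two hypothetical countries.
panel = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "year": [2000, 2001, 2002, 2000, 2001, 2002],
    "gdp": [100.0, 110.0, 121.0, 50.0, 55.0, 60.5],
})
print(add_gdp_growth(panel))
```

Shifting within each group keeps one country's first year from being compared against another country's last year.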

Second step: Select the source of information

I chose to extract the datasets for these indicators from the Our World in Data website.
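A minimal loading sketch, assuming each downloaded CSV has `Entity`, `Code` and `Year` columns followed by a single indicator column. The sample rows below are invented, and treating rows with no country code as regional aggregates to drop is an assumption that should be checked against the actual files.

```python
import io

import pandas as pd

# Illustrative CSV in the assumed Entity/Code/Year layout; the values are made up.
SAMPLE_CSV = """Entity,Code,Year,Fertility rate
Africa,,2000,5.0
France,FRA,2000,1.8
France,FRA,2001,1.9
Japan,JPN,2000,1.3
"""


def load_owid_csv(source, value_name="value"):
    """Read an Entity/Code/Year CSV and return tidy (country, year, value) rows."""
    df = pd.read_csv(source)
    # Assumption: rows without a country code are regional aggregates, so drop them.
    df = df.dropna(subset=["Code"])
    # Assumption: the one remaining column holds the indicator values.
    value_col = next(c for c in df.columns if c not in ("Entity", "Code", "Year"))
    tidy = df.rename(columns={"Entity": "country", "Year": "year", value_col: value_name})
    return tidy[["country", "year", value_name]].reset_index(drop=True)


fertility = load_owid_csv(io.StringIO(SAMPLE_CSV), value_name="fertility_rate")
print(fertility)
```

For a real download, the `io.StringIO(...)` argument would be replaced by the path of the saved CSV file.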
