Analyzing data from the Tech Careers Report (Landing.Jobs' 2021 survey).
The goal of this project is to analyze data from the Landing.Jobs 2021 survey using Statistics, Machine Learning and Data Visualization tools.
The analysis was performed in Jupyter Notebook with Python, mainly to apply the Statistics and Machine Learning tools, and in Microsoft PowerBI for some of the Data Visualization.
Statistics
- Pearson's Correlation Coefficient
- Boxplot analysis
Machine Learning
- Association Rules
- K-Means Clustering
Data Visualization
- Radar Chart
- Distribution Tree
- Bar charts
- Geographic Map
- Area Chart
The Tech roles with less than 10% female representation are: CTO, Solutions Architect, DevOps Engineer, SysAdmin Engineer, Mobile Apps Developer and Full-Stack Developer. The Tech roles with the highest female representation (> 20%) are: Business Application, Data Scientist/Data Engineer, Project Manager, Quality Assurance/Testing, Scrum Master and UX/UI Designer.
Pearson's correlation coefficient is a statistic that measures the linear relationship, or association, between two continuous variables. It is one of the most widely used measures of association between variables of interest because it is based on covariance. To interpret our results, note that the correlation coefficient ranges from -1 to 1: negative coefficients indicate a negative correlation (when one variable increases, the other decreases); positive coefficients indicate a positive correlation (both increase or decrease together); a coefficient of 0 means that there is no linear correlation.
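To make the interpretation of the coefficient concrete, here is a minimal sketch that computes Pearson's r directly from its definition (covariance divided by the product of the standard deviations). The function name `pearson_r` and the toy values are assumptions made for illustration; they are not taken from the survey or from the notebook.

```python
import math

def pearson_r(x, y):
    """Pearson's r: co-deviation of x and y over the product of their spreads."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Numerator: sum of co-deviations. Denominator: root of the product
    # of the sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy values, not survey data.
experience = [0, 1, 2, 3, 4, 5]
salary_up = [20, 24, 28, 32, 36, 40]               # rises with experience
salary_down = [40, 36, 32, 28, 24, 20]             # falls with experience
salary_u = [6.25, 2.25, 0.25, 0.25, 2.25, 6.25]    # U-shaped, no linear trend

r_up = pearson_r(experience, salary_up)      # close to +1
r_down = pearson_r(experience, salary_down)  # close to -1
r_u = pearson_r(experience, salary_u)        # close to 0
print(round(r_up, 3), round(r_down, 3), round(r_u, 3))
```

The U-shaped case shows why a coefficient near 0 only rules out a linear relationship: the two variables are clearly related, just not along a straight line.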
I encoded the categorical variables in order to run the analysis.
import category_encoders as ce

# Map the ordered experience categories to integers (0 = least experience).
encoder = ce.OrdinalEncoder(cols=['Working_Experience'], return_df=True, mapping=[
    {'col': 'Working_Experience', 'mapping': {
        'No working experience': 0, 'Less than 1 year': 1,
        'Between 1 - 3 years': 2, 'Between 3 - 6 years': 3,
        'Between 6 - 9 years': 4, 'More than 9 years': 5}}])
The outcome is a symmetric heatmap that indicates a positive correlation (they increase and decrease together) between average salary and Working Experience, English Level, Age and Salary Fairness. A slightly positive correlation of average salary with remote work opinion and with changing job in the next 6 months was also observed.
Remote work opinion has a slightly positive correlation with changing job in the next 6 months. Looking at the type of working arrangement in more detail (the Remote_Working_Current category: full remote, flexible or full office), around 43% of people who work full time in the office are likely to change jobs in the next 6 months, whereas the figure for people who always work remotely or flexibly (regardless of Covid) is 27%. For this analysis, we selected people who chose levels 4 and 5 in the changing-job-in-the-next-6-months category.
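The comparison above boils down to a per-group share: for each working arrangement, the fraction of respondents who scored 4 or 5 on changing jobs. A minimal sketch of that calculation follows, assuming each response is a (working arrangement, score) pair. The rows, the category labels and the function name `share_likely_to_change` are hypothetical and do not reproduce the survey figures.

```python
from collections import defaultdict

# Hypothetical rows: (Remote_Working_Current, score 1-5 for changing job
# in the next 6 months). These are illustrative, not survey data.
responses = [
    ("Full office", 5), ("Full office", 2), ("Full office", 4), ("Full office", 1),
    ("Flexible", 4), ("Flexible", 2), ("Flexible", 1), ("Flexible", 3),
    ("Full remote", 5), ("Full remote", 1), ("Full remote", 2), ("Full remote", 3),
]

def share_likely_to_change(rows, threshold=4):
    """Per working arrangement, the fraction of people scoring >= threshold."""
    totals = defaultdict(int)
    likely = defaultdict(int)
    for mode, score in rows:
        totals[mode] += 1
        if score >= threshold:
            likely[mode] += 1
    return {mode: likely[mode] / totals[mode] for mode in totals}

shares = share_likely_to_change(responses)
for mode, share in shares.items():
    print(f"{mode}: {share:.0%}")
```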
The conclusions of the correlation analysis are: 1. Companies that pay fair salaries tend to keep their talent; 2. Workers who have the flexibility to work remotely are less likely to change jobs in the coming months.
We can see that workers from Unicorns and Startups present the highest medians, which means that they scored their salary fairness higher than workers from other companies. It is interesting to note that the data for the Unicorn category is less spread out.
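A boxplot reading like this one rests on two numbers per group: the median (the line inside the box) and the interquartile range (the height of the box, which reflects spread). The sketch below computes both with the standard library. The scores, the "Other" label and the helper name `box_stats` are assumptions for illustration only.

```python
import statistics

def box_stats(scores):
    """Return the quartiles and interquartile range that a boxplot draws."""
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}

# Hypothetical salary-fairness scores (1-5) per company type, not survey data.
fairness = {
    "Unicorn": [3, 4, 4, 4, 5],
    "Other": [1, 2, 3, 4, 5],
}

summary = {company: box_stats(scores) for company, scores in fairness.items()}
for company, stats in summary.items():
    print(company, stats)
```

In this toy data the "Unicorn" group has both a higher median and a smaller interquartile range, which is the pattern described above as a higher score with less spread.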
Workers from Unicorn companies present the smallest median for changing jobs in the next 6 months. So we can say that workers at Unicorn companies are less likely to change jobs and feel that their salaries are fair. It is important to note that Scale-ups have the highest average salary.
Unicorn companies tend to provide higher and fairer salaries.
English level influences average salary. Workers with better English skills tend to earn more. People with a native, bilingual or full professional English level have salaries higher than the global mean.
Here is an example: on average, people with no working experience who are fluent in English have almost the same salary as someone with 3-6 years of experience but an elementary English level.
Here, we apply the K-Means algorithm to verify whether the number of languages a person knows and their average salary are related (that is, whether or not the data form clusters around their means). In general, there is no relationship between the number of languages a worker knows and their salary (no clusters were found). We can extend this result and conclude that specialists tend to earn more.
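For readers unfamiliar with K-Means, the sketch below shows the two steps it alternates between: assigning each point to its nearest centroid, then moving each centroid to the mean of its members. It is a minimal pure-Python version written for illustration, not the code used in the notebook; in practice a library implementation such as scikit-learn's `KMeans` would typically be used. The points, the deterministic initialisation and the function name `kmeans` are assumptions.

```python
import math

def kmeans(points, k, n_iter=100):
    """Minimal Lloyd's algorithm; returns (centroids, labels)."""
    if len(points) < k:
        raise ValueError("need at least k points")
    # Deterministic initialisation (an assumption, for reproducibility):
    # take k points evenly spaced through the input order.
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    labels = []
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster as-is
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, labels

# Hypothetical (number of languages, salary in thousands) pairs, not survey data.
points = [(1, 20), (2, 22), (1, 21), (6, 60), (7, 62), (6, 61)]
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```

This toy data was built to contain two obvious groups so that the mechanics are visible; the survey data described above did not separate this way. Because K-Means uses Euclidean distance, features on very different scales (such as a language count and a salary) are usually standardised before clustering.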
Although the number of languages and salary are not related in this sample, let's analyse whether this pattern also holds for the top 3 languages with the highest salary: Go, Perl and Kotlin.
The main result of this last analysis is that, for workers who know Go, there is a correlation between the number of languages they know and their salary. This does not occur for the other two languages, Perl and Kotlin.
As a matter of fact, Go developers earn higher salaries and also have a background in multiple programming languages. Considering that the majority have more than 9 years of experience and that Go is a recent programming language, these workers most likely adopted Go in addition to the languages they already knew.
In this section, we apply the Apriori association rules algorithm to find out which languages are learned together by tech workers. By definition, an antecedent is an item found within the data, and a consequent is an item found in combination with the antecedent. Association rules are created by searching the data for frequent if-then patterns and using the support and confidence criteria to identify the most important relationships.
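The two criteria mentioned above can be computed by hand for a single rule. The sketch below does only that: it is not the Apriori algorithm itself, which additionally searches for all frequent itemsets. The baskets and the function names `support` and `confidence` are hypothetical; each basket stands for the set of languages one respondent reports knowing.

```python
# Hypothetical baskets: the languages each respondent knows (not survey data).
baskets = [
    {"Javascript", "HTML/CSS", "SQL"},
    {"Javascript", "HTML/CSS"},
    {"Python", "SQL"},
    {"Javascript", "HTML/CSS", "SQL"},
    {"Java", "SQL"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent): support(A and C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

# Rule: (Javascript, HTML/CSS) + (SQL)
rule_support = support({"Javascript", "HTML/CSS", "SQL"}, baskets)
rule_confidence = confidence({"Javascript", "HTML/CSS"}, {"SQL"}, baskets)
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

Here the rule appears in 2 of 5 baskets (support 0.4), and 2 of the 3 baskets containing the antecedent also contain SQL (confidence of about 0.67).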
The Apriori algorithm that we applied reveals typical tech stacks of developers working on projects. From these, we can recognize the different roles found among the top job roles. Reading each rule as (antecedent) + (consequent):
- The tech stacks (Javascript, HTML/CSS) + (SQL), (SQL, C#) + (Javascript), (Javascript) + (Java), (PHP) + (Javascript) and (Javascript, C#) + (SQL) typically belong to the Full-Stack Developer role.
- The tech stacks (Java) + (SQL) and (C#) + (SQL) belong to the Back-End Developer role.
- The tech stacks (Javascript) + (HTML/CSS), (Typescript, HTML/CSS) + (Javascript) and (Typescript, Javascript) + (HTML/CSS) typically belong to the Front-End Developer role.
- The tech stack (Python) + (SQL) typically belongs to the Data Scientist/Data Engineer role.
- Finally, the tech stack (Bash/Shell/PowerShell) + (SQL) typically belongs to the DevOps Engineer and Maintenance & Support roles.
- Typical tech stacks of developers were found.
- Javascript is the most popular language and is typically associated with Front-End and Full-Stack developers; it is learned together with the widest variety of other languages.
As a result, we can conclude that Go is learned together with Bash/Shell/PowerShell, HTML/CSS, Javascript, Python and SQL in pairwise combinations, both as antecedent and as consequent.
Developers who have a background in Bash/Shell/PowerShell, HTML/CSS, Javascript, Python and SQL are potential candidates to adopt Go as well. In addition, new learners who want to become Go developers might consider learning some of these other languages too.