Da Ans (GKJ)

Uploaded by

rahulsahoo2520

DA ANS

1. What is Data Science?

Data science is a multidisciplinary field focused on extracting insights from
large volumes of raw, structured, or unstructured data using scientific
methods, algorithms, and technologies.

It involves asking the right questions, analyzing data, and using models to
make informed decisions.

2. Types of Data in Data Science with Examples:

Structured Data: Data organized in a fixed schema of rows and columns, such as
database tables (e.g., sales records, employee information).

Unstructured Data: Data without a pre-defined structure, like text from
social media posts or emails.

Semi-structured Data: Data that doesn't conform to a rigid structure but
has some organizational properties, like JSON or XML files.
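
As a rough illustration of the three categories, the sketch below holds a small piece of information in each form. The sample records, field names, and text are invented for the example and are not from any real dataset.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields (hypothetical sales records).
structured_csv = "order_id,product,amount\n1,pen,20\n2,notebook,55\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))
total_sales = sum(int(row["amount"]) for row in rows)

# Semi-structured: JSON has keys and nesting, but records may differ in shape.
semi_structured = json.loads(
    '[{"user": "asha", "tags": ["new"]}, {"user": "ravi", "address": {"city": "Pune"}}]'
)
keys_per_record = [sorted(record.keys()) for record in semi_structured]

# Unstructured: free text with no predefined fields; any structure must be inferred.
unstructured = "Loved the new notebook, but delivery was late!"
word_count = len(unstructured.split())

print(total_sales)
print(keys_per_record)
print(word_count)
```

The structured rows can be summed directly by column name, while the JSON records have to be inspected one by one because their keys differ.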

3. What is Data Visualization in Data Science?

Data visualization is the process of creating graphical representations
(charts, graphs, etc.) of data to make the information easier to understand
and interpret. It helps in identifying trends, patterns, and insights.

4. What is Data Analytics and Its Applications?

Data analytics is the process of examining raw data to extract meaningful
insights. Applications include:

Predictive analytics in healthcare (predicting patient outcomes)

Fraud detection in finance

Customer segmentation in marketing.

5. What is EDA (Exploratory Data Analysis) and Its Types?

EDA is a method of analyzing data sets to summarize their main
characteristics, often using visual methods. Types of EDA include:

Univariate Analysis (Single Variable)

Histogram: Shows the frequency distribution of a variable.

Box Plot: Visualizes the spread and skewness of data using quartiles.

Bivariate Analysis (Two Variables)

Scatter Plot: Plots the relationship between two numerical variables.

Line Plot: Shows trends over time for time series data.

Multivariate Analysis (Multiple Variables)

Heatmap: Visualizes correlations between multiple variables.

3D Scatter Plot: Shows relationships among three variables simultaneously.
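
The plots listed above are drawn from simple numeric summaries. The following sketch, which assumes a small made-up array of measurements, computes the quartiles that a box plot displays and the correlation matrix that a heatmap colors.

```python
import numpy as np

# Hypothetical measurements: 6 observations of 3 numeric variables.
data = np.array([
    [5.1, 3.5, 1.4],
    [4.9, 3.0, 1.4],
    [6.3, 3.3, 6.0],
    [5.8, 2.7, 5.1],
    [7.1, 3.0, 5.9],
    [6.5, 3.0, 5.8],
])

# Univariate: the quartiles that a box plot draws for the first variable.
q1, median, q3 = np.percentile(data[:, 0], [25, 50, 75])
iqr = q3 - q1  # interquartile range, the height of the box

# Multivariate: the correlation matrix that a heatmap would color.
corr = np.corrcoef(data, rowvar=False)

print(q1, median, q3, iqr)
print(corr.round(2))
```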

6. What is Dimensionality Reduction and Its Importance?

Dimensionality reduction is the process of reducing the number of
variables or features in a dataset while preserving its essential information.

It is important for simplifying models, reducing computation time, and
eliminating noise in the data.
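
One common technique is principal component analysis (PCA); the notes do not name a specific method, so PCA is used here only as an illustration. The sketch assumes a synthetic four-feature dataset in which two features nearly duplicate the other two, and the helper name `pca_reduce` is hypothetical.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project X (samples x features) onto its top principal components."""
    X_centered = X - X.mean(axis=0)  # PCA works on zero-mean features
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    return X_centered @ Vt[:n_components].T, explained[:n_components]

rng = np.random.default_rng(0)
base = rng.normal(size=(100, 2))
# Synthetic data: features 3 and 4 are near-copies of features 1 and 2.
X = np.column_stack([
    base[:, 0],
    base[:, 1],
    base[:, 0] + 0.01 * rng.normal(size=100),
    base[:, 1] + 0.01 * rng.normal(size=100),
])

X_reduced, explained_ratio = pca_reduce(X, n_components=2)
print(X_reduced.shape)
print(explained_ratio.sum())
```

Because the extra features carry almost no new information, two components retain nearly all of the variance while halving the number of columns.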

7. Why Learn Python for Data Science?

Python is highly preferred for data science due to its simplicity, extensive
libraries for data analysis (like Pandas, NumPy), and strong community
support.

It also offers tools for data visualization (Matplotlib, Seaborn) and machine
learning (Scikit-learn).

8. Features of Python:

Easy to learn: Simple syntax and readability make it beginner-friendly.

Cross-platform compatibility: Runs on Windows, Mac, Unix, and Linux.

Portable: Code can be run on different machines without changes.

Extensive libraries: Offers tools for data manipulation, computation, and
visualization (Pandas, NumPy, Matplotlib).

Strong community support: Active community provides libraries and
resources for data science.

9. What are Data Analysis Tools?

Tools for data analysis include:

Python: For data manipulation and machine learning.

R: For statistical analysis.

Excel: For basic data analysis.

SQL: For database querying.

Tableau/Power BI: For advanced data visualization.

10. Importance of Data Visualization in Data Analytics:

Simplifies Complex Data: Visualizing data helps in breaking down large,
complex datasets into easily digestible insights.

Identifies Patterns and Trends: Visual tools like line charts, heatmaps, and bar
graphs help identify patterns that are not immediately apparent in raw data.

Facilitates Quick Decision Making: Well-designed visualizations allow
stakeholders to quickly grasp the key findings, making data-driven decisions
faster.

Enhances Communication: Visual representations of data make it easier to
present complex results to non-technical audiences, improving understanding
and collaboration.

Highlights Outliers and Anomalies: Visualization can quickly highlight
deviations from expected trends, aiding in error detection or further
investigation.

11. Discuss bar chart, line chart, area chart, and pie chart with examples.

Bar Chart: Visualizes categorical data with rectangular bars where the length
represents the data value. Example: Sales of different products.

Line Chart: Shows trends over time by connecting data points with a line.
Example: Stock prices over a month.

Area Chart: Similar to a line chart but with the area below the line filled with
color. Example: Visualizing the proportion of different market segments.

Pie Chart: Circular chart divided into slices representing proportions. Example:
Market share of companies in a sector.
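
The four chart types can be drawn side by side with Matplotlib. This is a minimal sketch using invented product sales and price figures; the `Agg` backend is chosen only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line to use plt.show()
import matplotlib.pyplot as plt

# Hypothetical figures, invented for illustration only.
products = ["A", "B", "C"]
sales = [120, 90, 150]
days = [1, 2, 3, 4, 5]
prices = [101, 103, 102, 106, 108]

fig, axes = plt.subplots(2, 2, figsize=(8, 6))
axes[0, 0].bar(products, sales)                # bar length encodes the value
axes[0, 0].set_title("Bar: sales by product")
axes[0, 1].plot(days, prices, marker="o")      # points joined by a line
axes[0, 1].set_title("Line: price over time")
axes[1, 0].fill_between(days, prices)          # area under the line is filled
axes[1, 0].set_title("Area: filled line chart")
axes[1, 1].pie(sales, labels=products, autopct="%1.0f%%")  # slices show proportions
axes[1, 1].set_title("Pie: share of sales")
fig.tight_layout()
```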

12. Explain the purpose of Python for data science.

Data Manipulation and Cleaning: Python’s libraries like Pandas and NumPy
allow handling large datasets.

Data Analysis and Exploration: Tools to perform statistical analysis and
uncover insights.

Data Visualization: Libraries like Matplotlib and Seaborn make it easy to
create graphs and plots.

Machine Learning and Deep Learning: Python supports libraries like
TensorFlow, Scikit-learn, and PyTorch for building predictive models.

Automation: Automating repetitive data tasks.

Integration and Scalability: Python integrates well with other tools and
technologies, facilitating scalable data science solutions.

13. Is data analysis part of data analytics? Justify your answer.

Yes, data analysis is a subset of data analytics. Data analysis focuses on
examining data and drawing insights from it, while data analytics encompasses
the entire process of transforming data into actionable insights, including data
mining, modeling, and visualization.

14. Explain the pandas library in detail

Pandas is a Python library for data manipulation and analysis. It provides data
structures such as Series (one-dimensional) and DataFrame (two-dimensional,
tabular) for working with structured data, along with functions
for filtering, aggregation, and transformation.
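
A minimal sketch of those three operations on a DataFrame follows. The sales records and column names are hypothetical, chosen only to keep the arithmetic easy to check.

```python
import pandas as pd

# Hypothetical sales records used only for illustration.
df = pd.DataFrame({
    "region": ["East", "West", "East", "West", "East"],
    "product": ["pen", "pen", "book", "book", "pen"],
    "units": [10, 4, 3, 8, 6],
    "price": [2.0, 2.0, 12.5, 12.5, 2.0],
})

# Transformation: derive a new column from existing ones.
df["revenue"] = df["units"] * df["price"]

# Filtering: keep rows that satisfy a boolean condition.
east = df[df["region"] == "East"]

# Aggregation: total revenue per region.
revenue_by_region = df.groupby("region")["revenue"].sum()

print(east)
print(revenue_by_region)
```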

15. Explain in detail about data visualization tools

Tableau: A powerful tool for creating interactive dashboards.

Power BI: Microsoft's tool for data visualization and business intelligence.

Matplotlib: A Python library for static and animated visualizations.

Seaborn: Built on Matplotlib for creating informative and attractive graphics.

16. List and explain components of Python used for data science.

Data Manipulation Libraries: Pandas, NumPy.

Data Visualization Libraries: Matplotlib, Seaborn, Plotly.

Machine Learning Libraries: Scikit-learn, TensorFlow, Keras, PyTorch.

17. Explain different types of data visualization tools with their features

Tableau: Drag-and-drop interface, integrates with multiple data sources.

Power BI: Cloud-based and integrates with Microsoft services.

Matplotlib: Customizable but more suited for static visualizations.

Seaborn: Adds aesthetic appeal to statistical plots.

18. Explain life cycle of data science

Data science is a multi-step process that involves extracting insights from data.
Key phases:

1. Discovery:

Clearly define the business problem you want to solve.

Data Identification: Identify the relevant data sources.

Hypothesis Formation: Create initial hypotheses to guide your analysis.

2. Data Preparation:

Cleaning: Handle missing values, outliers, and inconsistencies.

Integration: Combine data from different sources.

Transformation: Convert data into a suitable format for analysis.

3. Model Planning:

Exploratory Data Analysis (EDA): Use statistical methods and
visualizations to understand data relationships.

Feature Selection: Choose the most relevant features for modeling.

Algorithm Selection: Decide which algorithms are appropriate for your
problem.

4. Model Building:

Training: Use a portion of the data to train the model.

Testing: Evaluate the model's performance on unseen data.

Tuning: Adjust model parameters to improve performance.

5. Deployment:

Integration: Integrate the model into a production environment.

Monitoring: Continuously track the model's performance.

Maintenance: Update the model as needed to adapt to changing data.

6. Communication:

Results Presentation: Clearly communicate the findings to stakeholders.

Actionable Insights: Provide recommendations based on the results.
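
The model-building phase (training, then testing on unseen data) can be sketched with NumPy alone. The data here is synthetic, and a straight-line fit stands in for whatever algorithm the model-planning phase would actually select.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data: y depends linearly on x, plus noise.
x = rng.uniform(0, 10, size=200)
y = 3.0 * x + 5.0 + rng.normal(scale=1.0, size=200)

# Split: hold out 20% of the rows as unseen test data.
indices = rng.permutation(len(x))
test_idx, train_idx = indices[:40], indices[40:]

# Training: fit slope and intercept by least squares on the training rows.
slope, intercept = np.polyfit(x[train_idx], y[train_idx], deg=1)

# Testing: measure error on rows the model never saw.
predictions = slope * x[test_idx] + intercept
test_rmse = np.sqrt(np.mean((predictions - y[test_idx]) ** 2))

print(round(slope, 2), round(intercept, 2), round(test_rmse, 2))
```

Tuning would repeat the fit with different settings (for example a different polynomial degree) and keep the one with the lowest error on held-out data.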

19. Explain data analytics process with an example

Data analytics is a systematic approach to understanding data and extracting
valuable insights. Here's a simplified breakdown of the key steps:

1. Data Collection:

Gather raw data from various sources.

Combine data from different systems using data integration techniques.

Extract relevant subsets if needed.

2. Data Cleaning:

Identify and correct errors, inconsistencies, and duplicates.

Ensure data quality for accurate analysis.

Organize data according to analysis requirements.

3. Data Analysis and Interpretation:

Use analytical tools like Python, R, SQL, or Excel.

Build models to understand data relationships.

Test and refine models until they meet expectations.

Apply models to new data in production.

4. Data Visualization:

Create visual representations (charts, graphs) of data.

Identify patterns, trends, and insights more easily.

Communicate findings effectively to stakeholders.

Example: A marketing team wants to analyze customer behavior to improve sales.
They collect data on customer purchases, demographics, and website
interactions. After cleaning the data, they build a model to predict which
customers are likely to make a purchase. They then visualize the results to identify
key customer segments and tailor marketing campaigns accordingly.
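
A compact version of the marketing example might look like the sketch below. The customer records are invented, and a simple comparison of purchase rates between two segments replaces the predictive model to keep the example short.

```python
import pandas as pd

# 1. Collection: hypothetical customer records (invented for this sketch).
raw = pd.DataFrame({
    "customer_id": [1, 2, 2, 3, 4, 5],
    "age": [23, 35, 35, None, 52, 41],
    "site_visits": [12, 3, 3, 8, 1, 9],
    "purchased": [1, 0, 0, 1, 0, 1],
})

# 2. Cleaning: drop duplicate rows and fill the missing age with the median.
clean = raw.drop_duplicates().copy()
clean["age"] = clean["age"].fillna(clean["age"].median())

# 3. Analysis: compare purchase rates for frequent and occasional visitors.
clean["segment"] = clean["site_visits"].apply(
    lambda visits: "frequent" if visits >= 8 else "occasional"
)
purchase_rate = clean.groupby("segment")["purchased"].mean()

# 4. Visualization would plot purchase_rate; here it is printed instead.
print(purchase_rate)
```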

20. Explain data analytics types with an example

Data analytics transforms raw data into actionable insights. Here are the main
types:

1. Descriptive Analytics: This type of analysis summarizes and describes data.
For example, a teacher might use descriptive analytics to calculate the
average test score for a class.

2. Diagnostic Analytics: This type of analysis investigates the causes of events
or trends. For example, a school counselor might use diagnostic analytics to
determine why a student's grades have declined.

3. Predictive Analytics: This type of analysis forecasts future outcomes. For
example, a high school might use predictive analytics to predict which
students are at risk of dropping out.

4. Prescriptive Analytics: This type of analysis suggests optimal solutions. For
example, a college might use prescriptive analytics to recommend the best
course schedule for a student based on their academic goals and interests.
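
Three of these types can be shown on one tiny dataset. The sketch assumes made-up records of study hours and test scores; diagnostic analytics is left out because it depends on domain investigation more than on a formula.

```python
from statistics import mean

# Hypothetical class records: (hours studied, test score).
records = [(2, 55), (4, 65), (6, 75), (8, 85)]
hours = [h for h, _ in records]
scores = [s for _, s in records]

# Descriptive: summarize what happened.
average_score = mean(scores)

# Predictive: fit a least-squares line and forecast the score for 5 hours of study.
h_bar, s_bar = mean(hours), mean(scores)
slope = sum((h - h_bar) * (s - s_bar) for h, s in records) / sum(
    (h - h_bar) ** 2 for h in hours
)
intercept = s_bar - slope * h_bar
predicted_score = slope * 5 + intercept

# Prescriptive: recommend the fewest whole study hours predicted to reach a target.
target = 80
recommended_hours = min(h for h in range(0, 13) if slope * h + intercept >= target)

print(average_score, predicted_score, recommended_hours)
```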

21. Perform EDA Process on Iris Dataset:

Step 1: Load the dataset.

import pandas as pd
iris = pd.read_csv("iris.csv")
print(iris.head())

Step 2: Summary statistics.

print(iris.describe())

Step 3: Visualize feature distributions.

import matplotlib.pyplot as plt

iris.hist(figsize=(10, 6))
plt.show()

Step 4: Scatter plot for feature relationships.

pd.plotting.scatter_matrix(iris, figsize=(10, 8))
plt.show()

22. Explain types of EDA process with an example.

Univariate Analysis: Analyzes one variable at a time (e.g., summary statistics
or a histogram of sepal length in the Iris dataset).

iris['sepal_length'].hist()
plt.show()

Multivariate Analysis: Looks at relationships between multiple variables (e.g.,
scatter plot of sepal length vs. sepal width).

plt.scatter(iris['sepal_length'], iris['sepal_width'])
plt.show()

23. Explain EDA quantitative and graphical techniques with an example

Quantitative Techniques: Mean, variance, and correlations. Example:

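# numeric_only=True skips the text 'species' column, which recent pandas versions reject
print(iris.mean(numeric_only=True))
print(iris.corr(numeric_only=True))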

Graphical Techniques: Histograms, scatter plots, and box plots. Example:

iris.boxplot(column='sepal_length', by='species')
plt.show()

24. Why Top Companies Use Python as Implementation Language?

Flexibility: Python works across platforms.

Extensive Libraries: Libraries like Pandas, NumPy, and TensorFlow.

Community Support: A large, active community that develops powerful
libraries.

Ease of Use: Python’s simple syntax allows for rapid development.

25. Explain limitations of Python

Speed: Python is slower compared to compiled languages like C++.

Mobile Development: Limited in native mobile applications.

Memory Consumption: Can be inefficient for memory-intensive tasks.

Threading: Global Interpreter Lock (GIL) affects multi-threaded applications.

26. Explain the applications of data science with examples

Healthcare: Predictive analytics for patient outcomes (e.g., IBM Watson
Health).

Finance: Fraud detection and risk modeling (e.g., JP Morgan).

Retail: Inventory management and recommendation systems (e.g., Amazon).

27. Explain Pros and cons of Python for Data Science

Pros:

Easy to learn and use.

Vast libraries and frameworks.

Cross-platform support.

Cons:

Slower execution compared to compiled languages.

Less efficient in mobile computing.

28. Explain in detail about feature selection and feature generation with an
example

Feature Selection: Choosing the most relevant features for model building.
Example:

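from sklearn.feature_selection import SelectKBest, f_classif

X = iris.drop('species', axis=1)  # numeric feature columns
y = iris['species']               # class labels
X_new = SelectKBest(f_classif, k=2).fit_transform(X, y)  # keep the 2 best-scoring features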

Feature Generation: Creating new features from existing ones. Example:

iris['petal_area'] = iris['petal_length'] * iris['petal_width']

29. Explain the following 1) Univariate EDA 2) Multivariate EDA

Univariate EDA: Focuses on a single variable, looking at its distribution,
outliers, etc. Example: Histogram of ages in a dataset.

Multivariate EDA: Analyzes the relationship between two or more variables.
Example: Correlation matrix to understand relationships between multiple
features.

30. Explain preprocessing techniques with an example.

Data Cleaning: Handling missing values and duplicates.

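# Fill missing numeric values with each column's mean (the text 'species' column is skipped)
iris.fillna(iris.mean(numeric_only=True), inplace=True)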

Normalization: Scaling data to a fixed range.

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
iris_scaled = scaler.fit_transform(iris.drop('species', axis=1))
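
Min-max scaling to the range 0 to 1 applies (x - min) / (max - min) to each column, which is what MinMaxScaler computes with its default settings. The sketch below writes that formula out in NumPy; the helper name `min_max_scale` and the sample values are hypothetical.

```python
import numpy as np

def min_max_scale(values):
    """Rescale each column to [0, 1] using (x - min) / (max - min)."""
    values = np.asarray(values, dtype=float)
    col_min = values.min(axis=0)
    col_range = values.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # constant columns map to 0 instead of dividing by zero
    return (values - col_min) / col_range

# Hypothetical measurements: two columns on very different scales.
sample = [[5.0, 100.0], [6.0, 300.0], [7.0, 200.0]]
scaled = min_max_scale(sample)
print(scaled)
```

After scaling, both columns span the same range, so neither dominates a distance-based model simply because of its units.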
