Data Analysis with Python: A Modern Approach

David Taieb

eBook, Feb 2025, 490 pages, 1st Edition
Rating: 5 out of 5 (1 rating)

Chapter 1. Programming and Data Science – A New Toolset

"Data is a precious thing and will last longer than the systems themselves."

Tim Berners-Lee, inventor of the World Wide Web

(https://en.wikipedia.org/wiki/Tim_Berners-Lee)

In this introductory chapter, I'll start the conversation by attempting to answer a few fundamental questions that will hopefully provide context and clarity for the rest of this book:

  • What is data science, and why is it on the rise?
  • Why is data science here to stay?
  • Why do developers need to get involved in data science?

Drawing on my experience as a developer and recent data science practitioner, I'll then discuss a concrete data pipeline project that I worked on and the data science strategy derived from it, which comprises three pillars: data, services, and tools. I'll end the chapter by introducing Jupyter Notebooks, which are at the center of the solution I'm proposing in this book.

What is data science?

If you search the web for a definition of data science, you will certainly find many. This reflects the reality that data science means different things to different people. There is no real consensus on exactly what data scientists do and what training they must have; it all depends on the task they're trying to accomplish, for example, data collection and cleaning, data visualization, and so on.

For now, I'll try to use a universal and, hopefully, consensual definition: data science refers to the activity of analyzing a large amount of data in order to extract knowledge and insight leading to actionable decisions. This is still pretty vague, though; one might ask what kinds of knowledge, insight, and actionable decisions we are talking about.

To orient the conversation, let's reduce the scope to three fields of data science:

  • Descriptive analytics: Data science is associated with information retrieval and data collection techniques, with the goal of reconstructing past events to identify patterns and find insights that help us understand what happened and what caused it to happen. An example of this is looking at sales figures and demographics by region to categorize customer preferences. This part requires familiarity with statistics and data visualization techniques.
  • Predictive analytics: Data science is a way to predict the likelihood that some events are currently happening or will happen in the future. In this scenario, the data scientist looks at past data to find explanatory variables and build statistical models that can be applied to other data points for which we're trying to predict the outcome, for example, predicting the likelihood that a credit card transaction is fraudulent in real time. This part is usually associated with the field of machine learning.
  • Prescriptive analytics: In this scenario, data science is seen as a way to make better decisions, or perhaps I should say data-driven decisions. The idea is to look at multiple options and use simulation techniques to quantify and maximize the outcome, for example, optimizing the supply chain by minimizing operating costs.
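To make the three categories more concrete, here is a minimal, standard-library-only Python sketch built on a hypothetical toy sales dataset. The `SALES` records, the function names, the costs, and the normally distributed demand model are all illustrative assumptions, not taken from the book. `describe` aggregates past sales by region (descriptive), `forecast` fits an ordinary least-squares trend line to one region's monthly sales and extrapolates it (predictive), and `best_stock_level` uses a Monte Carlo simulation to pick the stock level with the lowest expected holding-plus-shortage cost (prescriptive).

```python
import random
from collections import defaultdict

# Hypothetical toy data: (region, month, units_sold)
SALES = [
    ("north", 1, 120), ("north", 2, 135), ("north", 3, 150),
    ("south", 1, 80), ("south", 2, 90), ("south", 3, 95),
]


def describe(sales):
    """Descriptive: total units sold per region (what happened)."""
    totals = defaultdict(int)
    for region, _month, units in sales:
        totals[region] += units
    return dict(totals)


def fit_trend(points):
    """Predictive building block: least-squares line y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def forecast(sales, region, month):
    """Predictive: extrapolate a region's linear sales trend to a future month."""
    points = [(m, units) for r, m, units in sales if r == region]
    intercept, slope = fit_trend(points)
    return intercept + slope * month


def best_stock_level(candidates, mean_demand, holding_cost, shortage_cost,
                     trials=2000, seed=42):
    """Prescriptive: choose the stock level with the lowest simulated expected cost."""
    rng = random.Random(seed)
    # Assumed demand model: normal with a 10% standard deviation. Every candidate
    # is scored against the same simulated scenarios so the comparison is fair.
    demands = [rng.gauss(mean_demand, 0.1 * mean_demand) for _ in range(trials)]

    def expected_cost(stock):
        total = 0.0
        for demand in demands:
            total += holding_cost * max(stock - demand, 0.0)   # unsold units
            total += shortage_cost * max(demand - stock, 0.0)  # missed sales
        return total / trials

    costs = {stock: expected_cost(stock) for stock in candidates}
    best = min(costs, key=costs.get)
    return best, costs


if __name__ == "__main__":
    print(describe(SALES))  # {'north': 405, 'south': 265}
    demand = forecast(SALES, "north", 4)
    print(demand)  # 165.0
    best, costs = best_stock_level(range(140, 191, 10), demand,
                                   holding_cost=1.0, shortage_cost=4.0)
    print(best, round(costs[best], 2))
```

Because a shortage is priced at four times the holding cost in this toy setup, the simulation favors stocking somewhat above the forecast demand; changing either cost shifts the recommendation.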

In essence, descriptive data science answers the question of what (does the data tell me), predictive data science answers the question of why (is the data behaving a certain way), and prescriptive data science answers the question of how (do we optimize the data toward a specific goal).

Is data science here to stay?

Let's get straight to the point: I strongly believe that the answer is yes.

I didn't always think so, though. A few years back, when I first started hearing about data science as a concept, I thought it was yet another marketing buzzword for an activity that already existed in the industry: Business Intelligence (BI). As a developer and architect working mostly on complex system integration problems, I found it easy to convince myself that I didn't need to get directly involved in data science projects, even though their numbers were clearly on the rise. After all, developers have traditionally treated data pipelines as black boxes accessible through well-defined APIs. However, in the last decade, we've seen exponential growth in interest in data science, both in academia and in industry, to the point where it became clear that this model would not be sustainable.

As data analytics plays an ever bigger role in companies' operational processes, the developer's role has expanded to get closer to the algorithms and to build the infrastructure that runs them in production. Another piece of evidence that data science has become the new gold rush is the extraordinary growth in data scientist jobs, which were ranked number one for 2 years in a row on Glassdoor (https://www.prnewswire.com/news-releases/glassdoor-reveals-the-50-best-jobs-in-america-for-2017-300395188.html) and are consistently among the jobs most posted by employers on Indeed. Headhunters are also on the prowl on LinkedIn and other social media platforms, sending tons of recruiting messages to anyone whose profile shows any data science skills.

One of the main reasons behind all the investment being made in these new technologies is the hope that they will yield major improvements and greater efficiencies in the business. However, even though it is a growing field, data science in the enterprise today is still largely confined to experimentation rather than being the core activity one would expect given all the hype. This has led a lot of people to wonder whether data science is a passing fad that will eventually subside, or yet another technology bubble that will eventually pop, leaving a lot of people behind.

These are all good points, but I quickly realized that data science was more than just a passing fad: more and more of the projects I was leading included the integration of data analytics into core product features. Finally, when the IBM Watson Question Answering system beat two experienced champions at Jeopardy!, I became convinced that data science, along with the cloud, big data, and Artificial Intelligence (AI), was here to stay and would eventually change the way we think about computer science.

Why is data science on the rise?

There are multiple factors involved in the meteoric rise of data science.

First, the amount of data being collected keeps growing at an exponential rate. According to recent market research from the IBM Marketing Cloud (https://www-01.ibm.com/common/ssi/cgi-bin/ssialias?htmlfid=WRL12345GBEN), something like 2.5 quintillion bytes are created every day (to give you an idea of how big that is, that's 2.5 billion billion bytes), yet only a tiny fraction of this data is ever analyzed, leaving tons of missed opportunities on the table.
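The scale of that figure is easier to grasp with a quick back-of-the-envelope check. The Python sketch below assumes decimal (SI) prefixes, so an exabyte is 10**18 bytes and a terabyte is 10**12 bytes; the daily volume is simply the figure quoted above, not an independently verified number, and `in_unit` is a hypothetical helper written for this illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAILY_BYTES = 25 * 10**17  # 2.5 quintillion bytes per day, the figure quoted above

# A (short-scale) quintillion is a billion billion: 10**18 == (10**9) ** 2.
assert 10**18 == (10**9) ** 2


def in_unit(num_bytes, power_of_ten):
    """Express a byte count in a decimal unit such as 10**12 (TB) or 10**18 (EB)."""
    return num_bytes / 10**power_of_ten


print(in_unit(DAILY_BYTES, 18))  # 2.5 (exabytes per day)
print(in_unit(DAILY_BYTES, 12))  # 2500000.0 (terabytes per day)
print(round(in_unit(DAILY_BYTES / SECONDS_PER_DAY, 12), 1))  # 28.9 (terabytes per second)
```

In other words, the quoted rate works out to roughly 29 terabytes of new data every second.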

Second, we're in the midst of a cognitive revolution that started a few years ago; almost every industry is jumping on the AI bandwagon, which includes natural language processing (NLP) and machine learning. Even though these fields have existed for a long time, they have recently enjoyed renewed attention, to the point that they are now among the most popular courses in colleges and attract the lion's share of open source activity. It is clear that, if they are to survive, companies need to become more agile, move faster, and transform into digital businesses; and, as the time available for decision-making shrinks to near real time, they must become fully data-driven. Add the fact that AI algorithms need high-quality data (and a lot of it) to work properly, and we can start to understand the critical role played by data scientists.

Third, with advances in cloud technologies and the development of Platform as a Service (PaaS), access to massive compute engines and storage has never been easier or cheaper. Running big data workloads, once the purview of large corporations, is now within reach of smaller organizations and of any individual with a credit card; this, in turn, is fueling innovation across the board.

For these reasons, I have no doubt that, similar to the AI revolution, data science is here to stay and that its growth will continue for a long time. But we also can't ignore the fact that data science hasn't yet realized its full potential and produced the expected results, in particular helping companies in their transformation into data-driven organizations. Most often, the challenge is achieving that next step, which is to transform data science and analytics into a core business activity that ultimately enables clear-sighted, intelligent, bet-the-business decisions.


Key benefits

  • Bridge your data analysis with the power of programming, complex algorithms, and AI
  • Use Python and its extensive libraries to power your way to new levels of data insight
  • Work with AI algorithms, TensorFlow, graph algorithms, NLP, and financial time series
  • Explore this modern approach across key industry case studies and hands-on projects

Description

Data Analysis with Python offers a modern approach to data analysis so that you can work with the latest and most powerful Python tools, AI techniques, and open source libraries. Industry expert David Taieb shows you how to bridge data science with the power of programming and algorithms in Python. You'll be working with complex algorithms and cutting-edge AI in your data analysis. Learn how to analyze data with hands-on examples using Python-based tools and Jupyter Notebook. You'll find the right balance of theory and practice, with extensive code files that you can integrate right into your own data projects. You'll then explore the power of this approach by applying it to key industry case studies. Four fascinating, complete projects connect you to the most critical data analysis challenges you're likely to meet today. The first of these is an image recognition application with TensorFlow, reflecting the importance of AI in today's data analysis. The second industry project analyses social media trends, exploring big data issues and AI approaches to natural language processing. The third case study is a financial portfolio analysis application that engages you with time series analysis, which is pivotal to many data science applications today. The fourth industry use case takes you into graph algorithms and the power of programming in modern data science. You'll wrap up with a thoughtful look at the future of data science and how it will harness the power of algorithms and artificial intelligence.

Who is this book for?

This book is for developers who want to bridge the gap between themselves and data scientists. Introducing PixieDust from its creator, the book is also a great desk companion for accomplished data scientists. Some fluency in data interpretation and visualization is assumed. It will be helpful to have some knowledge of Python and its libraries, and some proficiency in web development.

What you will learn

  • A new toolset that has been carefully crafted to meet your data analysis challenges
  • Full and detailed case studies of the toolset across several of today's key industry contexts
  • Become super productive with a new toolset across Python and Jupyter Notebook
  • Look into the future of data science and which directions to develop your skills next

Product Details

Last updated date: Feb 11, 2025
Publication date: Dec 31, 2018
Length: 490 pages
Edition: 1st
Language: English
ISBN-13: 9781789958195

Table of Contents

13 Chapters
1. Programming and Data Science – A New Toolset
2. Python and Jupyter Notebooks to Power your Data Analysis
3. Accelerate your Data Analysis with Python Libraries
4. Publish your Data Analysis to the Web - the PixieApp Tool
5. Python and PixieDust Best Practices and Advanced Concepts
6. Analytics Study: AI and Image Recognition with TensorFlow
7. Analytics Study: NLP and Big Data with Twitter Sentiment Analysis
8. Analytics Study: Prediction - Financial Time Series Analysis and Forecasting
9. Analytics Study: Graph Algorithms - US Domestic Flight Data Analysis
10. The Future of Data Analysis and Where to Develop your Skills
A. PixieApp Quick-Reference
Other Books You May Enjoy
Index

Customer reviews

Raj Singh, Jan 09, 2019 (5 out of 5 stars)

"Data Analysis with Python: A Modern Approach" is a very interesting addition to the popular literature in this hot technical area. There are numerous books out there that teach the underlying technologies and mathematical techniques used in data science, from Python to linear regression to neural network models. But there are very few books that detail real-world experience in operationalizing these skills. This is, in my opinion, the key contribution of this book to the data science community. So few people are willing to describe in detail the architecture of years of their work, and also make open source libraries available that the reader can use to apply and then extend what they learn in their own projects.

Before you buy this book, you should understand what it's not. If you're looking for a book that will serve as a class in data science, this isn't it. Textbook-style books, by design, start with the assumption that the readers have a certain shared background. Then they build on that background chapter by chapter, teaching a set of new skills. You are expected to apply what you learn in previous chapters in order to understand those that follow. This is also not a book that will teach you how to program. While there are a lot of code samples, they serve more to illustrate a functional point than to teach Python, or how to write PixieApp applications using PixieDust.

What this book is good at is putting data science in the context of the broader business challenge, which is to apply data science to real-life operational systems and share the insights derived by data science using web architectures with which most developers are familiar.

If you're a moderately experienced programmer with some Python and data science skills, this book can really broaden your horizons to understand how your work can better be utilized throughout your organization, and how you can get going with open source tools instead of investing immediately in pricey software from the big data science players like IBM, Oracle, MATLAB, or Tableau. Hey, you might end up not needing that expensive stuff once you get good with Jupyter Notebooks and the PixieDust tools!

Amazon Verified review

FAQs

How do I buy and download an eBook?

Where there is an eBook version of a title available, you can buy it from the book details for that title. Add either the standalone eBook or the eBook and print book bundle to your shopping cart. Your eBook will show in your cart as a product on its own. After completing checkout and payment in the normal way, you will receive your receipt on the screen containing a link to a personalised PDF download file. This link will remain active for 30 days. You can download backup copies of the file by logging in to your account at any time.

If you already have Adobe Reader installed, clicking the link will download and open the PDF file directly. If you don't, save the PDF file on your machine and download Adobe Reader to view it.

Please Note: Packt eBooks are non-returnable and non-refundable.

Packt eBook and Licensing

When you buy an eBook from Packt Publishing, completing your purchase means you accept the terms of our licence agreement. Please read the full text of the agreement. In it, we have tried to balance the need for the eBook to be usable for you, the reader, with our need to protect our rights as publishers and those of our authors. In summary, the agreement says:

  • You may make copies of your eBook for your own use onto any machine
  • You may not pass copies of the eBook on to anyone else

How can I make a purchase on your website?

If you want to purchase a video course, eBook, or bundle (Print + eBook), please follow the steps below:

  1. Register on our website using your email address and a password.
  2. Search for the title by name or ISBN using the search option.
  3. Select the title you want to purchase.
  4. Choose the format you wish to purchase the title in; if you order the print book, you get a free eBook copy of the same title.
  5. Proceed with the checkout process (payment can be made by credit card, debit card, or PayPal).

Where can I access support for an eBook?

  • If you experience a problem with using or installing Adobe Reader, contact Adobe directly.
  • To view the errata for the book, see www.packtpub.com/support and view the pages for the title you have.
  • To view your account details or to download a new copy of the book go to www.packtpub.com/account
  • To contact us directly if a problem is not resolved, use www.packtpub.com/contact-us

What eBook formats does Packt support?

Our eBooks are currently available in a variety of formats, such as PDF and ePub. In the future, this may well change with trends and developments in technology, but please note that our PDFs are not in Adobe eBook Reader format, which has greater restrictions on security.

You will need to use Adobe Reader v9 or later in order to read Packt's PDF eBooks.

What are the benefits of eBooks?

  • You can get the information you need immediately
  • You can easily take them with you on a laptop
  • You can download them an unlimited number of times
  • You can print them out
  • They are copy-paste enabled
  • They are searchable
  • There is no password protection
  • They cost less than print
  • They save resources and space

What is an eBook?

Packt eBooks are a complete electronic version of the print edition, available in PDF and ePub formats. Every piece of content down to the page numbering is the same. Because we save the costs of printing and shipping the book to you, we are able to offer eBooks at a lower cost than print editions.

When you have purchased an eBook, simply log in to your account and click the link in Your Download Area. We recommend saving the file to your hard drive before opening it.

For optimal viewing of our eBooks, we recommend you download and install the free Adobe Reader version 9.
