Pandas for Everyone

Python Data Analysis

Daniel Y. Chen

Paperback

Product description

The Hands-On, Example-Rich Introduction to Pandas Data Analysis in Python

Today, analysts must manage data characterized by extraordinary variety, velocity, and volume. Using the open source Pandas library, you can use Python to rapidly automate and perform virtually any data analysis task, no matter how large or complex. Pandas can help you ensure the veracity of your data, visualize it for effective decision-making, and reliably reproduce analyses across multiple datasets.

Pandas for Everyone brings together practical knowledge and insight for solving real problems with Pandas, even if you're new to Python data analysis. Daniel Y. Chen introduces key concepts through simple but practical examples, incrementally building on them to solve more difficult, real-world problems.

Chen gives you a jumpstart on using Pandas with a realistic dataset and covers combining datasets, handling missing data, and structuring datasets for easier analysis and visualization. He demonstrates powerful data cleaning techniques, from basic string manipulation to applying functions simultaneously across dataframes.

Once your data is ready, Chen guides you through fitting models for prediction, clustering, inference, and exploration. He provides tips on performance and scalability, and introduces you to the wider Python data analysis ecosystem.

Work with DataFrames and Series, and import or export data
Create plots with matplotlib, seaborn, and pandas
Combine datasets and handle missing data
Reshape, tidy, and clean datasets so they're easier to work with
Convert data types and manipulate text strings
Apply functions to scale data manipulations
Aggregate, transform, and filter large datasets with groupby
Leverage Pandas' advanced date and time capabilities
Fit linear models using the statsmodels and scikit-learn libraries
Use generalized linear modeling to fit models with different response variables
Compare multiple models to select the best
Regularize to overcome overfitting and improve performance
Use clustering in unsupervised machine learning

Product details

Publisher: Addison-Wesley / Pearson
Pages: 416
Publication date: 26 December 2017
Language: English
Dimensions: 231 mm x 177 mm x 27 mm
Weight: 650 g
ISBN-13: 9780134546933
ISBN-10: 0134546938
Item no.: 44468979

Manufacturer information
Libri GmbH
Europaallee 1
36244 Bad Hersfeld
gpsr@libri.de

About the author

Daniel Chen is a graduate student in the interdisciplinary PhD program in Genetics, Bioinformatics & Computational Biology (GBCB) at Virginia Tech. He is involved with Software Carpentry as an instructor and lesson maintainer. He completed his master’s degree in public health, in epidemiology, at Columbia University's Mailman School of Public Health, and currently works at the Social and Decision Analytics Laboratory under the Biocomplexity Institute of Virginia Tech, where he works with data to inform policy decision-making. He is the author of Pandas for Everyone and Pandas Data Analysis with Python Fundamentals LiveLessons.

Table of contents

Foreword xix
Preface xxi
Acknowledgments xxvii
About the Author xxxi

Part I: Introduction 1

Chapter 1: Pandas DataFrame Basics 3
1.1 Introduction 3
1.2 Loading Your First Data Set 4
1.3 Looking at Columns, Rows, and Cells 7
1.4 Grouped and Aggregated Calculations 18
1.5 Basic Plot 23
1.6 Conclusion 24

Chapter 2: Pandas Data Structures 25
2.1 Introduction 25
2.2 Creating Your Own Data 26
2.3 The Series 28
2.4 The DataFrame 36
2.5 Making Changes to Series and DataFrames 38
2.6 Exporting and Importing Data 43
2.7 Conclusion 47

Chapter 3: Introduction to Plotting 49
3.1 Introduction 49
3.2 Matplotlib 51
3.3 Statistical Graphics Using matplotlib 56
3.4 Seaborn 61
3.5 Pandas Objects 83
3.6 Seaborn Themes and Styles 86
3.7 Conclusion 90

Part II: Data Manipulation 91

Chapter 4: Data Assembly 93
4.1 Introduction 93
4.2 Tidy Data 93
4.3 Concatenation 94
4.4 Merging Multiple Data Sets 102
4.5 Conclusion 107

Chapter 5: Missing Data 109
5.1 Introduction 109
5.2 What Is a NaN Value? 109
5.3 Where Do Missing Values Come From? 111
5.4 Working with Missing Data 116
5.5 Conclusion 121

Chapter 6: Tidy Data 123
6.1 Introduction 123
6.2 Columns Contain Values, Not Variables 124
6.3 Columns Contain Multiple Variables 128
6.4 Variables in Both Rows and Columns 133
6.5 Multiple Observational Units in a Table (Normalization) 134
6.6 Observational Units Across Multiple Tables 137
6.7 Conclusion 141

Part III: Data Munging 143

Chapter 7: Data Types 145
7.1 Introduction 145
7.2 Data Types 145
7.3 Converting Types 146
7.4 Categorical Data 152
7.5 Conclusion 153

Chapter 8: Strings and Text Data 155
8.1 Introduction 155
8.2 Strings 155
8.3 String Methods 158
8.4 More String Methods 160
8.5 String Formatting 161
8.6 Regular Expressions (RegEx) 164
8.7 The regex Library 170
8.8 Conclusion 170

Chapter 9: Apply 171
9.1 Introduction 171
9.2 Functions 171
9.3 Apply (Basics) 172
9.4 Apply (More Advanced) 177
9.5 Vectorized Functions 182
9.6 Lambda Functions 185
9.7 Conclusion 187

Chapter 10: Groupby Operations: Split–Apply–Combine 189
10.1 Introduction 189
10.2 Aggregate 190
10.3 Transform 197
10.4 Filter 201
10.5 The pandas.core.groupby.DataFrameGroupBy Object 202
10.6 Working with a MultiIndex 207
10.7 Conclusion 211

Chapter 11: The datetime Data Type 213
11.1 Introduction 213
11.2 Python’s datetime Object 213
11.3 Converting to datetime 214
11.4 Loading Data That Include Dates 217
11.5 Extracting Date Components 217
11.6 Date Calculations and Timedeltas 220
11.7 Datetime Methods 221
11.8 Getting Stock Data 224
11.9 Subsetting Data Based on Dates 225
11.10 Date Ranges 227
11.11 Shifting Values 230
11.12 Resampling 237
11.13 Time Zones 238
11.14 Conclusion 240

Part IV: Data Modeling 241

Chapter 12: Linear Models 243
12.1 Introduction 243
12.2 Simple Linear Regression 243
12.3 Multiple Regression 247
12.4 Keeping Index Labels From sklearn 251
12.5 Conclusion 252

Chapter 13: Generalized Linear Models 253
13.1 Introduction 253
13.2 Logistic Regression 253
13.3 Poisson Regression 257
13.4 More Generalized Linear Models 260
13.5 Survival Analysis 260
13.6 Conclusion 264

Chapter 14: Model Diagnostics 265
14.1 Introduction 265
14.2 Residuals 265
14.3 Comparing Multiple Models 270
14.4 k-Fold Cross-Validation 275
14.5 Conclusion 278

Chapter 15: Regularization 279
15.1 Introduction 279
15.2 Why Regularize? 279
15.3 LASSO Regression 281
15.4 Ridge Regression 283
15.5 Elastic Net 285
15.6 Cross-Validation 287
15.7 Conclusion 289

Chapter 16: Clustering 291
16.1 Introduction 291
16.2 k-Means 291
16.3 Hierarchical Clustering 297
16.4 Conclusion 301

Part V: Conclusion 303

Chapter 17: Life Outside of Pandas 305
17.1 The (Scientific) Computing Stack 305
17.2 Performance 306
17.3 Going Bigger and Faster 307

Chapter 18: Toward a Self-Directed Learner 309
18.1 It’s Dangerous to Go Alone! 309
18.2 Local Meetups 309
18.3 Conferences 309
18.4 The Internet 310
18.5 Podcasts 310
18.6 Conclusion 311

Part VI: Appendixes 313

Appendix A: Installation 315
A.1 Installing Anaconda 315
A.2 Uninstall Anaconda 316

Appendix B: Command Line 317
B.1 Installation 317
B.2 Basics 318

Appendix C: Project Templates 319

Appendix D: Using Python 321
D.1 Command Line and Text Editor 321
D.2 Python and IPython 322
D.3 Jupyter 322
D.4 Integrated Development Environments (IDEs) 322

Appendix E: Working Directories 325

Appendix F: Environments 327

Appendix G: Install Packages 329
G.1 Updating Packages 330

Appendix H: Importing Libraries 331

Appendix I: Lists 333

Appendix J: Tuples 335

Appendix K: Dictionaries 337

Appendix L: Slicing Values 339

Appendix M: Loops 341

Appendix N: Comprehensions 343

Appendix O: Functions 345
O.1 Default Parameters 347
O.2 Arbitrary Parameters 347

Appendix P: Ranges and Generators 349

Appendix Q: Multiple Assignment 351

Appendix R: numpy ndarray 353

Appendix S: Classes 355

Appendix T: Odo: The Shapeshifter 357

Index 359
