Python for Data Science edX

Learn to use powerful, open-source Python tools, including Pandas, Git, and Matplotlib, to manipulate, analyze, and visualize complex datasets.

Data Analytics Overview

Data Visualization · Data Wrangling, Data Exploration, and Model Selection · Exploratory Data Analysis (EDA) · Hypothesis Building and Testing · Introduction to Data Visualization · Plotting Processes in Data Science

Data Manipulation with Python (Pandas)

Data Operations · Data Standardization · Data Structures · DataFrame · Introduction to Pandas · Missing Values · Pandas File Read and Write Support · Series · SQL Operation

Data Science Overview

Data Science · Data Scientists · Examples of Data Science · Python for Data Science

Data Science with Python: Web Scraping

Common Data/Page Formats on the Web · Importance of Objects · Navigating Options · Searching the Tree · The Parser · Understanding the Tree · Web Scraping

Machine Learning with Python (Scikit-Learn)

How Supervised and Unsupervised Learning Models Work · Introduction to Machine Learning · K Nearest Neighbors (K-NN) Model · Machine Learning Approach · Model Evaluation: Metric Functions · Model Persistence · Pipeline · Scikit-Learn · Supervised Learning Models: Linear Regression · Supervised Learning Models: Logistic Regression · Unsupervised Learning Models: Clustering

Mathematical Computing with Python (NumPy)

Accessing Array Elements: Indexing, Slicing, Iteration, Indexing with Boolean Arrays · Basic Operations: Concept and Examples · Broadcasting · Class and Attributes of ndarray Object · Copy and Views · Linear Algebra · NumPy Overview · Properties, Purpose, and Types of ndarray · Shape Manipulation · Universal Functions (ufunc)

Natural Language Processing with Scikit-Learn

Bag of Words · Extraction Considerations · Major NLP Libraries · NLP Applications · NLP Approach for Text Data · NLP Environment Setup · NLP Overview · NLP Sentence Analysis · Pipeline · Scikit-Learn Approach: Built-in Modules · Scikit-Learn Approach: Feature Extraction · Scikit-Learn Approach: Model Training · Scikit-Learn Grid Search and Multiple Parameters · Scikit-Learn Approach

Python Integration with Hadoop, MapReduce, and Spark

Apache Spark · Big Data Hadoop Architecture · Cloudera QuickStart VM Set Up · MapReduce · Need for Integrating Python with Hadoop · PySpark · PySpark Integration with Jupyter Notebook · Resilient Distributed Datasets (RDD) · Spark Tools

Python: Environment Setup and Essentials

Basic Data Types: Integer, Float, String, None, and Boolean; Typecasting · Basic Operators: 'in', '+', '*' · Creating and Using Operations on Sets · Creating, Accessing, and Slicing Lists · Creating, Accessing, and Slicing Tuples · Creating, Viewing, Accessing, and Modifying Dicts · Functions and Control Flow · Installation of Anaconda Python Distribution for Windows, Mac OS, and Linux · Introduction to Anaconda · Jupyter Notebook Installation · Jupyter Notebook Introduction · Variable Assignment

Scientific Computing with Python (SciPy)

Linear Algebra · SciPy and Its Characteristics · SciPy sub-packages · SciPy sub-packages: Optimize · SciPy sub-packages: Statistics · SciPy sub-packages: Weave · SciPy sub-packages: Integration

Statistical Analysis and Business Applications

Bell Curve · Chi-Square Test · Correlation Matrix · Histogram · Hypothesis Testing · Inferential Statistics · Introduction to Statistics · Some Common Terms Used in Statistics · Statistical and Non-Statistical Analysis