Data Science with Python, R and Pandas

Price: $1,299.00

This comprehensive program focuses on the fundamentals of acquiring, parsing, validating, and wrangling data with Python and its associated ecosystem of libraries. After an introduction to Data Science as a field and a primer on the Python programming language, learners will walk through the data science process by building a simple recommendation system.

After this introduction, learners dive deeper into each of the specific steps involved in the first half of the data science process, mainly how to acquire, transform, and store data (often referred to as an ETL pipeline). The content then covers how to download data that is openly accessible on the Internet by working with APIs and websites, and how to parse the resulting XML and JSON data. With this structured data, learners can build data models, store and query data, and work with relational databases. Along the way, learners will pick up the fundamentals of programming with Python (including object-oriented programming and the standard library) as well as best practices for building sustainable data science applications.
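As a rough illustration of the acquire, transform, and store steps described above, here is a minimal sketch using only the Python standard library. The JSON payload, the `listings` table, and the function names are hypothetical stand-ins chosen for this example; they are not taken from the course materials, and a real pipeline would typically fetch the payload from an API rather than from an inline string.

```python
import json
import sqlite3

# Hypothetical JSON payload standing in for an API response (assumption).
RAW = """
[
  {"id": 1, "name": "Loft",   "city": "Austin", "price": "120"},
  {"id": 2, "name": "Cabin",  "city": "Denver", "price": "95"},
  {"id": 3, "name": "Studio", "city": "Austin", "price": null}
]
"""


def extract(raw):
    """Parse the raw JSON text into a list of dictionaries."""
    return json.loads(raw)


def transform(records):
    """Apply a simple quality check and convert fields to typed tuples."""
    rows = []
    for record in records:
        # Skip records that fail validation (here: a missing price).
        if record.get("price") is None:
            continue
        rows.append(
            (int(record["id"]), record["name"], record["city"], float(record["price"]))
        )
    return rows


def load(rows, conn):
    """Store the cleaned rows in a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS listings "
        "(id INTEGER PRIMARY KEY, name TEXT, city TEXT, price REAL)"
    )
    conn.executemany("INSERT INTO listings VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def average_price_by_city(conn):
    """Aggregate the stored data with a SQL query."""
    cursor = conn.execute(
        "SELECT city, AVG(price) FROM listings GROUP BY city ORDER BY city"
    )
    return cursor.fetchall()


# Run the pipeline against an in-memory database.
conn = sqlite3.connect(":memory:")
load(transform(extract(RAW)), conn)
result = average_price_by_city(conn)
print(result)
```

Keeping extract, transform, and load as separate functions is one common way to make each step testable on its own; the program itself may organize these steps differently.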

Moving on, this program then addresses Data Science with Python and R, looking at the fundamentals of data preparation, data analysis, data visualization, machine learning, and interactive data science applications. Throughout this portion of the program, learners will receive instruction in how to build predictive models and how to create interactive visual applications for their line of business using the Anaconda platform. This course will introduce data scientists to using Python and R to build on an ecosystem of hundreds of high-performance open source tools.

The Pandas Data Analysis with Python Fundamentals part of the program then provides analysts and aspiring data scientists with a practical introduction to Python and pandas: the analytics stack that enables you to move from spreadsheet programs such as Excel to automated data analysis workflows. This comprehensive course starts by introducing Python and pandas and explaining why they are great tools for data analysis. The content then covers installing and starting Python. From there, the content covers the basics of working with datasets in Python and with pandas, followed by plotting and visualization, data assembly and manipulation, missing data, and tidy data.
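To give a sense of what the missing-data and tidy-data topics look like in practice, here is a small sketch assuming pandas is installed. The `wide` table and its column names are made up for illustration and do not come from the course.

```python
import pandas as pd

# Hypothetical spreadsheet-style table: one column per year (assumption).
wide = pd.DataFrame(
    {
        "region": ["North", "South"],
        "2021": [100.0, None],  # the South figure for 2021 is missing
        "2022": [110.0, 90.0],
    }
)

# Reshape to tidy form: one row per (region, year) observation.
tidy = wide.melt(id_vars="region", var_name="year", value_name="sales")

# Count the missing values, then drop the incomplete observations.
missing_count = int(tidy["sales"].isna().sum())
cleaned = tidy.dropna(subset=["sales"]).reset_index(drop=True)

print(tidy)
print("missing values:", missing_count)
print(cleaned)
```

Dropping rows is only one way to handle missing values; depending on the analysis, filling them (for example with `fillna`) may be more appropriate.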


Program Objectives
 
Upon completion of the course, the student will be able to:

  • Install and start Python
  • Load data sets into pandas and begin to assess and analyze them
  • Use Pandas data structures and import/export data
  • Combine multiple data sets
  • Deal with missing data
  • Tidy and reshape data
  • Get up and running with a Python data science environment
  • Understand the essentials of Python 3, including object-oriented programming
  • Employ the basics of the data science process and what each step entails
  • Build a simple (yet powerful) recommendation engine for Airbnb listings
  • Find high-quality data sources and scrape websites when no existing dataset is available
  • Work with APIs programmatically, including (but not limited to) the Foursquare API
  • Employ strategies for parsing JSON and XML into a structured form
  • Build data models and work with database schemas
  • Understand the basics of relational databases with SQLite and how to use an ORM to interface with them in Python
  • Employ best practices of data validation, including common data quality checks
  • Query data in a database, including joining data tables and aggregating data
  • Understand the fundamentals of exploratory data analysis
  • Find and handle missing or malformed data
  • Understand the importance of creating reproducible analyses and how to share them effectively
  • Use Anaconda and Jupyter notebooks
  • Understand Open Data Science concepts, roles, and workflows
  • Wrangle data with Pandas
  • Understand Anaconda Enterprise and collaboration workflows
  • Create interactive visualizations with Bokeh
  • Use Conda package management
  • Use R for data processing and visualization
  • Build statistical and predictive models
  • Use Excel and Python with Anaconda Fusion
  • Understand and use Mosaic for databases with distributed data
  • Understand distributed and parallel computing with Dask

Optional Volunteer Externship Opportunity

Students who complete this program are eligible to participate in an optional volunteer externship with a local company, agency, or organization whose work aligns with this area of study in order to gain valuable hands-on experience. As students progress through their eLearning program, an Externship Coordinator will reach out to coordinate placement.

Note:
Additional documentation, such as health records, immunization records, drug screening, and criminal background checks, may be required by the externship facility.