Python Programming for Data Science: Intermediate

Overview

Data science is a discipline that uses scientific methods, processes and algorithms to extract meaningful information, knowledge and insights from structured and unstructured data.

The aim of this course is to provide insight into intermediate and advanced data science topics, using the Python programming language. The course will explore concepts such as machine learning, deep learning and natural language processing from a practical, hands-on point of view. The focus will be on tools and methods rather than on the theoretical basis, so that the course is accessible to an audience with a minimal mathematical background.

Experience of using a programming or scripting language is a must. Students should have mastered all the concepts explored in the course Python Programming for Data Science - Introduction.

In order to complete the assignment (and in order to get the full benefit from the course) students will need access to a computer capable of running the open-source software used in the course and access to the Internet. A limited amount of class time will be allocated to working on the class assignment, so students should ensure that they have access to a computer outside of class.

The course will rely on Jupyter Notebooks for interactive Python programming as they are widely used in Data Science.

Programme details

Course begins: 21 Jan 2024

Week 1: Introduction to the course. Basic overview of Machine Learning. Linear Regression example.

Week 2: Overview of a data-science pre-processing pipeline. Exploratory Data Analysis.

Week 3: Data cleaning and preparation.

Week 4: Supervised Learning: regression.

Week 5: Supervised Learning: classification. 

Week 6: Decision Trees. Ensemble Methods. Hyperparameter Tuning.

Week 7: Dimensionality reduction and Unsupervised Learning. 

Week 8: The Perceptron. Back-propagation. Fully-connected neural networks. 

Week 9: Deep Learning: fundamental concepts. Transformers and attention.

Week 10: Deep Learning: other architectures (GANs and autoencoders).

 

The following Python libraries will be used during the course:

  • scikit-learn (weeks 2-7)
  • PyTorch (weeks 8-10)
  • NumPy, pandas, matplotlib, seaborn (throughout the course)
  • HuggingFace Transformers (week 9)
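To give a flavour of the Week 1 linear regression example, here is a minimal sketch using scikit-learn and NumPy. It is not taken from the course material: the synthetic data (a straight line with slope 3 and intercept 2, plus noise), the variable names and the 75/25 train/test split are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): y = 3x + 2 plus Gaussian noise.
rng = np.random.default_rng(seed=0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 1.0, size=200)

# Hold out a quarter of the rows so the model is evaluated on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit an ordinary least-squares line to the training rows.
model = LinearRegression()
model.fit(X_train, y_train)

slope = model.coef_[0]
intercept = model.intercept_
r2_test = model.score(X_test, y_test)  # coefficient of determination on the test set

print(f"slope={slope:.2f}, intercept={intercept:.2f}, test R^2={r2_test:.3f}")
```

The fitted slope and intercept should land close to the values used to generate the data, and the test-set score indicates how well the line generalises to rows it was not trained on.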

Certification

Credit Accumulation and Transfer Scheme (CATS) points

To earn credit (CATS points) for your course you will need to register and pay an additional £30 fee for each course you enrol on. You can do this by ticking the relevant box at the bottom of the enrolment form or when enrolling online. If you do not register when you enrol, you have up until the course start date to register and pay the £30 fee. 

See more information on CATS points

Coursework is an integral part of all online courses and everyone enrolled will be expected to do coursework, but only those who have registered for credit will be awarded CATS points for completing work at the required standard. If you are enrolled on the Certificate of Higher Education, you need to indicate this on the enrolment form but there is no additional registration fee. 

 

Digital credentials

All students who pass their final assignment, whether registered for credit or not, will be eligible for a digital Certificate of Completion. Upon successful completion, you will receive a link to download a University of Oxford digital certificate. Information on how to access this digital certificate will be emailed to you after the end of the course. The certificate will show your name, the course title and the dates of the course you attended. You will be able to download your certificate or share it on social media if you choose to do so. 

Please note that assignments are not graded but are marked either pass or fail. 

Fees

Description: Cost
Course fee: £310.00
Take this course for CATS points: £30.00

Funding

If you are in receipt of a UK state benefit, are a full-time student in the UK or are a student on a low income, you may be eligible for a 50% reduction in tuition fees. Please see the link below for full details:

Concessionary fees for short courses

Tutor

Dr Nick Day

Dr Nicholas Day teaches computer programming in C#, C++, Java and Python at both Buckinghamshire New University and Oxford University. Nicholas started his career as an Associate Lecturer at Buckinghamshire New University in late 2014, became a Graduate Teaching Associate in February 2020 and has been a Lecturer at the same institution since August 2021. He completed the PGCert in Teaching and Learning in 2015 and also acquired fellowship of AdvanceHE (previously the Higher Education Academy). Between 2016 and 2019, Nicholas assisted Dr Vasos Pavlika with the delivery of introductory programming courses in C++ and Java for the Department for Continuing Education at Oxford University. He was empanelled as a Department Tutor in 2019 and started delivering an Introduction to Object-Oriented Programming Using Java, later adapting the course for online delivery in 2022. He has also started researching and teaching Artificial Intelligence and Data Science material.

Nick’s scholarly interests are Computer Science Education (CSEd), Computing Education Research (CER), and online pedagogy. In March 2020 he completed his PhD, which investigated the learning and teaching of computer science, specialising in the delivery of computer programming modules. Since completing his PhD, Nicholas has been involved with Knowledge Transfer Partnership (KTP) applications and is in discussion with data-driven companies regarding research projects and consultancy work. Nicholas also supervises current PhD students in fields associated with Data Science and Virtual Reality, in addition to mentoring departmental colleagues who are undertaking PhD research. During the COVID-19 pandemic, Nicholas began teaching online and recording videos to increase access to, and engagement with, educational material. He is passionate about pedagogy and uses his research findings to inform curriculum design.

Course aims

  • Explore the landscape of contemporary machine learning (ML) and deep learning.
  • Learn how to use a variety of machine-learning algorithms to extract features from the data using Python libraries.
  • Become familiar with the concepts of overfitting and regularisation in ML.
  • Gain insight into how to handle scaling issues in a 'big data' scenario.

Teaching methods

Each week's session will consist of pre-recorded lectures, hands-on programming exercises, class discussions and interactive programming demonstrations by the lecturer.

Learning outcomes

At the end of the course, students will be able to:

  • choose the right ML task and evaluation metric for a given ML problem and select a set of ML models to be trained;
  • set up a data pre-processing pipeline for data science and machine learning algorithms;
  • use Python machine learning tools (namely scikit-learn, TensorFlow and PyTorch) to build up ML models, train and evaluate them on a test set;
  • evaluate whether a model overfits or underfits the data and act accordingly (e.g. appropriately regularise an overfitting model);
  • identify the most appropriate and best-performing model for a given task and tune its hyperparameters (parameters that cannot be learned by the model) appropriately.
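The outcomes above can be illustrated with a short scikit-learn sketch that chains a pre-processing step and a classifier, tunes one hyperparameter by cross-validation and compares training and test accuracy. This is only an assumed example, not the course's own material: the synthetic dataset, the choice of logistic regression and the grid of values for the regularisation strength C are all illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data (an assumption for illustration).
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Pre-processing and model in one pipeline, so that scaling statistics are
# learned from the training folds only during cross-validation.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# C is a hyperparameter: smaller values mean stronger regularisation.
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# The best pipeline is refitted on the whole training set by default.
train_acc = search.score(X_train, y_train)
test_acc = search.score(X_test, y_test)

print("best C:", search.best_params_["clf__C"])
print(f"train accuracy={train_acc:.3f}, test accuracy={test_acc:.3f}")
```

A training accuracy far above the test accuracy would suggest overfitting, in which case a smaller C (stronger regularisation) is one option to try; low accuracy on both would suggest underfitting.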

Assessment methods

Students will be asked to submit a portfolio of exercises for their coursework assignment. The first exercise will be set mid-way through the course, and the second will be due after the completion of the course.

In order to complete the assignment (and in order to get the full benefit from the course) students will need access to a computer capable of running the open source software used in the course and access to the Internet. Only a limited amount of class time will be allocated to working on the assignment, so students should ensure that they have access to a computer outside of class.

Coursework is an integral part of all weekly classes and everyone enrolled will be expected to do coursework in order to benefit fully from the course. Only those who have registered for credit will be awarded CATS points for completing work at the required standard.

Students must submit a completed Declaration of Authorship form at the end of term when submitting their final piece of work. CATS points cannot be awarded without this form.

Application

Experience of using a programming or scripting language is a must. Students should have mastered all the concepts explored in the course Python Programming for Data Science - Introduction prior to enrolling on Intermediate. If you have not participated in Python Programming for Data Science - Introduction, you will need to provide details of your previous Python programming experience. We may need to come back to you seeking further information.

To enrol, please download a PDF or Word version of the following document 

Enrolment form (editable PDF)

Enrolment form (Word)

Once completed, please email the enrolment form to weeklyclasses@conted.ox.ac.uk. We will then arrange your enrolment and send you an invoice for payment.

Enrolment will close 14 days prior to the start date to allow us to complete the course set-up. We will email you at that time (14 days before the course begins) with further information and joining instructions. Please check your spam and junk folders during this period to ensure that these emails are received.

To earn credit (CATS points) for your course you will need to register and pay an additional £30 fee per course. You can do this by ticking the relevant box at the bottom of the enrolment form or when enrolling online.

Level and demands

The Department's Weekly Classes are taught at FHEQ Level 4, i.e. first year undergraduate level, and you will be expected to engage in a significant amount of private study in preparation for the classes. This may take the form, for instance, of reading and analysing set texts, responding to questions or tasks, or preparing work to present in class.

Credit Accumulation and Transfer Scheme (CATS)

To earn credit (CATS points) you will need to register and pay an additional £30 fee per course. You can do this by ticking the relevant box at the bottom of the enrolment form or when enrolling online. Students who register for CATS points will receive a Record of CATS points on successful completion of their course assessment.

Students who do not register for CATS points during the enrolment process can either register for CATS points prior to the start of their course or retrospectively from 1 January after the current full academic year has been completed. If you are enrolled on the Certificate of Higher Education, you need to indicate this on the enrolment form but there is no additional registration fee.

Selection criteria

Before attending this course, prospective students will know:

  • All the requirements and topics covered in the "Python Programming for Data Science - Introduction" course, i.e.:
  • The fundamentals of linear algebra: what a matrix is and how matrix addition and multiplication are performed.
  • The following fundamental concepts of statistics: mean, median, variance, standard deviation and interquartile range.
  • The fundamentals of algebra: real and complex numbers, exponentials and logarithms, and trigonometric functions.
  • How to perform fundamental Python operations such as variable creation, numerical operations on scalars, vectors and matrices, iteration through a collection, and manipulation of elements in a collection.
  • How to use NumPy and pandas to import a dataset and extract important statistics from it using techniques such as split-apply-combine (for example, finding the mean, median or max of a quantitative variable for each category in a categorical variable) 
  • Given a dataset, how to select the appropriate visualisation graph depending on the information to be conveyed, and use the matplotlib library to draw it and add title, captions and figure legends.
  • How to create and add state and behaviour to a class in Python
  • How to use nltk to preprocess a text and convert it to a numerical representation that can be manipulated by information retrieval algorithms.
  • What a derivative and a gradient are, at least conceptually or visually.
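As a rough self-check against two of the prerequisites above, prospective students may find a short sketch like the following useful. It is an assumed example rather than an official entry test: the tiny dataset, the column names and the helper function central_difference are hypothetical. It shows split-apply-combine with pandas and a numerical approximation of a derivative.

```python
import pandas as pd

# A tiny made-up dataset: one categorical and one quantitative variable.
df = pd.DataFrame({
    "species": ["a", "a", "b", "b", "b"],
    "mass_g": [10.0, 14.0, 20.0, 22.0, 30.0],
})

# Split-apply-combine: split rows by category, apply summary statistics to
# each group, and combine the results into one table indexed by category.
summary = df.groupby("species")["mass_g"].agg(["mean", "median", "max"])
print(summary)


def central_difference(f, x, h=1e-5):
    """Approximate the derivative of f at x by the slope between two nearby points."""
    return (f(x + h) - f(x - h)) / (2 * h)


# The derivative of x squared is 2x, so the slope at x = 3 should be close to 6.
slope_at_3 = central_difference(lambda x: x ** 2, 3.0)
print(f"approximate derivative of x^2 at x=3: {slope_at_3:.4f}")
```

If both the groupby call and the idea of a derivative as a local slope feel familiar, the corresponding prerequisites are probably met.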