Python Programming for Data Science: Introduction

Overview

Data science is a discipline that uses scientific methods, processes and algorithms to extract meaningful information, knowledge and insights from structured and unstructured data.

The aim of this course is to provide an introduction to programming for data science using the Python programming language. The course introduces the basics of the data science process: collecting data, pre-processing it (cleaning and correcting it), performing exploratory data analyses, visualising data, and sharing analysis results.

In order to complete the assignment (and in order to get the full benefit from the course), students will need access to a computer capable of running the open-source software used in the course and access to the Internet. A limited amount of class time will be allocated to working on the class assignment, so students should ensure that they have access to a computer outside of class.

The course will rely on Jupyter Notebooks for interactive Python programming, as they are widely used in data science.

Before attending this course, prospective students should know:

  • the fundamentals of linear algebra: what a matrix is and how matrix addition and multiplication are performed;
  • the following fundamental concepts of statistics: mean, median, variance, standard deviation and interquartile range;
  • the fundamentals of algebra: real and complex numbers, exponentials and logarithms, and trigonometric functions.
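As an informal self-check on the statistics prerequisites, the short sketch below computes each of the listed quantities with Python's standard `statistics` module. The sample values are hypothetical and chosen only for illustration. Population (rather than sample) variance is assumed here, and the interquartile range depends on the quartile convention used.

```python
import statistics

# Hypothetical sample chosen only for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(values)           # arithmetic mean: 5
median = statistics.median(values)       # average of the two middle values: 4.5
variance = statistics.pvariance(values)  # population variance: 4
std_dev = statistics.pstdev(values)      # population standard deviation: 2.0

# quantiles(n=4) returns the three quartile cut points (Q1, Q2, Q3).
# The exact values depend on the interpolation method (default: "exclusive").
q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1                            # interquartile range

print(mean, median, variance, std_dev, iqr)
```

Students who can predict what each line computes before running it are likely comfortable with the statistical background assumed by the course.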

Programme details

Course starts: 1 Oct 2024

Week 1: Introduction to Data Science. Course set-up. Intro to Python: numbers

Week 2: Python basics: built-in types, functions and methods, if statement

Week 3: Python data structures: lists and tuples; for loops

Week 4: Python data structures: dictionaries and sets; standard library

Week 5: NumPy and the SciPy ecosystem

Week 6: Pandas for data science I 

Week 7: Pandas for data science II

Week 8: Data visualisation: matplotlib and seaborn

Week 9: Object-oriented programming: classes, inheritance, and applications 

Week 10: Data gathering and cleaning. Text pre-processing for Natural Language Processing (NLP)

Digital Certification

To complete the course and receive a certificate, you will be required to attend and participate in at least 80% of the live sessions on the course and pass your final assignment. Upon successful completion, you will receive a link to download a University of Oxford digital certificate. Information on how to access this digital certificate will be emailed to you after the end of the course. The certificate will show your name, the course title and the dates of the course you attended. You will be able to download your certificate or share it on social media if you choose to do so.

Fees

Description                         Costs
Course Fee                          £310.00
Take this course for CATS points    £30.00

Funding

If you are in receipt of a UK state benefit, are a full-time student in the UK, or are a student on a low income, you may be eligible for a 50% reduction in tuition fees. Please see the link below for full details:

Concessionary fees for short courses

Tutor

Mr Cristian Soitu

Cristian Soitu is a Postdoctoral Research Fellow at the University of Oxford, supported by a Human Frontier Science Program (HFSP) Fellowship. Prior to this role, he was a postdoctoral fellow at Cold Spring Harbor Laboratory, where he specialized in integrating data from multiple modalities, with a particular focus on image processing and transcriptomics, to investigate neuronal information flow.

Cristian's technical expertise includes extensive experience with Python, having developed computational workflows and human-in-the-loop interactive graphical user interfaces (GUIs) for advanced data processing. He is passionate about creating computational tools to enhance life sciences research, contributing to the advancement of molecular techniques and the analysis of complex biological data.

In addition to his research, Cristian has taught subjects such as mathematics, fluid mechanics, and next-generation sequencing, and has also supervised interns and research staff.

Course aims

1. To learn the basic aspects of Python programming for data science.

2. To gain an appreciation of the end-to-end process, from obtaining data and processing it through to presenting results.

Course objectives

  • To be able to build a simple data processing pipeline by the end of the course.

Teaching methods

Each week's session will consist of lectures, hands-on programming exercises, class discussions and interactive programming demonstrations by the lecturer.

Learning outcomes

At the end of the course, the student will be able to write procedural code using the Python language and tools to:

  • import data from local and/or remote sources and preprocess it;
  • extract significant information from the gathered data; and
  • visualise the relevant features extracted from the data.
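The import, pre-process and extract steps listed above can be sketched with the standard library alone. This is only an illustrative outline, not the course's own material: the inline CSV text, the column name `temperature_c` and the helper function names are all hypothetical, and the visualisation step is omitted to keep the example free of third-party dependencies.

```python
import csv
import io
import statistics

# Hypothetical inline data standing in for a local or remote CSV source.
RAW_CSV = """city,temperature_c
Oxford,14.5
Oxford,
Leeds,11.0
Leeds,12.0
Oxford,15.5
"""


def load_rows(text):
    """Import: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_rows(rows):
    """Pre-process: drop rows with a missing reading, convert the rest to float."""
    cleaned = []
    for row in rows:
        reading = row["temperature_c"].strip()
        if reading:
            cleaned.append(float(reading))
    return cleaned


def summarise(readings):
    """Extract: compute a few summary statistics from the cleaned readings."""
    return {
        "count": len(readings),
        "mean": statistics.mean(readings),
        "max": max(readings),
    }


summary = summarise(clean_rows(load_rows(RAW_CSV)))
print(summary)
```

Keeping each stage in its own function makes it straightforward to swap one stage out later, for example replacing `load_rows` with a reader for a remote source.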

After attending this course, students will know:

  • how to perform fundamental Python operations such as variable creation, numerical operations on scalars, vectors and matrices, iteration through a collection, and manipulation of elements in a collection;
  • how to use NumPy and pandas to import a dataset and extract important statistics from it using techniques such as split-apply-combine (for example, finding the mean, median or max of a quantitative variable for each category in a categorical variable);
  • given a dataset, how to select the appropriate visualisation depending on the information to be conveyed, and how to use the matplotlib and seaborn libraries to draw it and add a title, captions and legends;
  • how to create a class in Python and add state and behaviour to it;
  • how to use nltk or spaCy to pre-process a text and convert it to a numerical representation that can be manipulated by information retrieval algorithms (e.g. for sentiment analysis, semantic search or machine learning algorithms).
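To make the split-apply-combine outcome above more concrete, here is a minimal sketch using pandas. The dataset, including the column names `species` and `weight_kg`, is hypothetical and invented for illustration. The example assumes pandas is installed.

```python
import pandas as pd

# Hypothetical dataset: one categorical column and one quantitative column.
df = pd.DataFrame(
    {
        "species": ["cat", "dog", "cat", "dog", "cat"],
        "weight_kg": [4.0, 20.0, 5.0, 30.0, 6.0],
    }
)

# Split the rows by category, apply three aggregations to each group,
# and combine the results into a single table indexed by category.
summary = df.groupby("species")["weight_kg"].agg(["mean", "median", "max"])
print(summary)
```

The resulting table has one row per category and one column per statistic, which is the shape typically needed for a follow-up bar chart or report.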

Assessment methods

Students will be asked to submit a portfolio of three exercises for their coursework assignment. I will set the first exercise in week 5 for early submission, and the second and third in weeks 8 and 9.

In order to complete the assignment (and in order to get the full benefit from the course), students will need access to a computer capable of running the open source software used in the course and access to the Internet. Only a limited amount of class time will be allocated to working on the assignment, so students should ensure that they have access to a computer outside of class.

Coursework is an integral part of all weekly classes and everyone enrolled will be expected to do coursework in order to benefit fully from the course. Only those who have registered for credit will be awarded CATS points for completing work to the required standard.

Students must submit a completed Declaration of Authorship form at the end of term when submitting their final piece of work. CATS points cannot be awarded without the Declaration of Authorship form.

Application

To earn credit (CATS points) for your course you will need to register and pay an additional £30 fee per course. You can do this by ticking the relevant box at the bottom of the enrolment form or when enrolling online.

Please use the 'Book' or 'Apply' button on this page. Alternatively, please complete an enrolment form (Word) or enrolment form (PDF).

Level and demands

Experience in using a programming or scripting language is beneficial. The basic elements of programming in Python will be introduced throughout the course. However, students should be aware that this course requires a certain amount of homework (2–3 hours per week) to become familiar with the concepts explained in class. This is especially true for students who are not familiar with programming. This is a course on data science, so I will discuss some mathematical concepts, although I will try to keep these to a minimum. Expect some exposure to (1) linear algebra (e.g. matrix operations), (2) statistics, and (3) calculus.

The Department's Weekly Classes are taught at FHEQ Level 4, i.e. first year undergraduate level, and you will be expected to engage in a significant amount of private study in preparation for the classes. This may take the form, for instance, of reading and analysing set texts, responding to questions or tasks, or preparing work to present in class.

Credit Accumulation and Transfer Scheme (CATS)

To earn credit (CATS points) you will need to register and pay an additional £30 fee per course. You can do this by ticking the relevant box at the bottom of the enrolment form or when enrolling online. Students who register for CATS points will receive a Record of CATS points on successful completion of their course assessment.

Students who do not register for CATS points during the enrolment process can either register for CATS points prior to the start of their course or retrospectively from 1 January after the current full academic year has been completed. If you are enrolled on the Certificate of Higher Education, you need to indicate this on the enrolment form, but there is no additional registration fee.