Python Pandas for Data Manipulation

Overview

This online day school aims to provide learners with a comprehensive understanding of data manipulation using the Pandas library.

The day starts by teaching you how to read data from, and save data to, different file formats, such as CSV files, Excel sheets, and JSON files. You will also learn how to clean up data by dealing with missing values and duplicate values, sorting on specific columns, and replacing specific values. In addition, you will gain an understanding of indexes, including multi-level (hierarchical) indexes, as well as multi-row headers.
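As a rough illustration of the kind of cleaning steps described above, the following sketch reads a small CSV, removes duplicates, fills a missing value, replaces a value and sorts the result. The sample data, column names and the choice of the column mean as a fill value are assumptions made for this example only; they are not taken from the course materials.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a CSV file on disk.
csv_text = """name,city,score
Ann,Oxford,82
Bob,LDN,
Ann,Oxford,82
Cara,London,91
"""

# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Drop rows that are exact duplicates of an earlier row.
deduped = df.drop_duplicates()

# Fill the missing score with the mean of the remaining scores
# (one possible strategy among many).
filled = deduped.fillna({"score": deduped["score"].mean()})

# Replace a specific value in one column, then sort by score.
clean = (
    filled.replace({"city": {"LDN": "London"}})
    .sort_values("score", ascending=False)
    .reset_index(drop=True)
)

# With no path given, to_csv and to_json return the output as a string.
csv_out = clean.to_csv(index=False)
json_out = clean.to_json(orient="records")

print(clean)
```

Passing a file name instead of `io.StringIO(...)` reads from disk, and passing a path to `to_csv` or `to_json` writes to disk in the same way.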

The day also covers different ways to select data based on row or column values, comparable to SQL SELECT statements with various filtering conditions. We will explore how to transpose, join, concatenate, merge and reshape tables, along with the key concepts and configuration options behind these operations. Additionally, you will learn how to create pivot tables and apply the GroupBy operator: what these concepts are, why they are useful, and how to apply them and obtain their results.
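To give a flavour of these operations, here is a minimal sketch of SQL-style selection, a merge, a GroupBy aggregation and a pivot table. The `orders` and `customers` tables and all their column names are hypothetical and exist only for this example.

```python
import pandas as pd

# Hypothetical tables used only for illustration.
orders = pd.DataFrame(
    {
        "order_id": [1, 2, 3, 4],
        "customer_id": [10, 11, 10, 12],
        "amount": [25.0, 40.0, 15.0, 60.0],
    }
)
customers = pd.DataFrame(
    {
        "customer_id": [10, 11, 12],
        "region": ["North", "South", "North"],
    }
)

# Roughly: SELECT order_id, amount FROM orders
#          WHERE amount > 20 AND customer_id <> 11
mask = (orders["amount"] > 20) & (orders["customer_id"] != 11)
selected = orders.loc[mask, ["order_id", "amount"]]

# Roughly: orders LEFT JOIN customers USING (customer_id)
joined = orders.merge(customers, on="customer_id", how="left")

# Roughly: SELECT region, SUM(amount) FROM joined GROUP BY region
totals = joined.groupby("region")["amount"].sum()

# Pivot table: one row per region, one column per customer.
pivot = joined.pivot_table(
    index="region",
    columns="customer_id",
    values="amount",
    aggfunc="sum",
    fill_value=0,
)

print(selected, totals, pivot, sep="\n\n")
```

The `how` argument of `merge` controls the join type (`"inner"`, `"left"`, `"right"` or `"outer"`), which corresponds to the different join types in SQL.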

Furthermore, you will learn how to summarise, bin and aggregate data by applying existing or user-defined functions. Finally, the day will cover how to generate basic plots and visualisations. By the end of the day, you will have gained the necessary skills to work with data using Pandas, a widely used library in the field of data science.
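Binning and aggregation with both built-in and user-defined functions might look something like the sketch below. The data, the bin edges, the band labels and the `value_range` helper are all assumptions made for this example.

```python
import pandas as pd

# Hypothetical data used only for illustration.
df = pd.DataFrame(
    {
        "age": [22, 35, 47, 58, 63, 29],
        "income": [21000, 38000, 52000, 61000, 45000, 30000],
    }
)

# Bin ages into labelled bands; right=False makes each bin
# include its left edge and exclude its right edge.
df["age_band"] = pd.cut(
    df["age"],
    bins=[0, 30, 50, 100],
    labels=["under 30", "30 to 49", "50 plus"],
    right=False,
)


def value_range(values: pd.Series) -> float:
    """User-defined aggregation: spread between largest and smallest value."""
    return values.max() - values.min()


# Mix built-in aggregations (named by string) with the user-defined one.
summary = df.groupby("age_band", observed=True)["income"].agg(
    ["mean", "max", value_range]
)

print(summary)
```

Calling `summary.plot(kind="bar")` would then produce a basic bar chart of these aggregates, provided Matplotlib is installed.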

To benefit from this day, you will need a basic knowledge of Python programming and familiarity with Python data types and data structures, such as dictionaries and lists.

Please note: this event will close to enrolments at 23:59 BST on 1 May 2024.

Programme details

All times BST (UTC+1)

10am:
Part 1: Introduction, Data Cleaning and Preprocessing.
●    Reading and saving data from different sources.
●    Dealing with missing values and duplicates.
●    Sorting data based on specific columns.
●    Replacing specific values in data.
●    Data transformation (e.g. scaling and normalisation).

11.20am:
Tea/coffee break

11.40am:
Part 2: Data Selection and Manipulation.
●    Understanding different types of index and hierarchical index.
●    Multiple ways of selecting data based on rows or columns.
●    Transposing, joining, concatenating, and reshaping tables.
●    Using apply(), transform() and filter() functions for data manipulation.

1pm:
Lunch break

2pm:
Part 3: Data Aggregation and Summarisation.
●    Creating pivot tables and applying the GroupBy operator.
●    Creating summaries, binning, and aggregations of data using built-in and user-defined functions.
●    Exploring different types of table join and merge operations.
●    Dealing with time-series data (i.e. date/time data).

3.20pm:
Tea/coffee break

3.35pm:
Part 4: Data Visualisation.
●    Generating basic plots and visualisations using Pandas and Matplotlib.
●    Understanding different types of charts and their applications.
●    Creating custom visualisations using Seaborn and Plotly (if time permits).
●    Exporting plots and visualisations in different file formats.

4.55pm:
Course conclusion and wrap up

5pm:
Course disperses

Fees

Description Costs
Course fee £115.00

Funding

If you are in receipt of a UK state benefit or are a full-time student in the UK you may be eligible for a reduction of 50% of tuition fees.

Concessionary fees for short courses

Tutor

Dr Noureddin Sadawi

Dr Noureddin Sadawi specialises in machine/deep learning and data science. He has several years’ experience in various areas involving data manipulation and analysis. He received his PhD from the University of Birmingham. He is the winner of two international scientific software development contests, at TREC2011 and CLEF2012.

Noureddin is an avid scientific software researcher and an experienced developer and data analyst, with a passion for learning and teaching new technologies. Over the last few years, he has been using R and Python as his preferred programming languages.

He has also been involved in several projects spanning a variety of fields, such as bioinformatics, textual/image/video data analysis, drug discovery, omics data analysis and computer network security. He has taught at multiple universities in the UK and has worked as a software engineer in different roles. He currently holds the following part-time roles: senior content developer and lecturer at the University of London; international trainer with O'Reilly and Pearson; short course trainer and instructor at Goldsmiths, University of London; and lecturer at the University of Oxford. He is the founder of SoftLight LTD, a London-based company specialising in data science and machine/deep learning, where he works as a consultant providing advice and expertise in these areas. He is also a member of the organising committee of the international conference at https://ilcict.ly/. A list of his publications can be found here.

Application

Please use the 'Book' or 'Apply' button on this page. Alternatively, please contact us to obtain an application form.

IT requirements

The University of Oxford uses Microsoft Teams for our learning environment, where students and tutors will discuss and interact in real time. Joining instructions will be sent out prior to the start date. We recommend that you join the session at least 10-15 minutes prior to the start time – just as you might arrive a bit early at our lecture theatre for an in-person event.

If you have not used the Microsoft Teams app before, once you click the joining link you will be invited to download it (this is free). Once you have downloaded the app, please test it before the start of your course. If you are using a laptop or desktop computer, you will also be offered the option of connecting using a web browser. If you connect via a web browser, Chrome is recommended.

Please note that this course will not be recorded.