Bioinformatics with Linux and Python (BILP01)
11 May 2020 - 15 May 2020
£275.00 - £550.00
A fundamental part of bioinformatics (in contrast to simply computational biology) is the idea of scaling and automation. We want to arrange our tools into pipelines which can be executed with minimal supervision. Reliable automation of this type is key to many of the things that we want from our analyses, chiefly the ability to reproduce our results and to extend them to other datasets.
In this course we will examine two different systems for automating bioinformatic analyses. For situations where we are mostly running existing command line tools, bash scripting will allow us to build pipelines with minimal overhead. We’ll start with simple command lines and see how the Linux environment – though not designed with biology in mind – is well suited to the type of automation we need.
For situations where we don’t have an existing tool available, and hence need to implement our own logic, bash quickly becomes unwieldy – it’s theoretically possible to write complex programs in bash, but the experience is painful! It’s much better to use a more modern programming language, and for most biological tasks Python fits the bill.
This course is aimed at complete beginners – no previous Linux or Python experience is required. It will be helpful if you have a basic knowledge of molecular biology so that you will be able to follow the examples – i.e. you should know what DNA and protein sequences look like, what an intron is, etc. The course is most likely to be useful if you have some idea of the type of analyses that you will need to automate. If you have any questions about whether this course is likely to be suitable/useful, drop an email to the course tutor email@example.com and we will figure it out.
Venue – PR informatics head office, 53 Morrison Street, Glasgow, G5 8LB
Availability – 15 places
Duration – 5 days
Contact hours – Approx. 35 hours
ECTS – Equal to 3 ECTS credits
Language – English
To book ‘COURSE ONLY’ with the option to add the additional ‘ACCOMMODATION PACKAGE’ please scroll to the bottom of this page.
Other payment options are available; please email firstname.lastname@example.org
PLEASE READ – CANCELLATION POLICY: Cancellations are accepted up to 28 days before the course start date subject to a 25% cancellation fee. Cancellations later than this may be considered; contact email@example.com. Failure to attend will result in the full cost of the course being charged. In the unfortunate event that a course is cancelled due to unforeseen circumstances, a full refund of the course fees (and accommodation fees if booked through PR informatics) will be credited. However, PR informatics will not be held responsible/liable for any travel fees, accommodation costs or other expenses incurred by you as a result of the cancellation. Because of this, PR informatics strongly recommends that any travel and accommodation booked by you or your institute is refundable/flexible, and that you delay booking travel and accommodation until as close to the course start date as is economically viable.
Teaching sessions will be a mixture of live demonstrations, follow-along instructions, and from-scratch programming exercises. The course has a very practical emphasis, so the goal will be to spend lots of time working on problems with assistance from the tutor.
Assumed quantitative knowledge
Very basic math – I don’t think we have anything more complicated than addition/subtraction and powers.
Assumed computer background
No previous Linux or programming experience is required; the course is suitable for complete beginners, as long as they are comfortable using a keyboard. Experience using a command line would be helpful, but is not necessary. Students with some previous experience using Linux or Python might find the initial sessions familiar, but hopefully they will be a useful reminder.
Equipment and software requirements
Students will work on their own laptops, connecting to a teaching server for the Linux portion of the course. Windows users will need to install PuTTY in order to connect.
For the Python sessions, all students should install the Anaconda distribution:
IF UNSURE ABOUT SUITABILITY THEN PLEASE ASK firstname.lastname@example.org
Sunday 10th – Meet at 43 Cook Street, Glasgow G5 8JN between 17:00 and 21:00.
Monday 11th – Classes from 09:30 to 17:30
Session 1 – connecting to the server and basic Linux commands
In the first session we briefly cover the design of Linux: how is it different from Windows/OSX and how is it best used? We’ll then jump straight onto the command line and learn about the layout of the Linux filesystem and how to navigate it. We’ll describe Linux’s file permission system (which often trips up beginners), how paths work, and how we actually run programs on the command line. We’ll learn a few tricks for using the command line more efficiently, and how to deal with programs that are misbehaving. We’ll finish this session by looking at the built-in help system and how to read and interpret manual pages.
Session 2 – assembling Linux commands into pipelines
Many data types we want to work with in bioinformatics are stored as tabular plain text files, and here we learn all about manipulating tabular data on the command line. We’ll start with simple things like extracting columns, filtering, sorting, and searching for text, before moving on to more complex tasks like searching for duplicated values, summarizing large files, and combining simple tools into long commands. Aliases, shell redirection, pipes, and shell scripting will all be introduced here.
Tuesday 12th – Classes from 09:30 to 17:30
Session 3 – introduction to bash scripting and variables
In this session we will introduce the idea of a script – a text file that combines commands to be run as a batch. We will get to grips with the basic idea by converting some of the complex command lines that we composed in the previous session into scripts. This gives us an opportunity to discuss the pros and cons of scripting. An important idea introduced in this session is that of a variable – a bit of information that can be passed into scripts. Sometimes variables can be files, or lists of files, which allows us to build our own custom command line tools.
Session 4 – biological pipelines and data formats
In this session we will apply the approaches that we learned in the previous three sessions to biology-specific tools, looking at Eutils for sequence retrieval and EMBOSS for biological data file manipulation. This will require a discussion of file formats, focusing on FASTA and GenBank.
Wednesday 13th – Classes from 09:30 to 17:30
Session 5 – introduction to Python, text and files
In this session students learn to write very simple programs that produce output to the terminal, and in doing so become comfortable with editing and running Python code. This session also introduces many of the technical terms that we’ll rely on in future sessions. I run through some examples of tools for working with text and show how they work in the context of biological sequence manipulation. We also cover different types of errors and error messages, and learn how to go about fixing them methodically. We’ll finish by looking at how to get data in and out of our programs using files.
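As a rough illustration of the kind of program this session builds towards, here is a minimal sketch of text manipulation and file handling applied to a sequence. The sequence and the file name `sequence.txt` are made up for this example and are not course materials.

```python
import os
import tempfile

# A made-up DNA sequence, used purely for illustration.
dna = "ATGCGATACGCTTGA"

# Count bases with string methods and work out the AT content.
at_content = (dna.count("A") + dna.count("T")) / len(dna)

# Build the reverse complement: swap each base, then reverse the string.
complement_table = str.maketrans("ACGT", "TGCA")
reverse_complement = dna.translate(complement_table)[::-1]

print("Length:", len(dna))
print("AT content:", round(at_content, 2))
print("Reverse complement:", reverse_complement)

# Write the sequence to a file in a temporary folder, then read it back.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sequence.txt")  # hypothetical file name
    with open(path, "w") as output_file:
        output_file.write(dna + "\n")
    with open(path) as input_file:
        read_back = input_file.read().rstrip("\n")

print("Read back from file:", read_back)
```

Writing to a temporary folder keeps the example from leaving files behind; in real analyses the path would point at your own data.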
Session 6 – lists and loops in Python
A discussion of the limitations of the techniques learned in session 5 quickly reveals that flow control is required to write more sophisticated file-processing programs, and I introduce the concept of loops. We look at the way in which Python loops work, and how they can be used in a variety of contexts. We explore the use of loops and lists together to tackle some more difficult problems.
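To give a flavour of loops and lists working together, the sketch below processes a few tab-separated records one at a time. The gene names and sequences are invented for illustration only.

```python
# Made-up records in "name<TAB>sequence" form, as they might appear in a file.
records = [
    "gene_a\tATGCCGTA",
    "gene_b\tGGCTA",
    "gene_c\tATATATATGC",
]

names = []
lengths = []

# The loop body runs once for each item in the list.
for record in records:
    name, sequence = record.split("\t")
    names.append(name)
    lengths.append(len(sequence))

# Loop over a range of positions to report the two lists side by side.
for i in range(len(names)):
    print(names[i], lengths[i])
```

The same loop works unchanged whether the list holds three records or three million, which is the point of automating the task.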
Thursday 14th – Classes from 09:30 to 17:30
Session 7 – conditions in Python
I use the idea of decision-making as a way to introduce conditional tests, and outline the different building-blocks of conditions before showing how conditions can be combined in an expressive way. We look at the different ways that we can use conditions to control program flow, and how we can structure conditions to keep programs readable.
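A small sketch of conditions controlling program flow is shown here, assuming we want to label some made-up sequences by length, start codon and GC content. The thresholds are arbitrary choices for the example, not biological recommendations.

```python
# Made-up sequences, used purely for illustration.
sequences = ["ATGCCGTA", "GGCTA", "ATATATATGC", "ATGAAATTT"]

labels = []
for seq in sequences:
    gc = (seq.count("G") + seq.count("C")) / len(seq)

    # Tests are tried from top to bottom; only the first true branch runs.
    if len(seq) < 6:
        label = "too short"
    elif seq.startswith("ATG") and gc >= 0.5:
        label = "start codon, GC-rich"
    elif seq.startswith("ATG"):
        label = "start codon, AT-rich"
    else:
        label = "no start codon"

    labels.append(label)
    print(seq, label)
```

Ordering the branches from most to least specific keeps each test short and the program readable.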
Session 8 – writing functions in Python
We discuss functions that we’d like to see in Python before considering how we can add to our computational toolbox by creating our own. We examine the nuts and bolts of writing functions before looking at best-practice ways of making them usable. We also look at a couple of advanced features of Python – named arguments and defaults.
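As one possible example of a function with a default that can be overridden by a named argument, consider the sketch below. The function name `gc_content` and its `decimals` parameter are hypothetical choices for this illustration.

```python
def gc_content(dna, decimals=2):
    """Return the fraction of G and C bases in dna, rounded to decimals places."""
    dna = dna.upper()  # accept lower-case input as well
    gc = (dna.count("G") + dna.count("C")) / len(dna)
    return round(gc, decimals)


# Using the default number of decimal places.
print(gc_content("ATGCCGTA"))

# Overriding the default with a named argument.
print(gc_content("atgaaattt", decimals=3))
```

Giving `decimals` a default means the common case needs only one argument, while naming it at the call site makes the less common case self-explanatory.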
Friday 15th – Classes from 09:30 to 16:00
Session 9 – paired data and dicts in Python
We discuss a few examples of key-value data and see how the problem of storing them is a common one across bioinformatics and programming in general. We learn about the syntax for dictionary creation and manipulation before talking about the situations in which dictionaries are a better fit than the data structures we have learned about thus far.
The afternoon of Friday 15th is a programming workshop to allow for leaving early for travel if necessary.
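Two common uses of key-value data can be sketched as follows: counting things, and looking things up. The sequence is made up, and the codon table is deliberately partial (three codons only) to keep the example short.

```python
# A made-up DNA sequence, used purely for illustration.
dna = "ATGCGATACGCTTGA"

# Counting: each base is a key, and its count is the value.
base_counts = {}
for base in dna:
    base_counts[base] = base_counts.get(base, 0) + 1

print(base_counts)

# Lookup: a deliberately partial codon table (the real table has 64 entries).
codon_to_amino_acid = {"ATG": "M", "GCG": "A", "TGA": "*"}

first_codon = dna[0:3]
# get() returns a fallback value when the key is missing from the dict.
first_amino_acid = codon_to_amino_acid.get(first_codon, "?")
print(first_codon, first_amino_acid)
```

Doing the same with two parallel lists (one of bases, one of counts) would need a search through the list for every update, which is why a dictionary is the better fit here.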