Metabarcoding and Reproducible Bioinformatics Course

Welcome to EvoHull

EvoHull Metabarcoding and Reproducible Bioinformatics Course

The EMRBC will run from Monday 12th September to Wednesday 14th September 2016 in the EvoHull bioinformatics labs. Attendees will be a mix of internal PhD students, staff and visitors from other universities. This will be a small group, with a maximum of 12 students, taught in an active-learning environment. The course will focus on metabarcoding of eukaryotic samples, particularly animals, using universal primers and Illumina sequencing technology. Key staff will be Dr Dave Lunt and Dr Christoph Hahn. Follow us on Twitter @evohull for announcements.

You will be taught using the EvoHull Bioinformatics Lab computers running Ubuntu. In addition, you will have access to the University of Hull VIPER High Performance Computer network as required.

The 3-day course

The course begins with an introduction to the command line, Python and shell scripts, and data file manipulation. It assumes no prior knowledge.
We move on to an introduction to reproducible science: how to (a) make your life easier and (b) make sure you can publish a reproducible piece of science.
We will discuss the state of environmental genomics and metabarcoding, strategies for the future, approaches for different types of biological problem, primers and sequencing strategies. We then use all these skills to process, quality control and analyse NGS sequence data. You will work hands-on with messy real-world metabarcoding sequence data, and turn it into biological knowledge.

The 5-day course

There is also the opportunity, if you have your own data, to stay for two more days to analyse it in detail working alongside us. Please contact us to discuss this option.


Attendees will be charged £225 for 3 days or £325 for 5 days. This does not include meals and accommodation, though we can arrange University accommodation B&B for you. There will of course be copious quantities of coffee and biscuits provided to sustain all bioinformatics activities. For food, in addition to the University refectory selling hot food and sandwiches, the campus is close to a wide variety of reasonably-priced restaurants, cafes and bars.


If you are interested in attending EMRBC then please email Dr Dave Lunt ASAP to have a chat, and I will help you to arrange things. Since there are only 12 places, this course may fill up quickly.


The outline below is the content of last year’s course; the 2016 course will be similar. We will be expanding and refining the later stages of the course, although the exact examples taught will be decided over the summer.

1 Aims, background, and approaches
1.1 Background thinking/reading
1.2 Discussion: What kinds of questions can you ask?
1.3 DNA barcoding and metabarcoding
2 Introduction to the command line for bioinformatics
2.1 Navigating the file system
2.2 Editing, inspecting, and searching within text files
2.4 Search, replace, and write output to a new file
2.4.1 sed the stream editor
2.4.2 echo
2.4.3 text-processing scripts
2.5 Simple analysis scripts (programs)
2.5.1 Python scripts
2.5.2 Shell scripts
2.5.3 The point of scripts
2.6 Running programs from the terminal
2.6.1 For reference: things you may encounter
2.7 Tasks: review of command line skills
2.8 Some reading if you wish to extend your knowledge
3 Reproducible research & keeping lab records
3.1 The Jupyter notebook
3.2 Version control with Git
3.3 Docker containers
4 BLAST-based sequence assignment
4.1 A basic BLAST search
4.2 BLAST output
5 Phylogenetic-based sequence assignment
5.1 Phylogenetic placement and tree parsing
5.2 Example: pplacer analysis
6 The metaBEAT pipeline in a Jupyter notebook
7 NGS data, fastq files, sequence quality, trimming
8 Example: Analysis of eDNA fish metabarcoding data
9 Comparing assignment approaches
10 Other methods of metabarcoding analysis
10.1 Bayesian classifiers
10.2 Community analysis strategies
11 Finish, pub debrief