Precision Medicine Bioinformatics: From raw genome and transcriptome data to clinical interpretation (PMBI01)
29 October 2018 - 2 November 2018, £275.00 - £550.00
Precision medicine refers to the use of prevention and treatment strategies that are tailored to the unique features of each individual and their disease. Analysis of high-throughput genome and transcriptome data is a major component of new large-scale precision medicine efforts. This analysis involves identifying specific genome or transcriptome features that predispose an individual to disease, predict response to a specific therapy, or influence diagnosis and prognosis. During this course (PMBI01), students will perform an end-to-end precision medicine analysis of real human genome (WGS and exome) and transcriptome (RNA-seq) data. Students will start with raw sequence data for a hypothetical patient, learn to use the tools needed to analyze these data on the cloud, and interpret the results in a clinical context. The goal of the analysis will be to identify personalized therapeutic options for this patient, as well as any prognostic or diagnostic implications in the data. After completing the course, students should be in a position to (1) understand raw sequence data formats, (2) perform bioinformatics analyses on the cloud, (3) run complete analysis pipelines for alignment, variant calling, annotation, and RNA-seq, (4) visualize and interpret whole genome, exome and RNA-seq results, (5) leverage the identification of passenger variants for immunotherapy (e.g. personalized cancer vaccines) and disease monitoring applications, and (6) begin to place these results in a clinical context using variant knowledgebases. The data, tools, and analysis will be most directly relevant to human cancer genomics and bioinformatics. However, many of the skills and concepts covered will be applicable to other human diseases and even to the study of non-human organisms. All course materials (including copies of presentations, practical exercises, data files, and example scripts prepared by the instructing team) will be provided electronically to participants.
This workshop is primarily aimed at researchers and technical workers with a background in biology who want to learn fundamental bioinformatics skills for genomics, with a particular emphasis on medical research applications. The course is essentially a crash course in bioinformatics for next-generation sequencing (NGS) data analysis. It would also be useful for students with a computational background who seek an introduction to genomics technology and analysis approaches. In general, it is suitable for anyone working with genome or transcriptome data in the context of disease research. Attendees are encouraged to bring their own data or project outlines for discussion. Some time during the course will be dedicated to consultation with a team of instructors from the McDonnell Genome Institute.
Venue – PR informatics head office, 53 Morrison Street, Glasgow, G5 8LB
Availability – 30 places
Duration – 5 days
Contact hours – Approx. 37 hours
ECTS credits – Equivalent to 3 ECTS credits
Language – English
We offer COURSE ONLY and ACCOMMODATION PACKAGE options:
• COURSE ONLY – Includes lunch and refreshments.
• ACCOMMODATION PACKAGE (to be purchased in addition to the course only option) – Includes breakfast, lunch, welcome dinner Monday evening, farewell dinner Thursday evening, refreshments and accommodation. Self-catering facilities are available in the accommodation. Accommodation is approximately a 6-minute walk from the PR informatics head office. Accommodation is in multiple-occupancy (maximum 3-4 people), single-sex, en-suite rooms. Arrival Sunday 28th October (after 5pm) and departure Friday 2nd November (accommodation must be vacated by 9am). An additional night's accommodation can be purchased (departure 9am Saturday morning); email for details.
To book ‘COURSE ONLY’, with the option to add the ‘ACCOMMODATION PACKAGE’, please scroll to the bottom of this page.
Other payment options are available; please email firstname.lastname@example.org
Cancellation policy: Cancellations are accepted up to 28 days before the course start date, subject to a 25% cancellation fee. Cancellations later than this may be considered; contact email@example.com. Failure to attend will result in the full cost of the course being charged. In the unfortunate event that PRinformatics must cancel this course due to unforeseen circumstances, a full refund for the course will be credited. However, PRinformatics cannot be held responsible for any travel fees, accommodation or other expenses incurred by you as a result of the cancellation.
The workshop is delivered over ten half-day sessions (see the detailed curriculum below). Each session consists of a lecture of roughly 30 minutes followed by two hours of practical exercises, with breaks at the organizer’s discretion.
Assumed quantitative knowledge
Students should have enough biological background to appreciate the examples and exercise problems, and have at least some interest in working with next-generation sequencing (NGS) data.
Assumed computer background
No programming or scripting experience is necessary, but some previous experience using a Linux terminal and/or R will be most welcome.
IMPORTANT: If you have not used a Linux command line before, it is highly recommended that you complete at least one of the following brief primers shortly before arriving at the course:
Equipment and software requirements
All examples will be run in a Linux environment on the cloud. Students will use their own laptops to access the cloud, and need only have R, RStudio, IGV, and some terminal software installed (e.g. Terminal for Mac, PuTTY, or similar).
R, RStudio, and IGV are available for both Windows and Mac and can be downloaded for free by following these links:
There are many terminal software options available depending on your operating system. If you are using Linux, you likely already have several options. On Mac, the included ‘Terminal’ application will work nicely. On Windows, there are also many options, such as PuTTY (http://www.putty.org/).
While this course will use cloud compute infrastructure, everything presented can also be performed on a local compute cluster. Detailed installation and configuration instructions will be provided to help students set up their own pipelines when they return to their own institutions.
UNSURE ABOUT SUITABILITY? PLEASE ASK firstname.lastname@example.org
Sunday 28th October – Check in at the course accommodation between 16:00 and 20:00.
Monday 29th October – Classes from 09:00 to 17:00
Session 1. Introduction to precision medicine genomics and bioinformatics.
In this session, students will be introduced to key concepts of genomics and their application to precision medicine in cancer. These concepts will be demonstrated by drawing from real precision medicine exercises undertaken by our own Genomics Tumor Board at Washington University. An introduction to next-generation sequencing platforms and related bioinformatics approaches will also be provided. Core concepts and tools introduced: fundamentals of genome and transcriptome analysis, next-generation sequencing, precision/personalized medicine approaches in cancer.
Session 2. Introduction to genomics data, file formats, QC, and cloud analysis.
In this session, students will be introduced to a hypothetical patient case and related samples to be analyzed throughout the course. Students will be provided with an introduction to the whole genome, exome, transcriptome and other data sets we have generated for this test case. Information on where to get the raw data and how to access it (and other test data) will be provided. Using these data as an example, students will learn the fundamentals of NGS data formats. Students will also be introduced to accessory files needed for analysis, including reference genomes, reference transcriptomes, and annotation files. Tools for QC analysis of raw data will be demonstrated. Since most analysis will be performed on the cloud, each student will learn how to launch and log into their own cloud compute environment. Students will learn how to install bioinformatics tools in this environment and how to use some of the most broadly useful toolkits for NGS data. Core concepts and tools introduced: file formats (FASTA, FASTQ, SAM/BAM/CRAM, VCF, GTF), bedtools, Picard, samtools, FastQC, cloud computing (AWS, EC2).
Tuesday 30th October – Classes from 09:00 to 17:00
Session 3. Primary genome data analysis (sequence alignment and visualization).
In this session, we will begin analyzing our patient’s data at the command line. Students will log into the cloud and, starting from their own copy of the patient data, will align the whole genome and exome data to a reference genome. Following alignment, students will conduct a second quality analysis of the data and learn to visualize alignments in IGV. Core concepts and tools introduced: alignment algorithms, reference indexes, BWA, BWA-MEM, alignment indexes, alignment flags, genome browsers, duplicate marking, alignment merging and sorting, IGV.
Session 4. Whole genome and exome variant calling and annotation.
In this session, we will introduce different algorithms for identifying sequence variants of various types from whole genome data, exome data, or both. Both germline and somatic variant calling will be covered. For each, students will learn strategies for identifying false positives and increasing confidence in individual predictions by manual or secondary examination of the alignments. Variant types detected will include single nucleotide variants (SNVs), small insertions and deletions (indels), copy number variants (CNVs) and structural variants (SVs). Students will learn strategies for visualizing and presenting variants of each type in a patient report. After producing filtered variant results of each type, annotation methods and resources relevant to each variant type will be demonstrated. Core concepts and tools introduced: germline variation, somatic variation, variant calling, false positives, false negatives, alignment artifacts, manual review, svviz, Manta, GATK, Strelka, MuTect, VarScan, CopyCat, Lumpy.
Wednesday 31st October – Classes from 09:00 to 17:00
Session 5. RNA-seq analysis (introduction, alignment and abundance estimation).
In this session, students will learn the fundamentals of RNA-seq data analysis and perform initial QC and alignment of the patient’s transcriptome. Appropriate sample comparisons for RNA-seq in a precision medicine context will also be discussed. Core concepts and tools introduced: reference transcriptomes, spliced alignment algorithms, RNA-seq data trimming, RNA assembly algorithms, RNASeqQC, HISAT, StringTie.
Session 6. RNA-seq analysis (fusions, differential expression, and clustering).
The uses of transcriptome data in a precision medicine context are remarkably varied, and students will pursue several strategies in this session. Fusion detection, an RNA-seq-specific variant detection approach, will be performed on our patient’s data. The expression abundance results from the previous session will be used to identify a list of highly expressed genes. Comparison to RNA-seq data from a cohort of related samples will be used to identify expression outliers in our patient. Expression clustering algorithms will be used to stratify our patient into a known prognostic group. More advanced classification and pathway-based approaches to stratification will be briefly introduced. Core concepts and tools introduced: outlier analysis, expression clustering, patient stratification, heatmaps, Ballgown.
Thursday 1st November – Classes from 09:00 to 17:00
Session 7. Prioritization, visualization and interpretation.
In this session, students will learn procedures for refining the final results obtained from the previous analyses of our patient’s data. Genome and transcriptome variant observations will be prioritized according to various annotation strategies, ranging from algorithmic predictions of pathogenicity to intersection with population databases. Students will also learn how to integrate results from the DNA and RNA-seq analyses. For example, variants will be prioritized according to their expression status, allele-specific expression bias, and the abundance of the associated genes. Fusions predicted in the RNA will be confirmed in the DNA. Visualization techniques will be used to place variant observations from our patient in the context of a cohort of previously sequenced patients with the same disease. A group discussion will tackle how to approach creating a final clinical interpretation for our example patient. Core concepts and tools introduced: allele-specific expression, clonality, GenVisR, gnomAD, CADD, bam-readcount, INTEGRATE.
Session 8. Gene/variant knowledgebases and clinical actionability.
In this session, students will learn the fundamentals of interpreting genome and transcriptome observations in a clinical context. The final candidate observations for our example patient will be examined using various clinical interpretation tools and databases. Core concepts and tools introduced: druggability, actionability, sensitivity, resistance, predictive variants, diagnostic variants, prognostic variants, predisposing variants, the ACMG and AMP guidelines for clinical actionability, variant knowledgebases, cBioPortal, CIViC, ClinVar, DGIdb, PharmGKB.
Friday 2nd November – Classes from 09:00 to 16:00
Session 9. Leveraging passenger variants (monitoring and immunogenomics).
Up until this point, we have focused on identifying, annotating and interpreting variants that are relevant to disease: variants that are deemed functional, actionable, or of some known clinical relevance. What about variants that may be unusual or unique to this patient but of no known significance? What about the “passenger” variants? In this session, we will explore two broad strategies that leverage passenger cancer variants in a clinically useful way. First, we will examine their potential use in tracking response to therapy. Second, we will explore the possible immunogenomic implications of passenger variants by designing a personalized cancer vaccine for our example patient. Core concepts and tools introduced: cfDNA, serial analysis, immunotherapy, pVACtools.
Session 10. Application to your own data
Optional free afternoon to revisit previous modules or consult with the team of instructors. In this session, students will be free to work on their own or in groups on the previously covered sections. Students can also consult with the team of instructors about their own experiments or get practical advice on analyzing their own data. Our hope is to make this session as interactive and useful as possible.