Free Computers!*

*Terms and Conditions may apply

Dan MacLean

This talk

Some options for ‘free’ computing resources in bioinformatics

Not This Talk

Best Practice Guide to Bioinformatics Analyses

Free?

  • Free as in ‘free beer’ (gratis)
  • Trade-offs in cost/performance
  • Expertise still needed

Pick Two

  • Cheap
  • Power
  • Accessible

Three options

  • Amazon Web Services (AWS)
  • Galaxy (usegalaxy.org)
  • Google Colab and Jupyter

AWS

Amazon Web Services

  • Cheap*, Power, Accessible

cbre.com

*amazing value but always some charge

Amazon Web Services

  • Command line based
  • Only restriction is your knowledge

aws terminal

Galaxy

https://usegalaxy.org

A web tool for bioinformatics

  • Cheap, Power, Accessible
  • Graphical user interface
  • Low barrier to tool use
  • Publicly available

https://galaxyproject.org/

Galaxy Demo

  1. Perform sequence Quality Control
    1. Upload some FASTQ reads
    2. Run FASTQC
    3. Examine output
  2. Assemble PacBio Data
    1. Get data from history
    2. Run minimap and miniasm
    3. Run assemblystats

The Main Page

usegalaxy.org

Using Galaxy

Workflows

Your Galaxy

Google Colab

Google Colab

  • Cheap(-ish), Power(-ish), Accessible(-ish)
  • A generous helping of RAM (usually 12-15 GB)
  • A CPU with good processing power
  • Access to a GPU (Graphics Processing Unit) for intensive calculations
  • About 100 GB of temporary storage
  • Temporary: the machine and its files disappear when the session ends
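
The figures above vary between sessions, so it can be worth checking what a given machine actually provides. A minimal sketch using only the Python standard library; the function name describe_machine is made up for illustration, the RAM lookup assumes a Linux machine (as Colab is), and GPU detection is not covered:

```python
import os
import shutil


def describe_machine(path="/"):
    """Return CPU count, RAM and disk figures for the current machine."""
    gib = 1024 ** 3
    disk = shutil.disk_usage(path)
    try:
        # Linux-only: physical memory = page size * number of pages.
        ram_gib = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / gib
    except (ValueError, OSError, AttributeError):
        ram_gib = None  # not available on this platform
    return {
        "cpus": os.cpu_count(),
        "ram_gib": ram_gib,
        "disk_total_gib": disk.total / gib,
        "disk_free_gib": disk.free / gib,
    }


for name, value in describe_machine().items():
    print(f"{name}: {value}")
```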

google.com

Jupyter Notebooks

Literate Computing

jupyter.org

Using Google Drive for Persistence

from google.colab import drive
drive.mount('/content/drive')
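
Once Drive is mounted, results can be copied out of temporary storage before the session ends. A rough sketch, assuming the mount point used above and that your files appear under MyDrive; the folder name bioinformatics_results and the helper save_to_drive are hypothetical:

```python
import shutil
from pathlib import Path

# Assumed location: Drive mounted at /content/drive as above.
# The "bioinformatics_results" folder name is hypothetical.
DRIVE_DIR = Path("/content/drive/MyDrive/bioinformatics_results")


def save_to_drive(local_dir, drive_dir=DRIVE_DIR):
    """Copy a results folder from temporary Colab storage to Drive."""
    local_dir = Path(local_dir)
    target = Path(drive_dir) / local_dir.name
    # dirs_exist_ok lets repeated runs overwrite earlier copies (Python 3.8+).
    shutil.copytree(local_dir, target, dirs_exist_ok=True)
    return target
```

Calling save_to_drive("fastqc_output") at the end of a notebook would then leave a copy of that (hypothetical) folder in Drive.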

Installing software with package managers

App stores for useful software

pypi.org

Package Managers

  • apt
  • pip
  • conda
  • bioconda

apt

  • Colab runs Ubuntu, so system-level software is installed and updated with apt
  • For tools that seem fundamental to a computer:
    • gzip
    • wget
    • system-wide libraries: libgcc
    • programming languages: Java, R

!apt-get update
!apt-get install -y gzip wget
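
Many fundamental tools are likely to be present already, so a quick check can save an install. A small sketch using the standard library; the helper name missing_tools is made up for illustration:

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that are not found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


needed = ["gzip", "wget"]
to_install = missing_tools(needed)
if to_install:
    print("Install with: !apt-get install -y " + " ".join(to_install))
else:
    print("All tools already present")
```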

pip

  • default for Python
  • e.g. !pip install biopython
  • limited to Python packages

conda and bioconda

  • More powerful and general
  • Swiss army knife for scientific software
  • Environments prevent conflict
  • bioconda is a conda channel dedicated to bioinformatics tools
  • 7000 packages maintained by the bioinformatics community

Installing conda and bioconda

Install conda in a code cell:

!pip install -q condacolab
import condacolab
condacolab.install() # expect a kernel restart

Configure it to use bioconda as a package source:

!conda config --add channels defaults
!conda config --add channels bioconda
!conda config --add channels conda-forge

Install the packages you want:

!conda install -y fastqc samtools bcftools bwa

Google Colab Demo

  1. Setup machine
    1. Install conda and bioconda
    2. Install packages needed
    3. Connect the GDrive
  2. Perform sequence Quality Control
    1. Extract reads from uploaded archive
    2. Run FASTQC
    3. Examine output
  3. Call Sequence Variants
    1. Align reads to reference using bwa
    2. Call variants with samtools and bcftools
    3. Examine output
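
Steps 2 and 3 above might look something like the following in a code cell. This is a sketch under assumptions, not the demo's actual commands: the file names are hypothetical, and it assumes samtools is used for sorting and indexing while bcftools mpileup and bcftools call do the variant calling.

```python
import shlex
import subprocess


def variant_calling_commands(reference, reads, prefix):
    """Build shell commands for QC, alignment and variant calling."""
    ref, fq = shlex.quote(reference), shlex.quote(reads)
    bam = shlex.quote(prefix + ".sorted.bam")
    vcf = shlex.quote(prefix + ".vcf")
    return [
        f"fastqc {fq}",                                   # quality control
        f"bwa index {ref}",                               # index the reference
        f"bwa mem {ref} {fq} | samtools sort -o {bam} -",  # align and sort
        f"samtools index {bam}",                          # index the alignment
        f"bcftools mpileup -f {ref} {bam} | bcftools call -mv -o {vcf}",
    ]


def run_all(commands):
    """Run each command in turn, stopping at the first failure."""
    for command in commands:
        print("Running:", command)
        # check=True only sees the exit status of the last stage of a pipe.
        subprocess.run(command, shell=True, check=True)


# Hypothetical file names; printing only, nothing is executed here.
for command in variant_calling_commands("reference.fa", "reads.fastq", "sample"):
    print(command)
```

Passing the list to run_all would execute the steps, provided the tools were installed with conda as shown earlier.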

Summary

  • There are free computers for bioinformatics analysis
  • Three places to find them are AWS, Galaxy and Colab
  • Each has its trade-offs
  • Colab is probably the best

Acknowledgements