Resources for Learning Bioinformatics and Computational Biology
Website by Mike Saint-Antoine
Most Important
- The Sequences by Eliezer Yudkowsky
- This might seem like an unusual one to start with because it isn't specifically about bioinformatics / programming / biology.
Rather, it's a collection of essays about how to reason about the world and figure things out using the information
that you have. But this is the real skill to be mastered, much more important than memorizing some lines of code or math formulas.
Everything else on this list is basically a tool that can be put toward that general goal.
- Python (Caleb Curry)
- One of the most important skills to learn for bioinformatics is computer programming. The two most commonly used
programming languages in the field are Python and R, and there's some debate over which one aspiring bioinformaticians should start with.
Personally, I think it's better to start with Python in order to get a solid introduction to computer science fundamentals, but that's just my opinion.
This beginner-friendly tutorial series from Caleb Curry is the best introduction to Python that I know of.
- Python Data Analysis - NumPy, Pandas, Matplotlib (freeCodeCamp)
- After learning the very basics of Python, the next thing to learn is data analysis and visualization. This is a good tutorial covering
some of the common Python libraries used for this purpose. This skill is particularly good to have because it's useful in a wide variety of fields,
not just bioinformatics.
- R (freeCodeCamp)
- R is the other commonly used language in bioinformatics besides Python. While Python is a general-purpose language, R is designed specifically for
statistics and data analysis / visualization, and it has lots of bioinformatics packages that make things easy.
- Algorithms and Data Structures (freeCodeCamp)
- Bioinformatics overlaps significantly with computer science, so it's worth asking: how much computer science should you learn? Which of the intermediate/advanced CS topics
will be useful in the context of bioinformatics? In my opinion, data structures / algorithms are the most important advanced CS topics to cover for bioinformatics (more so than operating systems, compilers, etc.).
A good data structures / algorithms course will get you really thinking like a computer scientist, and thinking about how to solve problems as efficiently as possible in terms of memory and runtime, which is especially
important when you're dealing with huge datasets in genomics or transcriptomics.
- Statistics (StatQuest)
- One of the most important skills in bioinformatics (and research in general) is having a solid understanding of statistics at the intuitive level,
so that you can actually reason about it instead of just memorizing steps to follow and plugging numbers into formulas. This is a great course for developing
that statistical intuition.
- Machine Learning Basics (StatQuest)
- Same thing with machine learning -- the goal should be to develop an intuitive understanding of how the methods work, so they don't seem mysterious. This is a great series for developing that intuition.
- Transcriptomics / RNA-seq: Overview and Data Analysis (StatQuest)
- The same guy who made the statistics and machine learning courses also has a course covering the basics of RNA-seq and how to analyze the data that comes out of that experiment.
- Bioinformatics 101 (Bioinformagician)
- This is a great introductory series covering some general principles in bioinformatics, as well as some common data formats and software packages in R.
- University Course: Systems Biology (MIT)
- Great course from MIT on math modeling, mostly of gene expression and its regulation. This is a bit different from the other resources I've listed (less data analysis, more math),
and it might not be necessary if you're doing regular/traditional bioinformatics. But I think it's still good to have some background in it.
- An Introduction to Systems Biology: Design Principles of Biological Circuits (Uri Alon)
- Great textbook. Similar to the MIT course in that it's mostly about math models of gene expression and regulation. Again, different from traditional bioinformatics (which is mostly data analysis), but good to know.
- Neural Networks From Scratch (Andrej Karpathy)
- This is a great course about how to create a deep learning / neural network model completely from scratch in Python, so that you understand how it works at every step. In my opinion, it really is the best course for teaching the fundamental principles of
deep learning and the math involved in training a neural network with gradient descent. The math is actually surprisingly simple considering how powerful it is when done at a large scale, and this series makes it easy to understand. It isn't specifically about biology
or bioinformatics, but I'm including it as an important resource because AI has lately been having an impact on every field, including biology. This course is a great way to understand how it works, so that it seems like simple but clever math rather than magic.
- The Selfish Gene (Richard Dawkins)
- Great book on evolution from a gene-centric view. Really a masterpiece, great for professional scientists as well as laypeople. Evolution by Darwinian selection ultimately underpins every topic in biology,
and this book does a great job of explaining how these simple principles give rise to much of the complexity of biology. Definitely recommend it, whether your research is directly related to evolution or not.
Programming Basics
Machine Learning
Biology Review
Bioinformatics and Computational Biology
Mathematical Biology
Textbooks
Regular Books
Podcasts
Blogs
Specific Essays / Blog Posts / Articles
My Courses
- Intro to Bioinformatics
- Basic introduction to bioinformatics for people who already know a bit of programming in Python, but don't necessarily have any biology background.
The goal is to give a broad overview of the field, first covering some fundamental biology principles, and then doing two realistic data analysis projects: a differential expression analysis and a GWAS analysis.
- Melanoma Image Classifier with PyTorch
- Short course about how to make a Python machine learning model that can classify images of melanomas and benign moles, using PyTorch. Good for getting some practice with PyTorch, as well as
developing a better understanding of machine learning principles with a realistic project.
- Neural Nets from Scratch in Julia
- Course on how to make a basic deep learning / neural network package in the Julia programming language, completely from scratch, with no imports. I think this is a helpful project for understanding
both the principles of deep learning / neural nets / gradient descent, as well as programming in Julia. This is closely related to SimpleGrad.jl,
an educational gradient-tracking package that I created in Julia. The course basically shows how to make this package completely from scratch, covering the fundamental deep learning principles at each step.
- Modeling Gene Regulatory Networks
- Course on basic math modeling and simulation of gene regulatory networks in Python, covering both deterministic and stochastic modeling techniques.
- Basic Python
- I mentioned at the top of this page that I think the best way to learn Python as a beginner is Caleb Curry's course. But I also wanted my YouTube channel to be somewhat "self-contained,"
so that if someone came to the channel wanting to learn bioinformatics, they could find everything they needed to get started. So this is my own series covering the very basics of Python. It serves as a starting point and covers the prerequisites for
anyone who wants to watch my other tutorials but doesn't know Python yet. This course starts from the very basics, and is intended for people who have never written a line of code before in their life.