O’Reilly, 2015.
Ideology: Data Skills for Robust and Reproducible Bioinformatics
How to Learn Bioinformatics
Why Bioinformatics? Biology’s Growing Data
Learning Data Skills to Learn Bioinformatics
New Challenges for Reproducible and Robust Research
Reproducible Research
Robust Research and the Golden Rule of Bioinformatics
Adopting Robust and Reproducible Practices Will Make Your Life Easier, Too
Recommendations for Robust Research
Recommendations for Reproducible Research
Continually Improving Your Bioinformatics Data Skills
Prerequisites: Essential Skills for Getting Started with a Bioinformatics Project
Setting Up and Managing a Bioinformatics Project
Project Directories and Directory Structures
Project Documentation
Use Directories to Divide Up Your Project into Subprojects
Organizing Data to Automate File Processing Tasks
Markdown for Project Notebooks
Chapter 3. Remedial Unix Shell
Why Do We Use Unix in Bioinformatics? Modularity and the Unix Philosophy
Working with Streams and Redirection
The Almighty Unix Pipe: Speed and Beauty in One
Managing and Interacting with Processes
Command Substitution
Working with Remote Machines
Connecting to Remote Machines with SSH
Quick Authentication with SSH Keys
Maintaining Long-Running Jobs with nohup and tmux
Working with Remote Machines Through Tmux
Git for Scientists
Why Git Is Necessary in Bioinformatics Projects
Installing Git
Basic Git: Creating Repositories, Tracking Files, and Staging and Committing Changes
Collaborating with Git: Git Remotes, git push, and git pull
Using Git to Make Life Easier: Working with Past Commits
Working with Branches
Continuing Your Git Education
Bioinformatics Data
Retrieving Bioinformatics Data
Data Integrity
Looking at Differences Between Data
Compressing Data and Working with Compressed Data
Case Study: Reproducibly Downloading Data
Practice: Bioinformatics Data Skills
Unix Data Tools
Unix Data Tools and the Unix One-Liner Approach: Lessons from Programming Pearls
When to Use the Unix Pipeline Approach and How to Use It Safely
Inspecting and Manipulating Text Data with Unix Tools
Advanced Shell Tricks
The Unix Philosophy Revisited
A Rapid Introduction to the R Language
Getting Started with R and RStudio
R Language Basics
Working with and Visualizing Data in R
Developing Workflows with R Scripts
Further R Directions and Resources
Working with Range Data
A Crash Course in Genomic Ranges and Coordinate Systems
An Interactive Introduction to Range Data with GenomicRanges
Working with Ranges Data on the Command Line with BEDTools
Chapter 10. Working with Sequence Data
The FASTA Format
The FASTQ Format
Nucleotide Codes
Base Qualities
Example: Inspecting and Trimming Low-Quality Bases
A FASTA/FASTQ Parsing Example: Counting Nucleotides
Indexed FASTA Files
Working with Alignment Data
Getting to Know Alignment Formats: SAM and BAM
Command-Line Tools for Working with Alignments in the SAM Format
Visualizing Alignments with samtools tview and the Integrated Genomics Viewer
Creating Your Own SAM/BAM Processing Tools with Pysam
Chapter 12. Bioinformatics Shell Scripting, Writing Pipelines, and Parallelizing Tasks
Basic Bash Scripting
Automating File-Processing with find and xargs
Make and Makefiles: Another Option for Pipelines
Out-of-Memory Approaches: Tabix and SQLite
Fast Access to Indexed Tab-Delimited Files with BGZF and Tabix
Introducing Relational Databases Through SQLite
Where to Go From Here?