Buffalo Vince. Bioinformatics Data Skills

O’Reilly, 2015.

Ideology: Data Skills for Robust and Reproducible Bioinformatics
How to Learn Bioinformatics
Why Bioinformatics? Biology’s Growing Data
Learning Data Skills to Learn Bioinformatics
New Challenges for Reproducible and Robust Research
Reproducible Research
Robust Research and the Golden Rule of Bioinformatics
Adopting Robust and Reproducible Practices Will Make Your Life Easier, Too
Recommendations for Robust Research
Recommendations for Reproducible Research
Continually Improving Your Bioinformatics Data Skills
Prerequisites: Essential Skills for Getting Started with a Bioinformatics Project
Setting Up and Managing a Bioinformatics Project
Project Directories and Directory Structures
Project Documentation
Use Directories to Divide Up Your Project into Subprojects
Organizing Data to Automate File Processing Tasks
Markdown for Project Notebooks
Chapter 3. Remedial Unix Shell
Why Do We Use Unix in Bioinformatics? Modularity and the Unix Philosophy
Working with Streams and Redirection
The Almighty Unix Pipe: Speed and Beauty in One
Managing and Interacting with Processes
Command Substitution
Working with Remote Machines
Connecting to Remote Machines with SSH
Quick Authentication with SSH Keys
Maintaining Long-Running Jobs with nohup and tmux
Working with Remote Machines Through Tmux
Git for Scientists
Why Git Is Necessary in Bioinformatics Projects
Installing Git
Basic Git: Creating Repositories, Tracking Files, and Staging and Committing Changes
Collaborating with Git: Git Remotes, git push, and git pull
Using Git to Make Life Easier: Working with Past Commits
Working with Branches
Continuing Your Git Education
Bioinformatics Data
Retrieving Bioinformatics Data
Data Integrity
Looking at Differences Between Data
Compressing Data and Working with Compressed Data
Case Study: Reproducibly Downloading Data
Practice: Bioinformatics Data Skills
Unix Data Tools
Unix Data Tools and the Unix One-Liner Approach: Lessons from Programming Pearls
When to Use the Unix Pipeline Approach and How to Use It Safely
Inspecting and Manipulating Text Data with Unix Tools
Advanced Shell Tricks
The Unix Philosophy Revisited
A Rapid Introduction to the R Language
Getting Started with R and RStudio
R Language Basics
Working with and Visualizing Data in R
Developing Workflows with R Scripts
Further R Directions and Resources
Working with Range Data
A Crash Course in Genomic Ranges and Coordinate Systems
An Interactive Introduction to Range Data with GenomicRanges
Working with Ranges Data on the Command Line with BEDTools
Chapter 10. Working with Sequence Data
The FASTA Format
The FASTQ Format
Nucleotide Codes
Base Qualities
Example: Inspecting and Trimming Low-Quality Bases
A FASTA/FASTQ Parsing Example: Counting Nucleotides
Indexed FASTA Files
Working with Alignment Data
Getting to Know Alignment Formats: SAM and BAM
Command-Line Tools for Working with Alignments in the SAM Format
Visualizing Alignments with samtools tview and the Integrated Genomics Viewer
Creating Your Own SAM/BAM Processing Tools with Pysam
Chapter 12. Bioinformatics Shell Scripting, Writing Pipelines, and Parallelizing Tasks
Basic Bash Scripting
Automating File-Processing with find and xargs
Make and Makefiles: Another Option for Pipelines
Out-of-Memory Approaches: Tabix and SQLite
Fast Access to Indexed Tab-Delimited Files with BGZF and Tabix
Introducing Relational Databases Through SQLite

Where to Go From Here?