Bioinformatic Programmer Cheat Sheet

The document provides a comprehensive list of basic filesystem commands, package management commands, and job management commands for Linux, particularly in a bioinformatics context. It includes instructions for working with files, modules, and programs, as well as specific commands for tools like bcftools, samtools, and snakemake. Additionally, it outlines steps for RNA sequencing analysis, including data downloading, pipeline setup, and resource allocation on a server.

Uploaded by

flamingpizzaboy


Basic filesystem commands
pwd – prints the present working directory
ls <optional path> – lists the files in the current directory or at the given path
rm -rf <path> – deletes the file or directory at the path, recursively and without prompting
cd <path> – changes directory to path
cd - – goes back to the previous directory
cd .. – goes up one directory
cd ~ – goes to the home directory
cd ../ – goes t
cp <source path> <destination path> – copies the file at the source path to the destination path
touch <filename> – creates an empty file with that name
nano <filename> – opens the file for editing
cat <filename> – prints the contents of a file
zcat <filename> – prints the contents of a compressed file without decompressing it on disk
head -n <number> <filename> – prints the specified number of lines at the beginning of a file
tail -n <number> <filename> – prints the specified number of lines at the end of a file
less <filename> – displays the contents of a file one screen at a time
ln -s <original> <symlink> – creates a symlink to the original file
ssh <user>@login.tscc.sdsc.edu – logs in to the server
scp <local/path> <user>@login.tscc.sdsc.edu:<destination/path> – copies files from local to the server
source ~/.bashrc – reloads changes to the bashrc without logging out
chmod +x <filename> – makes the file executable
sh <filename.sh> – runs a shell script

Learning info about files on Linux
ll <optional path> – displays detailed directory listings in path
readlink -f <symlinkName> – shows the full path a symbolic link resolves to
file <broken_symlink> – shows where a broken symbolic link points to
df -h <server_filepath> – checks total/available space on a storage server
du -sh <filepath> – calculates the size of a file or directory
lfs quota -uh <user> <scratch_filepath> – tells you how much space you have left in scratch
which <executable_command> – tells you the filepath of the command that you are executing
grep <word> <filename> – finds word inside filename
find <directory> -iname "*<word>*" – finds files in the directory whose names contain the word (case-insensitive)
ls -1 | wc -l – gives the number of files in the current directory
md5sum <filename> > <filenamemd5sum.txt> – generates an md5sum file
md5sum -c <filenamemd5sum.txt> – verifies the files listed in the md5 file against their checksums
gzip -d <file.gz> – decompresses a gzip file
tar -tf <file.tar> – views the contents of a tar file without extracting it
tar -xvf <file.tar> – extracts a tar file
tar -xvzf <file.tar.gz> – extracts a tar.gz file

Working with packages
pip list – lists the packages installed with pip and their versions
pip show <package> – gives information for an individual installed pip package
<package> --version – tells the version of a specified package

Working with modules
module spider – lists all available modules
module spider <module> – gives details for a specific module
module load <module> – loads the module

Working with programs
bcftools
bcftools view -s <sample1,sample2> <file.vcf> > <filtered.vcf> – creates a subset of the original vcf file
bcftools merge <vcf1.vcf.gz> <vcf2.vcf.gz> -o <combined.vcf.gz> – merges multiple vcfs into one combined vcf
conda
conda create --name <environment> – creates an environment with that name
conda env create -f <environment.yml> – creates an environment from the yml file
conda list – lists all of the packages in the current conda environment
conda activate <environment> – activates the conda environment with that name
conda deactivate – deactivates your current conda environment
conda env export > <environment.yml> – creates a yml file from the current conda environment
conda create --name <environment-name> --clone <environment/path> – clones the environment at that path
conda create --override-channels -c defaults -n py27 python=2.7 – creates an environment named py27 with python2
galyleo
galyleo launch --account csd742 --qos condo --partition condo --cpus 2 --time-limit 168:00:00 --env-modules slurm,cpu/0.17.3 --conda-init ~/.anaconda/etc/profile.d/conda.sh -m 14 – runs a jupyter notebook (edit for personal use and initialize in the conda environment you plan to use)
gatk
picard CreateSequenceDictionary -R <fasta name> – creates a sequence dictionary for a fasta file
github
git clone <url.git> – clones a git repository
plink
plink --pca --allow-extra-chr --vcf <vcf_path.vcf.gz> – gets the files necessary for a pca analysis on a combined vcf
samtools
samtools view -H <bam filepath> – views the header of a bam file
samtools fastq <original.bam> > <new.fastq> – converts a bam file to fastq
samtools quickcheck <file.bam> – checks to make sure a bam file is not corrupted
samtools view -h -o <file.sam> <file.bam> – converts a bam file into a sam
samtools view -bS <file.sam> > <file.bam> – converts a sam file into a bam
tabix -p vcf <vcf_filepath.vcf.gz> – creates an index file for a vcf file
snakemake
snakemake -j 20 --cluster "sbatch -N 1 -n 2 --mem 8G -t 24:00:00 -p condo -q condo -A csd742" --rerun-incomplete --latency-wait 120 – runs the program (edit for personal use)
snakemake -j 50 --cluster "sbatch -N 1 -n 1 --mem 1G -t 8:00:00 -p platinum -q hcp-csd742 -A csd742" --rerun-incomplete --latency-wait 120 --use-singularity --singularity-args "-B /tscc/projects/ps-gleesonlab5/,/tscc/projects/ps-gleesonlab7/,/tscc/projects/ps-gleesonlab8/,/tscc/lustre/ddn/scratch/ton011/" – runs a snakemake file that uses singularity and conda (edit for personal use)
Ctrl+C – cancels snakemake
vcftools
vcftools --gzvcf <input_file.vcf.gz> --chr <chr# or #> --to-bp <end_pos> --out <output_prefix> --recode --remove-filtered-all --from-bp <start_pos> --recode-INFO-all – makes a smaller vcf from a larger vcf at a specific position

Dealing with tmux
tmux ls – lists current tmux sessions
tmux attach -d -t <session id> – reattaches a previous tmux session based on id
Ctrl+B then [ – scroll (q to quit)
Ctrl+B then D – detach tmux session
Ctrl+B C – create a new window
Ctrl+B X – kill active pane
Ctrl+B N or P – move to the next or previous window
tmux kill-session -t <targetSession> – kills a specific session

Running a job
srun -N 1 -n 4 --mem 16G -t 168:00:00 -p platinum -q hcp-csd742 -A csd742 --pty bash – (example) edit for individual use
squeue -u <user> – lists jobs from user
scancel -u <user> – kills all jobs from user
scancel <job_id> – kills job id
exit – log out of server
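The two md5sum entries in the cheat sheet (generate a checksum file, then verify with -c) work as a pair. The sketch below tries them end to end on a throwaway file. The file names and the temporary directory are made up for illustration, and it assumes GNU md5sum as found on a typical Linux login node.

```shell
#!/usr/bin/env bash
# Checksum round trip on a throwaway file (all file names here are made up).
set -euo pipefail

workdir=$(mktemp -d)            # scratch directory so nothing real is touched
cd "$workdir"

printf 'ACGT\n' > reads.txt     # stand-in for a real data file

# Generate the checksum file: one line holding the hash and the file name.
md5sum reads.txt > reads.txt.md5sum.txt

# Verify: succeeds and reports the file as OK while it is unchanged.
md5sum -c reads.txt.md5sum.txt

# Change the file; verification now returns a non-zero exit status.
printf 'ACGA\n' > reads.txt
if md5sum -c reads.txt.md5sum.txt >/dev/null 2>&1; then
    verify_after_change="passed"
else
    verify_after_change="failed"
fi
echo "verification after change: $verify_after_change"
```

Because md5sum -c reports failure through its exit status, it can sit in an if test after a large download or transfer.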
Toan’s Notes
Activating jupyter notebook
conda activate Jupyter
ll -a (list hidden files)
squeue -A csd742 (look at resources people are using)
seff <job ID> (look in specific nodes)

Pipelines are on ps-gleesonlab7
RNA seq: incoming raw data folder, gleeson8
Can derive a bam from fastq, can derive your fastq from bam

Login to Gleeson lab server:
cd /tscc/projects/ps-gleesonlab8/User/toan

snakefile, check for threads (small n)
RNA seq, N=1, n=16

Resource allocation:
srun -N 1 -n 4 --mem 16G -t 168:00:00 -p condo -q condo -A csd742 --pty bash
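Since the srun request changes only in a few numbers between analyses, one option is to assemble it from named settings. This is a minimal sketch: the function name build_srun and its argument order are hypothetical, and it only prints the command so it can be checked before being run on the cluster. The RNA seq line reuses the 16G and 168:00:00 values as an assumption, because the notes give only N and n for it.

```shell
#!/usr/bin/env bash
# Assemble an interactive srun request from named settings and print it.
# build_srun is an illustrative helper, not part of Slurm.
set -euo pipefail

build_srun() {
    local nodes=$1 tasks=$2 mem=$3 walltime=$4 partition=$5 qos=$6 account=$7
    printf 'srun -N %s -n %s --mem %s -t %s -p %s -q %s -A %s --pty bash\n' \
        "$nodes" "$tasks" "$mem" "$walltime" "$partition" "$qos" "$account"
}

# The general request from the notes: 1 node, 4 tasks, 16G, 168 hours on condo.
build_srun 1 4 16G 168:00:00 condo condo csd742

# RNA seq with N=1, n=16 (memory and time carried over as an assumption).
build_srun 1 16 16G 168:00:00 condo condo csd742
```

Printing first makes it easy to compare the line with the snakefile's thread settings (small n) before requesting the allocation.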

Scratch Space:
cd /tscc/lustre/ddn/scratch/ton011

Checking for scratch space availability:
lfs quota -uh ton011 /tscc/lustre/ddn/scratch/ton011

Resource allocation methylseq:
srun -N 1 -n 32 --mem 100G -t 168:00:00 -p condo -q condo -A csd742 --pty bash

mkdir analysis: create a new folder for analysis in the WGS_DNM_pipeline folder
ls
Quitting:
Ctrl+B then D – detach tmux session
Exit on home

Terminate run:
Ctrl + C
q=window

Kill job:
scancel <job ID>

How to check job ID?
squeue -A csd742
seff <job ID>

RNA Seq Pipeline:
Follow the directions on GitHub!

Download the pipeline into your user folder:
cookiecutter git+ssh://[email protected]/Gleeson-Lab/rna_seq_pipeline.git
Name it with the date, because you will need to install this every time you run RNA seq. After running the cookiecutter line, it will prompt you on how to do it.

Incoming raw data folder:
ps-gleesonlab8/incoming_raw_data/20241105_yuji
cd (keep tapping to the end because it's nested)
This will show all the fastq files
You need to run the pipeline on all the fastq files (use a for loop)
Nextflow

Downloading raw data
Access the “Incoming_raw_data” folder on Gleeson8
mkdir for a new folder to put the files in
cd to that folder
Download data using wget -r

Example:
PACS2_RNA seq 5/15/25
Your sequencing data is available on the FTP server:
ftp://igm-storage.ucsd.edu/250512_LH00444_0340_B22W5G5LT4
Username: gleeson
Password: <password>
RNA_seq_PACS2
wget -r -nH --cut-dirs=1 --no-parent ftp://gleeson:<password>@igm-storage.ucsd.edu/250512_LH00444_0340_B22W5G5LT4/ -P /tscc/projects/ps-gleesonlab8/Incoming_raw_data/20250515_Toan_ASOSFARI_PACS2_RNASeq_IGM
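The note about running the pipeline on all the fastq files with a for loop can be sketched as follows. The .fastq.gz suffix, the helper name list_samples and the demo folder are assumptions for illustration, and the echo line is a placeholder for whatever command the pipeline's GitHub instructions call for.

```shell
#!/usr/bin/env bash
# Loop over every fastq file in a folder and derive a sample name from each.
# The suffix and the echo placeholder are assumptions; swap in the real command.
set -euo pipefail

list_samples() {
    local fastq_dir=$1
    local fq sample
    shopt -s nullglob                        # an empty folder gives zero iterations
    for fq in "$fastq_dir"/*.fastq.gz; do
        sample=$(basename "$fq" .fastq.gz)   # strip the directory and the suffix
        echo "would run pipeline on sample: $sample ($fq)"
    done
    shopt -u nullglob
}

# Demo on empty placeholder files in a temporary folder.
demo_dir=$(mktemp -d)
touch "$demo_dir/sampleA.fastq.gz" "$demo_dir/sampleB.fastq.gz"
list_samples "$demo_dir"
```

Quoting "$fq" and "$fastq_dir" keeps the loop safe for paths that contain spaces, and nullglob avoids one bogus iteration when no file matches.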
Putting in a new reference genome:
Change the snake_conf.yaml, custom_fasta
Edit the last part in custom_fasta to put in your reference of interest (.fa file)

NOTE: Need to index the fasta before you run
Edit the last part of:
/tscc/nfs/home/xiy010/miniconda3/bwa index [custom fasta]

Dry run using changed input: snakemake -n
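Because the fasta has to be indexed before the pipeline runs, a quick check for the index files can catch a missing step before the dry run. This sketch assumes the classic bwa index output, which writes .amb, .ann, .bwt, .pac and .sa files next to the fasta. The helper name has_bwa_index and the demo file ref.fa are hypothetical.

```shell
#!/usr/bin/env bash
# Check that a fasta already has its bwa index files next to it.
# Assumes classic `bwa index` output: <fasta>.amb .ann .bwt .pac .sa
set -euo pipefail

has_bwa_index() {
    local fasta=$1 ext
    for ext in amb ann bwt pac sa; do
        if [ ! -e "$fasta.$ext" ]; then
            echo "missing index file: $fasta.$ext" >&2
            return 1
        fi
    done
    return 0
}

# Demo with an empty placeholder fasta that has not been indexed.
demo_dir=$(mktemp -d)
touch "$demo_dir/ref.fa"
if has_bwa_index "$demo_dir/ref.fa" 2>/dev/null; then
    echo "indexed"
else
    echo "not indexed yet: run bwa index first"
fi
```

The check only looks for the files' presence; it does not confirm that the index was built from the current version of the fasta.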
More helpful commands
seff <logID> – tells you the resources a job is using
seff 5847641
bash script for checking the available cores & memory on platinum and gold nodes:
bash /tscc/projects/ps-gleesonlab9/user/yix/scripts_templates/check_avail.sh

Platinum: 14 days, 15 GB RAM per GPUs
Condo: max resources = 7 days, 7 GB RAM per GPUs
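The limits above are stated in days, while the srun and galyleo examples pass time as hours:minutes:seconds (168:00:00 for 7 days). A small conversion sketch follows; the function name days_to_walltime is made up for illustration.

```shell
#!/usr/bin/env bash
# Convert a limit in whole days to the HH:MM:SS form used with -t / --time-limit.
set -euo pipefail

days_to_walltime() {
    local days=$1
    printf '%d:00:00\n' $(( days * 24 ))
}

days_to_walltime 7    # prints 168:00:00
days_to_walltime 14   # prints 336:00:00
```

Slurm also accepts a days-hours:minutes:seconds form such as 7-00:00:00, which avoids the multiplication.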
