Thursday, May 10, 2018

Downloading sequences from NCBI:

The following downloads the GenBank record and saves it to a file called CP011547.gbk. To download a different sequence, change the accession number in the first line:
i=CP011547
curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=${i}&rettype=gb&retmode=txt" > "$i.gbk"

The sequence as nucleotide fasta:
curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=${i}&rettype=fasta&retmode=txt" > "$i.fna"

The CDS as protein fasta:
curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=${i}&rettype=fasta_cds_aa&retmode=txt" > "$i.cds.faa"

The CDS as nucleotide fasta:
curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=${i}&rettype=fasta_cds_na&retmode=txt" > "$i.cds.fna"
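
The four commands above differ only in the rettype value and the file extension, so they can be wrapped in a small function. This is a minimal sketch, not a polished tool: efetch_url and fetch_all are hypothetical helper names introduced here, and the one-second pause is an assumption chosen to stay well under NCBI's limit of three requests per second for clients without an API key.

```shell
#!/bin/bash
# Hypothetical helper: build the efetch URL for one accession and one rettype.
efetch_url() {
    local acc=$1 rettype=$2
    printf 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=%s&rettype=%s&retmode=txt\n' "$acc" "$rettype"
}

# Hypothetical helper: download all four formats for one accession.
# Each pair is "rettype:extension", matching the commands above.
fetch_all() {
    local acc=$1 pair rettype ext
    for pair in gb:gbk fasta:fna fasta_cds_aa:cds.faa fasta_cds_na:cds.fna; do
        rettype=${pair%%:*}   # text before the colon
        ext=${pair#*:}        # text after the colon
        curl -s "$(efetch_url "$acc" "$rettype")" > "$acc.$ext"
        sleep 1               # assumed pause between requests
    done
}
```

After sourcing the functions, fetch_all CP011547 should produce CP011547.gbk, CP011547.fna, CP011547.cds.faa and CP011547.cds.fna in the current directory.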

Friday, February 16, 2018

Brief Bioinformatics Bash

Multi FASTA to one Sequence per FASTA

Write each sequence of a multi-sequence FASTA file to a separate FASTA file. Each output file is named after its header line (without the leading '>').


multifasta2Singlefastas.sh
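#!/bin/bash
# Split the multi-sequence FASTA file given as $1 into one file per sequence.
while IFS= read -r line || [[ -n $line ]]
do
    if [[ ${line:0:1} == '>' ]]
    then
        # Header line: start a new output file named after the header.
        outfile=${line#>}.fa
        printf '%s\n' "$line" > "$outfile"
    else
        # Sequence line: append to the current output file.
        printf '%s\n' "$line" >> "$outfile"
    fi
done < "$1"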


Don't forget to make the script executable: chmod +x multifasta2Singlefastas.sh

Usage:
 ./multifasta2Singlefastas.sh input.fasta
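
Because the script uses the whole header as the file name, headers that contain a description, spaces or slashes give awkward or invalid file names. The sketch below is one possible variant, not part of the original script: split_fasta is a hypothetical function name, and it assumes that the sequence ID is the header text up to the first whitespace and that an optional second argument names an existing output directory.

```shell
#!/bin/bash
# Hypothetical helper: split a multi-FASTA file into one file per record,
# naming each file after the sequence ID (header text up to the first whitespace).
split_fasta() {
    local infile=$1 outdir=${2:-.}
    awk -v dir="$outdir" '
        /^>/ {
            if (out) close(out)                # keep only one output file open
            id = substr($1, 2)                 # drop ">" and keep the first word
            gsub(/[^A-Za-z0-9._-]/, "_", id)   # replace characters unsafe in file names
            out = dir "/" id ".fa"
            print > out                        # full header line starts the file
            next
        }
        out { print > out }                    # sequence lines go to the same file
    ' "$infile"
}
```

For example, split_fasta input.fasta would write a record with the header ">Seq1 some description" to Seq1.fa. Records that share an ID would overwrite each other, so this assumes the IDs are unique.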

Brief Bioinformatics Bash

FASTA to single line FASTA

Join the wrapped sequence lines of each record in a FASTA file into a single line. Example:

Before:
>Seq1
ACGTACGTACGT
ACGTACGTACGT
ACGTACGTACGT
>Seq2
GCAGTGCAGTGCAGTGCAGT
GCAGTGCAGTGCAGTGCAGT

After:
>Seq1
ACGTACGTACGTACGTACGTACGTACGTACGTACGT
>Seq2
GCAGTGCAGTGCAGTGCAGTGCAGTGCAGTGCAGTGCAGT


multifasta2Singleline.sh
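#!/bin/bash
# Print each header on its own line and join the sequence lines that follow it.
# grep removes the empty line that awk emits before the first header.
awk '/^>/ {printf("\n%s\n",$0);next; } { printf("%s",$0);}  END {printf("\n");}' < "$1" | grep -v '^$'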


Don't forget to make the script executable: chmod +x multifasta2Singleline.sh

Usage:
 ./multifasta2Singleline.sh input.fasta > output.fasta
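
Once every sequence sits on a single line, per-sequence tasks become short one-liners. As an illustration, the sketch below prints the length of each sequence. fasta_lengths is a hypothetical function name, and it assumes the input is already in single-line format, as produced by the script above.

```shell
#!/bin/bash
# Hypothetical helper: print "ID<TAB>length" for each record of a
# single-line FASTA file (one header line followed by one sequence line).
fasta_lengths() {
    awk '
        /^>/ { id = substr($1, 2); next }   # remember the ID from the header
        { print id "\t" length($0) }        # report the length of the sequence line
    ' "$1"
}
```

Run on the "After" example above, fasta_lengths output.fasta would report 36 for Seq1 and 40 for Seq2.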