bonohu blog

Category: shell

grep -v

Written by bonohu in shell on Fri 19 Apr 2019.

grep has many options. I frequently use the -v option to select only the lines that do not contain a keyword.

grep -v human data.txt

where data.txt is the data to be searched. This filters out every line containing the keyword human.
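A small self-contained run makes the effect visible. The contents of data.txt below are made up for illustration: only the lines that do not contain human are kept.

```shell
#!/bin/sh
# Work in a throwaway directory so the demo does not touch real files.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample data: species and a placeholder gene name per line.
printf 'human\tgeneA\nmouse\tgeneB\nhuman\tgeneC\nrat\tgeneD\n' > data.txt

# -v inverts the match: print only lines WITHOUT "human".
grep -v human data.txt > filtered.txt
cat filtered.txt
```

Because grep matches substrings, -v human also drops lines where human is only part of a longer word (e.g. nonhuman); adding -w limits the match to whole words.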

Another example is to filter the lines that have a value …

fgrep

Written by bonohu in shell on Thu 18 Apr 2019.

It turned out that a command I regularly use to search large amounts of data is not widely known.

I frequently use fgrep to search data against a list of keywords (e.g. IDs).

fgrep -f keywords.txt data.txt

where keywords.txt contains a list of keywords ('one keyword …
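fgrep is equivalent to grep -F (match fixed strings, not regular expressions), and recent GNU grep releases print an obsolescence warning for fgrep, so the sketch below uses grep -F -f. The keyword list and data are hypothetical; the second command shows one pitfall of substring matching with ID-like keywords.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical keyword list: one ID per line.
printf 'ID1\nID3\n' > keywords.txt

# Hypothetical tab-delimited data keyed by ID.
printf 'ID1\tA\nID2\tB\nID3\tC\nID10\tD\n' > data.txt

# -F: treat each keyword as a fixed string (same as fgrep)
# -f: read the keywords from a file, one per line
grep -F -f keywords.txt data.txt > hits.txt

# Fixed-string matching is substring matching, so ID1 also hits ID10.
# -w keeps only matches that form whole words.
grep -F -w -f keywords.txt data.txt > hits_word.txt
cat hits.txt hits_word.txt
```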

Tiny Perl script for formatting TSA entries

Written by bonohu in shell on Mon 03 Sep 2018.

When submitting a relatively large number of entries to DDBJ, we use the Mass Submission System (MSS). This time, we use MSS to submit a transcriptome shotgun assembly (TSA) to DDBJ.

After extracting the list of IDs from the FASTA header lines with a command like

% perl -nle 'print $1 if(/^\>(\S+)/)' hoge.fasta > id.txt

…

Removing version information in IDs

Written by bonohu in shell on Fri 31 Aug 2018.

Identifiers (IDs) in public databases often contain version information, for example the .16 in ENSG00000100644.16 from Ensembl and the .1 in NM_001243084.1 from RefSeq. Such version suffixes can be an obstacle when joining entries from different databases, so they should be trimmed before joining. The file that contains such …
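The rest of this post is cut off here, so the author's own command is not shown. One possible sketch, assuming a tab-delimited file (the name ids.tsv and its values are hypothetical) with the versioned ID in the first column, lets awk strip a trailing .<digits> from that column only.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical tab-delimited file: versioned ID in column 1, a value in column 2.
printf 'ENSG00000100644.16\t12.5\nNM_001243084.1\t3.0\n' > ids.tsv

# Remove a trailing ".<digits>" from column 1 only; 12.5 in column 2 is untouched.
# Assigning to $1 rebuilds the line with OFS, so OFS is kept as a tab.
awk 'BEGIN { FS = OFS = "\t" } { sub(/\.[0-9]+$/, "", $1); print }' ids.tsv > ids_noversion.tsv
cat ids_noversion.tsv
```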

Local reverse complement

Written by bonohu in shell on Sun 29 Jul 2018.

Getting the reverse complement of a specific DNA string is a frequent need in molecular biology. Some web interfaces can do it, but sending sequences to an external web service is not secure. Calculating the reverse complement locally is an ideal solution to that issue.

While searching GitHub for example code of practical use, I …
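A minimal local sketch using only POSIX tools (not necessarily the code the author found): tr swaps each base for its complement and awk reverses the string. The function name revcomp is hypothetical; it expects a single-line sequence, keeps upper and lower case, and leaves letters other than A, C, G and T (such as N) unchanged.

```shell
#!/bin/sh
# Hypothetical helper: reverse-complement the DNA string given as $1.
revcomp() {
    # tr complements A<->T and C<->G in both cases;
    # awk then rebuilds the string from its last character to its first.
    printf '%s\n' "$1" \
        | tr 'ACGTacgt' 'TGCAtgca' \
        | awk '{ s = ""; for (i = length($0); i > 0; i--) s = s substr($0, i, 1); print s }'
}

revcomp ATGCNNaacg
```

For a multi-line FASTA record, the sequence lines would need to be joined into one line first.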

Moving to a cleanly installed High Sierra

Written by bonohu in shell on Thu 12 Jul 2018.

Because my main machine, a MacBook Pro, had a hardware problem, I am moving to a new MacBook Pro with a clean install of High Sierra (10.13.6). Below is a log so that I can reproduce the setup in the future.

First of all, the default shell was changed to /bin/zsh with the chsh command.

After installing Homebrew, coreutils …

join command option for generating a tab-delimited file

Written by bonohu in shell on Fri 15 Jun 2018.

When we want to merge two files into one by a shared key, we can use the join command. It joins lines of tab-delimited files that have the same value in the first column.

join -j 1 file1 file2

Although the join command itself is very useful, its default output is not tab-delimited text, but …
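A minimal sketch of tab-delimited output, assuming small hypothetical files file1 and file2 that are already sorted on the first column (join expects sorted input). The -t option sets the field separator for both input and output, and $(printf '\t') is a portable way to pass a literal tab; bash and zsh also accept $'\t'.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical tab-delimited inputs, both sorted on column 1.
printf 'g1\t10\ng2\t20\ng3\t30\n' > file1
printf 'g1\tkinase\ng3\treceptor\n' > file2

TAB=$(printf '\t')

# -t sets the separator used for reading AND writing,
# so the joined output stays tab-delimited.
join -j 1 -t "$TAB" file1 file2 > joined.tsv
cat joined.tsv
```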

Join files by key

Written by bonohu in shell on Sun 25 Mar 2018.

Joining two files by the key in their first column can easily be done with the UNIX command below.

join -j 1 file1.txt file2.txt

This is a very useful command, but its output is space-delimited by default. To get tab-delimited output, the following option for …
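join pairs lines correctly only when both inputs are sorted on the join field, so unsorted files need a sort step first. A sketch under that assumption, with hypothetical unsorted tab-delimited files file1.txt and file2.txt; LC_ALL=C is set so that sort and join compare keys the same way.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical unsorted, tab-delimited inputs keyed by column 1.
printf 'g3\t30\ng1\t10\ng2\t20\n' > file1.txt
printf 'g3\treceptor\ng1\tkinase\n' > file2.txt

TAB=$(printf '\t')

# Use the same byte-wise collation for sort and join.
export LC_ALL=C

# Sort each file on the key column only, with tab as the separator.
sort -t "$TAB" -k 1,1 file1.txt > file1.sorted
sort -t "$TAB" -k 1,1 file2.txt > file2.sorted

join -j 1 -t "$TAB" file1.sorted file2.sorted > joined.tsv
cat joined.tsv
```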

Retrieve a subset of a sequence dataset

Written by bonohu in shell on Sat 24 Mar 2018.

Several commands can extract a set of sequences from a FASTA-formatted file (of either nucleotides or peptides). Recently, I have regularly used blastdbcmd from the NCBI BLAST+ suite. To run this command, the file must first be indexed by makeblastdb with the option below …
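The makeblastdb option itself is cut off in this excerpt, so it is not reproduced here. As a swapped-in alternative that needs no BLAST database, a short awk sketch can pull out the records whose IDs are listed in a file. The file contents are hypothetical, and the ID is assumed to be the first word after > on each header line.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical FASTA (multi-line sequences are fine) and ID list.
cat > hoge.fasta <<'EOF'
>seq1 first
ACGT
ACGT
>seq2 second
GGGG
>seq3 third
TTTT
EOF
printf 'seq1\nseq3\n' > id.txt

# First file (NR == FNR): remember the wanted IDs.
# Second file: at each header, decide whether to keep this record;
# the bare pattern "keep" prints every line while keep is true.
# Caveat: an empty id.txt breaks the NR == FNR trick.
awk 'NR == FNR { want[$1] = 1; next }
     /^>/      { keep = (substr($1, 2) in want) }
     keep' id.txt hoge.fasta > subset.fasta
cat subset.fasta
```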

uniq -c option

Written by bonohu in shell on Fri 23 Mar 2018.

When I want to count how many times each word appears in a file (hoge.txt), I have used simple Perl code like this (count.pl).

#!/usr/bin/perl
while(<>) {
        my($word) = split;
        $num{$word}++;
}
foreach (sort keys %num) {
        print "$_\t$num …
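For comparison, the same per-word count can be sketched with standard tools: take the first field of each line (as split does in count.pl), sort so that identical words are adjacent, and count them with uniq -c. Because uniq -c prints a space-padded count before the word, a final awk swaps the columns into the word<TAB>count layout. The sample hoge.txt is hypothetical.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical input: one word (plus optional extra fields) per line.
printf 'apple\nbanana\napple\ncherry 1\nbanana\napple\n' > hoge.txt

# Byte-wise sorting, like Perl's default string sort.
export LC_ALL=C

# 1) first field of each line, 2) sort so identical words are adjacent,
# 3) uniq -c prefixes each distinct word with its count,
# 4) reorder to "word<TAB>count".
awk '{ print $1 }' hoge.txt | sort | uniq -c | awk '{ print $2 "\t" $1 }' > counts.tsv
cat counts.tsv
```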