Category: shell

grep -v

Written by bonohu in shell on Fri 19 April 2019.

grep has various options. I frequently use the -v option to keep only the lines that do not contain a keyword.

grep -v human data.txt

where data.txt is the data to be searched. This filters out every line that contains the keyword human.

Another example is to filter the lines which have a value …
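As a minimal sketch of the behaviour described above, the following builds a small data.txt and runs the same command on it. The file contents and the `workdir` and `filtered` variables are made up for illustration and are not from the original post.

```shell
# Hypothetical sample data (tab-separated species and gene name).
workdir=$(mktemp -d)
printf 'human\tTP53\nmouse\tTrp53\nhuman\tBRCA1\nrat\tTp53\n' > "$workdir/data.txt"

# -v inverts the match: keep only the lines that do NOT contain "human".
filtered=$(grep -v human "$workdir/data.txt")
printf '%s\n' "$filtered"
```

Only the mouse and rat lines are printed. Note that the pattern matches anywhere in the line, so a line containing a longer word that includes "human" would be dropped as well.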

Continue reading »


fgrep

Written by bonohu in shell on Thu 18 April 2019.

It turned out that the command I regularly use to search through a bunch of data is not well known to others.

I frequently use fgrep to grep against a list of keywords (i.e. IDs).

fgrep -f keywords.txt data.txt

where keywords.txt contains a list of keywords ('one keyword …
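A minimal sketch of searching with a keyword list, assuming one ID per line in keywords.txt. The sample IDs and the `workdir` and `matched` variables are hypothetical. The sketch uses `grep -F`, which behaves the same as fgrep; recent GNU grep releases print a warning when fgrep is invoked directly.

```shell
# Hypothetical sample files: a list of IDs and a table to search.
workdir=$(mktemp -d)
printf 'ID2\nID4\n' > "$workdir/keywords.txt"
printf 'ID1\talpha\nID2\tbeta\nID3\tgamma\nID4\tdelta\n' > "$workdir/data.txt"

# -F treats each pattern as a fixed string (no regex),
# -f reads the patterns from a file, one per line.
matched=$(grep -F -f "$workdir/keywords.txt" "$workdir/data.txt")
printf '%s\n' "$matched"
```

Because the keywords are matched as plain substrings, an ID such as ID2 would also match a line containing ID20; adding `-w` restricts matches to whole words if that matters for the data at hand.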

Continue reading »



Removing version information in IDs

Written by bonohu in shell on Fri 31 August 2018.

Identifiers (IDs) in public databases often contain version information, for example .16 in ENSG00000100644.16 from Ensembl and .1 in NM_001243084.1 from RefSeq. Such version information can be an obstacle to joining entries from different databases, so it should be trimmed before joining. The file that contains such …
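The method used in the full post is cut off in this excerpt, so the following is only one possible sketch, assuming a file with one ID per line and a version suffix made of a dot followed by digits at the end of the line. The file name ids.txt and the variable names are hypothetical.

```shell
# The two example IDs come from the text above; ids.txt is a made-up file name.
workdir=$(mktemp -d)
printf 'ENSG00000100644.16\nNM_001243084.1\n' > "$workdir/ids.txt"

# Remove a trailing ".<digits>" from each line.
trimmed=$(sed -E 's/\.[0-9]+$//' "$workdir/ids.txt")
printf '%s\n' "$trimmed"
```

If the ID sits in the first column of a wider table, the pattern would need to be anchored to that column instead of the end of the line.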

Continue reading »


Local reverse complement

Written by bonohu in shell on Sun 29 July 2018.

Getting the reverse complement of a specific DNA string is frequently needed in molecular biology. There are some web interfaces that do this, but using them is not secure. Calculating the reverse complement locally is the ideal solution to that issue.

While searching GitHub for example code of practical use, I …
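The code the author found is not shown in this excerpt. As an independent minimal sketch, a reverse complement can be computed locally with the standard `rev` and `tr` commands, assuming the sequence is a single line containing only A, C, G and T (upper or lower case). The sequence and variable names are made up for illustration.

```shell
# Hypothetical input sequence.
seq='ATGCCG'

# rev reverses the string; tr swaps each base for its complement.
revcomp=$(printf '%s\n' "$seq" | rev | tr 'ACGTacgt' 'TGCAtgca')
printf '%s\n' "$revcomp"
```

Ambiguity codes such as N or R are passed through unchanged by this sketch, which may or may not be what is wanted.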

Continue reading »


Moving to cleanly installed High Sierra

Written by bonohu in shell on Thu 12 July 2018.

Because of a mechanical problem with my main machine, a MacBook Pro, I am moving to a new MacBook Pro with a clean install of High Sierra (10.13.6). Below is a log so that I can reproduce the setup in the future...

First of all, the default shell was changed to /bin/zsh with the chsh command.

After installing Homebrew, coreutils …

Continue reading »



Join files by key

Written by bonohu in shell on Sun 25 March 2018.

Joining two files by the key in their first column can be done easily with the UNIX command below.

join -j 1 file1.txt file2.txt

This is a very useful command, but the output is space-delimited by default. To get tab-delimited output, the following option for …
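The option the author goes on to name is cut off in this excerpt. One standard way to get tab-delimited output, shown here as a sketch and not necessarily the author's choice, is the `-t` option of join with a literal tab character. The sample files and variable names are hypothetical, and both inputs are assumed to be sorted on the first column, as join requires.

```shell
# Hypothetical tab-separated inputs, already sorted by the key in column 1.
workdir=$(mktemp -d)
printf 'ID1\talpha\nID2\tbeta\nID3\tgamma\n' > "$workdir/file1.txt"
printf 'ID1\t10\nID3\t30\nID4\t40\n' > "$workdir/file2.txt"

# -t sets the field separator for both input and output; here a literal tab.
tab=$(printf '\t')
joined=$(join -t "$tab" -j 1 "$workdir/file1.txt" "$workdir/file2.txt")
printf '%s\n' "$joined"
```

Only keys present in both files (ID1 and ID3 here) appear in the output. Unsorted input should be passed through `sort` on the key column first.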

Continue reading »


Retrieve a subset of sequence dataset

Written by bonohu in shell on Sat 24 March 2018.

Several commands can be used to extract a set of sequences from a FASTA-formatted file (either nucleotide or peptide). In recent years, I have regularly used blastdbcmd from the NCBI BLAST suite. To run this command, the file must be indexed by makeblastdb with the option below …

Continue reading »


uniq -c option

Written by bonohu in shell on Fri 23 March 2018.

When I wanted to count how many times each word occurs in a file (hoge.txt), I used simple Perl code like this (count.pl).

#!/usr/bin/perl
while(<>) {
        my($word) = split;
        $num{$word}++;
}
foreach (sort keys %num) {
        print "$_\t$num …
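The rest of the post is cut off here, but its title points to `uniq -c`. A minimal sketch of the same count done with `sort` and `uniq -c`, assuming hoge.txt holds one word per line; the sample words and variable names are made up. The final `awk` step is my addition to print the word first and the count second, separated by a tab.

```shell
# Hypothetical sample file with repeated words, one per line.
workdir=$(mktemp -d)
printf 'apple\nbanana\napple\ncherry\napple\nbanana\n' > "$workdir/hoge.txt"

# uniq -c only counts adjacent duplicates, so the input is sorted first.
counts=$(sort "$workdir/hoge.txt" | uniq -c | awk '{print $2 "\t" $1}')
printf '%s\n' "$counts"
```

Without the `awk` step, `uniq -c` prints the count first, padded with leading spaces, followed by the word.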

Continue reading »