Written by bonohu in shell on Fri 19 April 2019.
grep has various options.
I frequently use the -v option to keep only the lines that do not contain a keyword.
Here, data.txt is the data to be searched with grep.
With this option, we can filter out lines containing the keyword human.
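As a minimal sketch of the -v filter described above, the commands below build a small data.txt and drop every line that contains human. The file contents are made up for illustration and are not taken from the original post.

```shell
# Work in a throwaway directory so nothing existing is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data; the real data.txt is not shown in the post.
printf 'gene1\thuman\ngene2\tmouse\ngene3\thuman\ngene4\tfly\n' > data.txt

# -v inverts the match: print only the lines that do NOT contain "human".
filtered=$(grep -v human data.txt)
printf '%s\n' "$filtered"
```

Only the mouse and fly lines should remain in the output.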
Another example is to filter the lines which have a value …
Continue reading »
Written by bonohu in shell on Thu 18 April 2019.
It turned out that a command I regularly use to search large amounts of data is not well known to others.
I frequently use fgrep to search data against a list of keywords (i.e. IDs).
fgrep -f keywords.txt data.txt
where keywords.txt contains a list of keywords ('one keyword …
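The following is a small, hedged sketch of the same kind of search, assuming keywords.txt holds one keyword per line. It uses grep -F, the POSIX spelling of fgrep, and the file contents are invented for illustration.

```shell
# Work in a throwaway directory so nothing existing is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical ID list, one keyword per line (an assumption).
printf 'ID002\nID004\n' > keywords.txt

# Hypothetical data to search.
printf 'ID001\talpha\nID002\tbeta\nID003\tgamma\nID004\tdelta\n' > data.txt

# -F treats each pattern as a fixed string; -f reads the patterns from a file.
hits=$(grep -F -f keywords.txt data.txt)
printf '%s\n' "$hits"
```

Because the patterns are fixed strings, characters such as the dot in versioned IDs are matched literally rather than as regular-expression wildcards.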
Continue reading »
Written by bonohu in shell on Mon 03 September 2018.
When we submit relatively large entries to DDBJ, we use the Mass Submission System (MSS).
We will use MSS for submitting a Transcriptome Shotgun Assembly (TSA) to DDBJ.
After extracting the list of IDs from the header lines of a FASTA file with a command like
% perl -nle 'print $1 if(/^\>(\S+)/)' hoge.fasta > id.txt …
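For readers without Perl at hand, here is a rough sed equivalent of the ID extraction step. It is a swapped-in technique rather than the command used in the post, and the two-record FASTA file is a made-up stand-in for hoge.fasta.

```shell
# Work in a throwaway directory so nothing existing is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical two-record FASTA file standing in for hoge.fasta.
printf '>seq1 first record\nATGC\n>seq2 second record\nGGCC\n' > hoge.fasta

# For header lines only, keep the first whitespace-free token after ">".
sed -n 's/^>\([^[:space:]][^[:space:]]*\).*/\1/p' hoge.fasta > id.txt
cat id.txt
```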
Continue reading »
Written by bonohu in shell on Fri 31 August 2018.
Identifiers (IDs) in public databases often contain version information.
For example, .16 in ENSG00000100644.16 from Ensembl and .1 in NM_001243084.1 from RefSeq are version suffixes.
Such version information can be an obstacle when joining entries from different databases.
So, version information should be trimmed before joining.
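One possible way to trim such suffixes is sketched below with sed. This is not necessarily the approach the post goes on to describe; the file name ids.txt is hypothetical, and the two IDs are the examples quoted above.

```shell
# Work in a throwaway directory so nothing existing is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical ID list, one ID per line.
printf 'ENSG00000100644.16\nNM_001243084.1\n' > ids.txt

# Remove a trailing ".<digits>" version suffix from each line.
trimmed=$(sed 's/\.[0-9][0-9]*$//' ids.txt)
printf '%s\n' "$trimmed"
```

The pattern is anchored to the end of the line, so a dot elsewhere in a line is left untouched.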
The file that contains such …
Continue reading »
Written by bonohu in shell on Sun 29 July 2018.
Getting the reverse complement of a specific DNA string is frequently needed in molecular biology.
There are web interfaces that do this, but they are not secure.
Calculating the reverse complement locally is an ideal solution to that issue.
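As a quick local sketch, the common rev and tr idiom can compute a reverse complement on the command line. This is not the code the post refers to, and the input sequence is arbitrary; it also assumes rev is installed, which is usual on Linux and macOS.

```shell
# Hypothetical input sequence.
seq=ATGCAAG

# rev reverses the string; tr swaps each base for its complement.
revcomp=$(printf '%s\n' "$seq" | rev | tr 'ACGTacgt' 'TGCAtgca')
printf '%s\n' "$revcomp"
```

For ATGCAAG this gives CTTGCAT. Ambiguity codes such as N pass through unchanged because tr only maps the eight listed characters.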
While searching GitHub for practically useful example code,
I …
Continue reading »
Written by bonohu in shell on Thu 12 July 2018.
As my main machine, a MacBook Pro, had a mechanical problem, I am moving to a new MacBook Pro with a clean install of High Sierra (10.13.6). Below is a log so that I can repeat the setup in the future.
First of all, the default shell was changed to /bin/zsh with the chsh command.
After installing Homebrew, coreutils …
Continue reading »
Written by bonohu in shell on Fri 15 June 2018.
When we want to join two files on a shared key into one file, we can use the join command. It joins lines on the value in the first column of tab-delimited files.
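A minimal sketch of such a join follows, using two invented tab-delimited files (a.txt and b.txt are hypothetical names). It sorts both inputs first, since join expects them to be sorted on the join field.

```shell
# Work in a throwaway directory so nothing existing is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical tab-delimited inputs sharing keys in the first column.
printf 'id2\tbeta\nid1\talpha\n' > a.txt
printf 'id1\t10\nid2\t20\n' > b.txt

# join expects both inputs to be sorted on the join field.
sort a.txt > a.sorted.txt
sort b.txt > b.sorted.txt

# Join on the first column, which is the default join field.
joined=$(join a.sorted.txt b.sorted.txt)
printf '%s\n' "$joined"
```

The fields in the result are separated by single spaces, even though the inputs were tab-delimited.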
Although the join command itself is very useful, its default output is not tab-delimited text, but …
Continue reading »
Written by bonohu in shell on Sun 25 March 2018.
Joining two files on the key in their first column can be easily done with the UNIX command below.
join -j 1 file1.txt file2.txt
This is a very useful command, but the output is space-delimited by default.
To get tab-delimited output, the following option for …
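The excerpt is cut off before the option is named, so the sketch below is only one standard way to get tab-delimited output and may differ from the post: the -t option of join, given a literal tab. The contents of file1.txt and file2.txt are invented for illustration.

```shell
# Work in a throwaway directory so nothing existing is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical tab-delimited inputs, already sorted on the first column.
printf 'id1\talpha\nid2\tbeta\n' > file1.txt
printf 'id1\t10\nid2\t20\n' > file2.txt

# A literal tab character, built portably with printf.
tab=$(printf '\t')

# -t sets the field separator used for both input and output.
joined_tab=$(join -t "$tab" -j 1 file1.txt file2.txt)
printf '%s\n' "$joined_tab"
```

Setting the separator explicitly also keeps fields that contain spaces intact, since only tabs are then treated as delimiters.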
Continue reading »
Written by bonohu in shell on Sat 24 March 2018.
Several commands can be used to extract a set of sequences from a FASTA-formatted file (either nucleotide or peptide).
In recent years, I have regularly used blastdbcmd from the NCBI BLAST suite. To run this command, the file must be indexed by makeblastdb with the option below …
Continue reading »
Written by bonohu in shell on Fri 23 March 2018.
When I want to count how often each word is repeated in a file (hoge.txt), I have used simple Perl code like this (count.pl).
#!/usr/bin/perl
while(<>) {
    my($word) = split;
    $num{$word}++;
}
foreach (sort keys %num) {
    print "$_\t$num …
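For comparison, a similar tally can be sketched with awk and sort. This is an alternative, not the author's script: like the Perl code it counts only the first word of each line, but the "word, tab, count" output format is an assumption because the Perl print statement is truncated above. The contents of hoge.txt are invented.

```shell
# Work in a throwaway directory so nothing existing is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input; only the first word of each line is counted.
printf 'apple x\nbanana y\napple z\ncherry\nbanana\napple\n' > hoge.txt

# Tally the first field of every line, then print "word<TAB>count" sorted by word.
counts=$(awk '{ n[$1]++ } END { for (w in n) printf "%s\t%d\n", w, n[w] }' hoge.txt | sort)
printf '%s\n' "$counts"
```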
Continue reading »