Tiny Perl script for formatting TSA entries

When we submit a relatively large set of entries to DDBJ, we use the Mass Submission System (MSS). We will also use MSS to submit Transcriptome Shotgun Assembly (TSA) entries to DDBJ.

After extracting the list of IDs from the FASTA header lines with a command like

% perl -nle 'print $1 if /^>(\S+)/' hoge.fasta > id.txt

the tiny Perl script TSA-SRA-writer.pl below, which reads id.txt on standard input, might be useful for writing the formatted text for DDBJ submission. Note that this script writes only the latter (but huge) body of the formatted text; the user needs to add the header part separately.
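The regular expression in the one-liner keeps only the first whitespace-delimited token after `>`, so a header such as `>contig_1 len=523` yields `contig_1`. Below is a minimal sketch of the same extraction step as a reusable Perl subroutine. The name `extract_ids` is hypothetical, and the duplicate check is an added precaution (an assumption, not a rule stated above), on the reasoning that each ID becomes one entry and its submitter_seqid.

```perl
use strict;
use warnings;

# Hypothetical helper: return the IDs found in FASTA header lines.
# An ID is the first run of non-whitespace characters after '>',
# the same as the one-liner's /^>(\S+)/.
# Dies on a repeated ID (an added precaution, not a documented DDBJ rule).
sub extract_ids {
    my @lines = @_;
    my (@ids, %seen);
    for my $line (@lines) {
        next unless $line =~ /^>(\S+)/;    # skip sequence lines
        my $id = $1;
        die "duplicate ID: $id\n" if $seen{$id}++;
        push @ids, $id;
    }
    return @ids;
}
```

Passing all lines of an open filehandle, as in `extract_ids(<$fh>)`, gives the same list of IDs that the one-liner writes to id.txt.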

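#!/usr/bin/env perl
# TSA-SRA-writer.pl
# STDIN: list of IDs (one per line)
# STDOUT: body part of the formatted text for DDBJ MSS
use strict;
use warnings;

# organism name
my $organism     = "Hoge hoge";
# tissue name
my $tissue_type  = "fuga";

while (my $seqid = <STDIN>) {
        chomp $seqid;
        next if $seqid eq '';    # skip empty lines instead of writing an empty entry
        # source feature (location 1..E) and its qualifiers
        print "$seqid\tsource\t1..E\torganism\t$organism\n";
        print "\t\t\tmol_type\tmRNA\n";
        print "\t\t\ttissue_type\t$tissue_type\n";
        print "\t\t\tsubmitter_seqid\t$seqid\n";
        # definition built from the @@[...]@@ placeholders for organism and submitter_seqid
        print "\t\t\tff_definition\t".'@@[organism]@@ RNA, @@[submitter_seqid]@@'."\n";
}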
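If the intermediate id.txt is not needed, the two steps can be merged into one pass. The sketch below is not the script above but a hypothetical variant: `format_entry` is an illustrative name, and the output lines and placeholder values (Hoge hoge, fuga, mRNA) are copied from TSA-SRA-writer.pl. It returns each entry as a string, which makes the output easy to check, and reads the IDs straight from FASTA header lines with the same regular expression as the one-liner. As before, the header part is left to the user.

```perl
use strict;
use warnings;

# Placeholder values, as in TSA-SRA-writer.pl; replace with real ones.
my $organism    = "Hoge hoge";
my $tissue_type = "fuga";

# Hypothetical helper: the tab-separated body lines (source feature
# and qualifiers) for one sequence ID, returned as a single string.
sub format_entry {
    my ($seqid, $org, $tissue) = @_;
    return join '',
        "$seqid\tsource\t1..E\torganism\t$org\n",
        "\t\t\tmol_type\tmRNA\n",
        "\t\t\ttissue_type\t$tissue\n",
        "\t\t\tsubmitter_seqid\t$seqid\n",
        "\t\t\tff_definition\t" . '@@[organism]@@ RNA, @@[submitter_seqid]@@' . "\n";
}

# Process only when FASTA files are named on the command line,
# taking each ID directly from a header line.
if (@ARGV) {
    while (my $line = <>) {
        next unless $line =~ /^>(\S+)/;
        print format_entry($1, $organism, $tissue_type);
    }
}
```

Saved under a hypothetical name such as TSA-from-fasta.pl, it would be run as `perl TSA-from-fasta.pl hoge.fasta > body.txt`, producing one body block per FASTA record.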

Written by bonohu in shell on Mon 03 Sep 2018.