tag:blogger.com,1999:blog-72237130613429253002024-03-14T08:34:17.408+05:30Bioinformatics Made Simple.comPriyanka Paulhttp://www.blogger.com/profile/17644327799399630455noreply@blogger.comBlogger176125tag:blogger.com,1999:blog-7223713061342925300.post-4927075488444599692020-09-02T23:06:00.007+05:302020-09-02T23:23:56.902+05:30How to create a 3D pie chart in R<div style="line-height: 30px; text-align: justify;">
Pie Chart, represented in the circular chart symbol, is easy to understand complex data. Each section of the circle shows the data value proportions. The Pie charts can be of two-dimensional view or three-dimensional views based upon the taste of the user. In this example, I am going to use R package plotrix to draw a 3D pie chart.
<br/>
<h3>
Prerequisite
</h3>
We need the following R libraries to run the script
<ul>
<li><b> plotrix</b></li>
</ul>
<h3>
File format
</h3>
<ul>
<pre style="border: 1px dashed rgb(153, 153, 153); color: black; font-family: "andale mono", "lucida console", monaco, fixed, monospace; font-size: 12px; line-height: 14px; overflow: auto; padding: 5px; width: 100%;"><code>Category Number of genes Color
Transcription_O 24 #ed2f52
Transport_O 13 #efc023
Catalytic 31 #008080
Phosphorylation 5 #8FBC8B
Cell Wall 7 #AFEEEE
Defense 16 #CD853F
Secondary metabolites 4 #A0522D
Unknown 26 #9ACD32
Miscellaneous 28 #D8BFD8
Uncharacterized 10 #E6E6FA
</code></pre>
</ul>
<h3>
Script
</h3>
<script src="http://gist-it.appspot.com/https://github.com/sanjaysingh765/Generals/blob/master/3Dpie.R"></script>
<h3>
Result
</h3>
<div class="separator" style="clear: both;"><a href="https://1.bp.blogspot.com/-92KFpwUjfYU/X0_Z6iXh3nI/AAAAAAAACII/1lgIqZMO_Xk0aWY6nVUsRvhjjUFUhjhWACLcBGAsYHQ/s1800/diet3d.png" style="display: block; padding: 1em 0; text-align: center;"><img alt="" border="0" width="320" data-original-height="900" data-original-width="1800" src="https://1.bp.blogspot.com/-92KFpwUjfYU/X0_Z6iXh3nI/AAAAAAAACII/1lgIqZMO_Xk0aWY6nVUsRvhjjUFUhjhWACLcBGAsYHQ/s320/diet3d.png"/></a></div>
<div style="border-bottom: 1px solid rgb(204, 204, 204); border-top: 1px solid rgb(204, 204, 204); color: #a5a4a4; font-style: italic; font-weight: bold; margin: 30px; padding: 30px; text-align: center;">How to add function descriptions to FASTA sequences <a href="http://bioinformatics-made-simple.blogspot.com/2019/08/how-to-add-function-descriptions-to.html">HERE</a></div>
</div></div>संजयhttp://www.blogger.com/profile/13208510103131669624noreply@blogger.com0tag:blogger.com,1999:blog-7223713061342925300.post-90587733496144999012020-04-09T00:11:00.001+05:302020-09-02T23:10:34.250+05:30KEGG Sequence Downloader : retrieve gene sequences in Fasta format from KEGG database<div dir="ltr" style="text-align: left;" trbidi="on"><a name='more'></a>
<div style="line-height: 30px; text-align: justify;">
I wanted to download the gene sequence of <a href="http://www.bioinformatics-made-simple.com/search/label/NCBI">tobacco from NCBI</a>. Since NCBI also contains the isoform and some other unwanted genes, therefore I choose to get it from KEGG. Although KEGGREST is a wonderful <a href="http://www.bioinformatics-made-simple.com/search/label/R">R package</a> to retrieve the data from KEGG, but it limits the retrieval. The following bash script can help to download the thousands of sequences in a single go without any limitation. Although this is a crude solution and there must be an efficient way to do it but it worked for me. Basically, this bash script works in three steps:<br />
<ul>
<li>Split IDs in a given chunk </li>
<li>Download fasta sequences as HTML file </li>
<li> Clean HTML file and save the result
</li>
</ul>
<h3>
Uses</h3>
<pre class="brush:perl">bash KEGG_sequence_downloader.sh query_file number_of_sequence
</pre>
<div style="border-bottom: 1px solid #ccc; border-top: 1px solid #ccc; color: #a5a4a4; font-style: italic; font-weight: bold; margin: 30px; padding: 30px; text-align: center;">
How to download only viridiplantae miRNA from miRBase <a href="http://www.bioinformatics-made-simple.com/2016/01/how-to-download-only-viridiplantae.html">HERE</a></div>
<h3>
Script</h3>
<br />
<center>
<table border="1px" style="width: 100%px;">
<tbody>
<tr>
<th align="middle" bgcolor="#db3341"><span style="color: #f2edf2;">Script name</span></th>
<th align="middle" bgcolor="#db3341"><span style="color: #f2edf2;">Download</span></th>
</tr>
<tr>
<td align="middle" bgcolor="#e9e9f2"><span style="color: #0e040f; font-size: large;">KEGG_sequence_downloader.sh</span></td>
<td align="middle"><a href="https://github.com/sanjaysingh765/KEGG_sequence_downloader" target="_blank"><img src="https://lh5.googleusercontent.com/-Xe4gdBTZel0/UWyDos_-aGI/AAAAAAAABK4/oIzSC2EyOdo/s144/readmore1.png" /></a></td>
</tr>
</tbody></table>
</center>
</div>
</div>
संजयhttp://www.blogger.com/profile/13208510103131669624noreply@blogger.com0tag:blogger.com,1999:blog-7223713061342925300.post-82428318237574572882020-01-22T06:22:00.004+05:302020-09-02T23:15:15.027+05:30Easiest way to find number of cluster in gene expression data<div dir="ltr" style="text-align: left;" trbidi="on"><a name='more'></a>
<div style="line-height: 30px; text-align: justify;">
Gene clustering is a common method to find the groups of the gene with similar expression patterns. However, it is not always easy to decide the number of clusters in the whole datasets. The following R script uses the most popular methods for determining the optimal clusters. This R script uses "<i><b>TF_average.csv</b></i>" as input and saves the result as "<i><b>optial_cluster.png</b></i>".<br />
<br />
<div style="border-bottom: 1px solid #ccc; border-top: 1px solid #ccc; color: #a5a4a4; font-style: italic; font-weight: bold; margin: 30px; padding: 30px; text-align: center;">
How to perform Non-metric multidimensional scaling (NMDS) analysis script <a href="http://www.bioinformatics-made-simple.com/2017/10/how-to-perform-non-metric.html">HERE</a></div>
<h3>
Prerequisite
</h3>
We need the following R libraries to run the script
<br />
<ul>
<li><b>factoextra</b></li>
<li><b>NbClust</b></li>
</ul>
<h3>
optimal_cluster_finder.R
</h3>
<br />
<script src="http://gist-it.appspot.com/https://github.com/sanjaysingh765/Generals/blob/master/optimal_cluster_finder.R"></script>
</div>
</div>
संजयhttp://www.blogger.com/profile/13208510103131669624noreply@blogger.com0tag:blogger.com,1999:blog-7223713061342925300.post-48649392769691949892020-01-18T00:09:00.002+05:302020-01-22T06:26:23.764+05:30Easiest way to calculate Ka Ks ratio and divergence time<div dir="ltr" style="text-align: left;" trbidi="on"><a name='more'></a>
<div style="line-height: 30px; text-align: justify;">
The Ka/Ks ratio is used to estimate the nature of evolution among neutral, purifying selection and beneficial mutations acting on a set of homologous protein-coding genes. Although there are several wonderful programs out there to calculate the Ka, Ks and Ka/Ks ratio but <i><b>kaks</b></i> function of R package <i><b>seqinr</b></i> is the easiest way to do it. The function<i><b> kaks</b></i> use the method published by Li et al (J. Mol. Evol., 36:96-99, 1993) to calculate the Ka, Ks and Ka/Ks ratio.
In the following script take aligned the <a href="http://www.bioinformatics-made-simple.com/2012/07/perl-script-7-how-to-extract-fasta.html" target="_blank">fasta </a>sequences in the clustal format and, finally, R package <i><b>seqinr</b></i> will be used to calculate the Ka, Ks and Ka/Ks ratio.
<br />
<div style="border-bottom: 1px solid #ccc; border-top: 1px solid #ccc; color: #a5a4a4; font-style: italic; font-weight: bold; margin: 30px; padding: 30px; text-align: center;">
Cheat Sheet to Install and work with R on Ubuntu <a href="http://www.bioinformatics-made-simple.com/2019/02/cheat-sheet-to-install-and-work-with-r.html">HERE</a></div>
<h3>
Prerequisite
</h3>
We need the following R libraries to run the script
<br />
<ul>
<li><b>seqinr</b></li>
<li><b>ape</b></li>
<li><b>phangorn</b></li>
</ul>
<h3>
Ka/Ks_calculator.R
</h3>
<br />
<script src="http://gist-it.appspot.com/https://github.com/sanjaysingh765/Generals/blob/master/Ka_Ks_calculator.R"></script>
</div>
</div>
संजयhttp://www.blogger.com/profile/13208510103131669624noreply@blogger.com0tag:blogger.com,1999:blog-7223713061342925300.post-70732442816912500382019-09-10T21:47:00.003+05:302020-01-18T01:06:14.319+05:30Draw a heatmap with Custom Symbol in Cell<div dir="ltr" style="text-align: left;" trbidi="on"><a name='more'></a>
<div style="line-height: 30px; text-align: justify;">
<div class="separator" style="clear: both; text-align: center;">
<a href="https://2.bp.blogspot.com/-38G7sdMwzgM/XXfIq6T6S6I/AAAAAAAAB8Q/tWMVe6Ro_jwJu6feQ4y8xXgEh4XSocmXACLcBGAs/s1600/Expression.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="750" height="256" src="https://2.bp.blogspot.com/-38G7sdMwzgM/XXfIq6T6S6I/AAAAAAAAB8Q/tWMVe6Ro_jwJu6feQ4y8xXgEh4XSocmXACLcBGAs/s320/Expression.png" width="320" /></a></div>
<a href="http://www.bioinformatics-made-simple.com/2018/04/how-to-make-heatmap-with-multiple.html" target="_blank">Heatmap </a>is a good way to save some space when you want to compose a figure with lots of panels. I got some gene expression data which were supposed to insert in a big figure. Although I can easily create a bar graph for that, but I choose the draw a heatmap to save some space. The easiest way was to create this <a href="http://www.bioinformatics-made-simple.com/2014/07/easiet-way-to-create-heat-map-in-excel.html" target="_blank">heatmap by the Excel </a>but I choose ggplot2 R package to draw the heatmap because it was easy to handle the big data and customize the annotation. My aim was to draw the heatmap and annotate the cell where the difference of gene expression is statistically significantly from the control. I choose star (*) to show the cells which are significantly different from control.<br />
<h3>
Prerequisite </h3>
We need the following R libraries to run the script<br />
<ul>
<li>ggplot2 </li>
<li>plyr </li>
<li>scales
</li>
</ul>
</div>
<div style="border-bottom: 1px solid #ccc; border-top: 1px solid #ccc; color: #a5a4a4; font-style: italic; font-weight: bold; margin: 30px; padding: 30px; text-align: center;">
Cheat Sheet to Install and work with R on Ubuntu <a href="http://www.bioinformatics-made-simple.com/2019/02/cheat-sheet-to-install-and-work-with-r.html">HERE</a></div>
<h3>
Sample data</h3>
<br />
<script src="http://gist-it.appspot.com/https://github.com/sanjaysingh765/Generals/blob/master/expression.csv"></script>
<br />
<h3>
ggplot2_heatmap.R</h3>
<br />
<script src="http://gist-it.appspot.com/https://github.com/sanjaysingh765/Generals/blob/master/ggplot2_heatmap.R"></script>
</div>
संजयhttp://www.blogger.com/profile/13208510103131669624noreply@blogger.com0tag:blogger.com,1999:blog-7223713061342925300.post-71373202451490765042019-08-28T23:36:00.002+05:302019-09-10T21:48:38.866+05:30How to add function descriptions to FASTA sequences<div dir="ltr" style="text-align: left;" trbidi="on"><a name='more'></a>
<div style="line-height: 30px; text-align: justify;">
Short descriptions in fasta sequence help us to quickly gain insight into important information about a sequence. Automatic assignment of Human Readable Descriptions (AHRD) is popularly used to add select descriptions and Gene Ontology terms that are concise, informative and precise. This bash script run DIAMOND to search homology to three different databases (TAIR, uniprot_sprot, and uniprot_trembl), then run AHRD and, finally perform the addition of new information fasta description.<br />
<br />
<h3>
USES </h3>
<br />
<blockquote class="tr_bq">
bash run_AHRD.sh fasta_input
</blockquote>
<div style="border-bottom: 1px solid #ccc; border-top: 1px solid #ccc; color: #a5a4a4; font-style: italic; font-weight: bold; margin: 30px; padding: 30px; text-align: center;">
Extract Part of a FASTA Sequences with Position by python script <a href="http://www.bioinformatics-made-simple.com/2013/10/actually-i-have-hundreds-of-protein.html">HERE</a></div>
<h3>
Dependencies</h3>
<br />
<div>
<ul>
<li>Install AHRD</li>
<li>Install DIAMOND (move to <b>dist</b> directory)</li>
<li>Download database sequences for TAIR, uniprot_sprot, and uniprot_trembl. make the DIAMOND blast database and name them uniprot_sprot, arabidopsis, and uniprot_trembl and move the file to a new directory<b> database.</b></li>
<li>Download and decompress the <b>resources.tar </b>in the working directory<b>.</b></li>
</ul>
</div>
</div>
<br />
<h3>
run_AHRD.sh</h3>
<br />
<script src="http://gist-it.appspot.com/https://github.com/sanjaysingh765/Generals/blob/master/run_AHRD.sh"></script>
</div>
संजयhttp://www.blogger.com/profile/13208510103131669624noreply@blogger.com0tag:blogger.com,1999:blog-7223713061342925300.post-50249242143818771202019-08-26T21:34:00.001+05:302019-08-28T23:37:48.186+05:30How to rename fasta headers according to a matching name list<div dir="ltr" style="text-align: left;" trbidi="on"><a name='more'></a>
<div style="line-height: 30px; text-align: justify;">
<a href="http://http//www.bioinformatics-made-simple.com/2012/02/fabox-best-free-fasta-file-editor.html">FaBox</a> has several utilities to manipulate the FASTA sequence. I wanted to replace the FASTA header with the new header or description which are saved in a file. Although I can do it with FaBox, but it handles difficult when the number of sequences is huge. This PERL script will rename the fasta sequence as per store in another file.<br />
<h3 style="text-align: left;">
Header</h3>
Header and new FASTA header should be separated by TAB
<br />
<pre style="border: 1px dashed #999999; color: black; font-family: "andale mono" , "lucida console" , "monaco" , "fixed" , monospace; line-height: 14px; overflow: auto; padding: 5px; width: 100%;"><code>M54089d protein1
M54089c protein2
M54089b protein3
M54089a protein4
</code></pre>
<br />
<h3 style="text-align: left;">
Sequence </h3>
FASTA should be in one line
<br />
<div style="border-bottom: 1px solid #ccc; border-top: 1px solid #ccc; color: #a5a4a4; font-style: italic; font-weight: bold; margin: 30px; padding: 30px; text-align: center;">
Convert Multi line Fasta file into a Single line FASTA File <a href="http://www.bioinformatics-made-simple.com/2012/02/perl-script-2-convert-multi-fasta-file.html">HERE</a></div>
<pre style="background-color: #eeeeee; border: 1px dashed #999999; color: black; font-family: "andale mono" , "lucida console" , "monaco" , "fixed" , monospace; font-size: 12px; line-height: 14px; overflow: auto; padding: 5px; width: 100%;"><code>>M54089d
MEQCRQGSRQNGSVTSGKGLALRAGHGGPSPEPVGCRWTARAAPAARAGRRVPAGGRTGNGSFGGLPRASHSQLRTGTDKGNPTV
>M54089c
MINFDHLFACLHGHYGEVENKLKCILHYFGRICSSMPLGYVSFERKVLSLECTPSCIPYPKEKAWSQSNISLCPIEITISGLIEDQSREAIEVDFANMYLGGGALVRGCVQQEEIRFMINPELIAGMLFLPCMADNEAVEIVGTERFSSYTGRLTKHFVASWINSSVISINSFSKMMASWDFNMIKMLKTPVEGPLLIFCRLVILQLHLKKLRKHRKTS
>M54089b
MIGRADIEGSKSNVAMNAWLHKPVIPVVTFLTPLASNSEGLKIVRPRFHGSYSYWKSESNELLPSVPHEISVRVELILGHLRYLLTDVPPQPNSPPDNVFRRIGLQASLGSKKRGSAPLPLHGISKITLEVVVFHFRLSAPTYTTPLKSFTKSD
>M54089a
MNGLTRFHCPCLLSSETTAKGTGLAESAGKEDPVELDSSRLCEMT
</code></pre>
<br />
<h3 style="text-align: left;">
Script </h3>
This PERL script will ask for header list and FASTA sequences (file format given above) and save the FASTA file with new header in <b>result.fasta</b>
<br />
<script src="http://gist-it.appspot.com/https://github.com/sanjaysingh765/Generals/blob/master/fasta_header_replace.pl"></script>
If you are working with unix based system, then this AWK one-liner will be very useful
<pre style="font-family: Andale Mono, Lucida Console, Monaco, fixed, monospace; color: #000000; font-size: 12px;border: 1px dashed #999999;line-height: 14px;padding: 5px; overflow: auto; width: 100%"><code>awk 'FNR==NR{ a[">"$1]=$2;next}$1 in a{ sub(/>/,">"a[$1]"|",$1)}1' header_list.txt sequence.fasta
</code></pre>
</div>
</div>
संजयhttp://www.blogger.com/profile/13208510103131669624noreply@blogger.com0tag:blogger.com,1999:blog-7223713061342925300.post-41065410271734336862019-08-16T20:06:00.003+05:302019-08-26T21:43:21.918+05:30How to get gene expression value from Arrayexpress<div dir="ltr" style="text-align: left;" trbidi="on"><a name='more'></a>
<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="line-height: 30px; text-align: justify;">
ArrayExpress has a wonderful <a href="https://bioconductor.org/packages/release/bioc/html/ArrayExpress.html" target="_blank">R package </a>for data search, download and analysis but it doesn't always work in perfection. Therefore it always good to have an alternative to download the raw data/CEL and analyze it. This R script will simply take the accession number in command line argument and save the expression data in a file named as <span style="color: #008800; font-family: "bitstream vera sans mono" , "courier" , monospace; font-size: 16px; text-align: left;">Gene_expression.txt.</span><br />
<blockquote class="tr_bq">
<span style="color: #a5a4a4; font-style: italic; font-weight: bold; text-align: center;">How to download expression data set from NCBI GEO </span><a href="http://www.bioinformatics-made-simple.com/2018/03/how-to-download-expression-data-set.html" style="font-style: italic; font-weight: bold; text-align: center;">HERE</a></blockquote>
</div>
</div>
<h3 style="text-align: left;">
Prerequisite </h3>
<br />
We need the following R libraries to run the script<br />
<ul style="text-align: left;">
<li>ArrayExpress </li>
<li>aff</li>
</ul>
<h3 style="text-align: left;">
Uses </h3>
<br />
<blockquote class="tr_bq">
Rscript script_name accession_number
</blockquote>
<h3 style="text-align: left;">
Limitations </h3>
<div>
<ul style="text-align: left;">
<li>This script is written specifically for Arabidopsis data sets, therefore, you have to modify it as per your requirement. </li>
</ul>
</div>
<br />
<h3 style="text-align: left;">
Script </h3>
<br />
<script src="http://gist-it.appspot.com/https://github.com/sanjaysingh765/Generals/blob/master/Arrayexpress_analysis.R"></script>
</div>
संजयhttp://www.blogger.com/profile/13208510103131669624noreply@blogger.com0tag:blogger.com,1999:blog-7223713061342925300.post-64350652025183479012019-07-25T17:49:00.001+05:302019-08-16T20:10:22.854+05:30Easiest way to download multiple sequences from NCBI<div dir="ltr" style="text-align: left;" trbidi="on"><a name='more'></a>
<div style="line-height: 30px; text-align: justify;">
NCBI and me have shared several tricks to download large set of sequence from database <a href="https://www.ncbi.nlm.nih.gov/guide/howto/dwn-records/">HERE</a> and <a href="http://www.bioinformatics-made-simple.com/2012/07/some-easy-ways-to-download-multiple.html">HERE</a>, respectively. In this post. I am going to share another easy way to download multiple sequences from NCBI. This script will take the file accession list ( one accession number in each line) and download sequence in individual files. Finally, concatenate those files in a single <a href="http://www.bioinformatics-made-simple.com/2012/02/perl-script-2-convert-multi-fasta-file.html">multiline fasta</a> file and delete them.
<div style="text-align: center; font-style:italic; color: #a5a4a4; font-weight:bold; border-top: 1px solid #ccc; border-bottom: 1px solid #ccc; padding:30px; margin:30px; ">How to BLAST multiple sequences against NCBI database using PERL script <a href="http://www.bioinformatics-made-simple.com/2012/02/perl-script-4-how-to-blast-multiple.html">HERE</a></div>
<pre class="brush:perl">
#!/bin/bash
while read i; do curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=${i}&rettype=fasta&retmode=txt">$i.fasta; done < id.list
#find . -name '*.fasta' -exec cat {} \; >protein.fas
cat *.fasta >protein.fas
rm *.fasta
</pre>
<br />
</div>
</div>संजयhttp://www.blogger.com/profile/13208510103131669624noreply@blogger.com0tag:blogger.com,1999:blog-7223713061342925300.post-49368157760814064662019-02-24T00:48:00.001+05:302019-08-26T21:43:14.612+05:30How to perform parallel BLAST<div dir="ltr" style="text-align: left;" trbidi="on"><a name='more'></a>
<div style="line-height: 30px; text-align: justify;">
<div class="separator" style="clear: both; text-align: center;">
<a href="http://2.bp.blogspot.com/-zGRIEYyZs9A/TrA2w4NOvZI/AAAAAAAAACY/qYDBWNdtZQw/s1600/ncbi_blast.gif" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="100" data-original-width="200" src="https://2.bp.blogspot.com/-zGRIEYyZs9A/TrA2w4NOvZI/AAAAAAAAACY/qYDBWNdtZQw/s1600/ncbi_blast.gif" /></a></div>
<a href="http://www.bioinformatics-made-simple.com/2011/11/sequence-similarity-search-i-basic.html" target="_blank">BLAST</a> can be time-consuming especially when it includes a large number of the query sequence. Therefore, parallel BLAST can be useful. This Bash script can perform parallel BLASt in three common steps<br />
<ul>
<li>Split fasta file into files with a given number of sequences </li>
<li>BLAST them in parallel fashion</li>
<li>Combine their result</li>
</ul>
Since it doesn't include any additional software except <a href="http://www.bioinformatics-made-simple.com/2012/06/how-to-install-ncbi-blast-on-window-7.html" target="_blank">BLAST+</a>, therefore, it is easy to use. If you want to parse your BLAST result you can always use <a href="http://www.bioinformatics-made-simple.com/2012/07/ncbi-blast-parser-extract-query-and.html" target="_blank">NCBI BLAST parser from here</a>.
<h2>
<b>USES</b></h2>
This bash script require following commands<br />
<b>script_name: </b>Name of given script<br />
<b>blast_type: </b>What kind of BLAST you want to perform e.g. blastn<br />
<b>query_file: </b>Name of query file<br />
<b>database_name: </b>Name of the database<br />
<b>number_of_sequence_each_file: </b>how many sequences you want per query file
<h2>SCRIPT</h2>
<pre class="brush:perl">#!/bin/bash
########################################################################################################################
#
# split fasta file into files with given number of sequences
# blast them in parallel fashion
# combine their result
########################################################################################################################
prog="$1"
query="$2"
database="$3"
num_seq="$4"
if [[ $# -lt 3 ]] ; then
printf "\033[1;31mGive me a proper command\033[0m\n"
printf "\033[1;31mUsage: script_name blast_type query_file database name number_of_sequence_each_file\033[0m\n\n"
exit;
else
start=`date +%s`
#split fasta sequences in given number of sequences per file
awk -v a_seq=$num_seq 'BEGIN {n_seq=0;} /^>/ {if(n_seq%a_seq==0){file=sprintf("myseq%d.fa",n_seq);} print >> file; n_seq++; next;} { print >> file; }' < $query
#run blast
ls *.fa | parallel -a - $prog -query {} -db $database -out {.}.out -evalue 0.001 -num_descriptions 1 -num_alignments 1 -num_threads 8
cat *.out >$query.blast.result
while true; do
read -p "Do you want to delete intermediate files?" yn
case $yn in
[Yy]* ) rm -f *.out *.fa; break;;
[Nn]* ) exit;;
* ) echo "Please answer yes or no.";;
esac
done
runtime=$((end-start))
printf "\033[1;31mYour analysis was done in $((($(date +%s)-$start)/60)) minutes\033[0m\n"${reset}
printf "\033[1;31mTHANKS\033[0m\n"${reset}
fi
</pre>
<br /></div>
</div>
संजयhttp://www.blogger.com/profile/13208510103131669624noreply@blogger.com0tag:blogger.com,1999:blog-7223713061342925300.post-71844244573174273882019-02-08T22:09:00.000+05:302019-11-06T02:17:05.725+05:30Cheat Sheet to Install and work with R on Ubuntu<div dir="ltr" style="text-align: left;" trbidi="on"><a name='more'></a>
<div style="line-height: 30px; text-align: justify;">
Tips and tricks to Install and work with R on Ubuntu<br />
<h3>
Run these commands in terminal</h3>
<pre class="brush:perl">#Update and Install R
sudo apt-get update
sudo apt-get install r-base r-base-dev
sudo apt-get upgrade
# Know R version
R --version
</pre>
<div style="border-bottom: 1px solid #ccc; border-top: 1px solid #ccc; color: #a5a4a4; font-style: italic; font-weight: bold; margin: 30px; padding: 30px; text-align: center;">
Extract Part of a FASTA Sequences with Position by python script <a href="http://www.bioinformatics-made-simple.com/2013/10/actually-i-have-hundreds-of-protein.html">HERE</a></div>
<h3>
Run these commands in <span style="color: red;">R</span> terminal</h3>
<pre class="brush:perl"># Know R version
sessionInfo()
# Know version of specific packages
packageVersion("ggplot2")
# check installed packages
installed.packages()
# list all packages where an update is available
old.packages()
# update all available packages
update.packages()
# update, without prompts for permission/clarification
update.packages(ask = FALSE)
update.packages(checkBuilt = TRUE, ask = FALSE, repos = "https://cran.rstudio.com")
# update only a specific package use install.packages()
install.packages("ggplot2")
# install packages from bioconductor
source("https://bioconductor.org/biocLite.R")
biocLite("ComplexHeatmap")
# install multiple packages from bioconductor
source("https://bioconductor.org/biocLite.R")
biocLite("ComplexHeatmap", "ggplot2")
# install packages from github
library(devtools)
install_github("jokergoo/ComplexHeatmap")
# start library
library('ggplot2')
</pre>
</div>
</div>
संजयhttp://www.blogger.com/profile/13208510103131669624noreply@blogger.com0tag:blogger.com,1999:blog-7223713061342925300.post-70778914147779390702019-01-28T00:06:00.005+05:302019-02-24T00:52:42.551+05:30Get multiple strings from a file and replace them in another file with AWK<div dir="ltr" style="text-align: left;" trbidi="on"><a name='more'></a>
<div style="line-height: 30px; text-align: justify;">
<div style="text-align: center;">
<h3 style="text-align: left;">
Get multiple strings from a file and replace them in another file</h3>
</div>
I have multiple strings (old strings) and their sustitution (new strings) in tab limited format in a file named as <b>string</b>. I want to replace them in another file named as <b>Inputfile </b>and save in <b>Outfile. </b></div>
</div>
<h3>
Script</h3>
<pre class="brush:perl">awk '
NR==FNR {
old[NR] = $1
gsub(/&/,RS,$2)
new[NR] = $2
next
}
{
for (i=1; i in old; i++) {
gsub(old[i],new[i])
}
gsub(RS,"\\&")
print
}
' string Inputfile >Outfile
</pre>
<br />
<div style="border-bottom: 1px solid #ccc; border-top: 1px solid #ccc; color: #a5a4a4; font-style: italic; font-weight: bold; margin: 30px; padding: 30px; text-align: center;">
Extract Part of a FASTA Sequences with Position by python script <a href="http://www.bioinformatics-made-simple.com/2013/10/actually-i-have-hundreds-of-protein.html">HERE</a></div>
<h3>
Terminal screenshot</h3>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://2.bp.blogspot.com/-6T4G-GOF6ks/XEsoHddqJRI/AAAAAAAAB4Y/8wt1Jz97mXQzrvo_Mz8v1x4d5lCEu3WPgCLcBGAs/s1600/string.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="506" data-original-width="774" src="https://2.bp.blogspot.com/-6T4G-GOF6ks/XEsoHddqJRI/AAAAAAAAB4Y/8wt1Jz97mXQzrvo_Mz8v1x4d5lCEu3WPgCLcBGAs/s1600/string.png" /></a></div>
<br />संजयhttp://www.blogger.com/profile/13208510103131669624noreply@blogger.com0tag:blogger.com,1999:blog-7223713061342925300.post-36702693550559951562019-01-25T00:18:00.000+05:302019-08-16T20:09:41.461+05:30Examples For Sed Linux Command In Text Manipulation and File Handling<div dir="ltr" style="text-align: left;" trbidi="on"><a name='more'></a>
<div style="line-height: 30px; text-align: justify;">
<br />
<script language="JavaScript">
function show(c) {
if (document.getElementById && document.getElementById(c)!= null)
node = document.getElementById(c).style.display='';
else if (document.layers && document.layers[c]!= null)
document.layers[c].display = '';
else if (document.all)
document.all[c].style.display = '';
}
function hide(c) {
if (document.getElementById && document.getElementById(c)!= null)
node = document.getElementById(c).style.display='none';
else if (document.layers && document.layers[c]!= null)
document.layers[c].display = 'none';
else if (document.all)
document.all[c].style.display = 'none';
}
</script>
<center><a href="" class="myButton">Click on red strip to expand it </a></center><br/>
<style>
.myButton {
-moz-box-shadow: 3px 4px 0px 0px #8a2a21;
-webkit-box-shadow: 3px 4px 0px 0px #8a2a21;
box-shadow: 3px 4px 0px 0px #8a2a21;
background-color:#c62d1f;
-moz-border-radius:18px;
-webkit-border-radius:18px;
border-radius:18px;
border:1px solid #d02718;
display:inline-block;
cursor:pointer;
color:#ffffff;
font-family:Times New Roman;
font-size:17px;
font-weight:bold;
padding:7px 25px;
text-decoration:none;
text-shadow:0px 1px 0px #810e05;
}
.myButton:hover {
background-color:#f24437;
}
.myButton:active {
position:relative;
top:1px;
}
</style>
<br />
<table>
<tbody>
<tr><td align="middle" bgcolor="#db3341" span="" style="color: #f2edf2; font-size: 16; font-weight: bold;"><div onclick="show('1')">
1. Replace all occurrence </div>
</td></tr>
</tbody></table>
<div id="1" style="display: none;">
1. Replace all occurrence <b>Ram</b> in <b><gwmw class="ginger-module-highlighter-mistake-type-1" id="gwmw-15484243819174273741082">Inputfile</gwmw></b> with <b>Shyam</b> and save the result in <b>Outfile</b>
<br />
<pre class="brush:perl">sed 's/Ram/Shyam/' Inputfile >Outfile</pre></div>
<table>
<tbody>
<tr><td align="middle" bgcolor="#db3341" span="" style="color: #f2edf2; font-size: 16; font-weight: bold;"><div onclick="show('2')">
2. Replace all occurrence of multiple string </div>
</td></tr>
</tbody></table>
<div id="2" style="display: none;">
2. Replace all occurrence <b>Ram</b> and <b>Sita</b> in <b><gwmw class="ginger-module-highlighter-mistake-anim ginger-module-highlighter-mistake-type-1" id="gwmw-15484243826440093207744">Inputfile</gwmw></b> with <b>Shyam</b> and <b>Geeta</b> respectively, and save the result in <b>Outfile</b>
<br />
<pre class="brush:perl">sed -e 's/Ram/Shyam/; s/Sita/Geeta/' Inputfile >Outfile</pre>
</div>
<table>
<tbody>
<tr><td align="middle" bgcolor="#db3341" span="" style="color: #f2edf2; font-size: 16; font-weight: bold;"><div onclick="show('3')">
3. Reading sed commands from a file </div>
</td></tr>
</tbody></table>
<div id="3" style="display: none;">
3. <b>Reading Commands From a File</b>: I have multiple strings saved in <b>Inputfile</b> and want to replace them with multiple strings saved in a file <b>string</b> and save the in <b>Outfile</b>
<br />
<pre class="brush:perl">sed -f string Inputfile >Outfile</pre>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://4.bp.blogspot.com/-LkDW7CPVkUM/XEoZbqT4krI/AAAAAAAAB3o/TzZ_aDHKhc42-ZeDS3tFQeHJgbi4eMSewCLcBGAs/s1600/Screenshot%2Bfrom%2B2019-01-24%2B14-58-07.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="198" data-original-width="336" src="https://4.bp.blogspot.com/-LkDW7CPVkUM/XEoZbqT4krI/AAAAAAAAB3o/TzZ_aDHKhc42-ZeDS3tFQeHJgbi4eMSewCLcBGAs/s1600/Screenshot%2Bfrom%2B2019-01-24%2B14-58-07.png" /></a></div>
<h4></div>
<table>
<tbody>
<tr><td align="middle" bgcolor="#db3341" span="" style="color: #f2edf2; font-size: 16; font-weight: bold;"><div onclick="show('4')">
4. <b>Substituting Flags</b>: to control the replacement in file </div>
</td></tr>
</tbody></table>
<div id="4" style="display: none;">
4. <b>Substituting Flags</b>: There are 4 kinds of substitutions:</h4>
<ul>
<li>g, replace all occurrences. </li>
<li>A number, the occurrence number for the new text that you want to substitute. </li>
<li>p, print the original content. </li>
<li>w file: means write the results to a file.</li>
</ul>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://1.bp.blogspot.com/-16ABLPKzEUM/XEodYhYpyRI/AAAAAAAAB4A/W_oZCn3nZVUdrqRJdwePDsU5ZlstUq_kQCLcBGAs/s1600/Screenshot%2Bfrom%2B2019-01-24%2B15-15-44.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-width="783" src="https://1.bp.blogspot.com/-16ABLPKzEUM/XEodYhYpyRI/AAAAAAAAB4A/W_oZCn3nZVUdrqRJdwePDsU5ZlstUq_kQCLcBGAs/s1600/Screenshot%2Bfrom%2B2019-01-24%2B15-15-44.png" /></a></div></div>
<br />
<div style="border-bottom: 1px solid #ccc; border-top: 1px solid #ccc; color: #a5a4a4; font-style: italic; font-weight: bold; margin: 30px; padding: 30px; text-align: center;">
Extract Part of a FASTA Sequences with Position by python script <a href="http://www.bioinformatics-made-simple.com/2013/10/actually-i-have-hundreds-of-protein.html">HERE</a></div>
</div>
<table>
<tbody>
<tr><td align="middle" bgcolor="#db3341" span="" style="color: #f2edf2; font-size: 16; font-weight: bold;"><div onclick="show('5')">
5. <b>Limiting sed in a file </b></div>
</td></tr>
</tbody></table>
<div id="5" style="display: none;">
5. <b>Limiting sed in a file</b>: By default, Sed command processes your entire file. However, you can limit the sed command to process your file in different ways:<br />
<br />
<ul style="text-align: left;">
<li>A range of lines. </li>
- following command will replace string in <b>second line</b> only <pre class="brush:perl">sed '2s/Ram/Shyam/' Inputfile >Outfile</pre>
<li>A pattern that matches a specific range.</li>
- following command will replace string in <b>second and third line</b> only <pre class="brush:perl">sed '2,3s/Ram/Shyam/' Inputfile >Outfile</pre>
- Following command will replace string in <b>second line to the end of the file</b> only <pre class="brush:perl">sed '2,$s/Ram/Shyam/' Inputfile >Outfile</pre>
</ul></div>
<table>
<tbody>
<tr><td align="middle" bgcolor="#db3341" span="" style="color: #f2edf2; font-size: 16; font-weight: bold;"><div onclick="show('6')">
6. <b>Delete Lines in a file with sed </b></div>
</td></tr>
</tbody></table>
<div id="6" style="display: none;">
<b>6. Delete Lines in a file with <b>sed </b></b>: The <b>delete (d) flag</b> deletes the text from the stream, not the original file.<br />
- following command will delete <b>second line</b></div>
<pre class="brush:perl">sed '2d' Inputfile >Outfile</pre>
- following command will delete <b>second and third line</b>
<br />
<pre class="brush:perl">sed '2,3d' Inputfile >Outfile</pre>
- following command will delete <b>second line to the end</b> of the file.<br />
<pre class="brush:perl">sed '2,$d' Inputfile >Outfile</pre>
</div>
संजयhttp://www.blogger.com/profile/13208510103131669624noreply@blogger.com0tag:blogger.com,1999:blog-7223713061342925300.post-34544950580322509212018-07-05T19:59:00.000+05:302019-01-28T00:07:46.120+05:30How to compare multiple sets using UpsetR <div dir="ltr" style="text-align: left;" trbidi="on"><a name='more'></a>
<div style="line-height: 30px; text-align: justify;">
<div class="separator" style="clear: both; text-align: center;">
<a href="https://1.bp.blogspot.com/-0hTj0N0hQrY/Wz4s-0ad7YI/AAAAAAAAB0Q/IzB-jEd_wH4VREsKHXmVQBn3dhYhcm5ngCLcBGAs/s1600/output.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="706" data-original-width="1600" height="141" src="https://1.bp.blogspot.com/-0hTj0N0hQrY/Wz4s-0ad7YI/AAAAAAAAB0Q/IzB-jEd_wH4VREsKHXmVQBn3dhYhcm5ngCLcBGAs/s320/output.png" width="320" /></a></div>
<h1>
Why UpSet</h1>
Everyday I face the problems that need to understand the relationships between sets. Ven diagram always a great job if the number of sets is limited (like up to 4)but it gets clumsy when the number of sets increases. A Venn diagram with multiple sets is difficult to interpret and easy to be lost. So <b><a href="https://www.ncbi.nlm.nih.gov/pubmed/28645171" target="_blank">UpSet</a></b> is another "visualization technique for the quantitative analysis of sets, their intersections, and aggregates of intersections".<br />
<h1>
How to use UpSet</h1>
The source code of Python implementation of UpSet can be downloaded from <b><a href="https://github.com/ImSoErgodic/py-upset" target="_blank">HERE</a></b> while R version is <b><a href="https://github.com/hms-dbmi/UpSetR" target="_blank">HERE</a></b>. The web version of UpSet can be used from <a href="http://vcg.github.io/upset/?dataset=0&duration=1000&orderBy=subsetSize&grouping=groupByIntersectionSize&selection=" target="_blank"><b>HERE</b></a> or <b><a href="https://gehlenborglab.shinyapps.io/upsetr/" target="_blank">HERE</a></b>. Obviously web versions are easy to use for any project but unfortunately, our taste doesn't match always. I just modified two main functions which draw the main plots. New functions give more flexibility to the plot such as<br />
<ol>
<li>can automatically calculate the number of unique colours for each comparison.</li>
<li>colours of numbers can be changed and it doesn't have to be the same as bar color.</li>
<li> fonts are also changeable. </li>
</ol>
<h1>
Prerequisite</h1>
We need following R libraries to run the script<br />
<ol style="text-align: left;">
<li><b>UpSetR </b></li>
<li><b>ggplot2 </b></li>
<li><b>grid </b></li>
<li><b>RColorBrewer </b></li>
<li><b>extrafont</b></li>
</ol>
<h1>Downloads</h1>
All files can be downloaded from here<br />
<blockquote class="embedly-card">
<h4>
<a href="https://github.com/sanjaysingh765/UpsetR_modified">UpsetR_modified</a></h4>
UpsetR_modified - This repository contains the R script the make plot using UpsetR library. Two functions were modified to make the package more flexible.</blockquote>
<script async="" charset="UTF-8" src="https://raw.githubusercontent.com/embedly/embedly-jquery/master/jquery.embedly.js"></script>
</div>
<div style="border-bottom: 1px solid #ccc; border-top: 1px solid #ccc; color: #a5a4a4; font-style: italic; font-weight: bold; margin: 30px; padding: 30px; text-align: center;">
Extract Part of a FASTA Sequences with Position by python script <a href="http://www.bioinformatics-made-simple.com/2013/10/actually-i-have-hundreds-of-protein.html">HERE</a></div>
</div>
संजयhttp://www.blogger.com/profile/13208510103131669624noreply@blogger.com0tag:blogger.com,1999:blog-7223713061342925300.post-71078024152503538512018-05-19T01:22:00.000+05:302019-01-25T01:00:16.493+05:30How to make a group bar graph with error bars and split y axis <div dir="ltr" style="text-align: left;" trbidi="on"><a name='more'></a>
<div style="line-height: 30px; text-align: justify;">
<div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-3IJbiYcnZzI/Wv8wDXf5JOI/AAAAAAAABzY/4Z3lkEa0EmEL1zFogsZETDUN-yxNfssAQCLcBGAs/s1600/heatmap1.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src=https://4.bp.blogspot.com/-3IJbiYcnZzI/Wv8wDXf5JOI/AAAAAAAABzY/4Z3lkEa0EmEL1zFogsZETDUN-yxNfssAQCLcBGAs/s1600/heatmap1.jpg" width="320" height="256" data-original-width="1500" data-original-height="1200" /></a></div>I would like to draw a group bar graph with error bars and split y axis to show both smaller and larger values in same plot. Although plotrix has function to do that but I don't know how to moifiy their aweful looking graphs. I got a good solution <a href="http://sickel.net/blogg/?p=688"><b>HERE</b></a>. I just modified it as per my taste and requirement. It need <b>gplots</b>, <b>extrafont</b> and<b> RColorBrewer</b> and produce a high resolution beautiful chart.
<blockquote><a href="http://www.bioinformatics-made-simple.com/2017/10/how-to-perform-non-metric.html">How to perform Non-metric multidimensional scaling (NMDS) analysis</a></blockquote>
<script src="https://gist.github.com/sanjaysingh765/33932075a728a19320ff416d0b0ade85.js"></script>
</div>
</div>
संजयhttp://www.blogger.com/profile/13208510103131669624noreply@blogger.com0tag:blogger.com,1999:blog-7223713061342925300.post-10083825900069242142018-04-11T02:05:00.005+05:302019-08-16T19:48:24.387+05:30How to make a Heatmap with multiple annotation<div dir="ltr" style="text-align: left;" trbidi="on"><a name='more'></a>
<div style="line-height: 30px; text-align: justify;">
<div class="separator" style="clear: both; text-align: center;"><a href="https://raw.githubusercontent.com/sanjaysingh765/complexheatmap/master/blog.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://raw.githubusercontent.com/sanjaysingh765/complexheatmap/master/blog.png" width="320" height="213" data-original-width="800" data-original-height="533" /></a></div>I was interested to make a heatmap with multiple annotation with least interference. It will create an heatmap with multiple annotation such as: genotype, treatment, gene, class. Legends related to all annotations are given right side of the heatmap. Please visit<a href="https://github.com/sanjaysingh765/complexheatmap"> this page</a> for all file related to this R script.
We need R libraries<b> extrafont</b>, <b>ComplexHeatmap</b>,<b> circlize</b> and <b>RColorBrewer</b> to run this script.<br />
<blockquote>
1. <a href="http://www.bioinformatics-made-simple.com/2014/07/easiet-way-to-create-heat-map-in-excel.html" target="_blank">Easiet Way to Create A Heat Map In Excel</a></blockquote>
<script src="http://gist-it.appspot.com/github/sanjaysingh765/complexheatmap/blob/master/complexheap.R"></script>
</div>
</div>
संजयhttp://www.blogger.com/profile/13208510103131669624noreply@blogger.com0tag:blogger.com,1999:blog-7223713061342925300.post-61910769593280774032018-03-02T03:33:00.000+05:302018-04-11T02:07:48.192+05:30How to download expression data set from NCBI GEO<div dir="ltr" style="text-align: left;" trbidi="on"><a name='more'></a>
<div style="line-height: 30px; text-align: justify;">
NCBI has provided a wonderful tool <b><a href="https://www.youtube.com/watch?v=EUPmGWS8ik0">GEO2R</a></b> to do analysis of microarray data sets but sometime I need the normalized data sets to check the expression of given probe across experiments. In this case I can use this given R script which can help to download the expression data for whole experiment in a simple text file that can be used as in excel or other similar program. This script assume that you have installed <b> GEOquery</b> at your machine. If GEOquery is not pre installed then run this command
<br />
<pre class="brush:perl">source("http://bioconductor.org/biocLite.R")
biocLite("GEOquery")</pre>
<br />
Script can be simply run from your terminal
<br />
<pre class="brush:perl">Rscript GEO_expression.R accession_number</pre>
<blockquote>
1. <a href="http://www.bioinformatics-made-simple.com/2016/11/how-to-download-all-sra-samples-or.html" target="_blank">Easiest Way to Download All Sra Samples or Multi Experiment file from NCBI SRA database</a></blockquote>
<script src="http://gist-it.appspot.com/github/sanjaysingh765/GEO_expression.R/blob/master/GEO_expression.R"></script>
</div>
</div>
संजयhttp://www.blogger.com/profile/13208510103131669624noreply@blogger.com0tag:blogger.com,1999:blog-7223713061342925300.post-67923234608593768742017-10-10T23:11:00.002+05:302018-03-02T03:36:19.455+05:30How to perform Non-metric multidimensional scaling (NMDS) analysis<div dir="ltr" style="text-align: left;" trbidi="on"><a name='more'></a>
<div style="line-height: 30px; text-align: justify;">
<b>Requirements</b><br />
We need R libraries <b>vegan</b>, <b>ggplot2</b>, <b>extrafont</b> to run this script. <br /><br />
<b>R scripts</b><br />
<script src="http://gist-it.appspot.com/github/sanjaysingh765/NMDS_analysis/blob/master/NMDS.R"></script>
<br />
<br />
For complete result of analysis and raw file, please visit <b><a href="https://github.com/sanjaysingh765/NMDS_analysis">HERE</a></b>
<br />
<br />
<div style="border-bottom: 1px solid #ccc; border-top: 1px solid #ccc; color: #a5a4a4; font-style: italic; font-weight: bold; margin: 30px; padding: 30px; text-align: center;">
How To Predict CRISPR-Cas9 target site in R <a href="http://www.bioinformatics-made-simple.com/2017/04/how-to-predict-crispr-cas9-target-site.html">HERE</a></div>
</div>
</div>
संजयhttp://www.blogger.com/profile/13208510103131669624noreply@blogger.com2tag:blogger.com,1999:blog-7223713061342925300.post-39203962491053315532017-07-06T02:07:00.003+05:302017-10-10T23:12:44.848+05:30Easiest Way to Download All Sra Samples or Multi Experiment file from NCBI SRA database- II<div dir="ltr" style="text-align: left;" trbidi="on"><a name='more'></a>
<div style="line-height: 30px; text-align: justify;">
Previously I shared a <a href="http://www.bioinformatics-made-simple.com/2016/11/how-to-download-all-sra-samples-or.html">easy way to download the data files from NCBI SRA database</a>. Although it took only <b>wget</b> to download the data files but it required lots of link editing. Here I am going to share another script which can download all files of any biostudy or a single file, depending upon the accession number provided to the script. This bash script will also simultaneously convert the native sra file into fastq files also. Hope it will be some help.
<h2>sra_download.sh</h2>
<pre class="brush:perl">
#!/bin/bash
id=$1
if [ "$#" -eq "0" ]
then
echo "No accession number provided"
exit 1
else
echo -n "Please wait...."
esearch -db sra -query $id | efetch --format runinfo | cut -d ',' -f 1 | grep SRR | xargs fastq-dump --split-files
echo
echo -n "Download complete...."
echo
fi
</pre>
Save the script as sra_download.sh and run as
<pre class="brush:perl"> bash sra_download.sh SRA_accession_number </pre>
You can use any kind of accession number including SRR3114162, PRJNA309373 or SRP068795. I have tested this script on Ubuntu and assume that <a href="ftp://ftp.ncbi.nlm.nih.gov/entrez/entrezdirect/">Entrez Direct </a> and <a href="https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software">NCBI SRA Toolkit</a> is installed on your machine.
</div>
<div style="border-bottom: 1px solid #ccc; border-top: 1px solid #ccc; color: #a5a4a4; font-style: italic; font-weight: bold; margin: 30px; padding: 30px; text-align: center;">
Extract Part of a FASTA Sequences with Position by python script <a href="http://www.bioinformatics-made-simple.com/2013/10/actually-i-have-hundreds-of-protein.html">HERE</a></div>
</div>
संजयhttp://www.blogger.com/profile/13208510103131669624noreply@blogger.com0tag:blogger.com,1999:blog-7223713061342925300.post-87797718059891726882017-04-28T23:45:00.002+05:302017-10-10T23:12:26.046+05:30How To Predict CRISPR-Cas9 target site in R<div dir="ltr" style="text-align: left;" trbidi="on"><a name='more'></a>
<div style="line-height: 30px; text-align: justify;">
Although there are several <a href="http://www.bioinformatics-made-simple.com/2012/02/5-bioinformatics-link-directories-you.html">online bioinformatics tools</a> to predict the target site for CRISPR-Cas9 but I was looking for a offline solution. I am going to share a R script that use two R libraies <b>CRISPRseek</b> and <b>msa</b>. The best part is that it also predict the restriction enzyme sites in target region therefore it will help in downstream analysis of mutant screening. Hope this script will help some one. As always R script is heavily commented to get it easy.
<br />
<br />
<script src="http://gist-it.appspot.com/github/sanjaysingh765/crispr_target_prediction_in_R/blob/master/crispr_target_prediction.R"></script>
</div>
<div style="border-bottom: 1px solid #ccc; border-top: 1px solid #ccc; color: #a5a4a4; font-style: italic; font-weight: bold; margin: 30px; padding: 30px; text-align: center;">
How To Perform Basic Multiple Sequence Alignments In R <a href="http://www.bioinformatics-made-simple.com/2017/03/how-to-perform-basic-multiple-sequence.html">HERE</a></div>
</div>
संजयhttp://www.blogger.com/profile/13208510103131669624noreply@blogger.com0tag:blogger.com,1999:blog-7223713061342925300.post-82355481479788718122017-03-20T01:41:00.000+05:302020-01-18T00:01:10.362+05:30How To Perform Basic Multiple Sequence Alignments In R<div dir="ltr" style="text-align: left;" trbidi="on"><a name='more'></a>
<div style="line-height: 30px; text-align: justify;">
There are several servers out there to perform the multiple sequence alignment and visualization. I was looking for all in one <a href="http://bioinfo-tool.blogspot.com/p/online-bioinformatics-tools.html">offline tool</a> to perform the multiple sequence alignment and generate a publication quality image.
We can use <b><a href="https://bioconductor.org/packages/devel/bioc/vignettes/msa/inst/doc/msa.pdf">msa package</a></b> for both amino acid and DNA alignment. Following Rscript can be used for this purpose. Script is heavily commented to easy understand.Hope it will be helpful for others too.
<br /><br />
<script src="http://gist-it.appspot.com/github/sanjaysingh765/Pairwise_Sequence_Alignment_With_R/blob/master/alignment.R"></script>
</div>
<div style="text-align: center; font-style:italic; color: #a5a4a4; font-weight:bold; border-top: 1px solid #ccc; border-bottom: 1px solid #ccc; padding:30px; margin:30px; ">Sequence Similarity Search - I : Basic Local Alignment Search Tool (BLAST) <a href="http://www.bioinformatics-made-simple.com/2011/11/sequence-similarity-search-i-basic.html">HERE</a></div>
</div>संजयhttp://www.blogger.com/profile/13208510103131669624noreply@blogger.com0tag:blogger.com,1999:blog-7223713061342925300.post-23753760911771349982016-11-17T20:46:00.000+05:302018-03-02T03:35:57.827+05:30Easiest Way to Download All Sra Samples or Multi Experiment file from NCBI SRA database<div dir="ltr" style="text-align: left;" trbidi="on"><a name='more'></a>
<div style="line-height: 30px; text-align: justify;">
<a href="http://www.ebi.ac.uk/ena">European Nucleotide Archive</a> is good place to start to download the raw fastq files. But it is not easy to download multiple run files from NCBI SRA database. I recently learn to download in relatively easy way. I want to download these four run together from NCBI SRA database : SRR122247,SRR122248,SRR122249, SRR122250. Format of basic url is like that<br />
<pre style="font-family: Andale Mono, Lucida Console, Monaco, fixed, monospace; color: #000000; font-size: 12px;border: 1px dashed #999999;line-height: 14px;padding: 5px; overflow: auto; width: 100%"><code>ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/{SRR|ERR|DRR}/<first 6 characters of accession>/<accession>/<accession>.sra
</code></pre>
Where
{SRR|ERR|DRR} should be either ‘SRR’, ‘ERR’, or ‘DRR’ and should match the prefix of the target .sra file
Described in details <a href="https://www.ncbi.nlm.nih.gov/books/NBK158899/" target="_blank"><b>HERE</b></a>.
So my final urls will look like this
<pre class="brush:perl">
ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR122/SRR122247/SRR122247.sra
ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR122/SRR122248/SRR122248.sra
ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR122/SRR122249/SRR122249.sra
ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR122/SRR122250/SRR122250.sra
</pre>
Now save this urls in a text file (url_list.txt, for example). Run the <b>wget</b> like this
<pre class="brush:perl">wget -i url_list.txt</pre>
<blockquote>run <b>sudo apt-get install wget</b> from terminal if you don't have wget</blockquote>
</div>
</div>
संजयhttp://www.blogger.com/profile/13208510103131669624noreply@blogger.com0tag:blogger.com,1999:blog-7223713061342925300.post-83181577453256783212016-06-15T20:37:00.001+05:302017-04-15T07:36:10.098+05:30Journal Citation Reports 2015 / 2016<div dir="ltr" style="text-align: left;" trbidi="on"><a name='more'></a>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://4.bp.blogspot.com/-OkIZqJbWNRU/T_IP8-2jutI/AAAAAAAAAuE/CIVrmz-0TnA/s1600/journal+impact+factor+2012.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="213" src="https://4.bp.blogspot.com/-OkIZqJbWNRU/T_IP8-2jutI/AAAAAAAAAuE/CIVrmz-0TnA/s320/journal+impact+factor+2012.jpg" width="320" /></a></div>
<div style="text-align: justify;">
<span style="font-size: large;">Recently, Thomson Reuter announced <span style="font-family: "times" , "times new roman" , serif;"><span style="background-color: white; line-height: 18px; text-align: -webkit-auto;">Journal Citation Reports®</span><span style="background-color: white; line-height: 18px; text-align: -webkit-auto;"> for year 2016. Actually </span></span><span style="background-color: white; font-family: "times" , "times new roman" , serif; line-height: 18px;">Journal Citation Reports®</span><span style="background-color: white; font-family: "times" , "times new roman" , serif; line-height: 18px;"> </span><span style="background-color: white; text-align: -webkit-auto;"><span style="font-family: "times" , "times new roman" , serif;"><span style="line-height: 18px;">offers a systematic means to critically evaluate the world's leading journals, with quantifiable, statistical information based on citation data by compiling articles' cited references. Thus in scientific community journal impact is now considering as a quality measure of published work. JCR is actually contains the journal impact factor, number of times they cited in different journals and other </span><span style="line-height: 17.27272605895996px;">statistical</span><span style="line-height: 18px;"> values. So this list of journal impact factors is made by the data gathered throughtout 2015 that is why you can also say it journal impact factor 2015 download. You can view this SCI </span></span></span><span style="background-color: white; font-family: "times" , "times new roman" , serif; line-height: 17.27272605895996px;">journal impact factor 2015</span> <span style="background-color: white; text-align: -webkit-auto;"><span style="font-family: "times" , "times new roman" , serif;"><span style="line-height: 18px;"> in other tab by clicking <b><a href="https://drive.google.com/file/d/0B-PwDYXWDBdgNURaeGRyU1U1YkE/view?usp=sharing" target="_blank">HERE</a> or you can download </b></span></span></span><span style="background-color: white; font-family: "times" , "times new roman" , serif; line-height: 17.27272605895996px;">journal impact factor 2015 <b><a href="https://drive.google.com/uc?export=download&id=0B-PwDYXWDBdgNURaeGRyU1U1YkE" target="_blank">HERE</a></b></span></span></div>
<br />
<br />
<br />
</div>
संजयhttp://www.blogger.com/profile/13208510103131669624noreply@blogger.com0tag:blogger.com,1999:blog-7223713061342925300.post-56054491948040276972016-01-19T23:08:00.003+05:302017-04-15T07:35:42.791+05:30How to download only viridiplantae miRNA from miRBase<div dir="ltr" style="text-align: left;" trbidi="on"><a name='more'></a>
<div style="line-height: 30px; text-align: justify;">
There is no direct way to download the organism specific miRNA from miRBase database. So I extracted the miRNA of viridiplantae plant from miRBase using some unix command. Steps are as follows<br />
<ul>
<li> Download the information regarding organisms from <a href="ftp://mirbase.org/pub/mirbase/CURRENT/organisms.txt.gz" target="_blank"><b>HERE</b></a>.</li>
<li>Download the mature miRNA sequence from <a href="ftp://mirbase.org/pub/mirbase/CURRENT/mature.fa.gz" target="_blank"><b>HERE</b></a></li>
<li>Extract both files in same directory</li>
<li>Download the fasta dereplicating python script from <b><a href="https://drive.google.com/uc?export=download&id=0B7s04L2fqhwDTXY1UjFTOHAybjg" target="_blank">HERE</a></b> </li>
<li>Now run the bash script given from the same directory</li>
<pre class="brush:perl">#!/bin/bash
#script to extact plant mirna from mirbase database
# convert fasta to tab
awk 'BEGIN{RS=">"}{gsub("\n"," ",$0); print ">"$0}' mature.fa >mature.tab
#extract the organisms belong to Viridiplantae. You can extract the miRNA for other
# organism too by changing the word "Viridiplantae"
grep Viridiplantae organisms.txt >plants_mirbase.txt
# extract name of plants
awk '{ print $3 " " $4 }' plants_mirbase.txt >plant_name.txt
#extract mirna for plants
grep -f plant_name.txt mature.tab >plant_mirna.tab
#convert tab to fasta
awk '{print ""$1" "$2" "$3" "$4" "$5"\n"$6}' plant_mirna.tab > plant_mirna.rna
#convert RNA to DNA
sed '/^[^>]/ y/uU/tT/' plant_mirna.rna >plant_mirna.fasta
#dereplicate mirna file
python derep.py -i plant_mirna.fasta
#cleaning fasta header
cat derep_plant_mirna.fasta | awk -F ';' '{print $1}' >plant_mature_mirna_unique.fasta
rm mature.tab
rm plants_mirbase.txt
rm plant_mirna.tab
rm plant_mirna.rna
rm plant_name.txt
rm derep_plant_mirna.fasta
echo mature mirna from all plants are in plant_mirna.fasta!!!
echo unique mature mirna from all plants are in plant_mature_mirna_unique.fasta!!!
echo all job done!!!
</pre>
</ul>
Basically the above bash script extract the miRNA from plant deposited to miRBase database and save them to a file <b>plant_mirna.fasta. </b>In second part, it remove the duplicate miRNAs and save them in another file<b> plant_mature_mirna_unique.fasta.</b></div>
<div style="border-bottom: 1px solid #ccc; border-top: 1px solid #ccc; color: #a5a4a4; font-style: italic; font-weight: bold; margin: 30px; padding: 30px; text-align: center;">
How to remove duplicate sequences from FASTA file <a href="http://www.bioinformatics-made-simple.com/2013/04/how-to-remove-duplicate-sequences-from.html" target="_blank">HERE</a></div>
</div>
संजयhttp://www.blogger.com/profile/13208510103131669624noreply@blogger.com3tag:blogger.com,1999:blog-7223713061342925300.post-40034377380740423692015-12-28T04:32:00.000+05:302016-06-15T20:42:52.128+05:30BLAST Database creation error<div dir="ltr" style="text-align: left;" trbidi="on"><a name='more'></a>
<div style="line-height: 30px; text-align: justify;">
<a href="http://2.bp.blogspot.com/-zGRIEYyZs9A/TrA2w4NOvZI/AAAAAAAAACY/qYDBWNdtZQw/s1600/ncbi_blast.gif" imageanchor="1"><img border="0" src="http://2.bp.blogspot.com/-zGRIEYyZs9A/TrA2w4NOvZI/AAAAAAAAACY/qYDBWNdtZQw/s320/ncbi_blast.gif" /></a>
I was trying to create a <a href="http://www.bioinformatics-made-simple.com/2013/05/how-to-blast-multiple-sequences-against.html" target="_blank">BLAST database</a> but I got this error
<br />
<pre class="brush:perl">Building a new DB, current time: 12/03/2015 09:44:18
New DB name: plant_protein
New DB title: /home/sanjay/bin/Genomes/plant_protein_from_plantgdb.fa
Sequence type: Protein
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
volume: plant_protein
file: plant_protein.pin
file: plant_protein.phr
file: plant_protein.psq
BLAST Database creation error: FASTA-Reader: No residues given</pre>
Then I looked whether my any FASTA sequence is empty or not by running this command
<br />
<pre class="brush:perl">grep -c "^$" ~/bin/Genomes/plant_protein_from_plantgdb.fa</pre>
I found that there is one sequence which have only FASTA header. To remove the <a href="http://www.bioinformatics-made-simple.com/2013/09/how-to-remove-all-empty-fasta-sequences.html" target="_blank">empty FASTA</a> sequence I run this command
<br />
<pre class="brush:perl">awk 'BEGIN {RS = ">" ; FS = "\n" ; ORS = ""} $2 {print ">"$0}' ~/bin/Genomes/plant_protein_from_plantgdb.fa >~/bin/Genomes/plant_protein_from_plantgdb.fasta</pre>
And finally I got the happy success message
<br />
<pre class="brush:perl">Building a new DB, current time: 12/03/2015 09:48:01
New DB name: plant_protein
New DB title: /home/sanjay/bin/Genomes/plant_protein_from_plantgdb.fasta
Sequence type: Protein
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 980219 sequences in 24.2583 seconds.
</pre>
<br />
<div style="border-bottom: 1px solid #ccc; border-top: 1px solid #ccc; color: #a5a4a4; font-style: italic; font-weight: bold; margin: 30px; padding: 30px; text-align: center;">
How to install <a href="http://www.bioinformatics-made-simple.com/2012/02/5-server-for-batch-blast-against-ncbi.html" target="_blank">NCBI BLAST</a> program on your computer <a href="http://www.bioinformatics-made-simple.com/2012/06/how-to-install-ncbi-blast-on-window-7.html">HERE</a></div>
</div>
</div>संजयhttp://www.blogger.com/profile/13208510103131669624noreply@blogger.com0