Kevin's GATTACA World

HGVS nomenclature

2021-10-05T18:10:00.006+08:00

I was recently asked about HGVS nomenclature reporting. The fun thing about biology is that there are always exceptions to the rule, or some shenanigans you didn't expect when setting out the rule.

"The Human Genome Variation Society (HGVS) provides standardized recommendations for describing human sequence variants, which are widely accepted in the scientific community, especially in the practice of clinical molecular pathology.[1] Use of the HGVS nomenclature system is a de facto recommendation for clinical reporting of sequence variants.[2,3] Being a core component of the clinical report, incorrect HGVS nomenclature can have a negative impact on patient care, such as misdiagnosis or clinical trial ineligibility. HGVS nomenclature has been traditionally computed manually by pathologists from Sanger sequencing electropherograms. However, manually computing HGVS nomenclature is time consuming, complex, and error prone, particularly with insertion and deletion (indel) variants, resulting in inconsistencies across laboratories."

Source: Clinical Implementation and Validation of Automated Human Genome Variation Society (HGVS) Nomenclature System for Next-Generation Sequencing–Based Assays for Cancer
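Part of what makes indels error prone is the HGVS 3′ rule: when a deletion sits inside a repeat, it has to be described at the most 3′ position it could occupy. Below is a minimal Python sketch of that one rule, assuming a plain coding sequence string where the first base is c.1 (no UTRs, introns or mapping to a reference). describe_deletion is a hypothetical helper of mine, not part of any library; real tools cover far more cases.

```python
def describe_deletion(seq: str, start: int, length: int) -> str:
    """Return an HGVS-style c. deletion, shifted to the most 3' position.

    seq    -- coding sequence; seq[0] is c.1 (assumption: no UTRs/introns)
    start  -- 1-based position of the first deleted base, as observed
    length -- number of deleted bases
    """
    if length < 1 or start < 1 or start + length - 1 > len(seq):
        raise ValueError("deletion falls outside the sequence")
    i = start - 1  # 0-based start of the deleted block
    # Deleting seq[i:i+length] or seq[i+1:i+1+length] gives the same result
    # exactly when seq[i] == seq[i+length], so slide right while that holds.
    while i + length < len(seq) and seq[i] == seq[i + length]:
        i += 1
    first, last = i + 1, i + length  # back to 1-based c. coordinates
    return f"c.{first}del" if first == last else f"c.{first}_{last}del"
```

For example, deleting any one A from the AAA run in ATGAAAGCT gives the same sequence, so the sketch reports c.6del rather than c.4del, which is the kind of discrepancy that shows up between labs computing names by hand.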

In the 25th Anniversary Special Issue of Human Mutation, Den Dunnen et al. (2016) publish an update of the Human Genome Variation Society (HGVS) recommendations for the description of sequence variants (http://www.HGVS.org/varnomen). One of the issues discussed is how widespread HGVS nomenclature is used and, when used, whether published variant descriptions correctly follow the recommendations. An EGFR (OMIM# 131550) lung cancer testing scheme assessed in January 2016 by the United Kingdom National External Quality Assessment Scheme (UK NEQAS) for Molecular Genetics demonstrates the current variability in the use and interpretation of the HGVS guidelines by diagnostic laboratories based across the globe.

Source: HGVS Nomenclature in Practice: An Example from the United Kingdom National External Quality Assessment Scheme

I shall explore this tool:

hgvs: A Python package for manipulating sequence variants using HGVS nomenclature: 2018 Update

https://github.com/biocommons/hgvs

Reproducible, scalable, and shareable analysis pipelines with bioinformatics workflow managers | Nature Methods

2021-09-28T19:02:00.008+08:00


https://www.nature.com/articles/s41592-021-01254-9


The rapid growth of high-throughput technologies has transformed biomedical research. With the increasing amount and complexity of data, scalability and reproducibility have become essential not ...


https://github.com/GoekeLab/bioinformatics-workflows

GitHub - GoekeLab/bioinformatics-workflows: minimal example implementations for bioinformatics workflow managers

Workflow managers provide an easy and intuitive way to simplify pipeline development. Here we provide basic proof-of-concept implementations for selected workflow managers. The analysis workflow is based on a small portion of an RNA-seq pipeline, using fastqc for quality controls and salmon for ...


Orientation bias artifact

2021-09-27T17:46:00.002+08:00

strand bias and orientation bias – GATK (broadinstitute.org)

The read orientation artifact, also known as the orientation bias artifact, arises due to a chemical change in the nucleotide during library prep that results in, for example, G base-pairing with A. This kind of artifact has a clear signature (e.g. C to A SNP that occurs predominantly for the middle C in the DNA sequence CCG), and it's single-stranded in nature. Downstream, this artifact manifests as low allele fraction SNPs whose evidence for the alt allele consists almost entirely of F1R2 reads or F2R1 reads. A read pair is F1R2 (forward 1st, reverse 2nd) if the sequence of bases in Read 1 maps to the forward strand of the reference (F1), and the sequence of Read 2 to the reverse strand of the reference (R2). F2R1 is defined similarly.
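To make F1R2/F2R1 concrete, here is a small Python sketch that labels a read from its SAM flag bits (0x1 paired, 0x10 reverse strand, 0x40 first in pair, 0x80 second in pair). It assumes a standard forward-reverse library where mates land on opposite strands, so a reverse-strand second read implies a forward first read. The function names are hypothetical and this is not how GATK implements it.

```python
from collections import Counter
from typing import Iterable

# SAM FLAG bits (SAM specification)
PAIRED = 0x1
REVERSE = 0x10
READ1 = 0x40
READ2 = 0x80


def pair_orientation(flag: int) -> str:
    """Return 'F1R2' or 'F2R1' for one read of a pair, based on its SAM flag.

    Assumes a forward-reverse library: the mate maps to the opposite strand.
    """
    if not flag & PAIRED:
        raise ValueError("read is not paired")
    reverse = bool(flag & REVERSE)
    if flag & READ1:
        return "F2R1" if reverse else "F1R2"  # read 1 on forward strand -> F1R2
    if flag & READ2:
        return "F1R2" if reverse else "F2R1"  # read 2 on reverse strand -> F1R2
    raise ValueError("flag sets neither READ1 nor READ2")


def orientation_counts(alt_read_flags: Iterable[int]) -> Counter:
    """Tally F1R2 vs F2R1 among the reads supporting an alt allele."""
    return Counter(pair_orientation(f) for f in alt_read_flags)
```

If the alt-supporting reads at a low allele fraction C>A site come back almost all F1R2 (or almost all F2R1), that is the orientation bias signature described above; a real variant should draw on both.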

If you have read Illumina's DRAGEN Bio-IT user guide, you'll notice it only mentions orientation bias and ignores strand bias.

Benchmarking variants and comparing truth sets: List of useful tools and publications

2021-09-17T02:23:00.001+08:00

Just realised that, other than vcf-compare and bedtools intersect, there are other options:

https://github.com/RealTimeGenomics/rtg-tools

https://github.com/Illumina/hap.py

Also, there are actually newer variant callers out there.

Molina-Mora, J.A., Solano-Vargas, M. Set-theory based benchmarking of three different variant callers for targeted sequencing. BMC Bioinformatics 22, 20 (2021). https://doi.org/10.1186/s12859-020-03926-3

Krishnan, V., Utiramerur, S., Ng, Z. et al. Benchmarking workflows to assess performance and suitability of germline variant calling pipelines in clinical diagnostic assays. BMC Bioinformatics 22, 85 (2021). https://doi.org/10.1186/s12859-020-03934-3

Additional file 23: File 3, verify_variants.py

Zook, J.M., et al. An open resource for accurately benchmarking small variant and reference calls. Nature Biotechnology 37, 561–566 (2019). https://doi.org/10.1038/s41587-019-0074-6
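For a feel of what set-theory benchmarking boils down to, here is a minimal Python sketch that treats each call as a (chrom, pos, ref, alt) key and counts true positives, false positives and false negatives with set operations. It is a naive exact-match comparison: it does no normalisation or haplotype-aware matching, which is exactly the gap hap.py and rtg-tools (vcfeval) fill. load_variants and benchmark are hypothetical names.

```python
from typing import Dict, Iterable, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def load_variants(vcf_lines: Iterable[str]) -> Set[Variant]:
    """Collect one (chrom, pos, ref, alt) key per ALT allele from VCF text lines."""
    variants = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip meta-information and header lines
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):
            variants.add((chrom, int(pos), ref.upper(), alt.upper()))
    return variants


def benchmark(truth: Set[Variant], calls: Set[Variant]) -> Dict[str, float]:
    """Exact-match comparison: TP = in both, FP = calls only, FN = truth only."""
    tp = len(truth & calls)
    fp = len(calls - truth)
    fn = len(truth - calls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"TP": tp, "FP": fp, "FN": fn,
            "precision": precision, "recall": recall, "F1": f1}
```

A caller that writes the same indel with a different left/right alignment than the truth set would be scored as one FP plus one FN here, which is why normalising (or using a haplotype-aware comparator) matters.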

Python library to parse, format, validate, normalize, and map sequence variants. Install with: pip install hgvs

hgvs.readthedocs.io/

hgvs: A Python package for manipulating sequence variants using HGVS nomenclature: 2018 Update.
Wang M, Callenberg KM, Dalgleish R, Fedtsov A, Fox NK, Freeman PJ, Jacobs KB, Kaleta P, McMurry AJ, Prlić A, Rajaraman V, Hart RK. Hum Mutat. 2018 Dec;39(12):1803-1813. doi: 10.1002/humu.23615. Epub 2018 Sep 5. PMID: 30129167.

Sequence Variant Descriptions: HGVS Nomenclature and Mutalyzer.
den Dunnen JT. Curr Protoc Hum Genet. 2016 Jul 1;90:7.13.1-7.13.19. doi: 10.1002/cphg.2. PMID: 27367167.

A Python package for parsing, validating, mapping and formatting sequence variants using HGVS nomenclature.
Hart RK, Rico R, Hare E, Garcia J, Westbrook J, Fusaro VA. Bioinformatics. 2015 Jan 15;31(2):268-70. doi: 10.1093/bioinformatics/btu630. Epub 2014 Sep 30. PMID: 25273102.

VariantValidator: Accurate validation, mapping, and formatting of sequence variation descriptions.
Freeman PJ, Hart RK, Gretton LJ, Brookes AJ, Dalgleish R. Hum Mutat. 2018 Jan;39(1):61-68. doi: 10.1002/humu.23348. Epub 2017 Oct 17. PMID: 28967166.

Clinical Implementation and Validation of Automated Human Genome Variation Society (HGVS) Nomenclature System for Next-Generation Sequencing-Based Assays for Cancer.
Callenberg KM, Santana-Santos L, Chen L, Ernst WL, De Moura MB, Nikiforov YE, Nikiforova MN, Roy S. J Mol Diagn. 2018 Sep;20(5):628-634. doi: 10.1016/j.jmoldx.2018.05.006. Epub 2018 Jun 21. PMID: 29936258.
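To see what "parse" means at its very simplest, here is a toy Python parser for a small subset of c. descriptions: single-nucleotide substitutions, deletions and duplications, with optional intronic offsets such as c.682-2delA (two bases before c.682, inside the intron). It is not the hgvs package's API and does no validation against a reference sequence; the regular expression, CVariant and parse_c_variant are hypothetical names of mine. Insertions, delins and UTR positions (c.-N, c.*N) are deliberately out of scope and raise an error.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Toy grammar: [accession:]c.START[+/-OFFSET][_END[+/-OFFSET]] followed by
# N>N (substitution) or del/dup with an optional deleted/duplicated sequence
_C_VARIANT = re.compile(
    r"^(?:(?P<ac>[A-Z]{2}_\d+(?:\.\d+)?):)?c\."
    r"(?P<start>\d+)(?P<start_off>[+-]\d+)?"
    r"(?:_(?P<end>\d+)(?P<end_off>[+-]\d+)?)?"
    r"(?:(?P<ref>[ACGT])>(?P<alt>[ACGT])|(?P<edit>del|dup)(?P<seq>[ACGT]*))$"
)


@dataclass
class CVariant:
    accession: Optional[str]
    start: int
    start_offset: int  # 0 = exonic; negative/positive = intronic before/after that base
    end: int
    end_offset: int
    edit: str  # "sub", "del" or "dup"
    ref: str = ""
    alt: str = ""


def parse_c_variant(text: str) -> CVariant:
    m = _C_VARIANT.match(text.strip())
    if m is None:
        raise ValueError(f"unsupported or malformed description: {text!r}")
    start, start_off = int(m["start"]), int(m["start_off"] or 0)
    end = int(m["end"]) if m["end"] else start
    end_off = int(m["end_off"] or 0) if m["end"] else start_off
    if m["ref"]:
        if m["end"]:
            raise ValueError("a substitution covers a single nucleotide")
        if m["ref"] == m["alt"]:
            raise ValueError("reference and variant base are identical")
        return CVariant(m["ac"], start, start_off, end, end_off, "sub", m["ref"], m["alt"])
    return CVariant(m["ac"], start, start_off, end, end_off, m["edit"], m["seq"])
```

Even this toy shows where manual descriptions drift: c.682-2delA and c.682-2del name the same event, so any lab-to-lab comparison has to parse rather than string-match.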

Running Kraken2 and creating a Krona report

2021-07-15T22:37:00.002+08:00

I had to work with Ion Torrent BAMs for this, but I think the approach applies to data from any platform.

I needed to run this on the unmapped reads, so I ran this first.

After that, the next script is fairly simple.

I will share the installation steps when I have time. A major hiccup for me was realising that not all pre-built databases work with Kraken2.

This command allows you to see what apps are consuming internet. ss -p

2021-05-13T00:53:00.002+08:00

This command lets you see which applications are using the network (-p shows the process that owns each socket):

ss -p

GenomSys Banks on MPEG-G Standard to Make Genome Analysis Mobile

2021-03-15T17:47:00.006+08:00

This software company did something I didn't expect: it put the variant calling on the phone itself.

"The variant calling is run directly on the phone, extracting the data from the file on your phone, processed on the phone, and only at the end the VCF file could be shared in the cloud for annotation and reporting by an accredited physician," Ascari said. The physician is necessary to assure that the result is of "diagnostic quality," he added.

I honestly expected CRAM and cloud-based calling, with encrypted exchange of BAM reads (made feasible by 5G mobile networks) and app-store-like customised reports for whatever you want to know about your genetics.

Other than security concerns, though, I don't see why I would want variant calling on the phone.

Future of Genomics: 10 bold predictions

What 5G mobile networks portends for the future of personal genomics

A novel BRCA2 splice variant identified in a young woman

2020-11-12T18:07:00.000+08:00

Sharing a newly published article in which a novel c.682-2delA variant, involving the AG consensus at the 3′ end of BRCA2 intron 8, was detected. The case involved a 33-year-old Italian breast cancer patient from an HBOC (hereditary breast and ovarian cancer) family (BRCAPro score: 88%) with no other known pathogenic BRCA mutation.

A novel BRCA2 splice variant identified in a young woman - Nicolussi - Molecular Genetics & Genomic Medicine

De novo assembly of Ion Torrent reads

2020-11-04T11:23:00.003+08:00

I am intrigued that this genome assembly guide mentions SOLiD and Ion Torrent, although it gives little information on how to use their reads for genome assembly.

That said, perhaps the SPAdes plugin provided with the sequencer meets most people's immediate needs. I'm wondering how to improve the evaluation of the assemblies.

IonCRAM: a reference-based compression tool for ion torrent sequence files

2020-10-29T21:03:00.003+08:00

IonCRAM: a reference-based compression tool for ion torrent sequence files https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7487613/

IonCRAM is the first reference-based compression tool for Ion Torrent BAM files aimed at long-term archiving. On BAM files, IonCRAM achieved a space saving of about 43%, which is about 8–9% better than what the CRAM format achieves.

Future research for reducing the space consumption of the Ion Torrent BAM files would include the binning of the flow signal and quality values. The idea of binning was initially introduced by Illumina [27] to reduce the space consumption of the quality values. This initiative was immediately followed by intensive research to optimize the binning procedure and address its effect on the downstream analysis, especially on the variant calling step [28–31]. We think that the binning of flow signals and quality data of Ion Torrent would also be successful, provided that the manufacturer contribute to this research. We added an option to IonCRAM for binning the flow signals, in a similar way to the binning method implemented in [26], and measured its effect on compression (Supplementary File 1). We left the step for investigating the effect of this binning on the downstream analysis to further research.
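As an illustration of the binning idea for quality values, here is a minimal Python sketch that maps Phred scores onto a handful of representative values. The bin boundaries are an assumption modelled on the commonly cited Illumina eight-level scheme, the Phred+33 offset is assumed for the FASTQ-style string, and the function names are hypothetical; it says nothing about how IonCRAM bins flow signals.

```python
from bisect import bisect_right

# (lowest Phred score in bin, value written out) -- ASSUMED Illumina-style scheme
BINS = [(2, 6), (10, 15), (20, 22), (25, 27), (30, 33), (35, 37), (40, 40)]
_LOWER_BOUNDS = [low for low, _ in BINS]


def bin_quality(q: int) -> int:
    """Map a Phred score to its bin's representative value."""
    if q < _LOWER_BOUNDS[0]:
        return q  # leave no-call / very low scores untouched
    return BINS[bisect_right(_LOWER_BOUNDS, q) - 1][1]


def bin_fastq_quality_string(qual: str, offset: int = 33) -> str:
    """Bin every character of a FASTQ quality string (Phred+offset encoding assumed)."""
    return "".join(chr(bin_quality(ord(c) - offset) + offset) for c in qual)
```

Collapsing dozens of distinct quality symbols into a few makes the stream far more compressible; the open question the authors raise is whether doing the same to Ion Torrent flow signals costs anything in downstream variant calling.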

It is worth mentioning that IonCRAM has not only been used for the test data in the paper; it has also been used to compress and back up thousands of files for the Saudi Human Genome Program. IonCRAM is open source and is available for free, along with the related test data, at the tool website http://ioncram.saudigenomeproject.com.

Static IP on Raspbian Jessie with dhcpcd.conf

2020-10-29T18:31:00.000+08:00

Static IP address templates for dhcpcd.conf

https://www.raspberrypi.org/forums/viewtopic.php?t=140252

######################################################
# TEMPLATE: A static IP address only when no DHCP
#
#           The profile name is arbitrary. Use "fred"
#           if you want. Not much we can put as
#           default servers, but set them up as
#           you usually would.
######################################################
interface eth0
fallback nodhcp

profile nodhcp
static ip_address=10.0.0.1/8
static routers=10.0.0.1
static domain_name_servers=10.0.0.1

I'm trying to do this:

https://www.diyhobi.com/share-raspberry-pi-wifi-internet-ethernet/

Future of Genomics: 10 bold predictions

2020-10-29T18:26:00.005+08:00

Curious about research priorities and opportunities for human genomics for the next decade? You should read on.

The National Human Genome Research Institute (NHGRI) this week published its “Strategic vision for improving human health at The Forefront of Genomics” in the journal Nature.

The strategic vision culminates with 10 bold predictions for human genomics by 2030. Crafted to be both inspirational and aspirational, the predictions are intended to provoke thoughtful discussions (and even debate) about what might be possible in the coming decade.

I must say that a few years back, I expected people to store their encrypted genome sequences on smartphones and use 5G networks to launch on-the-fly analyses. Not sure if we will have that by 2030!

It's also a good time to review the 5-year predictions from 2015.
Read about gold genomes and platinum genomes...

The 2020 NHGRI Strategic Vision is available online at genome.gov/2020SV.

FAAH-OUT: This woman feels no pain!

2019-04-08T23:29:00.002+08:00

"A woman in Scotland can feel virtually no pain due to a mutation ...
At age 65, the woman sought treatment for an issue with her hip, which turned out to involve severe joint degeneration despite her experiencing no pain. At age 66, she underwent surgery on her hand, which is normally very painful, and yet she reported no pain after the surgery. Her pain insensitivity was diagnosed by Dr Devjit Srivastava, Consultant in Anaesthesia and Pain Medicine at an NHS hospital in the north of Scotland and co-lead author of the paper....
which the researchers have described for the first time and dubbed FAAH-OUT. "
https://www.sciencedaily.com/releases/2019/03/190327203450.htm

Journal Reference:

Abdella M. Habib, Andrei L. Okorokov, Matthew N. Hill, Jose T. Bras, Man-Cheung Lee, Shengnan Li, Samuel J. Gossage, Marie van Drimmelen, Maria Morena, Henry Houlden, Juan D. Ramirez, David L.H. Bennett, Devjit Srivastava, James J. Cox. Microdeletion in a pseudogene identified in a patient with high anandamide concentrations and pain insensitivity. British Journal of Anaesthesia, 2019; DOI: 10.1016/j.bja.2019.02.019

Bash one-liner to replace spaces in file or folder names

2019-03-28T22:43:00.002+08:00

Source: https://stackoverflow.com/questions/2709458/how-to-replace-spaces-in-file-names-using-a-bash-script
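# Non-recursive: replace every space with an underscore in names in the current directory only
# (-- stops mv from treating a name that starts with "-" as an option)

for f in *\ *; do mv -- "$f" "${f// /_}"; done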
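The one-liner only touches the current directory. For a recursive version, here is a minimal Python sketch (replace_spaces is a hypothetical name). It walks the tree bottom-up with os.walk(topdown=False), so files and subfolders are renamed before the folder that contains them, and it skips any rename whose target already exists instead of overwriting it, which plain mv could do to a file.

```python
import os
from typing import List, Tuple


def replace_spaces(root: str, sep: str = "_") -> List[Tuple[str, str]]:
    """Recursively replace spaces with `sep` in file and folder names under root.

    Returns the (old, new) paths that were renamed. root itself is not renamed.
    """
    renamed = []
    # topdown=False yields the deepest folders first, so a folder's contents are
    # renamed while the folder's own (old) path is still valid
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            if " " not in name:
                continue
            old = os.path.join(dirpath, name)
            new = os.path.join(dirpath, name.replace(" ", sep))
            if os.path.exists(new):
                continue  # don't clobber an existing entry
            os.rename(old, new)
            renamed.append((old, new))
    return renamed
```

Usage: replace_spaces(".") from the top folder; the returned list doubles as a log you can review afterwards.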

HDD clean up; Does it spark joy?

2019-03-28T20:47:00.001+08:00

Search for size:gigantic in Windows Explorer to find files larger than 128 MB.
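Where Explorer's size filter isn't available (on Linux or macOS, say), a minimal Python sketch along these lines lists large files. The 128 MiB default mirrors the threshold above, find_large_files is a hypothetical helper, and symlinks and unreadable files are skipped rather than reported.

```python
import os
from typing import List, Tuple


def find_large_files(root: str, min_bytes: int = 128 * 1024 * 1024) -> List[Tuple[int, str]]:
    """Return (size_in_bytes, path) for files under root bigger than min_bytes, largest first."""
    results = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.islink(path):
                    continue  # don't count a symlink's target a second time
                size = os.path.getsize(path)
            except OSError:
                continue  # permission denied, file vanished mid-scan, etc.
            if size > min_bytes:
                results.append((size, path))
    return sorted(results, reverse=True)
```

Usage: for size, path in find_large_files("D:\\"): print(size // 2**20, "MiB", path). Whether it sparks joy is still your call.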

Koala Genome assembled on AWS

2018-09-29T16:46:00.002+08:00

Excerpted from the AWS blog:

Five years ago, a research team led by Dr. Rebecca Johnson (Director of the Australian Museum Research Institute) set out to learn more about koala populations, genetics, and diseases. As a biologically unique animal with a limited appetite, maintaining a healthy and genetically diverse population are both key elements of any conservation plan. In addition to characterizing the genetic diversity of koala populations, the team wanted to strengthen Australia’s ability to lead large-scale genome sequencing projects.
Inside the Koala Genome
Last month the team published their results in Nature Genetics. Their paper (Adaptation and Conservation Insights from the Koala Genome) identifies the genomic basis for the koala’s unique biology.

This work was performed on AWS. The research team used cfnCluster to create multiple clusters, each with 500 to 1000 vCPUs, and running Falcon from Pacific Biosciences. All in all, the team used 3 million EC2 core hours, most of which were EC2 Spot Instances.

BioBloom tools: fast, accurate and memory-efficient host species sequence screening using bloom filters

2018-09-11T20:06:00.003+08:00

https://academic.oup.com/bioinformatics/article/30/23/3402/207237

Justin Chu, Sara Sadeghi, Anthony Raymond, Shaun D. Jackman, Ka Ming Nip, Richard Mar, Hamid Mohamadi, Yaron S. Butterfield, A. Gordon Robertson, Inanç Birol

Bioinformatics, Volume 30, Issue 23, 1 December 2014, Pages 3402–3404, https://doi.org/10.1093/bioinformatics/btu558

Published: 20 August 2014

Abstract

Large datasets can be screened for sequences from a specific organism, quickly and with low memory requirements, by a data structure that supports time- and memory-efficient set membership queries. Bloom filters offer such queries but require that false positives be controlled. We present BioBloom Tools, a Bloom filter-based sequence-screening tool that is faster than BWA, Bowtie 2 (popular alignment algorithms) and FACS (a membership query algorithm). It delivers accuracies comparable with these tools, controls false positives and has low memory requirements.
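To illustrate the idea in the abstract, here is a toy Python Bloom filter used to screen reads by k-mer membership. It is not how BioBloom Tools is implemented: SHA-256 double hashing, canonical k-mers, k = 25 and the 50% hit threshold are all assumptions for the sketch, and the names are hypothetical. The sizing uses the textbook formulas m = -n ln p / (ln 2)^2 bits and h = (m/n) ln 2 hash functions, which is how the false positive rate p is kept under control.

```python
import hashlib
import math

_COMP = str.maketrans("ACGT", "TGCA")


class BloomFilter:
    """Bit-array Bloom filter: may give false positives, never false negatives."""

    def __init__(self, capacity: int, fp_rate: float = 0.01):
        # Textbook sizing: m = -n ln p / (ln 2)^2 bits, h = (m / n) ln 2 hashes
        self.size = max(1, math.ceil(-capacity * math.log(fp_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.size / capacity * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive all bit positions from two 64-bit slices of one digest
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] >> (pos % 8) & 1 for pos in self._positions(item))


def kmers(seq: str, k: int):
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def canonical(kmer: str) -> str:
    # Smaller of k-mer and reverse complement, so either strand hits the same entry
    return min(kmer, kmer.translate(_COMP)[::-1])


def build_filter(reference: str, k: int = 25, fp_rate: float = 0.01) -> BloomFilter:
    bf = BloomFilter(max(1, len(reference) - k + 1), fp_rate)
    for km in kmers(reference, k):
        bf.add(canonical(km))
    return bf


def screen_read(bf: BloomFilter, read: str, k: int = 25, min_fraction: float = 0.5) -> bool:
    """True if at least min_fraction of the read's k-mers are (probably) in the filter."""
    hits = [canonical(km) in bf for km in kmers(read, k)]
    return bool(hits) and sum(hits) / len(hits) >= min_fraction
```

The memory win comes from storing only bits rather than the k-mers themselves; requiring a fraction of k-mers to hit, rather than just one, is one simple way to keep the per-k-mer false positives from turning into false read assignments.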

Availability and implementation: www.bcgsc.ca/platform/bioinfo/software/biobloomtools

JD: Sr. Software DevOps Engineer at Guardant Health

2018-03-20T00:57:00.003+08:00

https://jobs.smartrecruiters.com/GuardantHealth/743999667525776-sr-software-devops-engineer
Gotta love this line
“We wanted flying cars and instead we got 140 characters” is a much-repeated complaint about Silicon Valley. But with all due respect to flying cars, we believe that our mission is even more critical.

Notable skills in the JD to pursue:
Ansible / Chef
Docker

This paragraph sounds exactly like what I face on a daily basis

Your troubleshooting skills are excellent, and you enjoy a good daily challenge in supporting rapid growth and a diverse set of end user needs. You have the ability to maintain day to day support while running various key projects that move the business forward by automating and creating new tools that facilitate management of the environment.

Exploring the 1000 genome dataset with Hail on Amazon EMR and Amazon Athena

2018-02-23T00:14:00.001+08:00

Blog post from Roy Hasson

https://aws.amazon.com/blogs/big-data/genomic-analysis-with-hail-on-amazon-emr-and-amazon-athena/?nc1=b_rp

Genomics analysis has taken off in recent years as organizations continue to adopt the cloud for its elasticity, durability, and cost. With the AWS Cloud, customers have a number of performant options to choose from. These options include AWS Batch in conjunction with AWS Lambda and AWS Step Functions; AWS Glue, a serverless extract, transform, and load (ETL) service; and of course, the AWS big data and machine learning workhorse Amazon EMR.

For this task, we use Hail, an open source framework for exploring and analyzing genomic data that uses the Apache Spark framework. In this post, we use Amazon EMR to run Hail. We walk through the setup, configuration, and data processing. Finally, we generate an Apache Parquet–formatted variant dataset and explore it using Amazon Athena.

Job: DATA SCIENTIST--WATSON HEALTH-CAMBRIDGE, MA.

2017-12-08T12:02:00.001+08:00

Spotted this ad on Biostars, posted by cshevlin: https://www.biostars.org/p/210796/

The IBM Watson Health business division is now looking for talented individuals destined to usher in the next era of healthcare. We live in a moment of remarkable change and opportunity. The convergence of data and technology is transforming healthcare and life sciences organizations in every way. New roles are being created that never existed before to meet the demands of this transformation.

Link: http://ibm.biz/BdruNd

We are now looking for a Genomic Data Scientist to join our team.

You will have an opportunity to work directly with the team building new healthcare solutions using genomic analytics and serving oncologists, pathologists and other specialists caring for patients. You will help define, design, and build those solutions and apply your expertise to work in different analytical and statistical models.

Key Responsibilities:
Develop tools to transform, load and validate data
Strategize new uses for data and its interaction with data design
Perform data studies of new and diverse data sources
Find new uses for existing data sources
Discover “stories” told by the data and present them to other scientists and business managers
Generate algorithms and create computer models

Ideal Candidates will possess the following:

Candidates should foremost have a strong background in data mining and statistics.
Hands-on background in programming and in using databases and tools to mine data, including practical experience in extracting, transforming and loading data as well as developing statistical and analytical models.
Candidates must have a demonstrated capacity to adapt to demanding, high-pressure projects and to clients’ needs.
Background in bioinformatics
Experience in healthcare or life sciences

Learn more about IBM Watson Health and what we are doing, and apply now to explore this opportunity with us!

U.S. Department of Veterans Affairs Enlists IBM’s Watson in the War on Cancer: Public-Private Partnership Will Help Doctors Scale Precision Medicine Access for up to 10,000 VA Cancer Patients http://www-03.ibm.com/press/us/en/pressrelease/50061.wss

IBM and New York Genome Center’s new cancer tumor repository aims to revolutionize treatment

IBM's Watson to help doctors devise optimal cancer treatment

Employment Type: Full-Time

Required Technical and Professional Expertise:
At least 2 years of experience in data mining
At least 1 year of experience with one or more data/statistics tools, such as Python, R, SPSS, Perl
At least 1 year of programming experience
Demonstrated effective communication skills
Fluent in English

Preferred Technical and Professional Experience:
3 years of experience with one or more data/statistics tools, such as Python, R, SPSS
1 year of experience with relational databases, such as DB2, NoSQL, etc.

Meet Nephele: Harness the Power of the Cloud for Your Microbiome Data Analysis

2017-08-13T13:21:00.003+08:00

Nephele is a project from the National Institutes of Health (NIH) that brings together microbiome data and analysis tools in a cloud computing environment. It aims to address a major challenge facing researchers today (namely, analyzing, transferring, and storing biomedical "big data") through the use of cloud-based resources.

Why Use Nephele?

Liberating: Nephele enables you to break free from constraints imposed on high-throughput computational analysis
Simple: Nephele is designed to be a no-hassle, easy-to-use tool to support your research
Sophisticated: Nephele is the most intuitive, advanced and secure microbiome analysis platform designed by our experienced computational biologists and software development team to provide exceptional capability with little effort on your part
Fast: Nephele speeds up microbiome data analysis and paves the path to getting to your results
Economical: Nephele's on-demand, pay-as-you-go setup offers a cost-effective alternative to using dedicated resources for your microbiome data analysis

Ready to get started? Visit https://nephele.niaid.nih.gov/ and enter your email address. Check your inbox for a message with the subject "Your Nephele Promotional Codes."

Stay in touch! Email nephele@mail.nih.gov with your questions and feedback. You can also visit our Google+ community page to connect with other researchers in the microbiome community (https://plus.google.com/communities/107278901311674483366).

Source: https://www.biostars.org/p/204081/

Demo BAM file for download: Ion Torrent 314 chip, E. coli 400 bp run

2017-08-13T13:18:00.002+08:00

BAM file of B22-730 (314v2 E. coli 400 bp run)
Ion Torrent PGM 314v2 run with a mode read length of 400bp and per-base raw read accuracy >99%.

https://s3.amazonaws.com/ion-torrent/pgm/B22-730/B22-730.bam

Source: https://apps.thermofisher.com/apps/publiclib/#/datasets

Creating filtered fastq files of ONLY mapped reads from a BAM file

2017-08-02T17:01:00.002+08:00

Filtering BAM files for mapped or unmapped reads

To get the unmapped reads from a BAM file, use:

samtools view -h -f 4 file.bam > unmapped.sam

The output will be in SAM format (-h keeps the header, so the file stays a valid SAM).

To get the output in BAM format, use:

samtools view -b -f 4 file.bam > unmapped.bam

To get only the mapped reads, use the -F parameter, which works like grep -v: it skips alignments that have the given flag set.

samtools view -b -F 4 file.bam > mapped.bam

Source: https://www.biostars.org/p/56246/ Sukhdeep Singh
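Since the goal is FASTQ, here is a minimal Python sketch of what the SAM-to-FASTQ step involves for mapped reads, working on SAM text such as the output of samtools view -h. It drops unmapped (0x4), secondary (0x100) and supplementary (0x800) records so each read appears once, and reverse-complements reverse-strand (0x10) reads, because SAM stores those in reference orientation. mapped_reads_to_fastq is a hypothetical name; in practice samtools fastq or the BBTools commands below do this for you.

```python
from typing import Iterable, Iterator

# SAM FLAG bits (SAM specification)
UNMAPPED = 0x4
REVERSE = 0x10
READ1 = 0x40
READ2 = 0x80
SECONDARY = 0x100
SUPPLEMENTARY = 0x800
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def mapped_reads_to_fastq(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield one FASTQ record per mapped primary alignment in SAM text lines."""
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue  # same idea as samtools view -F 4, plus one record per read
        if seq == "*" or qual == "*":
            continue  # bases or qualities not stored in this record
        if flag & REVERSE:
            # SAM stores reverse-strand reads as aligned; restore the read's own orientation
            seq = seq.translate(_COMPLEMENT)[::-1]
            qual = qual[::-1]
        if flag & READ1:
            name += "/1"
        elif flag & READ2:
            name += "/2"
        yield f"@{name}\n{seq}\n+\n{qual}\n"
```

Note that this keeps a read even if its mate is unmapped, so paired output can end up with orphans; that is the pairing problem Brian Bushnell points out below.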

To do this as efficiently as possible, using BBTools:

reformat.sh in=reads.sam out=mapped.fq mappedonly

Also, BBMap has a lot of options designed for filtering, so it can output in fastq format and separate mapped from unmapped reads, preventing the creation of intermediate sam files. This approach also keeps pairs together, which is not very easy using samtools for filtering.

bbmap.sh ref=reference.fa in=reads.fq outm=mapped.fq outu=unmapped.fq

Source: https://www.biostars.org/p/127992/ Brian Bushnell

Control a fleet of embedded unix systems (eg Raspberry Pi, Orange Pi) using saltstack

2017-04-12T13:55:00.000+08:00

HAHAHA, I share my name with a software project. Bizarre discovery today!

https://github.com/unixbigot/kevin
Control a fleet of embedded unix systems (eg Raspberry Pi, Orange Pi) using saltstack

github-based, community-maintained list of cancer clinical informatics resources

2017-04-11T11:21:00.000+08:00

Sean Davis created a github-based, community-maintained list of cancer clinical informatics resources.
"Contributions are welcome!" https://lnkd.in/d-uphUc

For now, it's named ci4cc-informatics-resources.

https://github.com/seandavi/ci4cc-informatics-resources