
Bioinformatics As Modern Tool in Forensic Science for Data Understanding & Investigation in Research

Research Article | DOI: https://doi.org/10.31579/2835-7957/005


  • Pranav Kumar Ray *

In-Charge Forensic Science Laboratory, Jharkhand Raksha Shakti University, Ranchi Jharkhand.

*Corresponding Author: Pranav Kumar Ray, In-Charge Forensic Science Laboratory, Jharkhand Raksha Shakti University, Ranchi Jharkhand.

Citation: Pranav Kumar Ray (2022). Bioinformatics As Modern Tool in Forensic Science for Data Understanding & Investigation in Research. Clinical Reviews and Case Reports. 1(1); DOI: 10.31579/2835-7957/005

Copyright: © 2022 Pranav Kumar Ray. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Received: 06 October 2022 | Accepted: 19 October 2022 | Published: 31 October 2022

Keywords: bioinformatics; DNA

Abstract

Modern-day biology is witnessing a data explosion, with vast amounts of information generated by ongoing genome and sequencing projects. The abundance of data from genome sequences, functional genomics and other high-throughput (HTP) techniques, combined with the power of computing, has led to the rise of a new discipline, namely ‘bioinformatics’.

Bioinformatics is a young but fast-growing field concerned with the collection, organization, interpretation and modeling of biological data. Its tools and techniques are drawn from a multidisciplinary combination of the natural and physical sciences. Historically, new disciplines were carved out as and when sufficient specialization was achieved; bioinformatics, by contrast, was born of an alliance between existing disciplines from the life and non-life sciences. Bioinformatics provides new foundations for collecting, organizing and mining gene and protein sequences, three-dimensional structures and biochemical functions, and for modeling the biological processes of functioning cells. DNA sequencing performed on an industrial scale has produced a vast amount of data to analyze. Although the Human Genome Project is officially over, improvements in DNA sequencing continue to be made. Forensic science is increasingly based on biomolecular data, and many European countries are establishing forensic databases that store DNA profiles from crime scenes and from known offenders for use in DNA testing.

Introduction

Paulien Hogeweg and Ben Hesper coined the term ‘bioinformatics’ in 1978, referring to the study of information processes in biological systems. As an interdisciplinary field, bioinformatics draws contributions from biology, chemistry, mathematics, statistics and computer science to understand life and its processes. With the emergence of disciplines such as genetics, biochemistry, molecular biology and structural biology, the focus of the study of ‘life’ shifted from ‘macro’ properties to ‘micro’ properties. Bioinformatics and forensic DNA analysis are inherently interdisciplinary: they draw techniques from statistics and computer science and bring them to bear on problems in biology and law.

The National Center for Biotechnology Information (NCBI 2001) defines bioinformatics as: "Bioinformatics is the field of science in which biology, computer science, and information technology merge into a single discipline. There are three important sub-disciplines within bioinformatics: the development of new algorithms and statistics with which to assess relationships among members of large data sets; the analysis and interpretation of various types of data including nucleotide and amino acid sequences, protein domains, and protein structures; and the development and implementation of tools that enable efficient access and management of different types of information."

History

Computers have been essential in molecular biology ever since protein sequences became available. The first bioinformatics databases were constructed a few years after the first protein sequence became available. The first protein sequence reported was that of bovine insulin, following the groundbreaking work of Frederick Sanger, completed in 1955. Early contributions to bioinformatics include the comprehensive compilations of antibody sequences published by Elvin A. Kabat in 1970. From the discovery that DNA is the source of genetic information and the elucidation of its double-helical structure to the sequencing of the human genome and beyond, bioinformatics has become an integral part of modern biology. The foundations of bioinformatics were laid by the breakthrough work of Margaret Oakley Dayhoff, who is regarded as the ‘mother and father of bioinformatics’. A pioneer in the field, Dayhoff assembled all the sequence data then available to create the first bioinformatics database, one of the first protein sequence databases, initially published as the ‘Atlas of Protein Sequence and Structure’ in 1965. Dayhoff also pioneered methods of sequence alignment and molecular evolution, and among her significant contributions is the one-letter code for the amino acids. Research in the 1980s and early 1990s focused primarily on the development of value-added derived databases to understand the ‘sequence - structure - function’ relationship.

Chronological Developments in Bioinformatics:

  • 1902: Emil Hermann Fischer wins the Nobel Prize in Chemistry for his work on sugar and purine syntheses; in the same year he proposes that amino acids are linked together to form proteins.
  • 1911: Phoebus Aaron Theodore Levene discovers RNA.
  • 1933: Electrophoresis technique for separating proteins in solution introduced by Tiselius.
  • 1941: George Beadle and Edward Tatum show that genes make proteins.
  • 1943: Construction of ENIAC, the first general-purpose electronic computer, begins at the University of Pennsylvania; it is completed in 1946.
  • 1950: Erwin Chargaff reports that DNA contains equal amounts of cytosine and guanine and of adenine and thymine (Chargaff's rules).
  • 1951: Grace Murray Hopper develops the first compiler, the A-0, for the UNIVAC I. She later helped create the COBOL programming language.
  • 1951: Linus Pauling and Robert Corey propose α-helix and β-sheet protein structure. 
  • 1953: Watson & Crick proposed the double helix structure for DNA based on X-ray crystallographic data obtained by Franklin & Wilkins.
  • 1954: Perutz's group develops heavy atom methods to solve the phase problem in protein crystallography. 
  • 1955: Frederick Sanger completes the sequence of the first protein, bovine insulin.
  • 1958: First integrated circuit constructed by Jack Kilby at Texas Instruments. Advanced Research Projects Agency (ARPA) formed in US.
  • 1962: Linus Pauling proposes a theory of molecular evolution.
  • 1965: Margaret Dayhoff's Atlas of Protein Sequence and Structure is published.
  • 1966: First bioinformatics system: Margaret Oakley Dayhoff creates the first protein sequence database and develops the PAM model of protein evolution.
  • 1968: Packet-switching network protocols are presented to ARPA.
  • 1970: Details of the Needleman-Wunsch algorithm for sequence comparison are published.
  • 1971: E-mail program invented by Ray Tomlinson.
  • 1972: The first recombinant DNA molecule is created by Paul Berg, Herbert Boyer and Stanley N. Cohen.
  • 1973: The Brookhaven Protein Data Bank is announced. Robert Metcalfe of Harvard University describes ‘Ethernet’ in his doctoral thesis.
  • 1974: Vinton Gray ‘Vint’ Cerf and Robert Elliot Kahn develop the concept of connecting networks of computers into an ‘internet’ and develop the Transmission Control Protocol/Internet Protocol (TCP/IP). The Specification of Internet Transmission Control Program by Vinton Cerf, Yogen Dalal and Carl Sunshine (Network Working Group) contains the first use of the term internet, as shorthand for internetworking.
  • 1975: Microsoft Corporation is founded by Bill Gates and Paul Allen. P. H. O'Farrell combines SDS-PAGE separation of proteins with separation according to isoelectric point, producing two-dimensional electrophoresis.
  • 1976: The Unix-to-Unix Copy Protocol is developed at Bell Labs. E. M. Southern publishes details of the Southern blot technique for detecting specific DNA sequences.
  • 1977: Allan Maxam and Walter Gilbert, and separately Frederick Sanger, report methods for DNA sequencing.
  • 1980: The complete genome sequence of the single-stranded bacteriophage φX174 is published. Multi-dimensional NMR for protein structure determination is described by Wüthrich et al. The Genetics Suite of programs for DNA and protein sequence analysis is developed.
  • 1981: Smith-Waterman algorithm for sequence alignment is published. IBM introduces its Personal Computer. 
  • 1982: The Genetics Computer Group (GCG) is created as part of the University of Wisconsin Biotechnology Center. GenBank is released.
  • 1983: Production of DNA clone (cosmid) libraries by Los Alamos National Laboratory (LANL) and Lawrence Livermore National Laboratory (LLNL).
  • 1984: Jon Postel's Domain Name System is placed online. Apple Computer announces the Macintosh.
  • 1985: The FASTP/FASTN algorithm is published. The term ‘genomics’, coined by Thomas Roderick, appears for the first time to describe the scientific discipline of mapping, sequencing and analyzing genes. The SWISS-PROT database is created by the Department of Medical Biochemistry, University of Geneva, and the European Molecular Biology Laboratory (EMBL). The polymerase chain reaction (PCR) is described by Kary Mullis and co-workers.
  • 1986: Leroy Hood develops an automated DNA sequencing technique.
  • 1987: The use of yeast artificial chromosomes (YACs) is described by David T. Burke and co-workers. A physical map of E. coli is published by Y. Kohara and co-workers. Perl (Practical Extraction and Report Language) is released by Larry Wall.
  • 1988: The National Center for Biotechnology Information (NCBI) is created at NIH/NLM, and the EMBnet network for database distribution is established. The FASTA algorithm for sequence comparison is published by Pearson and Lipman. A telomere sequence with implications for aging and cancer research is identified at LANL. The Human Genome Initiative is started.
  • 1990: The BLAST program is implemented. InforMax is founded; its products address sequence analysis, database and data management, searching, publication graphics, clone construction, mapping and primer design.
  • 1991: The CERN research institute in Geneva announces the creation of the protocols that constitute the World Wide Web. Linus Torvalds announces a Unix-like operating system that later becomes Linux. The use of expressed sequence tags (ESTs) is described. The Genome Database (GDB), a repository of human chromosome mapping data, is established.
  • 1992: Low-resolution genetic linkage map of entire human genome published. Guidelines for data release and resource sharing announced by DOE and NIH.
  • 1993: International IMAGE Consortium established to coordinate efficient mapping and sequencing of gene-representing cDNAs.
  • 1994: Netscape Communications Corporation is founded and releases a commercial version of NCSA's Mozilla. The PRINTS database of protein motifs is published by Attwood and Beck. The European Bioinformatics Institute (EMBL-EBI) is established in Hinxton, UK. LLNL and LBNL complete second-generation DNA clone libraries representing each human chromosome.
  • 1995: Microsoft releases version 1.0 of Internet Explorer. Sun releases version 1.0 of Java. Sun and Netscape release version 1.0 of JavaScript. First non-viral whole genome sequenced for the bacterium Haemophilus influenzae. Sequence of smallest bacterium, Mycoplasma genitalium, completed; provides a model of the minimum number of genes needed for independent existence. Physical map with over 15,000 STS markers published.
  • 1996: The Saccharomyces cerevisiae genome sequence is completed. The PROSITE database is reported by Bairoch et al. Affymetrix produces the first commercial DNA chips. The sequence of the human T-cell receptor region is completed. The genome of the archaeon Methanococcus jannaschii is sequenced, confirming the existence of a third major branch of life on Earth.
  • 1997: The genome of E. coli is published.
  • 1998: The genomes of Caenorhabditis elegans and baker's yeast are published. The Swiss Institute of Bioinformatics is established as a non-profit foundation. Craig Venter forms Celera Genomics in Rockville, Maryland.
  • 1999: Human chromosome 22 becomes the first human chromosome to be completely sequenced.
  • 2000: The Pseudomonas aeruginosa genome is published. The Arabidopsis thaliana and Drosophila melanogaster genomes are sequenced. An international research consortium publishes the sequence of chromosome 21, the smallest human chromosome and the second to be completely sequenced.
  • 2001: The human genome is published. Human chromosome 20 is completely sequenced.
  • 2002: The 2.5 Gb genome sequence of the common house mouse is published.
  • 2003: Human Genome Project completed. 
  • 2004: Rattus norvegicus Brown Norway laboratory rat draft genome sequence completed.

Importance of Bioinformatics:

  • Understanding genetic diversity: Genetic diversity is the total number of genetic characteristics in the genetic makeup of a species. It ranges from differences between species to differences within a species, and it affects how long a species is able to survive.

  • Epidemiology: Epidemiology is the study (scientific, systematic, and data-driven) of the distribution (frequency, pattern) and determinants (causes, risk factors) of health-related states and events (not just diseases) in specified populations (neighborhood, school, city, state, country, global).

  • Vaccinology: Vaccinology is a field of microbiology and immunology covering vaccine development as well as the use of vaccines and their effects on animal and public health. Developing vaccines is central to the control of infectious diseases of animals, and new vaccines have the potential to reduce antibiotic use, prevent losses in livestock production and protect people from zoonotic infections.

  • Global health: Global health is the health of populations in a global context; it has been defined as "the area of study, research and practice that places a priority on improving health and achieving equity in health for all people worldwide".

  • Metabolic reconstruction: A metabolic reconstruction provides a highly mathematical, structured platform for understanding the systems biology of metabolic pathways within an organism. Integrating biochemical metabolic pathways with rapidly available, annotated genome sequences has produced what are called genome-scale metabolic models.

  • Systems biology: Systems biology is the computational and mathematical analysis and modeling of complex biological systems.

  • Personalized medicine: Personalized medicine, also referred to as precision medicine, is a medical model that separates people into different groups, with medical decisions, practices, interventions and/or products tailored to the individual patient based on their predicted response or risk of disease.
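The metabolic reconstruction item above can be made concrete with a stoichiometric matrix S (rows are metabolites, columns are reactions) and a flux vector v; at steady state every internal metabolite is balanced, i.e. S·v = 0. The following is a minimal Python sketch of that check for a hypothetical three-reaction linear pathway. The metabolite names, reactions and flux values are invented for illustration; real genome-scale models contain thousands of reactions and are usually analyzed with dedicated optimization tools.

```python
# Hypothetical toy pathway: uptake -> A, A -> B, B -> secretion.
METABOLITES = ["A", "B"]                          # rows of S
REACTIONS = ["uptake_A", "A_to_B", "secrete_B"]   # columns of S
S = [
    [1, -1, 0],   # A: produced by uptake_A, consumed by A_to_B
    [0, 1, -1],   # B: produced by A_to_B, consumed by secrete_B
]

def net_production(S, v):
    """Return S @ v: the net rate of change of each internal metabolite."""
    return [sum(coeff * flux for coeff, flux in zip(row, v)) for row in S]

def is_steady_state(S, v, tol=1e-9):
    """True if every internal metabolite is balanced (S v = 0 within tol)."""
    return all(abs(x) <= tol for x in net_production(S, v))

if __name__ == "__main__":
    for v in ([2.0, 2.0, 2.0], [2.0, 1.0, 1.0]):   # hypothetical flux vectors
        print(dict(zip(METABOLITES, net_production(S, v))), is_steady_state(S, v))
```

The second flux vector leaves metabolite A accumulating, so it fails the steady-state test.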

Fields Related to Bioinformatics:

  • Computational Biology: Computational biology involves the development and application of data-analytical and theoretical methods, mathematical modelling and computational simulation techniques to the study of biological, ecological, behavioural, and social systems. The field is broadly defined and includes foundations in biology, applied mathematics, statistics, biochemistry, chemistry, biophysics, molecular biology, genetics, genomics, computer science, and evolution.
  • Genomics: Genomics is any attempt to analyze or compare the entire genetic complement of a species.
  • Proteomics: Proteomics is concerned with "qualitative and quantitative studies of gene expression at the level of the functional proteins themselves", that is, "an interface between protein biochemistry and molecular biology".
  • Pharmacogenomics: Pharmacogenomics is the application of genomic approaches and technologies to the identification of drug targets.
  • Pharmacogenetics: Pharmacogenetics is a subset of pharmacogenomics which uses genomic/bioinformatic methods to identify genomic correlates, for example SNPs (Single Nucleotide Polymorphisms), characteristic of particular patient response profiles and use those markers to inform the administration and development of therapies.
  • Cheminformatics: Cheminformatics is "the combination of chemical synthesis, biological screening, and data-mining approaches used to guide drug discovery and development".
  • Medical Informatics: "Biomedical Informatics is an emerging discipline that has been defined as the study, invention, and implementation of structures and algorithms to improve communication, understanding and management of medical information." Medical informatics is more concerned with structures and algorithms for the manipulation of medical data than with the data itself.
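Pharmacogenetics, as described above, relies on markers such as SNPs. As a minimal sketch, assuming the two sequences are already aligned, of equal length and free of insertions or deletions, candidate single-nucleotide differences can be listed by comparing a sample with a reference position by position. The sequences and the function name `find_snps` are hypothetical; a real variant-calling pipeline also weighs read quality, coverage and population frequency before calling a SNP.

```python
def find_snps(reference: str, sample: str):
    """Return (position, reference_base, sample_base) for each mismatch.

    Assumes both sequences are pre-aligned, of equal length and gap-free.
    Positions are 1-based. Each mismatch is only a *candidate* SNP.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to equal length")
    return [
        (i + 1, ref_base, smp_base)
        for i, (ref_base, smp_base) in enumerate(zip(reference.upper(), sample.upper()))
        if ref_base != smp_base
    ]

if __name__ == "__main__":
    reference = "ATGCCGTA"   # hypothetical reference fragment
    sample = "ATGCTGTA"      # hypothetical patient fragment
    print(find_snps(reference, sample))   # [(5, 'C', 'T')]
```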

Uses of Bioinformatics:

  • Store/retrieve biological information (databases) 
  • Retrieve/compare gene sequences
  • Predict function of unknown genes/proteins 
  • Search for previously known functions of a gene
  • Compare data with other researchers 
  • Compile/distribute data for other researchers
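Retrieving and comparing gene sequences usually rests on pairwise alignment. The sketch below is a minimal Python version of the Needleman-Wunsch global alignment algorithm (listed in the chronology under 1970), using an assumed scoring scheme of +1 per match, -1 per mismatch and -1 per gap. Production tools use substitution matrices such as PAM or BLOSUM and more elaborate gap penalties.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b).

    The default scores are illustrative assumptions, not a standard scheme.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair_score(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,                   # gap in b
                score[i][j - 1] + gap,                   # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))   # (2, 'ACGT', 'A-GT')
```

When several alignments share the best score, the traceback order (diagonal, then up, then left) decides which one is returned.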

DNA testing

According to Ajay et al. (2012), a basic task of forensic bioinformatics is to advance the setting up of forensic records that store draft DNA profiles of offenders, taken from crime scenes and later presented for DNA testing. Statistical and technological advances, such as machine-learning algorithms, DNA microarrays and sequencing, and thin-film transistor biosensors, are used to improve the accuracy and authenticity of the results. Genetic tests are now used extensively in mass-fatality identification as well as for forensic evidence. A multidisciplinary panel, including medical examiners, fingerprint professionals and forensic pathologists, gathers the data, which are then integrated with the results of genetic testing.
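The database task described above can be sketched as matching a crime-scene STR profile against stored profiles locus by locus. In this minimal, hypothetical Python example, a profile maps each locus to an unordered pair of allele repeat counts, and a stored profile counts as a match only if it agrees at every locus typed in both and at least three loci were compared. The locus names are standard STR markers, but all allele values, IDs and the three-locus threshold are invented. Operational database software applies match-stringency rules and statistical weighting that this sketch omits.

```python
from typing import Dict, FrozenSet, List

# Hypothetical representation: locus name -> unordered set of allele repeat counts.
Profile = Dict[str, FrozenSet[float]]

def make_profile(raw: Dict[str, tuple]) -> Profile:
    """Normalise allele pairs so that (13, 14) and (14, 13) compare equal."""
    return {locus: frozenset(alleles) for locus, alleles in raw.items()}

def is_match(evidence: Profile, reference: Profile, min_loci: int = 3) -> bool:
    """True if the profiles agree at every locus typed in both and at least
    `min_loci` loci were compared (the threshold is an illustrative assumption)."""
    shared = evidence.keys() & reference.keys()
    return len(shared) >= min_loci and all(
        evidence[locus] == reference[locus] for locus in shared
    )

def search_database(evidence: Profile, database: Dict[str, Profile]) -> List[str]:
    """Return the IDs of stored profiles that match the crime-scene profile."""
    return [pid for pid, stored in database.items() if is_match(evidence, stored)]

if __name__ == "__main__":
    database = {  # hypothetical offender profiles
        "S1": make_profile({"D8S1179": (13, 14), "TH01": (6, 9.3), "FGA": (21, 24), "vWA": (16, 17)}),
        "S2": make_profile({"D8S1179": (12, 13), "TH01": (7, 7), "FGA": (20, 22), "vWA": (15, 18)}),
    }
    crime_scene = make_profile({"D8S1179": (14, 13), "TH01": (9.3, 6), "FGA": (24, 21)})
    print(search_database(crime_scene, database))  # ['S1']
```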

Results and Discussion

Bioinformatics tools are very helpful in forensics, but care is still needed when generating results with computational tools, because statistical rules and biological reality sometimes disagree. The most doubtful results arise in phylogeny reconstruction and in alignments reconstructed with ClustalW. It has also been observed that correct alignments are generated mainly from sequences that are very closely related. At the same time, inaccuracies are expected in more than half of the alignments produced from biological sequence sets, so the bootstrap method is used to assess the stability of tree topology, although it does not by itself give an accurate phylogenetic tree. Over time, however, results have improved and computational programs have become progressively more consistent. Parentage testing and family reunification also fall within the scope of bioinformatics and forensics.
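The bootstrap idea mentioned above can be illustrated without building a full tree: resample the alignment columns with replacement many times and record how often a given grouping is recovered. This minimal Python sketch uses the proportion of differing sites (p-distance) and asks how often the original closest pair of sequences remains closest. The sequences, replicate count and seed are hypothetical, and breaking ties by name order is a simplifying assumption. Phylogenetic software applies the same resampling to whole tree topologies.

```python
import random
from itertools import combinations

def p_distance(s1: str, s2: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)

def closest_pair(alignment: dict) -> tuple:
    """Pair of names with the smallest p-distance; ties go to the pair that
    comes first in sorted name order (a simplifying assumption)."""
    return min(
        combinations(sorted(alignment), 2),
        key=lambda pair: p_distance(alignment[pair[0]], alignment[pair[1]]),
    )

def bootstrap_support(alignment: dict, replicates: int = 1000, seed: int = 1) -> float:
    """Fraction of column-resampled replicates that recover the original closest pair."""
    rng = random.Random(seed)  # fixed seed makes the result reproducible
    names = sorted(alignment)
    length = len(alignment[names[0]])
    target = closest_pair(alignment)
    hits = 0
    for _ in range(replicates):
        # Draw `length` column indices with replacement (nonparametric bootstrap).
        columns = [rng.randrange(length) for _ in range(length)]
        replicate = {name: "".join(alignment[name][c] for c in columns) for name in names}
        if closest_pair(replicate) == target:
            hits += 1
    return hits / replicates

if __name__ == "__main__":
    alignment = {  # hypothetical aligned sequences of equal length
        "seqA": "ACGTACGTAC",
        "seqB": "ACGTACGTTC",
        "seqC": "TCGAACCTAG",
    }
    print(closest_pair(alignment))  # ('seqA', 'seqB')
    print(bootstrap_support(alignment, replicates=200))
```

A support value near 1 means the grouping survives most resamplings of the data; it says nothing about whether the underlying alignment itself is correct.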

Although forensic DNA testing is very useful, many people object to it because it intrudes on their privacy. In the last 20 years, bioinformatics has advanced considerably, and the goal of producing and assembling various documentation and investigative tools has been achieved. Worldwide, public-domain resources such as GenBank have become crucial sources for research (Prasad, 2008).

Forensic DNA technology currently influences the lives of millions of people worldwide, and the approach continues to gain wide acceptance internationally. Forensic DNA analysis played an important role in major events such as the 9/11 terrorist attacks, where victims were identified through DNA profile analysis.

The rapid expansion of forensic DNA databases raises many questions about the standards for entering and maintaining data, uncertainty about the databases' effectiveness, and the risk of confidentiality breaches involving such large volumes of private data (Ge et al., 2014). On the other hand, as a widening range of offenses was subjected to DNA investigation, the resulting large number of DNA profiles made new measures possible, such as familial DNA database searching, which looks for similarity between the DNA profile of a perpetrator's family member and the profile obtained from the crime scene. The first successful familial search was carried out in the UK in 2004 and identified Craig Harman as responsible for a killing. However, many countries oppose the use of this kind of information; in Germany, for example, the view is that every free society must protect liberty and constitutional rights, so the expansion of forensic databases is discouraged (Wallace et al., 2014). There are also shortages of funding, trained professionals and data protection, as well as insufficient guidance and inadequate equipment.
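Familial searching looks for stored profiles that are similar, rather than identical, to the crime-scene profile. One simple way to sketch this is to count alleles shared at each locus: a parent and child share at least one allele at every locus, barring mutation. The Python example below ranks hypothetical database profiles on that basis; the IDs, allele values and ranking rule are illustrative assumptions, and operational familial searching relies on kinship likelihood ratios computed from population allele frequencies, which are omitted here.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: locus name -> pair of allele repeat counts.
Profile = Dict[str, Tuple[float, float]]

def shared_alleles(a: Tuple[float, float], b: Tuple[float, float]) -> int:
    """Count alleles (0, 1 or 2) shared between two genotypes at one locus."""
    remaining = list(b)
    count = 0
    for allele in a:
        if allele in remaining:
            remaining.remove(allele)  # each allele in b can be matched only once
            count += 1
    return count

def rank_relatives(evidence: Profile, database: Dict[str, Profile]) -> List[Tuple[str, int, bool]]:
    """Rank stored profiles by total shared alleles over loci typed in both.

    The boolean flags candidates that share at least one allele at every
    compared locus, as a parent or child would (barring mutation). It is a
    screening cue only, not proof of kinship.
    """
    results = []
    for pid, stored in database.items():
        loci = evidence.keys() & stored.keys()
        per_locus = [shared_alleles(evidence[locus], stored[locus]) for locus in loci]
        consistent = bool(per_locus) and all(n >= 1 for n in per_locus)
        results.append((pid, sum(per_locus), consistent))
    # Most shared alleles first; ties broken by ID for a stable order.
    return sorted(results, key=lambda r: (-r[1], r[0]))

if __name__ == "__main__":
    database = {  # hypothetical stored profiles
        "P1": {"D8S1179": (13, 15), "TH01": (6, 8), "FGA": (21, 23)},
        "P2": {"D8S1179": (10, 11), "TH01": (7, 9), "FGA": (19, 20)},
    }
    crime_scene = {"D8S1179": (13, 14), "TH01": (6, 9.3), "FGA": (21, 24)}
    print(rank_relatives(crime_scene, database))
    # [('P1', 3, True), ('P2', 0, False)]
```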

In Pakistan, the following institutions have initiated DNA forensics research:

  • University of Veterinary and Animal Sciences Lahore (UVAS)
  • Government College University Lahore (GCU)
  • University of Punjab (PU)

The Center of Excellence in Molecular Biology (CEMB) runs a dedicated laboratory, started in 2005, that handles cases involving crimes, disasters and paternity disputes. The Higher Education Commission (HEC) should also focus on this field by adopting advanced strategies, in association with law enforcement institutions, that facilitate forensic work.

References
