BIOCHEMISTRY IN THE ERA OF GENOMIC BIG DATA: CHALLENGES, APPLICATIONS AND OPPORTUNITIES FOR INNOVATION
Abstract
The completion of the Human Genome Project at the beginning of the 21st century, together with advances in computer technology, has brought biochemistry into the genomic era. This shift has been accelerated by massively parallel genome sequencing technology, known as next-generation sequencing (NGS), which improves the identification of genetic variants associated with complex diseases such as cancer, diabetes and Alzheimer's disease. This knowledge is now driving the development of precision and personalized medicine. Applied wisely, the explosion of genomic big data is expected to be of great use in advancing diagnosis, therapy and drug discovery for complex diseases.
To publish in Chemistry Notes, authors are required to agree to the copyright permission, which gives the publisher the right to reproduce, display or distribute the accepted manuscript. By entering into this agreement, the authors also declare that the submitted manuscript is free of plagiarism and that there is no conflict of interest among the authors.