BIOINFORMATICS 2018 Abstracts


Full Papers
Paper Nr: 3
Title:

Parameter Learning for Spiking Neural Networks Modelled as Timed Automata

Authors:

Elisabetta De Maria and Cinzia Di Giusto

Abstract: In this paper we present a novel approach to automatically infer parameters of spiking neural networks. Neurons are modelled as timed automata waiting for inputs on a number of different channels (synapses), for a given amount of time (the accumulation period). When this period is over, the current potential value is computed considering current and past inputs. If this potential overcomes a given threshold, the automaton emits a broadcast signal over its output channel, otherwise it restarts another accumulation period. After each emission, the automaton remains inactive for a fixed refractory period. Spiking neural networks are formalised as sets of automata, one for each neuron, running in parallel and sharing channels according to the network structure. This encoding is exploited to find an assignment for the synaptical weights of neural networks such that they can reproduce a given behaviour. The core of this approach consists in identifying some correcting actions adjusting synaptical weights and back-propagating them until the expected behaviour is displayed. A concrete case study is discussed.
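The neuron cycle described in this abstract (accumulate, compare with the threshold, emit, stay refractory) can be illustrated with a minimal discrete-time sketch. It is not the authors' timed-automata encoding: the function name, the `leak` factor used to carry the past potential forward, and the reset to zero after an emission are assumptions made only for illustration.

```python
def simulate_neuron(inputs, weights, threshold, accumulation, refractory, leak=0.5):
    """Discrete-time sketch of one neuron (illustrative assumptions only).

    inputs: list of per-step tuples of 0/1 spikes, one entry per synapse.
    Returns the time steps at which the neuron emits a spike.
    """
    potential = 0.0
    spikes = []
    t = 0
    while t + accumulation <= len(inputs):
        # Accumulation period: sum the weighted spikes received on every synapse.
        window = inputs[t:t + accumulation]
        received = sum(w * s for step in window for w, s in zip(weights, step))
        t += accumulation
        # Current potential = current inputs + decayed past potential (assumed leak).
        potential = received + leak * potential
        if potential >= threshold:
            spikes.append(t)      # emit on the output channel
            potential = 0.0       # assumed reset after emission
            t += refractory       # remain inactive for the refractory period
    return spikes


print(simulate_neuron([(1,), (1,), (0,), (1,), (1,), (1,)], [1.0], 2.0, 2, 1))
```

Parameter learning, in this picture, would amount to adjusting `weights` until the returned spike times match an expected behaviour.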

Paper Nr: 18
Title:

Loop-loop Interaction Metrics on RNA Secondary Structures with Pseudoknots

Authors:

Michela Quadrini and Emanuela Merelli

Abstract: Many methods have been proposed in the literature to face the problem of RNA secondary structure comparison. From a biological point of view, most of these methods are satisfactory for the comparison of pseudoknot-free secondary structures, whereas the problem of pseudoknotted motif comparison has not been solved yet. In this paper, we propose loop-loop interaction metrics, a new measure able to compute the distance of two pseudoknotted secondary structures by comparing loops and their interactions. The new measure is defined for RNA molecules whose structural and biological information is represented as algebraic expressions of hairpin loops, so that each RNA secondary structure can be represented as a word, which describes the interactions among loops and uniquely defines the intersection set, the set of pairs of loops that cross. Hence, the interaction metrics is defined as the symmetric set difference applied to the intersection sets of molecules. To illustrate how to apply the proposed methodology, we compare two RNA molecules, PKB66 and PKB10, extracted from the Pseudobase++ database. To test the validity of the measure, we evaluated the evolutionary conservation of the pseudoknot domain of Vertebrate Telomerase RNA.
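As a rough illustration of the symmetric set difference on intersection sets, the sketch below treats each loop as a hypothetical `(start, end)` interval and regards two loops as crossing when their intervals interleave. The interval representation, the loop labels shared between structures, and the use of the set size as the distance value are all simplifying assumptions, not the algebraic encoding used in the paper.

```python
from itertools import combinations


def intersection_set(loops):
    """Return the set of unordered pairs of loops that cross.

    loops: dict mapping a loop label to a (start, end) interval
    (a hypothetical simplification of the paper's representation).
    """
    ordered = sorted(loops.items(), key=lambda item: item[1])
    crossing = set()
    for (a, (i, j)), (b, (k, l)) in combinations(ordered, 2):
        # Two intervals cross when they interleave: i < k < j < l.
        if i < k < j < l:
            crossing.add(frozenset((a, b)))
    return crossing


def interaction_distance(loops_a, loops_b):
    """Size of the symmetric difference of the two intersection sets."""
    return len(intersection_set(loops_a) ^ intersection_set(loops_b))


s1 = {"h1": (1, 10), "h2": (5, 15), "h3": (20, 30)}
s2 = {"h1": (1, 10), "h2": (12, 18), "h3": (16, 30)}
print(interaction_distance(s1, s2))
```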

Paper Nr: 28
Title:

Protein Disorder Prediction using Jumping Motifs from Torsion Angles Dynamics in Ramachandran Plots

Authors:

Jonny Alexander-Uribe, Julián D. Arias-Londoño and Alexandre Perera-Lluna

Abstract: Disordered proteins are functional proteins that do not fold into a fixed 3D structure. The order/disorder prediction in protein sequences is an important task given the biological roles of disordered proteins. In the last decade many computational methods have been proposed for disorder identification, but currently the most accurate strategies depend on the sequence alignment of large databases of proteins, making the methods slow and hard to apply to proteome-wide analysis. This paper proposes an innovative approach for linking the amino acid sequences with transition tendencies in their dihedral torsion angles. The aim is to characterize the dynamical angle variations along the protein chain, as a way of measuring the flexibility of the amino acids and its connection with the disorder state. The features are estimated from empirical propensities computed from Ramachandran Plots. The classification is performed using structural learning in the form of CRF (Conditional Random Fields). The performance is evaluated in terms of AUC (Area Under the ROC Curve), and three suitable performance metrics for unbalanced classification problems. The results show that the proposed method outperforms the most referenced alignment-free predictors and its performance is also competitive with the slower and mature alignment-based methods.

Paper Nr: 34
Title:

Computer-aided Formal Proofs about Dendritic Integration within a Neuron

Authors:

Ophélie Guinaudeau, Gilles Bernot, Alexandre Muzy, Daniel Gaffé and Franck Grammont

Abstract: The contribution of this article is threefold: (i) we define the first formal framework able to model dendritic integration within biological neurons, (ii) we show how we can turn continuous time into discrete time consistently and (iii) we show how a Lustre model checker can automatically perform proofs about neuron input/output behaviours owing to our framework. Our innovative formal framework is a carefully defined trade-off between abstraction and biological relevance in order to facilitate proofs. This framework is hybrid: inputs entering the synapses as well as the soma output are discrete signals made of spikes but, inside the dendrites, we combine signals quantitatively using real numbers. The soma potential is inevitably specified as a differential equation to keep a biologically accurate modelling of signal accumulation. This prevents simple formal proofs from being performed, which has been our motivation to discretize time. Owing to this discretization, we are able to encode our neuron models in Lustre. Lustre is a particularly well suited flow-based language for our purpose. We also encode in Lustre a property of input/output equivalence between neurons in such a way that the model checker Kind2 is able to automatically handle the proof.

Paper Nr: 41
Title:

Bicluster Detection by Hyperplane Projection and Evolutionary Optimization

Authors:

Maryam Golchin and Alan Wee-Chung Liew

Abstract: Biclustering is a powerful unsupervised learning technique that has different applications in many fields especially in gene expression analysis. This technique tries to group rows and columns in a dataset simultaneously, which is an NP-hard problem. In this paper, a multi-objective evolutionary algorithm is proposed with a heuristic search to solve the biclustering problem. To do so, rows are projected into the column space. Projection decreases the computational cost of geometric biclustering. The heuristic search is done by sample Pearson correlation coefficient over the rows and columns of a dataset to prune unwanted rows and columns. The experimental results on both synthetic and real datasets show the effectiveness of our proposed method.
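The pruning step based on the sample Pearson correlation coefficient can be pictured with the small sketch below. The abstract does not specify the pruning rule, so the seed row, the `min_abs_r` threshold, and the choice to keep anti-correlated rows (absolute value) are hypothetical; the same function could be applied to columns by transposing the matrix.

```python
from math import sqrt


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant vector has no defined correlation; treat as 0
    return cov / sqrt(vx * vy)


def prune_rows(matrix, seed_row, min_abs_r=0.9):
    """Keep the indices of rows strongly correlated with the seed row
    (illustrative rule, not the authors' exact heuristic)."""
    ref = matrix[seed_row]
    return [i for i, row in enumerate(matrix)
            if abs(pearson(ref, row)) >= min_abs_r]


data = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1], [1, 5, 2, 4]]
print(prune_rows(data, seed_row=0))
```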

Paper Nr: 42
Title:

Supervised Classification of Metatranscriptomic Reads Reveals the Existence of Light-dark Oscillations During Infection of Phytoplankton by Viruses

Authors:

Enzo Acerbi, Caroline Chenard, Stephan C. Schuster and Federico M. Lauro

Abstract: In the era of next generation sequencing technologies microbial species identification is typically performed using sequence similarity and sequence phylogeny based approaches. Particularly challenging is the discrimination of closely related sequences such as auxiliary metabolic genes (AMGs) in cyanobacteria and their viruses (cyanophages). Here we developed a method which combines Support Vector Machine based classification of AMGs short fragments and Empirical Mode Decomposition of periodic features in time-series. We applied this method to investigate the transcriptional dynamics of viral infection in the ocean, using data extracted from a previously published metatranscriptome profile of a naturally occurring oceanic bacterial assemblage sampled Lagrangially over 3 days. We discovered the existence of light-dark oscillations in the expression patterns of AMGs in cyanophages which follow the harmonic diel transcription of both oxygenic photoautotrophic and heterotrophic members of the community. These findings suggest that viral infection might provide the link between light-dark oscillations of microbial populations in the North Pacific Subtropical Gyre.

Short Papers
Paper Nr: 6
Title:

A Model-checking Approach to Reduce Spiking Neural Networks

Authors:

Elisabetta De Maria, Daniel Gaffé, Cédric Girard Riboulleau and Annie Ressouche

Abstract: In this paper we formalize Boolean Probabilistic Leaky Integrate and Fire Neural Networks as Discrete-Time Markov Chains using the language PRISM. In our models, the probability for neurons to emit spikes is driven by the difference between their membrane potential and their firing threshold. The potential value of each neuron is computed taking into account both the current input signals and the past potential value. Taking advantage of this modeling, we propose a novel algorithm which aims at reducing the number of neurons and synaptical connections of a given network. The reduction preserves the desired dynamical behavior of the network, which is formalized by means of temporal logic formulas and verified thanks to the PRISM model checker.

Paper Nr: 7
Title:

A Novel Computer Vision Methodology for Intelligent Molecular Modeling and Simulation

Authors:

Belal Medhat and Ahmed Shawish

Abstract: Molecular modeling and simulation tools are used to study the structure of the molecules for the purpose of understanding and creating a new generation of technology that works on the nano-scale. The current techniques mainly focus on visualizing the molecule’s structure using many illustrative methods, while they leave the knowledge extraction load on the user, who should be aware of many complex sciences. Developing a new innovative method in this perspective becomes crucial to support the fast development of such a vital field of science. This paper presents a novel computer vision method for molecular modeling and simulation that gives the computer the ability to see and understand the structure of molecules just like the human eyes, and also the ability to analyze its structure without human intervention. The proposed approach is based on using the computer’s memory as a digital representation of the real 3D-physical scaled model of the molecule, and hence accommodates machine learning techniques for an automated analysis job. Moreover, a parallel processing approach has been adopted to speed up the whole process. The realistic case study of a glucose molecule reports the outstanding performance of the proposed approach to model and analyze its structure without human intervention. The proposed methodology puts the development of an automated molecular expert system just one step away.

Paper Nr: 8
Title:

Grammar-based Compression for Directed and Undirected Generalized Series-parallel Graphs using Integer Linear Programming

Authors:

Morihiro Hayashida, Hitoshi Koyano and Tatsuya Akutsu

Abstract: We address the problem of finding generation rules from biological data, especially data represented as directed and undirected generalized series-parallel graphs (GSPGs), which include trees, outerplanar graphs, and series-parallel graphs. In a previous study, grammars for edge-labeled rooted ordered and unordered trees, called SEOTG and SEUTG, respectively, were defined, and the extraction of generation rules from glycans and RNAs that can be represented by rooted tree structures was examined, where integer linear programming-based methods for finding the minimum SEOTG and SEUTG that produce only given trees were developed. In nature and organisms, however, there are various kinds of structures such as gene regulatory networks, metabolic pathways, and chemical structures that cannot be represented as rooted trees. In this study, we relax the limitation of structures to be compressed, and propose grammars representing edge-labeled directed and undirected GSPGs based on context-free grammars by extending SEOTG and SEUTG. In addition, we propose an integer linear programming-based method for finding the minimum GSPG grammar in order to analyze more complicated biological networks and structures.

Paper Nr: 19
Title:

Classification of Helitron’s Types in the C.elegans Genome based on Features Extracted from Wavelet Transform and SVM Methods

Authors:

Rabeb Touati, Imen Messaoudi, Afef ElloumiOueslati and Zied Lachiri

Abstract: Helitrons, a sub-class of the Transposable elements class 2, are considered an important DNA type. In fact, they contribute to the mechanisms of evolution. Until now, these elements have not been well studied with automatic tools. In fact, research on helitron recognition is based only on biological experiments. In this paper, we propose an automatic method for characterizing helitrons by global signature and classifying the helitron types in the C. elegans genome. For this goal, we used the Complex Morlet Wavelet Transform to generate helitron signatures (helitron scalogram presentations) and to extract the features of each category. Then, we used the SVM-classifier to classify these 10 helitron families. After testing different kernels and using the cross validation function, we present the best classification results given by the RBF-kernel with c=60, σ=0.0000000015625 and OAO approach.

Paper Nr: 21
Title:

Study on the Fidelity of Biodevice T7 DNA Polymerase

Authors:

Ming Li, Zhong-Can Ou-Yang and Yao-Gen Shu

Abstract: We propose a comprehensive kinetic model of steady-state copolymerization and obtain an analytical solution for the high replication fidelity of the biodevice DNA polymerase. Our analytical calculations definitively show that the neighbor effects are the key factor of the overall fidelity. These analytical results were further demonstrated by T7 DNAp whose fidelity (10^6) is well described by the 1st-order neighbor effect.

Paper Nr: 22
Title:

A New Dimension of Breast Cancer Epigenetics - Applications of Variational Autoencoders with DNA Methylation

Authors:

Alexander J. Titus, Carly A. Bobak and Brock C. Christensen

Abstract: In the era of precision medicine and cancer genomics, data are being generated so quickly that it is difficult to fully appreciate the extent of what is discoverable. DNA methylation, a chemical modification to DNA, has been shown to be a significant factor in many cancers and is a candidate data source with ample features for model training. However, the black-box nature of non-linear models, such as those in deep learning, and a lack of accurately labeled ground truth data have limited the same rapid adoption in this space that other methods have experienced. In this article, we discuss the applications of unsupervised learning through the use of variational autoencoders using DNA methylation data and motivate further work with initial results using breast cancer data provided by The Cancer Genome Atlas. We show that a logistic regression classifier trained on the learned latent methylome accurately classifies disease subtype.

Paper Nr: 24
Title:

Robust K-Mer Partitioning for Parallel Counting

Authors:

Kemal Efe

Abstract: Due to the sheer size of the input data, k-mer counting is a memory-intensive task. Existing methods to parallelize k-mer counting cannot guarantee equal block sizes. Consequently, when the largest block is too large for a processor’s local memory, the entire computation fails. This paper shows how to partition the input into approximately equal-sized blocks each of which can be processed independently. Initially, we consider how to map k-mers into a number of independent blocks such that block sizes follow a truncated normal distribution. Then, we show how to modify the mapping function to obtain an approximately uniform distribution. To prove the claimed statistical properties of block sizes, we refer to the central limit theorem, along with certain properties of Pascal’s quadrinomial triangle. This analysis yields a tight upper bound on block sizes, which can be controlled by changing certain parameters of the mapping function. Since the running time of the resulting algorithm is O(1) per k-mer, partitioning can be performed efficiently while reading the input data from the storage medium.
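The abstract does not state the mapping function. One mapping that would be consistent with its mention of the central limit theorem and Pascal's quadrinomial triangle, and is assumed here purely for illustration, assigns each k-mer to the block indexed by the sum of its nucleotide codes (A=0, C=1, G=2, T=3). Under that assumption the number of distinct k-mers per block is a quadrinomial coefficient, which is bell-shaped for larger k, and the block index can be updated in O(1) per k-mer with a sliding window. The names `CODE`, `block_of`, `block_sizes` and `rolling_blocks` are hypothetical.

```python
from collections import Counter
from itertools import product

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed nucleotide encoding


def block_of(kmer):
    """Assumed mapping: block index = sum of the nucleotide codes."""
    return sum(CODE[base] for base in kmer)


def block_sizes(k):
    """Number of distinct k-mers mapped to each block 0..3k (brute force)."""
    counts = Counter(block_of(p) for p in product("ACGT", repeat=k))
    return [counts[s] for s in range(3 * k + 1)]


def rolling_blocks(seq, k):
    """Block index of every k-mer in seq, updated in O(1) per k-mer."""
    if len(seq) < k:
        return []
    current = block_of(seq[:k])
    blocks = [current]
    for i in range(k, len(seq)):
        # Slide the window: add the incoming base, drop the outgoing one.
        current += CODE[seq[i]] - CODE[seq[i - k]]
        blocks.append(current)
    return blocks


print(block_sizes(3))
print(rolling_blocks("ACGTA", 3))
```

This sketch covers only the first, bell-shaped mapping; the modification that yields an approximately uniform distribution is not reproduced here.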

Paper Nr: 29
Title:

Systems Biology Analysis and Literature Data Mining for Unmasking Pathogenic Neurogenomic Variations in Clinical Molecular Diagnosis

Authors:

Ivan Y. Iourov, Svetlana G. Vorsanova and Yuri B. Yurov

Abstract: Biotechnological advances in genomics have significantly impacted molecular diagnosis. As a result, uncovering individual genomic variations has made whole-genome analysis attractive for clinical care of patients suffering from brain diseases. However, to obtain clinically relevant genomic data for successful molecular genetic/genomic diagnosis, interpretation technologies are recognized to be indispensable. Taking into account the predictive power of bioinformatics in basic genetic studies, it has been proposed to use in silico systems biology analysis and data mining for detecting clinically relevant genomic variations by diagnostic healthcare services. Here, we describe an algorithm used as an integral part of molecular diagnosis of clinically relevant genomic pathology (neurogenomic variations) in brain diseases. The bioinformatic technique allows interpreting variations at chromosome and gene levels through systems biology analysis including literature data mining, which makes it possible to modulate the effect of each genomic change at transcriptome, proteome and metabolome levels. Studying neurogenomic variations using this approach, we were able to show that the algorithm can be used as a valuable add-on to whole genome analysis for diagnostic purposes inasmuch as it appreciably increases the efficiency of molecular diagnosis.

Paper Nr: 31
Title:

The Complex Biological Immune System through the Eyes of Dual Phase Evolution

Authors:

Snehal B. Shinde and Manish P. Kurhekar

Abstract: Dual phase evolution (DPE) is the process that brings about self-organization and emergence in complex adaptive systems (CASs) like the immune system. Self-organization is the property of a CAS organizing itself, whereas emergence is caused by composite cellular interactions among the system components. These properties are observed due to phase transitions in the system. The immune system is a complex biological system with an inbuilt defense mechanism for human beings and other vertebrate animals. It provides an intricate cellular response against foreign disturbances. There is an immediate need to understand the cellular dynamics, emergence, and self-organization during immune response. In this paper, we describe how DPE can be used to understand the immune system and its response against foreign disturbances. DPE proves helpful for analyzing phase transitions of the immune system between its state before and after the disturbances. Before the disturbances, the state of the immune system is called a circulation state, where immune cells are circulating throughout the body. After the disturbances, the state is referred to as a growth state, where rapid expansion in the number of immune cells is observed. DPE allows the immune system to rest in one of the phases as a local state of circulation phase (poorly connected phase) or a global state of growth phase (well connected phase), although it is predominantly the local phase. This model allows the integration of immune system, network theory, and DPE to observe emergence and self-organization of the immune system during immune response.

Paper Nr: 33
Title:

Environmental Metagenome Classification for Soil-based Forensic Analysis

Authors:

Jolanta Kawulok and Michal Kawulok

Abstract: Metagenome analysis makes it possible to extract essential information on the organisms that have left their traces in a given environmental sample. In some cases, it is sufficient to determine the origin of an environmental sample, rather than being able to accurately identify the organisms living there (which may be a challenging task). For example, in forensic soil analysis, it could be possible to confirm or exclude that a defendant was present in a certain place by comparing a soil sample acquired from his belongings against the samples derived from a variety of places (including the suspected place). In this paper, we present a method to identify the environmental origins of metagenomic reads by comparing them with entire metagenomic collections derived from reference samples. For this purpose, we exploit our CoMeta program, which allows for fast classification of metagenome samples, and we apply it to classify the extracted soil metagenomes to various collections of soil samples. The experimental results reported in this paper indicate that the proposed approach is effective, which allows us to outline the future research pathways to extend and improve our method.

Paper Nr: 39
Title:

An in Silico Approach for Understanding the Complex Intercellular Interaction Patterns in Cancer Cells

Authors:

Maura Cárdenas-García and Pedro Pablo González Pérez

Abstract: Intercellular interaction allows cancer cells to preserve their malignance and, through cell junctions, to induce malignance in neighbouring cells and receive nutrients from them. The Wnt (wingless-related integration site) signalling pathway plays an important role in the formation of intercellular communications. In this work, we explore the complex interaction patterns of intercellular communication in cancer cells using an in silico modelling and simulation methodology developed by us. The proposed cellular signalling model, characterized by a multicompartmental nature, provides symbolic abstractions and accurate algorithms to model both intracellular and intercellular behaviours. In particular, in this work, we propose an in silico model and simulation of the formation of different communication channels, involving the Wnt signalling pathway. The final purpose of this study is to propose target molecules whose inhibition breaks the communication between a cancer cell and surrounding normal cells. In this way, it is not necessary to carry out long series of different in vitro experiments, but only a few, because the focus should be only on the key molecules, which saves time and money. We observed, using in silico experiments, how the inhibition of the Wnt signalling pathway prevents the cells surrounding a cancerous cell from being transformed.

Paper Nr: 40
Title:

ASAR Database: An R Tool for Visual Analysis and Storage of Metagenomes

Authors:

Askarbek Orakov, Nazgul Sakenova, Igor Goryanin and Anatoly Sorokin

Abstract: Functional and taxonomic analysis is a critical step in understanding the interspecies interaction within microbial communities. Currently, these types of analysis are run independently, which makes interpretation of the results hard and error-prone. Here we present ASAR (Advanced metagenomic Sequence Analysis in R) Database, an interactive tool and databases for storage and exploratory analysis of metagenomic sequencing data along three dimensions: taxonomy, function, and environmental conditions.

Posters
Paper Nr: 4
Title:

Selective Covariance-based Human Localization, Classification and Tracking in Video Streams from Multiple Cameras

Authors:

A. R. Taranyan, V. V. Devyatkov and A. N. Alfimtsev

Abstract: In this paper a novel selective covariance-based method for human localization, classification and tracking in video streams from multiple cameras is proposed. Such methods are crucial for security and surveillance systems, smart environments and robots. The method is called selective covariance-based because before classifying the object using covariance descriptors (in this case the classes are the different people being tracked) we extract (select) specific regions, which are definitive for the class of objects we deal with (people). In our case, the region being extracted is the human head and shoulders. In the paper new feature functions for covariance region descriptors are developed and compared to basic feature functions, and a mask, filtering out most of the background information from the region of interest, is proposed and evaluated. The use of the proposed feature functions and mask significantly improved the human classification performance (from 75% when using basic feature functions to 94.6% accuracy with the proposed method) while keeping computational complexity moderate.

Paper Nr: 10
Title:

Species Categorization via MicroRNAs - Based on 3’UTR Target Sites using Sequence Features

Authors:

Malik Yousef, Dalit Levy and Jens Allmer

Abstract: Proteins define phenotypes and their dysregulation leads to diseases. Post-transcriptional regulation of protein abundance can be achieved by microRNAs (miRNAs). Therefore, studying this method of gene regulation is of high importance. MicroRNAs interact with their target messenger RNA via hybridization within a specialized molecular framework. Many miRNAs and their targets have been identified and they are listed in various databases like miRTarBase. The experimental identification of functional miRNA-mRNA pairs is difficult and, therefore, they are detected computationally, which is complicated due to missing negative data. Machine learning has been used for miRNA and target detection and many features have been described for miRNAs and miRNA:mRNA target duplexes generally on a per species basis. However, many claims of cross-kingdom regulation via miRNAs have been made and, therefore, we were interested in whether it is possible to differentiate among species based on the target sequence in the mRNA alone. Thus, we investigated whether miRNA target sites within the 3’UTR can be differentiated between species based on k-mer features only. Target information of one species was used as positive examples and the others as negative ones to establish machine learning models. It was observed that few features were sufficient for successful categorization of microRNA targets to species. For example, mouse versus Caenorhabditis elegans reached up to 97% average accuracy over 100-fold cross-validation. The simplicity of the approach, based on just k-mers, is promising for automatic categorization systems. In the future, this approach will help scrutinize alleged cross-kingdom regulation via miRNAs in respect to miRNA from one species targeting mRNAs in another.
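To make the notion of k-mer features concrete, the sketch below turns a target-site sequence into a fixed-length vector of normalized k-mer frequencies. The function name, the `ACGU` alphabet, the default `k=2` and the frequency normalization are assumptions for illustration; the abstract does not state the exact feature definition or the classifier used.

```python
from itertools import product


def kmer_features(seq, k=2, alphabet="ACGU"):
    """Normalized k-mer frequency vector, ordered lexicographically by k-mer.

    Windows containing characters outside the alphabet are skipped.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    # Avoid division by zero for sequences shorter than k.
    return [counts[w] / total if total else 0.0 for w in kmers]


features = kmer_features("AAAA", k=2)
print(len(features), features[0])
```

Vectors of this kind from one species would serve as positive examples and those from the other species as negative examples for any standard classifier.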

Paper Nr: 11
Title:

Apoptotic Regulatory Module as Switched Control System - Analysis of Asymptotic Properties

Authors:

Magdalena Ochab, Andrzej Swierniak, Jerzy Klamka and Krzysztof Puszyński

Abstract: Switching control systems are attracting increasing interest due to their capability to exhibit simultaneously several kinds of dynamic behaviour in different parts of the system. Such hybrid systems can be applied in many different fields. We present the application of switched control systems in modelling a biological system, specifically a p53-dependent apoptotic intercellular pathway. Biological experiments show that cells exhibit a variety of different behaviours for the same external stimuli. Differences in cell responses lead the population to split into fractions. We present the analysis of asymptotic properties of the apoptotic regulatory module with respect to a parameter which describes the effect of an external stress. Results show that the system can exhibit two types of behaviour: stabilization or oscillation near the equilibrium point.

Paper Nr: 26
Title:

The Importance of Changes Observed in the Alternative Genetic Codes

Authors:

Paweł Błażej, Małgorzata Wnetrzak and Paweł Mackiewicz

Abstract: The standard genetic code is a way of transmitting genetic information from DNA into the protein world. The code is universal for almost all living organisms on Earth but small deviations have been observed for many cellular organelles and some specific groups of microorganisms with highly reduced genomes. Such modifications are called alternative genetic codes. There is no consensus about the factors that caused or allowed these changes. A popular concept assumes that the codes evolved under neutral evolution without adaptive constraints. In this paper we present findings that argue against such a view. We examined the level of error minimization in amino acid replacements generated by the standard genetic code and its alternatives. We found that only 3 out of 23 tested alternative codes have worse quality than the standard genetic code. In agreement with that, many single codon reassignments observed in the variants of the standard genetic code are generally responsible for improving the quality of the codes under the studied criteria. These results indicate that the codon reassignments observed in the existing alternative genetic codes could play an adaptive role in their evolution to minimize translational and mutational errors. The study can help in designing alternative genetic codes for artificially modified organisms in the framework of synthetic biology.

Paper Nr: 32
Title:

geneEx - A Novel Tool to Assess Differential Expression from Gene and Exon Sequencing Data

Authors:

Orazio M Scicolone, Giulia Paciello and Elisa Ficarra

Abstract: The widespread adoption of Next Generation Sequencing technologies has in recent years made it possible to evaluate gene expression with great accuracy. Moreover, it has allowed differential gene expression among biological conditions to be assessed with high sensitivity. However, state-of-the-art bioinformatics methodologies for differential gene expression evaluation from RNA Sequencing data still suffer from several drawbacks such as reduced specificity. In this paper we propose geneEx, a novel methodology and tool for differential gene expression evaluation from RNA Sequencing reads. By combining gene and exon expression evaluation and BioMart information, geneEx provides users with annotated lists of highly reliable differentially expressed genes. The results obtained on the Sequencing Quality Control dataset demonstrate the importance of a novel approach to lower False Positive predictions from current methodologies and the strength of the proposed methodological approach to increase the sensitivity of differentially expressed gene identification.