IPSJ Digital Courier

Both bibliographic data and PDF files can be accessed free until December 2008. Copyright reserved.

Table of Contents

Message from the IPSJ president

Editorial board for 2008

[ Volume 1 BackNumber ][ Volume 2 BackNumber ]
[ Volume 3 BackNumber ][ Volume 4 BackNumber ]

Volume 4 (2008, Latest release 2008-3-26)

268-280	Regular Papers Maintaining Multiple Populations with Different Diversities for Evolutionary Optimization Based on Probability Models Takayuki Higo, Keiki Takadama This paper proposes a novel method, Hierarchical Importance Sampling (HIS) that can be used instead of population convergence in evolutionary optimization based on probability models (EOPM)such as estimation of distribution algorithms and cross entropy methods. In HIS, multiple populations are maintained simultaneously such that they have different diversities, and the probability model of one population is built through importance sampling by mixing with the other populations. This mechanism can allow populations to escape from local optima. Experimental comparisons reveal that HIS outperforms general EOPM.

257-267	Papers Fast Projective Reconstruction: Toward Ultimate Efficiency Hanno Ackermann, Kenichi Kanatani We accelerate the time-consuming iterations for projective reconstruction, a key component of self-calibration for computing 3-D shapes from feature point tracking over a video sequence. We first summarize the algorithms of the primal and dual methods for projective reconstruction. Then, we replace the eigenvalue computation in each step by the power method. We also accelerate the power method itself. Furthermore, we introduce the SOR method for accelerating the subspace fitting involved in the iterations. Using simulated and real video images, we demonstrate that the computation sometimes becomes several thousand times faster.

250-256	Original Papers A Distributed-Processing System for Accelerating Biological Research Using Data-Staging Yoshiyuki Kido, Shigeto Seno, Susumu Date, Yoichi Takenaka, Hideo Matsuda The number of biological databases has been increasing rapidly as a result of progress in biotechnology. As the amount and heterogeneity of biological data increase, it becomes more difficult to manage the data in a few centralized databases. Moreover, the number of sites storing these databases is getting larger, and the geographic distribution of these databases has become wider. In addition, biological research tends to require a large amount of computational resources, i.e., a large number of computing nodes. As such, the computational demand has been increasing with the rapid progress of biological research. Thus, the development of methods that enable computing nodes to use such widely-distributed database sites effectively is desired. In this paper, we propose a method for providing data from the database sites to computing nodes. Since it is difficult to decide which program runs on a node and which data are requested as their inputs in advance, we have introduced the notion of “data-staging” in the proposed method. Data-staging dynamically searches for the input data from the database sites and transfers the input data to the node where the program runs. We have developed a prototype system with data-staging using grid middleware. The effectiveness of the prototype system is demonstrated by measurement of the execution time of similarity search of several-hundred gene sequences against 527 prokaryotic genome data.

238-249	Original Papers A Combination Method of the Tanimoto Coefficient and Proximity Measure of Random Forest for Compound Activity Prediction Gen Kawamura, Shigeto Seno, Yoichi Takenaka, Hideo Matsuda Chemical and biological activities of compounds provide valuable information for discovering new drugs. The compound fingerprint that is represented by structural information of the activities is used for candidates for investigating similarity. However, there are several problems with predicting accuracy from the requirement in the compound structural similarity. Although the amount of compound data is growing rapidly, the number of well-annotated compounds, e.g., those in the MDL Drug Data Report (MDDR)database, has not increased quickly. Since the compounds that are known to have some activities of a biological class of the target are rare in the drug discovery process, the accuracy of the prediction should be increased as the activity decreases or the false positive rate should be maintained in databases that have a large number of un-annotated compounds and a small number of annotated compounds of the biological activity. In this paper, we propose a new similarity scoring method composed of a combination of the Tanimoto coefficient and the proximity measure of random forest. The score contains two properties that are derived from unsupervised and supervised methods of partial dependence for compounds. Thus, the proposed method is expected to indicate compounds that have accurate activities. By evaluating the performance of the prediction compared with the two scores of the Tanimoto coefficient and the proximity measure, we demonstrate that the prediction result of the proposed scoring method is better than those of the two methods by using the Linear Discriminant Analysis (LDA) method. We estimate the prediction accuracy of compound datasets extracted from MDDR using the proposed method. It is also shown that the proposed method can identify active compounds in datasets including several un-annotated compounds.

228-237	Original Papers Reaction Structure Profile: A Comparative Analysis of Metabolic Pathways Based on Important Substructures Yuta Ashida, Tomonobu Ozaki, Takenao Ohkawa Comparative analysis of organisms with metabolic pathways gives important information about functions within organisms. In this paper, we propose a new method for comparing the metabolic pathways with reaction structures that include important enzymes. In this method, subgraphs from pathways that include `choke point' or `load point' are extracted as important “reaction structures, ” and a “reaction structure profile, ” which represents whether extracted reaction structures are observed in the metabolic pathway of other organisms, is created. Distance regarding function within organisms between species is defined using the “reaction structure profile.”By applying the proposed method to the metabolic networks of 64 representative organisms selected from Archaea, Eubacteria and Eukaryote in the KEGG database, we succeed in reconstructing a phylogenetic tree, and confirm the effectiveness of the method.

217-227	Original Papers Prediction of Protein-Protein Interaction Sites Using Only Sequence Information and Using Both Sequence and Structural Information Masanori Kakuta, Shugo Nakamura, Kentaro Shimizu Protein-protein interactions play an important role in a number of biological activities. We developed two methods of predictingprotein-protein interaction site residues. One method uses only sequence information and the other method uses both sequence and structural information. We used support vector machine (SVM) with a position specific scoring matrix (PSSM) as sequence information and accessible surface area(ASA) of polar and non-polar atoms as structural information. SVM is used in two stages. In the first stage, an interaction residue is predicted by taking PSSMs of sequentially neighboring residues or taking PSSMs and ASAs of spatially neighboring residues as features. The second stage acts as a filter to refine the prediction results. The recall and precision of the predictor using both sequence and structural information are 73.6% and 50.5%, respectively. We found that using PSSM instead of frequency of amino acid appearance was the main factor of improvement of our methods.

207-216	Original Papers A Clustering Method for Analysis of Sequence Similarity Networks of Proteins Using Maximal Components of Graphs Morihiro Hayashida, Tatsuya Akutsu, Hiroshi Nagamochi This paper proposes a novel clustering method based on graph theory for analysis of biological networks. In this method, each biological network is treated as an undirected graph and edges are weighted based on similarities of nodes. Then, maximal components, which are defined based on edge connectivity, are computed and the nodes are partitioned into clusters by selecting disjoint maximal components. The proposed method was applied to clustering of protein sequences and was compared with conventional clustering methods. The obtained clusters were evaluated using P-values for GO(GeneOntology) terms. The average P-values for the proposed method were better than those for other methods.

193-206	Original Papers A Generalized Hidden Markov Model Approach to Transmembrane Region Prediction with Poisson Distribution as State Duration Probabilities Takashi Kaburagi, Takashi Matsumoto We present a novel algorithm to predict transmembrane regions from a primary amino acid sequence. Previous studies have shown that the Hidden Markov Model (HMM) is one of the powerful tools known to predict transmembrane regions; however, one of the conceptual drawbacks of the standard HMM is the fact that the state duration, i.e., the duration for which the hidden dynamics remains in a particular state follows the geometric distribution. Real data, however, does not always indicate such a geometric distribution. The proposed algorithm utilizes a Generalized Hidden Markov Model (GHMM), an extension of the HMM, to cope with this problem. In the GHMM, the state duration probability can be any discrete distribution, including a geometric distribution. The proposed algorithm employs a state duration probability based on a Poisson distribution. We consider the two-dimensional vector trajectory consisting of hydropathy index and charge associated with amino acids, instead of the 20 letter symbol sequences. Also a Monte Carlo method (Forward/Backward Sampling method) is adopted for the transmembrane region prediction step. Prediction accuracies using publicly available data sets show that the proposed algorithm yields reasonably good results when compared against some existing algorithms.

182-192	Regular Papers A Type System for Dynamic Delimited Continuations Takuo Yonezawa, Yukiyoshi Kameyama We study the control operators “control” and “prompt” which manage part of continuations, that is, delimited continuations. They are similar to the well-known control operators“shift” and “reset”, but differ in that the former is dynamic, while the latter is static. In this paper, we introduce a static type system for “control”and “prompt” which does not use recursive types. We design our type system based on the dynamic CPS transformation recently proposed by Biernacki, Danvy and Millikin. We also introduce let-polymorphism into our type system, and show that our type system satisfies several important properties such as strong type soundness.

167-181	Regular Papers Parallel Skeletons for Sparse Matrices in SkeTo Skeleton Library Yuki Karasawa, Hideya Iwasaki Skeletal parallel programming makes both parallel programs development and parallelization easier. The idea is to abstract generic and recurring patterns within parallel programs as skeletons and provide them as a library whose parallel implementations are transparent to the programmer. SkeTo is a parallel skeleton library that enables programmers to write parallel programs in C++ in a sequential style. However, SkeTo's matrix skeletons assume that a matrix is dense, so they are incapable of efficiently dealing with a sparse matrix, which has many zeros, because of duplicated computations and commutations of identical values. This problem is solved by re-formalizing the matrix data type to cope with sparse matrices and by implementing a new C++ class of SkeTo with efficient sparse matrix skeletons based on this new formalization. Experimental results show that the new skeletons for sparse matrices perform well compared to existing skeletons for dense matrices.

151-166	Numerical Computation Level-3 BLAS and LU Factorization on a Matrix Processor Ahmed S. Zekri, Stanislav G. Sedukhin As increasing clock frequency approaches its physical limits, a good approach to enhance performance is to increase parallelism by integrating more cores as coprocessors to general-purpose processors in order to handle the different workloads in scientific, engineering, and signal processing applications. In this paper, we propose a many-core matrix processor model consisting of a scalar unit augmented with b×b simple cores tightly connected in a 2D torus matrix unit to accelerate matrix-based kernels. Data load/store is overlapped with computing using a decoupled data access unit that moves b×b blocks of data between memory and the two scalar and matrix processing units. The operation of the matrix unit is mainly processing fine-grained b×b matrix multiply-add (MMA) operations. We formulate the data alignment operations including matrix transposition and skewing as MMA operations in order to overlap them with data load/store. Two fundamental linear algebra algorithms are designed and analytically evaluated on the proposed matrix processor: the Level-3 BLAS kernel, GEMM, and the LU factorization with partial pivoting, the main step in solving linear systems of equations. For the GEMM kernel, the maximum speed of computing measured in FLOPs/cycle is approached for different matrix sizes, n, and block sizes, b. The speed of the LU factorization for relatively large values of n ranges from around 50-90% of the maximum speed depending on the model parameters. Overall, the analytical results show the merits of using the matrix unit for accelerating the matrix-based applications.

[ Volume 1 BackNumber ][ Volume 2 BackNumber ]
[ Volume 3 BackNumber ][ Volume 4 BackNumber ]