Using pre and post-processing methods to improve binding site predictions

Sun, Yi; Castellano, C.G.; Robinson, M.; Adams, R.G.; Rust, A.G.; Davey, N.

Sun, Yi, Castellano, C.G., Robinson, M., Adams, R.G., Rust, A.G. and Davey, N. (2009) Using pre and post-processing methods to improve binding site predictions. Pattern Recognition, 42 (9). pp. 1949-1958. ISSN 0031-3203

Even the best current algorithms for predicting transcription factor binding sites in sequences of regulatory DNA are severely limited in accuracy. In this paper, we integrate 12 original binding site prediction algorithms and use a 'window' of consecutive predictions so that each result is interpreted in the context of its neighbours. We combine under-sampling, by either random selection or Tomek links, with SMOTE over-sampling. In addition, we investigate the behaviour of four feature selection filtering methods: Bi-Normal Separation, Correlation Coefficients, F-Score and a cross-entropy-based algorithm. Finally, we remove some of the final predicted binding sites on the basis of their biological plausibility. The results show that we can generate a new prediction that significantly improves on the performance of any one of the individual algorithms.
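
The windowing step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `window_features`, the `half_width` parameter and the zero padding at the sequence ends are all assumptions made for illustration, since the abstract does not give these details. Each sequence position carries the outputs of the base predictors, and the window concatenates them with those of the neighbouring positions to form one feature vector.

```python
from typing import List, Sequence


def window_features(preds: Sequence[Sequence[float]],
                    half_width: int) -> List[List[float]]:
    """Concatenate each position's predictions with those of its neighbours.

    preds[i] holds the outputs of all base predictors at sequence position i.
    Positions that fall outside the sequence are padded with zeros
    (an assumption; the padding choice is hypothetical).
    """
    n = len(preds)
    k = len(preds[0]) if n else 0
    pad = [0.0] * k
    rows: List[List[float]] = []
    for i in range(n):
        row: List[float] = []
        # Walk the window from the left neighbour to the right neighbour.
        for j in range(i - half_width, i + half_width + 1):
            row.extend(preds[j] if 0 <= j < n else pad)
        rows.append(row)
    return rows


# Toy example: 3 positions, 2 base predictors (the paper uses 12).
preds = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rows = window_features(preds, half_width=1)
print(rows[1])  # [1.0, 0.0, 0.0, 1.0, 1.0, 1.0]
```

With a window of `2 * half_width + 1` positions and `k` base predictors, each row has `(2 * half_width + 1) * k` features. These windowed vectors would then be the input to the sampling and feature selection steps.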

Item Type	Article
Identification Number	10.1016/j.patcog.2009.01.027
Additional information	Original article can be found at: http://www.sciencedirect.com/science/journal/00313203 Copyright Elsevier Ltd.
Keywords	tomek link, support vector machines, transcription factors
Date Deposited	15 May 2025 11:35
Last Modified	08 Apr 2026 19:48

Full text not available from this repository.
