The predictive features are those that allow an accurate predictive model to be built, for example for disease diagnosis. I prove that finding predictive features is a tractable problem, in that consistent estimates can be computed in polynomial time.
This is a substantial improvement upon current theory. However, I also demonstrate that selecting features to optimize prediction accuracy does not control feature error rates.
As with the Markov boundary, this definition can also be expressed using a distribution divergence measure. Note that the above reasoning applies to any regularized empirical risk of the form 2. As future work, we would like to confirm our results on additional real datasets. Here p(y) denotes the marginal class probability. For real datasets, brain atlases are in general available for the sake of result interpretation.
This is a severe drawback in life science, where the selected features per se are important, for example as candidate drug targets. To address this problem, I propose a statistical method which to my knowledge is the first to achieve error control.
Moreover, I show that in high dimensions, feature sets can be impossible to replicate in independent experiments even with controlled error rates. This finding may explain the lack of agreement among genome-wide association studies and molecular signatures of disease. The most predictive features may not always be the most relevant ones from a biological perspective, since the predictive power of a given feature may depend on measurement noise rather than biological properties. I therefore consider a wider definition of relevance that avoids this problem. The resulting feature selection problem is shown to be asymptotically intractable in the general case; however, I derive a set of simplifying assumptions which admit an intuitive, consistent polynomial-time algorithm.
Moreover, I present a method that controls error rates also for this problem.
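To make the notion of controlling a feature error rate concrete, the sketch below applies the standard Benjamini-Hochberg step-up rule to a list of per-feature p-values. This is an illustrative stand-in, not the method proposed in the thesis: the function name `benjamini_hochberg`, the example p-values, and the choice of the false discovery rate as the error measure are all assumptions made here. The procedure is known to control the false discovery rate at level `alpha` when the p-values are independent.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], alpha: float = 0.05) -> List[int]:
    """Return indices of features selected by the Benjamini-Hochberg step-up rule.

    Hypothetical helper for illustration; one p-value per candidate feature.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Sort feature indices by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= alpha * k / m.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff = rank
    # Step-up: select every feature ranked at or below the cutoff.
    return sorted(order[:cutoff])


if __name__ == "__main__":
    # Made-up p-values for five candidate features.
    example = [0.001, 0.8, 0.012, 0.3, 0.04]
    print(benjamini_hochberg(example, alpha=0.05))
```

Note that the rule selects on the strength of statistical evidence per feature, not on the accuracy of a downstream predictor, which is the distinction the abstract draws between optimizing prediction and controlling feature errors.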
Roland Nilsson and others published Statistical Feature Selection With Applications in Life Science. Linköping Studies in Science and Technology. Dissertations No.