Why do Trees work better than DNNs on genome data?

This topic occurred to me following my recent talk at a dental conference at Charité Berlin. Upon hearing that I have a strong interest in inference, my fellow keynote speaker mentioned that it drives him crazy that random forests and similar algorithms work so much better than DNNs on genomic data. He challenged me to come up with a reason why this is the case.

I think that I know why. My problem is that I suspect I can never prove it. The difficulty of proving things in machine learning is probably an equally interesting topic for a future article, but here I want to address my theory of why random forests work better than DNNs for analysing genome data.


Mathematics and Biology III – Bioinformatics

When I sat down in summer 2018 to begin my blog, one of my goals was to write approximately five definitive articles about Mathematics and Biology. So far, I have been pretty hard on the efforts in both fields to come together. I began with a review of the very different world-views inherent in the two subjects, combined with a call to arms for like-minded people to come and help out. I followed this with a more practical consideration of the repertoire of techniques necessary and the career constraints that actively work against combining these two disciplines. Today I want to consider the shining example of bioinformatics: the one area in which mathematics is clearly being used in biology and which demonstrates a clear career path.
