I am asked quite often how I see Data Science in the biomedical industry. I have, of course, many answers, each of which is context-dependent. However, one theme I find frequently recurring is a sort of straw-man debate that seems inherently to attract technical practitioners.
The debate is usually structured as follows:

Q: How do you see the validation of medical AI products working in practice?
A: Clinical trials, test-validation sets, blah, blah.
Q: But doesn't this lead to enormous overheads?
A: Yes, but there are shortcuts.
Q: But if you take these shortcuts, don't you run the risk of running into costly failures when you finally run the clinical trials?

It goes on….
Working in industrial research is usually very motivating, but occasionally it is also frustrating. You've just done something really cool, but you're not allowed to tell anybody outside the company about it. Indeed, in a small company there might not be anybody inside the company who can even appreciate it!
I have worked on roughly four really cool projects since leaving academia at the end of 2017. And apart from some brief mentions in my blog (e.g. here and here), most of what I have done has been known only to a few key stakeholders.
I had the opportunity to talk recently with a relatively advanced researcher in machine learning methods. The conversation turned briefly to the study of embeddings when he mentioned that most of his work involves things that can be embedded in Euclidean space. Since I've been spending a bit of time thinking about embeddings recently, I asked him some questions to get the official ML take on the subject. I was reasonably gratified to learn that, although most ML engineers don't think much about embeddings, the research on this topic considers the embedding to be tightly bound to the network architecture. It is not possible to study abstract embeddings, divorced from applications. I fully agree with this point of view.