Series
Reproducibility is the cornerstone of science
Somewhere between building a model and publishing the results, something gets lost. Not the results themselves; those make it into the paper. What gets lost is everything that would allow someone else to arrive at the same place.
This is the reproducibility problem in machine learning, and it is more widespread than most people admit.
In 2016, the journal Nature reported that around 70% of researchers had failed to reproduce another researcher’s results, and 50% had failed to reproduce their own. Machine learning is no exception. A study that analyzed 400 papers from top AI conferences found that only 6% shared code, roughly 33% shared test data, and 54% shared nothing more than a pseudocode summary of their algorithm. Not the environment. Not the hyperparameters. Not the exact version of the library that made it work. Just a rough sketch of the idea.
The art must have a purpose other than itself, or it collapses into infinite recursion.
I’ve spent a lot of time confused about my own wiring. Not in a debilitating way, more like an ongoing low-grade puzzle that I kept returning to. The confusion had a specific shape: I knew I was deeply analytical, the kind of person who could sit with an idea for hours without needing it to go anywhere, who would trace a concept back to its roots just for the satisfaction of understanding it fully. But I also noticed I was restless whenever nothing was being made. Not bored, exactly. More like something in me would protest and ask what all this thinking was actually for.
This is part of a series of notes I’m taking to understand the person I am when learning something. Most of them are about understanding myself and applying useful techniques.
Reading about neuroscience can save me time compared with experimenting blindly to see what works, and I plan to study neuroscience some day. For now, though, I want to share what I’ve learned on my own, having spent the first 19 years of my life learning how to learn. I never even signed up for the famous Coursera course on this topic. The road of learning was lonely and paved with moods of failure; it was a road built on experimentation.
I call this problem floating information. It is a literal translation of the Arabic word “طائفة”, which means floating in space without connection. I first noticed that there is information that is not linked to anything. it has no context, it solves no problem, and this kind of information is the hardest to keep in my mind. I forget it very easily.
One of the main incentives behind sharing my notes in public is not the urge to talk. As social creatures, we naturally love sharing experiences, stories, and ideas. But I buried that essential human feature for a long time. I used to post on Facebook, though not consistently, and then I went through what I called “Manulasis” (I came across the term in an article back in high school; I have forgotten the exact word and spelling, but never the definition): the tendency to give up explaining things to people. I essentially gave up sharing altogether. Day by day, I became detached from the external world without even noticing.
There’s an anecdote about Richard Feynman in which a historian walks up to his desk, sees the sheets of paper lying all over it, and remarks that they are a record of Feynman’s thinking. Feynman corrects him: these are not a record of my thinking; I think on paper. The historian presses on: surely you do the thinking in your head, and these are only records of those thoughts. That’s when Feynman says: no, they aren’t a record of my thinking process, they are my thinking process. I actually did the work on the paper.