Learning
If the first thing to accept is that you’re not managing time, you’re managing energy, the second is asking where the energy is actually going. because it doesn’t just disappear. it goes somewhere else. and most of the time it’s going to things you’d never guess were costing you anything.
I don’t like productivity tips. they just give you the schedule. and that’s like being told what speed the car should cruise at on the highway, without understanding the mechanics of how and why.
Almost everyone can build a machine learning model now. you can prompt your way into a working pipeline in five minutes. building a model is not the problem. optimizing it, and understanding what it is actually doing, are completely different skills, and that gap becomes very visible the moment you step into a competition.
I spent time participating in competitions on Kaggle and Zindi, joining teams, reading solutions, watching people work. and the pattern I kept seeing was the same: build a baseline, iterate, submit, repeat. that’s fine, that’s how it works, but without ever asking why the iteration was moving in a particular direction. no compass. the leaderboard becomes the only signal, and you start optimizing for a number without understanding what the number means. that’s not learning. that’s just another form of guessing.
Reproducibility is the cornerstone of science
Somewhere between building a model and publishing the results, something gets lost. not the results themselves, those make it into the paper. what gets lost is everything that would allow someone else to arrive at the same place.
This is the reproducibility problem in machine learning, and it is more widespread than most people admit.
In 2016, the journal Nature reported that around 70% of researchers had failed to reproduce another researcher’s results , and 50% had failed to reproduce their own. machine learning is no exception. a study that analyzed 400 papers from top AI conferences found that only 6% shared code, roughly 33% shared test data, and 54% shared nothing more than a pseudocode summary of their algorithm. not the environment. not the hyperparameters. not the exact version of the library that made it work. just a rough sketch of the idea.
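the mechanical part of the fix is small. a run is reproducible when the seed, the hyperparameters, and the environment are all written down alongside the results. here is a minimal sketch in python, using only the standard library. the function name and the manifest fields are my own, just an illustration of the idea, not any standard:

```python
# a minimal sketch of run provenance: fix the seed and record
# everything needed to rerun the experiment later.
# make_manifest and its fields are illustrative, not a standard.
import json
import platform
import random
import sys


def make_manifest(seed: int, hyperparams: dict) -> dict:
    """Seed the RNG and capture the context of this run."""
    random.seed(seed)  # in a real pipeline, also seed numpy / torch / etc.
    return {
        "seed": seed,
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "hyperparams": hyperparams,
    }


manifest = make_manifest(seed=42, hyperparams={"lr": 0.01, "epochs": 10})
print(json.dumps(manifest, indent=2))

# same seed, same draws -- which is the whole point
random.seed(manifest["seed"])
first = [random.random() for _ in range(3)]
random.seed(manifest["seed"])
assert first == [random.random() for _ in range(3)]
```

dump that manifest next to the model artifact and half the problem disappears: anyone who picks up the run later knows exactly what was set, instead of reconstructing it from a pseudocode summary.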