Generalist-Notes

On Entering Machine Learning Competitions

Almost everyone can build a machine learning model now. You can prompt your way into a working pipeline in a matter of five minutes. Building a model is not the problem; optimizing it, and understanding how and what it is doing, are completely different things, and that gap becomes very visible the moment you step into a competition.

I spent time participating in competitions on Kaggle and Zindi, joining teams, reading solutions, watching people work. The pattern I kept seeing was the same: build a baseline, iterate, submit, repeat. That’s fine, that’s how it works, but almost nobody asked why the iteration was moving in a particular direction. There was no compass. The leaderboard becomes the only signal, and you start optimizing for a number without understanding what the number means. That’s not learning. That’s just another form of guessing.
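A compass in practice usually means trusting a local validation score before trusting the leaderboard. A minimal sketch of that habit, assuming scikit-learn and a synthetic stand-in for competition data (the dataset and model here are illustrative, not from any specific competition):

```python
# Local validation as a compass: measure a change offline before submitting,
# so the leaderboard confirms what you already expect rather than guiding blindly.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a competition's tabular training set.
X, y = make_classification(n_samples=500, n_features=20, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42)

# 5-fold cross-validation gives a local estimate of generalization.
# If a change doesn't move this number, a leaderboard move is likely noise.
scores = cross_val_score(model, X, y, cv=5)
print(f"local CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

The point is not the model; it is that every iteration gets judged against a number you understand before it gets judged against one you don't.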

Action Influences Thought

Action Influences Thought and Not The Other Way Around.

Most self-help books sell you the same idea in different packaging: fix your mind first, then your life will follow. Cultivate the right thoughts. Visualize the outcome. Build the belief before you build anything else.

There is truth in that; thoughts do shape action, I agree. But there is a half of the equation that nobody talks about: the feedback running in the other direction. Action shapes thought. And it does so faster, and more permanently, than thinking ever could.

The Reproducibility Problem in Machine Learning

Reproducibility is the cornerstone of science.

Somewhere between building a model and publishing the results, something gets lost. Not the results themselves; those make it into the paper. What gets lost is everything that would allow someone else to arrive at the same place.

This is the reproducibility problem in machine learning, and it is more widespread than most people admit.

In 2016, the journal Nature reported that around 70% of researchers had failed to reproduce another researcher’s results, and 50% had failed to reproduce their own. Machine learning is no exception. A study that analyzed 400 papers from top AI conferences found that only 6% shared code, roughly 33% shared test data, and 54% shared nothing more than a pseudocode summary of their algorithm. Not the environment. Not the hyperparameters. Not the exact version of the library that made it work. Just a rough sketch of the idea.
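The missing pieces are usually small things that were never written down. A minimal sketch of recording them alongside a result, fixing the random seeds and logging the exact environment (the manifest fields here are my own assumption, not any standard format):

```python
# A minimal reproducibility record: pin the randomness and log the
# environment, so someone else can rebuild both the setup and the result.
import json
import platform
import random
import sys

import numpy as np

SEED = 42
random.seed(SEED)       # Python's own RNG
np.random.seed(SEED)    # NumPy's global RNG

# Record exactly the details that papers most often omit:
# interpreter version, platform, library versions, and the seed itself.
run_manifest = {
    "seed": SEED,
    "python": sys.version.split()[0],
    "platform": platform.platform(),
    "numpy": np.__version__,
}
print(json.dumps(run_manifest, indent=2))
```

Saving this manifest next to the metrics file costs a few lines; reconstructing it a year later, from memory, is usually impossible.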