
On Entering Machine Learning Competitions

Almost anyone can build a machine learning model now. You can prompt your way into a working pipeline in a matter of five minutes. Building a model is not the problem: optimizing it, and understanding how and what it is doing, are completely different things, and that gap becomes very visible the moment you step into a competition.

I spent time participating in competitions on Kaggle and Zindi, joining teams, reading solutions, watching people work. And the pattern I kept seeing was the same: build a baseline, iterate, submit, repeat. That's fine, that's how it works, but mostly without ever asking why the iteration was moving in a particular direction. No compass. The leaderboard becomes the only signal, and you start optimizing for a number without understanding what the number means. That's not learning. That's just another form of guessing.
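The compass I mean is a local validation score you trust before you burn a submission. A minimal sketch of that idea, using k-fold cross-validation with a deliberately trivial majority-class model (the function name and dataset are mine, purely for illustration, not from any particular competition):

```python
import random
import statistics

def cross_val_scores(labels, k=5, seed=0):
    """Estimate accuracy of a majority-class baseline with k-fold CV.

    A local score like this tells you whether a change actually helps,
    instead of letting the leaderboard be your only signal.
    """
    idx = list(range(len(labels)))
    random.Random(seed).shuffle(idx)          # deterministic shuffle
    folds = [idx[i::k] for i in range(k)]     # k roughly equal folds
    scores = []
    for i in range(k):
        test = set(folds[i])
        train_labels = [labels[j] for j in idx if j not in test]
        # "Model": always predict the most frequent training label.
        majority = max(set(train_labels), key=train_labels.count)
        correct = sum(labels[j] == majority for j in folds[i])
        scores.append(correct / len(folds[i]))
    return statistics.mean(scores), statistics.stdev(scores)

labels = [0] * 12 + [1] * 8
mean, std = cross_val_scores(labels)
print(f"baseline accuracy: {mean:.2f} +/- {std:.2f}")
```

The standard deviation matters as much as the mean: if your fold scores swing wildly, a small leaderboard improvement is probably noise, not progress.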

Career Advice, AI and Learning How to Code

I remember my friend Awab sharing a thoughtful reading recommendation from the LessWrong community titled You will be OK. It is both a clarification and a gentle reminder of different ways to think and act under the existential threat of AI, written as a response to concerns raised by a young community member in Turning 20 in the probable pre-apocalypse.

Computation Is Not Reality

Recently, I've been working on the reproducibility of machine learning research. In practice, this meant doing far more infrastructure work than actual machine learning modeling. I found myself digging into operating systems, containers, storage, and hardware details. And suddenly, I started questioning everything, not because the work wasn't interesting, but because I was drifting away from the kind of problems that originally made me fall in love with machine learning.

The Reproducibility Problem in Machine Learning

Reproducibility is the cornerstone of science.

Somewhere between building a model and publishing the results, something gets lost. Not the results themselves; those make it into the paper. What gets lost is everything that would allow someone else to arrive at the same place.

This is the reproducibility problem in machine learning, and it is more widespread than most people admit.

In 2016, the journal Nature reported that around 70% of researchers had failed to reproduce another researcher's results, and 50% had failed to reproduce their own. Machine learning is no exception. A study that analyzed 400 papers from top AI conferences found that only 6% shared code, roughly 33% shared test data, and 54% shared nothing more than a pseudocode summary of their algorithm. Not the environment. Not the hyperparameters. Not the exact version of the library that made it work. Just a rough sketch of the idea.
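The missing pieces are mundane to capture. A minimal sketch of recording the seed and environment details alongside a result, so someone else can arrive at the same place (the function names and manifest format here are my own invention, not from any specific paper or tool):

```python
import json
import platform
import random
import sys

def run_experiment(seed: int) -> dict:
    # Fix the seed so the "result" is deterministic across runs.
    rng = random.Random(seed)
    score = rng.random()  # stand-in for a real evaluation metric
    return {"seed": seed, "score": score}

def capture_environment() -> dict:
    # Record exactly what papers tend to omit: interpreter version,
    # OS, and (in a real project) pinned library versions.
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        # e.g. "numpy": numpy.__version__, "torch": torch.__version__
    }

result = run_experiment(seed=42)
manifest = {"environment": capture_environment(), "result": result}
print(json.dumps(manifest, indent=2))
```

Publishing this manifest next to the paper costs a few lines of code; reconstructing the same information from a pseudocode summary years later is often impossible.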

Flow State and Sleep

Lately, I've lost my ability to focus and maintain a flow state. I don't sleep late, I eat relatively healthy food, and I'm not dopamine addicted. The problem comes down to missing the practices that would align my efforts with what I want to achieve.