Show challenge overview

Datathon: https://www.kaggle.com/competitions/archivalDatathon/

Profiling: file:///home/ber2/archivalDatathon/training_profile.html

Exploration: look for the imbalance in the classes
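
A minimal sketch of that imbalance check, assuming the training data is loaded into a pandas DataFrame. The helper name `class_balance`, the column name `label`, and the toy data are hypothetical and stand in for the real competition files.

```python
import pandas as pd


def class_balance(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """Return the absolute count and relative share of each class in `target`."""
    counts = df[target].value_counts()
    shares = df[target].value_counts(normalize=True)
    # Both Series are indexed by class value, so they align row by row.
    return pd.DataFrame({"count": counts, "share": shares})


# Toy data for illustration only (not the competition data): 8 rows of "a", 2 of "b".
toy = pd.DataFrame({"label": ["a"] * 8 + ["b"] * 2})
balance = class_balance(toy, "label")
print(balance)
```

A strongly skewed `share` column suggests stratifying the validation split and being careful with plain accuracy as a metric.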

The point of versioning data and models is reproducibility
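
One lightweight way to illustrate the idea is to record a content hash of each data file next to the results it produced. This is only a sketch using the standard library, not the tooling used in the datathon; `file_fingerprint` and the file name `train.csv` are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def file_fingerprint(path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Illustration with a throwaway file: the fingerprint is stable for identical
# content and changes as soon as the data changes.
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "train.csv"  # hypothetical file name
    data_file.write_text("id,label\n1,a\n")
    first = file_fingerprint(data_file)
    same = file_fingerprint(data_file)
    data_file.write_text("id,label\n1,b\n")
    changed = file_fingerprint(data_file)

print(first == same, first == changed)
```

Storing the fingerprint alongside each submitted score makes it possible to tell later which version of the data a given result came from.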

Do not pay attention to good engineering practices: validation supersedes testing, and duplicating code is faster than sorting out Python import paths

Winning public score: 0.93262

Winning private score: 0.93111

4/7 people went above 0.90

2/7 people went above 0.93

Feature engineering typically has an impact one order of magnitude higher