
A quick thought on three parallels that I have recently observed. We keep pushing the boundaries of what is possible and what can be automated, but:
- Doing good data analysis relies on spending ~80% of the time on data cleaning tasks: notably, finding inconsistencies between raw data and business logic, and making intelligent decisions to work around them.
- The impact of feature engineering on training a Machine Learning model often far outweighs the impact of model selection. Having the insight to manually build a feature with the right predictive power can bump model accuracy by much more than hyperparameter tuning.
Both are still true in the AI era, by the way, although now the work is often done much faster by an AI agent. Besides that, is there a similar parallel in the age of AI engineering?
I can think of at least one:
- Building an ad hoc document preprocessing and ingestion pipeline suited to your needs has an impact on text retrieval which far outweighs the particular choice of vector index and implementation of approximate nearest neighbours.
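To make that last point a bit more concrete, here is a minimal sketch of what such a pipeline could look like, using only the Python standard library. Everything in it is an assumption for illustration: the `Chunk` type, the `clean` and `chunk` helpers, the "Page N of M" boilerplate rule and the `max_chars` budget are hypothetical, and a real pipeline would encode whatever quirks your own documents have.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str    # which document the text came from
    position: int  # order of the chunk within that document
    text: str


def clean(raw: str) -> str:
    """Drop page-marker lines and collapse spaces, keeping paragraph breaks."""
    kept = []
    for line in raw.splitlines():
        line = re.sub(r"[ \t]+", " ", line).strip()
        # Hypothetical boilerplate rule: lines such as "Page 3" or "Page 3 of 10".
        if re.fullmatch(r"Page \d+( of \d+)?", line):
            continue
        kept.append(line)
    # Collapse runs of blank lines into a single paragraph break.
    return re.sub(r"\n{2,}", "\n\n", "\n".join(kept)).strip()


def chunk(doc_id: str, text: str, max_chars: int = 200) -> list[Chunk]:
    """Greedily pack whole paragraphs into chunks, aiming for max_chars each.

    A single paragraph longer than max_chars is kept whole rather than split.
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(Chunk(doc_id, len(chunks), current))
            current = para
        else:
            current = candidate
    if current:
        chunks.append(Chunk(doc_id, len(chunks), current))
    return chunks


raw = (
    "Invoice policy\n\nPage 1 of 2\n\n"
    "Refunds   are issued within 30 days.\n\nPage 2 of 2\n\n"
    "Contact billing for disputes."
)
for c in chunk("policy.txt", clean(raw), max_chars=60):
    print(c.position, repr(c.text))
```

The chunks that come out of a step like this are what you would then embed and index. The idea is that page markers never reach the index and chunk boundaries follow paragraphs, which is the kind of decision that depends on your documents and not on the vector store.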
Picture with a bunch of documents in need of sorting by Wesley Tingey on Unsplash.