Working a bit in all areas of the data spectrum, but specialising in Data Science and ML Engineering
Wouldn't call myself a Python expert
Thanks to the PyDay organisers for putting everything together
This applies to pretty much any trending tech.
I've seen it happen with Big Data, AI, etc.
"LLM" is a very loose term: it depends on the size of the neural network and on how plausible the generated text is
The term "agent" comes from reinforcement learning (RL)
A simple starting picture
This can become as complicated as necessary: sequence of actions, loops...
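A minimal sketch of that starting picture in Python. `call_llm` and the tool names are placeholders I made up for illustration, not a real API; a real agent would call an LLM provider at that point.

```python
def call_llm(prompt: str) -> dict:
    """Stand-in for a call to your LLM provider: decide the next action."""
    return {"action": "finish", "answer": "done"}

TOOLS = {
    # Hypothetical tool the agent can call
    "search": lambda query: f"results for {query!r}",
}

def run_agent(task: str, max_steps: int = 5) -> str:
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        decision = call_llm("\n".join(history))
        if decision["action"] == "finish":
            return decision["answer"]
        # Run the chosen tool and feed the observation back to the model
        observation = TOOLS[decision["action"]](decision.get("input", ""))
        history.append(f"Observation: {observation}")
    return "Stopped: step limit reached"

print(run_agent("Find the authors of a paper"))
```

The "complicated" versions are just variations on this loop: longer action sequences, nested loops, multiple agents.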
**Why a good example?**
- Real-world PDFs with messy data
- Structured but inconsistent formatting
- Perfect for extraction tasks
Two-agent architecture
The Extractor is responsible for reading the whole paper and extracting information about authors and affiliations.
The Resolver is responsible for normalising affiliations into standardised names and raising issues.
Automatic author-affiliation resolution, by the way, is a real open problem
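A sketch of the two-agent pipeline. Names, prompts and the example data are illustrative; the real agents would call an LLM under the hood.

```python
def extractor(paper_text: str) -> list[dict]:
    """Reads the whole paper and returns raw author/affiliation pairs."""
    # In practice: prompt the LLM to list every author and affiliation string.
    return [{"author": "A. Author", "affiliation": "Dept. of CS, Univ. of Somewhere"}]

def resolver(raw_entries: list[dict]) -> list[dict]:
    """Normalises affiliation strings into standard names and flags issues."""
    # In practice: prompt the LLM to map each string to a canonical institution.
    resolved = []
    for entry in raw_entries:
        resolved.append({**entry,
                         "affiliation": "University of Somewhere",
                         "issues": []})
    return resolved

# The architecture is just the Extractor's output feeding the Resolver's input
authors = resolver(extractor("...full paper text..."))
```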
A large part of the interaction with LLMs revolves around validating their outputs.
Pydantic embraces modern Python by leveraging type hints.
There is a very nice pattern: model your inputs and outputs as Pydantic models, and building an API on top becomes almost automatic.
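A minimal sketch of that pattern, assuming Pydantic v2: the Extractor's output is modelled as a Pydantic model, so the raw LLM response gets validated before anything downstream touches it. The model and the example JSON are illustrative.

```python
from pydantic import BaseModel, ValidationError

class AuthorAffiliation(BaseModel):
    author: str
    affiliation: str

class ExtractionResult(BaseModel):
    entries: list[AuthorAffiliation]

# Pretend this JSON string came back from the LLM
llm_output = '{"entries": [{"author": "A. Author", "affiliation": "Univ. of Somewhere"}]}'

try:
    result = ExtractionResult.model_validate_json(llm_output)
except ValidationError as err:
    # Invalid output: retry the call, tighten the prompt, or surface the error
    print(err)
```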
Being model agnostic is important because of the speed at which the state of the art changes
Do not get married to a provider or a particular LLM
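One way to avoid getting married to a provider: hide it behind a tiny interface so swapping LLMs is a one-line change. The provider classes below are made-up stand-ins, not real SDK calls.

```python
from typing import Protocol

class ChatModel(Protocol):
    def complete(self, prompt: str) -> str: ...

class ProviderA:
    def complete(self, prompt: str) -> str:
        return "reply from provider A"   # would call provider A's SDK here

class ProviderB:
    def complete(self, prompt: str) -> str:
        return "reply from provider B"   # would call provider B's SDK here

def run_pipeline(model: ChatModel) -> str:
    return model.complete("Extract the authors and affiliations from this paper...")

run_pipeline(ProviderA())   # swap in ProviderB() without touching the pipeline
```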
See Enric's workshop earlier for LangGraph
Really useful tools, each with its own use case