Posts tagged with 'data engineering'
Conserving Memory while Streaming from DuckDB
In the weeks since my previous post on Working with Arrow and DuckDB in Rust, I've found a few gripes that I'd like to address. Memory usage of query_arrow and stream_arrow. In the previous post, I used the query_arrow API. It's pretty straightforward ...
How (and why) to work with Arrow and DuckDB in Rust
My day job involves wrangling a lot of data very fast. I've heard a lot of people raving about several technologies like DuckDB, (Geo)Parquet, and Apache Arrow recently. But despite being an "early adopter," it took me quite a while to figu ...
Quadrupling the Performance of a Data Pipeline
Over the past two weeks, I've been focused on optimizing some data pipelines. I inherited some old ones which seemed especially slow, and I finally hit a limit where an overhaul made sense. The pipelines process and generate data on the order of hund ...