Posts tagged with 'rust'
Optimizing Rust Builds with Target Flags
Recently I've been working with Apache DataFusion on some high-throughput data pipelines. One of the interesting things I noticed in the user guide was the suggestion to set RUSTFLAGS='-C target-cpu=native'. This is actually a pretty common ... read more →
Ownership Benefits Beyond Memory Safety
Rust's ownership system is well-known for the ways it enforces memory safety guarantees. For example, you can't use a value after it's been freed. It also ensures that mutability is explicit, and it enforces some extra rules that make mo ... read more →
Unicode Normalization
Today I ran into an amusingly named place, thanks to some sharp eyes on the OpenStreetMap US Slack. The name of this restaurant is listed as "𝐊𝐄𝐁𝐀𝐁 𝐊𝐈𝐍𝐆 𝐘𝐀𝐍𝐆𝐎𝐍". That isn't some font trickery; it's a bunch of Unicode math symbols cleverly u ... read more →
The rust-toolchain.toml file
This isn't so much a TIL as a quick PSA. If you're a Rust developer and need to ensure specific things about your toolchain, the rust-toolchain.toml file is a real gem! I don't quite remember how, but I accidentally discovered this file a year or two ... read more →
Databases as an Alternative to Application Logging
In my work, I've been doing a lot of ETL pipeline design recently for our geocoding system. The system processes on the order of a billion records per job, and failures are part of the process. We want to log these. Most applications start by dumping ... read more →
Conserving Memory while Streaming from DuckDB
In the weeks since my previous post on Working with Arrow and DuckDB in Rust, I've found a few gripes that I'd like to address. Memory usage of query_arrow and stream_arrow: In the previous post, I used the query_arrow API. It's pretty straightforward ... read more →
How (and why) to work with Arrow and DuckDB in Rust
My day job involves wrangling a lot of data very fast. I've heard a lot of people raving about several technologies like DuckDB, (Geo)Parquet, and Apache Arrow recently. But despite being an "early adopter," it took me quite a while to figu ... read more →
Quadrupling the Performance of a Data Pipeline
Over the past two weeks, I've been focused on optimizing some data pipelines. I inherited some old ones which seemed especially slow, and I finally hit a limit where an overhaul made sense. The pipelines process and generate data on the order of hund ... read more →