Michael Bell

Processing whole files from S3 with Spark

I recently started diving into Apache Spark for a project at work and ran into issues when trying to process the contents of a collection of files in parallel, particularly when the files are stored on Amazon S3. In this post I describe my problem and how I got around …

Spell checking an IPython notebook

I've been using IPython notebooks a lot lately for both my personal and professional research and analysis projects. It's a great tool for keeping code, visualization and analysis together in one place. It's also convenient for communicating results. Just export your notebook to HTML and it's ready to distribute... except …
