The family cookbook

Laying around in my computer, I have this file:

-rw-r--r-- 1 joseph joseph 165K Feb 5 1995 COOKFINL.XLS

Every time I see that timestamp, I am amazed. I can’t believe that on this very computer, I have a long sequence of 1’ and 0’s that has remained unchanged since they were first created nineteen years ago. I was not even a year old when this file was made, and yet here it is just hanging out on the cloud in my Dropbox folder. It’s seriously awesome.

The file itself is a cookbook that my mom made one year as a Christmas gift to our extended family. She compiled around 150 of the most popular recipes from both sides of our family, typed them into Excel, and then had spiral-bound books printed with each recipe indexed and grouped into categories like appetizers, entrees, and desserts. Over winter break, she expressed some interest in updating the cookbook to fix some errors she’s found over the years. She also wanted to be able to send a copy of the book to the members of my generation of the family, as we are now away off at school and wanting to access some of the old, familiar recipes of home. Instead of printing out physical copies again, we decided that it would be best to send the digital copies out to all the kids. Cookbooks are nice, but the convenience of having the entire thing just on your phone wherever you are (from the couch, to the grocery, to the kitchen) overshadows any advantages of an actual dead tree book.

The problem is that this file is an Excel file. My mom complained about how terrible it was to make this 140-some page beast cell-by-cell in a spreadsheet document, but she did it because it was what the techie guy who guided her through this project told her it was the best tool for the job. Really, with the consumer-friendly programs available to my mom at the time (e.g. Microsoft Office) it probably was the best. While she always said she should have just done it in Word, I can’t imagine trying to format each recipe with that would be anything less than horrendous.

But I knew there had to be a better format. I didn’t know what I wanted to actually do with all the data, but I knew that it would be much easier to work with the recipes if they weren’t stuck in an old Excel file from ‘95. I decided I would try and parse it into a JSON file, and I would use Perl to do so. I chose Perl because I will soon be using it in my student sysadmin job and I wanted to get some experience with it.

I started by saving the file as a CSV from LibreOffice, and then got to work parsing the thing with Perl. I quickly realized I couldn’t just parse this as a normal CSV, as it was nowhere near formatted like a traditional CSV. Instead of neat rows of comma-separated columns of fields, it was hundreds of entries like this:

,,, ,,, ,,, ,BLUEBERRY MUFFINS,, ,,, ,NUMBER OF SERVINGS:,12 muffins, ,AUTHOR:,Joan Graham, ,,, ,2 cups flour,1 C. sugar, ,2 tsp. baking powder,1/2 tsp. salt, ,1/2 C. butter,1/2 C. milk, ,2 eggs,1 tsp. vanilla extract, ,2-1/2 C. blueberries,sugar for sprinkling tops of muffins, ,,, ,,, ,,, ,,, ,Preheat oven to 375°F. Grease and flour a 12-cup muffin pan. In medium bowl,, ,""combine dry ingredients. With pastry blender or 2 knives, cut in butter until mixture"",, ,""resembles coarse crumbs. In bowl combine milk, eggs and vanilla. Add to dry"",, ,""ingredients all at once, stirring just until moistened. Do not overmix. Crush 1/2 cup"",, ,berries and add to batter. Fold in remaining berries and spoon into muffin pan. ,, ,Sprinkle with sugar. Bake 30 minutes or until toothpick inserted in center comes out,, ,clean. Cool in pan 5 minutes; turn onto wire rack. (250 calories each),, ,,, ,,,

My plan of attack was as follows: parse for the recipe name by looking for an all-caps line, parse for number of servings line and the author, and finally parse the ingredients list and instructions as normal CSV’s. The way I went about doing this was, admittedly, an enormous hack. My first effort of looking for all caps lines with something like if ($line eq uc($line)) worked alright, but I had some instances of the NUMBER OF SERVINGS line giving me issues if it just included a number. So I ended up with an ugly hack where I first looked for the author line, since I knew every recipe in the book had an author, and then worked backwards to get the servings and recipe name data. I say work backwards loosely, as I was doing this all line-by-line reading the file, but instead I threw the information into different spots in appropriate arrays based on whether or not I had hit an author line. It was ugly. I also realized pretty late on that I didn’t want all these arrays, and instead I went for a nested hash where each recipe name pointed to hashes for author, servings, ingredients, and instructions.

I also got a chance to learn a little more about regular expressions. I knew a bit about regexes mostly from vim find-and-replace commands, but with this project I actually read up more on them. Admittedly, I read up on them only a little, and my lack of understanding definitely shows in the code. I’m sure there is some fancy way to address nearly all of my parsing problems with a well-crafted regex, but instead I broke them into little bits and pieces doing one thing at a time (like removing the NUMBER OF SERVINGS:, bit, and leading and trailing commas, and other things). Also, there were probably lots of simple cases where I didn’t need the regexes and instead could have written a little function to do what I needed more simply and cleanly, but as the saying goes, when all you have is a hammer, everything looks like a nail.

In the end, I was successful (mostly) in producing a JSON file of all the recipes. Here’s what I learned while doing so:

Posted on 17 Mar 2014