Originally published at: http://kwontum.blogspot.com/2013/02/None.html

I don’t think I have to remind my readers what renaissance means, but that is what is going on here.

It has been well over a year since my last post, and much has changed. For one, I can hardly call myself a physicist anymore. Now I live and breathe machine-readable data. My tools have changed from solder and liquid nitrogen to Python and Hadoop. I have also attended almost as many concerts since my last post as I had in my entire life up to that point. I still do not have the ear or history of a professional critic or musicologist, but I can certainly appreciate much more than I could before. I am approaching three months in a grand new city with wonderful opportunities to explore the arts. I intend to take advantage.

Given these changes, I plan to extend the content of this blog. Classical music will be a recurring theme as before, but I will start to mix in some musings on technology, data, and other arts, including food and the visual arts. Where these intersect with music, I will be sure to post my thoughts. And I hope to share some creative work of my own in time. In the past, I have concentrated on the quality of my writing, but in the interest of sharing more content and ideas, I will let that slide a little, so you may notice a few more typos and some poor grammar here and there.

As an example of the type of content I would like to share, I leave you today with this interesting tool: http://www.peachnote.com/. It appears to have stopped returning results at the time of publishing, but when it works, it allows you to search for melodies (minus the temporal component) as n-grams.

The term n-gram comes from natural language processing. The keyboard on your smartphone probably does a fair job of predicting the next word you intend to type. One way to do it is to go through a large corpus of text (preferably one generated by people on their phones as opposed to something less germane, like legal text) and count how often each word follows each other word. We may find that ‘whites’ is highly likely to follow ‘egg’. So if a user inputs ‘egg’, we should recommend ‘whites’ for the next word. Counting pairs like this uses bigrams (two-word sequences). But we can do better by looking at trigrams, that is, by taking the two preceding words into account. If a user types ‘Easter egg’, then the next word will very seldom be ‘whites’. Something like ‘hunt’ is more likely. Increasing the pattern length further, up to arbitrary n-grams, generally improves the quality of results, but quickly gets more expensive (in terms of computing time) to train and compute.
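
To make the next-word idea concrete, here is a minimal sketch in Python. The toy corpus and the helper names `train` and `predict` are my own inventions for illustration; a real phone keyboard is certainly more sophisticated than this. The sketch counts which word follows each run of preceding words, first with one word of context and then with two.

```python
from collections import Counter, defaultdict


def train(corpus, context_size):
    """Count which word follows each run of `context_size` words."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for i in range(context_size, len(words)):
            context = tuple(words[i - context_size:i])
            counts[context][words[i]] += 1
    return counts


def predict(counts, text):
    """Return the most frequent follower of the last words of `text`, or None."""
    if not counts:
        return None
    # Every key has the same length, so read the context size off any key.
    context_size = len(next(iter(counts)))
    context = tuple(text.lower().split()[-context_size:])
    followers = counts.get(context)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Toy corpus, invented for illustration only.
corpus = [
    "whisk the egg whites until stiff",
    "fold in the egg whites gently",
    "separate the egg whites first",
    "the easter egg hunt starts at noon",
    "we planned an easter egg hunt",
]

one_word = train(corpus, 1)  # one word of context
two_word = train(corpus, 2)  # two words of context

print(predict(one_word, "egg"))         # whites
print(predict(one_word, "easter egg"))  # whites (only 'egg' is considered)
print(predict(two_word, "easter egg"))  # hunt
```

With one word of context the model suggests ‘whites’ even after ‘Easter egg’; with two words it recovers ‘hunt’. The cost shows up in the table of counts, which grows quickly as the context gets longer.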

To be clear, the linked tool gives the frequency of melodies across a time span rather than predicting what comes next. But the data are rich for experimentation and discovery with respect to how Western musical tastes have evolved over the past few centuries. Melodies of up to 15 intervals (15-grams) are searchable here! Music is a multidimensional stream and therefore difficult to represent in machine-readable bits. This is a wonderful first step at extracting one dimension in order to enable some discovery. I will be taking a crack at it when I find the time.
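
As a rough illustration of what a melody "minus the temporal component" might look like as data, here is a small Python sketch. I am assuming pitches are written as MIDI note numbers and that an interval is the semitone step between consecutive notes; I do not know how the linked tool actually encodes its melodies, and the function names are hypothetical.

```python
def intervals(pitches):
    """Semitone steps between consecutive notes; durations are ignored."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def interval_ngrams(pitches, n):
    """All runs of n consecutive intervals in a melody, as tuples."""
    steps = intervals(pitches)
    return [tuple(steps[i:i + n]) for i in range(len(steps) - n + 1)]


# Opening of "Ode to Joy" as MIDI note numbers (E E F G G F E D),
# and the same phrase transposed down a whole tone.
ode = [64, 64, 65, 67, 67, 65, 64, 62]
ode_lower = [p - 2 for p in ode]

print(intervals(ode))                          # [0, 1, 2, 0, -2, -1, -2]
print(intervals(ode) == intervals(ode_lower))  # True
print(interval_ngrams(ode, 3)[0])              # (0, 1, 2)
```

One nice property of working with intervals is that a melody looks the same in any key, so a search for a tune does not depend on how a particular score happened to be transposed.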