Saturday, September 18, 2010

Playing Notes by Image Processing. What You See is What You Hear.

Amazingly, musical notes can also be played in Scilab. Consider the musical score sheet in Figure 1.


Figure 1: Rain Rain Go Away music score sheet.

After some processing, Scilab is able to play these notes.



This video file plays the musical score sheet in Figure 1. Again, the point is that the notes can be played using the concepts learned in image processing. The problem can be solved in a freestyle manner; in this post, I will discuss in detail the approach I used. The important thing is that the computer reads the score sheet and then plays the notes.

The main concept I will be using for this activity is correlation: how similar are two patterns? The higher the correlation, the more similar the patterns in question.
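To make "higher correlation means more similar" concrete, here is a minimal sketch in Python (not the Scilab code used for this activity). It computes a normalized correlation coefficient between two tiny patches. The function name `correlation` and the toy patches `filled` and `hollow` are made up for illustration.

```python
import math

def correlation(a, b):
    """Normalized (Pearson) correlation between two equally sized patches."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    norm = math.sqrt(sum((x - mx) ** 2 for x in xs) *
                     sum((y - my) ** 2 for y in ys))
    return cov / norm if norm else 0.0

# Toy stand-ins for a filled note head and a hollow note head.
filled = [[0, 1, 1, 0],
          [1, 1, 1, 1],
          [0, 1, 1, 0]]
hollow = [[0, 1, 1, 0],
          [1, 0, 0, 1],
          [0, 1, 1, 0]]

same = correlation(filled, filled)       # identical patterns
different = correlation(filled, hollow)  # similar but not identical
```

An identical pair gives a coefficient of 1, while the filled and hollow heads give a lower but still clearly positive value. This hints at why half notes and quarter notes can be hard to tell apart by correlation alone.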

Luckily for us, the score sheet is relatively 'clean', so no further 'cleaning' is required.

First, I cropped the patterns of a half note and a quarter note (Figures 2 and 3, respectively). These cropped images will be used later as the patterns to correlate with the score sheet.


Figure 2: half note


Figure 3: quarter note


I opened the score sheet, the half note and the quarter note in Scilab as grayscale images, using the gray_imread('filename/filepath') function. The half-note and quarter-note images must have the same size as the score sheet image. In addition, and I can't emphasize this enough, each note (half note and quarter note) must be at the center of its image; otherwise the correlation peaks will be offset from the actual note positions.
The next step is to compute the Fourier transform (FT) of each image. This is done by using the fft2(image) function.
The next step is to correlate the musical score sheet with each note individually. This step helps in distinguishing the notes in the original musical score sheet in Figure 1. Click here to review the concept of correlation.
Figure 4 shows the correlation between the score sheet and the half note.


Figure 4: Correlation between the original score sheet and the half note.

Apparently, the half note and the quarter note do not differ much: as Figure 4 shows, even the quarter notes in the score sheet correlate highly with the half note. Further processing is therefore required.
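The correlation step can be sketched as follows. This is a Python illustration under stated assumptions, not the Scilab code behind Figure 4. A naive DFT stands in for fft2 so that the example needs no libraries, the correlation is taken as the inverse transform of FT(score) times the complex conjugate of FT(note), and the result is shifted by half the image size. The helper names (`dft`, `dft2`, `correlate`, `plus_at`) and the 8x8 toy images are hypothetical.

```python
import cmath

def dft(seq, inverse=False):
    """Naive 1-D discrete Fourier transform (only suitable for tiny inputs)."""
    n = len(seq)
    sign = 1 if inverse else -1
    out = [sum(seq[k] * cmath.exp(sign * 2j * cmath.pi * u * k / n)
               for k in range(n)) for u in range(n)]
    return [v / n for v in out] if inverse else out

def dft2(img, inverse=False):
    """2-D transform: transform every row, then every column."""
    rows = [dft(row, inverse) for row in img]
    cols = [dft(list(col), inverse) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

def correlate(image, template):
    """Cross-correlate two same-sized images through their transforms."""
    f_img, f_tpl = dft2(image), dft2(template)
    product = [[a * b.conjugate() for a, b in zip(ra, rb)]
               for ra, rb in zip(f_img, f_tpl)]
    corr = dft2(product, inverse=True)
    h, w = len(corr), len(corr[0])
    # Shift by half the image size so that a centered template
    # puts each peak at the position of the matching pattern.
    return [[corr[(r - h // 2) % h][(c - w // 2) % w].real
             for c in range(w)] for r in range(h)]

def plus_at(row, col, size=8):
    """Toy 'note': a plus-shaped blob on an otherwise empty image."""
    img = [[0.0] * size for _ in range(size)]
    for dr, dc in [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]:
        img[row + dr][col + dc] = 1.0
    return img

score = plus_at(2, 5)      # the pattern sits at row 2, column 5
template = plus_at(4, 4)   # same pattern, centered in its image
corr = correlate(score, template)
peak = max((v, r, c) for r, row in enumerate(corr) for c, v in enumerate(row))
```

Here `peak` ends up at row 2, column 5, which is where the pattern sits in the toy score. Placing the template off-center would move the peak by the same offset, which is why the notes have to be centered in their images.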


Figure 5: Binarized image of the Correlation between the original score sheet and the half note.
This has a threshold value of 0.9

To further distinguish the half notes from the quarter notes in the original score sheet, I binarized the correlation image, as shown in Figure 5. I then observed that in Figure 5 each half note is reduced to a single pixel, whereas each quarter note leaves a spot larger than one pixel. With this in mind, I collected the pixel locations of all the single-pixel dots and labeled them as half notes.
Figure 6 shows the correlation between the score sheet and the quarter note.


Figure 6: Correlation between the original score sheet and the quarter note.

Much like the case in figure 4, figure 6 requires further processing to properly distinguish the notes. So, again, I binarized the image in figure 6, which is shown in figure 7.


Figure 7: Binarized image of the correlation between the original score sheet and the quarter notes.

As can be seen in figure 7, what remains are the points where the quarter notes are located. So, I collected all of their pixel locations and labeled them as quarter notes.
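The binarize-and-collect steps can be sketched roughly as below, in Python rather than Scilab. The sketch assumes the threshold of 0.9 is applied relative to the largest correlation value, and that spots are grouped with 8-connectivity; both are assumptions, as are the names `binarize` and `find_blobs` and the toy correlation map.

```python
def binarize(corr, threshold=0.9):
    """Keep pixels at or above threshold * max value (assumed normalization)."""
    peak = max(max(row) for row in corr)
    return [[1 if v >= threshold * peak else 0 for v in row] for row in corr]

def find_blobs(binary):
    """Group touching white pixels; return (mean_row, mean_col, size) per blob."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for r in range(h):
        for c in range(w):
            if not binary[r][c] or seen[r][c]:
                continue
            stack, pixels = [(r, c)], []
            seen[r][c] = True
            while stack:  # flood fill over the 8 neighbours
                y, x = stack.pop()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            mean_row = sum(p[0] for p in pixels) / len(pixels)
            mean_col = sum(p[1] for p in pixels) / len(pixels)
            blobs.append((mean_row, mean_col, len(pixels)))
    return blobs

# Toy correlation map: one sharp peak and one slightly wider peak.
corr_map = [
    [0.1, 0.2, 0.1, 0.0, 0.00, 0.00],
    [0.2, 1.0, 0.3, 0.0, 0.95, 0.97],
    [0.1, 0.3, 0.1, 0.0, 0.20, 0.30],
]
blobs = find_blobs(binarize(corr_map))
single_pixel = [(r, c) for r, c, size in blobs if size == 1]
larger = [(r, c) for r, c, size in blobs if size > 1]
```

With the half-note correlation from Figure 5, the entries in `single_pixel` would correspond to what I labeled as half notes. For the quarter-note correlation in Figure 7, every remaining spot is taken as a quarter note.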

After collecting all of the necessary information (the notes and their locations), I arranged the notes so that their order in the original score sheet is preserved. The column of each pixel location indicates which note should be played first, and the row indicates the pitch (A, B, C, D, E, F, G), that is, the frequency at which the note should be played. The duration is indicated by which correlation detected the note, that is, whether it was labeled as a half note or a quarter note.
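The arrangement step can be illustrated with a short Python sketch. The frequencies are the standard equal-temperament values with A4 = 440 Hz, but the row-to-pitch table `ROW_TO_NOTE`, the beat counts, and the sample detections are hypothetical; the real row values depend on where the staff lines fall in the scanned score sheet.

```python
# Standard equal-temperament frequencies in Hz (A4 = 440 Hz).
NOTE_FREQ = {"C4": 261.63, "D4": 293.66, "E4": 329.63, "F4": 349.23,
             "G4": 392.00, "A4": 440.00, "B4": 493.88}

# Hypothetical mapping from pixel row to pitch; smaller rows are
# higher on the page and therefore higher in pitch.
ROW_TO_NOTE = {40: "G4", 45: "F4", 50: "E4", 55: "D4", 60: "C4"}

def build_melody(half_notes, quarter_notes):
    """Turn (row, col) detections into (name, frequency, beats), left to right."""
    detections = ([(col, row, 2.0) for row, col in half_notes] +
                  [(col, row, 1.0) for row, col in quarter_notes])
    detections.sort()  # column first, so playing order follows the sheet
    melody = []
    for col, row, beats in detections:
        nearest_row = min(ROW_TO_NOTE, key=lambda r: abs(r - row))
        name = ROW_TO_NOTE[nearest_row]
        melody.append((name, NOTE_FREQ[name], beats))
    return melody

# Made-up detections as (row, col) pairs.
melody = build_melody(half_notes=[(50, 120)],
                      quarter_notes=[(41, 30), (49, 75)])
```

Sorting by column alone assumes all the notes lie on a single staff, as in this simple score sheet; a score with several staves would first need to be split into lines.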

Now that all of the necessary things are in hand (the notes, the order in which to play them, the frequency at which each should be played, and how long each should be played), there is only one thing left to do, and that is to play music. In Scilab, notes can be played by using the sound() function. A signal that represents the sound is also needed, and in simple terms a tone can be represented as a sine wave, so a sine function of the frequency and the time is used, of the form sin(2*%pi*frequency*t) with t = soundsec(time). Depending on the frequency, a different pitch is heard. The function soundsec() generates the time samples for the duration of the note to be played. All in all, a note can be played with sound([sine wave representing the note]).
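For readers without Scilab, the same idea can be sketched in Python using only the standard library: build a sine wave per note, join the notes, and store the result as a WAV file. This is an illustration, not the code I ran. The sampling rate `RATE`, the 0.8 volume factor, the helper names `tone` and `write_wav`, and the three-note `melody` are all assumptions.

```python
import io
import math
import struct
import wave

RATE = 22050  # assumed sampling rate in samples per second

def tone(freq, seconds, rate=RATE):
    """Samples of a sine wave at the given frequency and duration."""
    count = int(rate * seconds)
    return [math.sin(2 * math.pi * freq * i / rate) for i in range(count)]

def write_wav(samples, target, rate=RATE):
    """Write samples in [-1, 1] as 16-bit mono PCM to a path or file object."""
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav.setframerate(rate)
        frames = b"".join(struct.pack("<h", int(s * 32767 * 0.8))
                          for s in samples)
        wav.writeframes(frames)

# Made-up melody as (frequency in Hz, duration in seconds).
melody = [(392.00, 0.5), (329.63, 0.5), (392.00, 1.0)]
samples = [s for freq, seconds in melody for s in tone(freq, seconds)]

buffer = io.BytesIO()  # pass a filename such as "notes.wav" to save to disk
write_wav(samples, buffer)
```

Each (frequency, duration) pair plays the role of one sin(2*%pi*frequency*soundsec(time)) segment in Scilab, and writing the joined samples to a file replaces the call to sound().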

I would like to acknowledge my discussions with Arvin Mabilangan and Gino Leynes.

--
Technical Correctness: 4/5 (late posting)
Quality of Presentation: 5/5

