Researchers reproduce the human clarinet player in kilobyte form

Researchers have produced a 20-second audio file that is smaller than one kilobyte. That’s a compression rate hundreds of times greater than that of a standard MP3 file. But what makes the file most interesting is that it’s a reproduction rather than a recording.
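To put “hundreds of times” into numbers: the article doesn’t say which MP3 bitrate it is comparing against, so the sketch below assumes common constant bitrates of 128, 192 and 320 kbit/s, ignores headers and tags, and treats one kilobyte as 1,024 bytes. The function and constant names are illustrative.

```python
def mp3_size_bytes(duration_s: float, bitrate_kbps: int) -> float:
    """Size of a constant-bitrate MP3 stream: bits per second x seconds / 8."""
    return duration_s * bitrate_kbps * 1000 / 8

# Upper bound on the model file: the article says "smaller than one kilobyte"
MODEL_FILE_BYTES = 1024

for kbps in (128, 192, 320):  # assumed typical bitrates, not from the source
    mp3 = mp3_size_bytes(20, kbps)
    # The model file is under 1 KB, so the real ratio is at least this large
    ratio = mp3 / MODEL_FILE_BYTES
    print(f"{kbps} kbit/s MP3: {mp3:,.0f} bytes, more than {ratio:.1f}x the model file")
```

At 128 kbit/s, a 20-second clip is 320,000 bytes, so a sub-kilobyte file is more than 312 times smaller.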

The audio in question is a 20-second clarinet solo. That matters because there are limits to how quickly a human player can manipulate the three parts of the body that control the clarinet’s sound: the breath, tongue, and fingers. Those limits greatly reduce the amount of information needed to recreate the performance.
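A rough way to see why slow body movements mean less data is to compare uncompressed CD-quality audio with a raw stream of control values. The control figures here are hypothetical: three control signals (standing in for breath, tongue and fingers), sampled 20 times per second at 8 bits each. They are not taken from the researchers’ system, which the article doesn’t describe at this level of detail.

```python
def pcm_bytes(duration_s: int, sample_rate_hz: int = 44_100,
              bits_per_sample: int = 16, channels: int = 1) -> int:
    """Uncompressed PCM size: every audio sample is stored."""
    return duration_s * sample_rate_hz * bits_per_sample * channels // 8

def control_bytes(duration_s: int, n_controls: int = 3,
                  frame_rate_hz: int = 20, bits_per_value: int = 8) -> int:
    """Raw size of hypothetical control streams (breath, tongue, fingers),
    each sampled at a low rate because a player can't move faster."""
    return duration_s * n_controls * frame_rate_hz * bits_per_value // 8

audio = pcm_bytes(20)         # 1,764,000 bytes of mono CD-quality audio
controls = control_bytes(20)  # 1,200 bytes of control values
print(f"PCM: {audio:,} bytes; controls: {controls:,} bytes; "
      f"ratio {audio / controls:.0f}x")
```

Even this crude, unencoded control stream is 1,470 times smaller than the audio it describes.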

The researchers are from the University of Rochester, led by Mark Bocko, an electrical and computer engineering professor who has worked on everything from solid-state devices to microwaves but has recently concentrated on digital signal processing for audio.

Along with Bocko, researchers Xiaoxiao Dong and Mark Sterling built computer models of both the clarinet itself and the human body (treating the body as an instrument), then mapped out how the two interact to produce the sound. With this system, a recording of a real performance can be fed into the computer, which then ‘learns’ the piece and recreates it, producing a much smaller audio file.
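The ‘learn, then recreate’ loop is a form of analysis and resynthesis. The sketch below is a toy version under clearly stated assumptions: it swaps the researchers’ physical models of the clarinet and the player for a simple additive tone built from odd harmonics, estimates pitch per frame by autocorrelation and loudness by RMS level (a stand-in for breath pressure), and uses an illustrative 8 kHz sample rate with 20 control frames per second. None of the names or parameters come from the Rochester work.

```python
import math

SR = 8000              # sample rate in Hz (illustrative assumption)
FRAME = 400            # samples per control frame: 20 frames per second
HARMONICS = (1, 3, 5)  # odd harmonics only, a crude nod to the clarinet's spectrum
# Normalise so a tone with level 1 has an RMS value of 1
NORM = math.sqrt(sum(1 / k**2 for k in HARMONICS) / 2)

def synthesize(controls):
    """Render (pitch_hz, level) frames with a toy additive tone (not a physical model)."""
    out, phase = [], 0.0
    for pitch, level in controls:
        for _ in range(FRAME):
            phase += 2 * math.pi * pitch / SR
            sample = sum(math.sin(k * phase) / k for k in HARMONICS) / NORM
            out.append(level * sample)
    return out

def autocorr(frame, lag):
    """Unnormalised autocorrelation of one frame at a given lag."""
    return sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))

def analyze(signal):
    """The 'learning' step: estimate one (pitch_hz, level) pair per frame."""
    controls = []
    for start in range(0, len(signal) - FRAME + 1, FRAME):
        frame = signal[start:start + FRAME]
        # RMS level stands in for breath pressure
        level = math.sqrt(sum(x * x for x in frame) / FRAME)
        # Strongest autocorrelation lag (about 1 to 10 ms) gives the pitch period
        lag = max(range(SR // 1000, SR // 100),
                  key=lambda lag_samples: autocorr(frame, lag_samples))
        controls.append((SR / lag, level))
    return controls

recording = synthesize([(200, 0.5), (200, 0.5), (320, 0.3), (160, 0.4)])
learned = analyze(recording)      # compact description of the performance
recreated = synthesize(learned)   # reproduction rather than recording
print(f"{len(recording)} audio samples -> {2 * len(learned)} control values")
```

Here 1,600 audio samples collapse to 8 control values. The design choice mirrors the one the article describes: store how the instrument is played, not the waveform itself.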

Though the resulting sound is close to the original, it’s not yet a perfect match. That’s because the researchers haven’t yet mastered recreating tonguing, a clarinet technique in which the player strikes the reed with the tongue. As a result, the system is currently limited to passages of sustained notes rather than staccato playing, in which short notes are separated by gaps.

The researchers say that in theory the principle could also be used to map the human voice, though the voice’s sheer complexity would make this a difficult task. More immediate uses could include refining the system to process multiple instruments from an original recording (then combining them in an ultra-compressed audio file), or greatly improving the ‘realism’ of synthesizers by having them recreate the player as well as the instrument.

[Source: Analogik.com]