Audio Features Ontology working document
This Wiki page aims to capture some requirements for a more expressive version of the Audio Features ontology.
The current version was tailored to fit the EASAIER requirements, and includes only a very limited subset of music/speech features.
Within the Music Ontology framework, an audio feature is represented as an event on the timeline of the signal, and therefore classifies a region of it. Hopefully, all the features detailed here can be expressed by subsuming terms in the Event ontology.
This page should be editable by anybody, and we welcome all contributions.
Using this page
How to launch the discussion about a new feature?
Just add a bullet point to the list below, specifying what you want to express, with as many details and examples as possible, to help the discussion process.
How to contribute to the discussion about a particular feature?
Just edit the discussion under the corresponding bullet point. Please add a link to a homepage, or a nickname, so that your contributions can be identified.
N3/TTL notation
Some RDF examples may be given in N3/TTL notation rather than XML, for readability's sake. If you really love XML with all your heart, you can run these files through Triplr by appending the file's URL to http://triplr.org/rdf/ to get back an XML version.
For example :
http://www.omras2.com/twiki/pub/Main/AudioFeaturesOntology/sv-time-instants-unlabelled-csv.n3 is available in XML at
http://triplr.org/rdf/http://www.omras2.com/twiki/pub/Main/AudioFeaturesOntology/sv-time-instants-unlabelled-csv.n3
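As a small illustration, the Triplr address can be built by plain string concatenation. The helper name below is hypothetical; only the http://triplr.org/rdf/ prefix and the example URL come from the text above.

```python
TRIPLR_PREFIX = "http://triplr.org/rdf/"


def triplr_xml_url(n3_url: str) -> str:
    """Return the Triplr address for an XML rendering of the given N3/TTL file."""
    return TRIPLR_PREFIX + n3_url


example = (
    "http://www.omras2.com/twiki/pub/Main/AudioFeaturesOntology/"
    "sv-time-instants-unlabelled-csv.n3"
)
print(triplr_xml_url(example))
```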
Discussion
Track-level features, generic operators
Where do these fit in? Generic operators commonly applied to frame-level features are things like mean, variance, entropy, difference... Their output is typically regarded as another low-level feature, even though the output of a summary operator like Mean of MFCCs, for example, is clearly not an event on a timeline.
-- MarkLevy - 22 Nov 2007
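To make the question concrete, here is a minimal Python sketch of summary operators applied to a frame-level feature. The frame data are invented for illustration (two coefficients per frame standing in for MFCCs), and the function name is hypothetical. The point is only that the result carries no time information at all, so it has no obvious place on a timeline.

```python
from statistics import mean, pvariance

# Invented frame-level feature: (time in seconds, coefficient vector) pairs.
frames = [
    (0.00, [1.0, 10.0]),
    (0.01, [3.0, 10.0]),
    (0.02, [5.0, 12.0]),
    (0.03, [7.0, 12.0]),
]


def summarise(frames):
    """Apply mean and (population) variance to each coefficient across frames."""
    columns = list(zip(*(vector for _, vector in frames)))
    return {
        "mean": [mean(column) for column in columns],
        "variance": [pvariance(column) for column in columns],
    }


summary = summarise(frames)
print(summary)  # one vector per operator, and no timestamps anywhere
```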
Sonic Visualiser's exported features
Sonic Visualiser exports features generated by Vamp plugins or manual annotation using its own XML format (not really intended to be an interchange format) or as comma-separated value (CSV) files. Here are some examples of the features produced by a few of the current Vamp plugins and annotation types, and the formats they end up in when exported.
Note onsets
sv-time-instants-unlabelled.xml sv-time-instants-unlabelled.csv: SV time instants layer without labels (note onsets). The instant times are defined in the XML using audio sample frame number, with the sample rate specified in the model element. In the CSV, times are just given in seconds. This is the output SV produces natively from the QM onset detector plugin (vamp:qm-vamp-plugins:qm-onsetdetector:onsets).
- These are relatively straightforward. Each time point is represented by something like :
    :onset1 a af:Onset ;
        tl:onTimeLine :dtl ;
        tl:atInt 386000 .
- The continuous time case (e.g. CSV format) is simple; an example can be found in sv-time-instants-unlabelled-csv.n3
- The discrete time case (e.g. XML format) is a little harder as I'm not too familiar with the timeline mapping stuff. But I think it would be something like sv-time-instants-unlabelled-xml.n3
- In both cases I've used the existing af:Onset class from the audio features ontology. If SV didn't know the type of feature being calculated, these could simply use the superclass tl:Instant instead.
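The relationship between the two exports is just a division by the sample rate. The Python sketch below converts a frame number to seconds and prints one onset in the discrete-time pattern shown above. The function names and the 44100 Hz sample rate are assumptions for illustration; in a real export the rate would be read from the model element.

```python
def frame_to_seconds(frame: int, sample_rate: int) -> float:
    """Convert an audio sample frame number to a time in seconds."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return frame / sample_rate


def onset_turtle(index: int, frame: int) -> str:
    """Describe one onset on a discrete timeline, following the pattern above."""
    return (
        f":onset{index} a af:Onset ;\n"
        f"    tl:onTimeLine :dtl ;\n"
        f"    tl:atInt {frame} .\n"
    )


# Assumed sample rate; in practice it comes from the model element in the XML.
SAMPLE_RATE = 44100

print(onset_turtle(1, 386000))
print(frame_to_seconds(386000, SAMPLE_RATE))
```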
Beats
sv-time-instants-labelled.xml sv-time-instants-labelled.csv: SV time instants layer with labels (beat positions with the tempo noted in the label). This is the output SV produces natively from the QM beat tracker plugin (vamp:qm-vamp-plugins:qm-tempotracker:beats).
- These are very similar to the two onset examples above. You just add rdfs:label triples to attach (any number of) labels to each instant. I've used the existing af:Beat class from the audio features ontology here, but again, the generic tl:Instant class could be used instead.
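As a rough sketch of the labelling step, the Python below prints a beat with any number of rdfs:label values attached. The helper names, the :rtl timeline node and the "120 bpm" label text are hypothetical (the actual label format SV writes is not shown here), and the use of tl:at with a decimal follows the text label example further down this page.

```python
def turtle_string(text: str) -> str:
    """Quote a Python string as a Turtle string literal (basic escapes only)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def beat_turtle(index: int, seconds: float, labels) -> str:
    """Describe one beat instant, with one rdfs:label triple per label."""
    parts = ["a af:Beat", "tl:onTimeLine :rtl", f"tl:at {seconds:.9f}"]
    parts += [f"rdfs:label {turtle_string(label)}" for label in labels]
    return f":beat{index} " + " ;\n    ".join(parts) + " .\n"


print(beat_turtle(1, 0.5, ["120 bpm"]))
```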
Function of time
sv-time-values-unlabelled.xml sv-time-values-unlabelled.csv: SV time values layer without labels (note onset detection function). Note that in principle there can be any number of points at the same time. Again, the times are by audio sample frame. This is the output SV produces from the detection-function output of the QM onset detector plugin (vamp:qm-vamp-plugins:qm-onsetdetector:detection_fn).
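Because several points may share one time, a reader of this data should not key a plain dictionary on time and overwrite earlier values. The Python sketch below keeps every value per time. It assumes a two-column CSV (time in seconds, then value), which is a guess at the layout rather than something stated above; the sample rows are invented.

```python
import csv
import io
from collections import defaultdict


def read_time_values(text: str) -> dict:
    """Group values by time, keeping every value when times repeat."""
    points = defaultdict(list)
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        time, value = float(row[0]), float(row[1])
        points[time].append(value)
    return dict(points)


# Invented sample: two points fall at 0.5 seconds.
sample = "0.0,0.25\n0.5,0.75\n0.5,0.5\n"
print(read_time_values(sample))
```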
Notes
sv-notes.xml sv-notes.csv: SV notes layer. This is a very limited representation, even more so than MIDI: it doesn't even have velocity for example. The pitch is frequency in Hz; the units attribute of the model element tells SV that (otherwise it would treat the pitch as a MIDI pitch value). I notice that the CSV gives the start time in seconds but the duration in sample frames. There's a sort-of good reason for that, but it probably can and should be fixed. This is the output SV produces from the Aubio note tracker plugin (vamp:vamp-aubio:aubionotes:notes). The plugin actually just returns a pair of values for each result, which SV assumes to represent frequency and duration; a major limitation of Vamp is that it has neither an explicit duration field for a result nor separate units for the values in a multi-valued result, so returning a note or interval always needs some such back-channel "understanding" between plugin and host.
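Until the CSV export is made consistent, a consumer has to normalise the units itself. The Python sketch below takes a start time in seconds and a duration in sample frames and returns both in seconds. The Note class, the function name and the example values are hypothetical, and the sample rate has to be supplied from elsewhere since the CSV does not carry it.

```python
from dataclasses import dataclass


@dataclass
class Note:
    start_s: float       # start time in seconds
    frequency_hz: float  # pitch as frequency in Hz
    duration_s: float    # duration in seconds


def note_from_csv_fields(start_s: float, frequency_hz: float,
                         duration_frames: int, sample_rate: int) -> Note:
    """Build a Note, converting the frame-based duration to seconds."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return Note(start_s, frequency_hz, duration_frames / sample_rate)


# Invented values: a 440 Hz note lasting 22050 frames at an assumed 44100 Hz.
note = note_from_csv_fields(1.5, 440.0, 22050, 44100)
print(note)
```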
Grid
sv-chromagram.xml sv-chromagram.csv: SV colour 3d plot layer (chromagram). This is the output SV produces from the QM chromagram plugin (vamp:qm-vamp-plugins:qm-chromagram:chromagram). The XML contains labels for the output bins, which are currently lost in the CSV.
Text
sv-text.xml sv-text.csv: SV text layer, labels entered by hand. Each label has a time, which again is a sample frame in the XML and seconds in the CSV; Y coordinate in the range 0 (bottom of pane) to 1 (top of pane); and label content.
- For these, you could either just label instants as in the "Beats" case above, or define your own subclass of tl:Instant to use instead. You would want to define a sv:height_in_pane or something for the Y coordinate.
- Each label becomes something like :
    :label5 a sv:TimeLabel ;
        tl:onTimeLine :rtl ;
        tl:at 8.032500000 ;
        sv:height_in_pane 0.524823 ;
        rdfs:label "bop!" .
- Full example : sv-text-csv.n3
- The discrete time case (from sv-text.xml) would be analogous to the onsets and beats examples above.
Images
sv-images.xml sv-images.csv: SV images layer, image added by hand. An image has a time, image source (filename or URL) and label. Images don't have a Y coordinate in SV.
Single values
sv-single-value.xml sv-single-value.csv: SV has no representation for results that consist of a single value, or even a set of values, that have no specific time and are instead associated with the entire input (e.g. an overall tempo estimate for a roughly fixed-tempo track, or textual or numeric metadata). Consequently these tend to turn up as time instants with time zero and the "value" in the label, or time values with time zero if the value is amenable to being stored in a floating point variable. This example contains the MusicDNS PUID for the track.
- I think you're looking to associate a value with the Signal object associated with an audio file, in which case it's just something like this :
    @prefix mo: <http://purl.org/ontology/mo/> .

    :sig a mo:Signal ;
        mo:available_as <file:///Music/Artists/Paniq/Occidental.mp3> ;
        mo:puid "341f7bfe-a570-76c7-c064-c33dae7f93c6" ;
        mo:time :sig_int .
- In the PUID case, we have this existing term in the Music Ontology. In other cases, I think we'd want to define a new term for the semantics of the value, or perhaps just have generic Vamp terminology such as :
    @prefix mo: <http://purl.org/ontology/mo/> .
    @prefix vamp: <http://example.org/vamp/> .

    :sig a mo:Signal ;
        mo:available_as <file:///Music/Artists/Paniq/Occidental.mp3> ;
        vamp:analysis_output :output ;
        mo:time :sig_int .

    :output a vamp:AnalysisOutput ;
        vamp:plugin "vamp:qm-vamp-plugins:qm-fingerprinter:puid" ;
        vamp:value "341f7bfe-a570-76c7-c064-c33dae7f93c6" .
- I need to study the Vamp API some more to know how workable this approach is. Any comments, Chris ?
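Before a value can be attached to the Signal, a consumer first has to recognise the time-zero workaround in an SV export. The Python sketch below is one possible heuristic, not anything SV or Vamp defines: it treats a layer holding exactly one instant at time zero as a track-level value. The function name is hypothetical, and a genuine single event at time zero would be misread, so this is only a starting point.

```python
def track_level_value(instants):
    """Return the label if the layer looks like the time-zero workaround, else None.

    `instants` is a list of (time_in_seconds, label) pairs from one layer.
    Heuristic (an assumption): exactly one instant, placed at time zero.
    """
    if len(instants) == 1 and instants[0][0] == 0:
        return instants[0][1]
    return None


puid_layer = [(0.0, "341f7bfe-a570-76c7-c064-c33dae7f93c6")]
print(track_level_value(puid_layer))
```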
Session file
paniq-manythings.sv: The SV session file these are taken from. The SV file format is just bzipped XML; the uncompressed version is paniq-manythings.xml.
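Since the session file is bzipped XML, it can be read with standard tools. Here is a minimal Python sketch using only the standard library; the function names are hypothetical, and the element names in the demo are made up rather than taken from the SV format.

```python
import bz2
import xml.etree.ElementTree as ET


def load_sv_session(path):
    """Decompress a .sv file on disk and return the root element of its XML."""
    with bz2.open(path, "rb") as handle:
        return ET.parse(handle).getroot()


def parse_sv_session_bytes(data: bytes):
    """Same as above, for session data already held in memory."""
    return ET.fromstring(bz2.decompress(data))


# Stand-in document: these element names are invented for the demo.
demo = bz2.compress(b"<session><layer kind='demo'/></session>")
root = parse_sv_session_bytes(demo)
print(root.tag, [child.tag for child in root])
```

Calling load_sv_session("paniq-manythings.sv") on a local copy of the attachment should then give the same tree as parsing paniq-manythings.xml directly.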
Vamp Plugins
I've started producing RDF descriptions of Vamp plugins, using the "template-generator" tool now in Sonic Visualiser's SVN. The main thing these add to plugins' internally stored metadata is to provide RDF types for the output events and features of the plugin. This will help derive higher level knowledge from Vamp output, and compare the results of different Vamp plugins.
This clearly necessitates some additions to the Audio Features ontology, so I'd like to encourage some discussion here. The idea is that each Vamp output can have an associated Feature subclass, Event subclass, or both, using the vamp:computes_feature_type and vamp:computes_event_type properties.
-- ChrisSutton - 19 Nov 2007
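As a rough illustration of the idea, the Python sketch below prints the typing triples for one plugin output using the two properties named above. The function name and the :aubioonset_onsets node are hypothetical, pairing that output with af:Onset is only an example and not an agreed typing, and the real template-generator output may look different.

```python
def output_typing_turtle(output_node: str, feature_type=None, event_type=None) -> str:
    """Attach a Feature subclass, an Event subclass, or both, to a plugin output."""
    parts = []
    if feature_type:
        parts.append(f"vamp:computes_feature_type {feature_type}")
    if event_type:
        parts.append(f"vamp:computes_event_type {event_type}")
    if not parts:
        raise ValueError("give a feature type, an event type, or both")
    return f"{output_node} " + " ;\n    ".join(parts) + " .\n"


# Illustrative pairing only; the node name is invented.
print(output_typing_turtle(":aubioonset_onsets", event_type="af:Onset"))
```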
So far, here are some plugins & their outputs in need of typing :
qm-vamp-plugins
- Plugin : qm-keydetector
  - Output : tonic
  - Output : mode
    - Note : Defined in the (old) keys ontology as a restricted property but not a class -- CS
    - Feature type :
  - Output : key
- Plugin : qm-tempotracker
  - Output : beats
    - Event type : new af:Beat ?
  - Output : detection_fn
    - Note : Should we have a hierarchy of detection functions where, e.g. a tempo detection function and a tonal change detection function both subclass af:DetectionFunction ? -- CS
    - Feature type :
  - Output : tempo
    - Feature type : new af:Tempo ?
- Plugin : qm-tonalchange
  - Output : tcstransform ("Transform to 6D Tonal Content Space")
  - Output : tcfunction ("Tonal Change Detection Function")
  - Output : changepositions ("Tonal Change Positions")
    - Event type :
      - Is this what the existing af:TonalOnset is for ? -- CS
vamp-aubio
- Plugin : aubioonset
  - Output : onsets
  - Output : detectionfunction
- Plugin : aubiopitch
  - Output : frequency
    - Feature type : new af:Pitch ?
- Plugin : aubiotempo
  - Output : beats
    - Event type : new af:Beat ? (as for qm-tempotracker)
vamp-example-plugins
- Plugin : amplitudefollower
  - Output : amplitude
    - Feature type : new af:Amplitude ?
- Plugin : percussiononsets
  - Output : onsets
  - Output : detectionfunction
    - Feature type :
      - as with aubioonset, qm-tempotracker and qm-tonalchange ? -- CS
- Plugin : spectralcentroid
  - Output : logcentroid
    - Feature type : new af:LogFrequencyCentroid, subclass of new af:SpectralCentroid ?
  - Output : linearcentroid
    - Feature type : new af:LinearFrequencyCentroid, subclass of new af:SpectralCentroid ?
- Plugin : zerocrossing
  - Output : counts
    - Feature type : new af:ZeroCrossingCount ?
  - Output : zerocrossings
    - Event type : new af:ZeroCrossing ?
- Plugin :
  - Output :
    - Feature type :
    - Event type :