r9 - 22 Nov 2007 - 12:11:59 - MarkLevy

Audio Features Ontology working document

This wiki page collects requirements for a more expressive version of the Audio Features ontology. The current version was tailored to the EASAIER requirements and covers only a small subset of music and speech features.

Within the Music Ontology framework, an audio feature is represented as an event on the timeline of the signal, and so classifies a region of that timeline. Ideally, all the features detailed here can be expressed by terms that specialise those of the Event ontology.

This page should be editable by anybody, and we welcome all contributions.

Using this page

How to launch the discussion about a new feature?

Just add a bullet point to the list below, specifying what you want to express, with as many details and examples as possible, to help the discussion process.

How to contribute to the discussion about a particular feature?

Just edit the discussion under the corresponding bullet point. Please add a link to a homepage, or a nickname, so that your contributions can be identified.

N3/TTL notation

Some RDF examples may be given in N3/TTL notation rather than XML, for readability's sake. If you really love XML with all your heart, you can run these files through Triplr by appending the URL to http://triplr.org/rdf/ to get back an XML version. For example :

http://www.omras2.com/twiki/pub/Main/AudioFeaturesOntology/sv-time-instants-unlabelled-csv.n3 is available in XML at http://triplr.org/rdf/http://www.omras2.com/twiki/pub/Main/AudioFeaturesOntology/sv-time-instants-unlabelled-csv.n3
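
As a minimal sketch of the URL construction described above: the XML address is just the Triplr prefix followed by the address of the N3 file. The helper name triplr_xml_url is hypothetical, and nothing here checks that the service is reachable.

```python
TRIPLR_PREFIX = "http://triplr.org/rdf/"

def triplr_xml_url(n3_url: str) -> str:
    """Return the Triplr address that serves an RDF/XML version of n3_url."""
    return TRIPLR_PREFIX + n3_url

example = triplr_xml_url(
    "http://www.omras2.com/twiki/pub/Main/AudioFeaturesOntology/"
    "sv-time-instants-unlabelled-csv.n3"
)
```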

Discussion

Track-level features, generic operators

Where do these fit in? Generic operators commonly applied to frame-level features are things like mean, variance, entropy, difference... Their output is typically regarded as another low-level feature, even though the output of a summary operator like Mean of MFCCs, for example, is clearly not an event on a timeline.

-- MarkLevy - 22 Nov 2007
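
To make the question above concrete, here is a small sketch of summary operators applied to a frame-level feature. The input is a list of per-frame vectors, each tied to a time; the output is one vector per operator with no time attached, which is why it does not fit the event-on-a-timeline model. The frame values and the function name summarise are made up for illustration.

```python
from statistics import mean, pvariance

# Hypothetical frame-level feature: (time in seconds, coefficient vector).
frames = [
    (0.00, [1.0, 4.0]),
    (0.01, [2.0, 6.0]),
    (0.02, [3.0, 8.0]),
]

def summarise(frames, op):
    """Apply a summary operator to each coefficient across all frames."""
    # Transpose so that each column holds one coefficient over time.
    columns = zip(*(values for _time, values in frames))
    return [op(column) for column in columns]

# Track-level results: the per-frame times are discarded entirely.
track_mean = summarise(frames, mean)
track_variance = summarise(frames, pvariance)
```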

Sonic Visualiser's exported features

Sonic Visualiser exports features generated by Vamp plugins or manual annotation using its own XML format (not really intended to be an interchange format) or as comma-separated value (CSV) files. Here are some examples of the features produced by a few of the current Vamp plugins and annotation types, and the formats they end up in when exported.

Note onsets

sv-time-instants-unlabelled.xml sv-time-instants-unlabelled.csv: SV time instants layer without labels (note onsets). The instant times are defined in the XML using audio sample frame number, with the sample rate specified in the model element. In the CSV, times are just given in seconds. This is the output SV produces natively from the QM onset detector plugin (vamp:qm-vamp-plugins:qm-onsetdetector:onsets).
  • These are relatively straightforward - Each time point is represented by something like :
:onset1 a af:Onset ;
     tl:onTimeLine :dtl ;
     tl:atInt 386000 .
  • The continuous time case (e.g. CSV format) is simple; an example can be found in sv-time-instants-unlabelled-csv.n3
  • The discrete time case (eg. XML format) is a little harder as I'm not too familiar with the timeline mapping stuff. But I think it would be something like sv-time-instants-unlabelled-xml.n3
  • In both cases I've used the existing af:Onset class from the audio features ontology. If SV didn't know the type of feature being calculated, these could simply use the superclass tl:Instant instead.
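
The two cases above differ only in how the position on the timeline is written. The sketch below renders one onset either way, in the style of the snippet above. The function name onset_n3 is hypothetical, the sample rate of 44100 used in the example is an assumption (the real one comes from the model element of the XML), and the timeline names :dtl and :rtl simply follow the snippets on this page.

```python
def onset_n3(index: int, frame: int, sample_rate: int, discrete: bool) -> str:
    """Render one onset as N3, on a discrete or a continuous timeline."""
    if discrete:
        # Discrete timeline: keep the audio sample frame number as it is.
        timeline, position = ":dtl", f"tl:atInt {frame}"
    else:
        # Continuous timeline: convert the frame number to seconds.
        timeline, position = ":rtl", f"tl:at {frame / sample_rate:.9f}"
    return (
        f":onset{index} a af:Onset ;\n"
        f"     tl:onTimeLine {timeline} ;\n"
        f"     {position} ."
    )

# Assumed sample rate, for illustration only.
discrete_onset = onset_n3(1, 386000, 44100, discrete=True)
continuous_onset = onset_n3(1, 22050, 44100, discrete=False)
```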

Beats

sv-time-instants-labelled.xml sv-time-instants-labelled.csv: SV time instants layer with labels (beat positions with the tempo noted in the label). This is the output SV produces natively from the QM beat tracker plugin (vamp:qm-vamp-plugins:qm-tempotracker:beats).
  • These are very similar to the two onset examples above. You just add rdfs:label triples to attach (any number of) labels to each instant. I've used the existing af:Beat class from the audio features ontology here, but again, the generic tl:Instant class could be used instead.
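
A sketch of the labelled case, assuming the CSV holds one "seconds,label" pair per row (the column layout is an assumption, and the sample rows are made up rather than taken from the attached files). Each row becomes an af:Beat instant with an rdfs:label triple, as described above.

```python
import csv
import io

def beats_to_n3(csv_text: str) -> str:
    """Render 'seconds,label' rows as labelled af:Beat instants in N3."""
    blocks = []
    for index, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        seconds, label = row[0].strip(), row[1].strip()
        # Escape backslashes and double quotes for an N3 string literal.
        literal = label.replace("\\", "\\\\").replace('"', '\\"')
        blocks.append(
            f":beat{index} a af:Beat ;\n"
            f"     tl:onTimeLine :rtl ;\n"
            f"     tl:at {seconds} ;\n"
            f'     rdfs:label "{literal}" .'
        )
    return "\n".join(blocks)

# Made-up rows for illustration.
sample = "0.5,120 bpm\n1.0,120 bpm\n"
beats = beats_to_n3(sample)
```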

Function of time

sv-time-values-unlabelled.xml sv-time-values-unlabelled.csv: SV time values layer without labels (note onset detection function). Note that in principle there can be any number of points at the same time. Again, the times are by audio sample frame. This is the output SV produces from the detection-function output of the QM onset detector plugin (vamp:qm-vamp-plugins:qm-onsetdetector:detection_fn).

Notes

sv-notes.xml sv-notes.csv: SV notes layer. This is a very limited representation, even more so than MIDI: it doesn't even have velocity for example. The pitch is frequency in Hz; the units attribute of the model element tells SV that (otherwise it would treat the pitch as a MIDI pitch value). I notice that the CSV gives the start time in seconds but the duration in sample frames. There's a sort-of good reason for that, but it probably can and should be fixed. This is the output SV produces from the Aubio note tracker plugin (vamp:vamp-aubio:aubionotes:notes). The plugin actually just returns a pair of values for each result, which SV assumes to represent frequency and duration; a major limitation of Vamp is that it has neither an explicit duration field for a result nor separate units for the values in a multi-valued result, so returning a note or interval always needs some such back-channel "understanding" between plugin and host.
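
Until the unit mismatch in the CSV is fixed, a consumer has to normalise it. The sketch below converts the duration from sample frames to seconds so that both time fields share a unit. The Note class, the function name and the sample rate of 44100 are assumptions for illustration; the CSV itself does not carry the sample rate.

```python
from dataclasses import dataclass

@dataclass
class Note:
    start_s: float       # start time in seconds, as given in the CSV
    frequency_hz: float  # pitch as a frequency in Hz
    duration_s: float    # duration converted to seconds

def note_from_export(start_s: float, frequency_hz: float,
                     duration_frames: int, sample_rate: int) -> Note:
    """Build a Note, converting the duration from sample frames to seconds."""
    return Note(start_s, frequency_hz, duration_frames / sample_rate)

# Assumed sample rate and made-up values.
note = note_from_export(1.5, 440.0, 22050, 44100)
```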

Grid

sv-chromagram.xml sv-chromagram.csv: SV colour 3d plot layer (chromagram). This is the output SV produces from the QM chromagram plugin (vamp:qm-vamp-plugins:qm-chromagram:chromagram). The XML contains labels for the output bins, which are currently lost in the CSV.

Text

sv-text.xml sv-text.csv: SV text layer, labels entered by hand. Each label has a time, which again is a sample frame in the XML and seconds in the CSV; Y coordinate in the range 0 (bottom of pane) to 1 (top of pane); and label content.
  • For these, you could either just label instants as in the "Beats" case above, or define your own subclass of tl:Instant to use instead. You would want to define a sv:height_in_pane or something for the Y coordinate.
  • Each label becomes something like :
:label5 a sv:TimeLabel ;
     tl:onTimeLine :rtl ;
     tl:at 8.032500000 ;
     sv:height_in_pane 0.524823 ;
     rdfs:label "bop!" .
  • Full example : sv-text-csv.n3
  • The discrete time case (from sv-text.xml) would be analogous to the onsets and beats examples above.

Images

sv-images.xml sv-images.csv: SV images layer, image added by hand. An image has a time, image source (filename or URL) and label. Images don't have a Y coordinate in SV.

Single values

sv-single-value.xml sv-single-value.csv: SV has no representation for results that consist of a single value, or even a set of values, that have no specific time and are instead associated with the entire input (e.g. an overall tempo estimate for a roughly fixed-tempo track, or textual or numeric metadata). Consequently these tend to turn up as time instants with time zero and the "value" in the label, or time values with time zero if the value is amenable to being stored in a floating point variable. This example contains the MusicDNS PUID for the track.
  • I think you're looking to associate a value with the Signal object associated with an audio file, in which case it's just something like this :
@prefix mo: <http://purl.org/ontology/mo/>.
:sig a mo:Signal ;
    mo:available_as <file:///Music/Artists/Paniq/Occidental.mp3> ;
    mo:puid "341f7bfe-a570-76c7-c064-c33dae7f93c6" ;
    mo:time :sig_int .
  • In the PUID case, we have this existing term in the Music Ontology. In other cases, I think we'd want to define a new term for the semantics of the value, or perhaps just have generic Vamp terminology such as :
@prefix mo: <http://purl.org/ontology/mo/>.
@prefix vamp: <http://example.org/vamp/> .

:sig a mo:Signal ;
    mo:available_as <file:///Music/Artists/Paniq/Occidental.mp3> ;
    vamp:analysis_output :output ;
    mo:time :sig_int .
:output a vamp:AnalysisOutput ;
         vamp:plugin "vamp:qm-vamp-plugins:qm-fingerprinter:puid" ;
         vamp:value "341f7bfe-a570-76c7-c064-c33dae7f93c6" .
  • I need to study the Vamp API some more to know how workable this approach is. Any comments Chris ?

Session file

paniq-manythings.sv: The SV session file these are taken from. The SV file format is just bzipped XML; the uncompressed version is paniq-manythings.xml.
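
Since the session format is bzipped XML, the Python standard library is enough to get at the XML. This is a minimal sketch: the function names are hypothetical, UTF-8 encoding is an assumption, and the round trip at the end uses a made-up stand-in document rather than a real session file.

```python
import bz2

def decompress_session(data: bytes, encoding: str = "utf-8") -> str:
    """Decompress the raw bytes of an .sv file into XML text."""
    return bz2.decompress(data).decode(encoding)

def read_session(path: str) -> str:
    """Read an .sv session file from disk and return its XML text."""
    with open(path, "rb") as handle:
        return decompress_session(handle.read())

# Round trip on an in-memory stand-in, so no real session file is needed.
stand_in = bz2.compress("<sv></sv>".encode("utf-8"))
xml_text = decompress_session(stand_in)
```

With the attachment downloaded, read_session("paniq-manythings.sv") should give the same text as paniq-manythings.xml, provided the encoding assumption holds.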

Vamp Plugins

I've started producing RDF descriptions of Vamp plugins, using the "template-generator" tool now in Sonic Visualiser's SVN. The main thing these add to plugins' internally stored metadata is to provide RDF types for the output events and features of the plugin. This will help derive higher level knowledge from Vamp output, and compare the results of different Vamp plugins.

This clearly necessitates some additions to the Audio Features ontology, so I'd like to encourage some discussion here. The idea is that each Vamp output can have an associated Feature subclass, Event subclass, or both, using the vamp:computes_feature_type and vamp:computes_event_type properties.

-- ChrisSutton - 19 Nov 2007

So far, here are some plugins & their outputs in need of typing :

qm-vamp-plugins

  • Plugin : qm-chromagram
    • Output : chromagram
      • Feature type :

  • Plugin : qm-keydetector
    • Output : tonic
      • Feature type :
    • Output : mode
      • Note : Defined in the (old) keys ontology as a restricted property but not a class -- CS
      • Feature type :
    • Output : key

  • Plugin : qm-tempotracker
    • Output : beats
      • Event type : new af:Beat ?
    • Output : detection_fn
      • Note : Should we have a hierarchy of detection functions where, eg. a tempo detection function and a tonal change detection function both subclass af:DetectionFunction ? -- CS
      • Feature type :
    • Output : tempo
      • Feature type : new af:Tempo ?

  • Plugin : qm-tonalchange
    • Output : tcstransform ("Transform to 6D Tonal Content Space")
      • Feature type :
    • Output : tcfunction ("Tonal Change Detection Function")
      • Feature type :
    • Output : changepositions ("Tonal Change Positions")
      • Event type :
        • Is this what the existing af:TonalOnset is for ? -- CS

vamp-aubio

  • Plugin : aubionotes
    • Output : notes
      • Event type :

  • Plugin : aubioonset
    • Output : onsets
      • Event type : af:Onset ?
    • Output : detectionfunction
      • Feature type :

  • Plugin : aubiopitch
    • Output : frequency
      • Feature type : new af:Pitch ?

  • Plugin : aubiotempo
    • Output : beats
      • Event type : new af:Beat ? (as for qm-tempotracker)

vamp-example-plugins

  • Plugin : amplitudefollower
    • Output : amplitude
      • Feature type : new af:Amplitude ?

  • Plugin : percussiononsets
    • Output : onsets
      • Event type : af:Onset ?
    • Output : detectionfunction
      • Feature type :
        • as with aubioonset, qm-tempotracker and qm-tonalchange ? -- CS

  • Plugin : spectralcentroid
    • Output : logcentroid
      • Feature type : new af:LogFrequencyCentroid, subclass of new af:SpectralCentroid ?
    • Output : linearcentroid
      • Feature type : new af:LinearFrequencyCentroid, subclass of new af:SpectralCentroid ?

  • Plugin : zerocrossing
    • Output : counts
      • Feature type : new af:ZeroCrossingCount ?
    • Output : zerocrossings
      • Event type : new af:ZeroCrossing ?

  • Plugin :
    • Output :
      • Feature type :
      • Event type :
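
For anyone wanting to experiment with the list above, the sketch below splits an output identifier of the form used on this page (vamp:library:plugin:output) into its parts and looks up a candidate type. The names VampOutput, parse_output_key and CANDIDATE_TYPES are hypothetical, and the types in the table are only the suggestions marked with question marks above, not agreed terms.

```python
from typing import NamedTuple, Optional

class VampOutput(NamedTuple):
    library: str
    plugin: str
    output: str

def parse_output_key(key: str) -> VampOutput:
    """Split 'vamp:library:plugin:output' into its three named parts."""
    parts = key.split(":")
    if len(parts) != 4 or parts[0] != "vamp":
        raise ValueError(f"not a Vamp output identifier: {key!r}")
    return VampOutput(*parts[1:])

# Candidate types copied from the list above; all are still under discussion.
CANDIDATE_TYPES = {
    VampOutput("qm-vamp-plugins", "qm-tempotracker", "beats"): "af:Beat",
    VampOutput("qm-vamp-plugins", "qm-tempotracker", "tempo"): "af:Tempo",
    VampOutput("vamp-aubio", "aubioonset", "onsets"): "af:Onset",
    VampOutput("vamp-aubio", "aubiopitch", "frequency"): "af:Pitch",
    VampOutput("vamp-example-plugins", "zerocrossing", "counts"): "af:ZeroCrossingCount",
}

def candidate_type(key: str) -> Optional[str]:
    """Return the candidate type for an output, or None if it is untyped."""
    return CANDIDATE_TYPES.get(parse_output_key(key))

onsets = parse_output_key("vamp:qm-vamp-plugins:qm-onsetdetector:onsets")
beat_type = candidate_type("vamp:qm-vamp-plugins:qm-tempotracker:beats")
```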

Topic attachments
| Attachment | Size | Date | Who | Comment |
| sv-time-instants-unlabelled.xml | 63.9 K | 19 Oct 2007 - 10:24 | ChrisCannam | SV time instants layer without labels (note onsets) |
| sv-time-instants-unlabelled.csv | 21.1 K | 19 Oct 2007 - 10:25 | ChrisCannam | SV time instants layer without labels (note onsets) as CSV |
| sv-time-instants-labelled.xml | 24.9 K | 19 Oct 2007 - 10:25 | ChrisCannam | SV time instants layer with labels (beats with tempo label) |
| sv-time-instants-labelled.csv | 11.7 K | 19 Oct 2007 - 10:26 | ChrisCannam | SV time instants layer with labels (beats with tempo label) as CSV |
| sv-time-values-unlabelled.xml | 1150.1 K | 19 Oct 2007 - 10:26 | ChrisCannam | SV time values layer without labels (note onset detection function) |
| sv-time-values-unlabelled.csv | 432.9 K | 19 Oct 2007 - 10:27 | ChrisCannam | SV time values layer without labels (note onset detection function) as CSV |
| sv-notes.xml | 102.8 K | 19 Oct 2007 - 10:27 | ChrisCannam | SV notes layer |
| sv-notes.csv | 37.3 K | 19 Oct 2007 - 10:27 | ChrisCannam | SV notes layer as CSV |
| sv-chromagram.xml | 173.5 K | 19 Oct 2007 - 10:28 | ChrisCannam | SV colour 3d plot layer (chromagram) |
| sv-chromagram.csv | 137.9 K | 19 Oct 2007 - 10:28 | ChrisCannam | SV colour 3d plot layer (chromagram) as CSV |
| sv-images.xml | 0.5 K | 19 Oct 2007 - 11:59 | ChrisCannam | SV images layer |
| sv-images.csv | 0.1 K | 19 Oct 2007 - 12:00 | ChrisCannam | SV images layer as CSV |
| paniq-manythings.sv | 161.3 K | 19 Oct 2007 - 12:13 | ChrisCannam | The SV session file these are taken from |
| paniq-manythings.xml | 1301.5 K | 19 Oct 2007 - 12:14 | ChrisCannam | Uncompressed XML of the SV session file |
| sv-text.xml | 0.9 K | 19 Oct 2007 - 12:14 | ChrisCannam | SV text layer |
| sv-text.csv | 0.3 K | 19 Oct 2007 - 12:15 | ChrisCannam | SV text layer as CSV |
| sv-single-value.xml | 0.6 K | 19 Oct 2007 - 12:15 | ChrisCannam | SV time instants layer with a single value in it |
| sv-single-value.csv | 0.1 K | 19 Oct 2007 - 12:16 | ChrisCannam | SV time instants layer with a single value in it as CSV |
| sv-time-instants-unlabelled-csv.n3 | 0.8 K | 23 Oct 2007 - 09:44 | ChrisSutton | RDF/N3 representation of note onsets (continuous time) |
| sv-time-instants-unlabelled-xml.n3 | 1.1 K | 23 Oct 2007 - 09:44 | ChrisSutton | RDF/N3 representation of note onsets (discrete time) |
| sv-time-instants-labelled-xml.n3 | 1.1 K | 23 Oct 2007 - 09:44 | ChrisSutton | RDF/N3 representation of labelled beats (discrete time) |
| sv-time-instants-labelled-csv.n3 | 0.9 K | 23 Oct 2007 - 09:45 | ChrisSutton | RDF/N3 representation of labelled beats (continuous time) |
| sv-text-csv.n3 | 1.6 K | 23 Oct 2007 - 10:37 | ChrisSutton | RDF/N3 representation of text labels (continuous time) |