* development of functionality
** exposure of all non-write functions over Web Services
Radius query type exposed over SOAP. [DONE]
** matrix of possible queries
Three major types of result ranking are supported:
1 Averaged-Nearest Neighbours (a-NN)
2a Radius-bounded nearest neighbours (r-NN)
2b Approximate radius-bounded near neighbours (ar-NN): the approximate counterpart of r-NN, using LSH indexing
Averaged-NN and Radius-Bounded-NN use two different algorithms to sort tracks; both report nearest-neighbour points within tracks.
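The difference between the two ranking schemes can be sketched as follows. This is a minimal illustration under assumptions, not audioDB's implementation: it assumes each track's per-point nearest-neighbour distances are already computed, that a-NN scores a track by the mean of its k smallest distances (lower is better), and that r-NN scores a track by how many points fall within radius r (higher is better). `TrackHits`, `averagedScore`, `radiusScore`, `rankAveraged` and `rankRadius` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-track record: for each query point, the distance to its
// nearest neighbouring point within this track.
struct TrackHits {
    std::string key;
    std::vector<double> distances;
};

// a-NN score (assumed): mean of the k smallest point distances; lower is better.
double averagedScore(std::vector<double> d, std::size_t k) {
    k = std::min(k, d.size());
    if (k == 0) return std::numeric_limits<double>::infinity();  // no evidence: rank last
    std::partial_sort(d.begin(), d.begin() + static_cast<std::ptrdiff_t>(k), d.end());
    double sum = 0.0;
    for (std::size_t i = 0; i < k; ++i) sum += d[i];
    return sum / static_cast<double>(k);
}

// r-NN score (assumed): number of points within radius r; higher is better.
std::size_t radiusScore(const std::vector<double>& d, double r) {
    return static_cast<std::size_t>(
        std::count_if(d.begin(), d.end(), [r](double x) { return x <= r; }));
}

// Rank track keys best-first by a-NN score (ascending mean distance).
std::vector<std::string> rankAveraged(const std::vector<TrackHits>& tracks, std::size_t k) {
    std::vector<std::pair<double, std::string>> scored;
    for (const auto& t : tracks) scored.emplace_back(averagedScore(t.distances, k), t.key);
    std::stable_sort(scored.begin(), scored.end(),
                     [](const auto& a, const auto& b) { return a.first < b.first; });
    std::vector<std::string> keys;
    for (const auto& s : scored) keys.push_back(s.second);
    return keys;
}

// Rank track keys best-first by r-NN score (descending hit count).
std::vector<std::string> rankRadius(const std::vector<TrackHits>& tracks, double r) {
    std::vector<std::pair<std::size_t, std::string>> scored;
    for (const auto& t : tracks) scored.emplace_back(radiusScore(t.distances, r), t.key);
    std::stable_sort(scored.begin(), scored.end(),
                     [](const auto& a, const auto& b) { return a.first > b.first; });
    std::vector<std::string> keys;
    for (const auto& s : scored) keys.push_back(s.second);
    return keys;
}
```

With distances {0.1, 0.9, 0.9} for track "a" and {0.3, 0.3, 0.3} for track "b", a-NN with k = 1 ranks "a" first, while a-NN with k = 3 and r-NN with r = 0.5 both rank "b" first: the same per-point distances can order tracks differently under the two schemes.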
The space of possible (sensible?) queries is larger than this, though
working out the right abstraction may have to wait for more use cases.
The various parameters are also not yet orthogonal (e.g. a silence
threshold should be applied to all queries or to none, if it makes
sense at all).
Additionally, query by key (filename) might be important. [DONE by Michael]
** results
Need to sort out what the results mean: is the reported value a
similarity or a distance score, etc.? Also, is it possible to support
NN queries in a non-Euclidean space, e.g. by embedding the Earth
Mover's Distance in L1?
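One concrete instance of the embedding idea is the one-dimensional case: for two histograms of equal total mass on equally spaced bins (ground distance |i - j| times the bin width), the Earth Mover's Distance equals the L1 distance between their cumulative sums. Mapping each histogram to its scaled running sum therefore turns EMD queries into ordinary L1 nearest-neighbour queries. This is a standard special case, sketched here with hypothetical function names; it is not an existing audioDB feature, and embeddings in higher dimensions are only approximate.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Map a 1-D histogram to its cumulative sum scaled by the bin width.
// Assumes all histograms being compared have the same total mass.
std::vector<double> embedEmd1d(const std::vector<double>& hist, double binWidth) {
    std::vector<double> out(hist.size());
    double running = 0.0;
    for (std::size_t i = 0; i < hist.size(); ++i) {
        running += hist[i];
        out[i] = running * binWidth;
    }
    return out;
}

// Plain L1 distance between two equal-length vectors.
double l1(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += std::fabs(a[i] - b[i]);
    return s;
}
```

For example, moving one unit of mass from bin 0 to bin 2 with unit bin width costs 2, and the L1 distance between the embedded vectors {1, 1, 1} and {0, 0, 1} is also 2.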
** SOAP / URIs
Define a query data structure that can be serialised (preferably automatically) by SOAP for use in queries.
QueryByKey solves most of this, but features, powers and restrict lists (keyLists) are not currently serialised.
Add support for serialising features over Web Services.
If we ever support inserting or other write functionality over SOAP,
this will need doing for feature files (the same as queries) and for
key lists too.
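A minimal sketch of what a self-contained query structure might carry, based on the parameters mentioned above (features, powers, key lists) plus the query types listed earlier. The `Query` fields, the `QueryType` enum and the line-oriented text encoding are all hypothetical; a SOAP toolkit would normally generate its own serialisers from a type declaration rather than use a hand-written one like this.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical query kinds, mirroring a-NN, r-NN and ar-NN.
enum class QueryType { AveragedNN, RadiusNN, ApproxRadiusNN };

// Hypothetical self-contained query description.
struct Query {
    QueryType type = QueryType::AveragedNN;
    std::vector<double> features;      // query feature vectors, flattened
    std::vector<double> powers;        // per-frame power values
    std::vector<std::string> keyList;  // restrict the search to these track keys
    double radius = 0.0;               // only meaningful for radius queries
    unsigned sequenceLength = 1;
};

// Serialise to one "name value..." line per field. Keys must not contain
// whitespace; a real encoding would need escaping.
std::string serialise(const Query& q) {
    std::ostringstream out;
    out << std::setprecision(17);  // enough digits to round-trip a double
    out << "type " << static_cast<int>(q.type) << '\n';
    out << "radius " << q.radius << '\n';
    out << "seqlen " << q.sequenceLength << '\n';
    out << "features";
    for (double f : q.features) out << ' ' << f;
    out << "\npowers";
    for (double p : q.powers) out << ' ' << p;
    out << "\nkeys";
    for (const auto& k : q.keyList) out << ' ' << k;
    out << '\n';
    return out.str();
}
```

Only the writing side is shown; the same one-field-per-line layout would make the matching parser straightforward, and the same structure could carry feature files and key lists if write operations are ever exposed.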
** Memory management tricks
For non-LSH search, investigate whether madvise() tricks improve
performance on any OSes. Also, maybe investigate a specialized use of
MapViewOfFile() on win32 to make it tolerable on that platform.
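The madvise() experiment could start from something like this: map a feature file read-only, optionally hint that access will be sequential, and scan it, so that the scan can be timed with and without the hint. `sumFeatureFile` and the flat file-of-doubles layout are assumptions for illustration; whether `MADV_SEQUENTIAL` helps at all is exactly what needs measuring. On win32 the usual counterpart of mmap() is CreateFileMapping() plus MapViewOfFile().

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Map a file of doubles read-only, optionally advise sequential access,
// and sum its contents. Returns false on any system-call failure.
bool sumFeatureFile(const char* path, bool adviseSequential, double* out) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return false;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) { close(fd); return false; }
    std::size_t len = static_cast<std::size_t>(st.st_size);
    void* base = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping remains valid after the descriptor is closed
    if (base == MAP_FAILED) return false;
    if (adviseSequential) {
        (void)madvise(base, len, MADV_SEQUENTIAL);  // only a hint; failure is harmless
    }
    const double* v = static_cast<const double*>(base);
    double sum = 0.0;
    for (std::size_t i = 0; i < len / sizeof(double); ++i) sum += v[i];
    munmap(base, len);
    *out = sum;
    return true;
}
```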
** LSH
DONE
** RDF (not necessarily related to audioDB)
Export the results of our experiments (kept in an SQL database) as
RDF, so that people can infer stuff if they know enough about our
methods.
Possibly also write an export routine for exporting an audioDB as RDF.
And laugh hollowly as XML parsers fail completely to ingest such a
monstrous file.
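If the export is written as N-Triples rather than RDF/XML, each statement is one self-contained line, so consumers can stream it instead of parsing one enormous XML document. A minimal emitter is sketched below; `escapeLiteral`, `triple` and any subject or predicate URIs used with them (e.g. under example.org) are hypothetical, not an existing vocabulary for our experiments.

```cpp
#include <cassert>
#include <string>

// Escape a string for use inside an N-Triples string literal.
std::string escapeLiteral(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '\\': out += "\\\\"; break;
            case '"':  out += "\\\""; break;
            case '\n': out += "\\n"; break;
            case '\r': out += "\\r"; break;
            default:   out += c;
        }
    }
    return out;
}

// Emit one N-Triples statement whose object is a plain string literal.
std::string triple(const std::string& subject, const std::string& predicate,
                   const std::string& literal) {
    return "<" + subject + "> <" + predicate + "> \"" + escapeLiteral(literal) + "\" .\n";
}
```

Each row of the experiment database would become a handful of such lines, one per column of interest.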
* architectural issues
** more safety
A couple of areas are not yet safe against runtime faults.
The LARGE_ADB format supports millions of tracks, but in the
non-LARGE_ADB format large databases might well end up writing off the
end of the various tables (e.g. track, l2norm).
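The guard that prevents writing off the end of a fixed-size table is a capacity check before every write, phrased so the arithmetic itself cannot overflow. `FixedTable`, `appendChecked` and `fits` are hypothetical stand-ins for tables such as track or l2norm whose size is fixed when the database is created.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-capacity table, e.g. a non-LARGE_ADB track table.
// Invariant: used <= slots.size().
struct FixedTable {
    std::vector<std::uint32_t> slots;  // capacity fixed at creation time
    std::size_t used = 0;
};

// True if n more entries fit. Written as n <= capacity - used (which cannot
// underflow given the invariant) rather than used + n <= capacity (which can overflow).
bool fits(const FixedTable& t, std::size_t n) {
    return n <= t.slots.size() - t.used;
}

// Refuse the write, instead of running off the end, when the table is full.
bool appendChecked(FixedTable& t, std::uint32_t value) {
    if (!fits(t, 1)) return false;  // caller should report "database full"
    t.slots[t.used++] = value;
    return true;
}
```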
* transactionality is important; the last thing that should be
updated on insert are the free pointers (dbH->length,
dbH->numFiles, maybe others), so that if something goes wrong in
the meantime the database is not in an inconsistent state.
[Michael thinks that this is DONE; it needs testing in all cases.]
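The write ordering can be illustrated with an in-memory model: the payload goes into free space past the current end, and only afterwards are `length` and `numFiles` advanced, so a failure in between leaves the header describing the old, consistent database. `Header`, `Db` and `insertTrack` are hypothetical simplifications of the real dbH. On an actual file the payload would also have to be flushed (e.g. with fsync() or msync()) before the header is written, and the header flushed afterwards, for this ordering to survive a crash.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical header holding the "free pointers".
struct Header {
    std::size_t length = 0;    // bytes of data region in use
    std::size_t numFiles = 0;  // number of tracks inserted
};

struct Db {
    Header header;
    std::vector<unsigned char> data;  // preallocated data region
};

// Insert one track's bytes. failAfterPayload simulates an error between the
// payload write and the header update.
bool insertTrack(Db& db, const std::vector<unsigned char>& bytes, bool failAfterPayload = false) {
    if (bytes.size() > db.data.size() - db.header.length) return false;  // would overflow
    // 1. Write the payload into free space; nothing refers to it yet.
    std::copy(bytes.begin(), bytes.end(),
              db.data.begin() + static_cast<std::ptrdiff_t>(db.header.length));
    if (failAfterPayload) throw std::runtime_error("simulated failure");
    // 2. Publish it by advancing the free pointers last.
    db.header.length += bytes.size();
    db.header.numFiles += 1;
    return true;
}
```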
** API vs command-line
API version 1 is coming soon, but most functionality is still
accessed by faking command-line calls. Having the "business logic"
run by the constructor is also a little weird.
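One way to separate the two concerns is for the constructor only to establish a valid object, with each operation an ordinary method taking typed arguments instead of a command-line string. The `Database` class and its methods are hypothetical and in-memory; they are not the forthcoming API version 1.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

class Database {
public:
    // The constructor only acquires and validates state; no query or insert runs here.
    explicit Database(std::string path) : path_(std::move(path)) {
        if (path_.empty()) throw std::invalid_argument("empty database path");
    }
    // Operations are explicit calls with typed arguments and results.
    void insert(const std::string& key) { keys_.push_back(key); }
    std::size_t numFiles() const { return keys_.size(); }

private:
    std::string path_;
    std::vector<std::string> keys_;
};
```

A command-line front end then becomes a thin layer that parses argv and calls these methods, and the SOAP server can call the same methods directly.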
* regression (and other) tests
** Command line interface
There is now broad coverage of the audioDB logic, with the major
exceptions of the batch insert command and of specifying different
keys on import.
** SOAP
The shell's support for wait() and equivalents is limited, so there
are "sleep 1"s dotted around to attempt to avoid race conditions.
Find a better way. Similarly, setting SO_REUSEADDR before bind() is a
hack that ought not to be necessary just to run the same test twice...
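Instead of a fixed "sleep 1", a test helper could poll until the server actually accepts TCP connections, giving up after a timeout. This sketch assumes the server listens on loopback; `waitForPort` and the 10 ms poll interval are hypothetical choices. It addresses only the start-up race, not the SO_REUSEADDR issue.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <chrono>
#include <netinet/in.h>
#include <sys/socket.h>
#include <thread>
#include <unistd.h>

// Poll 127.0.0.1:port until a TCP connect succeeds or timeoutMs elapses.
bool waitForPort(unsigned short port, int timeoutMs) {
    using clock = std::chrono::steady_clock;
    const auto deadline = clock::now() + std::chrono::milliseconds(timeoutMs);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    while (true) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) return false;
        int rc = connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof addr);
        close(fd);
        if (rc == 0) return true;                  // server is accepting connections
        if (clock::now() >= deadline) return false;
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
}
```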
** Locking
The fcntl() locking should be good enough for our uses. Investigate
whether it is in fact robust enough (including that EAGAIN workaround
for OS X; read the kernel source to find out where that's coming from
and report it if possible).
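For reference, a whole-file fcntl() lock with a bounded retry looks roughly like this. Retrying on EINTR is standard practice; retrying on EAGAIN is one plausible shape for the OS X workaround mentioned above, not necessarily the code audioDB uses. `lockWhole`, `unlockWhole` and the retry bound are hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Take a blocking whole-file lock (F_RDLCK or F_WRLCK) with F_SETLKW.
// Retries a bounded number of times on EINTR and, as a workaround, on EAGAIN.
bool lockWhole(int fd, short lockType, int maxRetries = 100) {
    struct flock fl{};
    fl.l_type = lockType;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;  // 0 means "to end of file", however large it grows
    for (int i = 0; i < maxRetries; ++i) {
        if (fcntl(fd, F_SETLKW, &fl) == 0) return true;
        if (errno != EINTR && errno != EAGAIN) return false;  // a real error
    }
    return false;
}

// Release the whole-file lock.
bool unlockWhole(int fd) {
    struct flock fl{};
    fl.l_type = F_UNLCK;
    fl.l_whence = SEEK_SET;
    return fcntl(fd, F_SETLK, &fl) == 0;
}
```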
** Benchmarks
Get together a realistic set of usage cases, preferably testing each
of the query types, and benchmark them automatically. This is
basically a prerequisite of any performance work.
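A minimal timing harness could run each usage case several times and report the median wall-clock time, which is less sensitive than a single run to a cold page cache or scheduler noise. `benchmarkMedianMs` and the choice of the median are assumptions; the cases themselves would be the a-NN, r-NN and ar-NN queries run against a fixed test database.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Run fn `runs` times and return the median elapsed time in milliseconds.
double benchmarkMedianMs(const std::function<void()>& fn, std::size_t runs) {
    std::vector<double> times;
    times.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i) {
        auto start = std::chrono::steady_clock::now();
        fn();
        auto stop = std::chrono::steady_clock::now();
        times.push_back(std::chrono::duration<double, std::milli>(stop - start).count());
    }
    if (times.empty()) return 0.0;
    std::sort(times.begin(), times.end());
    std::size_t mid = times.size() / 2;
    return times.size() % 2 ? times[mid] : (times[mid - 1] + times[mid]) / 2.0;
}
```

The first run of each case could also be recorded separately, since it shows the cold-cache cost that the median hides.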
--
MichaelCasey - 17 Sep 2008