r2 - 04 Sep 2008 - 16:27:48 - MichaelCasey

Why LARGE_ADB?

Normally, AudioDB makes a database by copying feature files into a large table so that they can be accessed conveniently and efficiently at QUERY time.

However, this copying is redundant when the database will stay in the location where it was built and the feature files will remain on the filesystem. Instead of copying the features into the database, the LARGE_ADB format stores only links to the feature files (and power files). These links are followed only when they are needed, i.e. when the following audioDB commands are issued:

  • --INSERT
  • --INDEX
  • --QUERY
  • --SAMPLE

The largeADB FLAG in the audioDB --STATUS (-S) output indicates whether an AudioDB instance is in LARGE_ADB format:

  • audioDB -d mydb.adb -S
num files:43360 data dim:20 total vectors:1082798 vectors available:7797330 total bytes:173247680 (12.1935%) bytes available:1247572800 (87.8065%) flags: l2norm[on] minmax[off] power[on] times[off] largeADB[on] null count: 0 small sequence count 8
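For scripted checks, the flag can be read from the status text. The following is a minimal Python sketch, assuming the output has already been captured (for instance with Python's subprocess module running audioDB -d mydb.adb -S) and that every flag keeps the name[on] / name[off] form shown above; parse_status_flags and is_large_adb are hypothetical helpers, not part of audioDB.

```python
import re

# The status line shown above, split across string literals only for readability.
STATUS_LINE = (
    "num files:43360 data dim:20 total vectors:1082798 vectors available:7797330 "
    "total bytes:173247680 (12.1935%) bytes available:1247572800 (87.8065%) "
    "flags: l2norm[on] minmax[off] power[on] times[off] largeADB[on] "
    "null count: 0 small sequence count 8"
)


def parse_status_flags(status_text: str) -> dict:
    """Map every name[on] / name[off] token in the status text to True / False."""
    return {
        name: state == "on"
        for name, state in re.findall(r"(\w+)\[(on|off)\]", status_text)
    }


def is_large_adb(status_text: str) -> bool:
    """True only if the status text reports largeADB[on]."""
    return parse_status_flags(status_text).get("largeADB", False)


print(parse_status_flags(STATUS_LINE))
print(is_large_adb(STATUS_LINE))
```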

How to make a LARGE_ADB

First, make a new audioDB database instance, specifying more than 20000 tracks with --ntracks:

  • audioDB -N -d mydb.adb --ntracks 50000

Set the L2norm (-L) and Power (-P) flags as usual:

  • audioDB -L -d mydb.adb
  • audioDB -P -d mydb.adb
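The three setup steps can also be scripted. This Python sketch only builds the command lines shown above and, by default, prints them; executing them (dry_run=False) assumes an audioDB binary on your PATH. The helper names and the threshold check are illustrative, taken from the --ntracks requirement above rather than from audioDB's own code.

```python
import shlex
import subprocess

LARGE_ADB_MIN_TRACKS = 20000  # a LARGE_ADB needs --ntracks greater than this


def setup_commands(db: str, ntracks: int = 50000) -> list:
    """Return the audioDB command lines that create db and set its flags."""
    if ntracks <= LARGE_ADB_MIN_TRACKS:
        raise ValueError(f"--ntracks must be greater than {LARGE_ADB_MIN_TRACKS}")
    return [
        ["audioDB", "-N", "-d", db, "--ntracks", str(ntracks)],  # new instance
        ["audioDB", "-L", "-d", db],  # turn on the L2norm flag
        ["audioDB", "-P", "-d", db],  # turn on the Power flag
    ]


def run_commands(commands, dry_run: bool = True) -> None:
    """Print each command (dry run) or execute it, stopping at the first failure."""
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)


run_commands(setup_commands("mydb.adb"))
```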

Make three text files, featureList.txt, powerList.txt and keyList.txt, listing the feature files, power files and keys respectively, one entry per line. Line n of each file must refer to the same track.
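If the feature files already exist, the three lists can be generated so that their order is guaranteed to match. Below is a Python sketch under hypothetical conventions: each power file sits next to its feature file with the same stem and a .power suffix, and the file stem serves as the key. Adjust both to your own naming scheme.

```python
from pathlib import Path


def write_batch_lists(feature_files, out_dir=".", power_suffix=".power"):
    """Write featureList.txt, powerList.txt and keyList.txt in the same order.

    Hypothetical conventions: each power file sits beside its feature file
    with the same stem plus power_suffix, and each key is the file stem.
    """
    features = sorted(str(f) for f in feature_files)
    powers = [str(Path(f).with_suffix(power_suffix)) for f in features]
    keys = [Path(f).stem for f in features]
    if len(set(keys)) != len(keys):
        raise ValueError("file stems are not unique, so they cannot serve as keys")
    out = Path(out_dir)
    for name, entries in (("featureList.txt", features),
                          ("powerList.txt", powers),
                          ("keyList.txt", keys)):
        # One entry per line; line n of every file describes the same track.
        (out / name).write_text("".join(entry + "\n" for entry in entries))
    return features, powers, keys
```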

ABSOLUTE PATHS

Because a LARGE_ADB does not copy the features into the database, audioDB must be able to resolve the path to each feature file at QUERY time. One way to ensure this is to use ABSOLUTE paths in the featureList.txt and powerList.txt files. Spread the feature files over subdirectories of about 100 files each, so that the filesystem is not weighed down by directories with large numbers of entries. E.g.:

featureList.txt
/path/to/features/A/file001.feat
/path/to/features/A/file002.feat
...
/path/to/features/A/file100.feat
/path/to/features/B/file101.feat
/path/to/features/B/file102.feat
...
/path/to/features/B/file199.feat
/path/to/features/C/file200.feat
...

powerList.txt
/path/to/features/A/file001.power
/path/to/features/A/file002.power
...
/path/to/features/A/file100.power
/path/to/features/B/file101.power
/path/to/features/B/file102.power
...
/path/to/features/B/file199.power
/path/to/features/C/file200.power
...
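Listings like these can be generated rather than typed. The sketch below is a Python illustration assuming exactly 100 files per subdirectory and spreadsheet-style directory names (A, B, ..., Z, AA, ...); both are choices made for illustration, and the hand-written example above only approximates them. Sorting features and powers the same way keeps line n of both lists on the same track, as long as the names differ only in their extension.

```python
import string


def bucket_name(index: int) -> str:
    """Spreadsheet-style directory names: 0 -> 'A', 25 -> 'Z', 26 -> 'AA'."""
    name = ""
    index += 1
    while index > 0:
        index, rem = divmod(index - 1, 26)
        name = string.ascii_uppercase[rem] + name
    return name


def bucketed_paths(filenames, root="/path/to/features", per_dir=100):
    """Assign sorted filenames to subdirectories holding per_dir files each."""
    return [f"{root}/{bucket_name(i // per_dir)}/{name}"
            for i, name in enumerate(sorted(filenames))]


features = bucketed_paths(f"file{n:03d}.feat" for n in range(1, 251))
powers = bucketed_paths(f"file{n:03d}.power" for n in range(1, 251))
```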

Issue the audioDB Batch Insert command (-B):

  • audioDB -d mydb.adb -B -F featureList.txt -W powerList.txt -K keyList.txt
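A mismatched or incomplete list is easy to miss in a batch of tens of thousands of tracks, so it can be worth checking the lists before running -B. This hypothetical Python pre-flight check compares line counts, looks for duplicate keys (assuming keys must be unique within a database) and confirms that every listed file exists. For lists of relative paths, pass the directory they are relative to as base_dir.

```python
from pathlib import Path


def check_batch_lists(feature_list="featureList.txt", power_list="powerList.txt",
                      key_list="keyList.txt", base_dir="."):
    """Return a list of problems to fix before running audioDB -B (empty if none)."""
    features, powers, keys = (Path(p).read_text().splitlines()
                              for p in (feature_list, power_list, key_list))
    problems = []
    if not len(features) == len(powers) == len(keys):
        problems.append(f"line counts differ: {len(features)} features, "
                        f"{len(powers)} powers, {len(keys)} keys")
    if len(set(keys)) != len(keys):
        problems.append("duplicate keys")  # assumption: keys must be unique
    base = Path(base_dir)
    for entry in features + powers:
        # Absolute entries are checked as-is; relative ones against base_dir.
        # A blank line resolves to base_dir itself and is reported as missing.
        if not (base / entry).is_file():
            problems.append(f"missing file: {entry!r}")
    return problems
```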

HARD LINKING WITH LN

One caveat with ABSOLUTE pathnames is that features cannot be moved to a new location without re-building the database. This is because the database contains the ABSOLUTE path names of each of the feature and power files.

To get around this caveat we recommend hard links, made with the UNIX "ln" command. Hard links are supported by UNIX filesystems such as EXT and by NTFS, but not by FAT32. The commonly available MyBook external drive, for example, comes pre-formatted as FAT32, so it does not support hard links.

If hard linking is available, use ln to mirror the directory and file structure of the features in a convenient location: create the directories with mkdir and hard-link each file into them. The DATA is not copied; only a new reference to it is created. A hard link refers to the file's data (its inode) rather than to its path, so if the original feature files are moved or renamed within the same filesystem, the links still reach the same data, and AudioDB in LARGE_ADB format can still locate the features provided the database was built from the link paths. Hard links cannot point to directories, and cannot cross filesystems. Because the linked directory structure can differ from the original one, hard linking is a convenient way to build the directory hierarchies of about 100 files each described above.
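The mirroring step can be scripted with Python's os.link, which makes the same kind of hard link as ln without -s. In this sketch the numbered subdirectory names (000, 001, ...) and the 100-files-per-directory default are assumptions; two sources with the same file name landing in one subdirectory will raise FileExistsError.

```python
import os
from pathlib import Path


def mirror_with_hard_links(sources, link_root, per_dir=100):
    """Hard-link each source file into link_root/NNN/<filename>, per_dir per subdirectory.

    Hard links cannot point at directories, so the subdirectories are created
    with mkdir and only the regular files are linked.
    """
    links = []
    for i, src in enumerate(sorted(Path(s) for s in sources)):
        subdir = Path(link_root) / f"{i // per_dir:03d}"
        subdir.mkdir(parents=True, exist_ok=True)
        dest = subdir / src.name
        # Equivalent to `ln src dest`; raises OSError if link_root is on another
        # filesystem or the filesystem (e.g. FAT32) does not support hard links.
        os.link(src, dest)
        links.append(dest)
    return links
```

List the returned link paths, not the original paths, in featureList.txt and powerList.txt.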

RELATIVE PATHS

An alternative to ABSOLUTE paths or HARD LINKS is to use RELATIVE paths for the features and powers, and to resolve them at QUERY time with the --adb_root and --adb_feature_root arguments. E.g., for the same features as above, write featureList.txt and powerList.txt as:

featureList.txt
A/file001.feat
A/file002.feat
...
A/file100.feat
B/file101.feat
B/file102.feat
...
B/file199.feat
C/file200.feat
...

powerList.txt
A/file001.power
A/file002.power
...
A/file100.power
B/file101.power
B/file102.power
...
B/file199.power
C/file200.power
...
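If absolute lists already exist, they can be converted rather than rewritten by hand. A minimal Python sketch using os.path.relpath; to_relative is a hypothetical helper, and it rejects entries that lie outside the feature root instead of producing ../ paths.

```python
import os


def to_relative(entries, feature_root):
    """Rewrite absolute list entries relative to feature_root (e.g. /path/to/features)."""
    relative = []
    for entry in entries:
        rel = os.path.relpath(entry, feature_root)
        # Reject entries that would need '..': they are not under feature_root.
        if rel == os.pardir or rel.startswith(os.pardir + os.sep):
            raise ValueError(f"{entry} is not under {feature_root}")
        relative.append(rel)
    return relative
```

For example, read the lines of an absolute featureList.txt, pass them with feature_root="/path/to/features", and write the result back one entry per line.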

To populate the database, cd into /path/to/features (so that audioDB can locate the features and insert them under their relative path names) and issue the Batch Insert command:

  • audioDB -d mydb.adb -B -F featureList.txt -W powerList.txt -K keyList.txt

At QUERY time, audioDB needs to know where to find the features, since the stored paths are relative to the /path/to/features directory. Pass that directory with --adb_feature_root:

  • audioDB -d mydb.adb -Q seq -k key1 --adb_feature_root /path/to/features

Now the features are located relative to the /path/to/features directory.

For convenience, audioDB also provides an --adb_root argument, which takes the directory containing the database (here /path/to/database) as its value. It is equivalent to giving the absolute database path, so these two commands do the same thing:

  • audioDB -d /path/to/database/mydb.adb -S
  • audioDB -d mydb.adb -S --adb_root /path/to/database

RELATIVE PATHS AND WEB SERVICES

When running audioDB as a SERVER, specify --adb_root so that clients do not need to know /path/to/database. The SERVER simply rewrites the client's mydb.adb argument as /path/to/database/mydb.adb.

  • audioDB -s 14476 --adb_root /path/to/database

Here, all databases queried via this SERVICE must be rooted at the same directory, i.e. if a database is not located directly in the adb_root directory, the client must pass its path RELATIVE to adb_root.

When using an audioDB SERVER with the LARGE_ADB format, you must also provide the --adb_feature_root argument, UNLESS the audioDB SERVER is started in the directory where the features are located (/path/to/features):

  • audioDB -s 14476 --adb_root /path/to/database --adb_feature_root /path/to/features
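The path handling described in this section amounts to prefixing relative names with a root directory. The Python functions below are a model of that behaviour as described here; they are not taken from audioDB's source, they assume plain path joining, and they do not cover how audioDB treats absolute arguments. The database name sub/other.adb in the checks is hypothetical.

```python
import os


def server_db_path(client_db: str, adb_root: str) -> str:
    """Model of the SERVER rewrite: the client's database name is prefixed with adb_root."""
    return os.path.join(adb_root, client_db)


def feature_path(list_entry: str, adb_feature_root=None) -> str:
    """Where a relative LARGE_ADB list entry is looked up under this model.

    Without --adb_feature_root the entry is relative to the server's working
    directory, which is why the server must otherwise be started there.
    """
    root = adb_feature_root if adb_feature_root is not None else os.getcwd()
    return os.path.join(root, list_entry)
```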

-- MichaelCasey - 04 Sep 2008
