r7 - 10 Sep 2008 - 20:35:05 - MichaelCaseyYou are here: OMRAS2 >  Main Web  >  TWikiUsers > MichaelCasey > AudioDB > AudioDBDeveloperDiscussionPage > HashTableInCore

LSH Hashtable core format

The doubly linked lists that represent the LSH hash tables have high overhead due to being heap allocated.

Instead we are making a compressed core format that encodes the t2 hashes and point indexes in efficient form requiring no pointers.

The on-disk format is currently:

{ [ T1 t1 [T2 t2 p+]+ ]* E}^L

The in-core format will be an array of 32-bit cells:

   * -> pointer
   * t1 32-bit hash
   * t2 32-bit hash
   * p. 30-bit pointID with 2 skip bits
   * p  30-bit pointID without skip bits
   * E   end of hash-table row token


Retrieval from Hashtable Array

t1->[ t2 p. [p. | NULL.] [p+(1..15)] ]+(1..*) E

  • t2 values are in sort order <
  • pointID is divided into 18bit track and 12bit vector index
  • two-bit skip counter over two points (dummy point if necessary)
  • CONSIDER 64-bit pointID

SKIP TABLE

MS 2 skip bitspoint1 data 30 bits
LS 2 skip bitspoint2 data 30 bits

Used for retrieval from query_t2:

// *p is a pointer to the beginning of a hashtable row array                                                                                                    
 // The array consists of t2 hash keys and one or more point identifiers p for each hash key                                                                     
 // Retrieval is performed by generating a hash key query_t2 for query point q                                                                                   
 // We identify the row that t2 is stored in using a secondary hash t1, this row is the entry                                                                    
 // point for retrieve_from_core_hashtable_array                                                                                                                 
#define SKIP_BITS (0xC0000000)
void G::retrieve_from_core_hashtable_array(Uns32T* p, Uns32T qpos){
  Uns32T skip;
  Uns32T t2;
  Uns32T p1;
  Uns32T p2;

  CR_ASSERT(p);

  do{
    t2 = *p++;
    if( t2 > H::t2 )
      return;
    p1 = *p++;
    p2 = *p++;
    skip = (( p1 & SKIP_BITS ) >> SKIP_BITS_RIGHT_SHIFT_LSB) + (( p2 & SKIP_BITS ) >> SKIP_BITS_RIGHT_SHIFT_MSB);
    if( t2 == H::t2 ){
      add_point_callback(calling_instance, p1 ^ (p1 & SKIP_BITS), qpos, radius);
      if(skip--){
        add_point_callback(calling_instance, p2 ^ (p2 & SKIP_BITS), qpos, radius);
        while(skip-- )
          add_point_callback(calling_instance, *p++, qpos, radius);
      }
    }
    else
      if(*p != LSH_CORE_ARRAY_END_ROW_TOKEN)
        p = p + skip;
  }while( *p != LSH_CORE_ARRAY_END_ROW_TOKEN );
}

Insertion code sketch

// Read Disk Format2 LSH into Core Format2 LSH
Uns32T G::unserialize_hashtable_row_coreFormat2(int fid, bucket** b){
   ...
      bucket_insert_point_format2(b, H::t2, H::p);
   ...
}

// pp address of Hashtable row pointer ( hashTable[ j ] + t1 )
// t2 32-bit hash value
// p  30-bit point value
void H::bucket_insert_point_format2(bucket **pp, Uns32T t2, Uns32T p, Uns32T N){
  if( *pp == NULL ){
    *pp = new Uns32T[ N ];
     (*pp)[0] = t2;
     (*pp)[1] = p; // No skip bits required
     (*pp)[2] = ET;
}else
  __bucket_insert_point_format2(*pp, t2, p);                                                                                      
}

 void __bucket_insert_point_format2(bucket *pp, Uns32T t2, Uns32T p ){
   bucket* p = seek_t2_or_endpos(t2);
   if( *p == ET )
          // Make a new chain at this cell (p)
   else     
          // Merge existing chain at this cell (p)
}

-- MichaelCasey - 15 Aug 2008

Edit | Attach | Printable | Raw View | Backlinks: Web, All Webs | History: r7 < r6 < r5 < r4 < r3 | More topic actions
 
EPSRC OMRAS2
This site is powered by the TWiki collaboration platformCopyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding OMRAS2? Send feedback