primeqa.ir.dense.colbert_top.colbert.indexing.codecs.residual.ResidualCodec

class primeqa.ir.dense.colbert_top.colbert.indexing.codecs.residual.ResidualCodec(config, centroids, avg_residual=None, bucket_cutoffs=None, bucket_weights=None)

Bases: object

Methods

binarize

compress

compress_into_codes

EVENTUALLY: Fusing the kernels or otherwise avoiding materializing the entire matrix before max(dim=0)

decompress

We batch below even if the target device is CUDA to avoid large temporary buffers causing OOM.

load

lookup_centroids

Handles multi-dimensional codes too.

save

try_load_torch_extensions

compress_into_codes(embs, out_device)

EVENTUALLY: Fusing the kernels or otherwise avoiding materializing the entire matrix before max(dim=0) seems like it would help here a lot.
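
As a rough illustration of the note above, the following sketch assigns each embedding to a centroid in small batches, so that only a centroids-by-batch score matrix exists at any one time. It assumes codes are chosen by highest dot-product score and uses plain Python lists in place of tensors; the names `assign_codes` and `batch_size` are hypothetical and not part of the PrimeQA API.

```python
def assign_codes(embs, centroids, batch_size=2):
    """Return, for each embedding, the index of its best-scoring centroid.

    Assumption: "best" means highest dot product. This is a sketch, not the
    library's implementation.
    """
    codes = []
    for start in range(0, len(embs), batch_size):
        batch = embs[start:start + batch_size]
        # scores[c][e] is the dot product of centroid c with batch embedding e.
        # Only this (num_centroids x batch_size) matrix is held in memory.
        scores = [
            [sum(x * y for x, y in zip(centroid, emb)) for emb in batch]
            for centroid in centroids
        ]
        # Plain-Python counterpart of max(dim=0).indices: argmax per column.
        for col in range(len(batch)):
            best = max(range(len(centroids)), key=lambda c: scores[c][col])
            codes.append(best)
    return codes


centroids = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
embs = [[0.9, 0.1], [0.2, 0.8], [-0.7, 0.3], [0.6, 0.5]]
codes = assign_codes(embs, centroids)
print(codes)
```

A fused kernel would go one step further and track the running maximum per embedding without storing even the per-batch score matrix.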

decompress(compressed_embs: primeqa.ir.dense.colbert_top.colbert.indexing.codecs.residual_embeddings.ResidualEmbeddings)

We batch below even if the target device is CUDA to avoid large temporary buffers causing OOM.
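
The batching pattern described above might look roughly like the sketch below. It assumes a decompressed embedding is the centroid selected by its code plus an already de-quantized residual, and it leaves out any further steps the real method may perform. Plain Python lists stand in for tensors, and `decompress_batched` and `batch_size` are hypothetical names.

```python
def decompress_batched(codes, residuals, centroids, batch_size=2):
    """Rebuild embeddings batch by batch as centroid + residual.

    Assumption: residuals are already de-quantized floats. The real codec
    stores them in a compressed form; that step is omitted here.
    """
    out = []
    for start in range(0, len(codes), batch_size):
        batch_codes = codes[start:start + batch_size]
        batch_residuals = residuals[start:start + batch_size]
        # The temporary result is limited to one batch of embeddings.
        batch_out = [
            [c + r for c, r in zip(centroids[code], residual)]
            for code, residual in zip(batch_codes, batch_residuals)
        ]
        out.extend(batch_out)
    return out


centroids = [[1.0, 0.0], [0.0, 1.0]]
codes = [0, 1, 1]
residuals = [[0.5, 0.25], [0.25, -0.5], [0.0, 0.0]]
restored = decompress_batched(codes, residuals, centroids)
print(restored)
```

With tensors, the same loop keeps the intermediate buffers proportional to `batch_size` rather than to the full number of embeddings, which is the point of batching even on a GPU.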

lookup_centroids(codes, out_device)

Handles multi-dimensional codes too.

EVENTUALLY: The .split() below should happen on a flat view.
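
One way to picture the "flat view" idea is the sketch below: two-dimensional codes are flattened, looked up in fixed-size chunks, and then regrouped into the original shape. This is an illustration under assumptions, not the library's code; it uses nested Python lists rather than tensors, handles only rectangular 2-D input, and `lookup_centroids_flat` and `chunk_size` are hypothetical names.

```python
def lookup_centroids_flat(codes, centroids, chunk_size=3):
    """Look up a centroid for every entry of a rectangular 2-D list of codes.

    Assumption: splitting happens on the flattened codes, and the original
    row/column shape is restored afterwards.
    """
    rows = len(codes)
    cols = len(codes[0]) if rows else 0
    # Flat view: one long list of codes, row by row.
    flat = [code for row in codes for code in row]

    looked_up = []
    for start in range(0, len(flat), chunk_size):
        chunk = flat[start:start + chunk_size]
        looked_up.extend(centroids[code] for code in chunk)

    # Regroup into the original (rows x cols) layout.
    return [looked_up[r * cols:(r + 1) * cols] for r in range(rows)]


centroids = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
codes_2d = [[0, 2], [1, 1]]
result = lookup_centroids_flat(codes_2d, centroids)
print(result)
```

Splitting the flat list means chunk boundaries do not depend on how the codes happen to be shaped, so every chunk except possibly the last has the same size.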