primeqa.ir.dense.colbert_top.colbert.indexing.codecs.residual.ResidualCodec#

class primeqa.ir.dense.colbert_top.colbert.indexing.codecs.residual.ResidualCodec(config, centroids, avg_residual=None, bucket_cutoffs=None, bucket_weights=None)#

Bases: object

Methods

`binarize`
`compress`
`compress_into_codes`	EVENTUALLY: Fusing the kernels or otherwise avoiding materializing the entire matrix before max(dim=0)
`decompress`	We batch below even if the target device is CUDA to avoid large temporary buffers causing OOM.
`load`
`lookup_centroids`	Handles multi-dimensional codes too.
`save`
`try_load_torch_extensions`

compress_into_codes(embs, out_device)#

EVENTUALLY: Fusing the kernels, or otherwise avoiding materializing the entire matrix before max(dim=0), seems like it would help a lot here.
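
To make the note concrete, here is a minimal sketch of the pattern it refers to: score every embedding against every centroid with one matrix product, then take max(dim=0) to get each embedding's code. The function name `nearest_centroid_codes` and the `batch_size` parameter are hypothetical and chosen for illustration; this is not the library's implementation, and device and dtype handling are left out.

```python
import torch


def nearest_centroid_codes(embs, centroids, batch_size=1024):
    """Assign each embedding the index of its highest-scoring centroid.

    Hypothetical helper for illustration only, not the ResidualCodec method.
    """
    codes = []
    # Process the embeddings in chunks so the score matrix stays small.
    for batch in embs.split(batch_size):
        # scores has shape (num_centroids, batch_len): one column per embedding.
        # This full matrix is the intermediate the note would like to avoid.
        scores = centroids @ batch.T
        # max(dim=0) picks, for each column, the best-scoring centroid.
        codes.append(scores.max(dim=0).indices)
    return torch.cat(codes)


centroids = torch.tensor([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
embs = torch.tensor([[0.9, 0.1], [0.2, 0.8], [-0.7, 0.1], [0.1, 0.95]])
codes = nearest_centroid_codes(embs, centroids, batch_size=2)
print(codes.tolist())
```

Batching bounds the size of `scores`, but each batch still materializes a full (num_centroids, batch_len) matrix before the reduction. A fused kernel would avoid that intermediate.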

decompress(compressed_embs: primeqa.ir.dense.colbert_top.colbert.indexing.codecs.residual_embeddings.ResidualEmbeddings)#

We batch below even if the target device is CUDA to avoid large temporary buffers causing OOM.
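
The batching idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's code: the name `decompress_batched` is hypothetical, residuals are assumed to be stored as unpacked per-dimension bucket indices (no bit-packing), and any normalization step is omitted.

```python
import torch


def decompress_batched(codes, residual_buckets, centroids, bucket_weights,
                       batch_size=1 << 15):
    """Rebuild approximate embeddings as centroid + dequantized residual.

    Hypothetical helper: assumes residual_buckets holds one bucket index per
    embedding dimension, which is a simplification made for this sketch.
    """
    out = []
    # Walk codes and residuals in lockstep, one batch at a time, so the
    # temporary float tensors never cover the whole collection at once.
    for codes_b, buckets_b in zip(codes.split(batch_size),
                                  residual_buckets.split(batch_size)):
        # Convert to int64 so the uint8 tensor is used as indices, not a mask.
        approx_residual = bucket_weights[buckets_b.long()]  # (batch, dim)
        out.append(centroids[codes_b.long()] + approx_residual)
    return torch.cat(out)


centroids = torch.tensor([[1.0, 0.0], [0.0, 1.0]])
bucket_weights = torch.tensor([-0.25, 0.25])  # two buckets, i.e. 1 bit per dim
codes = torch.tensor([0, 1, 0])
residual_buckets = torch.tensor([[0, 1], [1, 1], [1, 0]], dtype=torch.uint8)
embs = decompress_batched(codes, residual_buckets, centroids, bucket_weights,
                          batch_size=2)
print(embs)
```

The peak size of the temporaries scales with `batch_size` rather than with the number of compressed embeddings, which is why batching is useful on CUDA as well as on CPU.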

lookup_centroids(codes, out_device)#

Handles multi-dimensional codes too.

EVENTUALLY: The .split() below should happen on a flat view.
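
One way the flat-view variant mentioned in the note might look is sketched below. The name `lookup_centroids_flat` and the `batch_size` parameter are hypothetical, device transfers are omitted, and nothing here should be read as the library's actual implementation.

```python
import torch


def lookup_centroids_flat(codes, centroids, batch_size=1 << 20):
    """Map codes of any shape to vectors of shape codes.shape + (dim,).

    Hypothetical helper for illustration only.
    """
    # Flatten first (a view when the tensor is contiguous), so that .split()
    # produces batches of at most batch_size codes regardless of code shape.
    flat = codes.reshape(-1)
    looked_up = [centroids[batch.long()] for batch in flat.split(batch_size)]
    # Restore the original leading shape and append the embedding dimension.
    return torch.cat(looked_up).reshape(*codes.shape, centroids.size(1))


centroids = torch.tensor([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
codes = torch.tensor([[0, 2], [1, 1]])  # 2-D codes
vectors = lookup_centroids_flat(codes, centroids, batch_size=3)
print(vectors.shape)
```

Splitting a multi-dimensional tensor directly cuts along its first dimension only, so batch sizes then depend on the trailing dimensions. Splitting the flattened codes keeps every batch at a predictable number of lookups.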