primeqa.ir.dense.colbert_top.colbert.data.examples.Examples

class primeqa.ir.dense.colbert_top.colbert.data.examples.Examples(path=None, data=None, nway=None, provenance=None)

Bases: object

Methods

cast
provenance
save
toDict
tolist        NOTE: For distributed sampling, this isn't equivalent to perfectly uniform sampling.

tolist(rank=None, nranks=None)

NOTE: For distributed sampling, this isn’t equivalent to perfectly uniform sampling. In particular, each subset is perfectly represented in every batch. This is not a concern in practice, for two reasons: we never repeat passes over the data, so no particular triple is ever repeated; and the underlying file is pre-shuffled, so the split across nodes is random.
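The note above can be illustrated with a minimal sketch. This page does not show how the split is implemented, so the strided scheme below (each rank takes every nranks-th item, starting at its own index) is an assumption, and `shard_for_rank` is a hypothetical stand-alone helper rather than part of the `Examples` API. It shows why such a split is deterministic rather than uniformly sampled, and why the shards never overlap.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def shard_for_rank(
    data: Sequence[T],
    rank: Optional[int] = None,
    nranks: Optional[int] = None,
) -> List[T]:
    """Hypothetical helper (not the library's code): strided per-rank split.

    Assumption: rank r receives items r, r + nranks, r + 2 * nranks, ...
    With no rank information, the full data is returned as a list.
    """
    if rank is None or nranks is None:
        return list(data)
    if not 0 <= rank < nranks:
        raise ValueError(f"rank {rank} is outside range(0, {nranks})")
    return [data[i] for i in range(rank, len(data), nranks)]


# Toy, made-up triples standing in for a pre-shuffled training file.
triples = [(f"q{i}", f"pos{i}", f"neg{i}") for i in range(10)]

# One shard per rank for a three-process run.
shards = [shard_for_rank(triples, rank=r, nranks=3) for r in range(3)]

for r, shard in enumerate(shards):
    print(r, [query for query, _, _ in shard])
```

Under this assumed scheme, every consecutive window of nranks items contributes exactly one item to each rank, which is the sense in which each subset is represented in every batch. Because the shards are disjoint and together cover the data, a single pass never yields the same triple twice.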