docs2passages#

Divide a document collection into N-word/token passage spans (with wrap-around for last passage).

Functions

main

process_page

Wraps around if we split: make sure last passage isn't too short.