ThunderDots¶
ThunderDots is a Python client for DTS (Distributed Text Services) endpoints, initially built for DoTS.
It helps you move from a remote DTS API to structured Python objects and JSON records that can feed indexing pipelines (including full-text search, RAG-vector databases) or corpus-analysis workflows.
What ThunderDots does¶
ThunderDots can:
- walk DTS collections and subcollections;
- fetch resources and TEI/XML documents;
- extract text fragments from documents, DTS navigation, or custom TEI XPath rules;
- preserve or filter Dublin Core and extension metadata;
- validate generated outputs;
- export records to indexation pipelines (like Elasticsearch or Qdrant-compatible formats);
- cache fetched corpora as JSON and CSV.
Installation¶
With uv
uv add thunderdots
With pip
pip install thunderdots
Minimal example¶
from thunderdots import ThunderDots
td = ThunderDots(
endpoint_dts="https://dots.chartes.psl.eu/api/dts",
collection_params={"collection_id": "ENCPOS_1900"},
resource_params={"fragment_mode": "document"},
)
td.fetch()
results = td.results()
License¶
Citation¶
@software{terriel_thunderdots_2026,
author = {Terriel, Lucas},
title = {ThunderDots},
year = {2026},
publisher = {GitHub},
institution = {{École nationale des chartes}},
url = {https://github.com/chartes/thunderdots},
note = {Python client for Distributed Text Services (DTS) via DoTS}
}