Iteration
read_dir
read_dir is the main entry point. It discovers CSV files under path (default ".") and returns either a CsvDir or CsvChunksDir depending on chunksize.
from csvdir import read_dir
reader = read_dir("/exports", extension="csv", delimiter=",")
for row in reader:
    ...  # each row is a dict[str, str]
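As a rough mental model, the loop above behaves much like reading every matched file in sorted order with the standard library's csv.DictReader. The sketch below is not csvdir's implementation: iter_rows is a hypothetical helper, and it assumes a non-recursive search and UTF-8 input.

```python
import csv
from pathlib import Path
from typing import Iterator


def iter_rows(directory: str = ".", extension: str = "csv",
              delimiter: str = ",") -> Iterator[dict]:
    """Yield one dict per data row across every matching file."""
    # Sort so that files are visited in a stable order.
    for path in sorted(Path(directory).glob(f"*.{extension}")):
        # newline="" lets the csv module handle line endings itself.
        with path.open(newline="", encoding="utf-8") as handle:
            yield from csv.DictReader(handle, delimiter=delimiter)
```

Because the function is a generator, no file is opened until the first row is requested.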
Return type
| chunksize | Type | Each iteration yields |
|---|---|---|
| None (default) | CsvDir | dict[str, str] |
| positive int | CsvChunksDir | list[dict[str, str]] |
chunksize must be positive
A chunksize of 0, a negative integer, or any other value below 1 raises ValueError with the message "chunksize must be a positive integer".
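The rule can be illustrated with a small standalone batching helper. This is a sketch only: chunked is a hypothetical name, and raising at call time (rather than on the first iteration) is an assumption about when the check happens.

```python
from itertools import islice
from typing import Iterable, Iterator


def chunked(rows: Iterable[dict], chunksize: int) -> Iterator[list]:
    """Group rows into lists of at most `chunksize` items."""
    # Validate in a plain function so the error surfaces at call time.
    if chunksize < 1:
        raise ValueError("chunksize must be a positive integer")

    def _batches() -> Iterator[list]:
        iterator = iter(rows)
        while True:
            batch = list(islice(iterator, chunksize))
            if not batch:
                return
            yield batch

    return _batches()
```

The last chunk may be shorter than chunksize when the row count is not an exact multiple.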
Row shape
Every row is a plain dictionary:
- Keys: header names from the file (BOM stripped from the first column name when present)
- Values: strings; missing/None cells become ""
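The same row shape can be reproduced with the standard library, which may help when comparing output. The sketch below is an assumption about equivalent behavior, not csvdir's code: read_rows is a hypothetical helper that strips a leading BOM from the first header and fills missing trailing cells with "". Rows with more cells than headers are not covered by this page and are not handled here.

```python
import csv
import io


def read_rows(text: str) -> list:
    """Parse CSV text into dicts with BOM-free keys and "" for missing cells."""
    # restval="" fills cells that a short row leaves out.
    reader = csv.DictReader(io.StringIO(text), restval="")
    if reader.fieldnames:
        # Remove a byte order mark that leaked into the first column name.
        reader.fieldnames[0] = reader.fieldnames[0].lstrip("\ufeff")
    return [dict(row) for row in reader]
```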
File order
Paths come from pathing.get_csv_paths, which returns a sorted list, so the order is stable across runs on the same filesystem. CsvDirFile emits stitched body lines in this same sorted order (see pandas).
For how header names are compared across files (read_dir) vs stitched sequences (CsvDirFile), see Headers.
Properties
On CsvDir (and the chunked reader):
- paths: list[str] of absolute or relative paths to matched files
- names: list[str] of filename stems (extension removed)
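To make the relationship between the two properties concrete, here is a standard-library sketch of sorted path discovery and stem extraction. find_csv_paths and stems are hypothetical helpers, not part of csvdir, and the search is assumed to be non-recursive.

```python
from pathlib import Path


def find_csv_paths(directory: str = ".", extension: str = "csv") -> list:
    """Return matching file paths as strings, sorted for a stable order."""
    return sorted(str(p) for p in Path(directory).glob(f"*.{extension}"))


def stems(paths: list) -> list:
    """Return each path's filename with its final extension removed."""
    return [Path(p).stem for p in paths]
```

Both lists share the same order, so the stem at a given index belongs to the path at that index.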
Tagged iteration
Helper methods return new iterator objects that share the same configuration but attach a file label to each row.
with_names() / enumerate()
with_names() and enumerate() are aliases on CsvDir. Both yield (stem, row) tuples.
stem is the filename without extension, e.g. reports_2024 from reports_2024.csv.
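The shape of the (stem, row) pairs can be sketched without the library. iter_with_names below is a hypothetical stand-in built on csv.DictReader and pathlib; it assumes UTF-8 files with a .csv extension in a single directory.

```python
import csv
from pathlib import Path
from typing import Iterator


def iter_with_names(directory: str = ".") -> Iterator[tuple]:
    """Yield (stem, row) pairs, for example ("reports_2024", {...})."""
    for path in sorted(Path(directory).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                # path.stem drops the final extension from the filename.
                yield path.stem, row
```

Tagging rows this way makes it possible to trace any row back to its source file after several files have been merged into one stream.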
with_paths()
Yields (path, row), where path is the full path string.
On chunked readers, the same helpers yield (label, chunk) where chunk is list[dict].
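A labeled chunk stream might look like the following sketch. It is an illustration under stated assumptions, not the library's code: iter_path_chunks is a hypothetical helper, and it assumes that a chunk never spans two files, which is what a single label per chunk suggests.

```python
import csv
from itertools import islice
from pathlib import Path
from typing import Iterator


def iter_path_chunks(directory: str, chunksize: int) -> Iterator[tuple]:
    """Yield (path, chunk) pairs; each chunk holds rows from one file only."""
    for path in sorted(Path(directory).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            # Pull up to `chunksize` rows at a time until the file is exhausted.
            while chunk := list(islice(reader, chunksize)):
                yield str(path), chunk
```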
read_dir_chunks
read_dir_chunks is equivalent to read_dir(path, chunksize=n), except that the chunk size must be given explicitly.
Use whichever style reads clearer in your codebase.
Multiple passes
Iterator objects read from disk lazily. To scan the directory again, call read_dir(...) again or call a helper such as .with_names() again to get a fresh iterator.
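The pattern of building a fresh iterator for each pass can be shown with a generator-based stand-in. iter_rows and count_then_collect are hypothetical names; the generator here is exhausted after one pass, which is a property of Python generators and only an assumption about csvdir's own objects.

```python
import csv
from pathlib import Path
from typing import Iterator


def iter_rows(directory: str) -> Iterator[dict]:
    """Lazily yield rows from every CSV file in sorted order."""
    for path in sorted(Path(directory).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            yield from csv.DictReader(handle)


def count_then_collect(directory: str) -> tuple:
    """Make two passes by creating a new iterator for each one."""
    total = sum(1 for _ in iter_rows(directory))  # first pass
    rows = list(iter_rows(directory))             # second pass, new iterator
    return total, rows
```

Reusing a single generator for both passes would leave the second pass empty, which is why each pass starts from a new call.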
CsvDirFile supports seek(0) to restart the concatenated stream (see pandas).