
Iteration

read_dir

read_dir is the main entry point. It discovers CSV files under path (default ".") and returns either a CsvDir or CsvChunksDir depending on chunksize.

from csvdir import read_dir

reader = read_dir("/exports", extension="csv", delimiter=",")
for row in reader:
    ...

Return type

chunksize        Type          Each iteration yields
None (default)   CsvDir        dict[str, str]
positive int     CsvChunksDir  list[dict[str, str]]

chunksize must be positive

Passing a chunksize of 0, a negative integer, or any other value below 1 raises ValueError with the message "chunksize must be a positive integer".
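
The validation and chunking behaviour described above can be approximated with the standard library. This is a minimal sketch, not csvdir's own implementation: iter_chunks is a hypothetical helper, and only the error message is taken from the text above.

```python
from itertools import islice
from typing import Iterable, Iterator


def iter_chunks(rows: Iterable[dict], chunksize: int) -> Iterator[list]:
    """Group rows into lists of at most `chunksize` items (hypothetical helper)."""
    # Validate eagerly, before any row is read, so a bad value fails at call time.
    if chunksize < 1:
        raise ValueError("chunksize must be a positive integer")

    def _generate() -> Iterator[list]:
        it = iter(rows)
        # islice pulls up to `chunksize` rows; an empty list ends the loop.
        while chunk := list(islice(it, chunksize)):
            yield chunk

    return _generate()
```

The last chunk may be shorter than chunksize when the row count is not an exact multiple.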

Row shape

Every row is a plain dictionary:

  • Keys — header names from the file (BOM stripped from the first column name when present)
  • Values — strings; missing/None cells become ""
{"id": "42", "name": "Ada", "active": "true"}
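
To make the row shape concrete, here is a sketch of how rows like this can be produced with Python's csv module. It is an illustration under assumptions, not csvdir's internals: read_rows is a hypothetical function, and the use of the utf-8-sig codec and restval="" is one way to get the BOM stripping and empty-string filling described above.

```python
import csv
import io


def read_rows(raw: bytes) -> list:
    """Parse CSV bytes into plain dict rows (hypothetical stand-in for one file)."""
    # "utf-8-sig" drops a leading BOM, so the first key is "id" rather than "\ufeffid".
    text = raw.decode("utf-8-sig")
    # restval="" fills cells that are missing from short rows with an empty string.
    reader = csv.DictReader(io.StringIO(text, newline=""), restval="")
    return [dict(row) for row in reader]


# The second data row is missing its "active" cell.
sample = b"\xef\xbb\xbfid,name,active\n42,Ada,true\n43,Grace\n"
rows = read_rows(sample)
```

Here rows[0] matches the dictionary shown above, and rows[1]["active"] is the empty string.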

File order

Paths come from pathing.get_csv_paths, which returns a sorted list, so the order is stable across runs on the same filesystem. CsvDirFile emits stitched body lines in this same sorted order (see pandas).
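
The effect of sorting can be sketched with pathlib. find_csv_paths below is a hypothetical stand-in for pathing.get_csv_paths, whose real signature is not shown here; the sketch also assumes a non-recursive search, which the text above does not specify.

```python
import tempfile
from pathlib import Path


def find_csv_paths(root: str, extension: str = "csv") -> list:
    """Return matching file paths in sorted order (hypothetical stand-in)."""
    # Sorting removes any dependence on the order the OS lists directory entries.
    return sorted(str(p) for p in Path(root).glob(f"*.{extension}") if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    # Create the files out of order, plus one file that should not match.
    for name in ("b.csv", "a.csv", "notes.txt"):
        Path(tmp, name).write_text("id\n1\n", encoding="utf-8")
    found = [Path(p).name for p in find_csv_paths(tmp)]
```

In this run found lists a.csv before b.csv and leaves out notes.txt.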

For how header names are compared across files (read_dir) vs stitched sequences (CsvDirFile), see Headers.

Properties

On CsvDir (and the chunked reader):

  • paths: list[str] of absolute or relative paths to matched files
  • names: list[str] of filename stems (extension removed)
r = read_dir("/data")
print(r.paths)   # ['/data/a.csv', '/data/b.csv']
print(r.names)   # ['a', 'b']

Tagged iteration

Helper methods return new iterator objects that share the same configuration but attach a file label to each row.

with_names() / enumerate()

with_names() and enumerate() are aliases on CsvDir. Both yield (stem, row):

for stem, row in read_dir("/data").with_names():
    print(stem, row["id"])

stem is the filename without extension, e.g. reports_2024 from reports_2024.csv.
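
For readers who want to see what tagged iteration amounts to, the following sketch reproduces the (stem, row) shape with the standard library only. rows_with_names is a hypothetical function written for this illustration and is not how csvdir implements with_names().

```python
import csv
import tempfile
from pathlib import Path
from typing import Iterator


def rows_with_names(paths: list) -> Iterator[tuple]:
    """Yield (stem, row) for every row of every file, in the order given."""
    for path in paths:
        stem = Path(path).stem  # "reports_2024.csv" -> "reports_2024"
        with open(path, newline="", encoding="utf-8-sig") as fh:
            for row in csv.DictReader(fh, restval=""):
                yield stem, dict(row)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp, "reports_2024.csv")
    target.write_text("id,name\n42,Ada\n", encoding="utf-8")
    tagged = list(rows_with_names([str(target)]))
```

Swapping Path(path).stem for the path string itself gives the (path, row) shape instead.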

with_paths()

Yields (path, row) with the full path string:

for path, row in read_dir("/data").with_paths():
    print(path, row)

On chunked readers, the same helpers yield (label, chunk) where chunk is list[dict].

read_dir_chunks

Equivalent to read_dir(path, chunksize=n) but requires an explicit chunk size:

from csvdir import read_dir_chunks

for chunk in read_dir_chunks("/data", chunksize=500):
    ...

Use whichever style reads clearer in your codebase.

Multiple passes

Iterator objects read from disk lazily. To scan the directory again, create a new read_dir(...) call or re-instantiate helpers like .with_names().
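
The reason for creating a new object can be shown with an ordinary Python generator. This sketch assumes the readers behave like single-pass iterators, which the advice above suggests but does not state outright; lazy_rows is a hypothetical stand-in and does not touch the disk.

```python
from typing import Iterator


def lazy_rows() -> Iterator[dict]:
    """Stand-in for a lazy reader: each row is produced only when requested."""
    for i in range(3):
        yield {"id": str(i)}


reader = lazy_rows()
first_pass = list(reader)       # consumes every row
second_pass = list(reader)      # the same iterator has nothing left to give
fresh_pass = list(lazy_rows())  # a new iterator starts from the beginning
```

If the rows fit in memory, keeping the list from the first pass avoids reading the files a second time.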

CsvDirFile supports seek(0) to restart the concatenated stream (see pandas).