Pandas integration
CsvDirFile exposes a single concatenated CSV text stream (Unicode strings from read / readline) suitable for pandas.read_csv.
Basic usage
import pandas as pd
from csvdir import CsvDirFile
f = CsvDirFile("/data/csvs", on_mismatch="skip")
df = pd.read_csv(f)
Or with a context manager:
with CsvDirFile("/data/csvs", on_mismatch="skip") as f:
    df = pd.read_csv(f)
pandas is an optional dependency; install it separately (pip install pandas).
What CsvDirFile does
- Discover CSV files (same sorting as the dict readers).
- Choose a canonical header sequence (see the headers guide; stitching is sequence-sensitive).
- Emit that header once, then concatenate body lines in sorted path order, matching the traversal order used by read_dir-style iterators.
- Subsequent files omit their duplicate header rows when the sequences match (on_mismatch controls whether a mismatching file is skipped or raises an error).
Data is produced lazily; the full directory is not loaded into memory.
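The steps above can be sketched with the standard library alone. This is an illustrative stand-in, not csvdir's implementation: the function name stitch_csv_lines, its parameters, the choice of the first file's header as canonical, and the exact skip/error behaviour are all assumptions made for the example.

```python
import csv
import tempfile
from pathlib import Path
from typing import Iterator


def stitch_csv_lines(directory: str, on_mismatch: str = "error") -> Iterator[str]:
    """Yield one header line, then body lines from each CSV in sorted path order.

    Hypothetical sketch: the first file's header is treated as canonical and
    headers are compared as ordered sequences.
    """
    canonical = None
    for path in sorted(Path(directory).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            header_line = handle.readline()
            header = next(csv.reader([header_line]), [])
            if canonical is None:
                canonical = header
                yield header_line  # emit the header exactly once
            elif header != canonical:  # order-sensitive comparison
                if on_mismatch == "skip":
                    continue
                raise ValueError(f"header mismatch in {path.name}: {header}")
            # Stream body lines lazily; only one file is open at a time.
            for line in handle:
                yield line if line.endswith("\n") else line + "\n"


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.csv").write_text("id,name\n1,x\n", encoding="utf-8")
    Path(tmp, "b.csv").write_text("id,name\n2,y\n", encoding="utf-8")
    # Same columns in a different order: skipped under on_mismatch="skip".
    Path(tmp, "c.csv").write_text("name,id\nz,3\n", encoding="utf-8")
    stitched = "".join(stitch_csv_lines(tmp, on_mismatch="skip"))

print(stitched)
```

Because the function is a generator, nothing is read until a consumer asks for the next line, which mirrors the lazy behaviour described above.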
File-like API
| Method | Support |
|---|---|
| `read(size=-1)` | Yes |
| `readline()` | Yes |
| `readlines()` | Yes |
| `__iter__` | Yes (lines) |
| `seek(0)` | Restart stream |
| `seek(other)` | UnsupportedOperation |
| `tell()` | Logical position |
| `close()` | Release generator |
| Context manager | Yes |
Configuration
CsvDirFile(
    path="/data",
    extension="csv",
    delimiter=",",
    encoding="utf-8",
    strict_headers=False,
    expected_headers=None,
    on_mismatch="error",
    recurse=False,
    include_hidden=False,
)
Header matching for stitching is order-sensitive (unlike the dict iterators). See the headers guide.
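To make the distinction concrete, the sketch below compares two header rows both ways using plain Python lists and sets. The header values are made up for illustration, and the snippet shows the comparison semantics only, not csvdir's internals.

```python
first = ["id", "name", "score"]
second = ["name", "id", "score"]  # same columns, different order

# Order-sensitive (sequence) comparison: the kind stitching relies on.
sequence_match = first == second

# Set-based comparison: column order is ignored.
set_match = set(first) == set(second)

print(sequence_match, set_match)  # False True
```

Since the stitched stream carries a single header row, a body line from the second file would put name values under the id column, which is why the sequences have to agree.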
Restarting reads
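The File-like API table lists seek(0) as restarting the stream and any other seek as raising UnsupportedOperation. The sketch below illustrates those semantics with a hypothetical stand-in class, RestartableLines, built on a generator factory. It is not csvdir's implementation; with a real CsvDirFile, the equivalent step would presumably be calling f.seek(0) before reading again.

```python
import io
from typing import Callable, Iterator


class RestartableLines:
    """Hypothetical forward-only text stream that can only rewind to the start."""

    def __init__(self, make_lines: Callable[[], Iterator[str]]) -> None:
        self._make_lines = make_lines
        self._lines = make_lines()

    def readline(self) -> str:
        # Return the next line, or "" at end of stream (file-object convention).
        return next(self._lines, "")

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        if offset != 0 or whence != io.SEEK_SET:
            raise io.UnsupportedOperation("only seek(0) is supported")
        self._lines = self._make_lines()  # rebuild the generator from scratch
        return 0


stream = RestartableLines(lambda: iter(["id,name\n", "1,x\n"]))
first_pass = [stream.readline(), stream.readline()]
stream.seek(0)  # restart from the first line
second_pass = [stream.readline(), stream.readline()]
```

Rebuilding the generator instead of buffering earlier output keeps memory use flat, at the cost of re-reading the underlying files on every restart.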
Alternatives
If you need per-file control, or only set-based header matching (column order may differ across files), iterate over the paths and load each CSV:
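import pandas as pd
from csvdir import read_dir

# Load each discovered file separately, then concatenate the frames.
# pd.concat aligns columns by name, so column order may differ between files.
frames = []
r = read_dir("/data")
for path in r.paths:
    frames.append(pd.read_csv(path))
df = pd.concat(frames, ignore_index=True)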
For many files with identical schemas (and stitchable sequences), CsvDirFile is usually simpler.
Dtype and parse options
Pass any pandas.read_csv keyword arguments (dtype, parse_dates, usecols, and so on) as usual.
csvdir does not interpret types; pandas sees one continuous CSV text stream.
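As a hedged illustration, the snippet below passes dtype, parse_dates, and usecols to pandas.read_csv. An io.StringIO object stands in for the CsvDirFile stream so the example is self-contained, and the column names (id, name, created_at) are invented for the example.

```python
import io

import pandas as pd

# Stand-in for the concatenated text stream; with csvdir you would pass the
# CsvDirFile object to pd.read_csv instead.
stream = io.StringIO(
    "id,name,created_at\n"
    "1,x,2024-01-01\n"
    "2,y,2024-01-02\n"
)

df = pd.read_csv(
    stream,
    dtype={"id": "int64"},          # force an integer column
    parse_dates=["created_at"],     # parse this column as datetimes
    usecols=["id", "name", "created_at"],
)
print(df.dtypes)
```

All type handling happens inside pandas here; the stream only supplies text.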