Pandas integration

CsvDirFile exposes a single concatenated CSV text stream (Unicode strings from read / readline) suitable for pandas.read_csv.

Basic usage

import pandas as pd
from csvdir import CsvDirFile

f = CsvDirFile("/data/csvs", on_mismatch="skip")
df = pd.read_csv(f)

Or with a context manager:

with CsvDirFile("/data/csvs") as f:
    df = pd.read_csv(f)

pandas is an optional dependency — install separately (pip install pandas).

What `CsvDirFile` does

Discover CSV files (same sorting as dict readers).
Choose a canonical header sequence (headers guide — sequence-sensitive stitching).
Emit that header once, then concatenate body lines in sorted path order, matching traversal order used by read_dir-style iterators.
Subsequent files omit their duplicate header rows when sequences match (on_mismatch controls skips vs errors).

Data is produced lazily; the full directory is not loaded into memory.

File-like API

Method	Support
`read(size=-1)`	Yes
`readline()`	Yes
`readlines()`	Yes
`__iter__`	Yes (lines)
`seek(0)`	Restart stream
`seek(other)`	`UnsupportedOperation`
`tell()`	Logical position
`close()`	Release generator
Context manager	Yes

Configuration

CsvDirFile(
    path="/data",
    extension="csv",
    delimiter=",",
    encoding="utf-8",
    strict_headers=False,
    expected_headers=None,
    on_mismatch="error",
    recurse=False,
    include_hidden=False,
)

Header matching for stitching is order-sensitive (unlike dict iterators). See headers.

Restarting reads

f = CsvDirFile("/data")
df1 = pd.read_csv(f)
f.seek(0)
df2 = pd.read_csv(f)

Alternatives

If you need per-file control or set-based header matching only (column order may differ across files), iterate paths and load each CSV:

import pandas as pd
from csvdir import read_dir

frames = []
r = read_dir("/data")
for path in r.paths:
    frames.append(pd.read_csv(path))
df = pd.concat(frames, ignore_index=True)

For many files with identical schemas (and stitchable sequences), CsvDirFile is usually simpler.

Dtype and parse options

Pass any pandas.read_csv keyword arguments as usual:

pd.read_csv(
    CsvDirFile("/data"),
    dtype={"id": str},
    parse_dates=["timestamp"],
)

csvdir does not interpret types; pandas sees one continuous CSV text stream.