Files & Data Formats
Python makes it easy to read and write files. This page covers text files, JSON, and CSV.
Reading and Writing Text Files
Reading a File
from pathlib import Path
content = Path("data.txt").read_text(encoding="utf-8")
print(content)
Writing a File
from pathlib import Path
Path("output.txt").write_text("Hello, World!", encoding="utf-8")
Using Context Manager (Traditional Way)
with open("data.txt", "r", encoding="utf-8") as f:
content = f.read()
with open("output.txt", "w", encoding="utf-8") as f:
f.write("Hello, World!")
Always use context managers
The with statement automatically closes the file, even if an error happens.
File Modes
| Mode | Meaning |
|---|---|
"r" |
Read (default) |
"w" |
Write (overwrites file) |
"a" |
Append (adds to end) |
"x" |
Create (fails if file exists) |
"rb" |
Read binary |
"wb" |
Write binary |
Working with Paths
Use pathlib — it is the modern way to work with file paths:
from pathlib import Path
project_dir = Path("project")
config_file = project_dir / "config" / "settings.json"
print(config_file.exists()) # True/False
print(config_file.name) # "settings.json"
print(config_file.suffix) # ".json"
print(config_file.parent) # project/config
print(config_file.is_file()) # True/False
List Files in a Directory
from pathlib import Path
for file in Path("tests").glob("*.py"):
print(file.name)
for file in Path("project").rglob("*.py"):
print(file) # recursive search
JSON
JSON is the most common data format for APIs and configuration.
Reading JSON
import json
from pathlib import Path
data = json.loads(Path("data.json").read_text(encoding="utf-8"))
print(data["name"])
Writing JSON
import json
from pathlib import Path
data = {"name": "Alice", "scores": [90, 85, 92]}
Path("output.json").write_text(
json.dumps(data, indent=2, ensure_ascii=False),
encoding="utf-8",
)
JSON with Custom Types
import json
from datetime import datetime
def json_serializer(obj):
if isinstance(obj, datetime):
return obj.isoformat()
raise TypeError(f"Type {type(obj)} is not serializable")
data = {"timestamp": datetime.now()}
json_text = json.dumps(data, default=json_serializer)
CSV
CSV files store tabular data (rows and columns).
Reading CSV
import csv
from pathlib import Path
with Path("users.csv").open(encoding="utf-8") as f:
reader = csv.DictReader(f)
for row in reader:
print(row["name"], row["email"])
Writing CSV
import csv
from pathlib import Path
users = [
{"name": "Alice", "email": "alice@example.com"},
{"name": "Bob", "email": "bob@example.com"},
]
with Path("output.csv").open("w", newline="", encoding="utf-8") as f:
writer = csv.DictWriter(f, fieldnames=["name", "email"])
writer.writeheader()
writer.writerows(users)
YAML
YAML is used for configuration files. It is more readable than JSON for complex settings.
Install
uv add pyyaml
Reading YAML
import yaml
from pathlib import Path
data = yaml.safe_load(Path("config.yaml").read_text(encoding="utf-8"))
print(data["database"]["host"])
Writing YAML
import yaml
from pathlib import Path
config = {
"database": {
"host": "localhost",
"port": 5432,
"name": "testdb",
},
"debug": True,
}
Path("config.yaml").write_text(
yaml.dump(config, default_flow_style=False),
encoding="utf-8",
)
Example YAML File
database:
host: localhost
port: 5432
name: testdb
api:
base_url: https://api.example.com
timeout: 30
debug: true
Always use yaml.safe_load()
Never use yaml.load() — it can execute arbitrary Python code. Use yaml.safe_load() instead.
Format Comparison
| Format | Best For | Human Readable | Comments |
|---|---|---|---|
| JSON | API responses, data exchange | Medium | No |
| CSV | Tabular data, spreadsheets | Yes | No |
| YAML | Configuration files | Yes | Yes |
Best Practices
- Always specify
encoding="utf-8"when opening files - Always use
with(context managers) for file operations - Use
pathlib.Pathinstead ofos.pathfor path handling - Validate data from external files before using it
- Handle
FileNotFoundErrorwhen reading files - Use
json.dumps(indent=2)for readable output - Never hardcode file paths — use configuration or parameters