Getting Started
Installation
```bash
uv sync
```
Configuration
Edit `the_met_art_dataset/config.json` to set your scraping parameters:
```json
{
  "isHighlight": "true",
  "departmentId": 19,
  "q": "*",
  "limit": 200,
  "output": "data/dbs/art_db.json"
}
```
| Field | Description |
|---|---|
| `isHighlight` | Only return highlighted objects (`"true"` / `"false"`) |
| `departmentId` | Filter by Met department ID |
| `q` | Search query string (`"*"` returns all) |
| `limit` | Maximum number of objects to scrape |
| `output` | Path to the output JSON file |
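As a rough illustration of how these fields might map onto a request, the sketch below assumes that `isHighlight`, `departmentId`, and `q` are passed straight through as query parameters of the Met Collection API `search` endpoint, while `limit` and `output` are consumed locally by the scraper. `build_search_url` and `LOCAL_KEYS` are hypothetical names, not taken from `scraper.py`.

```python
import json
from urllib.parse import urlencode

# Public Met Collection API search endpoint.
SEARCH_ENDPOINT = "https://collectionapi.metmuseum.org/public/collection/v1/search"

# Assumption: these keys are used by the scraper itself and never sent to the API.
LOCAL_KEYS = {"limit", "output"}


def build_search_url(config: dict) -> str:
    """Hypothetical helper: turn config fields into a search URL."""
    params = {key: value for key, value in config.items() if key not in LOCAL_KEYS}
    # urlencode percent-encodes values, so "*" becomes "%2A".
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


config = json.loads(
    """{
  "isHighlight": "true",
  "departmentId": 19,
  "q": "*",
  "limit": 200,
  "output": "data/dbs/art_db.json"
}"""
)
print(build_search_url(config))
# https://collectionapi.metmuseum.org/public/collection/v1/search?isHighlight=true&departmentId=19&q=%2A
```

Keeping `limit` and `output` out of the query string matters because they describe what the scraper does with the results, not what the API should search for.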
Scraping
```bash
make scrape
```
Or directly:
```bash
uv run the_met_art_dataset/scraper.py -config the_met_art_dataset/config.json
```
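The Met Collection API `search` endpoint responds with a JSON object holding `total` and an `objectIDs` list. One plausible way to honour `limit` is to truncate that list before fetching individual objects. The sketch below assumes that approach. `search_object_ids` and `fetch_json` are hypothetical names, and the `fetch` parameter exists only so the function can be exercised without network access.

```python
import json
from typing import Callable, List
from urllib.request import urlopen


def fetch_json(url: str) -> dict:
    """Download a URL and decode its JSON body."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def search_object_ids(
    search_url: str,
    limit: int,
    fetch: Callable[[str], dict] = fetch_json,
) -> List[int]:
    """Return at most `limit` object IDs from a Met search response."""
    data = fetch(search_url)
    # When nothing matches, the API may return null instead of an empty list.
    object_ids = data.get("objectIDs") or []
    return object_ids[:limit]
```

With a stubbed fetcher, `search_object_ids("unused", 2, fetch=lambda _: {"total": 3, "objectIDs": [10, 20, 30]})` returns `[10, 20]`.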
Example output:
```text
Full URL sent by Python: https://collectionapi.metmuseum.org/public/collection/v1/search?isHighlight=True&departmentId=20&q=%2A
✅ Saved 'Hunting and fishing scenes' as 229770.jpg
✅ Saved 'Quilt' as 229936.jpg
✅ Saved 'Cravat end' as 227284.jpg
Photo 488551 not of public domain
Photo 269091 not of public domain
```
Images are saved to `data/images/`, and metadata is written to the path set by the `output` field.
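The log lines above suggest a per-object loop that skips anything not in the public domain and names each image after its object ID. A minimal sketch of that loop follows. It assumes the scraper reads the `isPublicDomain`, `primaryImage`, `title`, and `objectID` fields of each record returned by the Met API `objects/{objectID}` endpoint. `save_object` and `save_metadata` are hypothetical names, and the `download` callable stands in for whatever HTTP client the real script uses.

```python
import json
from pathlib import Path
from typing import Callable, List, Optional


def save_object(
    obj: dict,
    image_dir: Path,
    download: Callable[[str, Path], None],
) -> Optional[dict]:
    """Save one object's primary image and return its metadata, or None if skipped."""
    object_id = obj["objectID"]
    if not obj.get("isPublicDomain"):
        print(f"Photo {object_id} not of public domain")
        return None
    image_url = obj.get("primaryImage")
    if not image_url:
        # Assumption: objects without a primary image are skipped silently.
        return None
    image_dir.mkdir(parents=True, exist_ok=True)
    target = image_dir / f"{object_id}.jpg"
    download(image_url, target)
    title = obj.get("title", "")
    print(f"✅ Saved '{title}' as {target.name}")
    return {"objectID": object_id, "title": title, "image": str(target)}


def save_metadata(records: List[dict], output: Path) -> None:
    """Write the collected metadata to the path set by `output` in config.json."""
    output.parent.mkdir(parents=True, exist_ok=True)
    output.write_text(json.dumps(records, indent=2, ensure_ascii=False), encoding="utf-8")
```

For real downloads, `download` could be `lambda url, path: urllib.request.urlretrieve(url, path)`. Passing it in keeps the loop testable and leaves the choice of HTTP client open.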
Filtering
```bash
make filter
```
Or with custom parameters:
```bash
uv run the_met_art_dataset/filter.py \
  --input data/dbs/art_db.json \
  --output data/dbs/filtered.json \
  --exclude asian
```
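The README does not spell out what `--exclude asian` matches. One plausible reading, sketched here purely as an assumption, is that it drops every record whose `department` field contains the given term, case-insensitively. In Met object records this field holds values such as `"Asian Art"`. `filter_records` and `main` are hypothetical, and the input file is assumed to be a JSON list of object records.

```python
import argparse
import json
from pathlib import Path
from typing import List, Optional


def filter_records(records: List[dict], exclude: str) -> List[dict]:
    """Drop records whose department contains `exclude` (case-insensitive)."""
    term = exclude.strip().lower()
    if not term:
        return list(records)  # nothing to exclude
    return [r for r in records if term not in str(r.get("department", "")).lower()]


def main(argv: Optional[List[str]] = None) -> None:
    parser = argparse.ArgumentParser(description="Filter scraped Met metadata (sketch).")
    parser.add_argument("--input", required=True, type=Path)
    parser.add_argument("--output", required=True, type=Path)
    parser.add_argument("--exclude", default="")
    args = parser.parse_args(argv)

    records = json.loads(args.input.read_text(encoding="utf-8"))
    kept = filter_records(records, args.exclude)
    args.output.parent.mkdir(parents=True, exist_ok=True)
    args.output.write_text(json.dumps(kept, indent=2, ensure_ascii=False), encoding="utf-8")
    print(f"Kept {len(kept)} of {len(records)} records")
```

Calling `main(["--input", "data/dbs/art_db.json", "--output", "data/dbs/filtered.json", "--exclude", "asian"])` mirrors the command line above. The empty-term guard matters because `"" in s` is always true, so without it an empty `--exclude` would drop every record.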