API Reference
scrape_met_paintings(params, limit=200, output='data/dbs/art_db.json')
Scrape artwork metadata and images from the Met Museum public API.
Searches the Met collection using the given query parameters, then for each
result fetches object metadata and downloads the primary image if it is
public domain. Images are saved to data/images/<id>.jpg.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `params` | `dict` | Query parameters forwarded to the Met search endpoint (e.g. …). | required |
| `limit` | `int` | Maximum number of object IDs to process. Defaults to 200. | `200` |
| `output` | `str` | Path to the output JSON file. Defaults to `'data/dbs/art_db.json'`. | `'data/dbs/art_db.json'` |
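The flow described for `scrape_met_paintings` (search, then fetch each object's metadata, then download the primary image when the object is public domain) can be sketched roughly as follows. This is an illustrative sketch, not the code in `the_met_art_dataset/scraper.py`: the name `scrape_sketch`, the `image_dir`, `fetch_json`, and `fetch_bytes` parameters, and the shape of each stored record are assumptions made here. The endpoint paths and the `objectIDs`, `isPublicDomain`, and `primaryImage` fields come from the public Met Collection API.

```python
import json
from pathlib import Path
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "https://collectionapi.metmuseum.org/public/collection/v1"


def _get_json(url):
    """Fetch a URL and decode the JSON body."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def _get_bytes(url):
    """Fetch a URL and return the raw bytes (used for images)."""
    with urlopen(url, timeout=30) as resp:
        return resp.read()


def scrape_sketch(params, limit=200, output="data/dbs/art_db.json",
                  image_dir="data/images",
                  fetch_json=_get_json, fetch_bytes=_get_bytes):
    """Hypothetical stand-in for scrape_met_paintings; see the lead-in."""
    # 1. Search: the response lists matching object IDs (or null if none).
    search = fetch_json(f"{BASE}/search?{urlencode(params)}")
    object_ids = (search.get("objectIDs") or [])[:limit]

    Path(image_dir).mkdir(parents=True, exist_ok=True)
    records = []
    for object_id in object_ids:
        # 2. Fetch the metadata for one object.
        meta = fetch_json(f"{BASE}/objects/{object_id}")

        # 3. Download the primary image only for public-domain objects.
        image_path = None
        image_url = meta.get("primaryImage")
        if meta.get("isPublicDomain") and image_url:
            target = Path(image_dir) / f"{object_id}.jpg"
            target.write_bytes(fetch_bytes(image_url))
            image_path = str(target)

        # Assumption: keep the metadata even when no image was downloaded.
        records.append({
            "id": object_id,
            "title": meta.get("title"),
            "department": meta.get("department"),
            "image": image_path,
        })

    # 4. Write all collected records to the output JSON file.
    out = Path(output)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(records, indent=2), encoding="utf-8")
    return records
```

Passing the two fetch functions as parameters is a choice made for this sketch so that the flow can be exercised without network access; a call such as `scrape_sketch({"q": "sunflowers"}, limit=10)` would contact the live API.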
Source code in the_met_art_dataset/scraper.py
Filter Met Museum artwork records by department keyword.
filter_by_department(source, output, exclude)
Filter artwork records by excluding a department keyword.
Reads a JSON database of artwork entries, removes any records whose
department field contains the given keyword (case-insensitive), and
writes the cleaned dataset to a new file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `source` | `str` | Path to the input JSON file. | required |
| `output` | `str` | Path to write the filtered JSON file. | required |
| `exclude` | `str` | Keyword to exclude from the … | required |
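The filtering behaviour described for `filter_by_department` (drop records whose department contains the keyword, ignoring case, then write the rest to a new file) might look like the sketch below. It is not the code in `the_met_art_dataset/filter.py`: the name `filter_sketch` is hypothetical, and the sketch assumes the input file holds a JSON list of objects, each with an optional `department` key.

```python
import json
from pathlib import Path


def filter_sketch(source, output, exclude):
    """Hypothetical stand-in for filter_by_department; see the lead-in."""
    # Assumption: the database is a JSON list of dict records.
    records = json.loads(Path(source).read_text(encoding="utf-8"))

    # Case-insensitive substring match against the department field.
    keyword = exclude.lower()
    kept = [
        record for record in records
        if keyword not in (record.get("department") or "").lower()
    ]

    # Write the cleaned dataset to a new file, leaving the source untouched.
    out = Path(output)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(kept, indent=2, ensure_ascii=False),
                   encoding="utf-8")
    return kept
```

Because the match is a substring test, a keyword such as `"arms"` would also remove a department named "Arms and Armor"; records with a missing or null department are always kept in this sketch.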
Source code in the_met_art_dataset/filter.py