A Temporal Knowledge Graph for Music Festival Lineup Forecasting
| Item Type: | Dataset |
|---|---|
| Title: | A Temporal Knowledge Graph for Music Festival Lineup Forecasting |
| Date: | 27 April 2026 |
| Creator: | Gastinger, Julia ORCID: 0000-0003-1914-6723 ; Dieing, Thilo ORCID: 0009-0000-2256-0795 ; Meilicke, Christian ORCID: 0000-0002-0198-5396 ; Stuckenschmidt, Heiner ORCID: 0000-0002-0209-3859 |
| Divisions: | School of Business Informatics and Mathematics > Practical Computer Science II: Artificial Intelligence (Stuckenschmidt 2009-) |
| DDC Classification: | 004 Computer science, internet |
|---|---|
| Keywords: | Cultural Event Prediction; Temporal Knowledge Graphs; Temporal Knowledge Graph Forecasting |
| Abstract: | Predicting lineups for future festivals has practical value for organizers, booking agencies, and audiences: it can support booking decisions, lineup planning, and early ticket purchasing. Festival lineups emerge from complex relationships among artists, genres, releases, labels, and past performances, making this a natural fit for temporal Knowledge Graph (TKG) modeling. In this work, we present a TKG covering 380 music festivals over 55 years. The TKG includes information on festivals, artist tours, and artist metadata. We formalize festival lineup prediction as temporal link prediction between artists and festivals at future time stamps. We extract information from two sources: setlist.fm and MusicBrainz. setlist.fm (https://www.setlist.fm) is a community-maintained wiki-style platform for collecting and sharing concert setlists. Beyond individual concert setlists, it contains information on concert venues, festivals, and the artists who play there. The platform provides a public API for data extraction, with all artists, festivals, concerts, and venues assigned unique identifiers. In setlist.fm, artists are represented by MusicBrainz identifiers, which enables straightforward cross-referencing between the two data sources. MusicBrainz (https://musicbrainz.org/) is an open music encyclopedia that collects and publishes music metadata, maintained by a global community of contributors. Like setlist.fm, it offers API access for straightforward data retrieval. We initially fetch information on all festivals, their editions, venues, countries, and associated artists from setlist.fm. To ensure the resulting TKG remains computationally tractable, we restrict our scope to festivals held in Germany. Germany provides a particularly suitable case study due to its large, economically significant, and culturally diverse festival sector. Moreover, Germany is among the countries with the highest number of entries on setlist.fm, ensuring sufficient data coverage. Additionally, restricting the analysis to a country with which we have strong domain familiarity improves data validation and interpretation. To focus on established festivals, we apply the following filtering criteria: festivals must have occurred at least 5 times, each festival edition must feature at least 10 artists, and the festivals must have featured at least 30 unique artists across all editions. These thresholds are pragmatic cutoffs to exclude small-scale events. Applying these filters reduces our dataset to 380 festivals. Given that the earliest festival in our filtered set dates to 1971, we include temporal information starting from that year. For each artist appearing at these festivals, we extract additional concert data from setlist.fm, including all concert dates, venues, and venue locations. To keep the TKG from becoming excessively large and to focus on relevant venues, we exclude venues hosting fewer than five concerts, as such venues typically contribute little structural information while substantially increasing graph sparsity. Finally, we enrich the artist information with metadata from MusicBrainz via its API, extracting founding (and, if applicable, ending) information, geographic information, artist types, label relationships, and release history where available. We began fetching the festival information from setlist.fm on December 16, 2025, the concert performance data on February 9, 2026, and the metadata from MusicBrainz on February 14, 2026. All data mining scripts are publicly available in our GitHub repository https://github.com/JuliaGast/festival-lineup-tkg. (English) |
| URL: | https://madata.bib.uni-mannheim.de/822/ |
|---|---|
| Access (Controlled): | Open Access |
| File | Filename / Infos | Link |
|---|---|---|
| Archive | Filename: tkgl_concertperformanceonly.zip | Download (4MB) |
| Archive | Filename: tkgl_concert.zip | Download (195MB) |
| Archive | Filename: tkgl_concertwithshortcuts.zip | Download (269MB) |

| Date Deposited: | 27 Apr 2026 16:58 |
|---|---|
| Last Modified: | 27 Apr 2026 16:58 |
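
The festival filtering criteria described in the abstract (at least 5 editions, at least 10 artists per edition, at least 30 unique artists overall) can be illustrated with a minimal sketch. This is not the authors' script (those are in the linked GitHub repository); the input layout of `(festival_id, year, artist_id)` tuples, the function names, the relation label `performs_at`, and the use of the year as the edition identifier are all assumptions made for illustration. The sketch also assumes one possible reading of the criteria: editions with fewer than 10 artists are discarded first, and the festival-level thresholds are then checked on the remaining editions.

```python
from collections import defaultdict

# Thresholds as stated in the abstract.
MIN_EDITIONS = 5
MIN_ARTISTS_PER_EDITION = 10
MIN_UNIQUE_ARTISTS = 30


def filter_festivals(appearances):
    """Keep festivals that satisfy the three thresholds.

    `appearances` is an iterable of (festival_id, year, artist_id)
    tuples; this layout is hypothetical, not the dataset's file format.
    Returns {festival_id: {year: set_of_artist_ids}} for kept festivals.
    """
    lineups = defaultdict(lambda: defaultdict(set))
    for festival, year, artist in appearances:
        lineups[festival][year].add(artist)

    kept = {}
    for festival, by_year in lineups.items():
        # Assumed reading: drop small editions before the festival-level checks.
        editions = {
            year: artists
            for year, artists in by_year.items()
            if len(artists) >= MIN_ARTISTS_PER_EDITION
        }
        unique_artists = set().union(*editions.values())
        if (len(editions) >= MIN_EDITIONS
                and len(unique_artists) >= MIN_UNIQUE_ARTISTS):
            kept[festival] = editions
    return kept


def to_quadruples(kept, relation="performs_at"):
    """Turn kept lineups into (artist, relation, festival, year) quadruples.

    The relation name is a placeholder; temporal link prediction then
    asks which artists complete (?, relation, festival, future_year).
    """
    return sorted(
        (artist, relation, festival, year)
        for festival, editions in kept.items()
        for year, artists in editions.items()
        for artist in artists
    )
```

Under the stricter alternative reading, where a single edition with fewer than 10 artists disqualifies the whole festival, the dictionary comprehension would be replaced by a check that every edition meets the per-edition threshold.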


