Additional Labeled Reference Data from the Linked Open Citation Database (LOC-DB) Project
==============================================================================

The data consists of 2,402 pages of reference lists from books and chapters, together with a labeled bounding box for each entry in the reference list.

The XML files contain the coordinates of the boxes and, for each box, a label (`box` or `incomplete`). The XML files are in PASCAL VOC format, and a box in them looks, for example, like this:

```
<object>
	<name>box</name> # "box" for boxes which contain the whole reference string, "incomplete" for boxes which contain only a part of the reference string
	<pose>Unspecified</pose> # this element is not used for the reference analysis, it always contains a default value
	<truncated>0</truncated> # this element is not used for the reference analysis, it always contains a default value
	<difficult>0</difficult> # this element is not used for the reference analysis, it always contains a default value
	<bndbox>
		<xmin>190</xmin> # x coordinate of the upper left corner of the bounding box in pixels counting from the upper left corner of the page
		<ymin>1746</ymin> # y coordinate of the upper left corner of the bounding box in pixels counting from the upper left corner of the page
		<xmax>1890</xmax> # x coordinate of the lower right corner of the bounding box in pixels counting from the upper left corner of the page
		<ymax>1809</ymax> # y coordinate of the lower right corner of the bounding box in pixels counting from the upper left corner of the page
	</bndbox>
</object>
```

The file names contain the ID (called PPN) from the SWB union catalog http://swb.bsz-bw.de/DB=2.1/SET=1/TTL=1/START_WELCOME where the bibliographic metadata of the book can easily be found.

## Details about the labeling process

* The labeling took place from May to July 2018 and was done by student workers as well as librarians of the Mannheim University Library.
* The data was produced during the LOC-DB project https://locdb.bib.uni-mannheim.de/
* The software labelImg (from v1.2. to v1.7.0) was used for labeling the images.
* Each box should contain the whole reference, but can also contain a little extra space, e.g. to the right. Text before or after the references is not labeled.

## Examining the data visually

Possible steps for getting a visual impression of the data:

1. Download the software labelImg (e.g. version 1.7.0): https://tzutalin.github.io/labelImg/
2. Unzip and run labelImg
3. Click 'Change default saved annotation folder' in Menu/File and choose one of the subfolders here
4. Click 'Open Dir' and choose the same subfolder

## Copyright and Data Citations

All data here is CC0 and can be reused without further limitations. However, we encourage you to make a data citation in any publication using the data here:

Laura Erhard, Annette Klein, Syed Tahseen Raza Rizvi, Sylvia Zander, Philipp Zumstein (2018): Additional Labeled Reference Data from the Linked Open Citation Database (LOC-DB) Project. Universitätsbibliothek Mannheim. https://doi.org/10.7801/283
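
## Reading the annotations programmatically

The box format described at the top of this README can also be read without labelImg. The following is a minimal sketch, not part of the original data release. It assumes the standard PASCAL VOC element names written by labelImg (`object`, `name`, `bndbox`, `xmin`, `ymin`, `xmax`, `ymax`). The function name `read_boxes`, the sample file name and the surrounding `annotation` wrapper are hypothetical and serve only as illustration; the box values are those from the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample annotation. Only the values inside <object> are taken
# from the example in this README; the file name is made up for illustration.
SAMPLE_XML = """<annotation>
  <filename>example.png</filename>
  <object>
    <name>box</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>190</xmin>
      <ymin>1746</ymin>
      <xmax>1890</xmax>
      <ymax>1809</ymax>
    </bndbox>
  </object>
</annotation>"""


def read_boxes(xml_text):
    """Return a list of (label, xmin, ymin, xmax, ymax) tuples for one page."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        # Label is "box" (whole reference) or "incomplete" (partial reference).
        label = obj.findtext("name")
        bndbox = obj.find("bndbox")
        # Pixel coordinates, counted from the upper left corner of the page.
        coords = tuple(
            int(bndbox.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((label, *coords))
    return boxes


if __name__ == "__main__":
    for label, xmin, ymin, xmax, ymax in read_boxes(SAMPLE_XML):
        # Print the label together with the box width and height in pixels.
        print(label, xmax - xmin, ymax - ymin)
```

To process a real annotation file, its contents could be read with `open(path, encoding="utf-8").read()` and passed to `read_boxes`. The `pose`, `truncated` and `difficult` elements are ignored here because they only hold default values.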