# Dataset Files

- entity_ids.del
  - maps ids used in all files to Wikidata IDs
  - first column entity id, second column Wikidata entity id
  - tab separated
- entity_mentions.del
  - maps entity ids to entity mentions
  - tab separated
- entity_desc.del
  - maps entity ids to entity descriptions
  - tab separated
- relation_ids.del
  - maps relation ids to Wikidata relation ids
  - first column relation id, second column Wikidata relation id
  - tab separated
- relation_mentions.del
  - maps relation ids to relation mentions
- train.del
  - contains training triples in the form of subject, relation, object
  - tab separated

## Transductive

- valid.del
  - contains transductive validation triples in the form of subject, relation, object
  - tab separated
- test.del
  - contains transductive test triples in the form of subject, relation, object
  - tab separated

## Semi-Inductive

- all_entity_ids.del
  - contains ids from entity_ids.del and additionally all ids of unseen entities
  - tab separated
- all_entity_mentions.del
  - contains mentions from entity_mentions.del and additionally all mentions of unseen entities
  - tab separated
- all_entity_desc.del
  - contains descriptions from entity_desc.del and additionally all descriptions of unseen entities
  - tab separated
- valid_pool.del
  - contains all triples used for semi-inductive validation
  - columns
    - 1: unseen entity id
    - 2: slot of unseen entity (0: unseen entity is in subject slot, 1: unseen entity is in object slot)
    - 3-5: validation triple
      - 3: subject
      - 4: relation
      - 5: object
  - tab separated
  - use `prepare_few_shot.py` to create all semi-inductive tasks from this file
- test_pool.del
  - contains all triples used for semi-inductive testing
  - columns
    - 1: unseen entity id
    - 2: slot of unseen entity (0: unseen entity is in subject slot, 1: unseen entity is in object slot)
    - 3-5: test triple
      - 3: subject
      - 4: relation
      - 5: object
  - tab separated
  - use `prepare_few_shot.py` to create all semi-inductive tasks from this file

# Generate Few Shot Tasks

- use the file `prepare_few_shot.py`
- create a `few_shot_set_creator` object
  - `dataset_name`: (str) name of the dataset
    - default: wikidata5m-si
  - `use_inverse`: (bool) whether to use inverse relations
    - default: False
    - if True: for all triples where the unseen entity is in the object slot, increase the relation id by the number of relations and invert the triple
  - `split`: (str) which split to use
    - default: valid
  - `context_selection`: (str) which context selection technique to use
    - default: most_common
    - options: most_common, least_common, random

```python
# assumes the class is importable from prepare_few_shot.py
from prepare_few_shot import FewShotSetCreator

few_shot_set_creator = FewShotSetCreator(
    dataset_name="wikidata5m-si",
    use_inverse=True,
    split="test"
)
```

- generate the data using the `few_shot_set_creator`
  - `num_shots`: (int) the number of shots to use (between 0 and 10)

```python
data = few_shot_set_creator.create_few_shot_dataset(num_shots=5)
```

- evaluation is performed in the direction unseen to seen
- the output format looks like this

```python
[
    {
        "unseen_entity": <unseen entity id>,
        "unseen_slot": <slot of unseen entity (0: subject, 1: object)>,
        "triple": <[s, p, o]>,
        "context": <[unseen_entity_id, unseen_entity_slot, s, p, o]>
    },
    ...
]
```
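
The five-column layout of `valid_pool.del` and `test_pool.del` described under Semi-Inductive can be read with the standard library alone. The following is a minimal sketch, not part of `prepare_few_shot.py`: `PoolRow` and `parse_pool_lines` are hypothetical names, and the code assumes that all ids in the pool files are integers.

```python
import csv
from typing import Iterable, Iterator, NamedTuple


class PoolRow(NamedTuple):
    """One line of a *_pool.del file (hypothetical helper type)."""
    unseen_entity: int  # column 1: unseen entity id
    unseen_slot: int    # column 2: 0 = subject slot, 1 = object slot
    subject: int        # column 3
    relation: int       # column 4
    obj: int            # column 5


def parse_pool_lines(lines: Iterable[str]) -> Iterator[PoolRow]:
    """Parse tab-separated pool lines; assumes integer ids."""
    for fields in csv.reader(lines, delimiter="\t"):
        if not fields:  # skip empty lines
            continue
        yield PoolRow(*(int(f) for f in fields[:5]))


# Made-up sample lines, for illustration only.
sample = ["7\t0\t7\t3\t42\n", "7\t1\t15\t8\t7\n"]
rows = list(parse_pool_lines(sample))
```

For a real file, the same function can be fed an open file handle, for example `parse_pool_lines(open("valid_pool.del", newline=""))`.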
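
The `use_inverse` rule can also be illustrated in isolation. This is a sketch under stated assumptions rather than the code used in `prepare_few_shot.py`: `apply_inverse` is a hypothetical helper, relation ids are assumed to be integers in the range `0..num_relations-1`, and the ids in the example are made up.

```python
from typing import Tuple

Triple = Tuple[int, int, int]  # (subject, relation, object)


def apply_inverse(triple: Triple, unseen_slot: int, num_relations: int) -> Triple:
    """Invert a triple whose unseen entity is in the object slot.

    The relation id is shifted by num_relations to mark the inverse
    relation, and subject and object are swapped. Triples with the
    unseen entity in the subject slot are returned unchanged.
    """
    s, p, o = triple
    if unseen_slot == 1:
        return (o, p + num_relations, s)
    return triple


# Unseen entity 7 in the subject slot: triple is left as it is.
forward = apply_inverse((7, 3, 42), unseen_slot=0, num_relations=100)
# Unseen entity 7 in the object slot: triple is inverted.
inverted = apply_inverse((15, 8, 7), unseen_slot=1, num_relations=100)
```

After this transformation the unseen entity always sits in the subject slot, so a model only needs to answer queries in a single direction.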