# Dataset Files

- entity_ids.del
	- maps ids used in all files to Wikidata IDs
    - first column entity id, second column Wikidata entity id 
    - tab separated
- entity_mentions.del
	- maps entity ids to entity mentions
	- tab separated
- entity_desc.del
	- maps entity ids to entity descriptions
	- tab separated
- relation_ids.del
	- maps relation ids to Wikidata relation ids
    - first column relation id, second column Wikidata relation id
    - tab separated
- relation_mentions.del
	- maps relation ids to relation mentions
- train.del
	- contains training triples in the form of subject, relation, object
    - tab separated
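
The files above can be loaded with a few lines of standard-library Python. The following is a minimal sketch, not part of the dataset tooling: the helper name `read_del` is hypothetical, the sample rows are made up, and it assumes that the ids in column one of `entity_ids.del` and in all columns of `train.del` are integers.

```python
import csv
import io


def read_del(handle):
    """Read a tab-separated .del file into a list of rows (lists of strings)."""
    # QUOTE_NONE keeps quote characters in mentions/descriptions untouched.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row for row in reader if row]


# Made-up in-memory stand-ins; replace with open("entity_ids.del") etc.
entity_ids_file = io.StringIO("0\tQ42\n1\tQ5\n")
train_file = io.StringIO("0\t3\t1\n")

# entity id -> Wikidata entity id (assumes integer ids in the first column)
id_to_wikidata = {int(row[0]): row[1] for row in read_del(entity_ids_file)}

# (subject, relation, object) triples (assumes integer ids)
train_triples = [tuple(int(x) for x in row) for row in read_del(train_file)]

print(id_to_wikidata)
print(train_triples)
```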


## Transductive

- valid.del
	- contains transductive validation triples in the form of subject, relation, object
    - tab separated
- test.del
	- contains transductive test triples in the form of subject, relation, object
    - tab separated


## Semi-Inductive

- all_entity_ids.del
	- contains ids from entity_ids.del and additionally all ids of unseen entities
	- tab separated
- all_entity_mentions.del
	- contains mentions from entity_mentions.del and additionally all mentions of unseen entities
	- tab separated
- all_entity_desc.del
	- contains descriptions from entity_desc.del and additionally all descriptions of unseen entities
	- tab separated
- valid_pool.del
	- contains all triples used for semi-inductive validation
	- columns
		- 1: unseen entity id
		- 2: slot of unseen entity (0: unseen entity is in subject slot, 1: unseen entity in object slot)
		- 3-5: validation triple
			- 3: subject
			- 4: relation
			- 5: object
	- tab separated
	- use `prepare_few_shot.py` to create all semi-inductive tasks from this file
- test_pool.del
	- contains all triples used for semi-inductive testing
	- columns
		- 1: unseen entity id
		- 2: slot of unseen entity (0: unseen entity is in subject slot, 1: unseen entity in object slot)
		- 3-5: test triple
			- 3: subject
			- 4: relation
			- 5: object
	- tab separated
	- use `prepare_few_shot.py` to create all semi-inductive tasks from this file
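
To make the five-column layout of the pool files concrete, here is a small sketch that parses one line into named fields. The names `PoolRow` and `parse_pool_line` are hypothetical, the sample line is made up, and all columns are assumed to hold integer ids.

```python
from typing import NamedTuple


class PoolRow(NamedTuple):
    unseen_entity: int
    unseen_slot: int  # 0: subject slot, 1: object slot
    subject: int
    relation: int
    obj: int


def parse_pool_line(line: str) -> PoolRow:
    """Parse one tab-separated line of valid_pool.del / test_pool.del."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 5:
        raise ValueError(f"expected 5 columns, got {len(fields)}")
    return PoolRow(*(int(f) for f in fields))


def unseen_matches_slot(row: PoolRow) -> bool:
    """Check that the unseen entity sits in the slot named by column 2."""
    in_slot = row.subject if row.unseen_slot == 0 else row.obj
    return in_slot == row.unseen_entity


# Made-up example: unseen entity 7 appears in the object slot of (2, 4, 7).
row = parse_pool_line("7\t1\t2\t4\t7\n")
print(row, unseen_matches_slot(row))
```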


# Generate Few Shot Tasks

- use the file `prepare_few_shot.py`
- create a `few_shot_set_creator` object
	- `dataset_name`: (str) name of the dataset 
      - default: wikidata5m-si
	- `use_inverse`: (bool) whether to use inverse relations 
      - default: False
      - if True: for all triples where the unseen entity is in the object slot, increase relation id by num-relations and invert triple
	- `split`: (str) which split to use 
      - default: valid
	- `context_selection`: (str) which context_selection technique to use 
      - default: most_common 
      - options: most_common, least_common, random

```python
few_shot_set_creator = FewShotSetCreator(
	dataset_name="wikidata5m-si", 
	use_inverse=True, 
	split="test" 
)
```
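
The effect of `use_inverse=True` described above can be illustrated as follows. This is a sketch of the stated rule only, not the code in `prepare_few_shot.py`: the function name `apply_inverse` is hypothetical, and "invert triple" is assumed to mean swapping subject and object.

```python
def apply_inverse(triple, unseen_in_object, num_relations):
    """Rewrite a triple so that the unseen entity ends up in the subject slot.

    If the unseen entity is in the object slot, the relation id is shifted by
    num_relations (the id of the inverse relation) and the triple is reversed.
    """
    s, p, o = triple
    if unseen_in_object:
        return (o, p + num_relations, s)
    return (s, p, o)


# Made-up example with 10 relations: unseen entity 7 is the object of (2, 4, 7).
print(apply_inverse((2, 4, 7), unseen_in_object=True, num_relations=10))
print(apply_inverse((7, 4, 2), unseen_in_object=False, num_relations=10))
```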

- generate the data using the `few_shot_set_creator`
	- `num_shots`: (int) the number of shots to use (between 0 and 10)

```python
data = few_shot_set_creator.create_few_shot_dataset(num_shots=5)
```

- evaluation is performed in the direction from the unseen entity to the seen entity
- the output is a list of dictionaries in the following format
```python
[
{
	"unseen_entity": <id of unseen entity>,
	"unseen_slot": <slot of unseen entity: 0 for head/subject, 2 for tail/object>,
	"triple": <[s, p, o]>,
	"context": <[unseen_entity_id, unseen_entity_slot, s, p, o]>
},
...

]
```
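
As an illustration of how one output item could be consumed, the sketch below derives a query and its expected answer for the unseen-to-seen direction. The helper `query_and_answer` and the sample item are hypothetical. It assumes the query consists of the unseen entity and the relation, the answer is the seen entity, and `context` is a list of five-element rows.

```python
def query_and_answer(item):
    """Return ((unseen_entity, relation), seen_entity) for one output item."""
    s, p, o = item["triple"]
    if item["unseen_slot"] == 0:
        # Unseen entity is the subject: predict the (seen) object.
        return (s, p), o
    # Unseen entity is the object: predict the (seen) subject.
    return (o, p), s


# Made-up item following the format above.
sample_item = {
    "unseen_entity": 7,
    "unseen_slot": 2,
    "triple": [2, 4, 7],
    "context": [[7, 0, 7, 1, 3]],  # assumed: list of context rows
}

query, answer = query_and_answer(sample_item)
print(query, answer)
```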