gobbli.augment.wordnet module

class gobbli.augment.wordnet.WordNet(skip_download_check=False, spacy_model='en_core_web_sm')

Bases: gobbli.augment.base.BaseAugment

Data augmentation method based on WordNet. Replaces words with similar words according to the WordNet ontology. Texts are part-of-speech tagged using spaCy so that only sensible replacements (i.e., words with the same part of speech) are considered.
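A minimal sketch of the same-part-of-speech constraint described above, assuming a hypothetical toy lexicon (`TOY_LEXICON`) in place of WordNet and hand-written POS tags in place of spaCy output. `replace_same_pos` is illustrative only and is not part of gobbli's API.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical toy lexicon standing in for WordNet: maps (word, POS) to
# candidate replacements. No real WordNet lookup happens here.
TOY_LEXICON: Dict[Tuple[str, str], List[str]] = {
    ("quick", "ADJ"): ["fast", "speedy"],
    ("dog", "NOUN"): ["hound", "canine"],
    ("run", "VERB"): ["sprint", "dash"],
    ("run", "NOUN"): ["trip", "outing"],
}


def replace_same_pos(
    tagged: List[Tuple[str, str]], p: float, rng: random.Random
) -> List[str]:
    """Replace each token with probability p, using only same-POS candidates."""
    out: List[str] = []
    for word, pos in tagged:
        # Candidates are keyed on (word, POS), so a verb never gets a noun sense
        candidates = TOY_LEXICON.get((word.lower(), pos), [])
        # A token changes only if it is selected AND has candidates
        if candidates and rng.random() < p:
            out.append(rng.choice(candidates))
        else:
            out.append(word)
    return out


# Tags below are hand-written, standing in for a spaCy tagger's output
tagged = [("I", "PRON"), ("run", "VERB"), ("with", "ADP"), ("my", "PRON"), ("dog", "NOUN")]
print(replace_same_pos(tagged, p=1.0, rng=random.Random(0)))
```

Because lookups are keyed on (word, POS), the verb "run" can only become "sprint" or "dash", never the noun senses "trip" or "outing".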

Parameters
  • skip_download_check (bool) – If True, don’t try to download the WordNet corpus; assume it’s already been downloaded.

  • spacy_model (str) – The spaCy language model used for part-of-speech tagging. The model must already be installed.

augment(X, times=5, p=0.1)

Return additional texts for each text in the passed list.

Parameters
  • X (List[str]) – Input texts.

  • times (int) – How many texts to generate per text in the input.

  • p (float) – Probability of considering each token in the input for replacement. Some tokens can't be replaced by a given augmentation method and are ignored, so the actual proportion of replaced tokens in your input may be much lower than this number.

Return type

List[str]

Returns

Generated texts (length = times * len(X)).
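A hypothetical re-implementation sketch of the `times` and `p` semantics, not gobbli's code. It assumes whitespace tokenization and a caller-supplied `replace_token` callable that returns `None` for tokens with no replacement. The output ordering (all of `X` once per round, repeated `times` rounds) is also an assumption; the documentation above only fixes the length.

```python
import random
from typing import Callable, List, Optional


def augment_sketch(
    X: List[str],
    replace_token: Callable[[str], Optional[str]],
    times: int = 5,
    p: float = 0.1,
    seed: Optional[int] = None,
) -> List[str]:
    """Return times * len(X) generated texts (hypothetical sketch)."""
    rng = random.Random(seed)
    generated: List[str] = []
    for _ in range(times):
        for text in X:
            new_tokens: List[str] = []
            for tok in text.split():  # whitespace tokenization, a simplification
                # Each token is *considered* with probability p...
                replacement = replace_token(tok) if rng.random() < p else None
                # ...but only tokens that have a replacement actually change
                new_tokens.append(replacement if replacement is not None else tok)
            generated.append(" ".join(new_tokens))
    return generated


def upper_if_long(tok: str) -> Optional[str]:
    # Hypothetical replacer: only tokens longer than 3 characters are replaceable
    return tok.upper() if len(tok) > 3 else None


print(augment_sketch(["a cat sat", "the quick fox"], upper_if_long, times=3, p=1.0, seed=1))
```

Even with `p=1.0`, only "quick" changes, which illustrates why the realized replacement rate can be much lower than `p`.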

classmethod data_dir()
Return type

Path

Returns

The data directory used for this class of augmentation model.
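A hypothetical sketch of how a per-class data directory can be exposed as a classmethod, so each augmentation class resolves its own location without being instantiated. `DATA_ROOT`, `HypotheticalAugmentBase`, and `HypotheticalWordNet` are illustrative names only; where gobbli actually stores augmentation data is not stated here.

```python
from pathlib import Path

# Hypothetical root directory, chosen for illustration only
DATA_ROOT = Path("/tmp/augment_data")


class HypotheticalAugmentBase:
    @classmethod
    def data_dir(cls) -> Path:
        # One subdirectory per concrete class, named after the class itself
        return DATA_ROOT / cls.__name__


class HypotheticalWordNet(HypotheticalAugmentBase):
    pass


print(HypotheticalWordNet.data_dir())
```

Using `cls` rather than a hard-coded name means subclasses get distinct directories automatically.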