gobbli.augment.wordnet module

class gobbli.augment.wordnet.WordNet(skip_download_check=False, spacy_model='en_core_web_sm')

Bases: gobbli.augment.base.BaseAugment

Data augmentation method based on WordNet. Replaces words with similar words according to the WordNet ontology. Texts are part-of-speech tagged using spaCy so that only sensible replacements (i.e., within the same part of speech) are considered.

Parameters

skip_download_check (bool) – If True, don’t try to download the WordNet corpus; assume it has already been downloaded.
spacy_model – The spaCy language model to use for part-of-speech tagging. The model must already be installed.

augment(X, times=5, p=0.1)

Return additional texts for each text in the passed array.

Parameters

X (List[str]) – Input texts.
times (int) – How many texts to generate per text in the input.
p (float) – Probability of considering each token in the input for replacement. Some tokens cannot be replaced by a given augmentation method and are ignored, so the actual proportion of replaced tokens in your input may be much lower than this number.

Return type

List[str]

Returns

Generated texts (length = times * len(X)).
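
The contract described above (each token considered with probability p, non-replaceable tokens ignored, times * len(X) texts returned) can be illustrated with a minimal, self-contained sketch. This is not gobbli's implementation: toy_augment, TOY_SYNONYMS, and the seed argument are hypothetical names introduced here, a small dictionary stands in for WordNet lookups, no part-of-speech tagging is done, and grouping the outputs by input text is an assumption.

```python
import random
from typing import Dict, List

# Hypothetical stand-in for WordNet lookups; not part of gobbli.
TOY_SYNONYMS: Dict[str, List[str]] = {
    "quick": ["fast", "speedy"],
    "happy": ["glad", "cheerful"],
}


def toy_augment(
    X: List[str], times: int = 5, p: float = 0.1, seed: int = 0
) -> List[str]:
    """Return `times` variants of each text in X (hypothetical sketch)."""
    rng = random.Random(seed)
    generated: List[str] = []
    for text in X:
        for _ in range(times):
            new_tokens = []
            for token in text.split():
                # Each token is considered for replacement with probability p.
                considered = rng.random() < p
                candidates = TOY_SYNONYMS.get(token.lower())
                if considered and candidates:
                    new_tokens.append(rng.choice(candidates))
                else:
                    # Tokens with no known replacement are left untouched,
                    # so fewer than p of all tokens end up replaced.
                    new_tokens.append(token)
            generated.append(" ".join(new_tokens))
    return generated


texts = ["the quick dog is happy", "no replaceable words here"]
variants = toy_augment(texts, times=3, p=0.5)
print(len(variants))  # times * len(X) texts in total
```

With gobbli installed, the analogous call based on the signature documented here would be WordNet().augment(texts, times=3, p=0.5); the actual replacements depend on the WordNet corpus and the spaCy model in use.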

classmethod data_dir()

Return type: Path
Returns: The data directory used for this class of augmentation model.
