gobbli.dataset.imdb module

class gobbli.dataset.imdb.IMDBDataset(*args, **kwargs)[source]

Bases: gobbli.dataset.nested_file.NestedFileDataset

gobbli Dataset for the IMDB sentiment analysis problem.

https://ai.stanford.edu/~amaas/data/sentiment/

Blank constructor needed to satisfy mypy.

DELIMITER = '\x00'
TEST_X_FILE = 'test_X.csv'
TEST_Y_FILE = 'test_Y.csv'
TRAIN_X_FILE = 'train_X.csv'
TRAIN_Y_FILE = 'train_y.csv'
X_test()
X_train()
classmethod data_dir()

Return type

Path

download(data_dir)[source]

Download the dataset archive and extract it into the given data directory. Return the resulting path.

Return type

Path
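The extraction half of this contract can be illustrated with a minimal sketch using only the standard library. It does not reproduce gobbli's implementation and performs no network download: `extract_archive` and `make_demo_archive` are hypothetical helpers, and the archive contents are a made-up stand-in for the real dataset.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def extract_archive(archive_path: Path, data_dir: Path) -> Path:
    """Extract a tar archive into data_dir and return data_dir.

    Hypothetical helper: it mirrors the documented contract
    (extract into the given directory, return a path), nothing more.
    """
    data_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path) as tar:
        tar.extractall(data_dir)
    return data_dir


def make_demo_archive(archive_path: Path) -> None:
    """Build a tiny stand-in archive so the sketch runs offline."""
    payload = b"a great movie"
    with tarfile.open(archive_path, "w:gz") as tar:
        info = tarfile.TarInfo(name="demo/train/pos/0_9.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))


if __name__ == "__main__":
    tmp = Path(tempfile.mkdtemp())
    make_demo_archive(tmp / "demo.tar.gz")
    extracted = extract_archive(tmp / "demo.tar.gz", tmp / "data")
    print(sorted(p.name for p in extracted.rglob("*.txt")))
```

Returning the path lets a caller chain directly into whatever reads the extracted files.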

embed_input(embed_batch_size=32, pooling=<EmbedPooling.MEAN: 'mean'>, limit=None)

Return type

EmbedInput

folders()[source]

Return relative paths to the train and test folders, respectively, from the top level of the extracted archive.

Return type

Tuple[Path, Path]

labels()[source]

Return the set of folder names that should be considered labels in each directory.

Return type

Set[str]
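Taken together, folders() and labels() describe a nested layout in which each label is a subfolder of the train and test folders. The sketch below shows one way such a layout could be read. It is not gobbli's code: `read_nested` and `build_demo_tree` are hypothetical helpers, and the folder names `train`, `test`, `pos`, `neg`, and `unsup` are assumptions chosen for the demo.

```python
import tempfile
from pathlib import Path
from typing import List, Set, Tuple


def read_nested(root: Path, folder: Path, labels: Set[str]) -> Tuple[List[str], List[str]]:
    """Collect (texts, labels) from root/folder/<label>/* (hypothetical helper).

    Subfolders whose names are not in `labels` are ignored.
    """
    texts: List[str] = []
    ys: List[str] = []
    for label in sorted(labels):
        for file_path in sorted((root / folder / label).iterdir()):
            texts.append(file_path.read_text(encoding="utf-8"))
            ys.append(label)
    return texts, ys


def build_demo_tree(root: Path) -> Tuple[Path, Path]:
    """Create a tiny stand-in tree and return the relative (train, test) folders."""
    files = {
        "train/pos/0.txt": "great",
        "train/neg/0.txt": "dull",
        "train/unsup/0.txt": "unlabeled",  # not a label folder, so it is skipped
        "test/pos/0.txt": "fun",
        "test/neg/0.txt": "boring",
    }
    for rel, text in files.items():
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text, encoding="utf-8")
    return Path("train"), Path("test")


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    train_dir, test_dir = build_demo_tree(root)
    X_train, y_train = read_nested(root, train_dir, {"pos", "neg"})
    print(list(zip(X_train, y_train)))
```

Restricting the walk to an explicit label set is what keeps unrelated sibling folders out of the labeled data.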

classmethod load(*args, **kwargs)

Return type

BaseDataset

predict_input(predict_batch_size=32, limit=None)

Return type

PredictInput

read_source_file(file_path)[source]

Read the text from a source file. Used to account for per-dataset encodings and other format differences.

Return type

str
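As a rough illustration of why a per-dataset read hook is useful, the sketch below reads a file with an explicitly chosen encoding. `read_text_file` is a hypothetical helper, and the encodings shown are assumptions for the demo, not a statement about which encoding the IMDB files or gobbli use.

```python
import tempfile
from pathlib import Path


def read_text_file(file_path: Path, encoding: str = "utf-8") -> str:
    """Read one source document as text with an explicit encoding.

    Hypothetical helper: a dataset-specific override could pass a
    different encoding or apply extra cleanup before returning.
    """
    return file_path.read_text(encoding=encoding)


if __name__ == "__main__":
    demo = Path(tempfile.mkdtemp()) / "review.txt"
    # Write bytes in Latin-1 to simulate a source file that is not UTF-8.
    demo.write_bytes("café".encode("latin-1"))
    print(read_text_file(demo, encoding="latin-1"))
```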

train_input(train_batch_size=32, valid_batch_size=8, num_train_epochs=3, valid_proportion=0.2, split_seed=1234, shuffle_seed=1234, limit=None)

Return type

TrainInput

y_test()
y_train()