gobbli.dataset.nested_file module
class gobbli.dataset.nested_file.NestedFileDataset(*args, **kwargs)
Bases: gobbli.dataset.base.BaseDataset
A dataset downloaded as an archive from some URL and composed of the following directory structure:
- <train_folder>/
  - <label1>/
    - data1
    - data2
  - <label2>/
    - data1
    - data2
- <test_folder>/
  - <label1>/
    - data1
    - data2
  - <label2>/
    - data1
    - data2
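The layout above can be traversed with a few lines of standard-library Python. The following is a minimal sketch, not gobbli's own implementation: the helper names `read_split` and `build_demo_archive`, the folder names `train` and `test`, and the labels `pos` and `neg` are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path
from typing import Callable, List, Set, Tuple


def read_split(
    split_dir: Path,
    labels: Set[str],
    read_file: Callable[[Path], str],
) -> Tuple[List[str], List[str]]:
    """Collect (texts, labels) from <split_dir>/<label>/<file>.

    Hypothetical helper for illustration; not part of gobbli.
    """
    texts: List[str] = []
    ys: List[str] = []
    # Sort labels and files so the output order is deterministic.
    for label in sorted(labels):
        label_dir = split_dir / label
        if not label_dir.is_dir():
            continue  # tolerate a label that is absent from this split
        for file_path in sorted(p for p in label_dir.iterdir() if p.is_file()):
            texts.append(read_file(file_path))
            ys.append(label)
    return texts, ys


def build_demo_archive(root: Path) -> None:
    """Create a tiny extracted-archive layout with made-up contents."""
    for split in ("train", "test"):
        for label in ("pos", "neg"):
            label_dir = root / split / label
            label_dir.mkdir(parents=True)
            (label_dir / "data1").write_text(f"{split} {label} one", encoding="utf-8")
            (label_dir / "data2").write_text(f"{split} {label} two", encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_demo_archive(root)
    X_train, y_train = read_split(
        root / "train", {"pos", "neg"}, lambda p: p.read_text(encoding="utf-8")
    )
```

Each file becomes one text, and its parent folder name becomes the label. Passing the reader in as a callable mirrors the idea of a per-dataset file reader without assuming how the class wires it up internally.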
Blank constructor needed to satisfy mypy
- DELIMITER = '\x00'
- TEST_X_FILE = 'test_X.csv'
- TEST_Y_FILE = 'test_Y.csv'
- TRAIN_X_FILE = 'train_X.csv'
- TRAIN_Y_FILE = 'train_y.csv'
- classmethod data_dir()
  Return type: Path
- abstract download(data_dir)
  Download and extract the dataset archive into the given data dir. Return the resulting path.
  Return type: Path
- embed_input(embed_batch_size=32, pooling=<EmbedPooling.MEAN: 'mean'>, limit=None)
  Return type:
- abstract folders()
  Return relative paths to the train and test folders, respectively, from the top level of the extracted archive.
  Return type: Tuple[Path, Path]
- abstract labels()
  Return the set of folder names that should be considered labels in each directory.
  Return type: Set[str]
- classmethod load(*args, **kwargs)
  Return type:
- predict_input(predict_batch_size=32, limit=None)
  Return type:
- abstract read_source_file(file_path)
  Read the text from a source file. Used to account for per-dataset encodings and other format differences.
  Return type: str
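As an illustration of the kind of per-dataset handling a source-file reader might need, here is a minimal standalone sketch. It is an assumption-laden example, not gobbli code: the function name `read_text_file`, its `encoding` parameter, and the sample bytes are hypothetical, and a real subclass would implement `read_source_file(file_path)` as a method instead.

```python
import tempfile
from pathlib import Path


def read_text_file(file_path: Path, encoding: str = "utf-8") -> str:
    """Read one source file as text using the dataset's known encoding.

    Hypothetical helper for illustration; not part of gobbli.
    """
    # errors="replace" substitutes U+FFFD for undecodable bytes instead of raising.
    raw = Path(file_path).read_text(encoding=encoding, errors="replace")
    # Drop surrounding whitespace such as a trailing newline.
    return raw.strip()


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "data1"
    # 0xE9 is "é" in Latin-1 but is not valid UTF-8 on its own.
    sample.write_bytes(b"caf\xe9 au lait\n")
    text = read_text_file(sample, encoding="latin-1")
```

Choosing the encoding per dataset, rather than relying on the platform default, keeps the decoded text the same across machines.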
- train_input(train_batch_size=32, valid_batch_size=8, num_train_epochs=3, valid_proportion=0.2, split_seed=1234, shuffle_seed=1234, limit=None)
  Return type: