gobbli.dataset.newsgroups module
-
class gobbli.dataset.newsgroups.NewsgroupsDataset(*args, **kwargs)
Bases: gobbli.dataset.nested_file.NestedFileDataset
gobbli Dataset for the 20 Newsgroups problem.
http://qwone.com/~jason/20Newsgroups/
Blank constructor needed to satisfy mypy
-
DELIMITER = '\x00'
-
TEST_X_FILE = 'test_X.csv'
-
TEST_Y_FILE = 'test_Y.csv'
-
TRAIN_X_FILE = 'train_X.csv'
-
TRAIN_Y_FILE = 'train_y.csv'
-
X_test()
-
X_train()
-
classmethod data_dir()
- Return type
Path
-
download(data_dir)
Download and extract the dataset archive into the given data dir. Return the resulting path.
- Return type
Path
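
The extract step that download describes can be sketched with the standard library alone. This is a minimal illustration, not gobbli's implementation: the helper name extract_archive is hypothetical, it assumes the archive is a tar file already on disk, and it skips the network fetch entirely.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path: Path, data_dir: Path) -> Path:
    """Extract a tar archive into data_dir and return data_dir.

    Hypothetical helper for illustration only; not part of gobbli.
    """
    data_dir.mkdir(parents=True, exist_ok=True)
    # Use the safer "data" extraction filter where the running Python supports it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive_path) as archive:
        archive.extractall(data_dir, **kwargs)
    return data_dir
```

Returning the directory rather than the archive path mirrors the documented contract ("Return the resulting path"), so a caller can pass the result straight to whatever reads the extracted folders.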
-
embed_input(embed_batch_size=32, pooling=<EmbedPooling.MEAN: 'mean'>, limit=None)
- Return type
-
folders()
Return relative paths to the train and test folders, respectively, from the top level of the extracted archive.
- Return type
Tuple[Path, Path]
-
labels()
Return the set of folder names that should be considered labels in each directory.
- Return type
Set[str]
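
Taken together, folders() and labels() suggest a layout in which each split directory contains one subfolder per label, with one document per file. The sketch below shows how such a layout could be walked. The function name read_labeled_folder is hypothetical, and the decoding choice (UTF-8 with replacement characters) is an assumption made only to keep the example self-contained.

```python
from pathlib import Path
from typing import List, Set, Tuple


def read_labeled_folder(folder: Path, labels: Set[str]) -> Tuple[List[str], List[str]]:
    """Collect (texts, labels) from a folder whose subfolders are label names.

    Hypothetical helper for illustration only; not part of gobbli.
    """
    texts: List[str] = []
    ys: List[str] = []
    # Sort for a deterministic order regardless of filesystem listing order.
    for label_dir in sorted(p for p in folder.iterdir() if p.is_dir()):
        if label_dir.name not in labels:
            continue  # Skip folders that are not in the allowed label set.
        for file_path in sorted(label_dir.iterdir()):
            if file_path.is_file():
                texts.append(file_path.read_text(encoding="utf-8", errors="replace"))
                ys.append(label_dir.name)
    return texts, ys
```

Filtering by an explicit label set, rather than accepting every subfolder, keeps stray directories in the archive from turning into spurious classes.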
-
classmethod load(*args, **kwargs)
- Return type
-
predict_input(predict_batch_size=32, limit=None)
- Return type
-
read_source_file(file_path)
Read the text from a source file. Used to account for per-dataset encodings and other format differences.
- Return type
str
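
A per-dataset reader of this kind might look like the sketch below. It assumes, without confirmation from this page, that the source files are best decoded as latin-1. That encoding maps every byte to a character, so decoding cannot fail on unexpected bytes. The name read_text_lenient is hypothetical.

```python
from pathlib import Path


def read_text_lenient(file_path: Path, encoding: str = "latin-1") -> str:
    """Read a file as text using a single-byte encoding.

    Hypothetical helper for illustration only; not part of gobbli.
    latin-1 is an assumed default, chosen because it never raises on decode.
    """
    return file_path.read_bytes().decode(encoding)
```

Keeping the encoding as a parameter lets a subclass for a different corpus reuse the same logic with its own default.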
-
train_input(train_batch_size=32, valid_batch_size=8, num_train_epochs=3, valid_proportion=0.2, split_seed=1234, shuffle_seed=1234, limit=None)
- Return type
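
The valid_proportion and split_seed parameters of train_input suggest a reproducible train/validation split. The sketch below shows one way such a split could work using only the standard library. It is an assumption about the parameters' intent, not gobbli's actual splitting code, and the name split_train_valid is hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_valid(
    rows: Sequence[T], valid_proportion: float = 0.2, split_seed: int = 1234
) -> Tuple[List[T], List[T]]:
    """Split rows into (train, valid) lists, reproducibly for a given seed.

    Hypothetical helper for illustration only; not part of gobbli.
    """
    indices = list(range(len(rows)))
    # A dedicated Random instance keeps the split independent of global state.
    random.Random(split_seed).shuffle(indices)
    n_valid = int(round(len(rows) * valid_proportion))
    valid_idx = set(indices[:n_valid])
    # Preserve the original row order within each partition.
    train = [row for i, row in enumerate(rows) if i not in valid_idx]
    valid = [row for i, row in enumerate(rows) if i in valid_idx]
    return train, valid
```

Calling it twice with the same seed yields the same partition, which is the property a fixed default such as split_seed=1234 would provide.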
-
y_test()
-
y_train()