gobbli.test.util module
-
class gobbli.test.util.MockDataset(*args, **kwargs)
Bases: gobbli.dataset.base.BaseDataset
A minimal dataset derived from BaseDataset for testing the ABC’s logic.
-
X_TEST = ['test1', 'test2']
-
X_TRAIN_VALID = ['train1', 'train2', 'train3', 'train4']
-
Y_TEST = ['1', '0']
-
Y_TRAIN_VALID = ['0', '1', '0', '1']
-
classmethod data_dir()
- Return type
Path
-
embed_input(embed_batch_size=32, pooling=<EmbedPooling.MEAN: 'mean'>, limit=None)
- Return type
-
classmethod load(*args, **kwargs)
- Return type
-
predict_input(predict_batch_size=32, limit=None)
- Return type
-
train_input(train_batch_size=32, valid_batch_size=8, num_train_epochs=3, valid_proportion=0.2, split_seed=1234, shuffle_seed=1234, limit=None)
- Return type
-
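The class attributes pair each text with a label, and train_input accepts valid_proportion and split_seed. As a rough illustration only, the sketch below copies the training fixture values into plain Python and performs a seeded train/validation split with the standard library. The helper seeded_split is hypothetical and is not gobbli's implementation; the library's actual splitting logic may differ.

```python
import random

# Fixture values copied from the MockDataset class attributes.
X_TRAIN_VALID = ["train1", "train2", "train3", "train4"]
Y_TRAIN_VALID = ["0", "1", "0", "1"]


def seeded_split(X, y, valid_proportion=0.2, split_seed=1234):
    """Hypothetical helper: deterministic train/validation split.

    Not gobbli's implementation; it only illustrates what a seeded
    split with a validation proportion could look like.
    """
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    indices = list(range(len(X)))
    # A dedicated Random instance keeps the split reproducible
    # without touching the global random state.
    random.Random(split_seed).shuffle(indices)
    # Keep at least one validation row when there is any data.
    n_valid = max(1, round(len(X) * valid_proportion)) if X else 0
    valid_idx, train_idx = indices[:n_valid], indices[n_valid:]
    train = [(X[i], y[i]) for i in train_idx]
    valid = [(X[i], y[i]) for i in valid_idx]
    return train, valid


train_rows, valid_rows = seeded_split(X_TRAIN_VALID, Y_TRAIN_VALID)
```

Calling seeded_split again with the same split_seed yields the same partition, which is the property a fixed seed is meant to provide in tests.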
-
class gobbli.test.util.MockExperiment(model_cls, dataset, test_dataset=None, data_dir=None, name=None, param_grid=None, task_num_cpus=1, task_num_gpus=0, worker_gobbli_dir=None, worker_log_level=30, limit=None, overwrite_existing=False, ignore_ray_initialized_error=False, distributed=False, ray_kwargs=None)
Bases: gobbli.experiment.base.BaseExperiment
A minimal experiment derived from BaseExperiment for testing the ABC’s logic.
Construct an experiment.
- Parameters
model_cls (Any) – The class of model to be used for the experiment.
dataset (Union[Tuple[List[str], List[str]], BaseDataset]) – Dataset to be used for the experiment. Can be either a 2-tuple containing a list of texts and a corresponding list of labels or a gobbli.dataset.base.BaseDataset.
test_dataset (Optional[Tuple[List[str], List[str]]]) – An optional separate dataset to be used for calculating test metrics. If passed, should be a 2-tuple containing a list of texts and corresponding list of labels. If not passed, a test dataset will be automatically split out of the dataset.
data_dir (Optional[Path]) – Optional path to a directory used to store data for the experiment. If not given, a directory under GOBBLI_DIR will be created and used.
name (Optional[str]) – A descriptive name for the experiment, used to label directories in the filesystem. If not passed, a random name will be generated and used. The name must be unique (i.e., there should not be another experiment with the same name).
param_grid (Optional[Dict[str, List[Any]]]) – Optional grid of parameters. If passed, it should be a dictionary with keys being valid parameter names for the passed model and values being lists of parameter values. Every combination of parameter values will be tried in the experiment, and the results for the best combination will be returned. If not passed, only the model’s default parameters will be used.
task_num_cpus (int) – Number of CPUs to reserve per task.
task_num_gpus (int) – Number of GPUs to reserve per task.
worker_gobbli_dir (Optional[Path]) – Directory to use for gobbli file storage by workers.
worker_log_level (Union[int, str]) – Logging level to use for logs output by workers running training tasks.
limit (Optional[int]) – Read up to this many rows from the passed dataset. Useful for debugging.
overwrite_existing (bool) – If True, don’t fail if there’s an existing experiment in the same directory.
ignore_ray_initialized_error (bool) – If True, don’t error when a ray connection is already initialized; instead, shut it down and restart it with the passed ray_kwargs.
distributed (bool) – If True, run the ray cluster assuming workers are distributed over multiple nodes. This requires model weights for all trials to fit in the ray object store, which requires a lot of memory. If False, run the ray cluster assuming all workers are on the master node, and weights will be passed around as filepaths; an error will be thrown if a remote worker tries to run a task.
ray_kwargs (Optional[Dict[str, Any]]) – Dictionary containing keyword arguments to be passed directly to ray.init(). By default, a new ray cluster will be initialized on the current node using all available CPUs and no GPUs, but these arguments can be used to connect to a remote cluster, limit resource usage, and much more.
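The param_grid description says every combination of parameter values is tried. A minimal sketch of that expansion, using only the standard library, might look like the following. The helper expand_param_grid and the parameter names param_a and param_b are hypothetical placeholders; how gobbli expands the grid internally is not shown in this reference.

```python
from itertools import product
from typing import Any, Dict, Iterator, List


def expand_param_grid(param_grid: Dict[str, List[Any]]) -> Iterator[Dict[str, Any]]:
    """Hypothetical helper: yield one dict per parameter combination."""
    names = list(param_grid)
    # product() walks the Cartesian product of the value lists,
    # varying the last parameter fastest.
    for values in product(*(param_grid[name] for name in names)):
        yield dict(zip(names, values))


# Parameter names here are placeholders, not real model parameters.
param_grid = {"param_a": [1, 2, 3], "param_b": ["x", "y"]}
combinations = list(expand_param_grid(param_grid))
```

The number of trials grows as the product of the list lengths (3 x 2 = 6 here), which is worth keeping in mind when sizing task_num_cpus and task_num_gpus.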
-
data_dir()
- Return type
Path
- Returns
The main data directory unique to this experiment.
-
property metadata_path
- Return type
Path
- Returns
The path to the experiment’s metadata file containing information about the experiment parameters.
-
class gobbli.test.util.MockModel(data_dir=None, load_existing=False, use_gpu=False, nvidia_visible_devices='all', logger=None, **kwargs)
Bases: gobbli.model.base.BaseModel
A minimal model derived from BaseModel for testing the ABC’s logic.
Create a model.
- Parameters
data_dir (Optional[Path]) – Optional path to a directory used to store model data. If not given, a unique directory under GOBBLI_DIR will be created and used.
load_existing (bool) – If True, data_dir should be a directory that was previously used to create a model. Parameters will be loaded to match the original model, and user-specified model parameters will be ignored. If False, the data_dir must be empty if it already exists.
use_gpu (bool) – If True, use the nvidia-docker runtime (https://github.com/NVIDIA/nvidia-docker) to expose NVIDIA GPU(s) to the container. Will cause an error if the computer you’re running on doesn’t have an NVIDIA GPU and/or doesn’t have the nvidia-docker runtime installed.
nvidia_visible_devices (str) – Which GPUs to make available to the container; ignored if use_gpu is False. If not ‘all’, should be a comma-separated string: ex. 1,2.
logger (Optional[Logger]) – If passed, use this logger for logging instead of the default module-level logger.
**kwargs – Additional model-specific parameters to be passed to the model’s init() method.
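The nvidia_visible_devices parameter takes either 'all' or a comma-separated string such as 1,2. As an illustration of that format only, the sketch below validates such a string. The helper parse_visible_devices is hypothetical and not part of gobbli, and it assumes the entries are numeric GPU indices as in the example above.

```python
from typing import List, Optional


def parse_visible_devices(nvidia_visible_devices: str) -> Optional[List[int]]:
    """Hypothetical helper: return None for 'all', else a list of GPU indices."""
    if nvidia_visible_devices == "all":
        return None
    devices = []
    for part in nvidia_visible_devices.split(","):
        part = part.strip()
        # Assumption: entries are non-negative integer indices.
        if not part.isdigit():
            raise ValueError(f"Invalid GPU index: {part!r}")
        devices.append(int(part))
    return devices
```

Checking the string up front gives a clearer error than letting a malformed value reach the container runtime.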
-
build()
Perform any pre-setup that needs to be done before running the model (building Docker images, etc).
-
property class_weights_dir
The root directory used to store initial model weights (before fine-tuning). These should generally be some pretrained weights made available by model developers. This directory will NOT be created by default; models should download their weights and remove the weights directory if the download doesn’t finish properly.
Most models making use of this directory will have multiple sets of weights and will need to store those in subdirectories under this directory.
- Return type
Path
- Returns
The path to the class-wide weights directory.
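The description above asks models to remove the weights directory if a download does not finish properly. One way to follow that advice is sketched below. The helper download_weights and its download callback are hypothetical, not gobbli APIs, and the sketch assumes that an existing directory holds a complete earlier download.

```python
import shutil
from pathlib import Path
from typing import Callable


def download_weights(weights_dir: Path, download: Callable[[Path], None]) -> Path:
    """Hypothetical helper: populate weights_dir, removing it on failure."""
    if weights_dir.exists():
        # Assumption: an existing directory is a complete earlier download.
        return weights_dir
    weights_dir.mkdir(parents=True)
    try:
        download(weights_dir)
    except BaseException:
        # Remove the partial download so a later run starts clean,
        # then re-raise so the caller still sees the failure.
        shutil.rmtree(weights_dir, ignore_errors=True)
        raise
    return weights_dir
```

Catching BaseException rather than Exception also cleans up when the download is interrupted with Ctrl-C, since KeyboardInterrupt does not derive from Exception.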
-
data_dir()
- Return type
Path
- Returns
The main data directory unique to this instance of the model.
-
property info_path
- Return type
Path
- Returns
The path to the model’s info file, containing information about the model including the type of model, gobbli version it was trained using, etc.
-
init(params)
Initialize a derived model using parameters specific to that model.
- Parameters
params – A dictionary where keys are parameter names and values are parameter values.
-
property logger
- Return type
Logger
- Returns
A logger for derived models to use.
-
property metadata_path
- Return type
Path
- Returns
The path to the model’s metadata file containing model-specific parameters.
-
classmethod model_class_dir()
- Return type
Path
- Returns
A directory shared among all classes of the model.
-
property weights_dir
The directory containing weights for a specific instance of the model. This is the class weights directory by default, but subclasses might define this property to return a subdirectory based on a set of pretrained model weights.
- Return type
Path
- Returns
The instance-specific weights directory.
-
gobbli.test.util.model_test_dir(model_cls)
Return a directory to be used for models of the passed type. Helpful when the user wants data to be persisted so weights don’t have to be reloaded for each test run.
- Return type
Path