gobbli.experiment.base module

class gobbli.experiment.base.BaseExperiment(model_cls, dataset, test_dataset=None, data_dir=None, name=None, param_grid=None, task_num_cpus=1, task_num_gpus=0, worker_gobbli_dir=None, worker_log_level=30, limit=None, overwrite_existing=False, ignore_ray_initialized_error=False, distributed=False, ray_kwargs=None)

Bases: abc.ABC

Base class for all derived Experiments.

Construct an experiment.

Parameters
  • model_cls (Any) – The class of model to be used for the experiment.

  • dataset (Union[Tuple[List[str], List[str]], BaseDataset]) – Dataset to be used for the experiment. Can be either a 2-tuple containing a list of texts and a corresponding list of labels or a gobbli.dataset.base.BaseDataset.

  • test_dataset (Optional[Tuple[List[str], List[str]]]) – An optional separate dataset to be used for calculating test metrics. If passed, should be a 2-tuple containing a list of texts and corresponding list of labels. If not passed, a test dataset will be automatically split out of the dataset.

  • data_dir (Optional[Path]) – Optional path to a directory used to store data for the experiment. If not given, a directory under GOBBLI_DIR will be created and used.

  • name (Optional[str]) – A descriptive name for the experiment, used to label directories in the filesystem. If not passed, a random name will be generated and used. The name must be unique (i.e., there should not be another experiment with the same name).

  • param_grid (Optional[Dict[str, List[Any]]]) – Optional grid of parameters. If passed, it should be a dictionary with keys being valid parameter names for the passed model and values being lists of parameter values. Every combination of parameter values will be tried in the experiment, and the results for the best combination will be returned. If not passed, only the model’s default parameters will be used.

  • task_num_cpus (int) – Number of CPUs to reserve per task.

  • task_num_gpus (int) – Number of GPUs to reserve per task.

  • worker_gobbli_dir (Optional[Path]) – Directory to use for gobbli file storage by workers.

  • worker_log_level (Union[int, str]) – Logging level to use for logs output by workers running training tasks. The default, 30, corresponds to logging.WARNING.

  • limit (Optional[int]) – Read up to this many rows from the passed dataset. Useful for debugging.

  • overwrite_existing (bool) – If True, don’t fail if there’s an existing experiment in the same directory.

  • ignore_ray_initialized_error (bool) – If True, don’t error when a ray connection is already initialized; instead, shut it down and restart it with the passed ray_kwargs.

  • distributed (bool) – If True, run the ray cluster assuming workers are distributed over multiple nodes. This requires model weights for all trials to fit in the ray object store, which requires a lot of memory. If False, run the ray cluster assuming all workers are on the master node, and weights will be passed around as filepaths; an error will be thrown if a remote worker tries to run a task.

  • ray_kwargs (Optional[Dict[str, Any]]) – Dictionary containing keyword arguments to be passed directly to ray.init(). By default, a new ray cluster will be initialized on the current node using all available CPUs and no GPUs, but these arguments can be used to connect to a remote cluster, limit resource usage, and much more.
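
The statement that every combination of values in param_grid is tried can be illustrated with a short sketch. This is not gobbli's implementation; `expand_param_grid` is a hypothetical helper and the parameter names `lr` and `dim` are made up for illustration. It only shows how a grid of this shape expands into individual parameter sets.

```python
from itertools import product
from typing import Any, Dict, Iterator, List


def expand_param_grid(param_grid: Dict[str, List[Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per combination of parameter values in the grid."""
    names = list(param_grid)
    # product() walks the Cartesian product of the value lists, in key order.
    for values in product(*(param_grid[name] for name in names)):
        yield dict(zip(names, values))


# Hypothetical parameter names, chosen only for illustration.
grid = {"lr": [0.1, 0.01], "dim": [50, 100, 200]}
combinations = list(expand_param_grid(grid))
print(len(combinations))  # 6 (2 values for lr times 3 values for dim)
print(combinations[0])    # {'lr': 0.1, 'dim': 50}
```

The number of trials grows multiplicatively with each added parameter, which is worth keeping in mind when choosing task_num_cpus and task_num_gpus.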

data_dir()
Return type

Path

Returns

The main data directory unique to this experiment.

property metadata_path
Return type

Path

Returns

The path to the experiment’s metadata file containing information about the experiment parameters.

gobbli.experiment.base.experiment_dir()
Return type

Path

gobbli.experiment.base.get_worker_ip()

Determine the IP address of the current ray worker.

Return type

str

Returns

A string containing the IP address.

gobbli.experiment.base.init_gpu_config()

Determine the GPU configuration from the current ray environment on a worker.

Returns

Whether a GPU should be used, and a comma-separated string containing the IDs of the GPUs that should be used.

Return type

2-tuple
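
The shape of this return value can be shown with a minimal sketch. `gpu_config_from_ids` is a hypothetical helper, not part of gobbli, and it assumes the GPU IDs assigned to the worker are already available as a list (on a real ray worker they could be obtained with `ray.get_gpu_ids()`).

```python
from typing import List, Tuple


def gpu_config_from_ids(gpu_ids: List[int]) -> Tuple[bool, str]:
    """Return (use_gpu, comma-separated GPU IDs) for a list of assigned GPU IDs."""
    use_gpu = len(gpu_ids) > 0
    return use_gpu, ",".join(str(gpu_id) for gpu_id in gpu_ids)


print(gpu_config_from_ids([]))      # (False, '')
print(gpu_config_from_ids([0, 2]))  # (True, '0,2')
```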

gobbli.experiment.base.init_worker_env(gobbli_dir=None, log_level=30)

Initialize environment on a ray worker.

Parameters
  • gobbli_dir (Optional[Path]) – Used as the value of the GOBBLI_DIR environment variable; determines where gobbli data is stored on the worker's filesystem.

  • log_level (Union[int, str]) – Level for logging coming from the worker.

Return type

Logger
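
As a rough sketch of what initializing a worker environment of this kind involves, the following sets GOBBLI_DIR and returns a logger at the requested level. It is an assumption-based illustration, not gobbli's code: `sketch_init_worker_env` and the logger name `worker_sketch` are hypothetical, and the real function may do more.

```python
import logging
import os
from pathlib import Path
from typing import Optional, Union


def sketch_init_worker_env(
    gobbli_dir: Optional[Path] = None,
    log_level: Union[int, str] = logging.WARNING,  # logging.WARNING == 30
) -> logging.Logger:
    """Set GOBBLI_DIR (if given) and return a logger at the requested level."""
    if gobbli_dir is not None:
        os.environ["GOBBLI_DIR"] = str(gobbli_dir)
    logger = logging.getLogger("worker_sketch")  # hypothetical logger name
    logger.setLevel(log_level)  # accepts an int or a level name such as "INFO"
    return logger


logger = sketch_init_worker_env(Path("/tmp/gobbli_worker"), log_level="INFO")
print(logger.level == logging.INFO)  # True
```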