Definition of terms and their associated context in Aquarium
Tasks in Aquarium map to common computer vision task types.
Currently supported task types in Aquarium include:
- 2D Object Detection
- 3D Object Detection
- Semantic Segmentation
- Instance Segmentation
Different task types (and even subtypes within a task type) often require different data formats. Configuring various components of Aquarium may require declaring the task type so that Aquarium can correctly support the required data.
Projects are the highest level of organization for data in Aquarium. Projects allow you to organize your data and collaborate in one central location. Each project may contain one or more datasets.
Every project in Aquarium has:
- A globally unique name
- A single task type (e.g. 2D object detection, semantic segmentation, etc.)
- A single ontology
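As an illustration only, the three project properties above can be modeled as a small record. This is a minimal sketch using plain Python dataclasses. `TaskType`, `Project`, and `ProjectRegistry` are hypothetical names invented for this example and are not part of Aquarium's client library, and a plain list of class names stands in for the ontology.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class TaskType(Enum):
    # The four task types listed in this glossary.
    OBJECT_DETECTION_2D = "2D Object Detection"
    OBJECT_DETECTION_3D = "3D Object Detection"
    SEMANTIC_SEGMENTATION = "Semantic Segmentation"
    INSTANCE_SEGMENTATION = "Instance Segmentation"


@dataclass
class Project:
    name: str             # must be globally unique
    task_type: TaskType   # exactly one task type per project
    ontology: List[str]   # exactly one ontology (simplified to class names)
    dataset_names: List[str] = field(default_factory=list)  # one or more datasets


class ProjectRegistry:
    """Hypothetical registry that enforces globally unique project names."""

    def __init__(self) -> None:
        self._projects: Dict[str, Project] = {}

    def create(self, project: Project) -> Project:
        # Reject a second project that reuses an existing name.
        if project.name in self._projects:
            raise ValueError(f"project name already in use: {project.name!r}")
        self._projects[project.name] = project
        return project
```

In this sketch, creating a second project named `"street-scenes"` raises a `ValueError`, which mirrors the uniqueness requirement on project names.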
An ontology (also referred to as a taxonomy or label class map) enumerates entities and their relationships for a given machine learning task. An ontology in Aquarium must explicitly define every label class and every inference class used within a given project.
Ontologies in Aquarium allow you to define display names, colors, relationships between training and inference classes, and many other metadata fields. These definitions are used throughout the platform to display data, calculate metrics, and otherwise support Aquarium's various workflows.
Note that ontologies are assigned at the project level and apply to all datasets within a given project.
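To make the idea of an ontology more concrete, here is a minimal sketch of a class map that stores a display name, a color, and an optional mapping from a training (label) class to a differently named inference class. `ClassEntry` and `Ontology` are hypothetical names for this example and do not reflect Aquarium's actual API or storage format. The two `undefined_*` helpers illustrate the requirement that every class in use be defined explicitly.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class ClassEntry:
    name: str                             # class name used by labels
    display_name: str                     # human-readable name for display
    color: str                            # display color, e.g. a hex string
    inference_name: Optional[str] = None  # inference class, if named differently


class Ontology:
    """Hypothetical project-level class map shared by all datasets."""

    def __init__(self, entries: Iterable[ClassEntry]) -> None:
        self._by_name: Dict[str, ClassEntry] = {}
        for entry in entries:
            if entry.name in self._by_name:
                raise ValueError(f"duplicate class: {entry.name!r}")
            self._by_name[entry.name] = entry

    def undefined_label_classes(self, names: Iterable[str]) -> List[str]:
        # Label classes that appear in the data but not in the ontology.
        return sorted({n for n in names if n not in self._by_name})

    def undefined_inference_classes(self, names: Iterable[str]) -> List[str]:
        # An entry's inference class defaults to its label class name.
        known = {e.inference_name or e.name for e in self._by_name.values()}
        return sorted({n for n in names if n not in known})


ontology = Ontology([
    ClassEntry("car", "Car", "#ff0000"),
    ClassEntry("truck", "Truck", "#00ff00", inference_name="vehicle"),
])
missing = ontology.undefined_label_classes(["car", "bus"])  # "bus" is not defined
```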
Datasets are the core organizational construct within Aquarium. In the most basic sense, a dataset is a grouping of imagery, metadata and annotations.
Aquarium supports two primary types of datasets, each of which has distinct capabilities.
- Labeled datasets are the most common type of dataset in Aquarium and are made up of imagery, metadata and ground truth annotations. Examples of labeled datasets may include your training, validation and test sets.
  - Labeled datasets allow you to assess data quality, evaluate model performance, run data curation workflows, and set up production-ready ML processes.
- Unlabeled datasets are a more specialized dataset type and are made up of imagery and proposed regions of interest (usually inferences generated by a model).
  - Unlabeled datasets are used as a part of data curation and sampling workflows in Aquarium. They serve as the search space for rare or interesting scenarios found within your labeled datasets.
  - Unlabeled datasets are typically much larger than labeled datasets.
Datasets in Aquarium are intended to be modified over time (for example, as new data is acquired and added to the training or test sets). All of these modifications are tracked in Aquarium, enabling you to view and interact with the various states of your dataset over time.
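One way to picture tracked modifications is an append-only history of snapshots. The sketch below is an assumption made for illustration, not a description of how Aquarium stores datasets internally. `DatasetKind` and `Dataset` are hypothetical names, and each snapshot is simplified to a set of frame IDs.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import FrozenSet, Iterable, List


class DatasetKind(Enum):
    LABELED = "labeled"      # imagery, metadata and ground truth annotations
    UNLABELED = "unlabeled"  # imagery and proposed regions of interest


@dataclass
class Dataset:
    name: str
    kind: DatasetKind
    # Each entry is an immutable snapshot of the frame IDs at one point in time.
    # Version 0 is the empty dataset.
    history: List[FrozenSet[str]] = field(default_factory=lambda: [frozenset()])

    def add_frames(self, frame_ids: Iterable[str]) -> int:
        """Record a new snapshot and return its version index."""
        self.history.append(self.history[-1] | frozenset(frame_ids))
        return len(self.history) - 1

    def frames_at(self, version: int) -> FrozenSet[str]:
        """Return the frame IDs as they were at an earlier version."""
        return self.history[version]

    @property
    def current(self) -> FrozenSet[str]:
        return self.history[-1]


train = Dataset("train", DatasetKind.LABELED)
v1 = train.add_frames(["frame_0001", "frame_0002"])
v2 = train.add_frames(["frame_0003"])
```

Because earlier snapshots are never overwritten, `train.frames_at(v1)` still returns the two original frames after the third one is added.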
A frame is the atomic unit of data within a dataset.
- In the simplest case, a frame is made up of an electro-optical image (as a .PNG, .JPEG or any other common format) and its structured metadata (as JSON).
- In more advanced cases, a frame may be made up of multiple images (a primary image plus context imagery or data from other sensor types) and its structured metadata (as JSON or other formats). This more complex structure is common in autonomous driving tasks that use LIDAR, radar or other sensor types, and in robotics tasks.
- Labels (and associated metadata) and inferences (and associated metadata) are associated with frames.
Frames in Aquarium are intended to be modified over time (for example, as labeling providers return new labels, or as new metadata becomes available). All of these edits are tracked in Aquarium, enabling you to view and interact with the various states of your dataset over time.
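The structure of a frame described above can be sketched as a small data model. This is an illustrative assumption, not Aquarium's upload format. `Annotation` and `Frame` are hypothetical names, the file paths are made up, and the geometry is simplified to a 2D box.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Annotation:
    class_name: str                 # should be a class defined in the ontology
    geometry: Dict[str, float]      # simplified; the real shape depends on task type
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Frame:
    frame_id: str
    primary_image: str                                          # path or URL
    context_data: Dict[str, str] = field(default_factory=dict)  # sensor name -> path or URL
    metadata: Dict[str, Any] = field(default_factory=dict)      # structured metadata
    labels: List[Annotation] = field(default_factory=list)      # ground truth
    inferences: List[Annotation] = field(default_factory=list)  # model output

    def metadata_json(self) -> str:
        # Serialize the structured metadata as JSON with stable key order.
        return json.dumps(self.metadata, sort_keys=True)


frame = Frame(
    frame_id="frame_0001",
    primary_image="images/frame_0001.png",
    context_data={"lidar": "pointclouds/frame_0001.bin"},
    metadata={"weather": "rain", "camera": "front"},
)
# Labels and inferences attach to the frame and can be added later,
# for example when a labeling provider returns new labels.
frame.labels.append(
    Annotation("car", {"left": 10, "top": 20, "width": 50, "height": 30})
)
```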