Data Types and File Types

Vitessce supports several data types, each of which denotes, in an abstract sense, the kind of observations contained in a file (e.g., matrix, dataframe, image). For each data type, Vitessce may support multiple file types (set via `datasets[].files[].fileType`), each of which denotes a specific schema and file format that Vitessce knows how to read.

For example, a file that conforms to the `obsEmbedding` data type must contain embedding coordinates for each cell (assuming each observation represents a cell), computed via a method such as t-SNE or UMAP. Depending on which file format is more convenient, you may choose either the `obsEmbedding.csv` or the `obsEmbedding.anndata.zarr` file type.
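
As a rough illustration, the two file definitions below describe the same `obsEmbedding` data type using the two file types named above. They are written as plain Python dictionaries that mirror the JSON configuration. The URLs, the dataset `uid`, the column names, and the keys under `options` and `coordinationValues` are assumptions made for this sketch and are not taken from this page.

```python
import json

# Hypothetical file definition using the CSV-based file type.
# The URL, column names, and option keys are illustrative assumptions.
embedding_csv = {
    "fileType": "obsEmbedding.csv",
    "url": "https://example.com/cells.umap.csv",
    "coordinationValues": {"obsType": "cell", "embeddingType": "UMAP"},
    "options": {"obsIndex": "cell_id", "obsEmbedding": ["UMAP_1", "UMAP_2"]},
}

# Hypothetical file definition using the AnnData-Zarr-based file type.
embedding_anndata = {
    "fileType": "obsEmbedding.anndata.zarr",
    "url": "https://example.com/cells.h5ad.zarr",
    "coordinationValues": {"obsType": "cell", "embeddingType": "UMAP"},
    "options": {"path": "obsm/X_umap"},
}


def data_type_of(file_def):
    """Return the text before the first dot of the file type.

    For the two file types used here, that prefix is the data type name.
    """
    return file_def["fileType"].split(".", 1)[0]


# Either definition can be placed under datasets[].files[] in the configuration.
config_fragment = {"datasets": [{"uid": "A", "files": [embedding_csv]}]}
print(json.dumps(config_fragment, indent=2))
```

Both definitions resolve to the same data type, so swapping one for the other changes only how the coordinates are read, not what they represent.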

Data Types and File Types

| Data Type | Description | File Types | Convert from |
| --- | --- | --- | --- |
| `obsEmbedding` | Per-observation 2D embedding coordinates. Typically used to store dimensionality reductions performed on cell-by-biomarker expression matrices. | `obsEmbedding.csv`, `obsEmbedding.anndata.zarr`, `obsEmbedding.mudata.zarr`, `anndata.zarr` | AnnData, Loom |
| `obsPoints` | Spatially-resolved 2D coordinates without a specified size. For example, individual RNA molecule x-y coordinates measured by FISH. (Supported by `spatialBeta` view.) | `obsPoints.csv`, `obsPoints.anndata.zarr`, `obsPoints.mudata.zarr`, `obsPoints.spatialdata.zarr`, `anndata.zarr`, `spatialdata.zarr` | AnnData, Loom |
| `obsSpots` | Spatially-resolved 2D coordinates with a specified size. For example, spot-based or bead-based spatial transcriptomics such as from 10x Visium. (Supported by `spatialBeta` view.) | `obsSpots.csv`, `obsSpots.anndata.zarr`, `obsSpots.mudata.zarr`, `obsSpots.spatialdata.zarr`, `anndata.zarr`, `spatialdata.zarr` | AnnData, Loom |
| `obsSegmentations` | Per-observation segmentation polygons or bitmasks. For example, cell or organelle segmentations. | `obsSegmentations.ome-tiff`, `obsSegmentations.ome-zarr`, `obsSegmentations.json`, `obsSegmentations.anndata.zarr`, `obsSegmentations.mudata.zarr`, `obsSegmentations.raster.json`, `labels.spatialdata.zarr`, `anndata.zarr`, `spatialdata.zarr` | AnnData, Loom |
| `obsLocations` | 2D coordinates representing precise locations corresponding to segmentations. For example, cell segmentation centroid coordinates to support lasso selection interactions. | `obsLocations.csv`, `obsLocations.anndata.zarr`, `obsLocations.mudata.zarr`, `anndata.zarr` | AnnData, Loom |
| `obsSets` | Lists or hierarchies of sets of observations. For example, cell type annotations or unsupervised clustering results. | `obsSets.json`, `obsSets.csv`, `obsSets.anndata.zarr`, `obsSets.mudata.zarr`, `obsSets.spatialdata.zarr`, `anndata.zarr`, `spatialdata.zarr` | AnnData, Loom |
| `obsLabels` | Per-observation string labels. For example, alternate cell identifiers. | `obsLabels.csv`, `obsLabels.anndata.zarr`, `obsLabels.mudata.zarr`, `anndata.zarr` | AnnData, Loom |
| `image` | Multi-scale multiplexed imaging data, including OME-TIFF files and OME-NGFF Zarr stores. | `image.ome-zarr`, `image.ome-tiff`, `image.spatialdata.zarr`, `spatialdata.zarr` | TIFF, Proprietary Formats |
| `obsFeatureMatrix` | Observation-by-feature matrix. Typically used to store cell-by-gene expression matrices. | `obsFeatureMatrix.csv`, `obsFeatureMatrix.anndata.zarr`, `obsFeatureMatrix.mudata.zarr`, `obsFeatureMatrix.spatialdata.zarr`, `anndata.zarr`, `spatialdata.zarr` | AnnData, Loom |
| `featureLabels` | Per-feature string labels. For example, alternate gene identifiers. | `featureLabels.csv`, `featureLabels.anndata.zarr`, `featureLabels.mudata.zarr`, `anndata.zarr` | AnnData, Loom |
| `genomic-profiles` | Genomic profiles, such as ATAC-seq profiles. | `genomic-profiles.zarr` | SnapATAC |
| `sampleEdges` | Tuples of (observationId, sampleId) to map observations to samples. | `sampleEdges.anndata.zarr` | AnnData |
| `sampleSets` | Lists or hierarchies of sets of samples. | `sampleSets.csv`, `sampleSets.anndata.zarr` | AnnData |
| `comparisonMetadata` | Comparison metadata. | `comparisonMetadata.anndata.zarr` | AnnData |
| `featureStats` | Per-feature statistics. For example, differential expression test results. | `comparativeFeatureStats.anndata.zarr` | AnnData |
| `featureSetStats` | Per-feature-set statistics. For example, gene set enrichment test results. | `comparativeFeatureSetStats.anndata.zarr` | AnnData |
| `obsSetStats` | Per-observation-set statistics. For example, cell type composition analysis results. | `comparativeObsSetStats.anndata.zarr` | AnnData |

Joint File Types

A joint file type is a pseudo-file type (pseudo in the sense that it does not correspond to any single data type). It allows a single file definition (and therefore a single URL) in the Vitessce configuration to expand into multiple file definitions of the atomic (i.e., non-joint) file types listed in the table above.

The motivation is that one file may store information corresponding to more than one data type. For instance, an AnnData file may store not only the observation-by-feature matrix (`adata.X`) but also multiple sets of 2D embedding coordinates (`adata.obsm['X_umap']` and `adata.obsm['X_pca']`), spatial coordinates (`adata.obsm['X_spatial']`), and cell type annotations (e.g., `adata.obs['cell_type']`). Rather than defining five different files (corresponding to the atomic file types `obsFeatureMatrix.anndata.zarr`, `obsEmbedding.anndata.zarr`, `obsEmbedding.anndata.zarr`, `obsLocations.anndata.zarr`, and `obsSets.anndata.zarr`, respectively), a single `anndata.zarr` joint file definition can be used instead.
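
The AnnData example above can be sketched as follows: one `anndata.zarr` joint file definition is turned into the five atomic file definitions it stands for. This is a minimal sketch, not Vitessce's own expansion code. The function name `expand_anndata_zarr`, the URL, and the layout of the `options` object (keys named after data types, with `path` entries pointing into the store) are assumptions chosen for illustration.

```python
def expand_anndata_zarr(joint_def):
    """Expand a hypothetical anndata.zarr joint definition into atomic ones.

    Assumes options are keyed by data type name (an illustrative layout).
    """
    url = joint_def["url"]
    options = joint_def.get("options", {})
    atomic = []

    if "obsFeatureMatrix" in options:
        atomic.append({
            "fileType": "obsFeatureMatrix.anndata.zarr",
            "url": url,
            "options": options["obsFeatureMatrix"],
        })
    # One atomic definition per embedding (e.g., UMAP and PCA).
    for embedding in options.get("obsEmbedding", []):
        atomic.append({
            "fileType": "obsEmbedding.anndata.zarr",
            "url": url,
            "options": embedding,
        })
    if "obsLocations" in options:
        atomic.append({
            "fileType": "obsLocations.anndata.zarr",
            "url": url,
            "options": options["obsLocations"],
        })
    if "obsSets" in options:
        atomic.append({
            "fileType": "obsSets.anndata.zarr",
            "url": url,
            "options": options["obsSets"],
        })
    return atomic


# Hypothetical joint file definition: a single URL covering several data types.
joint = {
    "fileType": "anndata.zarr",
    "url": "https://example.com/cells.h5ad.zarr",
    "options": {
        "obsFeatureMatrix": {"path": "X"},
        "obsEmbedding": [{"path": "obsm/X_umap"}, {"path": "obsm/X_pca"}],
        "obsLocations": {"path": "obsm/X_spatial"},
        "obsSets": [{"name": "Cell Type", "path": "obs/cell_type"}],
    },
}

atomic_defs = expand_anndata_zarr(joint)
for file_def in atomic_defs:
    print(file_def["fileType"], file_def["options"])
```

Every atomic definition produced this way shares the URL of the joint definition, which is what lets the configuration mention that URL only once.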

Note that joint file type expansion is currently not performed recursively (i.e., a joint file type expansion function must return a list of atomic file definitions).
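
One way to picture the non-recursive rule is a single expansion pass that never re-expands its own output. The sketch below assumes a hypothetical registry `EXPANDERS` mapping joint file types to expansion functions, and a toy expansion function `expand_toy_joint`. Raising an error when an expansion function returns another joint file type is a design choice of this sketch, not documented Vitessce behavior.

```python
def expand_toy_joint(file_def):
    """Toy expansion function (hypothetical): returns atomic definitions only."""
    return [
        {"fileType": "obsFeatureMatrix.anndata.zarr", "url": file_def["url"]},
        {"fileType": "obsSets.anndata.zarr", "url": file_def["url"]},
    ]


# Hypothetical registry: joint file type -> expansion function.
EXPANDERS = {"anndata.zarr": expand_toy_joint}


def expand_files(files, expanders):
    """Apply joint file type expansion exactly once (no recursion)."""
    expanded = []
    for file_def in files:
        expander = expanders.get(file_def["fileType"])
        if expander is None:
            # Not a joint file type: keep the atomic definition as-is.
            expanded.append(file_def)
            continue
        for atomic_def in expander(file_def):
            # The output is not expanded again, so it must already be atomic.
            if atomic_def["fileType"] in expanders:
                raise ValueError(
                    "expansion returned joint file type "
                    + repr(atomic_def["fileType"])
                )
            expanded.append(atomic_def)
    return expanded


files = [
    {"fileType": "anndata.zarr", "url": "https://example.com/cells.h5ad.zarr"},
    {"fileType": "obsSets.json", "url": "https://example.com/sets.json"},
]
expanded_files = expand_files(files, EXPANDERS)
print([f["fileType"] for f in expanded_files])
```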
