anndata 0.13: layers meets X, zarr v3, and accessors
We’re thrilled to announce that anndata 0.13 is out now!
We’ll highlight some of the major changes here, but be sure to check out the changelog for the full list of changes, including breaking changes.
X is now a layer
The biggest conceptual change in 0.13 is that the main expression matrix X is now stored as a layer, internally layers[None].
For most code this is invisible since you still access/read/write adata.X exactly as before.
But treating X as just another layer moves us toward encouraging users to be explicit about what X actually contains, while lightening our code burden and unifying behavior across elements.
To this end, our support for accessor objects (see below) will hopefully make it easier for packages to declare which layers they expect to produce and consume.
One major breaking change you will notice here: any code that assumes every key of layers is a string now needs to handle None as a key.
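To make this breaking change concrete, here is a minimal pure-Python sketch of code that tolerates None among layer keys. It assumes only what is stated above (None can now appear as a key of layers, standing for X); a plain dict stands in for adata.layers, and layer_label and named_layer_keys are hypothetical helpers, not part of anndata.

```python
# Hypothetical helpers; a plain dict stands in for adata.layers, which (per
# the 0.13 notes) may now contain None as the key for X.
layers = {None: "X matrix", "counts": "raw counts", "log1p": "normalized"}


def layer_label(key):
    """Return a printable name for a layer key, showing None as "X"."""
    return "X" if key is None else key


def named_layer_keys(layers):
    """Return the string keys in sorted order, skipping the None key for X."""
    return sorted(key for key in layers if key is not None)


# Code that assumed all-string keys can break: sorting str and None together
# raises TypeError.
try:
    sorted(layers)
except TypeError:
    print("sorted(layers) raises TypeError once None is a key")

print(named_layer_keys(layers))          # ['counts', 'log1p']
print([layer_label(k) for k in layers])  # ['X', 'counts', 'log1p']
```

Filtering out None (or mapping it to a label) at the boundary keeps the rest of the code free to assume string keys.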
Copy-on-write for X
Because X is now a layer, assigning to a view’s X no longer mutates the parent’s X.
This now matches how every other element (obsm, var, etc.) already behaved:
view = adata[:100]
view.X = 0  # no longer modifies adata.X; instead, `view` is materialized as an independent, fully in-memory `AnnData` object
If you were relying on the old propagating behavior, operate on the original object directly instead (e.g., adata.X[:100] = 0).
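The copy-on-write rule can be illustrated with a toy model in plain Python. ToyData and ToyView are hypothetical classes that only mimic the behavior described above (assigning to a view’s X materializes an independent object and leaves the parent alone); they are not how anndata implements views, and the setter only handles scalar assignment.

```python
class ToyData:
    """Toy container owning a matrix stored as a list of row lists."""

    def __init__(self, X):
        self.X = X

    def __getitem__(self, rows):
        # Slicing returns a view that shares the parent's rows (no copy yet).
        return ToyView(self, rows)


class ToyView:
    """Toy view with copy-on-write semantics for X (hypothetical)."""

    def __init__(self, parent, rows):
        self._parent = parent
        self._rows = rows
        self._own_X = None  # filled in the first time X is assigned

    @property
    def is_view(self):
        return self._own_X is None

    @property
    def X(self):
        if self._own_X is not None:
            return self._own_X
        return self._parent.X[self._rows]

    @X.setter
    def X(self, value):
        # Copy-on-write: build private rows for this object and broadcast the
        # scalar into them; the parent's rows are never touched.
        self._own_X = [[value] * len(row) for row in self._parent.X[self._rows]]


parent = ToyData([[1, 1], [2, 2], [3, 3]])
view = parent[:2]
view.X = 0
print(view.X, view.is_view)  # [[0, 0], [0, 0]] False
print(parent.X)              # [[1, 1], [2, 2], [3, 3]]

# To change the parent, write through the parent itself.
for row in parent.X[:2]:
    row[:] = [0] * len(row)
print(parent.X)              # [[0, 0], [0, 0], [3, 3]]
```

The design mirrors the trade-off in the release: a view stays cheap until someone writes to it, and a write never leaks back into the object it came from.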
Zarr v3 by default
In 0.12 we introduced support for the zarr v3 file format as well as version 3 of the zarr-python package.
In 0.13 the v3 file format becomes the default and support for zarr-python<3 (i.e., the Python package) is dropped entirely. You can, however, still opt in to writing v2 data.
Concretely:
- anndata.settings.zarr_write_format now defaults to 3, so all write_zarr calls in which the argument is a string file path will write v3.
- Automatic sharding is on by default (anndata.settings.auto_shard_zarr_v3, targeting ~1 GB uncompressed shards), so you get far fewer files per store with no manual tuning.
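For a rough sense of why sharding cuts the file count, the following back-of-the-envelope sketch compares one file per chunk with one file per ~1 GB shard for a 1-D array, such as the stored values of a sparse matrix. The chunk length, the 4-byte item size, and treating 1 GB as 10^9 uncompressed bytes are illustrative assumptions; metadata files are ignored, and this is not anndata’s actual auto-sharding heuristic.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b."""
    return -(-a // b)


def chunk_file_count(n_elements, chunk_len):
    """Unsharded 1-D array: one stored object (file) per chunk."""
    return ceil_div(n_elements, chunk_len)


def shard_file_count(n_elements, chunk_len, itemsize, target_bytes):
    """Sharded 1-D array: one file per shard of at most ~target_bytes.

    A shard holds a whole number of chunks, so the shard length is rounded
    down to a multiple of chunk_len (keeping at least one chunk per shard).
    """
    chunks_per_shard = max(1, target_bytes // (itemsize * chunk_len))
    return ceil_div(n_elements, chunks_per_shard * chunk_len)


n_values = 1_000_000_000  # e.g. nonzeros in a large sparse matrix (assumed)
chunk_len = 100_000       # hypothetical chunk length, in elements
print(chunk_file_count(n_values, chunk_len))                    # 10000
print(shard_file_count(n_values, chunk_len, 4, 1_000_000_000))  # 4
```

Under these assumptions, roughly ten thousand chunk files collapse into four shard files, which is the kind of reduction that helps on file systems that struggle with many small files.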
If you want a refresher on what zarr v3 buys you (sharding to avoid file-system slowdowns, concurrent/parallel I/O, tips and tricks, and Rust-accelerated tooling like zarrs-python), the zarr v3 guide and our 0.12 post cover it in depth.
If zarrs is installed in your environment, the bulk read/write functions (anndata.read_zarr and AnnData.write_zarr) will use it.
Accessors: object-independent array references
0.13 introduces the new anndata.acc module for referring to a vector or array by location without binding it to a specific object.
The use cases for this feature abound: plotting, validation, or function arguments for things like “the leiden column” or “the first PCA component”.
Until now you’d pass these around as plain strings inlined into function calls.
Accessors now make these references first-class, fully supported objects that your package can export to users.
You build references through the global A accessor:
from anndata.acc import A
A.X[:, "gene-3"] # expression of one gene
A.obsm["pca"][:, 0] # first PCA component
A.obs["louvain"] # a cell-level annotation
These AdRef objects are independent of any AnnData instance.
Furthermore, you can inspect them (.dims, .idx, .acc), compare them for equality, and use them as dictionary keys.
To resolve a reference against an actual object, either index the object with it or test membership:
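Why equality and hashing matter: once a reference is a value rather than a string, two independently built references to the same location are interchangeable. The sketch below mimics that with a frozen dataclass; Ref and its attr/key/col fields are hypothetical and only loosely mirror the A.obsm["pca"][:, 0]-style references shown above, not the real AdRef attributes (.dims, .idx, .acc).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Toy, object-independent reference: records where data lives, not the data.

    Hypothetical stand-in for anndata.acc's AdRef, for illustration only.
    """

    attr: str                     # e.g. "X", "obs", "obsm"
    key: str | None = None        # column or matrix name (None for X)
    col: int | str | None = None  # second-axis selector, e.g. a PC or gene


pc1 = Ref("obsm", "pca", 0)       # roughly A.obsm["pca"][:, 0]
gene = Ref("X", None, "gene-3")   # roughly A.X[:, "gene-3"]

# Equal references built independently compare (and hash) equal...
assert pc1 == Ref("obsm", "pca", 0)
assert hash(pc1) == hash(Ref("obsm", "pca", 0))

# ...so they can key a dict, e.g. axis labels for a plotting function.
labels = {pc1: "PC1", gene: "gene-3 expression", Ref("obs", "louvain"): "cluster"}
print(labels[Ref("obsm", "pca", 0)])  # PC1
```

Freezing the dataclass is what makes hashing safe: a reference used as a dict key cannot be mutated afterwards.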
ref = A.obs["louvain"]
adata[ref] # indexes the anndata object, materializing an array
A.varm["PCs"][:, 30] in adata  # answers "does this reference exist here?" without loading the data
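The distinction between membership (metadata only) and indexing (materializes data) can be sketched with a toy lazy container. Ref and LazyToyData are hypothetical; the load counter just makes visible the claim above that a membership test need not load the data. How anndata actually resolves AdRef objects may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Toy reference (hypothetical, not anndata's AdRef)."""

    attr: str
    key: str
    col: int | None = None  # column of a 2-D element, or None for the whole element


class LazyToyData:
    """Toy object that knows element shapes up front but loads values lazily."""

    def __init__(self, loaders, shapes):
        self._loaders = loaders  # (attr, key) -> zero-argument loader function
        self._shapes = shapes    # (attr, key) -> shape tuple (metadata only)
        self.loads = 0           # counts how often values were actually loaded

    def __contains__(self, ref):
        # Answer from metadata alone: the element exists and any requested
        # column lies within its second dimension.
        shape = self._shapes.get((ref.attr, ref.key))
        if shape is None:
            return False
        return ref.col is None or (len(shape) == 2 and 0 <= ref.col < shape[1])

    def __getitem__(self, ref):
        if ref not in self:
            raise KeyError(ref)
        self.loads += 1
        values = self._loaders[(ref.attr, ref.key)]()
        return values if ref.col is None else [row[ref.col] for row in values]


toy = LazyToyData(
    loaders={
        ("obs", "louvain"): lambda: ["0", "1", "0"],
        ("varm", "PCs"): lambda: [[0.1] * 50, [0.2] * 50],
    },
    shapes={("obs", "louvain"): (3,), ("varm", "PCs"): (2, 50)},
)
print(Ref("varm", "PCs", 30) in toy, toy.loads)  # True 0
print(Ref("varm", "PCs", 99) in toy)             # False
print(toy[Ref("obs", "louvain")], toy.loads)     # ['0', '1', '0'] 1
```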
Accessors also work with MuData and are subclassable if you need custom behavior.
If you are interested in schemas for AnnData, please get in touch: we support rudimentary serialization of accessors and have a schema for this output, but it could definitely be richer.
Check it out and open an issue if you are interested!
— The scverse core team