Database
- class pyskani.Database(path=None, *, compression=125, marker_compression=1000, k=15)
A database storing sketched genomes.
The database contains two sketch collections with different compression levels: marker sketches, which are heavily compressed and always kept in memory, and genome sketches, which take more memory but may be stored in an external file.
- flush()
Flush the database.
This does nothing for a database loaded in memory. For a database stored in a folder, this saves the markers into a file named markers.bin.
- load()
Load a database from a folder containing sketches.
The sketches will be loaded in memory to speed up querying. To reduce memory consumption and load sketches lazily from the folder, use Database.open.
- Parameters:
path (str, bytes, or os.PathLike) – The path to the folder containing the sketched references.
- Returns:
Database – A database with all sketches loaded in memory.
- Raises:
OSError – When the files from the folder could not be opened.
ValueError – When the sketches could not be deserialized.
- open()
Open a database from a folder containing sketches.
The marker sketches will be loaded in memory, but the genome sketches will be loaded only when needed during querying. To speed up querying by pre-fetching sketches, use Database.load.
- Parameters:
path (str, bytes, or os.PathLike) – The path to the folder containing the sketched references.
- Returns:
Database – A database with only markers loaded in memory.
- Raises:
OSError – When the files from the folder could not be opened.
ValueError – When the markers could not be deserialized.
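The choice between the two loaders can be wrapped in a small helper. This is a minimal sketch rather than part of pyskani: `open_reference_database` and its `lazy` flag are hypothetical names, and it assumes that `Database.load` and `Database.open` are called on the class with the folder path, as their parameter lists suggest.

```python
def open_reference_database(path, lazy=True):
    """Return a pyskani Database read from the folder `path`.

    Hypothetical helper: `lazy` is not a pyskani parameter.
    """
    # Imported here so the helper can be defined even if pyskani is missing.
    import pyskani

    if lazy:
        # Only markers are held in memory; genome sketches are read on demand.
        return pyskani.Database.open(path)
    # Every sketch is read up front, trading memory for faster queries.
    return pyskani.Database.load(path)
```

Both calls may raise OSError or ValueError as documented above, so a caller that accepts user-supplied folders would probably want to catch those two exceptions.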
- query(name, *contigs, seed=True, learned_ani=None, median=False, robust=False)
Query the database with a genome.
- Parameters:
name (str) – The name of the query genome.
contigs (str, bytes, bytearray or memoryview) – The contigs of the query genome.
- Keyword Arguments:
seed (bool) – Compute seed positions while sketching the query.
learned_ani (bool or None) – Use a regression model trained on MAGs to compute ANI. Pass True or False to force enabling or disabling the model, respectively. By default, the regression model is enabled when the sketch compression factor is >=70.
median (bool) – Estimate median identity instead of average identity. Disabled by default.
robust (bool) – Estimate the mean after trimming off the 10%/90% quantiles. Disabled by default.
- Returns:
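The keyword arguments of query can be illustrated without the library. The sketch below is pure Python and does not reproduce pyskani's internals: `resolve_learned_ani` and `trimmed_mean` are hypothetical helpers, the list of per-region identities is made up for the example, and the exact quantile convention used for trimming is an assumption.

```python
from statistics import mean, median


def resolve_learned_ani(learned_ani, compression):
    """Mimic the documented default: None means 'enable when compression >= 70'."""
    if learned_ani is None:
        return compression >= 70
    # An explicit True or False always wins over the default rule.
    return bool(learned_ani)


def trimmed_mean(values, fraction=0.10):
    """Mean of a non-empty list after dropping the lowest and highest `fraction`.

    Assumption: trimming removes int(n * fraction) values from each end.
    """
    ordered = sorted(values)
    cut = int(len(ordered) * fraction)
    return mean(ordered[cut:len(ordered) - cut])


# Made-up identity values with one low and one high outlier.
identities = [50.0, 95.0, 96.0, 97.0, 98.0, 98.0, 99.0, 99.0, 99.0, 100.0]

plain = mean(identities)            # what the default (average) would summarize
middle = median(identities)         # analogous to median=True
robust = trimmed_mean(identities)   # analogous to robust=True
```

Under this reading, a database built with the default compression of 125 would enable the regression model when learned_ani is left as None, and the trimmed mean sits closer to the median than the plain mean does because the single low outlier is discarded.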
- save(path, overwrite=False)
Save the database to the given path.
- sketch(name, *contigs, seed=True)
Add a reference genome to the database.
This method is a shortcut for Database.add_draft when a genome is complete (i.e. only contains a single contig).
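One way to populate a database is to sketch each reference in turn and then flush. The helper below is a sketch under assumptions: `build_reference_database` is a hypothetical name, each reference is taken to be a single-contig sequence, and passing `path` to the constructor is assumed to select the folder in which sketches are stored.

```python
def build_reference_database(references, path=None):
    """Sketch every genome in `references` (a name -> sequence mapping).

    Hypothetical helper; with path=None the database stays in memory.
    """
    # Imported here so the helper can be defined even if pyskani is missing.
    import pyskani

    db = pyskani.Database(path)
    for name, sequence in references.items():
        # One sequence per genome, i.e. the complete-genome case.
        db.sketch(name, sequence)
    # Documented as a no-op in memory; for a folder it writes the markers.
    db.flush()
    return db
```

The returned object can then be passed queries through its query method, or written elsewhere with save.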
- path
The path where sketches are stored.
- Type:
pathlib.Path or None