API

Species definitions

The Catalog contains a large number of species and simulation model definitions, which are built using a number of classes defined here. These are usually not intended to be instantiated directly, but should be accessed through the main entry point, get_species().

stdpopsim.get_species(id)[source]

Returns a Species object for the specified id.

Parameters

id (str) – The string identifier for the requested species. E.g. “HomSap”. A complete list of species, and their IDs, can be found in the Catalog.

Returns

An object containing the species definition.

Return type

Species

class stdpopsim.Species[source]

Class representing a species in the catalog.

Variables
  • id (str) – The unique identifier for this species. The species ID consists of the first three letters of the genus name followed by the first three letters of the species name (similar to the approach used in the UCSC genome browser), and does not contain any spaces or punctuation; e.g., “HomSap” is the ID for Homo sapiens.

  • name (str) – The full name of this species in binominal nomenclature as it would be used in written text, e.g., “Homo sapiens”.

  • common_name (str) – The name of this species as it would most often be used informally in written text, e.g., “human”, or “Orang-utan”. Where no common name for the species exists, use the most common abbreviation, e.g., “E. coli”.

  • genome (stdpopsim.Genome) – The Genome instance describing the details of this species’ genome.

  • generation_time (float) – The current best estimate for the generation time of this species in years. Note that individual demographic models in the catalog may or may not use this estimate: each model uses the generation time that was used in the original publication(s).

  • generation_time_citations (list) – A list of Citation objects providing justification for the generation time estimate.

  • population_size (float) – The current best estimate for the population size of this species. Note that individual demographic models in the catalog may or may not use this estimate: each model uses the population sizes defined in the original publication(s).

  • population_size_citations (list) – A list of Citation objects providing justification for the population size estimate.

  • demographic_models (list) – The list of DemographicModel instances in the catalog for this species.

  • ensembl_id (str) – The Ensembl ID for the species’ genome assembly, which will be used by maintenance scripts to query Ensembl’s database. This parameter will be automatically populated from the species name, and should not be set directly unless a non-default assembly is used for the species definition (e.g. see E. coli).
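The ID naming scheme described for the id variable can be sketched in a few lines. The helper species_id below is hypothetical and written only for illustration; it is not part of stdpopsim, and catalog IDs are defined by the catalog itself rather than computed this way.

```python
def species_id(binomial_name: str) -> str:
    """Build a catalog-style ID from a binomial name (illustrative helper only)."""
    genus, species = binomial_name.split()[:2]
    # First three letters of the genus, then of the species, each capitalised,
    # with no spaces or punctuation.
    return genus[:3].capitalize() + species[:3].capitalize()


print(species_id("Homo sapiens"))      # HomSap
print(species_id("Escherichia coli"))  # EscCol
```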

get_annotations(id)[source]

Returns a set of annotations with the specified id.

Parameters

id (str) – The string identifier for the set of annotations. A complete list of IDs for each species can be found in the “Annotations” subsection for the species in the Catalog.

Return type

Annotation

Returns

An Annotation that holds genome annotation information from Ensembl.

get_contig(chromosome=None, genetic_map=None, length_multiplier=1, length=None, inclusion_mask=None, exclusion_mask=None)[source]

Returns a Contig instance describing a section of genome that is to be simulated based on empirical information for a given species and chromosome.

Parameters
  • chromosome (str) – The ID of the chromosome to simulate. A complete list of chromosome IDs for each species can be found in the “Genome” subsection for the species in the Catalog. If the chromosome is not given, we specify a “generic” contig with given length.

  • genetic_map (str) – If specified, obtain recombination rate information from the genetic map with the specified ID. If None, simulate using a default uniform recombination rate on a region with the length of the specified chromosome. The default rates are species- and chromosome-specific, and can be found in the Catalog. (Default: None)

  • length_multiplier (float) – If specified, simulate a region of length length_multiplier times the length of the specified chromosome with the same chromosome-specific mutation and recombination rates. This option cannot currently be used in conjunction with the genetic_map argument.

  • inclusion_mask – If specified, simulated genomes are subset to only include regions given by the mask. The mask can be specified by the path and file name of a bed file or as a list or array of intervals given by the left and right end points of the intervals.

  • exclusion_mask – If specified, simulated genomes are subset to exclude regions given by the mask. The mask can be specified by the path and file name of a bed file or as a list or array of intervals given by the left and right end points of the intervals.

  • length (float) – Used with a “generic” contig, specifies the length of genome sequence for this contig. For a generic contig, mutation and recombination rates are equal to the genome-wide average across all autosomal chromosomes.

Return type

Contig

Returns

A Contig describing the section of the genome.
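The relationship between inclusion_mask and exclusion_mask can be illustrated with plain interval arithmetic: keeping only a set of intervals is conceptually the same as excluding the gaps between them. The sketch below does not call stdpopsim. The helper complement_intervals is hypothetical, and it assumes half-open (left, right) intervals that are already sorted and non-overlapping.

```python
def complement_intervals(intervals, length):
    """Return the gaps between sorted, non-overlapping (left, right) intervals.

    Assumes half-open intervals on a sequence spanning [0, length).
    """
    gaps = []
    previous_right = 0
    for left, right in intervals:
        if left > previous_right:
            gaps.append((previous_right, left))
        previous_right = right
    if previous_right < length:
        gaps.append((previous_right, length))
    return gaps


inclusion = [(10, 20), (50, 60)]
exclusion = complement_intervals(inclusion, length=100)
print(exclusion)  # [(0, 10), (20, 50), (60, 100)]
```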

get_demographic_model(id)[source]

Returns a demographic model with the specified id.

Parameters

id (str) – The string identifier for the demographic model. A complete list of IDs for each species can be found in the “Demographic Models” subsection for the species in the Catalog.

Return type

DemographicModel

Returns

A DemographicModel that defines the requested model.

class stdpopsim.Genome[source]

Class representing the genome for a species.

Variables
  • chromosomes (list) – A list of Chromosome objects.

  • mutation_rate_citations (list) – A list of Citation objects providing justification for the mutation rate estimate.

  • recombination_rate_citations (list) – A list of Citation objects providing justification for the recombination rate estimate.

  • assembly_citations (list) – A list of Citation objects providing reference to the source of the genome assembly.

  • length (int) – The total length of the genome.

get_chromosome(id)[source]

Returns the chromosome with the specified id.

Parameters

id (str) – The string ID of the chromosome. A complete list of chromosome IDs for each species can be found in the “Genome” subsection for the species in the Catalog.

Return type

Chromosome

Returns

A Chromosome that defines properties of the chromosome such as length, mutation rate, and recombination rate.

property mean_mutation_rate

The length-weighted mean mutation rate across all chromosomes.

property mean_recombination_rate

The length-weighted mean recombination rate across all chromosomes.
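As a worked illustration of what "length-weighted mean" means for these two properties, the sketch below computes one from plain lists. The chromosome lengths and rates are made-up numbers, and length_weighted_mean is a hypothetical helper rather than the library's implementation.

```python
import math


def length_weighted_mean(lengths, rates):
    """Mean of per-chromosome rates, weighting each rate by chromosome length."""
    total_length = sum(lengths)
    return sum(length * rate for length, rate in zip(lengths, rates)) / total_length


# Two made-up chromosomes: the longer one contributes three times the weight.
lengths = [100, 300]
rates = [1e-8, 2e-8]
mean_rate = length_weighted_mean(lengths, rates)
print(math.isclose(mean_rate, 1.75e-8))  # True
```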

class stdpopsim.Chromosome[source]

Class representing a single chromosome for a species.

Variables
  • id (str) – The string identifier for the chromosome.

  • length (int) – The length of the chromosome.

  • mutation_rate (float) – The mutation rate used when simulating this chromosome.

  • recombination_rate (float) – The recombination rate used when simulating this chromosome (if not using a genetic map).

  • synonyms (list of str) – List of synonyms that may be used when requesting this chromosome by ID, e.g. from the command line interface.

class stdpopsim.Contig[source]

Class representing a contiguous region of genome that is to be simulated. This contains the information about mutation rates and recombination rates that are needed to simulate this region.

Variables
  • mutation_rate (float) – The rate of mutation per base per generation.

  • recombination_map (msprime.simulations.RecombinationMap) – The recombination map for the region. See the msprime documentation for more details.

  • mask_intervals (array-like (?)) – Intervals to keep in simulated tree sequence, as a list of (left_position, right_position), such that intervals are non-overlapping and in ascending order. Should have shape Nx2, where N is the number of intervals.

  • exclude (bool) – If True, mask_intervals specify regions to exclude. If False, mask_intervals specify regions to keep.
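The constraints stated for mask_intervals (N rows of (left_position, right_position), non-overlapping, in ascending order) can be checked with a short function. This is a sketch only: validate_mask_intervals is a hypothetical helper, not a stdpopsim function, and it assumes that intervals which merely touch (one ends where the next begins) count as non-overlapping.

```python
def validate_mask_intervals(intervals):
    """Check that intervals form an Nx2, ascending, non-overlapping list."""
    previous_right = None
    for row in intervals:
        if len(row) != 2:
            raise ValueError("each interval must be a (left, right) pair")
        left, right = row
        if left >= right:
            raise ValueError("left position must be smaller than right position")
        if previous_right is not None and left < previous_right:
            raise ValueError("intervals must be non-overlapping and ascending")
        previous_right = right
    return True


print(validate_mask_intervals([(0, 10), (10, 20), (35, 40)]))  # True
```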

Note

To run stdpopsim simulations with alternative, user-specified mutation or recombination rates, a new contig can be created based on an existing one. For instance, the following will create a new_contig that, when simulated, will have double the mutation rate of the old_contig:

new_contig = stdpopsim.Contig(
    mutation_rate=old_contig.mutation_rate * 2,
    recombination_map=old_contig.recombination_map,
    genetic_map=old_contig.genetic_map,
)

class stdpopsim.Citation[source]

A reference to the literature that should be acknowledged by users of stdpopsim.

Variables
  • doi (str) – The DOI for the publication providing the definitive reference.

  • author (str) – Short author list, e.g., “Author 1 et al.”.

  • year (int) – Year of publication as a 4 digit integer, e.g. 2008.

because(reasons)[source]

Returns a new Citation with the given reasons.

fetch_bibtex()[source]

Retrieve the bibtex of a citation from Crossref.

static merge(citations)[source]

Returns a deduplicated list of Citation objects.

class stdpopsim.Annotation[source]

Class representing a GFF3 annotation file.

Variables
  • id (str) – String that uniquely identifies the annotation.

  • species (Species) – The species to which this annotation applies.

  • url (str) – The URL where the packed and compressed GFF3 can be found.

  • zarr_url (str) – The URL of the zarr cache of the GFF3.

  • zarr_sha256 (str) – The SHA256 checksum of the zarr cache.

  • description (str) – One line description of the annotation.

  • citations (list of Citation) – List of citations for the annotation.

download()[source]

Downloads the zarr URL and stores it in the cache directory.

get_annotation_type_from_chromomosome(a_type, chrom_id, full_table=False)[source]

Returns all elements of type a_type from the specified chromosome.

get_chromosome_annotations(id)[source]

Returns the pandas dataframe for the chromosome with the specified id.

get_genes_from_chromosome(chrom_id, full_table=False)[source]

Returns all elements of type gene from the annotation.

is_cached()[source]

Returns True if this annotation is cached locally.

Demographic Models

class stdpopsim.DemographicModel[source]

Class representing a demographic model.

Instances of this class are constructed by model implementors, following the developer documentation. To instead obtain a pre-specified model as listed in the Catalog, see Species.get_demographic_model.

Variables
  • id (str) – The unique identifier for this model. DemographicModel IDs should be short and memorable, and conform to the stdpopsim naming conventions for demographic models.

  • description (str) – A short description of this model as it would be used in written text, e.g., “Three population Out-of-Africa”. This should describe the model itself and not contain author or year information.

  • long_description (str) – A concise, but detailed, summary of the model.

  • generation_time (int) – Mean inter-generation interval, in years.

  • populations (list of Population) – A list of Population, to provide each population with a unique ID and description.

  • qc_model (DemographicModel or None) – An independent implementation of the model, against which the model’s accuracy is validated. This should not be set by the user, and may be None if no QC implementation exists yet.

  • citations (list of Citation) – A list of Citation, that describe the primary reference(s) for the model.

  • demographic_events (list of msprime.DemographicEvent) – A list of msprime.DemographicEvent subclasses, that define changes to the populations through time, such as population size changes or mass migrations. See the msprime API documentation for more information.

  • population_configurations (list of msprime.PopulationConfiguration) – A list of msprime.PopulationConfiguration, one for each population, to set the population metadata, initial size and growth rate parameters.

  • migration_matrix (list of list of int) – The initial migration matrix. See the migration_matrix parameter to msprime.simulate().

static empty(**kwargs)[source]

Return a model with the mandatory attributes filled out.

equals(other, rtol=1e-08, atol=1e-05)[source]

Returns True if this model is equal to the specified model to the specified numerical tolerance (as defined by numpy.allclose).

We use the ‘equals’ method here rather than the equality operator because we need to be able to specify the numerical tolerances.
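To show why a tolerance-based comparison differs from the equality operator, here is a scalar version of the criterion used by numpy.allclose, |a - b| <= atol + rtol * |b|, written without NumPy. The function is_close is a hypothetical stand-in that reuses the default tolerances from the equals() signature; it is not how stdpopsim compares whole models.

```python
def is_close(a, b, rtol=1e-08, atol=1e-05):
    """Scalar form of the numpy.allclose test: |a - b| <= atol + rtol * |b|."""
    return abs(a - b) <= atol + rtol * abs(b)


print(is_close(1000.0, 1000.000001))  # True
print(is_close(1000.0, 1000.1))       # False
print(1000.0 == 1000.000001)          # False
```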

get_demography_debugger()[source]

Returns an msprime.DemographyDebugger instance initialized with the parameters for this model. Please see the msprime documentation for details on how to use a DemographyDebugger.

Returns

A DemographyDebugger instance for this DemographicModel.

Return type

msprime.DemographyDebugger

get_samples(*args)[source]

Returns a list of msprime.Sample objects, with the number of samples from each population determined by the positional arguments. For instance, model.get_samples(2, 5, 7) would return a list of 14 samples, two of which are from the model’s first population (i.e., with population ID model.populations[0].id), five are from the model’s second population, and seven are from the model’s third population. The number of arguments must be less than or equal to the number of “sampling” populations, model.num_sampling_populations; if the number of arguments is less than the number of sampling populations, then remaining numbers are treated as zero.
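The counting rule described for get_samples() can be sketched with plain integers. The function sample_populations below is hypothetical: it returns a population index per sample instead of msprime.Sample objects, and its error type is a choice made for this sketch.

```python
def sample_populations(num_sampling_populations, *counts):
    """Return one population index per sample, following the get_samples() rule."""
    if len(counts) > num_sampling_populations:
        raise ValueError("more sample counts than sampling populations")
    # Counts that are not given are treated as zero.
    padded = list(counts) + [0] * (num_sampling_populations - len(counts))
    populations = []
    for population_index, count in enumerate(padded):
        populations.extend([population_index] * count)
    return populations


samples = sample_populations(3, 2, 5, 7)
print(len(samples))                   # 14
print(sample_populations(3, 2))       # [0, 0]
```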

register_qc(qc_model)[source]

Register a QC model implementation for this model.

verify_equal(other, rtol=1e-08, atol=1e-05)[source]

Equivalent to the equals() method, but raises an UnequalModelsError if the models are not equal rather than returning False.

class stdpopsim.Population[source]

Class recording metadata representing a population in a simulation.

Variables
  • id (str) – The id of the population.

  • description (str) – A short description of the population.

  • sampling_time (int) – An integer value which indicates how many generations prior to the present samples should be drawn from this population. If None, sampling is not allowed from this population (default = 0).

asdict()[source]

Returns a dictionary representing the metadata about this population.

Generic models

The Catalog contains simulation models from the literature that are defined for particular species. It is also useful to be able to simulate more generic models, which are documented here. Please see the “Running a generic model” section for examples of using these models.

class stdpopsim.PiecewiseConstantSize(N0, *args)[source]

Class representing a generic simulation model that can be run to output a tree sequence. This is a piecewise constant size model, which allows for instantaneous population size change over multiple epochs in a single population.

Parameters
  • N0 (float) – The initial effective population size

  • args – Each subsequent argument is a tuple (t, N) which gives the time at which the size change takes place and the population size.
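The (t, N) arguments can be read as a step function for the population size. The sketch below evaluates that step function with plain Python. It assumes that t is measured in generations before the present and that a new size applies from its change time backwards; size_at is a hypothetical helper, not part of stdpopsim.

```python
def size_at(time, N0, *changes):
    """Population size at `time` generations ago for a piecewise constant model.

    Assumes each (t, N) in `changes` sets the size to N for all times >= t.
    """
    size = N0
    for change_time, new_size in sorted(changes):
        if time >= change_time:
            size = new_size
    return size


# Made-up values: N0 = 10000, then 2000 from t = 500, then 8000 from t = 1500.
print(size_at(0, 10000, (500, 2000), (1500, 8000)))     # 10000
print(size_at(500, 10000, (500, 2000), (1500, 8000)))   # 2000
print(size_at(2000, 10000, (500, 2000), (1500, 8000)))  # 8000
```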

The usage is best illustrated by an example:

model1 = stdpopsim.PiecewiseConstantSize(N0, (t1, N1))  # One change
model2 = stdpopsim.PiecewiseConstantSize(N0, (t1, N1), (t2, N2))  # Two changes

class stdpopsim.IsolationWithMigration(NA, N1, N2, T, M12, M21)[source]

Class representing a generic simulation model that can be run to output a tree sequence. A generic isolation with migration model where a single ancestral population of size NA splits into two populations of constant size N1 and N2 time T generations ago, with migration rates M12 and M21 between the split populations. Sampling is disallowed in population index 0, as this is the ancestral population.

Parameters
  • NA (float) – The initial ancestral effective population size

  • N1 (float) – The effective population size of population 1

  • N2 (float) – The effective population size of population 2

  • T (float) – Time of split between populations 1 and 2 (in generations)

  • M12 (float) – Migration rate from population 1 to 2

  • M21 (float) – Migration rate from population 2 to 1

Example usage:

model1 = stdpopsim.IsolationWithMigration(NA, N1, N2, T, M12, M21)

Simulation Engines

Support for additional simulation engines can be implemented by subclassing the abstract Engine class, and registering an instance of the subclass with register_engine(). These are usually not intended to be instantiated directly, but should be accessed through the main entry point, get_engine().

stdpopsim.get_engine(id)[source]

Returns the simulation engine with the specified id.

Parameters

id (str) – The string identifier for the requested engine. The currently supported engines are “msprime” and “slim”.

Returns

A simulation engine object with a simulate() method.

Return type

Engine

stdpopsim.get_default_engine()[source]

Returns the default simulation engine (msprime).

Return type

Engine

stdpopsim.register_engine(engine)[source]

Registers the specified simulation engine.

Parameters

engine (Engine) – The simulation engine object to register.

class stdpopsim.Engine[source]

Abstract class representing a simulation engine.

To implement a new simulation engine, one should inherit from this class. At a minimum, the id, description and citations attributes must be set, and the simulate() and get_version() methods must be implemented. See the msprime example in engines.py.

Variables
  • id (str) – The unique identifier for the simulation engine.

  • description (str) – A short description of this engine.

  • citations (list of Citation) – A list of citations for the simulation engine.

get_version()[source]

Returns the version of the engine.

Return type

str

simulate(demographic_model, contig, samples, *, seed=None, dry_run=False)[source]

Simulates the model for the specified contig and samples. demographic_model, contig, and samples must be specified.

Parameters
  • demographic_model (DemographicModel) – The demographic model to simulate.

  • contig (Contig) – The contig, defining the length, mutation rate, and recombination rate(s).

  • samples (list of msprime.simulations.Sample) – The samples to be obtained from the simulation.

  • seed (int) – The seed for the random number generator.

  • dry_run (bool) – If True, the simulation engine will return None without running the simulation.

Returns

A succinct tree sequence.

Return type

tskit.trees.TreeSequence or None
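The subclass-and-register pattern described for Engine can be sketched without importing stdpopsim. Everything below is a local stand-in that mirrors the documented names (Engine, register_engine, get_engine); the registry dictionary, the error handling, and ToyEngine are assumptions made for this illustration and do not reproduce the library's implementation.

```python
from abc import ABC, abstractmethod


class Engine(ABC):
    """Local stand-in for the abstract engine interface documented above."""

    id = None
    description = None
    citations = None

    @abstractmethod
    def simulate(self, demographic_model, contig, samples, *, seed=None, dry_run=False):
        """Run the simulation and return a tree sequence (or None)."""

    @abstractmethod
    def get_version(self):
        """Return the engine version as a string."""


_registered_engines = {}  # hypothetical registry, keyed by engine id


def register_engine(engine):
    _registered_engines[engine.id] = engine


def get_engine(id):
    if id not in _registered_engines:
        raise ValueError(f"unknown engine: {id}")
    return _registered_engines[id]


class ToyEngine(Engine):
    id = "toy"
    description = "Toy engine that never simulates anything"
    citations = []

    def simulate(self, demographic_model, contig, samples, *, seed=None, dry_run=False):
        # A real engine would return a tree sequence; this stand-in returns None.
        return None

    def get_version(self):
        return "0.0.0"


register_engine(ToyEngine())
engine = get_engine("toy")
print(engine.get_version())  # 0.0.0
```

Because simulate() and get_version() are abstract in this sketch, a subclass that omits either one cannot be instantiated.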

class stdpopsim.engines._MsprimeEngine[source]

Bases: stdpopsim.engines.Engine

description = 'Msprime coalescent simulator'
id = 'msprime'

simulate(demographic_model, contig, samples, *, seed=None, msprime_model=None, msprime_change_model=None, dry_run=False, **kwargs)[source]

Simulate the demographic model using msprime. See Engine.simulate() for definitions of parameters defined for all engines.

Parameters
  • msprime_model (str) – The msprime simulation model to be used. One of hudson, dtwf, smc, or smc_prime. See msprime API documentation for details.

  • msprime_change_model (list of (float, str) tuples) – A list of (time, model) tuples, which changes the simulation model to the new model at the time specified.

  • dry_run (bool) – If True, end_time=0 is passed to msprime.simulate() to initialise the simulation and then immediately return.

  • **kwargs – Further arguments passed to msprime.simulate()

class stdpopsim.slim_engine._SLiMEngine[source]

Bases: stdpopsim.engines.Engine

description = 'SLiM forward-time Wright-Fisher simulator'
id = 'slim'

recap_and_rescale(ts, demographic_model, contig, samples, mutation_types=None, extended_events=None, slim_scaling_factor=1.0, seed=None, **kwargs)[source]

Apply post-SLiM transformations to ts. This rescales node times, does recapitation, simplification, and adds neutral mutations.

If the SLiM engine was used to output a SLiM script, and the script was run outside of stdpopsim, this function can be used to transform the SLiM tree sequence following the procedure that would have been used if stdpopsim had run SLiM itself. The parameters after ts have the same meaning as for simulate(), and the values for demographic_model, contig, samples, and slim_scaling_factor should match those that were used to generate the SLiM script with simulate().

Parameters

ts (pyslim.SlimTreeSequence) – The tree sequence output by SLiM.

Warning

The recap_and_rescale() function is provided in the hope that it will be useful. But as we can’t anticipate what changes you’ll make to the SLiM code before using it, the stdpopsim source code should be consulted to determine if the behaviour is appropriate for your case.

simulate(demographic_model, contig, samples, *, seed=None, mutation_types=None, extended_events=None, slim_path=None, slim_script=False, slim_scaling_factor=1.0, slim_burn_in=10.0, dry_run=False)[source]

Simulate the demographic model using SLiM. See Engine.simulate() for definitions of the demographic_model, contig, and samples parameters.

Parameters
  • seed (int) – The seed for the random number generator.

  • slim_path (str) – The full path to the slim executable, or the name of a command in the current PATH.

  • slim_script (bool) – If True, the simulation will not be executed. Instead the generated SLiM script will be printed to stdout.

  • slim_scaling_factor (float) – Rescale model parameters by the given value, to speed up simulation. Population sizes and generation times are divided by this factor, whereas the mutation rate, recombination rate, and growth rates are multiplied by the factor. See SLiM manual: 5.5 Rescaling population sizes to improve simulation performance.

  • slim_burn_in (float) – Length of the burn-in phase, in units of N generations.

  • dry_run (bool) – If True, run the first generation setup and then end the simulation.
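The rescaling rule given for slim_scaling_factor is simple arithmetic, sketched below with made-up parameter values. The function rescale and its dictionary layout are hypothetical and are not how stdpopsim represents a model; only the divide and multiply directions are taken from the parameter description above.

```python
import math


def rescale(population_size, generation_time, mutation_rate,
            recombination_rate, growth_rate, scaling_factor):
    """Apply the documented scaling rule to a set of model parameters."""
    q = scaling_factor
    return {
        # Population sizes and generation times are divided by the factor.
        "population_size": population_size / q,
        "generation_time": generation_time / q,
        # Mutation, recombination, and growth rates are multiplied by it.
        "mutation_rate": mutation_rate * q,
        "recombination_rate": recombination_rate * q,
        "growth_rate": growth_rate * q,
    }


scaled = rescale(
    population_size=10000,
    generation_time=25,
    mutation_rate=1e-8,
    recombination_rate=2e-8,
    growth_rate=0.001,
    scaling_factor=10,
)
print(scaled["population_size"])                      # 1000.0
print(math.isclose(scaled["mutation_rate"], 1e-7))    # True
```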