API

Species definitions

The Catalog contains a large number of species and simulation model definitions, which are built using a number of classes defined here. These are usually not intended to be instantiated directly, but should be accessed through the main entrypoint, get_species().

stdpopsim.get_species(id)[source]
class stdpopsim.Species(*, id: str, name: str, common_name: str, genome: int, generation_time=1, generation_time_citations=NOTHING, population_size=1, population_size_citations=NOTHING, demographic_models=NOTHING, genetic_maps=NOTHING)[source]

Class representing a species in the catalog.

Variables
  • id (str) – The unique identifier for this species. The usual scheme is to use the first three letters of the genus name followed by the first three letters of the species name (similar to the approach used in the UCSC genome browser), e.g., “HomSap” is the ID for Homo sapiens. The ID does not contain any spaces or punctuation.

  • name (str) – The full name of this species in binominal nomenclature as it would be used in written text, e.g., “Homo sapiens”.

  • common_name (str) – The name of this species as it would most often be used informally in written text, e.g., “human”, or “Orang-utan”. Where no common name for the species exists, use the most common abbreviation, e.g., “E. coli”.

  • genome (stdpopsim.Genome) – The Genome instance describing the details of this species’ genome.

  • generation_time (float) – The current best estimate for the generation time of this species in years. Note that individual demographic models in the catalog may or may not use this estimate: each model uses the generation time that was used in the original publication(s).

  • generation_time_citations (list) – A list of Citation objects providing justification for the generation time estimate.

  • population_size (float) – The current best estimate for the population size of this species. Note that individual demographic models in the catalog may or may not use this estimate: each model uses the population sizes defined in the original publication(s).

  • population_size_citations (list) – A list of Citation objects providing justification for the population size estimate.

  • demographic_models (list) – The list of DemographicModel instances in the catalog for this species.

property ensembl_id

Returns the ID of this species for the Ensembl REST API. This is the species name, underscore delimited and in lowercase.
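
The two naming conventions described above (the species id and the ensembl_id) can be illustrated with a small standalone sketch. The helper names make_species_id and make_ensembl_id are hypothetical and are not part of stdpopsim; they only mimic the documented rules, and catalog IDs that depart from the usual scheme would not be reproduced.

```python
def make_species_id(name):
    """Hypothetical helper: build a catalog-style ID from a binomial name."""
    genus, species = name.split()[:2]
    # First three letters of the genus and of the species, each capitalised,
    # joined without spaces or punctuation.
    return genus[:3].capitalize() + species[:3].capitalize()


def make_ensembl_id(name):
    """Hypothetical helper: lowercase, underscore-delimited species name."""
    return name.lower().replace(" ", "_")


print(make_species_id("Homo sapiens"))
print(make_ensembl_id("Homo sapiens"))
```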

get_contig(chromosome, genetic_map=None, length_multiplier=1)[source]

Returns a Contig instance describing a section of genome that is to be simulated based on empirical information for a given species and chromosome.

Parameters
  • chromosome (str) – The ID of the chromosome to simulate.

  • genetic_map (str) – If specified, obtain recombination rate information from the genetic map with the specified ID. If None, simulate using a default uniform recombination rate on a region with the length of the specified chromosome. The default rates are species- and chromosome-specific, and can be found in the Catalog. (Default: None)

  • length_multiplier (float) – If specified, simulate a region of length length_multiplier times the length of the specified chromosome with the same chromosome-specific mutation and recombination rates. This option cannot currently be used in conjunction with the genetic_map argument.

Return type

Contig

Returns

A Contig describing a simulation of the section of genome.
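
The interaction between length_multiplier and genetic_map described above can be sketched as follows. This is only an illustration of the documented rule: the function name simulated_region_length is hypothetical, and raising ValueError for the disallowed combination is an assumption rather than confirmed library behaviour.

```python
def simulated_region_length(chromosome_length, genetic_map=None, length_multiplier=1):
    """Hypothetical helper: length of the region that would be simulated."""
    # The documentation states the two options cannot be combined;
    # the exception type used here is an assumption.
    if genetic_map is not None and length_multiplier != 1:
        raise ValueError("length_multiplier cannot be used with genetic_map")
    return chromosome_length * length_multiplier


# Simulate half of a (made-up) 1 Mb chromosome.
print(simulated_region_length(1_000_000, length_multiplier=0.5))
```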

get_demographic_model(id)[source]

Returns a model with the specified id.

  • TODO explain where we find models from the catalog.

class stdpopsim.Genome(chromosomes=NOTHING, *, mutation_rate_citations=NOTHING, recombination_rate_citations=NOTHING, assembly_citations=NOTHING, assembly_name=None, assembly_accession=None)[source]

Class representing the genome for a species.

Variables
  • chromosomes (list) – A list of Chromosome objects.

  • mutation_rate_citations (list) – A list of Citation objects providing justification for the mutation rate estimate.

  • recombination_rate_citations (list) – A list of Citation objects providing justification for the recombination rate estimate.

  • assembly_citations (list) – A list of Citation objects providing reference to the source of the genome assembly.

  • length (int) – The total length of the genome.

get_chromosome(id)[source]

Returns the chromosome with the specified id.

property mean_mutation_rate

The length-weighted mean mutation rate across all chromosomes.

property mean_recombination_rate

The length-weighted mean recombination rate across all chromosomes.
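
As a minimal sketch of what "length-weighted mean" means for the two properties above, each chromosome's rate is weighted by its length before averaging. The function name and the example numbers are illustrative only and do not come from the catalog.

```python
def length_weighted_mean(lengths, rates):
    """Mean of per-chromosome rates, weighted by chromosome length."""
    total_length = sum(lengths)
    return sum(length * rate for length, rate in zip(lengths, rates)) / total_length


# Two made-up chromosomes: the longer one dominates the mean.
lengths = [100, 300]
rates = [1e-8, 2e-8]
mean_rate = length_weighted_mean(lengths, rates)
print(mean_rate)
```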

class stdpopsim.Chromosome(*, id: str, length, recombination_rate: float, mutation_rate: float, synonyms=NOTHING)[source]

Class representing a single chromosome for a species.

Todo

Define the facilities that this object provides.

class stdpopsim.Contig(*, recombination_map=None, mutation_rate: float = None, genetic_map=None)[source]

Class representing a contiguous region of genome that is to be simulated. This contains the information about mutation rates and recombination rates that are needed to simulate this region.

Variables
  • mutation_rate (float) – The rate of mutation per base per generation.

  • recombination_map (msprime.simulations.RecombinationMap) – The recombination map for the region. See the msprime documentation for more details.

class stdpopsim.GeneticMap(species, id=None, url=None, file_pattern=None, description=None, long_description=None, citations=None)[source]

Class representing a genetic map for a species. Provides functionality for downloading and caching recombination maps from a remote URL.

Variables
  • url (str) – The URL where the packed and compressed genetic map can be obtained.

  • file_pattern (str) – The pattern used to map individual chromosome names to files, suitable for use with Python’s str.format() method.
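
A short sketch of how a file_pattern might be expanded with str.format(). The pattern text and the placeholder name id are assumptions made for illustration; the placeholder actually used by a given map should be checked against its definition.

```python
# Hypothetical pattern; the "{id}" placeholder name is an assumption.
file_pattern = "example_map_{id}.txt"

# Expand the pattern once per chromosome ID.
filenames = {
    chrom_id: file_pattern.format(id=chrom_id)
    for chrom_id in ["chr1", "chr2", "chrX"]
}
print(filenames["chr2"])
```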

download()[source]

Downloads this genetic map from the source URL and stores it in the cache directory. If the map directory already exists it is first removed.

get_chromosome_map(id)[source]

Returns the genetic map for the chromosome with the specified id.

is_cached()[source]

Returns True if this map is cached locally.

class stdpopsim.DemographicModel(*, id: str, description: str, long_description: str, generation_time: int, citations=NOTHING, demographic_events=NOTHING, population_configurations=NOTHING, migration_matrix=NOTHING, populations=NOTHING, qc_model=None)[source]

Class representing a demographic model.

This class is intended to be used by model implementors. To instead obtain a pre-specified model, see Species.get_demographic_model.

Variables
  • id (str) – The unique identifier for this model. DemographicModel IDs should be short and memorable, perhaps as an abbreviation of the model’s name.

  • description (str) – A short description of this model as it would be used in written text, e.g., “Three population Out-of-Africa”. This should describe the model itself and not contain author or year information.

  • long_description (str) – A concise, but detailed, summary of the model.

  • generation_time (int) – Mean inter-generation interval, in years.

  • populations (list of Population) – TODO

  • qc_model (DemographicModel or None) – An independent implementation of the model, against which the model’s accuracy is validated. This should not be set by the user, and may be None if no QC implementation exists yet.

  • citations (list of Citation) – TODO

  • demographic_events (list of msprime.DemographicEvent) – TODO

  • population_configurations (list of msprime.PopulationConfiguration) – TODO

  • migration_matrix (list of list of int) – TODO

static empty(**kwargs)[source]

Return a model with the mandatory attributes filled out.

equals(other, rtol=1e-08, atol=1e-05)[source]

Returns True if this model is equal to the specified model to the specified numerical tolerance (as defined by numpy.allclose).

We use the ‘equals’ method here rather than the equality operator because we need to be able to specify the numerical tolerances.
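
For reference, the element-wise test that numpy.allclose applies is |a - b| <= atol + rtol * |b|. The sketch below writes that test out in plain Python for a single pair of values, using the default tolerances from the equals() signature; the function name within_tolerance is hypothetical, and how equals() walks over the individual model parameters is not shown here.

```python
def within_tolerance(a, b, rtol=1e-08, atol=1e-05):
    """Scalar version of the comparison used by numpy.allclose."""
    return abs(a - b) <= atol + rtol * abs(b)


# A difference of about 1e-6 falls inside the tolerance; 0.1 does not.
print(within_tolerance(1000.0, 1000.000001))
print(within_tolerance(1000.0, 1000.1))
```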

get_demography_debugger()[source]

Returns an msprime.DemographyDebugger instance initialized with the parameters for this model. Please see the msprime documentation for details on how to use a DemographyDebugger.

Returns

A DemographyDebugger instance for this DemographicModel.

Return type

msprime.DemographyDebugger

get_samples(*args)[source]

Returns a list of msprime.Sample objects, with the number of samples from each population determined by the positional arguments. For instance, model.get_samples(2, 5, 7) would return a list of 14 samples, two of which are from the model’s first population (i.e., with population ID model.populations[0].id), five are from the model’s second population, and seven are from the model’s third population. The number of arguments must be less than or equal to the number of “sampling” populations, model.num_sampling_populations; if the number of arguments is less than the number of sampling populations, then the remaining numbers are treated as zero.
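
The argument handling described above can be mimicked with a standalone sketch. Here Sample is a hypothetical namedtuple standing in for msprime.Sample, the sampling time of 0 is an assumption, and the function takes the number of sampling populations explicitly instead of reading it from a model.

```python
from collections import namedtuple

# Hypothetical stand-in for msprime.Sample.
Sample = namedtuple("Sample", ["population", "time"])


def get_samples(num_sampling_populations, *args):
    """Mimic the documented behaviour: one count per sampling population."""
    if len(args) > num_sampling_populations:
        raise ValueError("more sample counts than sampling populations")
    # Populations without an explicit count contribute zero samples.
    counts = list(args) + [0] * (num_sampling_populations - len(args))
    samples = []
    for population, count in enumerate(counts):
        samples.extend([Sample(population=population, time=0)] * count)
    return samples


samples = get_samples(3, 2, 5, 7)
print(len(samples))
```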

register_qc(qc_model)[source]

Register a QC model implementation for this model.

verify_equal(other, rtol=1e-08, atol=1e-05)[source]

Equivalent to the equals() method, but raises an UnequalModelsError if the models are not equal rather than returning False.

class stdpopsim.Citation(*, doi: str, author: str, year: int, reasons=NOTHING)[source]

A reference to the literature that should be acknowledged by users of stdpopsim.

Variables
  • doi (str) – The DOI for the publication providing the definitive reference.

  • author (str) – Short author list, e.g., “Author 1 et al.”.

  • year (int) – Year of publication as a four-digit integer, e.g., 2008.

because(reasons)[source]

Returns a new Citation with the given reasons.

fetch_bibtex()[source]

Retrieve the BibTeX entry for a citation from Crossref.

static merge(citations)[source]

Returns a deduplicated list of Citation objects.

Generic models

The Catalog contains simulation models from the literature that are defined for particular species. It is also useful to be able to simulate more generic models, which are documented here. Please see the Running a generic model section for examples of using these models.

class stdpopsim.PiecewiseConstantSize(N0, *args)[source]

Class representing a generic simulation model that can be run to output a tree sequence. This is a piecewise constant size model, which allows for instantaneous population size change over multiple epochs in a single population.

Variables
  • N0 (float) – The initial effective population size

  • args – Each subsequent argument is a tuple (t, N) which gives the time at which the size change takes place and the population size.
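
To make the (t, N) convention concrete, the sketch below looks up the population size in effect at a given time. It assumes that times are measured in generations before the present (as in msprime) and that a change at time t applies from t backwards in time; the function name size_at is hypothetical and not part of stdpopsim.

```python
def size_at(time, N0, *epochs):
    """Population size at `time` generations ago for a piecewise constant model."""
    size = N0
    # Walk through the size changes from most recent to most ancient.
    for t, N in sorted(epochs):
        if time >= t:
            size = N
    return size


# N0 = 10000 until 500 generations ago, then 2000, then 8000 beyond 2000.
epochs = [(500, 2000), (2000, 8000)]
print(size_at(0, 10000, *epochs))
print(size_at(1000, 10000, *epochs))
print(size_at(3000, 10000, *epochs))
```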

The usage is best illustrated by an example:

model1 = stdpopsim.PiecewiseConstantSize(N0, (t1, N1)) # One change
model2 = stdpopsim.PiecewiseConstantSize(N0, (t1, N1), (t2, N2)) # Two changes

class stdpopsim.IsolationWithMigration(NA, N1, N2, T, M12, M21)[source]

Class representing a generic simulation model that can be run to output a tree sequence. This is a generic isolation-with-migration model in which a single ancestral population of size NA split, T generations ago, into two populations of constant sizes N1 and N2, with migration rates M12 and M21 between the split populations. Sampling is disallowed in population index 0, as this is the ancestral population.

Variables
  • NA (float) – The initial ancestral effective population size

  • N1 (float) – The effective population size of population 1

  • N2 (float) – The effective population size of population 2

  • T (float) – Time of split between populations 1 and 2 (in generations)

  • M12 (float) – Migration rate from population 1 to 2

  • M21 (float) – Migration rate from population 2 to 1

Example usage:

model1 = stdpopsim.IsolationWithMigration(NA, N1, N2, T, M12, M21)

Simulation Engines

Support for additional simulation engines can be implemented by subclassing the abstract Engine class, and registering an instance of the subclass with register_engine(). Engines are usually not intended to be instantiated directly, but should be accessed through the main entrypoint, get_engine().

stdpopsim.get_engine(id)[source]

Returns the simulation engine with the specified id.

stdpopsim.get_default_engine()[source]

Returns the default simulation engine (msprime).

stdpopsim.register_engine(engine)[source]

Registers the specified simulation engine.

class stdpopsim.Engine[source]

Abstract class representing a simulation engine.

To implement a new simulation engine, one should inherit from this class. At a minimum, the id, description and citations attributes must be set, and the simulate() and get_version() methods must be implemented. See the msprime example in engines.py.

Variables
  • id (str) – The unique identifier for the simulation engine.

  • description (str) – A short description of this engine.

  • citations (list of Citation) – A list of citations for the simulation engine.

get_version()[source]

Returns the version of the engine.

Return type

str

simulate(demographic_model=None, contig=None, samples=None, seed=None, dry_run=False)[source]

Simulates the model for the specified contig and samples.

Parameters
  • demographic_model (DemographicModel) – The demographic model to simulate.

  • contig (Contig) – The contig, defining the length and recombination rate(s).

  • samples (list of msprime.simulations.Sample) – The samples to be obtained from the simulation.

  • seed (int) – The seed for the random number generator.

  • dry_run (bool) – If True, the simulation engine will return None without running the simulation.

Returns

A succinct tree sequence.

Return type

tskit.trees.TreeSequence or None
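
The subclass-and-register pattern described for Engine can be sketched without stdpopsim installed. Everything below is a self-contained stand-in: the Engine base class, the registry dictionary, and the toy EchoEngine are hypothetical and only mirror the documented attribute and method names, not the library's own implementation.

```python
import abc


class Engine(abc.ABC):
    """Stand-in for the abstract engine class (not stdpopsim's own code)."""

    id = None
    description = None
    citations = None

    @abc.abstractmethod
    def simulate(self, demographic_model=None, contig=None, samples=None,
                 seed=None, dry_run=False):
        """Run the simulation, or return None when dry_run is True."""

    @abc.abstractmethod
    def get_version(self):
        """Return the engine version as a string."""


_registered_engines = {}


def register_engine(engine):
    """Register an engine instance under its id."""
    if engine.id in _registered_engines:
        raise ValueError(f"engine '{engine.id}' is already registered")
    _registered_engines[engine.id] = engine


def get_engine(id):
    """Look up a previously registered engine by id."""
    if id not in _registered_engines:
        raise ValueError(f"unknown engine '{id}'")
    return _registered_engines[id]


class EchoEngine(Engine):
    """Toy engine that reports its inputs instead of simulating anything."""

    id = "echo"
    description = "Toy engine that echoes its arguments"
    citations = []

    def simulate(self, demographic_model=None, contig=None, samples=None,
                 seed=None, dry_run=False):
        if dry_run:
            return None
        return {"seed": seed, "num_samples": len(samples or [])}

    def get_version(self):
        return "0.0.1"


register_engine(EchoEngine())
engine = get_engine("echo")
print(engine.get_version())
```

A real engine would return a tree sequence from simulate(); the dictionary returned here exists only so the sketch has something to show.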

class stdpopsim.engines._MsprimeEngine[source]

Bases: stdpopsim.engines.Engine

description = 'Msprime coalescent simulator'
id = 'msprime'

simulate(demographic_model=None, contig=None, samples=None, seed=None, msprime_model=None, msprime_change_model=None, dry_run=False)[source]

Simulate the demographic model using msprime. See Engine.simulate() for definitions of parameters defined for all engines.

Parameters
  • msprime_model (str) – The msprime simulation model to be used. One of hudson, dtwf, smc, or smc_prime. See msprime API documentation for details.

  • msprime_change_model (list of (float, str) tuples) – A list of (time, model) tuples, which changes the simulation model to the new model at the time specified.

  • dry_run (bool) – If True, end_time=0 is passed to msprime.simulate() to initialise the simulation and then immediately return.
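
The (time, model) tuples accepted by msprime_change_model can be read as a schedule. The sketch below reports which model would be in effect at a given time, assuming times are measured in generations before the present and that the simulation starts with msprime_model; the helper name model_at is hypothetical.

```python
def model_at(time, msprime_model, msprime_change_model=()):
    """Name of the simulation model in effect `time` generations ago."""
    current = msprime_model
    # Apply each change whose time has been reached, most recent first.
    for change_time, new_model in sorted(msprime_change_model):
        if time >= change_time:
            current = new_model
    return current


# Start with dtwf, then switch to hudson 100 generations ago.
changes = [(100, "hudson")]
print(model_at(50, "dtwf", changes))
print(model_at(100, "dtwf", changes))
```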

class stdpopsim.slim_engine._SLiMEngine[source]

Bases: stdpopsim.engines.Engine

description = 'SLiM forward-time Wright-Fisher simulator'
id = 'slim'

recap_and_rescale(ts, demographic_model, contig, samples, slim_scaling_factor=1.0, seed=None, **kwargs)[source]

Apply post-SLiM transformations to ts. This rescales node times, does recapitation, simplification, and adds neutral mutations.

If the SLiM engine was used to output a SLiM script, and the script was run outside of stdpopsim, this function can be used to transform the SLiM tree sequence following the procedure that would have been used if stdpopsim had run SLiM itself. The parameters after ts have the same meaning as for simulate(), and the values for demographic_model, contig, samples, and slim_scaling_factor should match those that were used to generate the SLiM script with simulate().

Parameters

ts (pyslim.SlimTreeSequence) – The tree sequence output by SLiM.

Warning

The recap_and_rescale() function is provided in the hope that it will be useful. But as we can’t anticipate what changes you’ll make to the SLiM code before using it, the stdpopsim source code should be consulted to determine if its behaviour is appropriate for your case.

simulate(demographic_model=None, contig=None, samples=None, seed=None, slim_path=None, slim_script=False, slim_scaling_factor=1.0, slim_burn_in=10.0, dry_run=False)[source]

Simulate the demographic model using SLiM. See Engine.simulate() for definitions of the demographic_model, contig, and samples parameters.

Parameters
  • seed (int) – The seed for the random number generator.

  • slim_path (str) – The full path to the slim executable, or the name of a command in the current PATH.

  • slim_script (bool) – If True, the simulation will not be executed. Instead the generated SLiM script will be printed to stdout.

  • slim_scaling_factor (float) – Rescale model parameters by the given value, to speed up simulation. Population sizes and generation times are divided by this factor, whereas the mutation rate, recombination rate, and growth rates are multiplied by the factor. See SLiM manual: 5.5 Rescaling population sizes to improve simulation performance.

  • slim_burn_in (float) – Length of the burn-in phase, in units of N generations.

  • dry_run (bool) – If True, run the first generation setup and then end the simulation.
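
The rescaling rule given for slim_scaling_factor can be written out as a small sketch. The dictionary keys and the function name rescale are hypothetical, and "generation times" is read here as times measured in generations, which is an assumption about the wording above rather than a statement about the engine's internals.

```python
def rescale(params, scaling_factor):
    """Apply the documented rescaling to a dictionary of model parameters."""
    q = scaling_factor
    return {
        # Divided by the scaling factor.
        "population_size": params["population_size"] / q,
        "time_generations": params["time_generations"] / q,
        # Multiplied by the scaling factor.
        "mutation_rate": params["mutation_rate"] * q,
        "recombination_rate": params["recombination_rate"] * q,
        "growth_rate": params["growth_rate"] * q,
    }


original = {
    "population_size": 10000,
    "time_generations": 5000,
    "mutation_rate": 1e-8,
    "recombination_rate": 1e-8,
    "growth_rate": 0.001,
}
scaled = rescale(original, 10)
print(scaled["population_size"], scaled["time_generations"])
```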