Simulator class
Module containing simulator class.
- class chromax.simulator.Simulator(genetic_map: Path | DataFrame, trait_names: List[str] | None = None, chr_column: str = 'CHR.PHYS', position_column: str = 'cM', recombination_column: str = 'RecombRate', mutation_probability: float = 0.0, h2: ndarray | None = None, seed: int | None = None, device: Device = None, backend: str | Client = None)
Breeding simulator class. It can perform the most common operations of a breeding program.
- Parameters:
genetic_map (Path or DataFrame) – the path to, or a dataframe containing, the genetic map. It must have all the columns specified in trait_names, CHR.PHYS (with the name of the marker chromosome), and either cM or RecombRate.
trait_names (List of strings) – column names in the genetic_map. The values of the columns are the marker effects on the trait for each marker. The default value is Yield.
chr_column (str) – name of the column containing the chromosome identifier. The default value is CHR.PHYS.
position_column (str) – name of the column containing the position in cM of the marker. The default value is cM.
recombination_column (str) – name of the column containing the probability that a recombination happens before the current marker and after the previous one. The default value is RecombRate.
mutation_probability (float) – The probability of having a mutation in a marker.
h2 (array of float) – narrow-sense heritability value for each trait. The default value is 0.5 for each trait.
seed (int) – the random seed for reproducibility.
device (XLA Device) – the device for computing simulations. If not specified, it is selected automatically: the first available GPU or TPU, or the CPU if neither is present.
backend (str or XLA client) – the backend of the device. Common choices are gpu, cpu or tpu.
- Example:
>>> from chromax import Simulator, sample_data
>>> simulator = Simulator(genetic_map=sample_data.genetic_map)
>>> f1 = simulator.load_population(sample_data.genome)
>>> f2, _ = simulator.random_crosses(f1, n_crosses=10, n_offspring=20)
>>> f2.shape
(10, 20, 9839, 2)
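The column requirements listed for genetic_map can be checked before constructing the simulator. The helper below is a minimal sketch in plain Python; `missing_genetic_map_columns` is a hypothetical name and not part of chromax, and the defaults simply mirror the parameter defaults documented above.

```python
def missing_genetic_map_columns(
    columns,
    trait_names=("Yield",),
    chr_column="CHR.PHYS",
    position_column="cM",
    recombination_column="RecombRate",
):
    """Hypothetical helper: list the required columns absent from `columns`."""
    present = set(columns)
    # Every trait column and the chromosome column must be present.
    missing = [name for name in (*trait_names, chr_column) if name not in present]
    # Only one of the position / recombination columns is required.
    if position_column not in present and recombination_column not in present:
        missing.append(f"{position_column} or {recombination_column}")
    return missing


ok_columns = ["CHR.PHYS", "cM", "Yield"]
bad_columns = ["CHR.PHYS", "Yield"]
print(missing_genetic_map_columns(ok_columns))   # []
print(missing_genetic_map_columns(bad_columns))  # ['cM or RecombRate']
```

With a DataFrame, the same check would presumably be called on its column labels.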
- set_seed(seed: int)
Set random seed for reproducibility.
- Parameters:
seed (int) – random seed.
- load_population(file_name: Path | str) → Bool[Array, 'n m d']
Load a population from file.
- Parameters:
file_name (path) – path of the file with the population genome.
- Returns:
loaded population of shape (n, m, d), where n is the number of individuals, m is the total number of markers, and d is the ploidy of the population.
- Return type:
ndarray
- Example:
>>> from chromax import Simulator, sample_data
>>> simulator = Simulator(genetic_map=sample_data.genetic_map)
>>> f1 = simulator.load_population(sample_data.genome)
>>> f1.shape
(371, 9839, 2)
- save_population(population: Bool[Array, 'n m d'], file_name: Path | str)
Save a population to file.
- Parameters:
population (ndarray) – population to save.
file_name (path) – path of the file to save the population to.
- Example:
>>> from chromax import Simulator, sample_data
>>> simulator = Simulator(genetic_map=sample_data.genetic_map)
>>> f1 = simulator.load_population(sample_data.genome)
>>> f2, _ = simulator.random_crosses(f1, n_crosses=10, n_offspring=20)
>>> simulator.save_population(f2, "pop_file")
- cross(parents: Bool[Array, 'n 2 m d']) → Bool[Array, 'n m d']
Main function that computes crosses from a list of parents.
- Parameters:
parents (ndarray) – parents to cross. The shape of the parents is (n, 2, m, d), where n is the number of crosses (one pair of parents each), m is the number of markers, and d is the ploidy.
- Returns:
offspring population of shape (n, m, d).
- Return type:
ndarray
- Example:
>>> from chromax import Simulator, sample_data
>>> import numpy as np
>>> simulator = Simulator(genetic_map=sample_data.genetic_map)
>>> f1 = simulator.load_population(sample_data.genome)
>>> parents_indices = np.array([
...     [1, 5], [4, 7], [5, 6]
... ])
>>> parents = f1[parents_indices]
>>> f2 = simulator.cross(parents)
>>> f2.shape
(3, 9839, 2)
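The (n, 2, m, d) parents array expected by cross can be built with integer-array indexing, as the example does. The sketch below reproduces only that indexing step in NumPy on a small random stand-in population, so it runs without chromax; the array sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in population: 8 individuals, 5 markers, diploid (d = 2).
population = rng.integers(0, 2, size=(8, 5, 2)).astype(bool)

# One row per cross; each row holds the indices of the two parents.
parent_indices = np.array([[1, 5], [4, 7], [5, 6]])

# Indexing the first axis with an (n, 2) integer array yields (n, 2, m, d).
parents = population[parent_indices]
print(parents.shape)  # (3, 2, 5, 2)
```

Here parents[i, 0] and parents[i, 1] are the two individuals paired in cross i.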
- property differentiable_cross_func: Callable
Experimental feature that returns a differentiable version of the cross function.
- The differentiable crossing function takes as input:
- population (array): starting population from which to perform the crosses. The shape of the population is (n, m, d).
- cross_weights (array): array of shape (l, n, d). It is used to compute l crosses, starting from a weighted average of the n possible parents. When the n-axis is all zeros except for a single element equal to one, this function is equivalent to the cross function.
- random_key (JAX random key): random key used for recombination sampling.
And returns a population of shape (l, m, d).
- Example:
>>> from chromax import Simulator, sample_data
>>> import numpy as np
>>> import jax
>>> simulator = Simulator(genetic_map=sample_data.genetic_map)
>>> diff_cross = simulator.differentiable_cross_func
>>> def mean_gebv(pop, weights, random_key):
...     new_pop = diff_cross(pop, weights, random_key)
...     return simulator.GEBV(new_pop, raw_array=True).mean()
>>> grad_f = jax.grad(mean_gebv, argnums=1)
>>> f1 = simulator.load_population(sample_data.genome)
>>> weights = np.random.uniform(size=(10, len(f1), 2))
>>> weights /= weights.sum(axis=1, keepdims=True)
>>> random_key = jax.random.key(42)
>>> grad_value = grad_f(f1, weights, random_key)
>>> grad_value.shape
(10, 371, 2)
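The one-hot case mentioned above, where the weighted cross reduces to an ordinary cross, can be written out explicitly. This NumPy sketch only builds such a cross_weights array from parent index pairs. It assumes that slot k of the last axis draws its weights from parent k of each pair, which the description does not state, so treat the layout as illustrative.

```python
import numpy as np

n_individuals = 6
parent_indices = np.array([[1, 5], [4, 2], [0, 3]])  # shape (l, 2)
n_crosses = len(parent_indices)

# cross_weights has shape (l, n, d), here with d = 2.
cross_weights = np.zeros((n_crosses, n_individuals, 2), dtype=np.float32)
rows = np.arange(n_crosses)
# Assumption: slot 0 selects the first parent, slot 1 the second.
cross_weights[rows, parent_indices[:, 0], 0] = 1.0
cross_weights[rows, parent_indices[:, 1], 1] = 1.0

# Along the n-axis, every (cross, slot) column contains a single 1.
print(cross_weights.sum(axis=1))
```

Any non-negative weights that sum to one along the n-axis, as in the example above, give a soft mixture of parents instead of a hard choice.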
- double_haploid(population: Bool[Array, 'n m d'], n_offspring: int = 1) → Bool[Array, 'n n_offspring m d']
Computes the double haploid of the input population.
- Parameters:
population (ndarray) – input population of shape (n, m, 2).
n_offspring (int) – number of offspring per plant. The default value is 1.
- Returns:
output population of shape (n, n_offspring, m, 2). This population will be homozygous.
- Return type:
ndarray
- Example:
>>> from chromax import Simulator, sample_data
>>> simulator = Simulator(genetic_map=sample_data.genetic_map)
>>> f1 = simulator.load_population(sample_data.genome)
>>> dh = simulator.double_haploid(f1, n_offspring=10)
>>> dh.shape
(371, 10, 9839, 2)
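The homozygosity of the double haploid output means both chromosome copies carry the same allele at every marker. A minimal NumPy check is sketched below; `is_homozygous` is a hypothetical helper, and the input is a synthetic array with the documented (n, n_offspring, m, 2) layout rather than real simulator output.

```python
import numpy as np


def is_homozygous(population):
    """Hypothetical check: True where both copies agree at every marker."""
    # Compare the two copies on the last axis, then reduce over markers.
    return np.all(population[..., 0] == population[..., 1], axis=-1)


rng = np.random.default_rng(0)
# Synthetic haplotypes: 4 plants, 3 offspring each, 7 markers.
haplotypes = rng.integers(0, 2, size=(4, 3, 7)).astype(bool)
# Duplicating each haplotype gives a homozygous (n, n_offspring, m, 2) array.
dh_like = np.stack([haplotypes, haplotypes], axis=-1)

print(dh_like.shape)                 # (4, 3, 7, 2)
print(is_homozygous(dh_like).all())  # True
```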
- diallel(population: Bool[Array, 'n m d'], n_offspring: int = 1) → Bool[Array, 'n*(n-1)/2 n_offspring m d']
Diallel crossing function: crosses every possible pair of individuals, excluding self-crossing.
- Parameters:
population (ndarray) – input population of shape (n, m, d).
n_offspring (int) – number of offspring per cross. The default value is 1.
- Returns:
output population of shape (l, n_offspring, m, d), where l is the number of possible pairs, i.e. n * (n-1) / 2.
- Return type:
ndarray
- Example:
>>> from chromax import Simulator, sample_data
>>> simulator = Simulator(genetic_map=sample_data.genetic_map)
>>> f1 = simulator.load_population(sample_data.genome)[:10]
>>> f2 = simulator.diallel(f1, n_offspring=10)
>>> f2.shape
(45, 10, 9839, 2)
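The pair count n * (n-1) / 2 is the number of unordered pairs without self-pairing. The sketch below enumerates them with the standard library; `diallel_pairs` is a hypothetical helper, and the order in which the library itself enumerates the pairs is not stated here, so only the count should be relied on.

```python
from itertools import combinations


def diallel_pairs(n):
    """Hypothetical helper: all unordered index pairs (i, j) with i < j."""
    return list(combinations(range(n), 2))


pairs = diallel_pairs(10)
print(len(pairs))  # 45
print(pairs[:3])   # [(0, 1), (0, 2), (0, 3)]
```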
- random_crosses(population: Bool[Array, 'n m d'], n_crosses: int, n_offspring: int = 1) → Tuple[Bool[Array, 'n_crosses n_offspring m d'], Int[Array, 'n_crosses 2']]
Computes random crosses on a population.
- Parameters:
population (ndarray) – input population of shape (n, m, d).
n_crosses (int) – number of random crosses to perform.
n_offspring (int) – number of offspring per cross. The default value is 1.
- Returns:
output population of shape (n_crosses, n_offspring, m, d) and parent indices of shape (n_crosses, 2) of performed crosses.
- Return type:
tuple of two ndarrays
- Example:
>>> from chromax import Simulator, sample_data
>>> simulator = Simulator(genetic_map=sample_data.genetic_map)
>>> f1 = simulator.load_population(sample_data.genome)
>>> f2, parent_ids = simulator.random_crosses(f1, 100, n_offspring=10)
>>> f2.shape
(100, 10, 9839, 2)
>>> parent_ids.shape
(100, 2)
- select(population: Bool[Array, '_g n m d'], k: int, f_index: Callable[[Bool[Array, 'n m d']], Float[Array, 'n']] | None = None) → Tuple[Bool[Array, '_g k m d'], Int[Array, '_g k']]
Function to select individuals based on their score (index).
- Parameters:
population (ndarray) – input population of shape (n, m, d), or of shape (g, n, m, d) to select k individuals from each of the g population groups.
k (int) – number of individuals to select.
f_index (Callable) – function that computes a score for each individual. It accepts the population as input, i.e. an array of shape (n, m, d), and returns n floating-point scores. The default f_index is the conventional index, i.e. the sum of the marker effects masked with the SNPs from the genetic_map.
- Returns:
output population of shape (k, m, d) or (g, k, m, d), depending on the input population, and respective indices of shape (k,) or (g, k)
- Return type:
tuple of two ndarrays
- Example:
>>> from chromax import Simulator, sample_data
>>> simulator = Simulator(genetic_map=sample_data.genetic_map, trait_names=["Yield"])
>>> f1 = simulator.load_population(sample_data.genome)
>>> len(f1), simulator.GEBV(f1).mean().values
(371, [8.223844])
>>> f2, selected_indices = simulator.select(f1, k=20)
>>> len(f2), simulator.GEBV(f2).mean().values
(20, [14.595136])
>>> selected_indices.shape
(20,)
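Selection with a custom f_index amounts to scoring every individual and keeping the k highest scores. The NumPy sketch below shows that idea for a single (n, m, d) population. Both `allele_count_index` and `select_top_k` are hypothetical names, and the ordering of the returned indices and any tie-breaking in the library are assumptions.

```python
import numpy as np


def allele_count_index(population):
    """Hypothetical f_index: number of True alleles per individual, shape (n,)."""
    return population.sum(axis=(1, 2)).astype(np.float32)


def select_top_k(population, k, f_index):
    """Sketch of top-k selection; returns the selected rows and their indices."""
    scores = f_index(population)
    # Sort by descending score and keep the k best individuals.
    indices = np.argsort(-scores)[:k]
    return population[indices], indices


population = np.zeros((4, 3, 2), dtype=bool)
population[1] = True        # score 6
population[3, :2] = True    # score 4
population[0, 0, 0] = True  # score 1

selected, indices = select_top_k(population, k=2, f_index=allele_count_index)
print(indices)         # [1 3]
print(selected.shape)  # (2, 3, 2)
```

Any callable with the same input and output shapes could be passed as f_index in the same way.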
- GEBV(population: Bool[Array, 'n m d'], *, raw_array: bool = False) → DataFrame | ndarray
Computes the Genomic Estimated Breeding Values using the data from the genetic_map.
- Parameters:
population (ndarray) – input population of shape (n, m, d).
raw_array (bool) – whether to return a raw array or a DataFrame. Default value is False.
- Returns:
a DataFrame (or array) with n rows and a column for each trait. It contains the GEBV of each trait for each individual.
- Return type:
DataFrame or ndarray
- Example:
>>> from chromax import Simulator, sample_data
>>> simulator = Simulator(genetic_map=sample_data.genetic_map)
>>> f1 = simulator.load_population(sample_data.genome)
>>> simulator.GEBV(f1).mean()
Heading Date               0.196119
Protein Content           -0.228718
Plant Height              -5.888406
Thousand Kernel Weight    -1.029418
Yield                      8.223843
Fusarium Head Blight       5.318052
Spike Emergence Period    -0.933169
dtype: float32
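The description of the default selection index, "the sum of the marker effects masked with the SNPs", suggests a purely additive computation. The NumPy sketch below is one reading of it: count the True alleles at each marker and take a weighted sum with the marker effects. `additive_gebv` is a hypothetical function, and any centering or intercept the library may apply is left out.

```python
import numpy as np


def additive_gebv(population, marker_effects):
    """Hypothetical additive GEBV: (n, m, d) population, (m, t) effects -> (n, t)."""
    # Number of True alleles per marker for each individual: shape (n, m).
    dosage = population.sum(axis=-1)
    return dosage @ marker_effects


population = np.array(
    [
        [[1, 1], [0, 1], [0, 0]],  # dosages 2, 1, 0
        [[0, 0], [1, 1], [1, 0]],  # dosages 0, 2, 1
    ],
    dtype=bool,
)
marker_effects = np.array([[1.0], [-0.5], [2.0]])  # 3 markers, 1 trait

print(additive_gebv(population, marker_effects))  # [[1.5], [1.0]]
```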
- create_environments(num_environments: int) → Float[Array, 'num_environments']
Create environments to phenotype the population.
In practice, it generates random numbers from a normal distribution.
- Parameters:
num_environments (int) – number of environments to create.
- Returns:
array of floating point numbers. This output can be used for the function phenotype.
- Return type:
ndarray
- phenotype(population: Bool[Array, 'n m d'], *, num_environments: int | None = None, environments: ndarray | None = None, raw_array: bool = False) → DataFrame | ndarray
Simulates the phenotype of a population.
This uses the Genotype-by-Environment model described in AlphaSimR.
- Parameters:
population (ndarray) – input population of shape (n, m, d)
num_environments (int) – number of environments to test the population. Default value is 1.
environments (ndarray) – environments to test the population. Each environment must be represented by a floating-point number in the range (-1, 1). When drawing new environments, use a normal distribution to preserve the heritability semantics.
raw_array (bool) – whether to return a raw array or a DataFrame. Default value is False.
- Returns:
a DataFrame (or array) with n rows and a column for each trait. It contains the simulated phenotype for each individual.
- Return type:
DataFrame or ndarray
- Example:
>>> from chromax import Simulator, sample_data
>>> simulator = Simulator(genetic_map=sample_data.genetic_map, seed=42)
>>> f1 = simulator.load_population(sample_data.genome)
>>> envs = simulator.create_environments(4)
>>> simulator.phenotype(f1, environments=envs).mean()
Heading Date               0.105397
Protein Content           -0.172026
Plant Height              -5.813669
Thousand Kernel Weight    -1.372738
Yield                      8.306302
Fusarium Head Blight       4.286477
Spike Emergence Period    -0.575061
dtype: float32
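The role of the h2 parameter can be illustrated with the usual definition of heritability as the ratio of genetic to phenotypic variance. The sketch below is a deliberately simplified additive-noise model: it leaves out the genotype-by-environment term of the AlphaSimR model mentioned above and is not the library's implementation. `environmental_variance` is a hypothetical helper and the breeding values are synthetic.

```python
import numpy as np


def environmental_variance(genetic_variance, h2):
    """Hypothetical helper: noise variance implied by h2 = var_g / (var_g + var_e)."""
    return genetic_variance * (1.0 - h2) / h2


rng = np.random.default_rng(0)
# Synthetic breeding values standing in for one column of GEBV output.
breeding_values = rng.normal(loc=8.0, scale=2.0, size=10_000)

var_g = breeding_values.var()
var_e = environmental_variance(var_g, h2=0.5)

# Phenotype = breeding value + independent environmental noise (no GxE term).
noise = rng.normal(scale=np.sqrt(var_e), size=breeding_values.shape)
phenotypes = breeding_values + noise

# Should land close to the requested 0.5, up to sampling noise.
realized_h2 = var_g / phenotypes.var()
print(realized_h2)
```

A lower h2 implies a larger noise variance, so phenotypes track the breeding values less closely.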
- corrcoef(population: Bool[Array, 'n m d']) → Float[Array, 'n']
Computes the correlation coefficient of the population against its centroid.
It can be used as an indicator of variance in the population.
- Parameters:
population (ndarray) – input population of shape (n, m, d)
- Returns:
vector of length n, containing the correlation coefficient of each individual against the average of the population.
- Return type:
ndarray
- Example:
>>> from chromax import Simulator, sample_data
>>> simulator = Simulator(genetic_map=sample_data.genetic_map, seed=42)
>>> f1 = simulator.load_population(sample_data.genome)
>>> corrcoef = simulator.corrcoef(f1)
>>> corrcoef.shape
(371,)
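Correlation against the centroid can be sketched in NumPy as the Pearson correlation between each individual's flattened genome and the population mean. `corrcoef_to_centroid` is a hypothetical function, and flattening the (m, d) axes into a single vector is an assumption; the library may, for instance, work on allele dosages instead.

```python
import numpy as np


def corrcoef_to_centroid(population):
    """Hypothetical sketch: Pearson correlation of each individual with the mean genome."""
    n = population.shape[0]
    flat = population.reshape(n, -1).astype(np.float64)  # shape (n, m * d)
    centroid = flat.mean(axis=0)                         # average individual

    # Center each vector, then take normalized dot products.
    flat_centered = flat - flat.mean(axis=1, keepdims=True)
    centroid_centered = centroid - centroid.mean()
    numerator = flat_centered @ centroid_centered
    denominator = np.linalg.norm(flat_centered, axis=1) * np.linalg.norm(centroid_centered)
    return numerator / denominator


rng = np.random.default_rng(0)
population = rng.integers(0, 2, size=(6, 50, 2)).astype(bool)

coefficients = corrcoef_to_centroid(population)
print(coefficients.shape)  # (6,)
```

The sketch does not guard against a zero denominator, which occurs when an individual or the centroid has the same value at every position.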