samplingsimulatorpy package

Submodules

samplingsimulatorpy.draw_samples module

samplingsimulatorpy.draw_samples.draw_samples(pop, reps, sample_size)

Draws samples of various sizes from a population

Parameters:
  • pop (pd.DataFrame) – The virtual population as a dataframe
  • reps (integer) – The number of replications for each sample size as an integer
  • sample_size (list) – The sample sizes to draw, as a list of integers
Returns:

A dataframe containing the sample numbers and sample values

Return type:

pd.DataFrame

Raises:
  • TypeError – if pop input is not a valid data frame
  • TypeError – if pop name input is not a valid string
  • TypeError – if reps input is not an integer
  • ValueError – if reps input is not greater than 0
  • TypeError – if sample_size array contains non-integer values

Examples

>>> pop = generate_virtual_pop(100, "Height", np.random.normal, 0, 1)
>>> samples = draw_samples(pop, 3, [5, 10, 15, 20])
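The shape of the returned dataframe can be illustrated with a minimal sketch. This is not the package's implementation: sketch_draw_samples is a hypothetical stand-in, sampling without replacement is an assumption, and rep_size is assumed here to record the number of replications. The column names 'replicate', 'size' and 'rep_size' are the ones plot_sampling_hist expects.

```python
import numpy as np
import pandas as pd


def sketch_draw_samples(pop, reps, sample_size, seed=None):
    """Hypothetical stand-in for draw_samples; not the package's code."""
    if not isinstance(pop, pd.DataFrame):
        raise TypeError("pop should be a pandas DataFrame")
    if not isinstance(reps, int):
        raise TypeError("reps should be an integer")
    if reps <= 0:
        raise ValueError("reps should be greater than 0")
    if not all(isinstance(n, int) for n in sample_size):
        raise TypeError("sample_size should contain only integers")

    rng = np.random.default_rng(seed)
    frames = []
    for size in sample_size:
        for replicate in range(1, reps + 1):
            # Pick `size` row positions without replacement (an assumption).
            rows = rng.choice(len(pop), size=size, replace=False)
            sample = pop.iloc[rows].reset_index(drop=True).assign(
                replicate=replicate,
                size=size,
                rep_size=reps,  # assumed meaning: number of replications
            )
            frames.append(sample)
    return pd.concat(frames, ignore_index=True)


# Toy population with a single numeric column named "Height".
pop = pd.DataFrame({"Height": np.random.default_rng(0).normal(0, 1, 100)})
samples = sketch_draw_samples(pop, 3, [5, 10, 15, 20], seed=1)
```

With 3 replications of sizes 5, 10, 15 and 20, the sketch stacks 3 × (5 + 10 + 15 + 20) = 150 rows into one long dataframe with four columns.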

samplingsimulatorpy.generate_virtual_pop module

samplingsimulatorpy.generate_virtual_pop.generate_virtual_pop(size, population_name, distribution_func, *para)

Create a virtual population

Parameters:
  • size (int) – The size of the virtual population
  • population_name (str) – The name of the virtual population
  • distribution_func (func) – A distribution function from numpy.random
  • *para (int) – The parameters passed to distribution_func
Returns:

The virtual population as a dataframe

Return type:

pd.DataFrame

Raises:
  • ValueError – if size input is not greater than 0
  • TypeError – if size input is not an integer
  • TypeError – if *para has the wrong number of parameters for the distribution function

Examples

>>> import numpy as np
>>> from samplingsimulatorpy.generate_virtual_pop import generate_virtual_pop
>>> pop = generate_virtual_pop(100, "Height", np.random.normal, 0, 1)
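How size, population_name, distribution_func and *para might fit together can be sketched as follows. This is an illustration under assumptions, not the package's code: sketch_generate_virtual_pop is a hypothetical name, and the single-column layout of the result is assumed from the description above.

```python
import numpy as np
import pandas as pd


def sketch_generate_virtual_pop(size, population_name, distribution_func, *para):
    """Hypothetical stand-in for generate_virtual_pop; not the package's code."""
    if not isinstance(size, int):
        raise TypeError("size should be an integer")
    if size <= 0:
        raise ValueError("size should be greater than 0")
    # numpy.random functions such as np.random.normal(loc, scale, size)
    # take the distribution parameters first and the number of draws last.
    values = distribution_func(*para, size)
    # Assumed layout: one column named after the population.
    return pd.DataFrame({population_name: values})


pop = sketch_generate_virtual_pop(100, "Height", np.random.normal, 0, 1)
```

Passing the wrong number of values in *para makes the underlying numpy call fail, which is one plausible source of the TypeError listed above.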

samplingsimulatorpy.plot_sample_hist module

samplingsimulatorpy.plot_sample_hist.plot_sample_hist(pop, samples)

Creates a faceted plot of sample histograms from a population

Parameters:
  • pop (pd.DataFrame) – The virtual population as a dataframe
  • samples (pd.DataFrame) – The samples as a dataframe
Returns:

A grid of the sample distribution plots

Return type:

altair.vegalite.v3.api.Chart

Raises:
  • TypeError – if pop input is not a valid data frame
  • TypeError – if pop input is an empty data frame
  • ValueError – if pop input contains non-numeric values
  • TypeError – if samples input is not a valid data frame
  • ValueError – if samples input contains non-numeric values

Examples

>>> pop = generate_virtual_pop(100, "variable", np.random.normal, 0, 1)
>>> samples = draw_samples(pop, 3, [5, 10, 15, 20])
>>> plot_sample_hist(pop, samples)

samplingsimulatorpy.plot_sampling_hist module

samplingsimulatorpy.plot_sampling_hist.plot_sampling_hist(samples)

Creates a grid of sampling distribution histograms of the mean for different sample sizes drawn from a population

Parameters:

samples (pd.DataFrame) – The samples as a dataframe. It should be the output of the draw_samples function, or a dataframe with the same column names; otherwise the function may not work.

Returns:

A facet chart of the sampling distribution plots

Return type:

altair.vegalite.v3.api.FacetChart

Raises:
  • TypeError – if samples input is not a valid data frame
  • ValueError – if samples input contains non-numeric values
  • ValueError – if samples data frame does not have exactly 4 columns
  • KeyError – if samples input does not contain ‘replicate’, ‘size’, and ‘rep_size’ columns

Examples

>>> pop = generate_virtual_pop(1000, "Variable", np.random.normal, 0, 1)
>>> samples = draw_samples(pop, 100, [5, 10, 15, 20])
>>> plot_sampling_hist(samples)
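The quantity behind a sampling distribution of the mean can be computed directly with pandas, without plotting. The sketch below builds a toy dataframe with the 'replicate', 'size' and 'rep_size' columns named above (the value column name "Variable" and the meaning of rep_size are assumptions) and derives one mean per sample. It shows the idea only and is not how the package builds its chart.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Toy stand-in for draw_samples output: 100 replicates each of sizes 5 and 20.
rows = []
for size in (5, 20):
    for replicate in range(1, 101):
        for value in rng.normal(0, 1, size):
            rows.append(
                {"replicate": replicate, "Variable": value, "size": size, "rep_size": 100}
            )
samples = pd.DataFrame(rows)

# One mean per (size, replicate) pair: these are the values a sampling
# distribution histogram would bin, with one panel per sample size.
sample_means = (
    samples.groupby(["size", "replicate"])["Variable"].mean().reset_index(name="mean")
)

# Standard deviation of the sample means for each sample size.
spread = sample_means.groupby("size")["mean"].std()
```

Comparing the entries of spread shows the sample means clustering more tightly around the population mean as the sample size grows.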

samplingsimulatorpy.samplingsimulatorpy module

samplingsimulatorpy.stat_summary module

samplingsimulatorpy.stat_summary.stat_summary(population, samples, parameter)

Creates summary statistics of the population and the samples for the parameter(s) of interest

Parameters:
  • population (pd.DataFrame) – The virtual population
  • samples (pd.DataFrame) – The drawn samples
  • parameter (list) – The list of parameters of interest, e.g. [np.mean, np.std]
Raises:
  • TypeError – if population input is not a dataframe containing values
  • TypeError – if samples input is not a dataframe containing values
  • TypeError – if parameter input is not a list containing values
  • AttributeError – if a parameter of interest cannot be used for the summary stats
Returns:

The summary stats as a dataframe

Return type:

pd.DataFrame

Examples

>>> from samplingsimulatorpy.stat_summary import stat_summary
>>> stat_summary(pop, samples, [np.mean, np.std])
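One way to picture the summary is to apply each function in the parameter list to the population values and to the sample values. The sketch below does exactly that. It is a hypothetical illustration: sketch_stat_summary, the "data" label column and the use of the first column as the value column are all assumptions, and the package's actual output layout may differ.

```python
import numpy as np
import pandas as pd


def sketch_stat_summary(population, samples, parameter):
    """Hypothetical stand-in for stat_summary; not the package's code."""
    for name, frame in (("population", population), ("samples", samples)):
        if not isinstance(frame, pd.DataFrame) or frame.empty:
            raise TypeError(f"{name} should be a non-empty dataframe")
    if not isinstance(parameter, list) or not parameter:
        raise TypeError("parameter should be a non-empty list")

    column = population.columns[0]  # assumes the first column holds the values
    rows = []
    for label, frame in (("population", population), ("samples", samples)):
        row = {"data": label}
        for func in parameter:
            # Each parameter is a callable such as np.mean or np.std.
            row[func.__name__] = func(frame[column].to_numpy())
        rows.append(row)
    return pd.DataFrame(rows)


population = pd.DataFrame({"Height": [1.0, 2.0, 3.0, 4.0]})
samples = pd.DataFrame({"Height": [1.0, 3.0]})
summary = sketch_stat_summary(population, samples, [np.mean, np.std])
```

The result has one row for the population and one for the samples, with one column per function in the list.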

Module contents