singlecelldata package

singlecelldata.singlecell module

class singlecelldata.singlecell.SingleCell(dataset, data, celldata=None, genedata=None)

Bases: object

A python class for managing single-cell RNA-seq datasets.

dataset

A string for the name of the dataset.

Type

str

data

The main dataframe or assay for storing the gene expression counts. The shape of this dataframe is (d x n) where d is the number of genes (features) and n is the number of cells (samples).

Type

Pandas Dataframe

celldata

The dataframe or assay used to store more data (metadata) about cells. The shape of this dataframe is (n x m), where m is number of columns representing different types of information about the cells, such as cell types etc.

Type

Pandas Dataframe

genedata

The dataframe or assay used to store more data (metadata) about genes. The shape of this dataframe is (d x m), where m is number of columns representing different types of information about the genes such as gene names etc.

Type

Pandas Dataframe

dim

Variable representing the dimensionality of the data assay. It is (d, n).

Type

tuple

addCellData(col_data, col_name)

Adds a column in the celldata dataframe.

Parameters
  • col_data (List or Numpy array) – The data to be added to the celldata dataframe. The size of the List or Numpy array should be equal to n.

  • col_name (str) – The name of the data column.

addGeneData(col_data, col_name)

Adds a column in the genedata dataframe.

Parameters
  • col_data (List or Numpy array) – The data to be added to the genedata dataframe. The size of the List or Numpy array should be equal to d.

  • col_name (str) – The name of the data column.

checkCellData(column)

Checks whether a column exists in the celldata dataframe.

Parameters

column (str) – The name of the column.

Returns

True if column exists in the dataframe, False otherwise.

Return type

bool

checkGeneData(column)

Checks whether a column exists in the genedata dataframe.

Parameters

column (str) – The name of the column.

Returns

True if column exists in the dataframe, False otherwise.

Return type

bool

getCellData(column)

Returns data stored in the celldata dataframe.

Parameters

column (str) – The name of the data column.

Returns

A n-dimensional array containing cell data by the column name.

Return type

Numpy array

Raises

ValueError – If column does not exist in the celldata dataframe.

getCounts()

Returns a Numpy array of the counts/data in the data dataframe. This method is called from the class instance and requires no input arguments.

Returns

A (d x n) array of gene expression counts/data.

Return type

Numpy array

getDistinctCellTypes(column)

Returns the unique cell type information stored in the celldata dataframe.

Parameters

column (str) – This parameter is the column name of the cell labels in the celldata assay.

Returns

Containing unique values in the celldata dataframe under the column passed into this function.

Return type

Numpy array

getGeneData(column)

Returns data stored in the genedata dataframe.

Parameters

column (str) – The name of the data column.

Returns

A d-dimensional array containing gene data by the column name.

Return type

Numpy array

Raises

ValueError – If column does not exist in the genedata dataframe.

getNumericCellLabels(column)

Returns the numeric (int) cell labels from the celldata assay which contains the string or int cell labels. This method is useful when computing Rand Index or Adjusted Rand Index after clustering.

Parameters

column (str) – This parameter is the column name of the string or int cell labels in the celldata assay.

Returns

Array containing the integer representation of data in the celldata dataframe under the column passed into this function.

Return type

Numpy array (int)

isSpike(spike_type, gene_names_column)

Prints a message if spike-ins are detected in the dataset. Creates a filter to remove spike-ins from the dataset when counts/data is returned using getCounts() method.

Parameters

spike_type (str) – A string representing the type of spike-in.

print()

Prints a summary of the single-cell dataset.

removeCellData(column)

Removes a column from the celldata dataframe. First checks whether the column exists in the celldata dataframe.

Parameters

column (str) – The name of the data column.

removeGeneData(column)

Removes a column from the genedata dataframe. First checks whether the column exists in the genedata dataframe.

Parameters

column (str) – The name of the data column.

setCounts(new_counts)

Sets the new counts values in the data dataframe.

Parameters

new_counts (Numpy array) – A numpy array with the shape = dim, representing new count values. The data dataframe will be updated with the new count values.