singlecelldata package
singlecelldata.singlecell module
-
class singlecelldata.singlecell.SingleCell(dataset, data, celldata=None, genedata=None)
Bases: object
A Python class for managing single-cell RNA-seq datasets.
-
dataset
A string for the name of the dataset.
- Type
str
-
data
The main dataframe or assay for storing the gene expression counts. The shape of this dataframe is (d x n), where d is the number of genes (features) and n is the number of cells (samples).
- Type
Pandas Dataframe
-
celldata
The dataframe or assay used to store metadata about cells. The shape of this dataframe is (n x m), where m is the number of columns, each holding a different type of information about the cells, such as cell type.
- Type
Pandas Dataframe
-
genedata
The dataframe or assay used to store metadata about genes. The shape of this dataframe is (d x m), where m is the number of columns, each holding a different type of information about the genes, such as gene name.
- Type
Pandas Dataframe
-
dim
The dimensionality of the data assay, given as (d, n).
- Type
tuple
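The shapes described above can be illustrated with plain pandas objects. This is a minimal sketch that does not import the package; the gene names, cell names, and column names are hypothetical, and any index or naming requirements the class may impose are not covered by the descriptions above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset: d = 3 genes (rows), n = 4 cells (columns).
genes = ["GeneA", "GeneB", "GeneC"]
cells = ["cell1", "cell2", "cell3", "cell4"]

# data: (d x n) expression counts, genes as rows and cells as columns.
data = pd.DataFrame(
    np.array([[5, 0, 2, 1],
              [0, 3, 0, 7],
              [1, 1, 4, 0]]),
    index=genes,
    columns=cells,
)

# celldata: (n x m) metadata, one row per cell (here m = 1).
celldata = pd.DataFrame({"cell_type": ["T", "B", "T", "NK"]}, index=cells)

# genedata: (d x m) metadata, one row per gene (here m = 1).
genedata = pd.DataFrame({"gene_name": genes}, index=genes)

d, n = data.shape
assert celldata.shape[0] == n  # one metadata row per cell
assert genedata.shape[0] == d  # one metadata row per gene

# With the package installed, these frames would be passed to the
# constructor documented above, for example:
#   sc = SingleCell("toy", data, celldata, genedata)
```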
-
addCellData(col_data, col_name)
Adds a column in the celldata dataframe.
- Parameters
col_data (List or Numpy array) – The data to be added to the celldata dataframe. The size of the List or Numpy array should be equal to n.
col_name (str) – The name of the data column.
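The length requirement on col_data can be sketched with pandas alone. The example below is an illustration, not the package's implementation: the `batch` annotation is hypothetical, and whether the class itself raises an error on a length mismatch is not stated here.

```python
import numpy as np
import pandas as pd

# Stand-in for the celldata dataframe: n = 4 cells, one metadata column.
celldata = pd.DataFrame({"cell_type": ["T", "B", "T", "NK"]})
n = celldata.shape[0]

# Hypothetical per-cell annotation to add as a new column.
batch = np.array([1, 1, 2, 2])

# The new column must supply exactly one value per cell.
if len(batch) != n:
    raise ValueError(f"expected {n} values, got {len(batch)}")
celldata["batch"] = batch
```

The same check applies to the genedata dataframe, with d in place of n.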
-
addGeneData(col_data, col_name)
Adds a column in the genedata dataframe.
- Parameters
col_data (List or Numpy array) – The data to be added to the genedata dataframe. The size of the List or Numpy array should be equal to d.
col_name (str) – The name of the data column.
-
checkCellData(column)
Checks whether a column exists in the celldata dataframe.
- Parameters
column (str) – The name of the column.
- Returns
True if column exists in the dataframe, False otherwise.
- Return type
bool
-
checkGeneData(column)
Checks whether a column exists in the genedata dataframe.
- Parameters
column (str) – The name of the column.
- Returns
True if column exists in the dataframe, False otherwise.
- Return type
bool
-
getCellData(column)
Returns data stored in the celldata dataframe.
- Parameters
column (str) – The name of the data column.
- Returns
An array of length n containing the cell data stored under the given column name.
- Return type
Numpy array
- Raises
ValueError – If column does not exist in the celldata dataframe.
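The check-then-return behaviour described above might look like the following in plain pandas. The helper name `get_column` is hypothetical and is not part of the package; it only mirrors the documented contract (a Numpy array on success, ValueError when the column is missing).

```python
import numpy as np
import pandas as pd


def get_column(frame: pd.DataFrame, column: str) -> np.ndarray:
    """Return a column as a NumPy array, or raise ValueError if it is missing."""
    if column not in frame.columns:
        raise ValueError(f"column {column!r} does not exist in the dataframe")
    return frame[column].to_numpy()


# Stand-in for the celldata dataframe.
celldata = pd.DataFrame({"cell_type": ["T", "B", "T", "NK"]})

labels = get_column(celldata, "cell_type")  # array of length n

try:
    get_column(celldata, "batch")  # this column was never added
    missing_raised = False
except ValueError:
    missing_raised = True
```

Calling checkCellData first is one way to avoid the exception when a column may be absent.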
-
getCounts()
Returns a Numpy array of the counts/data stored in the data dataframe. This method is called on a class instance and takes no arguments.
- Returns
A (d x n) array of gene expression counts/data.
- Return type
Numpy array
-
getDistinctCellTypes(column)
Returns the unique cell type information stored in the celldata dataframe.
- Parameters
column (str) – The name of the column holding the cell labels in the celldata assay.
- Returns
An array containing the unique values stored under the given column of the celldata dataframe.
- Return type
Numpy array
-
getGeneData(column)
Returns data stored in the genedata dataframe.
- Parameters
column (str) – The name of the data column.
- Returns
An array of length d containing the gene data stored under the given column name.
- Return type
Numpy array
- Raises
ValueError – If column does not exist in the genedata dataframe.
-
getNumericCellLabels(column)
Returns numeric (int) cell labels derived from the string or int cell labels stored in the celldata assay. This method is useful when computing the Rand Index or Adjusted Rand Index after clustering.
- Parameters
column (str) – The name of the column holding the string or int cell labels in the celldata assay.
- Returns
An array containing the integer representation of the data stored under the given column of the celldata dataframe.
- Return type
Numpy array (int)
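One common way to obtain both the distinct labels and an integer encoding is `numpy.unique` with `return_inverse=True`. The sketch below shows that technique on hypothetical labels; it is an assumption that the package numbers labels in a comparable way, and the specific integers it assigns may differ.

```python
import numpy as np

# Hypothetical string cell labels, one per cell.
labels = np.array(["T", "B", "T", "NK", "B"])

# np.unique returns the sorted distinct labels; return_inverse gives, for
# each cell, the index of its label within that sorted array.
distinct, numeric = np.unique(labels, return_inverse=True)

# distinct -> ['B', 'NK', 'T']
# numeric  -> [2, 0, 2, 1, 0]
```

Rand-type indices depend only on which cells are grouped together, not on which integer is assigned to which label, so any consistent encoding serves for that purpose.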
-
isSpike(spike_type, gene_names_column)
Prints a message if spike-ins are detected in the dataset. Creates a filter that removes spike-ins from the dataset when the counts/data are returned by the getCounts() method.
- Parameters
spike_type (str) – A string representing the type of spike-in.
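A spike-in filter of this kind can be sketched as a boolean mask over the gene names. The example below assumes spike-ins are recognised by a name prefix (here "ERCC"); the actual matching rule used by the package is not given in this reference, and the gene names and counts are hypothetical.

```python
import numpy as np
import pandas as pd

# Stand-in for genedata with a gene-name column (d = 4 genes).
genedata = pd.DataFrame(
    {"gene_name": ["Actb", "ERCC-00002", "Gapdh", "ERCC-00130"]}
)
# Stand-in (d x n) counts for n = 3 cells.
counts = np.array([[5, 0, 2],
                   [9, 9, 9],
                   [1, 1, 4],
                   [7, 7, 7]])

spike_type = "ERCC"

# Assumption: a gene is a spike-in if its name starts with spike_type.
is_spike = genedata["gene_name"].str.startswith(spike_type).to_numpy()

if is_spike.any():
    print(f"{is_spike.sum()} spike-in genes detected")

# Keep only the rows (genes) that are not spike-ins.
filtered = counts[~is_spike]
```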
-
print()
Prints a summary of the single-cell dataset.
-
removeCellData(column)
Removes a column from the celldata dataframe, after first checking that the column exists in the celldata dataframe.
- Parameters
column (str) – The name of the data column.
-
removeGeneData(column)
Removes a column from the genedata dataframe, after first checking that the column exists in the genedata dataframe.
- Parameters
column (str) – The name of the data column.
-
setCounts(new_counts)
Sets the new counts values in the data dataframe.
- Parameters
new_counts (Numpy array) – A Numpy array of new count values whose shape equals dim, that is (d, n). The data dataframe is updated with these values.
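The shape requirement can be illustrated with pandas directly. This sketch replaces the values of a stand-in data dataframe with log-transformed counts while keeping its row and column labels; the log transform is only an example of new values, and whether the class validates the shape itself is an assumption.

```python
import numpy as np
import pandas as pd

# Stand-in for the data dataframe: d = 2 genes, n = 3 cells.
data = pd.DataFrame(
    np.array([[5, 0, 2],
              [0, 3, 0]]),
    index=["GeneA", "GeneB"],
    columns=["cell1", "cell2", "cell3"],
)

# Example replacement values with the same (d, n) shape.
new_counts = np.log1p(data.to_numpy())

# The new array must match the existing dimensionality exactly.
if new_counts.shape != data.shape:
    raise ValueError(f"expected shape {data.shape}, got {new_counts.shape}")

# Rebuild the dataframe so gene and cell labels are preserved.
data = pd.DataFrame(new_counts, index=data.index, columns=data.columns)
```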
-