Sleipnir::CPCL Class Reference

Encapsulates a PCL (or, in a pinch, CDT) formatted microarray data file. More...

#include <pcl.h>

Inheritance diagram for Sleipnir::CPCL:

Sleipnir::CPCLPair

Public Types

enum  ENormalize {
  ENormalizeNone, ENormalizeZScore, ENormalizeRow, ENormalizeMinMax,
  ENormalizeColumn, ENormalizeMean
}
 Ways in which a PCL's data entries can be normalized. More...

Public Member Functions

 CPCL (bool fHeader=true)
 Create a new PCL object with or without a header row.
void Open (const CPCL &PCL)
 Create a new PCL by copying the given one.
bool Open (std::istream &istm, size_t iSkip)
 Load a PCL from the given text stream.
void Open (const std::vector< size_t > &veciGenes, const std::vector< std::string > &vecstrGenes, const std::vector< std::string > &vecstrExperiments)
 Create a new PCL using the given genes, experiments, and gene order.
void Open (const std::vector< std::string > &vecstrGenes, const std::vector< std::string > &vecstrExperiments, const std::vector< std::string > &vecstrFeatures)
 Create a new PCL using the given genes, experiments, and features.
void OpenBinary (std::istream &istm)
 Load a PCL from the given binary stream.
void Save (std::ostream &ostm, const std::vector< size_t > *pveciGenes=NULL) const
 Save a PCL to the given text stream.
void SaveBinary (std::ostream &ostm) const
 Save a PCL to the given binary stream.
void SaveGene (std::ostream &ostm, size_t iGene, size_t iOriginal=-1) const
 Save a single gene row to the given text stream.
void SaveHeader (std::ostream &ostm, bool fCDT=false) const
 Save the PCL's header row to the given text stream.
bool SortGenes (const std::vector< size_t > &veciOrder)
 Reorder the PCL's genes based on the order of the given indices.
void RankTransform ()
 Rank transform each row of the PCL in increasing order.
bool AddGenes (const std::vector< std::string > &vecstrGenes)
 Appends new, empty gene rows to the end of the PCL using the given gene IDs.
void Normalize (ENormalize eNormalize=ENormalizeRow)
 Normalizes the PCL's values in the requested manner.
void Impute (size_t iNeighbors, float dMinimumPresent, const CDat &DatSimilarity)
 Impute missing values using the knnimpute algorithm; optionally mask genes with too many missing values.
void Impute (size_t iNeighbors, float dMinimumPresent, const IMeasure *pMeasure, bool fPrecompute=true)
 Impute missing values using the knnimpute algorithm; optionally mask genes with too many missing values.
void Save (const char *szFile=NULL) const
 Save a PCL to the given text file.
bool Open (const char *szFile, size_t iSkip)
 Load a PCL from the given text file.
bool Open (std::istream &istm)
 Load a PCL from the given text stream using the default number of skip columns.
void Reset ()
 Empties the PCL and deallocates all associated memory.
void Clear ()
 Sets all PCL entries to zero (without changing size or memory allocation).
size_t GetFeatures () const
 Return the number of features (skip columns) in the PCL.
const std::string & GetFeature (size_t iFeature) const
 Return the label (header) of the requested feature.
const std::string & GetFeature (size_t iGene, size_t iFeature) const
 Return the requested gene's value for the given feature.
void SetFeature (size_t iGene, size_t iFeature, const std::string &strValue)
 Set the requested gene's value for the given feature.
float & Get (size_t iGene, size_t iExperiment) const
 Returns the value at the requested PCL position.
float * Get (size_t iGene) const
 Return a single gene's row from the PCL.
void Set (size_t iGene, const float *adValues)
 Set a single gene's row in the PCL.
const CDataMatrixGet () const
 Return the PCL's underlying data matrix.
size_t GetGenes () const
 Return the number of gene rows in the PCL.
const std::vector< std::string > & GetGeneNames () const
 Returns the vector of gene names associated with this PCL.
size_t GetExperiments () const
 Return the number of experiment columns in the PCL.
const std::string & GetGene (size_t iGene) const
 Returns the gene name at the given PCL row.
void SetGene (size_t iGene, const std::string &strGene)
 Set the gene name at the given index.
const std::string & GetExperiment (size_t iExperiment) const
 Returns the experiment label at the given PCL column.
size_t GetExperiment (const std::string &strExperiment) const
 Returns the PCL column of the given experiment label.
void SetExperiment (size_t iExperiment, const std::string &strExperiment)
 Set the experiment label at the given PCL column.
void MaskGene (size_t iGene, bool fMask=true)
 Set the mask value for a given gene row.
bool IsMasked (size_t iGene) const
 Return true if the requested gene index is masked.
void Set (size_t iGene, size_t iExperiment, float dValue)
 Set the value at the requested PCL position.
size_t GetGene (const std::string &strGene) const
 Return the index of the given gene name, or -1 if it is not included in the PCL.
std::string GetFeature (size_t iGene, const char *szFeature) const
 Return the requested gene's value for the given feature.
void Randomize ()
 Randomizes the values in each row of the PCL.

Static Public Member Functions

static int Distance (const char *szFile, size_t iSkip, const char *szSimilarityMeasure, bool fNormalize, bool fZScore, bool fAutocorrelate, const char *szGeneFile, float dCutoff, size_t iLimit, CPCL &PCL, CDat &Dat, IMeasure::EMap eMap=IMeasure::EMapCenter)
 Kitchen sink method for completely loading a PCL and calculating its pairwise similarity scores into a CDat.
static size_t GetSkip ()
 Return the default number of skip columns between the gene IDs and experimental values.
static const char * GetExtension ()
 Returns the standard extension (including period) for PCL files.

Detailed Description

Encapsulates a PCL (or, in a pinch, CDT) formatted microarray data file.

A canonical PCL is a tab-delimited text format used to store microarray datasets. Each row represents a single gene, each column represents a single microarray, and each cell stores the gene's expression value for that condition. There are usually two header rows, one naming each column and one listing the microarray conditions' weights. There are usually three header columns: unique gene identifiers in the first column, human-readable gene name synonyms in the second column, and gene weights in the third. Thus, a simple PCL might be:

 GID    NAME    GWEIGHT Condition 1 Condition 2 Condition 3
 EWEIGHT            1   1   1
 YKL032C    IXR1    1   -0.02   0.03    -0.1
 YMR027W    HRT2    1   -0.018  0.013   -0.016
 YLR220W    CCC1    1   -0.0050 -0.01   0.015</pre>

In practice, the format of this file is very flexible: the first row's always a header, the first column's always unique gene IDs, and most other columns are data, but everything else changes. There can be an arbitrary number of feature columns between the gene IDs and the first experimental column (Sleipnir refers to these as "skip" columns). The EWEIGHT row may or may not be present. Missing values might be notated by empty cells or the string NA. CPCL will correctly parse all of these cases.

So in practice, a PCL is a full matrix of data in which each row represents a gene and each column an experiment. Associated with each gene are zero or more features, possibly including the name and weight; associated with each experiment is a name (drawn from the header row). The EWEIGHT row is discarded. More generally, a PCL can be used to hold any data in which vectors of values are associated with a set of genes.

See also:
CDat | CDataMatrix

Definition at line 69 of file pcl.h.


Member Enumeration Documentation

Ways in which a PCL's data entries can be normalized.

Enumerator:
ENormalizeNone  Perform no normalization and leave PCL values unchanged.
ENormalizeZScore  Subtract the global average from every value and divide by the global standard deviation.
ENormalizeRow  Subtract the row average from every value and divide by the row standard deviation.
ENormalizeMinMax  Subtract the global minimum from every value and divide by the global maximum (transforming all values to the range [0, 1]).
ENormalizeColumn  Subtract the column average from every value and divide by the column standard deviation.
ENormalizeMean  Subtract the global minimum from every value and divide by the global mean (transforming all values to the range [0, inf] with mean 1).

Definition at line 75 of file pcl.h.


Constructor & Destructor Documentation

Sleipnir::CPCL::CPCL ( bool  fHeader = true  )  [inline]

Create a new PCL object with or without a header row.

Parameters:
fHeader If true, associate a standard PCL header row with this object.
Remarks:
Header setting influences both Open and Save (by way of SaveHeader).

Definition at line 146 of file pcl.h.


Member Function Documentation

int Sleipnir::CPCL::Distance ( const char *  szFile,
size_t  iSkip,
const char *  szSimilarityMeasure,
bool  fNormalize,
bool  fZScore,
bool  fAutocorrelate,
const char *  szGeneFile,
float  dCutoff,
size_t  iLimit,
CPCL PCL,
CDat Dat,
IMeasure::EMap  eMap = IMeasure::EMapCenter 
) [static]

Kitchen sink method for completely loading a PCL and calculating its pairwise similarity scores into a CDat.

Parameters:
szFile If non-null, file from which PCL is to be loaded; if null, standard input is used.
iSkip Number of columns to skip in the PCL file between gene IDs and experimental data.
szSimilarityMeasure String identifier of similarity measure to use for CDat generation.
fNormalize If true, normalize the generated CDat to the range [0, 1].
fZScore If true, normalize the generated CDat to z-scores (subtract mean, divide by standard deviation).
fAutocorrelate If true, autocorrelate the requested similarity measure.
szGeneFile If non-null, only convert genes in the given file to pairwise scores in the CDat.
dCutoff If finite, remove all pairwise scores less than the given cutoff.
iLimit If not equal to -1 and the PCL contains more genes than this limit, do not precalculate pairwise scores; instead, configure the CDat to calculate scores on the fly as needed from the PCL.
PCL Output PCL with the loaded data.
Dat Output CDat with the calculated pairwise scores.
eMap Way in which returned value should be centered (implementation-specific).
Returns:
0 on successes, a nonzero value on failure.
The (many) steps performed by this method are as follows:
  1. A PCL is loaded from szFile into PCL using Open. If szFile is null, the PCL is loaded from standard input.
  2. An IMeasure object is constructed by iterating over the available implementations and finding one whose name corresponds with szSimilarityMeasure. Names which are distance measures (e.g. Euclidean) are automatically inverted. If requested, the measure is autocorrelated.
  3. If given, szGeneFile is loaded into a CGenes object.
  4. If iLimit is not -1 and the PCL contains more than the limiting number of genes, Dat is given a reference to PCL and configured to calculate pairwise scores as needed. Processing then stops.
  5. Otherwise, an empty CDat is initialized to contain either the genes in szGeneFile (if given) or all of the genes in the PCL.
  6. Gene pairs are assigned scores in the CDat using the requested similarity measure.
  7. If given, scores below dCutoff are replaced with missing values.
  8. If requested, the remaining scores are normalized either to the range [0, 1] or to z-scores.

Remarks:
This method is written to make it easy to construct tools that must load a PCL and immediately convert it to pairwise scores using some user-selected similarity measure.

Definition at line 153 of file pcl.cpp.

References Sleipnir::IMeasure::Clone(), Sleipnir::CDat::ENormalizeMinMax, Sleipnir::CDat::ENormalizeZScore, Sleipnir::CDat::Get(), Get(), GetExperiments(), Sleipnir::CGenes::GetGene(), GetGene(), Sleipnir::CGenes::GetGeneNames(), GetGeneNames(), Sleipnir::CDat::GetGenes(), GetGenes(), Sleipnir::CGenes::GetGenes(), Sleipnir::CGene::GetName(), Sleipnir::CMeta::GetNaN(), Sleipnir::CMeta::IsNaN(), Sleipnir::IMeasure::IsRank(), Sleipnir::IMeasure::Measure(), Sleipnir::CDat::Normalize(), Sleipnir::CDat::Open(), Sleipnir::CGenes::Open(), Open(), RankTransform(), and Sleipnir::CDat::Set().

static size_t Sleipnir::CPCL::GetSkip (  )  [inline, static]

Return the default number of skip columns between the gene IDs and experimental values.

Returns:
Default number of skip columns.

Definition at line 121 of file pcl.h.

Referenced by Open().

static const char* Sleipnir::CPCL::GetExtension (  )  [inline, static]

Returns the standard extension (including period) for PCL files.

Returns:
Standard extension (including period) for PCL files, ".pcl".

Definition at line 132 of file pcl.h.

Referenced by Sleipnir::CCoalesceCluster::Open(), and Sleipnir::CCoalesceCluster::Save().

void Sleipnir::CPCL::Open ( const CPCL PCL  ) 

Create a new PCL by copying the given one.

Parameters:
PCL PCL to be copied into the current one.
Remarks:
All values are copied into newly allocated memory within the current PCL, so it's safe to destroy the input PCL after the new one is opened.

Definition at line 285 of file pcl.cpp.

References Get(), Reset(), and Set().

Referenced by Sleipnir::CCoalesce::Cluster(), Distance(), Sleipnir::CCoalesceCluster::Initialize(), Open(), Sleipnir::CDatasetCompact::Open(), Sleipnir::CPCLPair::Open(), Sleipnir::CCoalesceCluster::Open(), and Sleipnir::CCoalesceCluster::Save().

bool Sleipnir::CPCL::Open ( std::istream &  istm,
size_t  iSkip 
)

Load a PCL from the given text stream.

Parameters:
istm Stream from which PCL file is loaded.
iSkip Number of feature columns to skip between the gene IDs and first experimental column.
Returns:
True if the PCL was opened successfully.
See also:
Save

Definition at line 332 of file pcl.cpp.

References GetExperiments(), GetGene(), and GetGenes().

void Sleipnir::CPCL::Open ( const std::vector< size_t > &  veciGenes,
const std::vector< std::string > &  vecstrGenes,
const std::vector< std::string > &  vecstrExperiments 
)

Create a new PCL using the given genes, experiments, and gene order.

Parameters:
veciGenes Order in which genes will be placed in the new PCL.
vecstrGenes Gene IDs to be used in the new PCL.
vecstrExperiments Experiment labels to be used in the new PCL.
This creates an empty PCL (all entries missing) with the requested number of gene rows and experiment columns; the given gene IDs and experiment labels are inserted into the appropriate header rows/columns. However, the gene order will be the order of indices given in veciGenes. For example, if veciGenes is [1, 0, 2] and vecstrGenes contains [A, B, C], the order of genes within the new PCL will be [B, A, C]. Experiment order is unaffected and will be as given in vecstrExperiments.

Remarks:
veciGenes and vecstrGenes must be of the same length.
See also:
SortGenes

Definition at line 665 of file pcl.cpp.

References GetExperiments(), GetGenes(), Sleipnir::CMeta::GetNaN(), and Reset().

void Sleipnir::CPCL::Open ( const std::vector< std::string > &  vecstrGenes,
const std::vector< std::string > &  vecstrExperiments,
const std::vector< std::string > &  vecstrFeatures 
)

Create a new PCL using the given genes, experiments, and features.

Parameters:
vecstrGenes Gene IDs to be used in the new PCL.
vecstrExperiments Experiment labels to be used in the new PCL.
vecstrFeatures Feature labels to be used in the new PCL (possibly empty).
This creates an empty PCL (all entries missing) with the requested number of gene rows and experiment columns; the given gene IDs and experiment labels are inserted into the appropriate header rows/columns. The PCL will contain the given number of feature columns between the gene IDs and experimental values, the values for which will be initialized to empty strings.

Definition at line 614 of file pcl.cpp.

References GetExperiments(), GetGenes(), Sleipnir::CMeta::GetNaN(), Reset(), SetExperiment(), and SetGene().

void Sleipnir::CPCL::OpenBinary ( std::istream &  istm  ) 

Load a PCL from the given binary stream.

Parameters:
istm Stream from which PCL file is loaded.
See also:
SaveBinary

Definition at line 573 of file pcl.cpp.

References GetExperiments(), GetGenes(), and Reset().

void Sleipnir::CPCL::Save ( std::ostream &  ostm,
const std::vector< size_t > *  pveciGenes = NULL 
) const

Save a PCL to the given text stream.

Parameters:
ostm Stream into which PCL file is saved.
pveciGenes If non-null, generate an initial CDT GENE column using the given original gene indices.
If called without a vector of gene indices, saves a standard PCL to the given text stream using the CPCL's current gene ID list, features, experiments, and data. If a vector of gene indices is provided, the output will instead be in CDT format, with an extra initial column of IDs of the form "GENE###", where the number indicates the gene's original index in the pre-clustered file.

Remarks:
If pveciGenes is non-null, it must be of the same length as the PCL's gene list.
See also:
Open | CClustHierarchical

Definition at line 518 of file pcl.cpp.

References GetGenes(), SaveGene(), and SaveHeader().

Referenced by Save(), and Sleipnir::CCoalesceCluster::Save().

void Sleipnir::CPCL::SaveBinary ( std::ostream &  ostm  )  const

Save a PCL to the given binary stream.

Parameters:
ostm Stream into which PCL file is saved.
Remarks:
Most PCLs are saved as text files; binary files are more compact but very simple, encoding the size and gene/feature/experiment values as simple character strings and saving data using the underlying CFullMatrix::SaveBinary method.
See also:
OpenBinary

Definition at line 543 of file pcl.cpp.

References Get(), GetExperiment(), GetExperiments(), GetFeature(), GetFeatures(), GetGene(), and GetGenes().

void Sleipnir::CPCL::SaveGene ( std::ostream &  ostm,
size_t  iGene,
size_t  iOriginal = -1 
) const

Save a single gene row to the given text stream.

Parameters:
ostm Stream into which gene row is saved.
iGene Gene index to be saved.
iOriginal If not equal to -1, generate an initial CDT GENE ID using the given original gene index.
If called with iGene set to -1, saves a single row of a standard PCL to the given text stream using the CPCL's current gene ID, features, and values for the row. If a gene index is provided, the output will instead be in CDT format, with an extra initial column of IDs of the form "GENE###", where the number indicates the gene's original index in the pre-clustered file.

See also:
Save

Definition at line 482 of file pcl.cpp.

References Get(), GetGene(), and Sleipnir::CMeta::IsNaN().

Referenced by Save().

void Sleipnir::CPCL::SaveHeader ( std::ostream &  ostm,
bool  fCDT = false 
) const

Save the PCL's header row to the given text stream.

Parameters:
ostm Stream into which PCL header is saved.
fCDT If true, generate an initial CDT GENE column header.
If called with fCDT false, saves standard PCL header and EWEIGHT rows to the given text stream using the CPCL's current feature and experimental headers. If fCDT is true, the output will instead be in CDT format, with an extra initial column header for gene index identifiers.

See also:
Save

Definition at line 439 of file pcl.cpp.

Referenced by Save().

bool Sleipnir::CPCL::SortGenes ( const std::vector< size_t > &  veciOrder  ) 

Reorder the PCL's genes based on the order of the given indices.

Parameters:
veciOrder Index order in which genes should be placed.
Returns:
True if genes were reordered successfully.
Reorders the PCL's gene rows based on the given indices. For example, if the current row order is [A, B, C] and the given indices are [1, 0, 2], the new row order will be [B, A, C].

Definition at line 712 of file pcl.cpp.

References GetGene(), GetGenes(), and Sleipnir::CMeta::Permute().

void Sleipnir::CPCL::RankTransform (  ) 

Rank transform each row of the PCL in increasing order.

Replaces all values in the PCL with their increasing ranks by row. For example, a row containing [-0.1, 0.1, 0.3] would be replaced with [0, 1, 2]; a row containing [0.5, 0.4, 0.3] would be replaced with [2, 1, 0].

See also:
IMeasure::IsRank

Definition at line 738 of file pcl.cpp.

Referenced by Distance(), and Sleipnir::CDatasetCompact::Open().

bool Sleipnir::CPCL::AddGenes ( const std::vector< std::string > &  vecstrGenes  ) 

Appends new, empty gene rows to the end of the PCL using the given gene IDs.

Parameters:
vecstrGenes Gene names to be appended to the PCL.
Returns:
True if the gene rows were appended successfully.
Remarks:
New rows will initially have empty strings for all features and missing values for all data.

Definition at line 766 of file pcl.cpp.

References Sleipnir::CMeta::GetNaN(), and SetGene().

void Sleipnir::CPCL::Normalize ( ENormalize  eNormalize = ENormalizeRow  ) 

Normalizes the PCL's values in the requested manner.

Parameters:
eNormalize Algorithm by which the PCL's values should be normalized.
See also:
ENormalize

Definition at line 800 of file pcl.cpp.

References Sleipnir::CStatistics::Average(), ENormalizeColumn, ENormalizeMean, ENormalizeMinMax, ENormalizeRow, ENormalizeZScore, Get(), GetExperiments(), GetGenes(), Sleipnir::CMeta::IsNaN(), Set(), and Sleipnir::CStatistics::Variance().

Referenced by Sleipnir::CCoalesceCluster::Initialize().

void Sleipnir::CPCL::Impute ( size_t  iNeighbors,
float  dMinimumPresent,
const CDat DatSimilarity 
)

Impute missing values using the knnimpute algorithm; optionally mask genes with too many missing values.

Parameters:
iNeighbors Number of nearest neighbors to use for imputation.
dMinimumPresent Fraction of conditions that must be present; genes with fewer than this many values present will be masked instead of imputed.
DatSimilarity CDat similarity matrix from which nearest neighbors are determined.
Imputes missing values in the PCL using the knnimpute algorithm of Troyanskaya et al, Bioinformatics 2001. Briefly, for each gene with missing values, the k nearest neighbors (most similar genes) are found, and the missing value is replaced with a weighted average of the values in the neighbors. Optionally, Impute can also completely mask genes that are missing too many values; genes with less than the requested fraction of data present will be completely masked rather than imputed.

Remarks:
iNeighbors must be greater than zero, dMinimumPresent must be between zero and one inclusive, and DatSimilarity must be of exactly the same size as the current PCL.

Definition at line 909 of file pcl.cpp.

References Sleipnir::CDat::Get(), Get(), GetExperiments(), GetGenes(), Sleipnir::CMeta::GetNaN(), Sleipnir::CMeta::IsNaN(), MaskGene(), and Set().

Referenced by Impute().

void Sleipnir::CPCL::Impute ( size_t  iNeighbors,
float  dMinimumPresent,
const IMeasure pMeasure,
bool  fPrecompute = true 
)

Impute missing values using the knnimpute algorithm; optionally mask genes with too many missing values.

Parameters:
iNeighbors Number of nearest neighbors to use for imputation.
dMinimumPresent Fraction of conditions that must be present; genes with fewer than this many values present will be masked instead of imputed.
pMeasure Similarity measure with which nearest neighbors are determined.
fPrecompute If true, precompute and cache in memory all pairwise similarities; otherwise, calculate these on the fly from values in the PCL.
Imputes missing values in the PCL using the knnimpute algorithm of Troyanskaya et al, Bioinformatics 2001. Briefly, for each gene with missing values, the k nearest neighbors (most similar genes) are found, and the missing value is replaced with a weighted average of the values in the neighbors. Optionally, Impute can also completely mask genes that are missing too many values; genes with less than the requested fraction of data present will be completely masked rather than imputed.

Remarks:
iNeighbors must be greater than zero, dMinimumPresent must be between zero and one inclusive. If precomputation is requested, imputation will be faster, but memory proportional to the number of genes squared will be required. If it is not, imputation will be slower, but no additional memory is required. As a rule of thumb, precomputation is desirable for anything less than ~25,000 genes (which will consume roughly 1GB of memory; if you have RAM to spare, feel free to precompute at will).

Definition at line 997 of file pcl.cpp.

References Get(), GetExperiments(), GetGeneNames(), Sleipnir::CDat::GetGenes(), Impute(), Sleipnir::IMeasure::Measure(), Sleipnir::CDat::Open(), and Sleipnir::CDat::Set().

void Sleipnir::CPCL::Save ( const char *  szFile = NULL  )  const [inline]

Save a PCL to the given text file.

Parameters:
szFile File into which PCL file is saved.
Remarks:
If null, output defaults to stdout.
See also:
Save

Definition at line 179 of file pcl.h.

References Save().

bool Sleipnir::CPCL::Open ( const char *  szFile,
size_t  iSkip 
) [inline]

Load a PCL from the given text file.

Parameters:
szFile File from which PCL file is loaded.
iSkip Number of feature columns to skip between the gene IDs and first experimental column.
Returns:
True if the PCL was opened successfully.
See also:
Save

Reimplemented in Sleipnir::CPCLPair.

Definition at line 202 of file pcl.h.

References Open().

bool Sleipnir::CPCL::Open ( std::istream &  istm  )  [inline]

Load a PCL from the given text stream using the default number of skip columns.

Parameters:
istm Stream from which PCL file is loaded.
Returns:
True if the PCL was opened successfully.
See also:
Save

Definition at line 222 of file pcl.h.

References GetSkip(), and Open().

size_t Sleipnir::CPCL::GetFeatures (  )  const [inline]

Return the number of features (skip columns) in the PCL.

Returns:
Number of feature columns in the PCL.

Definition at line 249 of file pcl.h.

Referenced by SaveBinary().

const std::string& Sleipnir::CPCL::GetFeature ( size_t  iFeature  )  const [inline]

Return the label (header) of the requested feature.

Parameters:
iFeature Index of the feature label to retrieve.
Returns:
Feature header at the requested index.
Remarks:
For efficiency, no bounds checking is performed. The given value must be smaller than GetFeatures.

Definition at line 266 of file pcl.h.

Referenced by GetFeature(), and SaveBinary().

const std::string& Sleipnir::CPCL::GetFeature ( size_t  iGene,
size_t  iFeature 
) const [inline]

Return the requested gene's value for the given feature.

Parameters:
iGene Gene for which feature value should be returned.
iFeature Index of feature value to be retrieved.
Returns:
Feature value for the given gene and feature indices.
Remarks:
For efficiency, no bounds checking is performed. The given values must be smaller than GetGenes and GetFeatures.

Definition at line 287 of file pcl.h.

void Sleipnir::CPCL::SetFeature ( size_t  iGene,
size_t  iFeature,
const std::string &  strValue 
) [inline]

Set the requested gene's value for the given feature.

Parameters:
iGene Gene for which feature value should be set.
iFeature Index of feature value to be set.
strValue Feature value to set for the given gene and feature indices.
Remarks:
For efficiency, no bounds checking is performed. The given values must be smaller than GetGenes and GetFeatures.

Definition at line 308 of file pcl.h.

float& Sleipnir::CPCL::Get ( size_t  iGene,
size_t  iExperiment 
) const [inline]

Returns the value at the requested PCL position.

Parameters:
iGene Gene row.
iExperiment Experiment column.
Returns:
Value at the requested PCL position.
Remarks:
For efficiency, no bounds checking is performed. The given gene and experiment must be smaller than GetGenes and GetExperiments.
See also:
Set

Definition at line 332 of file pcl.h.

References Sleipnir::CFullMatrix< tType >::Get().

Referenced by Distance(), Sleipnir::CBayesNetSmile::Evaluate(), Open(), Sleipnir::CDatasetCompact::Open(), and Sleipnir::CCoalesceCluster::Subtract().

float* Sleipnir::CPCL::Get ( size_t  iGene  )  const [inline]

Return a single gene's row from the PCL.

Parameters:
iGene PCL row.
Returns:
Requested PCL row.
Remarks:
For efficiency, no bounds checking is performed. The given index must be smaller than GetGenes. The returned array will contain a number of elements equal to GetExperiments.
See also:
Set

Definition at line 353 of file pcl.h.

References Sleipnir::CFullMatrix< tType >::Get().

void Sleipnir::CPCL::Set ( size_t  iGene,
const float *  adValues 
) [inline]

Set a single gene's row in the PCL.

Parameters:
iGene PCL row.
adValues Values to which row is set.
Remarks:
For efficiency, no bounds checking is performed. The given index must be smaller than GetGenes. The given array must contain a number of elements equal to GetExperiments.
See also:
Get

Definition at line 374 of file pcl.h.

Referenced by Sleipnir::CBayesNetSmile::Evaluate(), Impute(), Normalize(), and Open().

const CDataMatrix& Sleipnir::CPCL::Get (  )  const [inline]

Return the PCL's underlying data matrix.

Returns:
Data matrix by which the PCL is backed.

Definition at line 385 of file pcl.h.

Referenced by Impute(), Normalize(), Randomize(), SaveBinary(), and SaveGene().

size_t Sleipnir::CPCL::GetGenes (  )  const [inline]

const std::vector<std::string>& Sleipnir::CPCL::GetGeneNames (  )  const [inline]

Returns the vector of gene names associated with this PCL.

Returns:
Vector of this PCL's gene names.
Remarks:
Returned vector size will be identical to GetGenes.

Definition at line 413 of file pcl.h.

Referenced by Distance(), Impute(), and Sleipnir::CDatasetCompact::Open().

size_t Sleipnir::CPCL::GetExperiments (  )  const [inline]

Return the number of experiment columns in the PCL.

Returns:
Number of experiment columns in the PCL.
See also:
GetGenes

Definition at line 427 of file pcl.h.

Referenced by Sleipnir::CCoalesce::Cluster(), Distance(), Sleipnir::CBayesNetSmile::Evaluate(), Impute(), Normalize(), Open(), Sleipnir::CDatasetCompact::Open(), Sleipnir::CPCLPair::Open(), Sleipnir::CCoalesceCluster::Open(), OpenBinary(), Randomize(), Sleipnir::CCoalesceCluster::Save(), and SaveBinary().

const std::string& Sleipnir::CPCL::GetGene ( size_t  iGene  )  const [inline]

Returns the gene name at the given PCL row.

Parameters:
iGene Index of gene name to return.
Returns:
Gene name at the requested row.
Remarks:
For efficiency, no bounds checking is performed. The given index must be smaller than GetGenes.

Definition at line 444 of file pcl.h.

Referenced by Sleipnir::CCoalesce::Cluster(), Distance(), Sleipnir::CBayesNetSmile::Evaluate(), Open(), Sleipnir::CDatasetCompact::Open(), Sleipnir::CCoalesceCluster::Open(), Sleipnir::CCoalesceCluster::Save(), SaveBinary(), SaveGene(), and SortGenes().

void Sleipnir::CPCL::SetGene ( size_t  iGene,
const std::string &  strGene 
) [inline]

Set the gene name at the given index.

Parameters:
iGene Row of gene name to modify.
strGene Gene name to store at the requested index.
Remarks:
For efficiency, no bounds checking is performed. The given index must be smaller than GetGenes.
See also:
GetGene

Definition at line 464 of file pcl.h.

Referenced by AddGenes(), and Open().

const std::string& Sleipnir::CPCL::GetExperiment ( size_t  iExperiment  )  const [inline]

Returns the experiment label at the given PCL column.

Parameters:
iExperiment Index of experiment label to return.
Returns:
Experiment label at the requested column.
Remarks:
For efficiency, no bounds checking is performed. The given index must be smaller than GetExperiments.

Definition at line 481 of file pcl.h.

Referenced by Sleipnir::CCoalesce::Cluster(), Sleipnir::CBayesNetSmile::Evaluate(), Sleipnir::CCoalesceCluster::Open(), Sleipnir::CCoalesceCluster::Save(), and SaveBinary().

size_t Sleipnir::CPCL::GetExperiment ( const std::string &  strExperiment  )  const [inline]

Returns the PCL column of the given experiment label.

Parameters:
strExperiment Experimental label of index to return.
Returns:
Column of the requested experiment label, or -1 if it is not found.
Remarks:
Unlike gene IDs, experiment labels are not hashed, so this can be slow.

Definition at line 498 of file pcl.h.

void Sleipnir::CPCL::SetExperiment ( size_t  iExperiment,
const std::string &  strExperiment 
) [inline]

Set the experiment label at the given PCL column.

Parameters:
iExperiment Index of experiment label to set.
strExperiment Experiment label to which the requested column is set.
Remarks:
For efficiency, no bounds checking is performed. The given index must be smaller than GetExperiments.

Definition at line 520 of file pcl.h.

Referenced by Open(), and Sleipnir::CCoalesceCluster::Save().

void Sleipnir::CPCL::MaskGene ( size_t  iGene,
bool  fMask = true 
) [inline]

Set the mask value for a given gene row.

Parameters:
iGene Index of gene row to be masked or unmasked.
fMask If true, mask given gene; if false, unmask it.
PCLs allow individual gene rows to be masked. This generally excludes those rows from processing, particularly from any saved output or from use in any machine learning algorithm consuming a PCL.

Remarks:
For efficiency, no bounds checking is performed. The given value must be smaller than GetGenes.
See also:
IsMasked

Definition at line 543 of file pcl.h.

Referenced by Impute(), and Sleipnir::CCoalesceCluster::Save().

bool Sleipnir::CPCL::IsMasked ( size_t  iGene  )  const [inline]

Return true if the requested gene index is masked.

Parameters:
iGene Gene row to test for masking.
Returns:
True if the requested gene row is masked.
Remarks:
For efficiency, no bounds checking is performed. The given value must be smaller than GetGenes.
See also:
MaskGene

Definition at line 566 of file pcl.h.

Referenced by Sleipnir::CSVM::Evaluate().

void Sleipnir::CPCL::Set ( size_t  iGene,
size_t  iExperiment,
float  dValue 
) [inline]

Set the value at the requested PCL position.

Parameters:
iGene PCL row.
iExperiment PCL column.
dValue Value to store.
Remarks:
For efficiency, no bounds checking is performed. The given row and column must be smaller than GetGenes and GetExperiments.
See also:
Get

Definition at line 590 of file pcl.h.

size_t Sleipnir::CPCL::GetGene ( const std::string &  strGene  )  const [inline]

Return the index of the given gene name, or -1 if it is not included in the PCL.

Parameters:
strGene Gene name to retrieve.
Returns:
Index of the requested gene name, or -1 if it is not in the PCL.
See also:
GetGeneNames

Definition at line 607 of file pcl.h.

std::string Sleipnir::CPCL::GetFeature ( size_t  iGene,
const char *  szFeature 
) const [inline]

Return the requested gene's value for the given feature.

Parameters:
iGene Gene for which feature value should be returned.
szFeature Label of feature for which value should be retrieved.
Returns:
Feature value for the given gene and feature, or an empty string if the feature label is unrecognized.
Remarks:
For efficiency, no bounds checking is performed. The given value must be smaller than GetGenes.

Definition at line 630 of file pcl.h.

References GetFeature().

void Sleipnir::CPCL::Randomize (  )  [inline]

Randomizes the values in each row of the PCL.

Randomly shuffles each gene's vector of values.

Definition at line 645 of file pcl.h.

References Get(), GetExperiments(), and GetGenes().


The documentation for this class was generated from the following files:

Generated on Fri Jun 19 12:48:38 2009 for Sleipnir by doxygen 1.5.5