Sleipnir::CDat Class Reference

Stores a continuously valued half matrix paired with a list of names for matrix elements. More...

#include <dat.h>

Inheritance diagram for Sleipnir::CDat:

Sleipnir::CDataPair

Public Types

enum  EFilter {
  EFilterInclude = 0, EFilterTerm = EFilterInclude + 1, EFilterExclude = EFilterTerm + 1, EFilterPixie = EFilterExclude + 1,
  EFilterEdge = EFilterPixie + 1, EFilterHefalmp = EFilterEdge + 1
}
 Ways in which nodes/edges can be removed to filter a CDat. More...
enum  EFormat { EFormatBinary = 0, EFormatText = EFormatBinary + 1, EFormatPCL = EFormatText + 1, EFormatSparse = EFormatPCL + 1 }
 Ways in which a CDat can be persisted to/from disk. More...
enum  ENormalize { ENormalizeMinMax = 0, ENormalizeZScore = ENormalizeMinMax + 1, ENormalizeSigmoid = ENormalizeZScore + 1 }
 Ways in which a CDat can have its edge values normalized. More...

Public Member Functions

bool Open (const char *szFile, bool fMemmap=false, size_t iSkip=2, bool fZScore=false, bool fDuplicates=false)
 Open a CDat stored in the given file, guessing the format from the file's extension.
bool Open (std::istream &istm, EFormat eFormat=EFormatBinary, float dDefault=HUGE_VAL, bool fDuplicates=false, size_t iSkip=2, bool fZScore=false)
 Open a CDat stored in the given stream with the given format, processing missing values and duplicates as specified.
bool Open (const CSlim &Slim)
 Construct a CDat from the given ontology slim.
bool Open (const CSlim &SlimPositives, const CSlim &SlimNonnegatives)
 Construct a CDat from the given ontology slims.
bool Open (const std::vector< std::string > &vecstrGenes, bool fClear=true, const char *szFile=NULL)
 Construct a new CDat with the given gene names.
bool Open (const std::vector< std::string > &vecstrGenes, const CDistanceMatrix &MatValues)
 Construct a new CDat with the given gene names and values.
bool Open (const std::vector< CGenes * > &vecpPositives, const std::vector< CGenes * > &vecpNonnegatives, float dPValue, const CGenome &Genome)
 Construct a CDat from the given gene sets.
bool Open (const CDat &DatKnown, const std::vector< CGenes * > &vecpOther, const CGenome &Genome, bool fKnownNegatives)
 Construct a CDat from the given known gene relationships and gene sets.
bool Open (const CPCL &PCL, const IMeasure *pMeasure, bool fMeasureMemory)
 Construct a new CDat backed by the given PCL and similarity measure.
bool Open (const CDat &Dat)
 Construct a copy of the given CDat.
bool OpenGenes (std::istream &istm, bool fBinary, bool fPCL=false)
 Open the genes from a CDat stored in the stream with the given format.
bool OpenGenes (const char *szFile, size_t iSkip=2)
 Open the genes from a CDat stored in the given file, guessing the format from the file's extension.
void Save (std::ostream &ostm, EFormat eFormat=EFormatBinary) const
 Save a CDat to the given stream in the requested format.
void Save (const char *szFile) const
 Save a CDat to the given file, guessing the format from the file's extension.
void SaveDOT (std::ostream &ostm, float dCutoff=HUGE_VAL, const CGenome *pGenome=NULL, bool fUnlabeled=false, bool fHashes=true, const std::vector< float > *pvecdColors=NULL, const std::vector< float > *pvecdBorders=NULL) const
 Generate a DOT-formatted graph from the CDat, suitable for processing with AT&T's Graphviz software.
void SaveGDF (std::ostream &ostm, float dCutoff=HUGE_VAL) const
 Generate a GDF-formatted graph from the CDat, suitable for processing with the GUESS graph exploration system.
void SaveNET (std::ostream &ostm, float dCutoff=HUGE_VAL) const
 Generate a NET-formatted graph from the CDat.
void SaveMATISSE (std::ostream &ostm, float dCutoff=HUGE_VAL, const CGenome *pGenome=NULL) const
 Generate a MATISSE-formatted graph from the CDat, suitable for processing with the MATISSE software.
void Invert ()
 Replace each finite value in the CDat with one minus that value.
void Rank ()
 Replace each finite value in the CDat with its integer rank.
bool FilterGenes (const char *szGenes, EFilter eFilter, size_t iLimit=-1)
 Remove edges from the CDat based on the given gene file and filter type.
void FilterGenes (const CGenes &Genes, EFilter eFilter, size_t iLimit=-1, float dEdgeAggressiveness=0.5)
 Remove edges from the CDat based on the given gene set and filter type.
void Normalize (ENormalize eNormalize)
 Normalize each finite value in the CDat by a specific function.
size_t GetGene (const std::string &strGene) const
 Return the index of the given gene name, or -1 if it is not included in the CDat.
float & Get (size_t iY, size_t iX) const
 Return the value at the requested CDat position.
size_t GetGenes () const
 Returns the number of elements (genes) in the CDat.
const CDistanceMatrixGet () const
 Returns the symmetric matrix containing the CDat's values.
CDistanceMatrixGet ()
 Returns the symmetric matrix containing the CDat's values.
bool Set (size_t iY, size_t iX, float dValue)
 Set the value at the requested CDat position.
std::string GetGene (size_t iGene) const
 Returns the gene name at the given CDat position.
const std::vector< std::string > & GetGeneNames () const
 Returns the vector of gene names associated with this CDat.
void Set (size_t iY, const float *adValues)
 Set an entire row of CDat values efficiently.
const float * Get (size_t iY) const
 Get an entire row of CDat values efficiently.
float * Get (size_t iY)
 Get an entire row of CDat values efficiently.
void SetGene (size_t iGene, const std::string &strGene)
 Set the gene name at the given index.

Detailed Description

Stores a continuously valued half matrix paired with a list of names for matrix elements.

Conceptually, a CDat stores a list of weighted pairs; this is equivalent to a weighted undirected graph with labeled nodes, or a symmetric matrix with labels for each matrix element. CDat entries are stored as continuous values, although they can be discretized in various ways. CDats can be constructed in several ways, read from disk, persisted to disk in multiple file formats, or calculated from existing gene sets, microarray data, or gold standards. In practice, a CDat is simply a continuously valued symmetric matrix (in which zero or more values may be missing) paired with a list of element names (assumed to be genes), but this data structure is sufficiently flexible to represent nearly any biological dataset.

CDats can be loaded (by Open) and/or stored (by Save) from/to disk in the following formats:

See also:
CDataPair | CHalfMatrix

Definition at line 75 of file dat.h.


Member Enumeration Documentation

Ways in which nodes/edges can be removed to filter a CDat.

See also:
FilterGenes
Enumerator:
EFilterInclude  Remove any edge including a node outside the given set.
EFilterTerm  Remove any positive edge including a node outside the given set.
EFilterExclude  Remove any edge including a node in the given set.
EFilterPixie  Perform a bioPIXIE query using the given set and remove any edge not in the resulting subgraph.
EFilterEdge  Remove any edge not including a node in the given set.
EFilterHefalmp  Perform a HEFalMp query using the given set and remove any edge not in the resulting subgraph.

Definition at line 84 of file dat.h.

Ways in which a CDat can be persisted to/from disk.

See also:
Open | Save
Enumerator:
EFormatBinary  Binary format listing null-terminated element name strings followed by floating point values.
EFormatText  Text format listing element name pairs followed by numerical value strings.
EFormatPCL  PCL file from which pairwise scores are calculated using some similarity measure.
EFormatSparse  Binary format listing null-terminated element name strings followed by index/value pairs.

Definition at line 124 of file dat.h.

Ways in which a CDat can have its edge values normalized.

See also:
Normalize
Enumerator:
ENormalizeMinMax  Linearly transform the minimum score to 0 and the maximum to 1.
ENormalizeZScore  Z-score all edges (subtract mean, divide by standard deviation).
ENormalizeSigmoid  Sigmoid transform scores to the range [0, 1].

Definition at line 154 of file dat.h.


Member Function Documentation

bool Sleipnir::CDat::Open ( const char *  szFile,
bool  fMemmap = false,
size_t  iSkip = 2,
bool  fZScore = false,
bool  fDuplicates = false 
)

Open a CDat stored in the given file, guessing the format from the file's extension.

Parameters:
szFile Filename from which CDat is loaded.
fMemmap If true, memory map file rather than allocating memory and copying its contents.
iSkip If the given file is a PCL, the number of columns to skip between the ID and experiments.
fZScore If true and the given file is a PCL, z-score similarity measures after pairwise calculation.
fDuplicates If true, allow duplicates (DAT format only), ignoring all but the last value for each gene pair.
Returns:
True if CDat was successfully opened.
Opens a CDat from the given file, guessing the file type from its extension: DAT, DAB, DAS, or PCL.

Remarks:
A memory mapped CDat cannot be modified; doing so will generally cause a crash. fMemmap is ignored if the given file is not a DAB, and iSkip and fZScore are ignored if the given file is not a PCL. PCLs are always converted to pairwise scores using CMeasurePearNorm. If the extension is not recognized, DAB format is assumed.
See also:
Save | CPCL

Definition at line 980 of file dat.cpp.

References EFormatBinary, EFormatPCL, EFormatText, and Sleipnir::CMeta::MapRead().

Referenced by Sleipnir::CPCL::Distance(), Sleipnir::CPCL::Impute(), Sleipnir::CDataPair::Open(), and Open().

bool Sleipnir::CDat::Open ( std::istream &  istm,
EFormat  eFormat = EFormatBinary,
float  dDefault = HUGE_VAL,
bool  fDuplicates = false,
size_t  iSkip = 2,
bool  fZScore = false 
)

Open a CDat stored in the given stream with the given format, processing missing values and duplicates as specified.

Parameters:
istm Stream from which CDat is loaded.
eFormat Format in which the stream should be parsed.
dDefault Default value inserted for missing pairs (DAT format only).
fDuplicates If true, allow duplicates (DAT format only), ignoring all but the last value for each gene pair.
iSkip If the given stream contains a PCL, the number of columns to skip between the ID and experiments.
fZScore If true and the given stream contains a PCL, z-score similarity measures after pairwise calculation.
Returns:
True if CDat was successfully opened.
Opens a CDat from the given stream with the given format.

Remarks:
dDefault and fDuplicates are ignored for non-DAT formats; iSkip and fZScore are ignored for non-PCL formats. Specifying the format incorrectly will generally cause Bad Things.
See also:
Save | CPCL

Definition at line 433 of file dat.cpp.

References EFormatPCL, EFormatSparse, and EFormatText.

bool Sleipnir::CDat::Open ( const CSlim Slim  ) 

Construct a CDat from the given ontology slim.

Parameters:
Slim Set of ontology terms from which to generate a CDat.
Returns:
True if CDat was generated successfully.
Generates a CDat over all genes within the given slim. Gene pairs coannotated to at least one slim term are given a value of 1; all other gene pairs are given a value of 0. This is useful for rapidly generating gold standards from functional catalogs.

See also:
IOntology

Reimplemented in Sleipnir::CDataPair.

Definition at line 127 of file dat.cpp.

References Get(), Sleipnir::CSlim::GetGeneNames(), Sleipnir::CSlim::GetGenes(), Sleipnir::CSlim::GetSlim(), Sleipnir::CSlim::GetSlims(), Sleipnir::CMeta::IsNaN(), Open(), and Set().

bool Sleipnir::CDat::Open ( const CSlim SlimPositives,
const CSlim SlimNonnegatives 
)

Construct a CDat from the given ontology slims.

Parameters:
SlimPositives Set of ontology terms from which to generate related gene pairs.
SlimNonnegatives Set of ontology terms from which to generate agnostic gene pairs (neither related nor unrelated).
Returns:
True if CDat was generated successfully.
Generates a CDat over all genes within the given slims. Gene pairs coannotated to at least one positive slim term are given a value of 1, other gene pairs coannotated to at least one nonnegative slim term are given no value (NaN), and all other gene pairs are given a value of 0. This is useful for rapidly generating gold standards from functional catalogs.

See also:
IOntology

Definition at line 174 of file dat.cpp.

References Get(), Sleipnir::CSlim::GetGeneNames(), Sleipnir::CSlim::GetGenes(), Sleipnir::CSlim::GetSlim(), Sleipnir::CSlim::GetSlims(), Sleipnir::CMeta::IsNaN(), Open(), and Set().

bool Sleipnir::CDat::Open ( const std::vector< std::string > &  vecstrGenes,
bool  fClear = true,
const char *  szFile = NULL 
)

Construct a new CDat with the given gene names.

Parameters:
vecstrGenes Gene names and size to associate with the CDat.
fClear If true, set each value of the new CDat to missing (NaN).
szFile If non-null, use the given file as memory-mapped backing for the new CDat.
Returns:
True if CDat was generated successfully.
Generates a CDat over the given genes (and of the same size), optionally initializing it to contain no values or backing it with a memory-mapped file rather than physical memory.

Definition at line 862 of file dat.cpp.

References GetGene(), GetGenes(), Sleipnir::CMeta::GetNaN(), Sleipnir::CHalfMatrix< float >::GetSpace(), Sleipnir::CMeta::MapWrite(), and Set().

bool Sleipnir::CDat::Open ( const std::vector< std::string > &  vecstrGenes,
const CDistanceMatrix MatScores 
)

Construct a new CDat with the given gene names and values.

Parameters:
vecstrGenes Gene names to associate with the CDat.
MatScores Values to associate with the CDat.
Returns:
True if CDat was generated successfully.
Generates a CDat over the given genes (and of the same size) and associate it with the given existing pairwise values.

Remarks:
vecstrGenes must be the same size as MatScores. MatScores will not be copied; the new CDat will thus not consume a substantial amount of memory, but MatScores must not be disposed of before the current CDat.

Reimplemented in Sleipnir::CDataPair.

Definition at line 920 of file dat.cpp.

References Sleipnir::CHalfMatrix< tType >::GetSize().

bool Sleipnir::CDat::Open ( const std::vector< CGenes * > &  vecpPositives,
const std::vector< CGenes * > &  vecpNonnegatives,
float  dPValue,
const CGenome Genome 
)

Construct a CDat from the given gene sets.

Parameters:
vecpPositives Set of gene sets from which to generate related gene pairs.
vecpNonnegatives Set of ontology terms from which to generate agnostic gene pairs (neither related nor unrelated), possibly empty.
dPValue Hypergeometric p-value of overlap below which nonnegative gene set pairs are considered agnostic rather than negative.
Genome Genome containing all genes of interest.
Returns:
True if CDat was generated successfully.
Generates a CDat over all genes in the given genome in which:
  • Any gene pair coannotated to at least one positive gene set is assigned value 1.
  • Any other gene pair coannotated to at least one nonnegative gene set is assigned no value (NaN).
  • Any other gene pair annotated to nonnegative gene sets with hypergeometric p-value of overlap less than the given cutoff is assigned no value (NaN).
  • Any other gene pair is assigned value 0. This is useful for rapidly generating gold standards from gene sets of interest: any gene pair known to participate in some function of interest will be marked as related, any gene pair known to participate in two unrelated functions will be marked as unrelated, and other gene pairs will be left unmarked.

Definition at line 328 of file dat.cpp.

References Get(), Sleipnir::CGenes::GetGene(), Sleipnir::CGenome::GetGeneNames(), GetGenes(), Sleipnir::CGenes::GetGenes(), Sleipnir::CGene::GetName(), Sleipnir::CMeta::GetNaN(), Sleipnir::CStatistics::HypergeometricCDF(), Sleipnir::CGenes::IsGene(), Sleipnir::CMeta::IsNaN(), Open(), and Set().

bool Sleipnir::CDat::Open ( const CDat DatKnown,
const std::vector< CGenes * > &  vecpOther,
const CGenome Genome,
bool  fKnownNegatives 
)

Construct a CDat from the given known gene relationships and gene sets.

Parameters:
DatKnown Known pairwise scores, either positive or negative as indicated.
vecpOther Gene sets, either positive or nonnegative as indicated (possibly empty).
Genome Genome containing all genes of interest.
fKnownNegatives If true, DatKnown contains known negative gene pairs (0 scores); if false, it contains known related gene pairs (1 scores). In the former case, positives are generated from pairs coannotated to the given gene sets; in the latter, negatives are generated from pairs not coannotated to the given gene sets.
Returns:
True if CDat was generated successfully.
Constructs a CDat by copying either the positive (1) or negative (0) values from DatKnown and calculating negatives or positives from vecpOther. In either case, values are calculated for all genes in the given genome.
  • If fKnownNegatives is true, negative (0) gene pairs are first copied from DatKnown. Then any other gene pair coannotated to at least one of the given gene sets is given a positive (1) score. Remaining gene pairs are given no value (NaN).
  • If fKnownNegatives is false, positive (1) gene pairs are first copied from DatKnown. Then any other gene pair coannotated to at least one of the given gene sets is given a missing (NaN) value. Remaining gene pairs are given a negative (0) score.

Reimplemented in Sleipnir::CDataPair.

Definition at line 247 of file dat.cpp.

References Get(), GetGene(), Sleipnir::CGenome::GetGeneNames(), GetGenes(), Sleipnir::CMeta::GetNaN(), Sleipnir::CMeta::IsNaN(), Open(), and Set().

bool Sleipnir::CDat::Open ( const CPCL PCL,
const IMeasure pMeasure,
bool  fMeasureMemory 
)

Construct a new CDat backed by the given PCL and similarity measure.

Parameters:
PCL PCL from which CDat genes and pairwise values are drawn.
pMeasure Similarity measure used to calculate pairwise scores between genes.
fMeasureMemory If true, the CDat is responsible for freeing the given similarity measure.
Returns:
True if CDat was generated successfully.
This opens a new CDat without precalculated similarity scores; the CDat retains a reference to the given PCL and similarity measure and calculates pairwise scores as needed. This can greatly reduce the amount of memory required by the CDat, but it can increase runtime when specific pairwise values are requested repeatedly.

Remarks:
fMeasureMemory should be false if a static or stack-based similarity measure object is used; it can be set to true for cloned/new allocated similarity measure objects that should be cleaned up with the CDat. The given PCL is not copied and should thus not be destroyed before the current CDat.

Definition at line 493 of file dat.cpp.

bool Sleipnir::CDat::Open ( const CDat Dat  ) 

Construct a copy of the given CDat.

Parameters:
Dat Data to be copied.
Returns:
True if the copy was successful.

Definition at line 286 of file dat.cpp.

References Get(), GetGeneNames(), GetGenes(), and Open().

bool Sleipnir::CDat::OpenGenes ( std::istream &  istm,
bool  fBinary,
bool  fPCL = false 
)

Open the genes from a CDat stored in the stream with the given format.

Parameters:
istm Stream from which genes are loaded.
fBinary If true, assume DAB format.
fPCL If true, assume PCL format.
Returns:
True if genes were successfully loaded.
Like Open, OpenGenes will load a CDat from the stream in the requested format, but it will load only the CDat's size and gene list from the stream. This is useful for rapidly obtaining gene lists and counts from large CDats without incurring the memory/time penalty of loading or memory mapping the whole file.

Remarks:
Attempting to access a CDat's data after opening only its genes is a poor idea.
See also:
CPCL

Definition at line 667 of file dat.cpp.

Referenced by Sleipnir::CDatasetCompact::Open(), and OpenGenes().

bool Sleipnir::CDat::OpenGenes ( const char *  szFile,
size_t  iSkip = 2 
)

Open the genes from a CDat stored in the given file, guessing the format from the file's extension.

Parameters:
szFile Filename from which CDat genes are loaded.
iSkip If the given file is a PCL, the number of columns to skip between the ID and experiments.
Returns:
True if genes were successfully loaded.
Like Open, OpenGenes will guess the appropriate file format from the given filename's extension, but it will load only a CDat's size and gene list from the file. This is useful for rapidly obtaining gene lists and counts from large CDats without incurring the memory/time penalty of loading or memory mapping the whole file.

Remarks:
Attempting to access a CDat's data after opening only its genes is a poor idea. If the extension is not recognized, DAB format is assumed.

Definition at line 625 of file dat.cpp.

References EFormatPCL, EFormatText, and OpenGenes().

void Sleipnir::CDat::Save ( std::ostream &  ostm,
EFormat  eFormat = EFormatBinary 
) const

Save a CDat to the given stream in the requested format.

Parameters:
ostm Stream into which CDat is saved.
eFormat Format in which the CDat should be saved.
Remarks:
The binary flag of the stream must match the intended format (text for DAT, binary for DAB/DAS). CDats cannot be saved to PCLs, only loaded from them.
See also:
Open

Definition at line 777 of file dat.cpp.

References EFormatSparse, and EFormatText.

Referenced by Save().

void Sleipnir::CDat::Save ( const char *  szFile  )  const

Save a CDat to the given file, guessing the format from the file's extension.

Parameters:
szFile Filename into which CDat is saved.
Save a CDat to the given file, guessing the format (DAT, DAB, or DAS) from the extension. If null, the CDat will be saved as a DAT to standard output.

Remarks:
CDats cannot be saved to PCLs, only loaded from them. If the extension is not recognized, DAB format is assumed.
See also:
Open

Definition at line 741 of file dat.cpp.

References EFormatPCL, EFormatText, and Save().

void Sleipnir::CDat::SaveDOT ( std::ostream &  ostm,
float  dCutoff = HUGE_VAL,
const CGenome pGenome = NULL,
bool  fUnlabeled = false,
bool  fHashes = true,
const std::vector< float > *  pvecdColors = NULL,
const std::vector< float > *  pvecdBorders = NULL 
) const

Generate a DOT-formatted graph from the CDat, suitable for processing with AT&T's Graphviz software.

Parameters:
ostm Stream into which DOT is saved.
dCutoff If finite, edge weights below this cutoff are not included in the DOT.
pGenome If non-null, name synonyms from this genome are used to label gene nodes.
fUnlabeled If true, do not label gene nodes and use only minimal unique identifiers when generating the DOT.
fHashes If true, include # marks in hexadecimal color strings within the DOT. This is moronic, but some DOT parsers (*coughboostcough*) will randomly crash if # characters are included in the DOT.
pvecdColors If non-null, contains weights between 0 and 1 interpolating node colors between cyan and yellow.
pvecdBorders If non-null, contains border widths in pixels for each node.
SaveDOT is one of the most useful methods for visualizing the weighted graph implicit in a CDat, particularly in combination with FilterGenes and EFilterPixie/EFilterHefalmp. Calling SaveDOT will generate a DOT graph file containing (optionally) each node and edge in the CDat, with edges colored by weight (scaled from green for the minimum weight to red at the maximum). Nodes can optionally be colored or given varying border widths based on external information, or labeled with different gene names (synonyms) than used internally by the CDat. DOT files can be visualized by a variety of software, notably the Graphviz package from AT&T.

Remarks:
If given, pvecdColors and pvecdBorders must be of the same size as the CDat.
See also:
SaveGDF | SaveNET | SaveMATISSE

Definition at line 1366 of file dat.cpp.

References Sleipnir::CColor::c_Black, Sleipnir::CColor::c_Cyan, Sleipnir::CColor::c_Green, Sleipnir::CColor::c_Red, Sleipnir::CColor::c_White, Sleipnir::CColor::c_Yellow, Sleipnir::CMeta::Filename(), Get(), Sleipnir::CGenome::GetGene(), GetGene(), GetGenes(), Sleipnir::CGene::GetName(), Sleipnir::CGene::GetSynonym(), Sleipnir::CGene::GetSynonyms(), Sleipnir::CColor::Interpolate(), Sleipnir::CMeta::IsNaN(), and Sleipnir::CColor::ToRGB().

void Sleipnir::CDat::SaveGDF ( std::ostream &  ostm,
float  dCutoff = HUGE_VAL 
) const

Generate a GDF-formatted graph from the CDat, suitable for processing with the GUESS graph exploration system.

Parameters:
ostm Stream into which GDF is saved.
dCutoff If finite, edge weights below this cutoff are not included in the GDF.
Calling SaveGDF will generate a GDF graph file containing (optionally) each node and edge in the CDat. Nodes are labeled minimally, and edges are not given any inherent color information (although this can be added later). GDFs are visualizable using the GUESS graph exploration software, although it doesn't deal well with large or highly connected graphs.

Remarks:
Edge weights are scaled by 10x in order to fall roughly within GDF's expected range.
See also:
SaveDOT | SaveNET | SaveMATISSE

Definition at line 1457 of file dat.cpp.

References Sleipnir::CMeta::Filename(), Get(), GetGene(), GetGenes(), and Sleipnir::CMeta::IsNaN().

void Sleipnir::CDat::SaveNET ( std::ostream &  ostm,
float  dCutoff = HUGE_VAL 
) const

Generate a NET-formatted graph from the CDat.

Parameters:
ostm Stream into which NET is saved.
dCutoff If finite, edge weights below this cutoff are not included in the NET.
Calling SaveNET will generate a NET graph file containing (optionally) each node and edge in the CDat. Nodes are labeled minimally, and edges are not given any inherent color information (although this can be added later). I honestly don't remember at all what software uses the NET format, and "net"'s impossible to Google these days (thanks, Microsoft...)

See also:
SaveDOT | SaveGDF | SaveMATISSE

Definition at line 1496 of file dat.cpp.

References Get(), GetGene(), GetGenes(), and Sleipnir::CMeta::IsNaN().

void Sleipnir::CDat::SaveMATISSE ( std::ostream &  ostm,
float  dCutoff = HUGE_VAL,
const CGenome pGenome = NULL 
) const

Generate a MATISSE-formatted graph from the CDat, suitable for processing with the MATISSE software.

Parameters:
ostm Stream into which MATISSE file is saved.
dCutoff If finite, edge weights below this cutoff are not included in the MATISSE file.
pGenome If non-null, name synonyms from this genome are used to label gene nodes.
Calling SaveMATISSE will generate a MATISSE graph file containing (optionally) each node and edge in the CDat. Nodes are labeled either minimally or using the given genome's names/sysnonyms, and edges are not given any inherent color information (although this can be added later). MATISSE files are visualizable using the MATISSE software, although it doesn't deal well with large or highly connected graphs.

See also:
SaveDOT | SaveGDF | SaveNET

Definition at line 1533 of file dat.cpp.

References Get(), GetGene(), Sleipnir::CGenome::GetGene(), GetGenes(), Sleipnir::CGene::GetGloss(), Sleipnir::CGene::GetSynonym(), Sleipnir::CGene::GetSynonyms(), and Sleipnir::CMeta::IsNaN().

void Sleipnir::CDat::Invert (  ) 

Replace each finite value in the CDat with one minus that value.

Remarks:
Most useful in combination with CDat::Normalize ( true ).

Definition at line 1104 of file dat.cpp.

References Get(), GetGenes(), Sleipnir::CMeta::IsNaN(), and Set().

void Sleipnir::CDat::Rank (  ) 

Replace each finite value in the CDat with its integer rank.

Remarks:
This can be costly in memory/processing time for large CDats.

Definition at line 1569 of file dat.cpp.

References Get(), GetGenes(), Sleipnir::CMeta::IsNaN(), and Set().

bool Sleipnir::CDat::FilterGenes ( const char *  szGenes,
EFilter  eFilter,
size_t  iLimit = -1 
)

Remove edges from the CDat based on the given gene file and filter type.

Parameters:
szGenes File from which gene names are loaded, one per line.
eFilter Way in which to use the given genes to remove edges.
iLimit For EFilterPixie and EFilterHefalmp, the maximum number of genes to retain.
Returns:
True if the filter was executed successfully.
Remove edges and nodes (by removing all incident edges) from the CDat based on one of several algorithms. For details, see EFilter.

Remarks:
EFilterTerm and to some degree EFilterEdge don't make a lot of sense for CDats that do not represent gold standards.

Definition at line 1136 of file dat.cpp.

References Sleipnir::CGenes::Open().

void Sleipnir::CDat::FilterGenes ( const CGenes Genes,
EFilter  eFilter,
size_t  iLimit = -1,
float  dEdgeAggressiveness = 0.5 
)

Remove edges from the CDat based on the given gene set and filter type.

Parameters:
Genes Gene set used to filter the CDat.
eFilter Way in which to use the given genes to remove edges.
iLimit For EFilterPixie and EFilterHefalmp, the maximum number of genes to retain.
dEdgeAggressiveness For EFilterPixie and EFilterHefalmp, higher values result in more aggressive edge trimming. NaN completely skips edge trimming.
Remove edges and nodes (by removing all incident edges) from the CDat based on one of several algorithms. For details, see EFilter.

Remarks:
EFilterTerm and to some degree EFilterEdge don't make a lot of sense for CDats that do not represent gold standards.

Definition at line 1174 of file dat.cpp.

References EFilterEdge, EFilterExclude, EFilterHefalmp, EFilterInclude, EFilterPixie, EFilterTerm, Get(), Sleipnir::CGenes::GetGene(), GetGene(), Sleipnir::CGenes::GetGenes(), GetGenes(), Sleipnir::CGene::GetName(), Sleipnir::CMeta::GetNaN(), and Set().

void Sleipnir::CDat::Normalize ( ENormalize  eNormalize  )  [inline]

Normalize each finite value in the CDat by a specific function.

Parameters:
eNormalize Method by which scores are normalized.
Remarks:
Values are left unchanged if ( dMax == dMin ) or ( dStd == 0 ).
See also:
ENormalize | Invert

Definition at line 216 of file dat.h.

References ENormalizeMinMax, and ENormalizeZScore.

Referenced by Sleipnir::CPCL::Distance(), and Sleipnir::CDatasetCompact::Open().

size_t Sleipnir::CDat::GetGene ( const std::string &  strGene  )  const [inline]

Return the index of the given gene name, or -1 if it is not included in the CDat.

Parameters:
strGene Gene name to retrieve.
Returns:
Index of the requested gene name, or -1 if it is not in the CDat.
See also:
GetGeneNames

Definition at line 243 of file dat.h.

Referenced by Sleipnir::CDataFilter::Attach(), FilterGenes(), Sleipnir::CDatasetCompact::Open(), Sleipnir::CDataset::Open(), Open(), SaveDOT(), SaveGDF(), SaveMATISSE(), SaveNET(), and Sleipnir::CStatistics::WilcoxonRankSum().

float& Sleipnir::CDat::Get ( size_t  iY,
size_t  iX 
) const [inline]

Return the value at the requested CDat position.

Parameters:
iY CDat row.
iX CDat column.
Returns:
Value at the requested CDat position.
Remarks:
For efficiency, no bounds checking is performed. The given row and column must be smaller than GetGenes. As a symmetric matrix, the value at position XY will always equal the value at position YX.
See also:
Set

Definition at line 267 of file dat.h.

Referenced by Sleipnir::CPCL::Distance(), Sleipnir::CPCL::Impute(), Open(), and Sleipnir::CStatistics::WilcoxonRankSum().

size_t Sleipnir::CDat::GetGenes (  )  const [inline]

Returns the number of elements (genes) in the CDat.

Returns:
Number of elements (genes) in the CDat.
Remarks:
Since a symmetric matrix must be square, the number of rows equals the number of columns and is thus referred to as the number of elements (genes).

Definition at line 282 of file dat.h.

Referenced by Sleipnir::CPCL::Distance(), FilterGenes(), Sleipnir::CPCL::Impute(), Invert(), Sleipnir::CDatasetCompact::Open(), Sleipnir::CDataset::Open(), Open(), Rank(), SaveDOT(), SaveGDF(), SaveMATISSE(), SaveNET(), and Sleipnir::CStatistics::WilcoxonRankSum().

const CDistanceMatrix& Sleipnir::CDat::Get (  )  const [inline]

Returns the symmetric matrix containing the CDat's values.

Returns:
Symmetric matrix containing the CDat's values.

Definition at line 293 of file dat.h.

Referenced by FilterGenes(), Invert(), Open(), Rank(), SaveDOT(), SaveGDF(), SaveMATISSE(), and SaveNET().

CDistanceMatrix& Sleipnir::CDat::Get (  )  [inline]

Returns the symmetric matrix containing the CDat's values.

Returns:
Symmetric matrix containing the CDat's values.

Definition at line 304 of file dat.h.

bool Sleipnir::CDat::Set ( size_t  iY,
size_t  iX,
float  dValue 
) [inline]

Set the value at the requested CDat position.

Parameters:
iY CDat row.
iX CDat column.
dValue Value to store.
Returns:
True if the value was stored successfully.
Remarks:
For efficiency, no bounds checking is performed. The given row and column must be smaller than GetGenes.
See also:
Get

Definition at line 331 of file dat.h.

Referenced by Sleipnir::CPCL::Distance(), FilterGenes(), Sleipnir::CPCL::Impute(), Invert(), Open(), and Rank().

std::string Sleipnir::CDat::GetGene ( size_t  iGene  )  const [inline]

Returns the gene name at the given CDat position.

Parameters:
iGene Index of gene name to return.
Returns:
Gene name at the requested index.
Remarks:
For efficiency, no bounds checking is performed. The given index must be smaller than GetGenes.

Definition at line 348 of file dat.h.

const std::vector<std::string>& Sleipnir::CDat::GetGeneNames (  )  const [inline]

Returns the vector of gene names associated with this CDat.

Returns:
Vector of this CDat's gene names.
Remarks:
Returned vector size will be identical to GetGenes.

Definition at line 362 of file dat.h.

Referenced by Open().

void Sleipnir::CDat::Set ( size_t  iY,
const float *  adValues 
) [inline]

Set an entire row of CDat values efficiently.

Parameters:
iY CDat row.
adValues Values to store.
Remarks:
For efficiency, no bounds checking is performed. The given row must be smaller than GetGenes, and the given array must be non-null and have length exactly (size - iY - 1).
See also:
Get

Definition at line 383 of file dat.h.

const float* Sleipnir::CDat::Get ( size_t  iY  )  const [inline]

Get an entire row of CDat values efficiently.

Parameters:
iY CDat row.
Returns:
Retrieved values.
Remarks:
For efficiency, no bounds checking is performed. The given row must be smaller than GetGenes and the returned array will have length exactly (size - iY - 1).
See also:
Set

Definition at line 404 of file dat.h.

References Sleipnir::CHalfMatrix< tType >::Get().

float* Sleipnir::CDat::Get ( size_t  iY  )  [inline]

Get an entire row of CDat values efficiently.

Parameters:
iY CDat row.
Returns:
Retrieved values.
Remarks:
For efficiency, no bounds checking is performed. The given row must be smaller than GetGenes and the returned array will have length exactly (size - iY - 1).
See also:
Set

Definition at line 425 of file dat.h.

References Sleipnir::CHalfMatrix< tType >::Get().

void Sleipnir::CDat::SetGene ( size_t  iGene,
const std::string &  strGene 
) [inline]

Set the gene name at the given index.

Parameters:
iGene Index of gene name to modify.
strGene Gene name to store at the requested index.
Remarks:
For efficiency, no bounds checking is performed. The given index must be smaller than GetGenes.
See also:
GetGene

Definition at line 445 of file dat.h.


The documentation for this class was generated from the following files:

Generated on Fri Jun 19 12:48:36 2009 for Sleipnir by doxygen 1.5.5