Combiner

Combiner merges multiple data files into one: PCLs into one large PCL, DAT/DABs into one averaged DAT/DAB, or DAT/DABs into a DAD dataset file. Multiple PCLs are combined into a single wide PCL by aligning genes' expression vectors and inserting missing values for genes not present in some dataset(s). DAT/DABs are combined into a single DAT/DAB by averaging pairwise scores in a configurable manner. DAT/DABs are combined into a DAD by ordering their individual scores as detailed in Sleipnir::CDatasetCompact::Save.

Usage

Basic Usage

 Combiner -t pcl -o <combined.pcl> <data.pcl>*

Create a new PCL file combined.pcl containing all genes in the microarray PCL files data.pcl, with new expression vectors consisting of the concatenation of all data from these input files. In other words, take the input PCLs, line up each gene's values, smoosh them all together, and plop the result into the output file.
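The alignment described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Combiner's implementation: the in-memory layout (a list of condition names plus a gene-to-values dictionary per dataset), the function name combine_pcls, and the use of None for missing values are all assumptions made for the example.

```python
def combine_pcls(datasets, missing=None):
    """Concatenate per-gene expression vectors across datasets.

    datasets: list of (conditions, {gene: [values]}) pairs (hypothetical layout).
    Genes absent from a dataset are padded with `missing` for its conditions.
    """
    # Collect genes in first-seen order so the output is deterministic.
    genes, seen = [], set()
    for _, rows in datasets:
        for gene in rows:
            if gene not in seen:
                seen.add(gene)
                genes.append(gene)

    # The combined header is every dataset's conditions, in input order.
    header = [cond for conds, _ in datasets for cond in conds]

    combined = {}
    for gene in genes:
        vector = []
        for conds, rows in datasets:
            # Pad with missing values when this dataset lacks the gene.
            vector.extend(rows.get(gene, [missing] * len(conds)))
        combined[gene] = vector
    return header, combined


a = (["a1", "a2"], {"YAL001C": [0.5, -1.2], "YAL002W": [1.1, 0.3]})
b = (["b1"], {"YAL002W": [2.0], "YAL003W": [-0.7]})
header, combined = combine_pcls([a, b])
```

Here YAL001C appears only in the first dataset, so its combined vector ends with one missing value for the condition b1.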

 Combiner -t dat -o <combined.dab> -n <data.dab>*

Create a new DAT/DAB file combined.dab in which each gene pair's score is the average of the normalized (by z-scoring) scores from all input DAT/DAB files data.dab. The combination method can be modified using -m.
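As a rough illustration of z-scoring followed by averaging, the sketch below works on plain dictionaries keyed by gene pair. It is not Combiner's code: the dictionary layout, the function names, the use of the population standard deviation, and the choice to average only over the datasets that contain a pair are assumptions made for the example.

```python
from statistics import mean, pstdev


def zscore(scores):
    """Z-score a {pair: score} mapping using its own mean and std deviation."""
    values = list(scores.values())
    mu, sigma = mean(values), pstdev(values)  # ASSUMPTION: population std dev
    if sigma == 0:
        return {pair: 0.0 for pair in scores}
    return {pair: (v - mu) / sigma for pair, v in scores.items()}


def average_dats(dats, normalize=True):
    """Average each gene pair's score over the datasets that contain it."""
    if normalize:
        dats = [zscore(d) for d in dats]
    collected = {}
    for d in dats:
        for pair, v in d.items():
            collected.setdefault(pair, []).append(v)
    return {pair: mean(vs) for pair, vs in collected.items()}


d1 = {("A", "B"): 1.0, ("A", "C"): 3.0}
d2 = {("A", "B"): 10.0, ("A", "C"): 20.0}
normalized = average_dats([d1, d2])
raw = average_dats([d1, d2], normalize=False)
```

The two inputs are on very different scales; after z-scoring they contribute equally to the average, whereas the raw average is dominated by the second dataset.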

 Combiner -t dab -o <combined.dad> <data.dab>*

Create a new DAD file combined.dad containing the discretized gene pair scores from all input DAT/DAB files data.dab, which must be accompanied by appropriate QUANT files. This is equivalent to Dab2Dad.
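To make the role of the QUANT files concrete, here is a small sketch of discretizing scores with per-dataset bin edges. The binning rule is an assumption for illustration (each QUANT file is treated as an ascending list of bin upper bounds, and a score falls into the first bin whose edge is at least as large as the score); the data layout and function names are hypothetical, and the on-disk DAD format is not modeled at all.

```python
from bisect import bisect_left


def discretize(value, edges):
    """Map a continuous score to a bin index.

    ASSUMPTION: `edges` is an ascending list of bin upper bounds, and a score
    falls into the first bin whose edge is >= the score; scores above every
    edge are clamped into the last bin.
    """
    return min(bisect_left(edges, value), len(edges) - 1)


def build_discretized(dats, quants, missing=None):
    """Collect one discretized value per dataset for every gene pair.

    dats:   list of {(gene_a, gene_b): score} dicts (hypothetical layout).
    quants: list of bin-edge lists, one per dataset, in the same order.
    """
    pairs = sorted({pair for dat in dats for pair in dat})
    return {
        pair: [
            discretize(dat[pair], edges) if pair in dat else missing
            for dat, edges in zip(dats, quants)
        ]
        for pair in pairs
    }


dats = [
    {("A", "B"): 0.2, ("A", "C"): 0.9},
    {("A", "B"): 1.7},
]
quants = [[0.0, 0.5, 1.0], [0.0, 1.0, 2.0]]
table = build_discretized(dats, quants)
```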

Detailed Usage

package "Combiner"
version "1.0"
purpose "PCL and data file combination tool"

section "Main"
option  "type"      t   "Output data file type"
                        values="pcl","dat","dab","module"   default="pcl"
option  "method"    m   "Combination method"
                        values="min","max","mean","gmean","hmean","sum" default="mean"
option  "output"    o   "Output file"
                        string  typestr="filename"

section "Modules"
option  "jaccard"   j   "Minimum Jaccard index for module equivalence"
                        float   default="0.5"
option  "intersection"  r   "Minimum intersection fraction for module inheritance"
                        double  default="0.666"

section "Optional"
option  "skip"      k   "Columns to skip in input PCLs"
                        int default="2"
option  "memmap"    p   "Memory map input files"
                        flag    off
option  "normalize" n   "Normalize inputs before combining"
                        flag    off
option  "subset"    s   "Subset size (none if zero)"
                        int default="0"
option  "verbosity" v   "Message verbosity"
                        int default="5"
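
For reference, the Jaccard index named by the jaccard option is the size of the intersection of two sets divided by the size of their union. The sketch below shows that standard definition for two gene sets. How Combiner actually applies the threshold is not described here, so the equivalent helper (treating modules as equivalent when the index reaches the threshold) is an assumption for illustration, as are the function names.

```python
def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two gene collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def equivalent(a, b, threshold=0.5):
    """ASSUMPTION: modules count as equivalent when the index >= threshold."""
    return jaccard(a, b) >= threshold


m1 = {"A", "B", "C"}
m2 = {"B", "C", "D"}
index = jaccard(m1, m2)  # two shared genes out of four distinct genes
```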

| Flag | Default | Type | Description |
|------|---------|------|-------------|
| None | None | PCL or DAT/DAB files | Input files to be combined; must all be of an appropriate type for the requested output. |
| -t | pcl | pcl, dat, or dab | Type of combination to perform: pcl combines PCLs into a PCL, dat combines DAT/DABs into a DAT/DAB, and dab combines DAT/DABs into a DAD. |
| -m | mean | mean, min, max, gmean, hmean, or sum | Type of DAT/DAB combination to perform when combining pairwise scores. Options are to calculate the arithmetic, geometric, or harmonic mean, to retain only the minimum or maximum value for each gene pair, or to sum the values. |
| -o | stdout | PCL, DAT/DAB, or DAD file | Output file of the type specified by -t. |
| -k | 2 | Integer | Number of columns to skip between the initial ID column and the first experimental (data) column in the input PCL. |
| -p | off | Flag | If given, memory map the input files when possible. DAT and PCL inputs cannot be memmapped. |
| -n | off | Flag | If on, normalize input edges to the range [0,1] before processing. |
| -s | 0 | Integer | If nonzero, process input DAT/DABs in subsets of the requested size as described in Sleipnir::CDataSubset. |
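
The combination methods accepted by -m correspond to familiar summary functions. As a minimal sketch (not Combiner's implementation), the values one gene pair receives across the input files can be reduced with Python's standard library; the METHODS table and combine_scores name are hypothetical, and fmean and geometric_mean require Python 3.8 or later.

```python
from statistics import fmean, geometric_mean, harmonic_mean

# Map each -m method name to a function that reduces a list of scores.
METHODS = {
    "min": min,
    "max": max,
    "mean": fmean,             # arithmetic mean
    "gmean": geometric_mean,   # requires positive scores
    "hmean": harmonic_mean,    # requires positive scores
    "sum": sum,
}


def combine_scores(scores, method="mean"):
    """Reduce one gene pair's scores from all inputs to a single value."""
    return METHODS[method](scores)


scores = [1.0, 4.0]  # one gene pair's score in two input files
results = {name: combine_scores(scores, name) for name in METHODS}
```

The geometric and harmonic means are only defined here for positive scores, which is one reason normalizing inputs before combining can matter.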


Generated on Fri Jun 19 12:48:33 2009 for Sleipnir by doxygen 1.5.5