Copyright © 2005 Steven A. Eschrich
This manual describes the GENE application. GENE stands for Gene Expression and Normalization Engine, a program designed to calculate gene expression values from Affymetrix CEL files. It implements two of the more popular methods, MAS 5.0 and RMA, in a graphical interface for easy end-user access.
GENE does not provide any microarray analysis; it only computes gene expression values from raw Affymetrix CEL files. It does so easily and efficiently, so that many CEL files can be processed with modest resources.
In order to use GENE you will need access to the CDF file from Affymetrix, which describes the design of the chip. With the CEL files and the corresponding CDF file, GENE will generate a single tab-delimited text file containing the normalized gene expression values. This file is suitable for loading into Microsoft Excel or Bioconductor (R) for further analysis.
The GENE application is a graphical application with a simple tabbed
window interface. There are three tabs: CEL Files, RMA, and
MAS5.0. Each section is described individually in this manual. A Log
Window, at the bottom of the application, is used for displaying
important information throughout the execution of GENE.
The GENE application takes CEL files as input and produces expression
summary values. In order to do this, the first step is to specify the
CEL files to include. You can use the "Add Files..." button or choose
"Add Cel Files" from the "File" menu to bring up the file selection
dialog.
From the "Add Cel Files" dialog you can choose one or more CEL files to add to the GENE application. Once the files are selected, they will be listed along with the chip type in the "CEL Files" window (see figure below). All CEL files processed together must be of the same chip type.
NOTE: GENE accepts CEL/CDF files in both the old (text) format and the
newer binary format. The file type is automatically identified when
loading the data.
In addition to selecting the CEL files, you must also indicate the CDF file corresponding to your chip type. You may leave this blank if the correct CDF is in the same directory as the first CEL file in the list. Alternatively, you may set a global option (see Options section) indicating the directory in which to look for CDF files.
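The CDF lookup described above can be sketched as a short search over candidate locations. This is only an illustration, not GENE's actual code: the function name `resolve_cdf`, the assumption that the CDF file is named after the chip type (for example `HG-U133A.CDF`), and the order in which the locations are tried are all hypothetical.

```python
from pathlib import Path
from typing import Optional, Sequence


def resolve_cdf(chip_type: str,
                cel_files: Sequence[str],
                explicit_cdf: Optional[str] = None,
                default_cdf_dir: Optional[str] = None) -> Optional[Path]:
    """Return the first candidate CDF path that exists, or None.

    Assumed (hypothetical) search order:
      1. the CDF file named explicitly by the user,
      2. <chip_type>.CDF next to the first CEL file in the list,
      3. <chip_type>.CDF in the globally configured CDF directory.
    """
    cdf_name = f"{chip_type}.CDF"  # assumed naming convention
    candidates = []
    if explicit_cdf:
        candidates.append(Path(explicit_cdf))
    if cel_files:
        candidates.append(Path(cel_files[0]).parent / cdf_name)
    if default_cdf_dir:
        candidates.append(Path(default_cdf_dir) / cdf_name)

    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

On case-sensitive file systems a lookup like this would also need to consider lower-case file names such as `.cdf`; the sketch leaves that out for brevity.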
MAS 5.0 is the name of the Affymetrix algorithm used for producing the gene expression signal. See the Affymetrix web site for more details on the algorithm.
The algorithm consists of background correction, calculation of the probe summary and scaling (typically termed probeset-level normalization). The options available within GENE include:
Finally, you can select the name of the output file to write
expression results. This file is a tab-delimited output file with
columns labeled with CEL file names (without the .CEL extension) and
rows corresponding to probesets. The default is exprs-mas.txt
in the working directory.
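The scaling step of MAS 5.0 mentioned earlier can be illustrated with a small sketch: each array is multiplied by a factor so that its trimmed mean signal equals a target value. The function names, the simplified trimming rule, and the default values (a target of 500 and a 2% trim at each end, which are commonly cited for MAS 5.0) are assumptions made for illustration and are not taken from GENE itself.

```python
def trimmed_mean(values, trim=0.02):
    """Mean of the values left after dropping the lowest and highest
    `trim` fraction (simplified: the count to drop is rounded down)."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def scale_to_target(signals, target=500.0, trim=0.02):
    """Scale one array's probeset signals so that their trimmed mean
    equals `target`. Both defaults are assumed, not GENE's settings."""
    factor = target / trimmed_mean(signals, trim)
    return [s * factor for s in signals]
```

For example, `scale_to_target([100, 200, 300], target=400)` doubles every signal, because the mean of the three values is 200.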
RMA is a popular model-based approach to normalizing and calculating gene expression for Affymetrix microarrays. The approach consists of background correction, quantile normalization, and modeling of probe-specific effects across multiple arrays, using a median-polish method to fit the model.
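Two of the steps named above, quantile normalization and median polish, can be sketched in plain Python. This is a textbook-style illustration under stated assumptions, not GENE's implementation: background correction is omitted, ties in quantile normalization are broken by position rather than averaged, and the function names and the `max_iter` and `tol` parameters are hypothetical.

```python
from statistics import median


def quantile_normalize(columns):
    """columns: one list of intensities per array, all the same length.
    Every array is given the same distribution: the mean of the k-th
    smallest values across arrays."""
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    reference = [sum(col[k] for col in sorted_cols) / len(sorted_cols)
                 for k in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # indices by rank
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized


def median_polish(matrix, max_iter=10, tol=1e-6):
    """matrix: rows are probes of one probeset, columns are arrays
    (log2 scale). Fits value = overall + row effect + column effect
    + residual by alternately sweeping out row and column medians."""
    resid = [list(row) for row in matrix]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(max_iter):
        # Row sweep: move each row's median into the row (probe) effect.
        row_med = [median(row) for row in resid]
        for i in range(n_rows):
            resid[i] = [v - row_med[i] for v in resid[i]]
            row_eff[i] += row_med[i]
        shift = median(col_eff)
        col_eff = [c - shift for c in col_eff]
        overall += shift
        # Column sweep: move each column's median into the array effect.
        col_med = [median([resid[i][j] for i in range(n_rows)])
                   for j in range(n_cols)]
        for j in range(n_cols):
            for i in range(n_rows):
                resid[i][j] -= col_med[j]
            col_eff[j] += col_med[j]
        shift = median(row_eff)
        row_eff = [r - shift for r in row_eff]
        overall += shift
        # Stop once a full pass no longer changes anything noticeably.
        if max(abs(m) for m in row_med + col_med) < tol:
            break
    return overall, row_eff, col_eff, resid
```

In this formulation the expression estimate of a probeset on array `j` is `overall + col_eff[j]`, while `row_eff` holds the probe-specific effects that the model removes.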
The options for this method include:
Finally, you can select the name of the output file to write
expression results. This file is a tab-delimited output file with
columns labeled with CEL file names (without the .CEL extension) and
rows corresponding to probesets. The default is exprs-rma.txt
in the working directory.
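Because both output files are plain tab-delimited text, they are easy to read from a script as well. The sketch below loads such a file with Python's standard `csv` module. It assumes that the first column holds the probeset identifier and that the first row is the header of CEL file names; the function name `read_expression_table` is hypothetical, and the exact header layout written by GENE should be checked against a real output file.

```python
import csv


def read_expression_table(path):
    """Return (sample_names, {probeset_id: [expression values]})."""
    with open(path, newline="") as handle:
        rows = [row for row in csv.reader(handle, delimiter="\t") if row]
    header, data = rows[0], rows[1:]
    # Some writers leave out a label for the probeset column, which makes
    # the header one field shorter than the data rows; handle both cases.
    if data and len(header) == len(data[0]) - 1:
        samples = header
    else:
        samples = header[1:]
    table = {row[0]: [float(v) for v in row[1:]] for row in data}
    return samples, table
```

A call such as `samples, table = read_expression_table("exprs-rma.txt")` would then give the sample names and a dictionary keyed by probeset.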
At present there is only a single preference that can be stored: the
default CDF directory to use. We typically store all of the CDF files
in one directory so they can be found easily. If this directory is
set, it will be used to find CDF files when running MAS 5.0 or RMA.