GENE User Manual 1.0

GENE User Manual

This is the manual for the GENE application, or Gene Expression and Normalization Engine, version 1.0.

Copyright © 2005 Steven A. Eschrich


Overview

This manual describes the GENE application. GENE stands for Gene Expression and Normalization Engine, a program designed to calculate gene expression values from Affymetrix CEL files. It implements two of the more popular methods, MAS 5.0 and RMA, behind a graphical interface for easy end-user access.

GENE does not provide any microarray analysis; it only computes gene expression values from raw Affymetrix CEL files. It does so efficiently enough that many CEL files can be processed with modest resources.

In order to use GENE you will need access to the CDF file from Affymetrix, which describes the design of the chip. With the CEL files and the corresponding CDF file, GENE will generate a single tab-delimited text file containing the normalized gene expression values. This file is suitable for loading into Microsoft Excel or Bioconductor (R) for further analysis.
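For readers who want to post-process the output outside Excel or R, the following is a minimal sketch of reading such a tab-delimited file with the Python standard library. It is not part of GENE. The function name `read_expression_table` and the sample data are hypothetical, and the exact header layout of the GENE output is an assumption: the sketch accepts a header row either with or without a leading cell above the probeset column.

```python
import csv
import io


def read_expression_table(handle):
    """Parse a tab-delimited expression table into {probeset: {sample: value}}.

    Assumption: the first row holds the sample names and every later row
    starts with a probeset identifier followed by one value per sample.
    """
    rows = [row for row in csv.reader(handle, delimiter="\t") if row]
    header, data = rows[0], rows[1:]
    # Some tools omit the header cell above the probeset column.
    if data and len(header) == len(data[0]) - 1:
        samples = header
    else:
        samples = header[1:]
    table = {}
    for row in data:
        probeset, values = row[0], row[1:]
        table[probeset] = {s: float(v) for s, v in zip(samples, values)}
    return table


# Made-up data for illustration only, not real GENE output.
example = (
    "probeset\tsampleA\tsampleB\n"
    "1007_s_at\t8.25\t8.5\n"
    "1053_at\t6.0\t5.75\n"
)
table = read_expression_table(io.StringIO(example))
print(table["1007_s_at"]["sampleB"])
```

To read a real file, pass an open file object instead of the `io.StringIO` wrapper, for example `open("exprs-rma.txt", newline="")`.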

The GENE application is a graphical application with a simple tabbed window interface. There are three tabs: CEL Files, RMA, and MAS5.0. Each section is described individually in this manual. A Log Window, at the bottom of the application, is used for displaying important information throughout the execution of GENE.
GENE_main.png


Input Files

The GENE application takes CEL files as input and produces expression summary values. The first step is to specify the CEL files to include. Use the "Add Files..." button, or choose "Add Cel Files" from the "File" menu, to bring up the file selection dialog.
GENE_main.png

From the "Add Cel Files" dialog you can choose one or more CEL files to add to the GENE application. Once the files are selected, they are listed along with their chip type in the "CEL Files" window (see figure below). All CEL files must be of the same chip type.

NOTE: GENE accepts CEL/CDF files in both the old (text) format and the newer binary format. The file type is automatically identified when loading the data.
GENE_selected.png
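The automatic format detection mentioned in the note above can be illustrated with a short sketch. This is not GENE's actual code. It assumes the commonly documented conventions that a text CEL file begins with a `[CEL]` section header and that a version 4 binary CEL file begins with the little-endian 32-bit integer 64. The function name `detect_cel_format` is hypothetical, and other Affymetrix formats are simply reported as unknown.

```python
import struct


def detect_cel_format(first_bytes):
    """Guess the CEL file format from the first few bytes of the file.

    Assumptions: text files start with the "[CEL]" section header, and
    version 4 binary files start with the 32-bit integer 64.
    """
    if first_bytes.lstrip().startswith(b"[CEL]"):
        return "text"
    if len(first_bytes) >= 4:
        (magic,) = struct.unpack("<i", first_bytes[:4])
        if magic == 64:
            return "binary"
    return "unknown"


print(detect_cel_format(b"[CEL]\nVersion=3\n"))
print(detect_cel_format(struct.pack("<ii", 64, 4)))
```

For a file on disk, the first bytes can be obtained with `open(path, "rb").read(16)`.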

In addition to selecting the CEL files, you must also indicate the CDF file corresponding to your chip type. You may leave this field blank if the correct CDF file is in the same directory as the first CEL file in the list. Alternatively, you may set a global option (see the Options section) naming the directory in which to look for CDF files.
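The lookup rules described above can be summarized in a small sketch. This is an illustration, not GENE's implementation. The function name `find_cdf`, the assumption that the CDF file is named after the chip type with a `.cdf` extension (matched case-insensitively), and the order in which the two directories are searched are all assumptions made for the example.

```python
import os


def find_cdf(chip_type, cel_files, explicit_cdf=None, cdf_dir=None):
    """Return a path to the CDF file, or None if no candidate is found.

    Assumed search order: an explicitly chosen CDF file, then the directory
    of the first CEL file, then the globally configured CDF directory.
    """
    if explicit_cdf:
        return explicit_cdf
    wanted = (chip_type + ".cdf").lower()  # assumed naming convention
    directories = []
    if cel_files:
        directories.append(os.path.dirname(os.path.abspath(cel_files[0])))
    if cdf_dir:
        directories.append(cdf_dir)
    for directory in directories:
        if not os.path.isdir(directory):
            continue
        for entry in sorted(os.listdir(directory)):
            if entry.lower() == wanted:
                return os.path.join(directory, entry)
    return None


# An explicitly selected CDF file always wins in this sketch.
print(find_cdf("HG-U133A", ["a.CEL"], explicit_cdf="HG-U133A.CDF"))
```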


MAS 5.0

MAS 5.0 is the name of the Affymetrix algorithm used for producing gene expression signal values (see the Affymetrix website for more details on the algorithm).

The algorithm consists of background correction, calculation of the probe summary and scaling (typically termed probeset-level normalization). The options available within GENE include:

Finally, you can select the name of the output file to write expression results. This file is a tab-delimited output file with columns labeled with CEL file names (without the .CEL extension) and rows corresponding to probesets. The default is exprs-mas.txt in the working directory.
GENE_MAS5.png
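As an illustration of the scaling step named in the algorithm description above, here is a simplified sketch of trimmed-mean scaling. It is not taken from GENE. The 2% trimming fraction and the target value of 500 are common choices assumed for the example, the function names are hypothetical, and the trimming is done by whole values rather than by the exact rule Affymetrix specifies.

```python
def trimmed_mean(values, trim=0.02):
    """Mean of the values after dropping the lowest and highest `trim` fraction."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)  # number of values dropped at each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def scale_to_target(signals, target=500.0, trim=0.02):
    """Scale one chip's signals so that their trimmed mean equals `target`."""
    factor = target / trimmed_mean(signals, trim)
    return [s * factor for s in signals], factor


# Made-up signal values: two outliers and 98 identical probesets.
signals = [1.0] + [100.0] * 98 + [10000.0]
scaled, factor = scale_to_target(signals)
print(factor)
```

Because the extremes are trimmed before averaging, the two outliers in the example do not influence the scale factor.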


RMA

RMA is a popular model-based approach to normalizing and calculating gene expression for Affymetrix microarrays. The approach consists of background correction, quantile normalization, and modeling of probe-specific effects across multiple arrays, using a median-polish method to fit the model.
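To make the quantile normalization step concrete, the following is a minimal sketch in plain Python. It is not GENE's implementation. The function name `quantile_normalize` is hypothetical, the input is assumed to be one equal-length list of intensities per array, and ties are broken by position rather than averaged.

```python
def quantile_normalize(arrays):
    """Give every array the same empirical distribution.

    arrays: list of equal-length lists, one per chip.
    Returns new lists; the inputs are left unchanged.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean across arrays at each rank.
    reference = [sum(col) / len(arrays) for col in zip(*sorted_arrays)]
    result = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # indices by ascending value
        normalized = [0.0] * n
        for rank, idx in enumerate(order):
            normalized[idx] = reference[rank]
        result.append(normalized)
    return result


# Made-up intensities for two arrays with three probes each.
normalized = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
print(normalized)
```

After normalization each array contains the same set of values, assigned according to the original rank of each probe within its array.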

The options for this method include:

Finally, you can select the name of the output file to write expression results. This file is a tab-delimited output file with columns labeled with CEL file names (without the .CEL extension) and rows corresponding to probesets. The default is exprs-rma.txt in the working directory.
GENE_RMA.png
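The median-polish fit mentioned in the RMA description can be sketched as follows for a single probeset. This is an illustrative version and not GENE's code. The function name `median_polish`, the iteration limit, and the tolerance are assumptions, and the input is assumed to be a matrix of log2 intensities with one row per probe and one column per array.

```python
from statistics import median


def median_polish(matrix, max_iter=10, tol=1e-6):
    """Fit value = overall + row effect + column effect + residual.

    Returns (overall, row_effects, col_effects, residuals).
    """
    resid = [list(row) for row in matrix]
    nrow, ncol = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * nrow
    col_eff = [0.0] * ncol
    for _ in range(max_iter):
        change = 0.0
        # Sweep the median out of every row (probe).
        for i in range(nrow):
            m = median(resid[i])
            resid[i] = [v - m for v in resid[i]]
            row_eff[i] += m
            change += abs(m)
        m = median(col_eff)
        col_eff = [c - m for c in col_eff]
        overall += m
        # Sweep the median out of every column (array).
        for j in range(ncol):
            m = median(resid[i][j] for i in range(nrow))
            for i in range(nrow):
                resid[i][j] -= m
            col_eff[j] += m
            change += abs(m)
        m = median(row_eff)
        row_eff = [r - m for r in row_eff]
        overall += m
        if change < tol:
            break
    return overall, row_eff, col_eff, resid


# Made-up, perfectly additive data: 3 probes (rows) by 3 arrays (columns).
data = [[7.0, 9.0, 11.0], [8.0, 10.0, 12.0], [9.0, 11.0, 13.0]]
overall, row_eff, col_eff, resid = median_polish(data)
expression = [overall + c for c in col_eff]  # one summary value per array
print(expression)
```

In this formulation the expression summary for each array is the overall term plus that array's column effect, while the row effects absorb the probe-specific differences.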


Options

At present only a single preference can be stored: the default CDF directory. We typically keep all of our CDF files in one directory so that they can be found regardless of where the CEL files are located. If you set this option, the directory will be searched for CDF files when running MAS 5.0 or RMA.
GENE_Options.png