Help for the Fitness Browser
The fitness data is collected using randomly barcoded transposons (RB-TnSeq). Each fitness experiment is based on a pool of 30,000 to 500,000 mutant strains. Every mutant strain has a transposon inserted at a random location in the genome, and each transposon includes a random barcode that allows us to track the abundance of that strain by using PCR followed by DNA sequencing ("BarSeq"). To link the barcode to the location in the genome, we use a more complicated TnSeq-like protocol.
For each fitness experiment, we compare the abundance of each strain at the end of the experiment to its abundance at the beginning. The beginning sample is also referred to as the "Time0" sample. Typically, we recover the pool of mutants from the freezer in rich media, wash the cells and take Time0 sample(s), and transfer the washed cells into many different tubes or wells. Thus, many different conditions may be compared to the same Time0 sample(s).
For details, see our methods paper (Wetmore et al, mBio 2015).
Fitness values are log2 ratios that describe the change in abundance of mutants in that gene during the experiment. For most of the fitness experiments, which are growth experiments, the change reflects how well the mutants grow. Fitness = 0 means that mutants in this gene grew as well as other mutants and probably about as well as wild type strains. Fitness < 0 means that the gene was important for fitness and the mutants were less abundant at the end of the experiment than at the beginning. For example, fitness = -1 means that mutants in the gene were half as abundant at the end of the experiment, compared to the beginning. Fitness > 0 means that the gene was detrimental to fitness and that mutants had a growth advantage.
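As a small illustration of the log2 scale described above, the following Python sketch converts between a fitness value and the corresponding end/start abundance ratio. The function names are hypothetical and are not part of the Fitness Browser; the sketch also ignores the normalization applied to real gene fitness values.

```python
import math


def fitness_to_ratio(fitness: float) -> float:
    """Convert a log2 fitness value to an end/start abundance ratio."""
    return 2.0 ** fitness


def ratio_to_fitness(ratio: float) -> float:
    """Convert an end/start abundance ratio to a log2 fitness value."""
    return math.log2(ratio)


if __name__ == "__main__":
    # Illustrative fitness values only, not real data.
    for fit in (-2, -1, 0, 1):
        print(f"fitness {fit:+d} -> abundance ratio {fitness_to_ratio(fit)}")
```

Under this reading, fitness = -1 corresponds to a ratio of 0.5, matching the "half as abundant" example in the text.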
In general, if -1 < fitness < 1, then the gene has a subtle phenotype that might be statistically significant (see t scores) but will probably be difficult to interpret. Fitness < -2 or fitness > 2 are strong fitness effects. In the typical experiment, the pool of mutants doubles 4-8 times, so in principle, a conditionally essential gene should have fitness of -4 to -8. However, a pooled assay cannot distinguish little growth from no growth. (Also, very low fitness values are noisier because they are based on a log2 ratio with a small numerator: in the typical experiment, a fitness value of -1 is reliably different from 0, but -5 is not reliably different from -4.)
More rigorously, gene fitness is the weighted average of strain fitness, across strains that have a transposon inserted within that gene. A strain's fitness is the log2 ratio of abundance at the end of the experiment compared to its abundance at the beginning of the experiment, where we use the number of reads for each strain's barcode as a proxy for its abundance. The gene fitness is normalized so that the typical gene has a fitness of zero. For genes on large chromosomes, the gene fitness values are also normalized for changes in copy number along the chromosome.
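The calculation described above can be sketched in Python as follows. This is a simplified illustration, not the pipeline from the methods paper: the pseudocount, the choice of weights (total reads per strain), and the use of the median as the "typical gene" are all assumptions made for the example, and the copy-number normalization is omitted.

```python
import math
from statistics import median

PSEUDOCOUNT = 1.0  # assumption: avoids log2(0) when a barcode has no reads


def strain_fitness(start_reads: int, end_reads: int) -> float:
    """log2 ratio of end vs. beginning (Time0) barcode read counts."""
    return math.log2((end_reads + PSEUDOCOUNT) / (start_reads + PSEUDOCOUNT))


def gene_fitness(strains: list[tuple[int, int]]) -> float:
    """Weighted average of strain fitness over (start_reads, end_reads) pairs.

    Assumption: each strain is weighted by its total read count, so
    better-sampled strains count for more. Requires at least one read.
    """
    weights = [start + end for start, end in strains]
    values = [strain_fitness(start, end) for start, end in strains]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def normalize(raw: dict[str, float]) -> dict[str, float]:
    """Shift gene fitness so the typical (here: median) gene is at zero."""
    center = median(raw.values())
    return {gene: value - center for gene, value in raw.items()}


if __name__ == "__main__":
    # Made-up read counts for three hypothetical genes.
    counts = {
        "geneA": [(99, 49), (199, 99)],
        "geneB": [(99, 99)],
        "geneC": [(99, 199)],
    }
    raw = {gene: gene_fitness(strains) for gene, strains in counts.items()}
    print(normalize(raw))
```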
Although most experiments are based on growth, this site also includes assays of motility or survival. For a motility assay, the experimental samples might be the cells that reached the outer ring of an agar plate, or that stayed in the inner ring where the cells were originally placed. For a survival assay, the cells are stressed or starved for a period of time; then, to distinguish viable cells from dead cells, all cells are transferred to a rich medium and recovered for a few generations.
Cofitness(gene 1, gene 2) is the linear (Pearson) correlation of the two genes' fitness patterns across experiments. If two genes in the same organism have similar fitness patterns, then we say that they are cofit.
If two genes have similar fitness patterns (cofitness > 0.75), and they are among the most cofit genes (rank = 1 or rank = 2), then they are likely to function in the same pathway. For genes with strong fitness patterns, often the most cofit genes are other genes in the same operon, so we look a little farther down the list to find genes that may have related functions.
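To make the cofitness definition and the rule of thumb above concrete, here is a minimal Python sketch. It computes the Pearson correlation between fitness profiles and reports the query gene's top-ranked partners that exceed the 0.75 threshold. The function names, the data layout (one list of fitness values per gene, in the same experiment order), and the example values are hypothetical; ranks are computed from the query gene's point of view only.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Linear (Pearson) correlation of two equal-length fitness profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("cofitness is undefined for a flat fitness profile")
    return cov / math.sqrt(var_x * var_y)


def top_cofit(query, fitness_by_gene, min_cofit=0.75, max_rank=2):
    """Return (gene, rank, cofitness) for the query's best-ranked partners."""
    scores = [
        (pearson(fitness_by_gene[query], profile), gene)
        for gene, profile in fitness_by_gene.items()
        if gene != query
    ]
    scores.sort(reverse=True)  # highest cofitness first
    return [
        (gene, rank, r)
        for rank, (r, gene) in enumerate(scores, start=1)
        if rank <= max_rank and r > min_cofit
    ]


if __name__ == "__main__":
    # Made-up fitness values across four experiments.
    profiles = {
        "geneA": [0.0, -1.0, -2.0, 0.5],
        "geneB": [0.1, -1.1, -1.9, 0.4],
        "geneC": [1.0, 0.0, 1.0, 0.0],
        "geneD": [0.0, -2.0, -4.0, 1.0],
    }
    for gene, rank, r in top_cofit("geneA", profiles):
        print(gene, rank, round(r, 3))
```

As the text notes, the top hits are often operon neighbors, so in practice one would probably inspect more than the first two ranks.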
Conserved cofitness: If two genes have cofitness > 0.6, and their orthologs have cofitness > 0.6, then this is stronger evidence of a functional relationship.
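The conserved cofitness rule can be expressed as a short check. The Python sketch below assumes hypothetical inputs: a cofitness lookup per organism keyed by unordered gene pairs, and a mapping from each gene to its ortholog in the other organism. None of these structures come from the Fitness Browser itself.

```python
def conserved_cofitness(pair, cofit_org1, cofit_org2, orthologs, threshold=0.6):
    """True if a gene pair and its ortholog pair both exceed the threshold.

    pair       -- (gene1, gene2) in the first organism
    cofit_org1 -- {frozenset({geneX, geneY}): cofitness} for organism 1
    cofit_org2 -- same layout for organism 2
    orthologs  -- {gene in organism 1: ortholog in organism 2}
    """
    gene1, gene2 = pair
    orth1, orth2 = orthologs.get(gene1), orthologs.get(gene2)
    if orth1 is None or orth2 is None:
        return False  # no ortholog, so conservation cannot be assessed
    cofit_here = cofit_org1.get(frozenset((gene1, gene2)))
    cofit_there = cofit_org2.get(frozenset((orth1, orth2)))
    if cofit_here is None or cofit_there is None:
        return False  # cofitness not available for one of the pairs
    return cofit_here > threshold and cofit_there > threshold


if __name__ == "__main__":
    # Made-up identifiers and values for illustration.
    org1 = {frozenset(("a1", "a2")): 0.82}
    org2 = {frozenset(("b1", "b2")): 0.67}
    orth = {"a1": "b1", "a2": "b2"}
    print(conserved_cofitness(("a1", "a2"), org1, org2, orth))
```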
If we have relatively little data for an organism, then cofitness results will not be available for any of its genes.
- |fit| > 1
- |t| > 5
- |fit|95 < 1, where |fit|95 is the 95th percentile of |fit| across all experiments for this gene
- |fit| > |fit|95 + 0.5
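The four thresholds listed above can be checked with a short Python sketch. It assumes that all four conditions must hold together for a given gene and experiment, and that the 95th percentile is computed by linear interpolation between sorted values; both are assumptions of this example, and the function names are hypothetical.

```python
def percentile(values: list[float], pct: float) -> float:
    """Percentile by linear interpolation between sorted values (assumption)."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return ordered[0]
    position = (len(ordered) - 1) * pct / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def meets_listed_thresholds(fit: float, t: float, all_fits: list[float]) -> bool:
    """Apply the four listed thresholds to one gene in one experiment.

    fit      -- the gene's fitness in this experiment
    t        -- the gene's t score in this experiment
    all_fits -- the gene's fitness values across all experiments
    """
    fit95 = percentile([abs(f) for f in all_fits], 95)
    return (
        abs(fit) > 1
        and abs(t) > 5
        and fit95 < 1
        and abs(fit) > fit95 + 0.5
    )


if __name__ == "__main__":
    # Made-up data: near-zero fitness in 39 experiments, strong in one.
    fits = [0.1, -0.1] * 19 + [0.1, -3.0]
    print(meets_listed_thresholds(-3.0, -8.0, fits))
    print(meets_listed_thresholds(0.1, 0.5, fits))
```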
We use "orthologs" to refer to similar proteins in different organisms that may carry out the same function, without regard to their evolutionary history. Thus they are putative functional orthologs, not evolutionary orthologs. The "orthologs" in this web site are bidirectional best hits from protein BLAST. We also require that the BLAST alignment cover 80% of each protein.
Many of these "orthologs" actually have different functions. If either gene has a strong fitness pattern, you may be able to use conserved phenotypes or conserved cofitness to confirm that the genes have conserved functions and are truly functional orthologs.
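The bidirectional-best-hit rule with the 80% coverage requirement can be sketched as follows. This Python example assumes the BLAST results have already been parsed into dictionaries with hypothetical keys (`query`, `subject`, `bit_score`, `query_cov`, `subject_cov`), and it applies the coverage filter before choosing each protein's best hit; the site's actual order of operations is not stated here, so treat that as an assumption.

```python
def best_hits(hits, min_coverage=0.8):
    """Map each query protein to its highest-scoring hit that passes coverage."""
    best = {}
    for hit in hits:
        # Require the alignment to cover enough of both proteins.
        if hit["query_cov"] < min_coverage or hit["subject_cov"] < min_coverage:
            continue
        current = best.get(hit["query"])
        if current is None or hit["bit_score"] > current["bit_score"]:
            best[hit["query"]] = hit
    return {query: hit["subject"] for query, hit in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a, min_coverage=0.8):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b, min_coverage)
    reverse = best_hits(b_vs_a, min_coverage)
    return {a: b for a, b in forward.items() if reverse.get(b) == a}


if __name__ == "__main__":
    # Made-up hits between proteins of two hypothetical organisms.
    a_vs_b = [
        {"query": "a1", "subject": "b1", "bit_score": 300, "query_cov": 0.95, "subject_cov": 0.90},
        {"query": "a2", "subject": "b1", "bit_score": 150, "query_cov": 0.85, "subject_cov": 0.85},
        {"query": "a3", "subject": "b3", "bit_score": 400, "query_cov": 0.50, "subject_cov": 0.90},
    ]
    b_vs_a = [
        {"query": "b1", "subject": "a1", "bit_score": 300, "query_cov": 0.90, "subject_cov": 0.95},
        {"query": "b3", "subject": "a3", "bit_score": 400, "query_cov": 0.90, "subject_cov": 0.50},
    ]
    print(bidirectional_best_hits(a_vs_b, b_vs_a))
```

In this made-up example only the a1/b1 pair qualifies: a2's best hit is not reciprocated, and the a3/b3 alignment fails the coverage requirement.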
- PFam domains, computed with HMMer3
- TIGRFam domains or families, computed with HMMer3
- The best hit to KEGG, computed with RAPSearch2 and minimum 80% coverage and 30% identity
- The best hit to Swiss-Prot (the curated part of UniProt), computed with RAPSearch2 and minimum 80% coverage and 30% identity
- The best hit to annotated enzymes in MetaCyc, computed with RAPSearch2 and minimum 80% coverage and 30% identity
- The SEED annotation, computed with the SEED API
Fitness Browser includes links to other analysis tools (see the protein page) as well as a homologs page (computed using BLAST).
Or, you can use Fitness BLAST for genomes to identify orthologs in our data set for an entire genome at once. It takes less than a minute and we plan to store the results indefinitely.
- Wetmore et al 2015 -- carbon source experiments for Escherichia coli BW25113, Shewanella oneidensis MR-1, Shewanella amazonensis SB2B, Phaeobacter inhibens BS107, and Pseudomonas stutzeri RCH2
- Rubin et al 2015 -- the mutant library for Synechococcus elongatus PCC 7942
Most of the data is not published. Contact Adam Deutschbauer for more information about the unpublished data.
This site was developed by ENIGMA - Ecosystems and Networks Integrated with Genes and Molecular Assemblies, a Scientific Focus Area Program at Lawrence Berkeley National Laboratory, and supported by the U.S. Department of Energy, Office of Science, Office of Biological & Environmental Research under contract number DE-AC02-05CH11231.