Augustus loop

1/1/2024

Augustus is a very popular piece of software for gene prediction and is commonly used for whole-genome gene annotation. One critical step in using Augustus, especially for new genomic resources for divergent taxa, is the optimization (training) process. Fortunately, there are some ways of automating the process using tools like autoAugTrain (packaged with Augustus) and BRAKER2. While both will work, I have found the former to be difficult to use and slow (it can only use 1 core), and the latter is more of a black box (and is pretty new).

One method of training Augustus that I was drawn to, and that I have written about before, is to use the popular software BUSCO. BUSCO’s original intent was as a quality-control assessment of genome assemblies and annotations. The approach is to use a set of highly conserved genes across certain taxonomic groups (e.g., Insecta, Vertebrata, Mammalia) that are single-copy in most species examined. These genes have been identified and curated as part of the OrthoDB project. This results in upwards of thousands of genes to profile, and their presence/absence, completeness, and copy number are a great way of evaluating the quality of new genomic resources.

BUSCO utilizes three key pieces of software: BLAST, HMMER, and Augustus. Importantly, in utilizing Augustus, the authors of BUSCO have included an option (off by default) that automates Augustus training using the complete BUSCOs identified. BUSCO can, therefore, be co-opted as a tool for automated Augustus training in new genomic resources using hundreds to thousands of conserved, (usually) single-copy genes, and it also utilizes multiple cores. While BUSCO is great for this purpose, one big disadvantage is that BUSCO will attempt to train Augustus with all “complete” genes, which with certain taxa sets results in thousands of genes. Even with multiple cores, this process can take a long time because of the many genes.
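To get a feel for how many genes a BUSCO-based training run would involve, here is a minimal Python sketch that tallies BUSCO statuses and, optionally, draws a reproducible subsample of the complete genes. It is only an illustration under stated assumptions: it assumes a tab-separated BUSCO "full table" whose first two columns are the BUSCO id and a status such as Complete, Duplicated, Fragmented, or Missing, with comment lines starting with `#`. The function names (`parse_full_table`, `sample_complete`) and the example ids are hypothetical, and subsampling is just one possible way to keep the training set manageable, not something BUSCO does itself.

```python
import random
from collections import Counter


def parse_full_table(lines):
    """Map each BUSCO id to its status from full-table lines.

    Assumes tab-separated rows with the id in column 1 and the status
    in column 2; lines starting with '#' are treated as comments.
    """
    statuses = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        # Duplicated BUSCOs span several rows; keep the first status seen.
        statuses.setdefault(fields[0], fields[1])
    return statuses


def sample_complete(statuses, max_genes, seed=0):
    """Return at most max_genes ids with status 'Complete' (sorted)."""
    complete = sorted(b for b, s in statuses.items() if s == "Complete")
    if len(complete) <= max_genes:
        return complete
    rng = random.Random(seed)  # fixed seed keeps the subsample reproducible
    return sorted(rng.sample(complete, max_genes))


# Made-up rows for illustration only (not real BUSCO output).
example = [
    "# Busco id\tStatus\tContig\tStart\tEnd\tScore\tLength",
    "EOG0001\tComplete\tscaffold_1\t100\t900\t512.3\t266",
    "EOG0002\tMissing",
    "EOG0003\tDuplicated\tscaffold_2\t10\t500\t300.1\t160",
    "EOG0003\tDuplicated\tscaffold_7\t40\t530\t298.4\t160",
    "EOG0004\tComplete\tscaffold_3\t5\t700\t450.0\t230",
    "EOG0005\tFragmented\tscaffold_3\t800\t950\t90.2\t50",
]

statuses = parse_full_table(example)
counts = Counter(statuses.values())
subset = sample_complete(statuses, max_genes=1)
print(counts["Complete"], "complete genes;", len(subset), "kept for training")
```

With a real table you would pass the open file handle instead of `example`. Whether a smaller training set still gives good Augustus parameters is something to check empirically for your taxon.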
