Community tools

Microbiome science methods and bioinformatics code are constantly evolving, producing an ever-growing array of tools and data products for the community. These tools, however, can be cumbersome to install given the outdated operating systems and unresolved dependencies often associated with code in repositories like GitHub, and users may not have access to the compute resources required to run a tool on their data. To address this, iMicrobe packages tools as Singularity containers, lightweight, portable software environments that encapsulate the operating system, dependencies, and tool code to ensure reproducibility and allow the code to run on virtually any computational architecture. Community developers can contribute tools by releasing containers to BioContainers or Docker Hub.

Currently, all tools are deployed on TACC Stampede2 using the Agave API, but the containerized tools can in principle run on any compute resource, including cloud platforms such as Amazon Web Services (AWS) or Google Cloud Platform. Each container is paired with the iMicrobe frontend through a JSON file that describes its inputs, parameters, and outputs for the user interface; the same file specifies the CPU and memory requirements needed to run the tool on a single server or on a high-performance compute cluster such as Stampede2 (see the sketch below). By working closely with developers, iMicrobe streamlines community-driven tool development and makes a variety of tools accessible through a simple web-based platform, allowing users to compare tools on their own data sets and optimize their analyses.
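
As a concrete illustration of how a tool's interface and resource needs might be declared, here is a minimal sketch of such a JSON description, expressed as a Python dictionary. The field names loosely follow the shape of an Agave app description but are illustrative only; the tool name, inputs, and parameters are hypothetical and are not copied from an actual iMicrobe app file.

```python
# Hypothetical sketch of an Agave-style app description for a containerized
# tool. Field names are illustrative; consult the Agave/Tapis app schema and
# the iMicrobe repositories for the exact format.
import json

app_description = {
    "name": "example-tool",               # hypothetical tool name
    "version": "1.0.0",
    "executionSystem": "tacc-stampede2",  # where jobs are run
    "defaultProcessorsPerNode": 16,       # CPU requirement
    "defaultMemoryPerNode": "32GB",       # memory requirement
    "inputs": [
        {
            "id": "INPUT_FASTA",
            "details": {"label": "Input sequences (FASTA)"},
        }
    ],
    "parameters": [
        {
            "id": "EVALUE",
            "value": {"type": "number", "default": 1e-5},
            "details": {"label": "E-value cutoff"},
        }
    ],
    "outputs": [
        {"id": "RESULTS_DIR", "details": {"label": "Result files"}},
    ],
}

# The frontend would read a file like this to render the tool's form fields.
print(json.dumps(app_description, indent=2))
```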

Bioinformatics tools can be launched directly from the iMicrobe site using the Agave API and XSEDE high-performance compute resources. To run an analysis, users add data from their data store or shopping cart, select parameters, and launch the tool. Users can track the status of their jobs on the site and view results and interactive data visualizations, and, as with any file, they can share results with collaborators. Provenance of primary data, derived files, and analyses is tracked in CyVerse by keeping all files in the analysis directory, together with the data products and a log that records the data sources, app version, and parameters selected for that run. CyVerse also maintains the job history (including the app ID), allowing researchers to track and reproduce other researchers' experiments.
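
For readers curious about what happens behind the launch button, the sketch below shows how a job might be submitted and monitored through an Agave-style REST API. This is not the iMicrobe implementation: the base URL, token, app ID, and input URI are placeholders, and the endpoint paths and response fields follow general Agave v2 conventions rather than a verified iMicrobe configuration.

```python
# Minimal sketch of submitting and polling a job via an Agave-style REST API.
# All credentials, hostnames, and identifiers below are placeholders.
import time
import requests

BASE_URL = "https://agave.example.org"   # hypothetical tenant URL
TOKEN = "your-oauth-token"               # placeholder credential
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

job_request = {
    "name": "example-tool run",
    "appId": "example-tool-1.0.0",       # matches the registered app
    "inputs": {"INPUT_FASTA": "agave://data-store/path/to/reads.fasta"},
    "parameters": {"EVALUE": 1e-5},
    "archive": True,                     # keep outputs for provenance tracking
}

# Submit the job.
resp = requests.post(f"{BASE_URL}/jobs/v2", json=job_request, headers=HEADERS)
resp.raise_for_status()
job_id = resp.json()["result"]["id"]

# Poll the job status until it reaches a terminal state.
while True:
    status = requests.get(
        f"{BASE_URL}/jobs/v2/{job_id}/status", headers=HEADERS
    ).json()["result"]["status"]
    print(job_id, status)
    if status in ("FINISHED", "FAILED", "STOPPED"):
        break
    time.sleep(30)
```

The iMicrobe frontend performs the equivalent steps on the user's behalf, which is why job status and results appear directly on the site.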
