The iMicrobe Platform
The iMicrobe platform brings tools and microbiome datasets together in a single framework by leveraging National Science Foundation-supported cyberinfrastructure and computing resources from CyVerse, Agave, and XSEDE. The primary purpose of iMicrobe is to provide users with a freely available web-based platform to: (1) maintain and share project data, metadata, and analysis products, (2) search for related public datasets, and (3) publish bioinformatics tools to run on highly scalable computing resources (Figure 1). iMicrobe is not a data repository, but rather a search engine that connects remote datastores to make data discoverable and to let users compute on data derived from disparate repositories. Similarly, community tools can be encapsulated in a “container” or virtual machine, connected to the iMicrobe web-based frontend via a descriptive JSON file, and run on compute resources at XSEDE. Taken together, iMicrobe promotes data integration, sharing, and community-driven tool development by making open source data and tools accessible to the research community in a simple web-based platform.
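To make the idea of a "descriptive JSON file" more concrete, the sketch below builds a small tool descriptor and checks it for required fields. It is only an illustration: every field name (`name`, `version`, `container`, `inputs`, `parameters`), the container address, and the `validate_descriptor` helper are assumptions made for this example, not the actual iMicrobe or Agave schema.

```python
import json

# Hypothetical descriptor: all field names and values are illustrative
# assumptions, not the real iMicrobe or Agave app schema.
descriptor = {
    "name": "example-tool",
    "version": "0.1.0",
    "container": "docker://example/example-tool:0.1.0",
    "inputs": [
        {"id": "QUERY", "label": "Input sequence files", "required": True}
    ],
    "parameters": [
        {"id": "THREADS", "type": "number", "default": 4}
    ],
}

# Keys this sketch treats as mandatory (an assumption for the example).
REQUIRED_KEYS = ("name", "version", "container", "inputs")


def validate_descriptor(doc):
    """Return a list of problems; an empty list means no problems were found."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in doc]
    for item in doc.get("inputs", []):
        if "id" not in item:
            problems.append("input without an id")
    return problems


# Round-trip through JSON text, as a frontend reading the file would.
descriptor_text = json.dumps(descriptor, indent=2)
problems = validate_descriptor(json.loads(descriptor_text))
print(problems or "descriptor OK")
```

A frontend could plausibly use a file like this to render an input form and to know which container to launch, but the real format should be taken from the platform's own App documentation.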
To save on the development time and effort associated with downloading datasets from siloed platforms, iMicrobe hosts virtual data collections by collating web-accessible remote microbiome datasets (Figure 1). Users explore these remote data collections using an advanced metadata search, select data of interest, and add them to their “shopping cart”. Users can also upload and organize their own project data in their private datastore, which they can share with other CyVerse users. Data that have been added to the shopping cart or uploaded to the user’s datastore can be used as input to integrated bioinformatics tools (Apps) that run on free XSEDE high-performance compute resources. Apps in iMicrobe are open source and available on GitHub. Community developers can contribute tools to iMicrobe by making containers available through BioContainers or Docker Hub. Once an App finishes running, the results and a log containing the provenance of the run are written back to the user’s datastore. These results can be shared with other CyVerse users in the same way as project datasets.
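The search, cart, and App-input flow described above can be modeled in a few lines. This is a minimal in-memory sketch under stated assumptions: the `Sample` class, the `search` function, the metadata keys (`biome`, `depth_m`), and the `example.org` URLs are all hypothetical and do not reflect iMicrobe's actual data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    """A remote dataset reference plus its searchable metadata (hypothetical)."""
    sample_id: str
    url: str
    metadata: dict = field(default_factory=dict)


def search(samples, **criteria):
    """Keep samples whose metadata satisfies every criterion.

    A (low, high) tuple is treated as an inclusive numeric range;
    any other value must match exactly.
    """
    def matches(sample):
        for key, wanted in criteria.items():
            value = sample.metadata.get(key)
            if isinstance(wanted, tuple):
                low, high = wanted
                if value is None or not (low <= value <= high):
                    return False
            elif value != wanted:
                return False
        return True

    return [s for s in samples if matches(s)]


# Made-up catalog of remote datasets; only references are stored, not the data.
catalog = [
    Sample("S1", "https://example.org/data/S1.fasta", {"biome": "marine", "depth_m": 5}),
    Sample("S2", "https://example.org/data/S2.fasta", {"biome": "marine", "depth_m": 200}),
    Sample("S3", "https://example.org/data/S3.fasta", {"biome": "soil"}),
]

# Add the search hits to a cart, skipping anything already present.
cart = []
for sample in search(catalog, biome="marine", depth_m=(0, 50)):
    if sample not in cart:
        cart.append(sample)

# The cart contents become the input list handed to an App.
app_inputs = [s.url for s in cart]
print(app_inputs)
```

The point of the sketch is that the cart holds references to remote files rather than copies, which mirrors the "virtual data collection" idea: data stay where they are hosted until an App needs them.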