OmniSci Data Science Foundation
OmniSci provides an integrated data science foundation built on several open source components of the PyData stack. These tools are integrated with OmniSci Immerse, so users can switch from dashboards to a notebook environment that is also connected to OmniSciDB in the background. Data scientists and analysts can move from visual data exploration in Immerse to a deeper dive on a specific dataset, build predictive models using standard Python-based data science libraries and tools, and push results back into OmniSciDB for use in Immerse.
There are several components of the OmniSci Data Science Foundation. Let's look at each in turn.
OmniSci provides deep integration with JupyterLab, the next-generation version of the most popular notebook environment and workflow used by data scientists for interactive computing. Data scientists can open JupyterLab by clicking an icon within Immerse.
In addition to the seamless integration with Immerse, you can also use JupyterLab with OmniSci by creating an explicit connection object, either via the pymapd API or via the Ibis API, which builds on pymapd.
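As a rough sketch of the two connection styles mentioned above, the following assumes a local OmniSciDB instance reachable with the default `admin` account and `omnisci` database; the host and credentials are placeholders, not values taken from this page. The Ibis call assumes an Ibis release that ships the `omniscidb` backend (the backend name has changed between releases), so treat it as illustrative.

```python
# Placeholder connection settings (assumed defaults for a local OmniSciDB
# instance); replace them with the values for your own deployment.
CONNECTION_PARAMS = {
    "user": "admin",
    "password": "HyperInteractive",
    "host": "localhost",
    "dbname": "omnisci",
}


def connect_pymapd(params=CONNECTION_PARAMS):
    """Open a DB-API style connection to OmniSciDB with pymapd."""
    import pymapd  # imported lazily so this module loads without pymapd

    return pymapd.connect(
        user=params["user"],
        password=params["password"],
        host=params["host"],
        dbname=params["dbname"],
    )


def connect_ibis(params=CONNECTION_PARAMS):
    """Open an Ibis client for OmniSciDB.

    Assumes an Ibis release that exposes the `omniscidb` backend.
    """
    import ibis  # imported lazily for the same reason as above

    return ibis.omniscidb.connect(
        host=params["host"],
        user=params["user"],
        password=params["password"],
        database=params["dbname"],
    )
```

From a notebook cell, `con = connect_pymapd()` or `con = connect_ibis()` then gives a connection object to work with.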
You can learn more about JupyterLab in the JupyterLab documentation.
Ibis is a productivity API for data scientists working in Python who need to analyze data in remote SQL-based data stores such as OmniSciDB. Inspired by the popular pandas toolkit for data analysis (and also created by Wes McKinney), Ibis provides a Pythonic API that compiles to SQL under the hood. Combined with OmniSciDB's scale and speed, this gives data scientists a familiar way to analyze very large datasets in place, without first pulling the data into local memory.
A key feature of Ibis is its support for multiple SQL database backends, as well as pandas as a native backend. Combined with Altair, this integration allows users to explore multiple datasets across different data sources.
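To make the "compiles to SQL" idea concrete, here is a minimal sketch of an Ibis expression. The table name `flights` and the columns `carrier` and `arr_delay` are hypothetical, and `con` is assumed to be an Ibis client that is already connected to a backend.

```python
def mean_delay_by_carrier(con, table_name="flights"):
    """Build a lazy Ibis expression; no query runs until execute() is called.

    `con` is an already-connected Ibis client. The table and column names
    are hypothetical and only serve to illustrate the pattern.
    """
    t = con.table(table_name)  # reference to the remote table, no data is moved
    expr = t.group_by("carrier").aggregate(
        [t.arr_delay.mean().name("avg_delay")]
    )
    return expr
```

With a real client, `expr.compile()` shows the SQL that Ibis generates for the backend, and `expr.execute()` runs the query and returns the result as a pandas DataFrame.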
Another key component of the OmniSci Data Science Foundation is Altair. Building on the same Vega data visualization engine used by Immerse for geospatial charts, Altair provides a Pythonic API over Vega-Lite, a simpler subset of the full Vega specification for declarative charting based on the 'Grammar of Graphics' paradigm. The OmniSci Data Science Foundation goes further and includes interface code that lets Altair transparently use Ibis expressions in place of pandas DataFrames. This makes it possible to build visualizations over much larger datasets in OmniSci without writing any SQL.
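For orientation, a declarative Altair chart looks roughly like the sketch below. It uses the standard Altair API with a pandas DataFrame as input; the column names `carrier` and `avg_delay` are hypothetical. In the Data Science Foundation environment, the interface code described above is what would allow an Ibis expression to be passed in place of the DataFrame; how that is enabled depends on the installed version, so it is not shown here.

```python
def delay_bar_chart(data):
    """Return a bar chart of average delay per carrier.

    `data` is assumed to be a pandas DataFrame with the (hypothetical)
    columns `carrier` and `avg_delay`.
    """
    import altair as alt  # imported lazily so this module loads without Altair

    return (
        alt.Chart(data)
        .mark_bar()
        .encode(
            x="carrier:N",    # nominal (categorical) field on the x axis
            y="avg_delay:Q",  # quantitative field on the y axis
        )
    )
```

Calling `delay_bar_chart(df).to_dict()` returns the Vega-Lite specification that Altair generates, which is a useful way to see what the declarative API produces.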
A major component of the OmniSci Data Science Foundation is the NVIDIA RAPIDS toolkit. This is a collection of foundational libraries for GPU-accelerated data science and machine learning, with popular algorithms for clustering, classification, and linear models, as well as a GPU-based dataframe (cuDF). OmniSci allows configurable output to cuDF from any query (including via Ibis or pymapd), so that users can quickly run machine learning algorithms on top of query results from OmniSci.
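The query-to-cuDF path can be sketched as follows, assuming a pymapd connection `con`, a RAPIDS installation, and a GPU shared with the OmniSciDB server (pymapd's GPU IPC path generally expects client and server on the same machine). The query text, feature column names, and number of clusters are supplied by the caller and are hypothetical here.

```python
def cluster_query_result(con, query, feature_columns, n_clusters=4):
    """Run a query into a GPU dataframe and cluster the rows with cuML.

    `con` is an open pymapd connection. The query and feature columns are
    supplied by the caller; nothing about them is prescribed by OmniSci.
    """
    # Lazy import: cuML needs a RAPIDS installation and a supported GPU.
    from cuml.cluster import KMeans

    # select_ipc_gpu returns the result set as a cuDF DataFrame on the GPU.
    gdf = con.select_ipc_gpu(query)

    # Cast the chosen feature columns to a floating-point type for k-means.
    features = gdf[feature_columns].astype("float32")

    model = KMeans(n_clusters=n_clusters)
    gdf["cluster"] = model.fit_predict(features)
    return gdf
```

To push results back for use in Immerse, the labeled frame could be converted with `gdf.to_pandas()` and written with pymapd's `con.load_table("some_table", df)`, where `some_table` is a placeholder name.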
In addition to the above frameworks and tools, the Data Science Foundation Docker container includes Facebook's Prophet library for forecasting, and Prefect, a lightweight but powerful workflow engine that lets data scientists build and manage workflows in Python.