Interactive Data Exploration with Altair

Explore very large datasets with an open Python visualization API powered by Vega-Lite

Introduction to Altair

Tip: The Altair open source project documentation is the best place to learn more and stay up to date.

OmniSci's Data Science Foundation includes the Altair visualization library. Here is a good overview of Altair from the project website:

Altair is a declarative statistical visualization library for Python, based on Vega and Vega-Lite, and the source is available on GitHub.

With Altair, you can spend more time understanding your data and its meaning. Altair’s API is simple, friendly and consistent and built on top of the powerful Vega-Lite visualization grammar. This elegant simplicity produces beautiful and effective visualizations with a minimal amount of code.

Altair and Ibis

While Altair is typically used with smaller, local datasets, OmniSci Data Science Foundation has integrated it with Ibis (and this integration itself is open-sourced). This unique combination allows truly interactive visualization over extremely large datasets consisting of billions of data points, all with minimal Python code.

In addition, Altair supports composable visualization: charts can be layered and concatenated into larger views. Combined with Ibis underneath, this opens up more possibilities than local data exploration alone. Because Ibis supports multiple storage backends, it is possible, for example, to create charts that cover more than one (remote) data source at a time.

Examples

Here are some examples of what you can do with Altair and Ibis together inside the OmniSci Data Science Foundation.

You will need to use JupyterLab > 2.0 for the following examples.

First, install ibis-vega-transform, which in turn installs Altair and Ibis.

pip install ibis-vega-transform
jupyter labextension install ibis-vega-transform

A simple example

Below is a minimal example of Ibis and Altair together, starting with a simple pandas DataFrame:

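import altair as alt
import ibis
import ibis_vega_transform  # the bridge between Altair and Ibis
import pandas as pd

# A small local DataFrame to chart
source = pd.DataFrame({
    'a': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
    'b': [28, 55, 43, 91, 81, 53, 19, 87, 52]
})

# Register the DataFrame with the Ibis pandas backend and get a table expression
connection = ibis.pandas.connect({'source': source})
table = connection.table('source')

# Pass the Ibis table, not the DataFrame, to Altair
alt.Chart(table).mark_bar().encode(
    x='a',
    y='b'
)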

This should produce a bar chart with one bar for each value of a, with the bar height given by b.

Some things to note:

  • You can use Altair with pandas directly, without Ibis. The purpose of this example is to show that Ibis can use pandas itself as a backend, much as it uses OmniSci as a backend.

Using Ibis and Altair together

The next step is to use Ibis with a more scalable backend than pandas. We'll naturally use OmniSci for this example, but you can try these examples with other Ibis backends supported by the ibis-vega-transform project, which bridges Altair to Ibis. This combination is extremely powerful, as we'll soon see.

First, we'll connect to an OmniSci server. Here we use a public OmniSci server so you can follow along, but you can use any OmniSci server you've installed or have access to.

Adding interactivity

Altair provides many ways to add interactivity to charts. Actions such as selections and brush filters make visualizations more dynamic and let you explore data in a far richer manner than static charts allow.

Finishing
