Data transformations
Philosophy
A central challenge of data visualization is the need to transform your data to a target structure. DuckPlot aims to solve this problem by allowing users to target the same visualizaiton with different underlying data structures. Let's take a long and wide view of the same sales data (souce):
Long view
Loading data...
Wide view
Loading data...
Because Observable Plot marks expect Tidy Data, DuckPlot performs a data transformation based on the specified columns of interest.
DuckPlot allows you to work with the wide view by automatically transforming the data to a long structure. In doing so, it will always transform the input data to a long structure with generic column names:
x | y | color |
---|---|---|
2013-05-13 | 64.5014 | AAPL |
2013-05-14 | 64.8357 | AAPL |
2013-05-15 | 62.7371 | AAPL |
2013-05-16 | 60.4629 | AAPL |
Multiple Y Columns
Specified Y columns will be unpivoted to create two columns: y
, and color
, where the y
column holds the value, and the color
holds the name of the unpivoted columns.
// Use wide data to show the AAPL and GOOG stock prices
duckPlot
.table("stocks_wide")
.x("Date")
.y(["AAPL", "GOOG"]) // These become the values in the color column
.mark("line");
Multiple color columns
Color columns will be concatenated into a new column
Given a long data structure where the metric is also stored in the column, you can specify each Symbol-metric pair as the desired color.
Input data
Loading data...
// Use long data to show the high and low prices for each stock
duckPlot
.table("stocks_long")
.x("Date")
.y("Value")
.color(["Symbol", "Metric"])
.mark("line");
Multiple Y columns and color columns
If you specify mulitple y values AND a color column, the y columns will first be UNPIVOTED, and then the resulting color
column will be concatenated with the specified color
columns.
Input data
Loading data...
duckPlot
.table("stocks_long_alt")
.x("Date")
.y(["AAPL", "GOOG"])
.color("Metric")
.mark("line");
Multiple X columns
Multiple X columns are only supported in the case of horizontal bar chart (barX
). As a correlary to mulitple inputting multiple Y axes for other marks, this creates two columns: x
, and color
, where the x
column holds the value, and the color
holds the name of the unpivoted columns.
duckPlot
.table("stocks_wide")
.query(
"select * from stocks_wide where year(Date) = 2017 AND month(Date) = 1"
)
.x(["AMZN", "AAPL"])
.y("Date")
.mark("barX");
Aggregations
For certain mark types (barY
, barX
, areaY
, line
), DuckPlot will automatically aggregate the data based on the data columns (e.g., x
, y
, color
, fx
, fy
...). If there are multiple rows with the same x
, y
, and color
values, DuckPlot will perform a sum
aggregation.
// Compute the total stock price per year for each stock (yes, a little weird!)
duckPlot
.table("stocks")
.query("select *, year(Date)::VARCHAR as year from stocks")
.x("Symbol")
.fx("year")
.y("Close")
.color("Symbol")
.mark("barY");
You can also specify the aggregation type such as the avg
for a more sensible plot:
// Compute the total stock price per year for each stock (yes, a little weird!)
duckPlot
.table("stocks")
.query("select *, year(Date)::VARCHAR as year from stocks")
.fx("year")
.x("Symbol")
.y("Close")
.color("Symbol")
.mark("barY")
.config({ aggregate: "avg" }); // makes more sense!