Data wide/long reshape functions #142

Open
mhkeller opened this issue Apr 9, 2020 · 13 comments

Comments

@mhkeller

mhkeller commented Apr 9, 2020

What are your thoughts on adding data reshape functions similar to the melt and wide_to_long functions in pandas or pivoting, gather and spread in the tidyverse?

It's a very common pattern when loading data for charts, such as in the multiline example. I find myself frequently writing these reshape functions in each project and they're often some of the least literate parts of my code. They're especially distracting when trying to teach people chart concepts and they hit a big speed bump right off the bat.

Anyway, it would be a great addition to the JavaScript world. If there are other packages that have already done this that I missed, let me know. I've seen a few "let's rewrite pandas/dplyr in js" packages over the years but none ever gets completed, let alone maintained. Happy to be wrong, though, if someone has broken off these functions somewhere!

@mbostock
Member

mbostock commented Apr 9, 2020

Sounds interesting? I’d love to see a sketch of what these might look like. Perhaps some combination of array.flatMap and d3.group?

@mhkeller
Author

Using tidyr's pivoting examples I started putting together some ideas here: https://github.com/mhkeller/pivoting.

I think the most readable is 2b but that's an older style.

@mbostock
Member

I’ve dropped 1 and 2a into Observable notebooks for easy tinkering:

https://observablehq.com/d/41bc065377cb7e36
https://observablehq.com/d/7021f34babd6fbf6

@mbostock
Member

If I were to write the relig_income example in vanilla JavaScript, I’d probably use array.flatMap like so:

data.columns.slice(1).flatMap(income => data.map(({religion, [income]: count}) => ({religion, income, count})))

Here’s another take on your first pivot function:

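function pivot(data, columns, name, value) {
  // Columns that are not pivoted are carried over to every output row.
  // Assumes data.columns exists, as on arrays returned by d3.csv.
  const keep = data.columns.filter(c => !columns.includes(c));
  return data.flatMap(d => {
    const base = keep.map(k => [k, d[k]]);
    // Emit one long-format row per pivoted column.
    return columns.map(c => {
      return Object.fromEntries([
        ...base,
        [name, c],
        [value, d[c]]
      ]);
    });
  });
}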

I haven’t evaluated the performance of any approach yet.

https://observablehq.com/d/3ea8d446f5ba96fe
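
For tinkering outside a notebook, here is a minimal sketch of how the pivot function above might be called. The sample rows and the hand-attached `columns` property are assumptions for illustration (d3.csvParse attaches `columns` itself), and the function is repeated so the block runs on its own.

```javascript
// Same pivot function as above, repeated so this block is self-contained.
function pivot(data, columns, name, value) {
  const keep = data.columns.filter(c => !columns.includes(c));
  return data.flatMap(d => {
    const base = keep.map(k => [k, d[k]]);
    return columns.map(c => Object.fromEntries([...base, [name, c], [value, d[c]]]));
  });
}

// Made-up sample rows (not the real relig_income data).
const wide = Object.assign(
  [
    {religion: "Agnostic", "<$10k": 27, "$10-20k": 34},
    {religion: "Atheist", "<$10k": 12, "$10-20k": 27}
  ],
  {columns: ["religion", "<$10k", "$10-20k"]} // d3.csvParse would set this
);

// Pivot every column except the first into (income, count) pairs.
const long = pivot(wide, wide.columns.slice(1), "income", "count");
console.log(long);
```

Note that the row order differs from the flatMap one-liner: that version iterates over columns first, while this one iterates over rows first.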

Not directly related to this issue, but I’m also interested in making columnar data easier to use in JavaScript, since that should offer better performance. A column-oriented data structure is typically what I think of as a “data frame”.
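
To make the row-versus-column distinction concrete, here is a rough sketch of converting an array of row objects into a column-oriented structure. The `toColumns` helper and the sample rows are hypothetical and not part of d3.

```javascript
// Hypothetical helper: turn row objects into one array per column.
function toColumns(rows, columns = Object.keys(rows[0] ?? {})) {
  return Object.fromEntries(columns.map(c => [c, rows.map(d => d[c])]));
}

// Made-up sample rows.
const rows = [
  {religion: "Agnostic", income: "<$10k", count: 27},
  {religion: "Atheist", income: "<$10k", count: 12}
];

const frame = toColumns(rows);
// A numeric column could be stored in a typed array for compactness.
const counts = Float64Array.from(frame.count);
console.log(frame, counts);
```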

@mhkeller
Author

Very neat destructuring in the vanilla js example. The question I have that came up working through number two was 'What's the best API to handle transform arguments?' What I did with nested arrays I thought was a bit unwieldy and I hadn't yet gotten to implementing all of the features, such as names_pattern.

An alternative would be to limit the scope of this function and say it doesn't handle column name cleaning or casting (although I could see something like names_pattern being useful). The full workflow for someone doing the multiline example would then be something like:

  1. pivot the raw data
  2. clean with a forEach, map or filter
  3. group or rollup
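
The three steps above could be sketched roughly as follows. The sample data and column names are illustrative assumptions, and the grouping uses a plain Map so the block runs without d3; d3.group or d3.rollup would fit the same slot.

```javascript
// Made-up wide data: one row per date, one column per series (values are CSV strings).
const raw = Object.assign(
  [
    {date: "2020-01-01", apples: "3", bananas: "5"},
    {date: "2020-01-02", apples: "4", bananas: "6"}
  ],
  {columns: ["date", "apples", "bananas"]}
);

// 1. Pivot the raw data from wide to long.
const series = raw.columns.slice(1);
const long = raw.flatMap(d => series.map(fruit => ({date: d.date, fruit, value: d[fruit]})));

// 2. Clean: cast strings to dates and numbers.
const clean = long.map(d => ({...d, date: new Date(d.date), value: +d.value}));

// 3. Group by series (stand-in for d3.group(clean, d => d.fruit)).
const grouped = new Map();
for (const d of clean) {
  if (!grouped.has(d.fruit)) grouped.set(d.fruit, []);
  grouped.get(d.fruit).push(d);
}
console.log(grouped);
```

This passes over the data three times, which is the trade-off raised below.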

For large datasets, maybe going through the data multiple times is a pain, performance-wise? For the casual user, it can be nice just having one data transformation step, for sure.

I think my preference would be that if there's a manageable API, it would be handy to do these transformations within pivot, but not at the cost of getting lost in the arguments.
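
One way to keep the arguments manageable might be a single options object, loosely modeled on the names_pattern idea from tidyr. Everything in this sketch is hypothetical (the `pivotLonger` name and the `namesTo`, `valuesTo`, `namesPattern` and `valuesTransform` options) and is only meant to show the shape such an API could take.

```javascript
// Hypothetical API sketch: optional transforms live in one options object.
function pivotLonger(data, columns, {
  namesTo = "name",          // key for the former column name
  valuesTo = "value",        // key for the former cell value
  namesPattern = null,       // optional RegExp; first capture group becomes the name
  valuesTransform = v => v   // optional cast applied to each value
} = {}) {
  return data.flatMap(d => {
    // Keep every field that is not being pivoted.
    const base = Object.fromEntries(Object.entries(d).filter(([k]) => !columns.includes(k)));
    return columns.map(c => {
      const match = namesPattern ? namesPattern.exec(c) : null;
      return {...base, [namesTo]: match ? match[1] : c, [valuesTo]: valuesTransform(d[c])};
    });
  });
}

// Made-up sample row with week columns stored as strings.
const wide = [{artist: "A", wk1: "87", wk2: "82"}];
const long = pivotLonger(wide, ["wk1", "wk2"], {
  namesTo: "week",
  valuesTo: "rank",
  namesPattern: /^wk(\d+)$/,
  valuesTransform: Number
});
console.log(long);
```

With no options, the call reduces to a plain pivot, so the casual case stays simple.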

@Fil
Member

Fil commented Jun 24, 2020

I made pivot 1 as a generator for your amusement https://observablehq.com/d/ac2a320cf2b0adc4
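
The linked notebook is not reproduced here, but a generator version of a pivot function might look something like this minimal sketch. The name `pivotRows`, its signature, and the sample data are assumptions, not the notebook's actual code. A generator yields long rows lazily, so a consumer can stop early without materializing the whole result.

```javascript
// Hypothetical lazy pivot: yields one long-format row at a time.
function* pivotRows(data, columns, name, value) {
  for (const d of data) {
    // Copy every field that is not being pivoted.
    const base = {};
    for (const k of Object.keys(d)) {
      if (!columns.includes(k)) base[k] = d[k];
    }
    for (const c of columns) {
      yield {...base, [name]: c, [value]: d[c]};
    }
  }
}

// Made-up sample rows.
const wide = [
  {religion: "Agnostic", "<$10k": 27, "$10-20k": 34},
  {religion: "Atheist", "<$10k": 12, "$10-20k": 27}
];

// Spread to collect everything, or iterate with for...of to stream.
const long = [...pivotRows(wide, ["<$10k", "$10-20k"], "income", "count")];
console.log(long);
```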

@Fil
Member

Fil commented Jul 17, 2020


@nachocab

nachocab commented Jan 17, 2021

> If I were to write the relig_income example in vanilla JavaScript, I’d probably use array.flatMap like so:
>
> data.columns.slice(1).flatMap(income => data.map(({religion, [income]: count}) => ({religion, income, count})))

Regarding the inverse operation (long to wide), is there a more elegant alternative to using d3.groups, array.map and array.reduce?

d3.groups(data, d => d.religion)
  .map(([religion, x]) => {
    return {
      religion,
      ...x.reduce((acc, { income, count }) => {
        acc[income] = count;
        return acc;
      }, {}),
    };
  });

https://observablehq.com/d/faa7e77aa71c7031

@mbostock
Member

@nachocab Can you enable link sharing on the notebook so we can see?

@mbostock
Member

Here’s another take of the inverse operation, replacing array.map with Array.from, and replacing array.reduce with Object.fromEntries:

Array.from(
  d3.group(data, d => d.religion),
  ([religion, group]) => Object.fromEntries(
    [["religion", religion]].concat(
      group.map(d => [d.income, d.count])
    )
  )
)
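
For reuse, the same idea could be wrapped in a small helper. This is a sketch only: `pivotWider` and its parameter names are hypothetical, the sample rows are made up, and a plain Map stands in for d3.group so the block runs without d3.

```javascript
// Hypothetical helper: long to wide, one output row per distinct `key` value.
function pivotWider(data, key, name, value) {
  const rows = new Map();
  for (const d of data) {
    // Start a new wide row the first time a key value is seen.
    if (!rows.has(d[key])) rows.set(d[key], {[key]: d[key]});
    rows.get(d[key])[d[name]] = d[value];
  }
  return Array.from(rows.values());
}

const long = [
  {religion: "Agnostic", income: "<$10k", count: 27},
  {religion: "Agnostic", income: "$10-20k", count: 34},
  {religion: "Atheist", income: "<$10k", count: 12},
  {religion: "Atheist", income: "$10-20k", count: 27}
];

const wide = pivotWider(long, "religion", "income", "count");
console.log(wide);
```

Like the snippets above, this keeps only the key column, and if the same key and name pair occurs twice, the later value silently wins.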

@nachocab

@mbostock That's beautiful! Thank you for helping me understand those functions more deeply and for pointing out the link sharing bit. I'll remember it for next time. 👍

@pamtbaau

pamtbaau commented Sep 19, 2021

Just moved to data visualisation and realised I'm a noob with respect to data manipulation... My conversions from SQLite-based normalised long data to wide were uhhh... less than optimal (to put it mildly) :-(

Built-in long/wide reshape functions would be very welcome.

Btw. thanks for this incredible library!


5 participants