Implement cross join (cartesian product) #3437

Open
kevinanewman opened this issue Mar 6, 2023 · 2 comments
Labels
new feature Feature requests for new functionality

Comments

@kevinanewman

  • Describe what feature you would like to see implemented.
    Implement cross join (cartesian product) - every row of the left Frame combined with every row of the right Frame.

For example, in pandas:
cross_df = pd.merge(left_df, right_df, how='cross')

Conceptually, the following works in datatable, but it is very slow because it builds one single-row Frame per (left row, right row) pair:
cross_dt = dt.rbind([dt.cbind(left_dt[i, :], right_dt[j, :]) for i in range(left_dt.nrows) for j in range(right_dt.nrows)])

  • If possible, give an example of how it may look in the code and what result
    will be produced.
  left_dt = dt.Frame({'left_col': [1, 2, 3]})
  right_dt = dt.Frame({'right_col': [4, 5, 6]})
  
  cross_dt = left_dt[:, :, join(right_dt, how='cross')]  # something like this, I'm guessing
  
  print(cross_dt)
     | left_col  right_col
     |    int32      int32
  -- + --------  ---------
   0 |        1          4
   1 |        1          5
   2 |        1          6
   3 |        2          4
   4 |        2          5
   5 |        2          6
   6 |        3          4
   7 |        3          5
   8 |        3          6
  [9 rows x 2 columns]
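
Until something like `how='cross'` exists, one possible workaround is to build the two row-index vectors once and let datatable do the row selection in bulk. This is only a sketch: the helper name `cross_join_indices` is made up for illustration, and it assumes that a list of integers (with repeats) is accepted as a row selector and that `dt.cbind` accepts Frames of equal length. The datatable part is skipped if the package is not installed.

```python
def cross_join_indices(n_left, n_right):
    """Return (left_idx, right_idx) row-index lists for a cross join.

    Hypothetical helper, not part of datatable. Each left row index is
    repeated n_right times; the right row indices are cycled n_left times.
    """
    left_idx = [i for i in range(n_left) for _ in range(n_right)]
    right_idx = list(range(n_right)) * n_left
    return left_idx, right_idx


left_idx, right_idx = cross_join_indices(3, 3)

try:
    import datatable as dt
except ImportError:
    dt = None  # the index logic above still works without datatable

if dt is not None:
    left_dt = dt.Frame({'left_col': [1, 2, 3]})
    right_dt = dt.Frame({'right_col': [4, 5, 6]})
    # Assumption: selecting rows with a list of ints allows repeated indices.
    cross_dt = dt.cbind(left_dt[left_idx, :], right_dt[right_idx, :])
    print(cross_dt)
```

This replaces the `nrows_left * nrows_right` single-row Frames of the list-comprehension approach with two bulk row selections and one `cbind`, though the index lists themselves are still built in Python.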

@oleksiyskononenko oleksiyskononenko added the new feature Feature requests for new functionality label Mar 8, 2023
@samukweku
Collaborator

Any specific use case for this, @kevinanewman?

@kevinanewman
Author

In my case, I use this to create the full-factorial combination of two dataframes. Each (relatively small and manageable) dataframe essentially contains sweeps of independent variables. It's easy to work with the small dataframes and then run a cross join to produce the much larger combined dataframe.
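
To make the full-factorial idea concrete, here is a minimal pure-Python sketch using `itertools.product`. The variable names (`speed`, `load`) and the helper `full_factorial` are invented for illustration and are not taken from any real project; rows are represented as plain dicts rather than Frames.

```python
from itertools import product


def full_factorial(left_rows, right_rows):
    """Combine every left row with every right row (rows are dicts)."""
    return [{**left, **right} for left, right in product(left_rows, right_rows)]


# Two small sweeps of independent variables (hypothetical values).
speeds = [{'speed': s} for s in (10, 20)]
loads = [{'load': w} for w in (0.5, 1.0, 1.5)]

combined = full_factorial(speeds, loads)

# The result has one row per (speed, load) pair.
print(len(combined), "rows from", len(speeds), "x", len(loads))
```

The combined size is the product of the input sizes, which is why the sweeps are convenient to edit while small and only expanded at the end.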

Hope this helps, thank you!
