community
cancel
Showing results for 
Search instead for 
Did you mean: 

Alteryx designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.

Automatic column mapping for tables with different column names

Meteor

Hi guys,

 

I have two tables, one with 150 columns, one with 110 columns. The one with 150 columns contains basically the same information as the one with 110 columns, but additional columns as well. The column names don't match - sometimes there are similarities, sometimes not.

 

Is there any way to automatically map the best-suited column in the target table for each column in the source table, e.g. by comparing the distribution function that most closely resembles that of my source data?

 

Or any other thoughts on how to automatically generate a mapping between the tables?

 

 

Thanks! 

Comet

You mentioned distribution function - does that mean that the columns with "basically the same information" do not have exactly the same data? How then do you determine if a column is the "same" or not?

Meteor

For example, if all the entries in Table A in Column A have similar entries then the entries in Table B Column Z (e.g. 1 in A, A will always correspond to a 3 in B, Z)

Alteryx
Alteryx

One way you can explore for the column with similarity on the name is to extract the column name with the field info, use a fuzzy match on source table and target table, rename the fields according to the fuzzy match score of one of your table and use a join as the name would be the same. But it doesn't help if names have no similarity. Perhaps the fuzzy match could be used on the value of the column too. 

Comet

If there are potentially no similarities in the column names, you could try comparing the contents of the columns as a concatenated string. This might not work if you have a lot of rows of data, though, as you'd run into the max length. If you do run into a length problem, then you might have to bring the data into R/Python and hash it, and compare the hashes of each column instead.

 

Labels