This is part of a larger python project, but I'm stuck on this part. How can I take dynamic data that is going into the python tool and turn it into a schema, using the python code below as an example?
import pandas as pd

data = {
    'col1': [1, 2, 3],
    'col2': ['a', 'b', 'c'],
    'col3': [True, False, True],
    'col4': [1.0, 2.5, 3.8],
    'col5': pd.date_range(start='2022-01-01', periods=3),
}
df = pd.DataFrame(data)
from tableauhyperapi import (
    HyperProcess, Telemetry, Connection, CreateMode, TableDefinition, SqlType
)

hyper_name = 'output.hyper'  # path for the new Hyper file

# Set up a connection to a new Hyper file
with HyperProcess(telemetry=Telemetry.DO_NOT_SEND_USAGE_DATA_TO_TABLEAU) as hyper:
    with Connection(hyper.endpoint, hyper_name, CreateMode.CREATE_AND_REPLACE) as connection:
        connection.catalog.create_schema('Extract')
        # Define the table schema
        table_definition = TableDefinition(
            table_name='Testing',
            columns=[
                TableDefinition.Column('col1', SqlType.int()),
                TableDefinition.Column('col2', SqlType.text()),
                TableDefinition.Column('col3', SqlType.bool()),
                TableDefinition.Column('col4', SqlType.double()),
                TableDefinition.Column('col5', SqlType.date()),
            ],
        )
        connection.catalog.create_table(table_definition)
I don't know python at all, but I'm hoping this is something basic and easy for the python gurus to answer ;)
Kind Regards,
Craig
Hi @csh8428
I'm by no means a python expert, but my understanding is that when you connect data to the python tool it is read into a dataframe. For example, you would use this syntax to reference it (#1 is the name of the connection in this instance, and would appear when you connect another tool into the python tool):
df = Alteryx.read("#1")
If your data in Alteryx is set up based on the sample you have in your code, the rest should just work.
I don't really use Tableau - but I'd also recommend sending in a second input (#2) which is the results of Field Info (or mapped field info) from your datastream... you would use that to map this:
columns=[
    TableDefinition.Column('col1', SqlType.int()),
    TableDefinition.Column('col2', SqlType.text()),
    TableDefinition.Column('col3', SqlType.bool()),
    TableDefinition.Column('col4', SqlType.double()),
    TableDefinition.Column('col5', SqlType.date()),
]
such that 'col1' represents your field header and SqlType.xxxx represents your Tableau type. I don't know if this can take a variable - but if not - this could be annoying.
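If it helps, here's a rough sketch of that mapping idea: a lookup from the type names a Field Info stream might give you to the matching tableauhyperapi SqlType factory name. The Alteryx type names below are assumptions on my part - check what your Field Info tool actually emits - and `to_hyper_type_name` is just a hypothetical helper name.

```python
# Hypothetical lookup from Alteryx Field Info type names (the #2 input)
# to tableauhyperapi SqlType factory names. The keys here are assumed
# Alteryx type names -- verify them against your own Field Info output.
ALTERYX_TO_HYPER = {
    'Int16': 'int', 'Int32': 'int', 'Int64': 'big_int',
    'Float': 'double', 'Double': 'double',
    'Bool': 'bool',
    'Date': 'date', 'DateTime': 'timestamp',
    'String': 'text', 'WString': 'text',
    'V_String': 'text', 'V_WString': 'text',
}

def to_hyper_type_name(alteryx_type: str) -> str:
    # Fall back to 'text' for anything unrecognized, since text is the
    # safest catch-all column type.
    return ALTERYX_TO_HYPER.get(alteryx_type, 'text')
```

You could then build each column with something like `TableDefinition.Column(name, getattr(SqlType, to_hyper_type_name(t))())`, which gets around the "can it take a variable" question.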
@Luke_C The problem is that this macro is for our department and the data going in will be dynamic and change for every workflow.
@apathetichell I think that approach might work. Gonna give it a try.
Hey - just to clarify, @Luke_C is just explaining how this part:
data = {
    'col1': [1, 2, 3],
    'col2': ['a', 'b', 'c'],
    'col3': [True, False, True],
    'col4': [1.0, 2.5, 3.8],
    'col5': pd.date_range(start='2022-01-01', periods=3),
}
df = pd.DataFrame(data)
works in Alteryx - the sample data in the python tool and the #1 read show you how to initialize your datastream as a Pandas DF (Alteryx natively intermingles the concept of a DF with a Pandas DF - which is a fiction). This will be dynamic to your Alteryx data and will let you bring your data into Python as needed. Your problem is the second component: mapping the fields in your DF to column types and headers in Tableau. This will be dynamic too - and my hope/assumption is that Tableau is taking in two things: 1) data, 2) schema - and that the schema can be created dynamically. I don't use Tableau much (and Tableau via python minimally), but assuming that understanding is correct, you can use a field info stream connected to your python tool as #2 to populate this. I'd build a helper function to identify the Tableau SQLType for each field.
If you hit an issue - and are using https://pypi.org/project/tableauhyperapi/ - I might try to do the dynamic table generation just for fun. I don't need to do this to export to Tableau - so my goal will just be to create a Tableau Table Definition dynamically in Python.
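Here's a rough sketch of what that dynamic generation could look like using just pandas: a hypothetical helper (`hyper_type_name` is my name, not part of any API) that inspects each column's dtype and returns the name of the matching SqlType factory. With tableauhyperapi installed, each (name, type) pair would become `TableDefinition.Column(name, getattr(SqlType, type_name)())` - I haven't run that last step against Tableau myself, so treat it as an assumption.

```python
import pandas as pd

# Hypothetical helper: map a pandas dtype to the name of the matching
# tableauhyperapi SqlType factory (SqlType.int(), SqlType.text(), etc.).
# The dtype-to-type choices mirror the columns in the original example.
def hyper_type_name(dtype) -> str:
    if pd.api.types.is_bool_dtype(dtype):        # check bool before int
        return 'bool'
    if pd.api.types.is_integer_dtype(dtype):
        return 'int'
    if pd.api.types.is_float_dtype(dtype):
        return 'double'
    if pd.api.types.is_datetime64_any_dtype(dtype):
        return 'date'
    return 'text'                                # safe catch-all

# Same sample data as the original question.
data = {
    'col1': [1, 2, 3],
    'col2': ['a', 'b', 'c'],
    'col3': [True, False, True],
    'col4': [1.0, 2.5, 3.8],
    'col5': pd.date_range(start='2022-01-01', periods=3),
}
df = pd.DataFrame(data)

# Build (column name, type name) pairs from whatever the DF happens
# to hold -- this is the dynamic part, no hard-coded column list.
schema = [(col, hyper_type_name(dt)) for col, dt in df.dtypes.items()]
```

So whatever dataframe Alteryx hands the tool, `schema` tracks it automatically; only the dtype-to-SqlType choices need to be agreed on up front.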