Alteryx Designer Desktop Discussions

Pranab_C · ‎12-21-2023

Hi Champions,

I need help extracting text and numbers into separate columns from the data below.

What I have is this:

" Arizona 0 0 (US-AZ) Colorado 0 0 (US-CO)"

"Qatar (QA) 0 0 United Arab Emirates 147 0 (AE)"

" Missouri 70 0 (US-MO)"

" Utah (US- 0 0 UT) Total 217 0"

What I need is each name in one column and its number in another, e.g. Arizona in one column and 0 in another:

Arizona-0, Colorado-0

Qatar-0, United Arab Emirates-147

Missouri-70

Utah-0, Total-217

flying008 · ‎12-21-2023

Hi, @Pranab_C

Your first line has no leading space, but your sample data does, so you only need to change the parse expression from

^["\s]+([[:alpha:]]+)\s+(\d+)\s[\d\s]+(?:([[:alpha:]\s]+)\s+(\d+))?

to

^["\s]?([[:alpha:]]+)\s+(\d+)\s[\d\s]+(?:([[:alpha:]\s]+)\s+(\d+))?

and then it is all done.

Tip: only the first + changes to ? .

^["\s]*?([[:alpha:]]+)\s+(\d+)\s[\d\s]+(?:([[:alpha:]\s]+)\s+(\d+))?
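For anyone who wants to try the expression above outside Alteryx, here is a minimal Python sketch. It assumes ASCII letters are enough, so [[:alpha:]] is written as [A-Za-z] (Python's re module has no POSIX character classes). The function name parse_line and the test strings are only for illustration.

```python
import re

# [[:alpha:]] from the Alteryx expression is written as [A-Za-z] here,
# because Python's re module does not support POSIX character classes.
PATTERN = re.compile(
    r'^["\s]*?([A-Za-z]+)\s+(\d+)\s[\d\s]+(?:([A-Za-z\s]+)\s+(\d+))?'
)

def parse_line(line):
    """Return the four capture groups, or None if the line does not match."""
    m = PATTERN.match(line)
    return m.groups() if m else None

# The leading-space and no-leading-space variants both match.
print(parse_line(' Missouri 70 0 (US-MO)'))
print(parse_line('Missouri 70 0 (US-MO)'))
# The optional second pair is captured when it follows the numbers directly.
print(parse_line('Arizona 0 0 Colorado 0 0'))
```

Note that the second pair is only captured when nothing sits between the first pair's numbers and the second name; a code such as (US-AZ) in that position leaves groups 3 and 4 empty.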

(Attached screen recording: 录制_2023_12_22_15_30_20_854.gif)

******

If this helps you get what you want, please mark it as a solution and give it a like so that more people can find it.
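As a different approach from a single regex, the four sample strings in the question could also be handled by first dropping the location-code fragments and then pulling out name/number pairs. The Python sketch below is only an illustration under some assumptions: every code fragment contains a parenthesis (as in "(US-AZ)", "(US-" and "UT)"), each name is followed by exactly two whole numbers, and only the first number is wanted. The names PAIR and extract_pairs are made up for this example.

```python
import re

# A name (letters and spaces), then the number to keep, then one more number.
PAIR = re.compile(r"([A-Za-z][A-Za-z ]*?)\s+(\d+)\s+\d+")

def extract_pairs(line):
    """Return a list of (name, first_number) tuples found in one line."""
    # Drop surrounding quotes and spaces, then remove every token that
    # contains a parenthesis, e.g. "(US-AZ)", "(US-" or "UT)".
    tokens = [
        t for t in line.strip('" ').split()
        if "(" not in t and ")" not in t
    ]
    cleaned = " ".join(tokens)
    return [(name, int(num)) for name, num in PAIR.findall(cleaned)]

samples = [
    '" Arizona 0 0 (US-AZ) Colorado 0 0 (US-CO)"',
    '"Qatar (QA) 0 0 United Arab Emirates 147 0 (AE)"',
    '" Missouri 70 0 (US-MO)"',
    '" Utah (US- 0 0 UT) Total 217 0"',
]
for s in samples:
    print(extract_pairs(s))
```

Filtering tokens before matching keeps the pattern short and copes with a code that is split around the numbers, as in the Utah line. Decimal values or a different number of numeric columns would need the pattern adjusted.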

Pranab_C · ‎12-22-2023

Each location should be in its own row. I understand the data is not standard, but this is how it looks in most cases. Please see the attached PDF file, which shows how the data appears in 99.9% of cases. Your help in getting this extracted would be greatly appreciated.

flying008 · ‎12-22-2023

Hi, @Pranab_C

Maybe you can find a readpdf or readword macro in the Gallery.

(Attached screen recording: 录制_2023_12_23_09_30_39_999.gif)

Table_No	RowID	Location	Workdays (W)	COVID-19 Workdays (CW)	Non-Workdays (O)	COVID-19 Non-Workdays (CO)	Not Specified	Total
1	1	Oman (OM)	0	0	1	0	0	1
1	2	Qatar (QA)	0	0	2	0	0	2
1	3	United Arab Emirates (AE)	147	0	66.5	0	0	213.5
1	4	United States Arizona (US-AZ)	0	0	8	0	0	8
1	5	United States Colorado (US-CO)	0	0	8	0	0	8
1	6	United States Missouri (US-MO)	70	0	47	0	0	117
1	7	United States Utah (USUT)	0	0	15.5	0	0	15.5
1	8	Total	217	0	148	0	0	365

Pranab_C · ‎12-22-2023

Thank you, but the issue is that this workflow will be run in the Gallery, and a directory would not work in that environment. We are currently using R code to extract the PDF, and it works fine for the entire PDF except this page. Any suggestions or help would be much appreciated.
