Need help with Regex to extract text from string
Hi Alteryx Champions,
I need help extracting text as depicted in Regex-1.png; however, the text is getting split as depicted in Regex-2.png.
I need the entire row "Tell us about the states, provinces, and territories you were in during the tax year" in one field and "Done" in a second field, but it is currently being split into three.
- Labels:
- Regex
Can you upload some sample data that's representative of your input? It looks like some "Done"s appear on separate rows, which Regex cannot handle. If "Done" is always the last word, you can use GetWord([text], CountWords([text])-1) in a Formula tool to grab it without needing Regex (GetWord is zero-indexed, hence the -1). A little more information would be helpful.
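To make the last-word idea concrete outside Alteryx, here is a minimal Python sketch. It assumes each record ends with a single status word such as "Done"; the sample string is the one from the question, and the function name split_trailing_word is made up for illustration. A similar pattern could probably be adapted for the RegEx tool, though that is untested here.

```python
import re

# Group 1: everything up to the last run of whitespace.
# Group 2: the final word.
# Assumption: the status ("Done") is always the last word of the record.
# DOTALL lets the pattern cope with a line break before the last word.
PATTERN = re.compile(r"^(.*\S)\s+(\S+)\s*$", re.DOTALL)


def split_trailing_word(text):
    """Return (leading_text, last_word), or (text, None) for a single word."""
    cleaned = text.strip()
    match = PATTERN.match(cleaned)
    if match is None:
        return cleaned, None
    return match.group(1), match.group(2)


sample = ("Tell us about the states, provinces, and territories "
          "you were in during the tax year Done")
question, status = split_trailing_word(sample)
print(question)
print(status)
```

This only works once the whole record is in a single field; if "Done" has already landed on a separate row, the rows need to be recombined first.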
Thank you for replying, please see this
Is that the input or the expected output? Thank you for sharing. Would you mind also including what your input looks like once it is in Alteryx (before any manipulation)? If it's giving you odd spacing or line breaks for the questions, the Data Cleansing tool is your friend!
Sharing it again. The first file is the input and the second one is how it currently comes out in Alteryx. Ideally the output should be:
Column-1 | Column-2
Tell us about the states, provinces, and territories you were in during the tax year | Done
Are you able to upload a sample workflow, or share a screenshot of your Input tool settings? What file type are you using? The difficulty with the images being shared is that there is little information to go on, and there seems to be quite a bit of variability even in the two rows of input that I can make out in "Regex-2.png".
Fundamentally, you need to identify the underlying structures in your data to accomplish anything useful in Alteryx. That structure is what I am trying to ascertain, but I still need more information to provide the appropriate assistance.
At this point I feel confident about what your desired output needs to look like. How to get there still needs further investigation, because I have to determine exactly what the input is; answers to the questions above will help with that.
I am using R code to convert a PDF to .txt in Step 1, the Text To Columns tool to split the data to rows in Step 2, and then Regex to extract the data. I need help with the final step.
Thank you very much! It took some finagling, but I managed to get things into working order. It's a little hard-coded, but it shouldn't be too hard to adjust as necessary. My first recommendation is to change your R code to use pdf_data(FullPath) instead of pdf_text(FullPath). The difference is that pdf_text() returns all the text in one large block, whereas pdf_data() stores the location of each word in a tibble.
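Why word positions help can be shown with a small sketch. It is written in Python rather than R purely for illustration, and it is not the workflow attached to this reply. The word records below are invented sample data that mimic the x, y, and text columns pdf_data() reports; the function name rebuild_lines and the tolerance value are assumptions as well. Words whose y coordinates are close together are treated as one line and then ordered by x.

```python
def rebuild_lines(words, y_tolerance=2):
    """Group word records into lines of text using their page coordinates."""
    lines = []  # each entry: [reference_y, [word_record, ...]]
    for word in sorted(words, key=lambda w: (w["y"], w["x"])):
        # Same line if the word sits within y_tolerance of the line's first word.
        if lines and abs(word["y"] - lines[-1][0]) <= y_tolerance:
            lines[-1][1].append(word)
        else:
            lines.append([word["y"], [word]])
    # Within each line, read the words left to right.
    return [
        " ".join(w["text"] for w in sorted(group, key=lambda w: w["x"]))
        for _, group in lines
    ]


# Invented sample data: coordinates and the second row are made up.
words = [
    {"x": 300, "y": 101, "text": "Done"},
    {"x": 40, "y": 100, "text": "Tell"},
    {"x": 62, "y": 100, "text": "us"},
    {"x": 76, "y": 100, "text": "about"},
    {"x": 40, "y": 130, "text": "Next"},
    {"x": 66, "y": 130, "text": "question"},
    {"x": 300, "y": 130, "text": "Done"},
]
result = rebuild_lines(words)
print(result)
```

With positions available, the status column can also be separated by x position instead of by pattern matching, for example by treating words to the right of some threshold as the second column.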
Thank you so much for the solution. However, I am unable to update the file path to make it dynamic so that users can select a different file with more pages. The current workflow handles just one page, and it worked fine with that single page, but our PDFs generally have about 50-60 pages.
