Need help with unexpected intermittent output from TRIM REGEX formula
I have a TRIM(REGEX_Replace(...)) formula that I found and have used successfully a few times, but I have a problem with the current data that I can't figure out. I have attached the Input, workflow, and Output files (which I formatted for easier viewing). Honestly, I don't have a good understanding of this formula, so I'm hoping someone can help figure out why partial sentences are showing up in some of the concatenated, trimmed fields (columns J & L of the Input file and columns I & L of the Output file). In the Output, the formula has removed duplicate data within each field after multiple rows with the same Standard ID were concatenated. In these examples some apparent duplicates remain, but that is due to differences in spacing or punctuation, which is expected. The part that doesn't make sense is bolded in dark red: that sentence doesn't exist on its own in the original Input; it is only part of a paragraph. The attached files are slimmed down with generic data. Please let me know if you see why this is happening, and why only on some of the Input.
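A minimal Python sketch (not Alteryx) of one way such a partial sentence can appear, assuming the dedupe formula treats comma-separated text as separate items. The two rows are hypothetical, and `dedupe_pieces` is a hypothetical helper that mimics "split on the separator, keep the first copy of each piece". Because the paragraphs themselves contain commas, the shared clause is dropped from the second paragraph and its remainder survives as a stand-alone fragment:

```python
# Hypothetical rows for one Standard ID: two different paragraphs that
# happen to share their first clause.
rows = [
    "Controls were reviewed, and no exceptions were noted.",
    "Controls were reviewed, and one exception was noted.",
]

# Roughly what concatenating these rows with a "," separator produces.
concatenated = ",".join(rows)


def dedupe_pieces(text: str, sep: str) -> str:
    """Split on sep, keep the first copy of each trimmed piece, rejoin."""
    seen = set()
    kept = []
    for piece in text.split(sep):
        piece = piece.strip()
        if piece not in seen:
            seen.add(piece)
            kept.append(piece)
    return sep.join(kept)


result = dedupe_pieces(concatenated, ",")
print(result)
# Controls were reviewed,and no exceptions were noted.,and one exception was noted.
```

The last piece, "and one exception was noted.", is exactly the kind of sentence that exists only as part of a paragraph in the input.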
- Labels:
- Expression
- Regex
I understand that this workflow removes duplicates.
However, it does not seem to work correctly because the formula treats the text after each comma (,) as a separate sentence.
Replacing it with the following formula fixed the problem.
Fixed Formula
TRIM(
REGEX_Replace([_CurrentField_], "(?s)(^|\n)(.*?)(?=\n\1)+", "\1")
)
Please confirm.
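Deduplicating regexes of this kind usually rely on a backreference (`\1`): capture an item, then check whether the same text follows the separator. A minimal Python sketch of that general idea, assuming the concatenated field uses a newline separator. The sample value is hypothetical, and this is a simpler pattern, not a translation of the formula above. Because each item is a whole line, commas inside a paragraph do not split it:

```python
import re

# Hypothetical concatenated field, one paragraph per line.
value = (
    "Controls were reviewed, and no exceptions were noted.\n"
    "Controls were reviewed, and no exceptions were noted.\n"
    "Testing is complete.\n"
    "Testing is complete."
)

# (?m)        ^ and $ match at every line boundary
# ^(.*)       capture one whole line (commas included; . stops at \n)
# (?:\n\1$)+  followed by one or more newline + the exact same line
# Replacing the run with \1 keeps a single copy of each repeated line.
collapsed = re.sub(r"(?m)^(.*)(?:\n\1$)+", r"\1", value)
print(collapsed)
```

This pattern only collapses repeats that sit next to each other; a duplicate separated from its twin by a different paragraph is left alone.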
Hello!
To fix the partial-sentence issue in your data, check for hidden characters in your input, review the regex pattern for accuracy, test with a smaller data set, and log intermediate steps to pinpoint where the problem occurs. If you share the regex pattern and an example of the input and the problematic output, I can offer more specific guidance.
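For the "check for hidden characters" step, a small Python sketch (outside Alteryx; `show_hidden` and `normalize` are hypothetical helper names) that lists invisible characters such as carriage returns or non-breaking spaces, then normalizes whitespace so near-duplicates that differ only in spacing compare equal:

```python
import unicodedata


def show_hidden(text: str) -> list[str]:
    """Describe every non-printable character (tabs, CR/LF, NBSP, zero-width, ...)."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '(no name)')}"
        for ch in text
        if not ch.isprintable()
    ]


def normalize(text: str) -> str:
    """Fold look-alike whitespace so visually identical strings compare equal."""
    text = unicodedata.normalize("NFKC", text)  # NO-BREAK SPACE -> plain space
    text = text.replace("\u200b", "")           # drop ZERO WIDTH SPACE
    return " ".join(text.split())               # collapse runs of whitespace, trim ends


a = "No exceptions\u00a0noted.\r"  # hypothetical value with an NBSP and a trailing CR
b = "No exceptions noted."
print(a == b)                        # False
print(show_hidden(a))                # ['U+00A0 NO-BREAK SPACE', 'U+000D (no name)']
print(normalize(a) == normalize(b))  # True
```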
Thank you @ntakeda, but unfortunately this seems to return only the paragraph from the first line item, even when the others are different. I cannot see anything different between the paragraphs that come through as partial lines and the ones that come through as full paragraphs when the rows differ. I'm trying to break apart the REGEX formula to understand it better, since I am still very new to Alteryx and haven't done much with expressions yet. I appreciate the attempt. FYI, I did enter some different details in the Input to see whether it would bring back the details for both IDs, since they were totally different, but it still only brought back one of them.
Actually, I think it has to do with the separators I used in the Summarize tool when I concatenated my fields. I realized that my other fields that had previously worked with the original TRIM(REGEX_Replace... formula weren't working correctly either, so I changed the separator for those fields (for example, the New_Concat_Design Evaluation Conclusion field in the Output, for line items with multiple Concat_Test IDs). When I changed the separator from ; to , it worked, but you are right: this is part of what's causing the problem with the fields I noted in my original question. I guess it's because I'm trying to eliminate a whole paragraph when it is a duplicate. In any case, I wanted to explain what I've figured out so far.
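A Python sketch of the separator idea, assuming each row's paragraph should be kept or dropped as a whole: pick a separator that never occurs inside the data (the candidate list and the helpers `safe_separator` and `dedupe_whole_items` are hypothetical), concatenate with it, then split on that same separator to drop exact repeats. Commas inside a paragraph then no longer split it:

```python
def safe_separator(values, candidates=("|", "~", "\n")):
    """Return the first candidate that never appears inside any value."""
    for sep in candidates:
        if not any(sep in v for v in values):
            return sep
    raise ValueError("every candidate separator appears inside the data")


def dedupe_whole_items(text, sep):
    """Split on sep and keep the first copy of each whole item, in order."""
    seen = set()
    kept = []
    for item in text.split(sep):
        item = item.strip()
        if item and item not in seen:
            seen.add(item)
            kept.append(item)
    return sep.join(kept)


# Hypothetical rows for one Standard ID; the paragraphs contain commas.
rows = [
    "Controls were reviewed, and no exceptions were noted.",
    "Controls were reviewed, and no exceptions were noted.",
    "Controls were reviewed, and one exception was noted.",
]

sep = safe_separator(rows)     # "|" here, since no row contains it
concatenated = sep.join(rows)  # roughly what a concatenate step produces
print(dedupe_whole_items(concatenated, sep))
```

The design choice is simply that the separator must be a character the paragraphs can never contain; ";" and "," both fail that test for free-text fields.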
