Hi,
I need help cleaning the text of the attached PDF/ excel exported from PDF. It is a list of New York counties with its cities. How can I clean this up so that I can have two fields (columns)- one for the County and one for the City within that county? See output example below.
Thank you!
County | City |
Albany | Albany |
Albany | Cohoes |
Albany | Watervliet |
Albany | Berne |
Albany | Bethlehem |
etc.
Solved! Go to Solution.
Give this a try!
Edit: Explanation -
Hi @karliu14 ,
My solution is almost the same as that of @Carolyn except for;
I hope this helps.
Workflow
Multi-Row Formula
[County] =
IF RegEx_Match([Value], "^\d.*") OR [Value] = "All Buroughs" THEN [Row-1:County]
ELSEIF StartsWith([Value], "New York City") THEN "New York City"
ELSE RegEx_Replace([Value], "(.*?)County.*", "$1")
ENDIF
Filter
REGEX_Match([Value], "^\d.*") OR [Value] = "All Buroughs"
Formula
[Value] = REGEX_Replace([Value], "\d+\s+", "")
Amazing! I have a lot to learn... Thank you, Yoshiro and Carolyn!
My solution is almost the same as that of @Carolyn except for;
- Sort Tool is not used as I think the column break should come before the page break.
For example, City "Otto" should belong to "Cattaraugus County", instead of "Ontario County".
Shoot! I thought that I did that right but didn't have a chance to triple check. Good catch :)
User | Count |
---|---|
106 | |
82 | |
72 | |
54 | |
40 |