Hello,
I have an input text file that I need to loop through and split into structured columns. So far, I've set up the input file without headers, with the delimiter set to "\0", and filtered out the null separator rows. I'm having trouble writing the regex to identify and extract each dataset and its range, and I'm not sure what the best approach is.
The input text file lays out the data by Company (302, 303, 304, etc.) and Instrument Type (DEP, LON, etc.). Each company's dataset starts at the row with its Company code and ends at the row containing "Company Totals:". I need to capture every company's entire dataset.
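For anyone finding this thread later, here is a rough Python sketch of the block-capture step. It is not the accepted solution, which isn't shown here. It assumes, hypothetically, that each company header row looks like "Company 302"; the real header format wasn't posted, so COMPANY_START would need adjusting. The helper name company_blocks and the sample rows are made up for illustration. It skips blank rows, starts a new block at each header, and closes the block at the row containing "Company Totals:".

```python
import re
from typing import Iterator

# Hypothetical header format ("Company 302"); adjust to the real file layout.
COMPANY_START = re.compile(r"^\s*Company\s+(\d{3})\b")
# End-of-block marker taken from the input description.
COMPANY_END = re.compile(r"Company Totals:")


def company_blocks(lines: list[str]) -> Iterator[tuple[str, list[str]]]:
    """Yield (company_code, rows) for each block from header row to 'Company Totals:'."""
    code = None
    rows: list[str] = []
    for line in lines:
        if not line.strip():
            continue  # skip empty separator rows
        start = COMPANY_START.match(line)
        if start:
            code, rows = start.group(1), []  # a header row opens a new block
        if code is None:
            continue  # ignore text outside any company block
        rows.append(line)
        if COMPANY_END.search(line):
            yield code, rows  # the totals row closes the block
            code, rows = None, []


# Made-up sample rows, for illustration only.
sample = [
    "Company 302",
    "DEP",
    "BAP-78810  100.00",
    "Company Totals:  100.00",
    "",
    "Company 303",
    "LON",
    "Q2111999 A  250.00",
    "Company Totals:  250.00",
]
blocks = list(company_blocks(sample))
print([(c, len(r)) for c, r in blocks])  # [('302', 4), ('303', 4)]
```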
The next step is to split each dataset into columns and add columns for the corresponding Company and Instrument Type codes. The final output should look like the attached Excel file.
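A similar sketch of the column-splitting step, under further assumptions: instrument type codes (only DEP and LON here) sit alone on their own row, and detail columns are separated by two or more spaces. The helper to_records is hypothetical. It tags each detail row with its Company and Instrument Type and skips the header and any row containing "Totals:".

```python
import re

# Hypothetical: an instrument type code appears alone on its own row.
TYPE_ROW = re.compile(r"^\s*(DEP|LON)\s*$")


def to_records(code: str, rows: list[str]) -> list[list[str]]:
    """Turn one company's rows into [Company, InstrumentType, *fields] lists."""
    records = []
    inst_type = None
    for line in rows:
        type_match = TYPE_ROW.match(line)
        if type_match:
            inst_type = type_match.group(1)  # applies to the rows that follow
            continue
        if inst_type is None or "Totals:" in line:
            continue  # header and total rows are not detail data
        # Assumes columns are separated by runs of two or more spaces.
        fields = re.split(r"\s{2,}", line.strip())
        records.append([code, inst_type, *fields])
    return records


# Made-up rows for one company block.
rows = ["Company 302", "DEP", "BAP-78810  100.00", "B8241066 C  250.00", "Company Totals:  350.00"]
print(to_records("302", rows))
# [['302', 'DEP', 'BAP-78810', '100.00'], ['302', 'DEP', 'B8241066 C', '250.00']]
```

Splitting on two or more spaces keeps values with a single internal space, like "B8241066 C", in one column. If pandas is installed, these lists can be loaded with pandas.DataFrame and written out with DataFrame.to_excel.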
Thank you in advance for your help and guidance.
I think you forgot to attach the input text file. Could you please add it so I can iterate on the regex?
Thanks HW1, your solution worked perfectly. However, when I tested it against a larger dataset, I found a few exceptions where the regex for Instrument Number failed to capture the data. Is it possible to modify the regex to handle the following Instrument Numbers:
BAP-78810
BAP-148080
B8241066 C
Q2111999 A
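The regex from the solution isn't shown in this thread, so this is only a standalone Python sketch of one pattern that matches all four formats above. It assumes an Instrument Number is uppercase letters, an optional hyphen, digits, then an optional single space and one-letter suffix. The \b after the suffix letter stops the pattern from grabbing the first letter of a following word.

```python
import re

# Assumed format: letters, optional hyphen, digits, optional " X" suffix.
# The \b keeps "B8241066 CASH" from matching as "B8241066 C".
INSTRUMENT_NO = re.compile(r"[A-Z]+-?\d+(?: [A-Z]\b)?")

samples = ["BAP-78810", "BAP-148080", "B8241066 C", "Q2111999 A"]
for s in samples:
    print(s, "->", bool(INSTRUMENT_NO.fullmatch(s)))  # all four print True

# At the start of a made-up detail row, only the instrument number is captured.
print(INSTRUMENT_NO.match("B8241066 C  250.00").group())  # B8241066 C
```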
Thank you
I've successfully tested the solution. It's a great one to learn from and to build similar solutions for other text files. Thank you
Cheers man.
Also, it would be a great idea to learn regex. It covers a whole lot of use cases on its own.
I convert a huge number of PDF files to text and then to data frames, and I use regex for that a lot.
Agreed, learning regex will enable me to achieve the text to data frame solutions I need.