Hello everyone,
I am working on a data standardization project.
The goal is to create a model that recognizes the different pieces of information contained in a single field.
I have a field that contains several pieces of information, such as the first name, last name, address, city, postal code, and so on.
The idea is to identify each part and move it into its own field.
For example, I would end up with one column for the first name, one for the last name, one for the street number, one for the street name, one for the city, and one for the ZIP code.
The problem is that these pieces of information do not always appear in the same order. I thought about using regular expressions, but it seems difficult to find a pattern that works every time.
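To illustrate the regex problem, here is a minimal Python sketch of a fixed-order pattern. The layout it expects ("First Last, 12 Street Name, 75002 City"), the 5-digit postal code, and the function name `parse_fixed_order` are hypothetical assumptions, not taken from the attached file.

```python
import re

# Hypothetical layout: "First Last, <number> <street name>, <5-digit postal code> <city>".
FIXED_ORDER = re.compile(
    r"^(?P<first_name>[A-Za-zÀ-ÿ-]+)\s+(?P<last_name>[A-Za-zÀ-ÿ-]+),\s*"
    r"(?P<street_number>\d+)\s+(?P<street_name>[^,]+),\s*"
    r"(?P<postal_code>\d{5})\s+(?P<city>.+)$"
)


def parse_fixed_order(field):
    """Return the named parts as a dict, or None if the field is not in the expected order."""
    match = FIXED_ORDER.match(field.strip())
    return match.groupdict() if match else None


print(parse_fixed_order("Jean Dupont, 12 rue de la Paix, 75002 Paris"))
# {'first_name': 'Jean', 'last_name': 'Dupont', 'street_number': '12',
#  'street_name': 'rue de la Paix', 'postal_code': '75002', 'city': 'Paris'}
print(parse_fixed_order("75002 Paris, 12 rue de la Paix, Jean Dupont"))
# None
```

The pattern works as long as the parts arrive in exactly that order, but as soon as the order changes it matches nothing, so every new ordering would need its own pattern.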
I would like to use machine learning instead, for example by training an algorithm to identify each piece of information from data that has already been cleaned. With enough data, perhaps the algorithm would learn to recognize the name, the city, and so on.
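As a rough idea of what "learning from already-cleaned data" could look like, here is a minimal plain-Python sketch of a token-level naive Bayes classifier (a swapped-in illustration, not a recommended production method). Every token of a cleaned record is labeled with the column it came from; the model then guesses a column for each token of a raw field. The features, the class name `TokenNaiveBayes`, and the toy records are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def token_features(token):
    """Hypothetical hand-made features describing a single token."""
    # Word shape: letters become "a", digits become "9" (e.g. "75002" -> "99999").
    shape = re.sub(r"[A-Za-zÀ-ÿ]", "a", token)
    shape = re.sub(r"\d", "9", shape)
    return [
        "lower=" + token.lower(),
        "shape=" + shape,
        "len=" + str(min(len(token), 6)),
        "capitalized=" + str(token[:1].isupper()),
    ]


class TokenNaiveBayes:
    """Multinomial naive Bayes over token features, with add-one smoothing."""

    def __init__(self):
        self.label_counts = Counter()               # tokens seen per column
        self.feature_counts = defaultdict(Counter)  # feature counts per column
        self.vocabulary = set()                     # all features seen in training

    def fit(self, records):
        """records: list of dicts mapping a column name to an already-cleaned value."""
        for record in records:
            for label, value in record.items():
                for token in value.split():
                    self.label_counts[label] += 1
                    for feature in token_features(token):
                        self.feature_counts[label][feature] += 1
                        self.vocabulary.add(feature)
        return self

    def predict_token(self, token):
        """Return the column with the highest log-probability for this token."""
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior of the column
            denominator = sum(self.feature_counts[label].values()) + len(self.vocabulary)
            for feature in token_features(token):
                score += math.log((self.feature_counts[label][feature] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def label_field(self, field):
        """Split a raw field on commas and whitespace, then label every token."""
        tokens = field.replace(",", " ").split()
        return [(token, self.predict_token(token)) for token in tokens]


# Hypothetical toy training data standing in for already-cleaned records.
records = [
    {"first_name": "Jean", "last_name": "Dupont", "street_number": "12",
     "street_name": "rue de la Paix", "postal_code": "75002", "city": "Paris"},
    {"first_name": "Marie", "last_name": "Martin", "street_number": "5",
     "street_name": "avenue Victor Hugo", "postal_code": "69003", "city": "Lyon"},
    {"first_name": "Paul", "last_name": "Durand", "street_number": "140",
     "street_name": "boulevard Voltaire", "postal_code": "75011", "city": "Paris"},
]
model = TokenNaiveBayes().fit(records)
print(model.label_field("75011 Paris"))
# [('75011', 'postal_code'), ('Paris', 'city')]
```

Because each token is classified on its own, this sketch ignores word order, so it cannot reliably tell a first name from a last name or keep a multi-word street name together. Sequence-labeling models such as conditional random fields (CRFs), which also use the neighbouring tokens, are a common next step for that.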
Unfortunately, I don't know how machine learning algorithms work for this kind of problem, but it's something I'd like to learn.
So if you can help me move forward on this project, I would be very grateful.
I am attaching an example file to show you the expected result.
Thank you