I would like the community's help with extracting data from a specific website. I'm attaching the flow as far as I got. I don't know if it's correct, and I also don't know which components I should use next to complete the data download (extraction).
Hi @jcastro
When downloading data, you have a few options: you can call an API, you can download files directly through dynamic URLs, or you can download the HTML and parse the data out of it.
I see you are trying to download it from the CVM website (I'm Brazilian as well). I would recommend downloading from their open data portal instead: http://dados.cvm.gov.br/dataset?tags=fundos+de+investimento
Here you can select different options to download information.
I have attached a workflow that downloads the monthly information from each Hedge Fund so you can have a better understanding. I'm using the download URLs the CVM website provides to download it.
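For anyone who wants to try the same idea outside the attached workflow, here is a minimal Python sketch of the "one download URL per month" approach. It is not what the workflow itself does internally, just an illustration. The URL template and the names `monthly_urls` and `download_all` are hypothetical; copy the real file pattern from the download links shown on the portal.

```python
from datetime import date
from urllib.request import urlretrieve


def monthly_urls(template: str, start: date, end: date) -> list[str]:
    """Return one URL per month from start to end (both inclusive)."""
    urls = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        urls.append(template.format(yyyymm=f"{year}{month:02d}"))
        # Advance one month, rolling over into the next year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return urls


def download_all(urls: list[str], folder: str = ".") -> None:
    """Save each URL into folder, keeping the file name from the URL."""
    for url in urls:
        filename = url.rsplit("/", 1)[-1]
        urlretrieve(url, f"{folder}/{filename}")


# Hypothetical template: replace it with the pattern used by the real links.
TEMPLATE = "https://example.org/files/monthly_report_{yyyymm}.csv"
urls = monthly_urls(TEMPLATE, date(2023, 11, 1), date(2024, 2, 1))
```

Calling `download_all(urls)` would then fetch each file in turn. Nothing is downloaded until you call it, so you can print `urls` first and check them against the portal.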
Pedro.
@pedrodrfaria thank you very much for your help. So with the way I'm doing it, I won't be able to use a specific URL, right? As I understand it, I will always have to find the download area of the site, as in your example, or in some cases use an API. I thought it was possible to do it directly via a URL, as in my example.
Hi @jcastro
You can, but you would then need to parse the HTML in order to get readable data out of the CVM website. I did a project that extracted all of this from CVM, and I chose to go with web scraping from the Portal de Dados instead of the regular CVM website, since it has the same information in a form that is much easier to download.
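To give a rough idea of what "parsing the HTML" involves, here is a small Python sketch that pulls file links out of a page using only the standard library. The sample HTML, the base URL, and the `FileLinkParser` name are made up for illustration; the real page will have a different structure and will need its own inspection.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FileLinkParser(HTMLParser):
    """Collect links whose target ends in one of the given extensions."""

    def __init__(self, base_url, extensions=(".csv", ".zip")):
        super().__init__()
        self.base_url = base_url
        self.extensions = extensions
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(self.extensions):
            # Resolve relative links against the page address.
            self.links.append(urljoin(self.base_url, href))


# Made-up page content standing in for the HTML you would download.
SAMPLE_HTML = """
<ul>
  <li><a href="/files/report_202401.csv">January</a></li>
  <li><a href="about.html">About</a></li>
  <li><a href="https://example.org/files/report_202402.zip">February</a></li>
</ul>
"""

parser = FileLinkParser("https://example.org/portal/")
parser.feed(SAMPLE_HTML)
links = parser.links
```

Here `links` ends up holding only the two data files as absolute URLs, and the "About" link is skipped. Extracting tables or values from the page body takes more work than this, which is why the ready-made download files are usually the easier route.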
Please mark the discussion post as closed with an answer if your question was answered. If you need help parsing the HTML from the specific URL you want to use, I recommend creating a new post, as that will let us help you faster and more effectively.
Pedro.
Thank you so much @pedrodrfaria
Ok