Hello,
I would like to download all 2,000 rows and 7 columns from the following website:
https://www.forbes.com/lists/global2000/?sh=1a3b071b5ac0
Use the Download tool to scrape the HTML, then comb through the DownloadData to find what you're interested in. Once you find it, it's a parsing exercise.
To "comb through" the DownloadData, I wrote the data out to CSV and opened it in Notepad++, then searched for something unique in the G2000 table (like Microsoft's "2,054" market value).
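For anyone who would rather do the "comb through" step in a script than in Notepad++, here is a minimal Python sketch. It is not part of the workflow above; the function name `show_context` and the `width` parameter are made up for illustration, and it assumes the downloaded HTML is already available as one string.

```python
def show_context(text, needle, width=60):
    """Return a snippet of `text` around each occurrence of `needle`."""
    snippets = []
    start = text.find(needle)
    while start != -1:
        # Clamp the window so it never runs past either end of the text.
        lo = max(0, start - width)
        hi = min(len(text), start + len(needle) + width)
        snippets.append(text[lo:hi])
        # Continue searching after the current hit.
        start = text.find(needle, start + len(needle))
    return snippets


# Hypothetical stand-in for the DownloadData field.
page = 'xx<div class="marketValue table-cell market value ">$2,054.37 B</div>yy'
for snippet in show_context(page, "2,054"):
    print(snippet)
```

Searching for a value you already know is in the table (like "2,054") shows the markup around it, which is what you need to see before writing any parsing logic.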
Now we can see that the data we need is encased in DIVs of the following structure:
<div class=""marketValue table-cell market value "">$2,054.37 B</div>
Now the only tricky thing left is to write a regex to capture this. I'm sure there's more than one way to do this, but I ended up with:
<div class=(?:.*?)table-cell(.*?)<\/div>
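If you want to sanity-check the pattern outside the workflow, here is a rough Python sketch. The regex is copied unchanged from above; the helper name `extract_cells` and the cleanup step are my own assumptions. Note that the capture group picks up the rest of the class attribute and the closing `>` along with the value, so the sketch trims everything up to the first `>`.

```python
import re

# The pattern from the post, unchanged.
PATTERN = re.compile(r'<div class=(?:.*?)table-cell(.*?)<\/div>')

# The sample DIV as it appears in the CSV output (quotes doubled by CSV escaping).
sample = '<div class=""marketValue table-cell market value "">$2,054.37 B</div>'


def extract_cells(html):
    """Return the text content of every table-cell DIV found in `html`."""
    cells = []
    for raw in PATTERN.findall(html):
        # raw looks like: ' market value "">$2,054.37 B'
        # Keep only what follows the first '>' (the end of the opening tag).
        cells.append(raw.split('>', 1)[-1].strip())
    return cells


print(extract_cells(sample))
```

Because both quantifiers are lazy, several DIVs on the same line are matched one at a time rather than swallowed as a single match. The `.` does not cross line breaks by default, so this assumes each DIV sits on a single line.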
Some resources that helped me build my regex:
regex101: build, test, and debug regex
Greedy and lazy quantifiers (javascript.info)
regex - What is a non-capturing group in regular expressions? - Stack Overflow