Scraping HTML Regex Tester


Greetings,

I’m having a bit of trouble with the Scraping HTML Regex Tester. I’m learning Regex on-the-fly mostly using this site: https://regexr.com/.

I’m also trying to learn a bit of web scraping using the scraping tool. Unfortunately, the link to the documentation page does not work.

An example issue I’m having is that a Regex that works on the website above does not work in the Regex Tester tool. I mostly have to type in characters to see what happens, and reuse some of the examples from your video. For instance, for practice I’m pulling all the names of UFC fighters from this page: http://www.ufc.com/fighter/Weight_Class

The fighter name is inside an anchor with class="fighter-name". Unfortunately, the way the page is set up, there is a link before the class name. I finally figured out how to get around this and reach the actual fighter name; however, I can’t figure out how to remove all of the empty space after the fighter name. I used:

\s+([^’]*?)

This gets me to the fighter name, but leaves a bunch of whitespace after the name which I can’t seem to remove. Nothing that works on the regex website works within the tool. Any help on this, or a link to a documentation file that works? Also, any good recommendations for more web scraping tutorials? Thanks!
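To illustrate the trailing-whitespace problem outside the tool, here is a minimal sketch in Python. The HTML snippet and the fighter names are made up for the example (the real page's markup may differ), and the add-in's regex engine may not behave exactly like Python's or regexr's, so treat this as an idea to adapt rather than a confirmed fix. The trick is a lazy capture group followed by `\s*<`, so the whitespace before the closing tag is matched outside the group.

```python
import re

# Hypothetical markup, loosely modelled on the description above:
# the link (href) comes before the class, and the name is padded
# with whitespace on both sides.
html = '''
<a href="/fighter/jane-doe" class="fighter-name">
    Jane Doe
        </a>
<a href="/fighter/john-smith" class="fighter-name">
    John Smith
</a>
'''

# class="fighter-name"  anchors on the class attribute
# [^>]*>                skips any remaining attributes up to the end of the tag
# \s*                   consumes whitespace before the name
# (.*?)                 lazily captures the name itself
# \s*<                  consumes trailing whitespace, outside the group
pattern = re.compile(r'class="fighter-name"[^>]*>\s*(.*?)\s*<', re.DOTALL)

names = pattern.findall(html)
print(names)
```

Because the group is lazy and must be followed by optional whitespace and then `<`, it stops at the last non-space character of the name.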


Hi Jason,

Unfortunately, I’m not a developer, more of an advanced user, and I’m not familiar with Python.

It’s also very important for me to scrape the data into Excel, because I then combine it with some of my own data, do some comparisons, and so on…

I have actually managed to use the add-in on a few websites (6), and there is only one that I can’t figure out.

I might even figure out how to remove the empty space, but the problem is that the tester just breaks and Excel stops responding when I’m using it.

Do you have the same issue?


Same issue here…

I couldn’t find a way to remove the empty space.

Also, the Regex Tester causes Excel to break when I search for a solution…
