Check the Facts: my first CLI scraper

Welp, there it is. My Snopes CLI is up and running! Not gonna lie, that was a tough nut to crack. The cool thing about projects like this is that they take you out of the Learn bubble, give you a little push out into the scary, unending, bottomless sea that is development, and let you stay out there for a second, panicking, before you manage to clamber back onto shore.

I guess what I’m mostly referring to there is Nokogiri and XML, which I definitely still don’t really understand, but I’m way more comfortable with it now. I went into my project confident that I had chosen a website that was perfect for scraping and that I probably wouldn’t hit any major obstacles.

I found out how wrong I was when I realized that certain text elements I wanted to scrape didn’t have any unique selectors, and neither did their parents, but their CHILDREN did. So I tried to figure out how to select an element’s parent using CSS and it turns out… that CSS doesn’t have a parent selector. Bum bum baaaaah!

On one hand, bummer. On the other hand, it was a very interesting find; I love seeing debate in the development community about stuff like this. It reminds me how much this field is constantly evolving. CSS-Tricks had a whole article about this issue, and apparently a bunch of programmers have proposed different potential syntaxes for a parent selector, but it was left out of CSS3 due to the potential for overuse. I wondered if I would have to scrap the project.

BUT! Guess what DOES have a parent selector? XM friggin’ L. It took me a minute, but after learning a bit about XML, nodes, and all that jazz, I was able to get what I wanted. Definitely useful and definitely an area for me to explore more later.

Other than that, things went more or less smoothly. I had a period of time where I kept receiving 404s from the scraper, but that kind of just went away by itself somehow. That’s a little frustrating, not knowing why, but at least I got past it. I have to say though, I’m glad I chose Snopes for this project. It was fun working on something where I would occasionally also see confirmation that Cracker Barrel is not, in fact, changing its name to ‘Caucasian Barrel’ under pressure from liberals.
