I have been trying for hours to figure this out. From following a tutorial on building a scraper to looking for prebuilt ones, I can’t seem to make it click.

For context, I’m trying to scrape books that I can’t find elsewhere so I can use them myself and post them for others.

The scraper tutorial

Hackernoon tutorial by Ethan Jarrell

I initially tried to follow this, but I kept getting a “couldn’t find module” error. Since I had never touched Python before this, I don’t know how to fix it, and the help links are not exactly helpful. If someone could guide me through this tutorial, that would be great.
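A “couldn’t find module” error (ModuleNotFoundError) usually means a package the tutorial imports is not installed in the Python you are running. A minimal sketch that lists which modules are missing; the names in “needed” are placeholders (assumed, not taken from the tutorial), so swap in whatever the tutorial’s import lines use:

```python
import importlib.util

def missing_modules(names):
    """Return the module names from `names` that this Python cannot import."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]

# Placeholder list (hypothetical): replace with the modules the tutorial imports.
# The pip package name can differ from the import name, e.g. the module "bs4"
# is installed with "py -m pip install beautifulsoup4".
needed = ["requests", "bs4"]

for name in missing_modules(needed):
    print(f"Not installed: {name}")
```

Anything it prints still needs to be installed with pip before the tutorial’s script will run.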

Selenium

Selenium Homepage

I don’t really get what this is, but I think it’s some sort of Python package. It tells me to install it using the pip command, but that doesn’t seem to work (I get a syntax error). I don’t know how to add it manually because, again, I have little idea of what I’m doing.
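The syntax error is what Python reports when “pip install …” is typed at the >>> prompt: pip is a separate command-line program, not Python code. As a sketch, a script can run pip for the exact interpreter executing it, which is roughly what “py -m pip install selenium” does from Command Prompt on Windows. The helper names “pip_install_command” and “pip_install” are hypothetical:

```python
import subprocess
import sys

def pip_install_command(package):
    """Build the pip command for the Python interpreter running this script."""
    # sys.executable is the full path to the current Python binary, so the
    # package is installed into the same installation the script uses.
    return [sys.executable, "-m", "pip", "install", package]

def pip_install(package):
    """Run pip as a separate process; raises CalledProcessError if it fails."""
    return subprocess.run(pip_install_command(package), check=True)
```

Calling pip_install("selenium") from a script would then install Selenium; typing the command in Command Prompt does the same without any Python code.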

Scrapy

Scrapy Homepage

This one seemed like it’d be an out-of-the-box deal, but not only does it need the pip command to install, it also has about five other dependencies it needs to function, which complicates it more for me.

I’m not criticizing these wares; I’m just asking for help. If someone could help simplify it all, or even point me to an easier method, that would be amazing!
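For static pages (where the text is already in the HTML the server sends back), Python’s standard library alone can do basic scraping with no pip installs. A minimal sketch, assuming the book text sits inside <p> elements (a guess; real sites vary); “ParagraphExtractor”, “extract_paragraphs” and “fetch” are hypothetical names:

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class ParagraphExtractor(HTMLParser):
    """Collect the text inside every <p> element of an HTML document."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._buffer = []

    def _flush(self):
        # Save the current paragraph's text (if any) and reset the state.
        text = "".join(self._buffer).strip()
        if text:
            self.paragraphs.append(text)
        self._buffer = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            if self._in_p:  # HTML allows omitting </p>, so close the old one
                self._flush()
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self._flush()

    def handle_data(self, data):
        # Text inside nested tags such as <b> or <i> is kept too.
        if self._in_p:
            self._buffer.append(data)

    def close(self):
        super().close()
        if self._in_p:  # document ended inside an unclosed <p>
            self._flush()


def extract_paragraphs(html):
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


def fetch(url):
    """Download a page's raw HTML. No JavaScript runs, so static pages only."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


sample = "<h1>Chapter 1</h1><p>It was a dark night.</p><p>The <b>end</b>.</p>"
print(extract_paragraphs(sample))  # ['It was a dark night.', 'The end.']
```

On a real page this would be used as extract_paragraphs(fetch(url)), with url being the chapter page’s address.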


Updates

  • Figured out that I’m supposed to run pip commands in Command Prompt on my computer, not in the Python interpreter: py -m followed by the pip command (e.g. py -m pip install selenium)

  • Got the Ethan Jarrell tutorial to work and managed to add Selenium, which made me realize that Selenium isn’t really helpful for this project. rip xP

  • Spent a bunch of time trying to get the basic scraper working with dynamic sites, without success

  • Online self-help resources don’t go into as much depth as I would like, probably due to the legal grey area
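Dynamic sites trip up a basic scraper because it only sees the HTML the server returns; the visible text is filled in afterwards by JavaScript. One common workaround (a different technique from driving a real browser with Selenium) is to open the browser’s developer tools, watch the Network tab while the page loads, and request the JSON the page itself fetches. A sketch, assuming a hypothetical JSON layout with “chapters”, “title” and “content” keys; real sites will differ:

```python
import json
from urllib.request import urlopen

def fetch_json(url):
    """Download and decode a JSON response, e.g. one found in DevTools > Network."""
    with urlopen(url) as response:
        return json.load(response)

def chapter_texts(payload):
    """Return (title, content) pairs from a hypothetical layout:
    {"chapters": [{"title": "...", "content": "..."}, ...]}
    Inspect the real response first; actual sites use their own keys.
    """
    return [(ch["title"], ch["content"]) for ch in payload.get("chapters", [])]

sample = json.loads(
    '{"chapters": [{"title": "One", "content": "Hello."},'
    ' {"title": "Two", "content": "Bye."}]}'
)
print(chapter_texts(sample))  # [('One', 'Hello.'), ('Two', 'Bye.')]
```

The appeal of this approach is that no browser is needed, but it only works when the site actually loads its text through such a request.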


  • undefined@lemmy.hogru.ch · 7 days ago

    Selenium is a “driver” that controls browsers; you would need some type of software to actually drive it. If you have programming experience, it’s pretty easy to get going.

    Personally, I use it in Ruby on Rails development for unit testing but I also use it to log in to websites and perform some actions on behalf of a user (where the websites don’t offer an API).

    I don’t have experience with the others, but thought my comment may or may not be useful.

    • Noah@lemmy.dbzer0.com (OP) · 7 days ago

      I don’t have programming experience. What sorts of software can “drive” the driver?

      • fuckwit_mcbumcrumble@lemmy.dbzer0.com · 7 days ago

        You’re going to want to do a lot more reading ahead of time, then. It’s not hard, but you really need to know some JavaScript basics before you start.

      • undefined@lemmy.hogru.ch · 7 days ago

        I probably can’t be of much help yet unless for some reason you want to take up programming. I’m just not familiar with web scraping outside programming.

        • Noah@lemmy.dbzer0.com (OP) · 6 days ago

          I wouldn’t mind taking it up if I could just focus on what I’m interested in working on. Python seems simple enough after spending 9 hours trying to get this to work, lol. I don’t want to “reinvent the wheel” so much as be able to understand and work with tools that already exist.