I am considering scraping this place - this place is a goldmine of info

I have an engine that works, and I ran it (twice?) for our Google community. It grabs the threads too, drilling down until it reaches the end (and grabs the original off-site post, if applicable).
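To illustrate the "drilling down" idea, here is a minimal sketch in Python. It is not the actual engine: the `fetch` callback, the `replies` field, and the `FAKE_SITE` data are all hypothetical stand-ins, and a real scraper would fetch and parse pages where this one reads from a dictionary.

```python
from typing import Callable, Dict, List, Optional, Set


def crawl_thread(
    post_id: str,
    fetch: Callable[[str], Dict],
    seen: Optional[Set[str]] = None,
) -> List[Dict]:
    """Depth-first walk of a thread: collect a post, then every reply under it."""
    if seen is None:
        seen = set()
    if post_id in seen:
        # Already visited; guards against reply loops.
        return []
    seen.add(post_id)

    post = fetch(post_id)  # hypothetical: returns a dict with "id", "text", "replies"
    collected = [post]
    for child_id in post.get("replies", []):
        collected.extend(crawl_thread(child_id, fetch, seen))
    return collected


# Hypothetical in-memory "site" standing in for real HTTP fetches.
FAKE_SITE = {
    "p1": {"id": "p1", "text": "Original post", "replies": ["c1", "c2"]},
    "c1": {"id": "c1", "text": "First reply", "replies": ["c3"]},
    "c2": {"id": "c2", "text": "Second reply", "replies": []},
    "c3": {"id": "c3", "text": "Nested reply", "replies": ["p1"]},  # loops back
}

thread = crawl_thread("p1", FAKE_SITE.__getitem__)
print([p["id"] for p in thread])
```

The `seen` set is what makes the walk stop "at the end" even if a reply links back to an earlier post.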

It can grab the text, and can also make bitmaps.

The posts could (in theory) be moved to a new location. At worst, indexed into a DB.
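As a rough idea of what "indexed into a DB" could look like, here is a small sketch using Python's built-in `sqlite3` module. The table layout, the `build_index` and `search` helpers, and the sample posts are assumptions made up for illustration, not anything the existing engine does.

```python
import sqlite3
from typing import Dict, Iterable, List


def build_index(posts: Iterable[Dict[str, str]]) -> sqlite3.Connection:
    """Load scraped posts into an in-memory SQLite table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id TEXT PRIMARY KEY, author TEXT, body TEXT)")
    conn.executemany(
        "INSERT INTO posts VALUES (?, ?, ?)",
        [(p["id"], p["author"], p["body"]) for p in posts],
    )
    conn.commit()
    return conn


def search(conn: sqlite3.Connection, term: str) -> List[str]:
    """Return ids of posts whose author or body contains the term.

    SQLite's LIKE is case-insensitive for ASCII text by default.
    """
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT id FROM posts WHERE body LIKE ? OR author LIKE ? ORDER BY id",
        (pattern, pattern),
    )
    return [row[0] for row in rows]


# Made-up sample data, only to exercise the helpers.
SAMPLE_POSTS = [
    {"id": "a1", "author": "Joe", "body": "Booting a Raspberry Pi from USB"},
    {"id": "a2", "author": "Ann", "body": "Notes on cross-compiling"},
]

conn = build_index(SAMPLE_POSTS)
print(search(conn, "raspberry pi"))
print(search(conn, "Joe"))
```

A plain `LIKE` scan is fine for a few thousand posts; a larger archive would probably want a proper full-text index instead.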

I really don't need the info, so I would probably need some sort of incentive to do it (or someone else could code it). It's a lot of work.

My lovely wife once had some fun arranging the articles into a Google-like view.

You can try my shoddy little job at the link below by typing in something like "raspberry pi" or "Joe".

https://search.code4sale.com/sbcgplus/search.cgi

Comments

  1. I would expect Google to offer some sort of takeout of the community, but haven't seen anything announced yet.

  2. The best move would be if it could be exported to a Google Group. Then it would remain searchable.

  3. Lars Fosdal I wonder if that would work like Takeout for the posts and collections they currently have (not) in place.

    I wonder if they will share the graphics they capture, and the names and times of people who posted and replied in some sort of GDPR friendly fashion.

    Who knows, they might add a check-box "let us move this to mewe for you".

    Expect from Google? I expect they will bail on any project they start, at any time that suits them, if it does not, in some sense, serve the stockholder.

    That is fiduciary law (and I am OK with that).

    What I am not OK with is the "in some sense" part.

    It is a figure that someone (or something) came up with to place a value on the operational costs and liability of a service that keeps customers around so they may track and gather data, sell the data, and provide goodwill (and advertising to some extent).

    The liability issue is already out of the bag, so it comes down to liability reduction, operational cost, revenue, goodwill, and future value.

    Somehow, I cannot believe that Google, with its shiploads of servers (that they throw away every day), cannot manage to keep the feed going (and maintain goodwill), with revenue coming in, at something that must somehow be described as some sort of loss or liability requiring a sunset.

    It seems they employ the smartest people on the planet. Whatever the issue, I am sure they could solve it.

    If not, expect Facebook to follow.

    And with that, probably an economic collapse (or perhaps the collapse of family units worldwide, as they look up at the dinner table and discover the other people sitting at the table are not avatars).

    I don't know, but I doubt you will see "take out" for communities.

    You may get offered a "corporate" board though (:

  4. As far as I have read, Takeout only contains your own posts, not those of anybody else. (I have yet to try a G+ Takeout; I only tried the one for Gmail.)

  5. Nice idea, to snapshot this community!

    The ironic thing is, if this or Google's own export lets it be hosted somewhere, we'll probably get better Google search of the site than when it was G+.

  6. I did a Takeout of my G+ activities to my GDrive. 219 2 GB files! WTF!
