Yup. The robots.txt file isn't only meant to block bots from the site outright; it's also meant to steer them away from resources that aren't interesting to human readers, even indirectly.
For example, MediaWiki installations are pretty clever in that by default, /w/ is blocked and /wiki/ is encouraged. Nobody wants technical pages and wiki histories in search results; they only want the current versions of the pages.
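If you want to poke at that behaviour yourself, here's a tiny Python sketch using the standard library's robots.txt parser. The hostname and the rules are made up for illustration, not copied from any particular wiki's live robots.txt:

```python
from urllib.robotparser import RobotFileParser

# Illustrative MediaWiki-style rule: block the technical /w/ paths,
# leave the reader-facing /wiki/ pages crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /w/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Current article pages under /wiki/ stay fetchable...
print(rp.can_fetch("*", "https://wiki.example.org/wiki/Main_Page"))  # True

# ...while edit forms, histories, and other index.php plumbing do not.
print(rp.can_fetch(
    "*",
    "https://wiki.example.org/w/index.php?title=Main_Page&action=history",
))  # False
```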
Fun tidbit: in the late 1990s, there was a real epidemic of spammers scraping web pages for email addresses. Some people developed wpoison.cgi, a script whose sole purpose was to generate garbage web pages full of bogus email addresses. Real search engines ignored these, thanks to robots.txt. Guess what the spam bots did?
Do the AI bros really want to go there? Are they asking for model collapse?
Reddit has a user data checkout feature (IIRC, look in the user settings or maybe the Reddit help pages to find it).
It’s a bit crap though.
It takes a long time to process, especially if you happened to post in the era when the Reddit data infrastructure was horribly terrible instead of merely ordinarily terrible, and apparently the worst cases involve some manual work on the staff's part.
Some data may be missing or truncated. It doesn't give you data from privated/banned subreddits (which was a fun thing to discover, because the last time I tried this the blackouts were on), and even for legitimate stuff, long comments/posts may be truncated. On top of that, I'm pretty sure the dumps just straight up didn't have all of my posts from several years ago, even though those were on public subreddits. So you need to make sure the checked-out data is sensible.
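For the "make sure it's sensible" part, the laziest check I know is to just count rows per file and compare against what your profile claims. This is only a sketch: it assumes the export unpacks into a folder of CSV files, and the reddit_export path is a placeholder for wherever you put the archive:

```python
import csv
import pathlib

# Placeholder path: point this at wherever you unpacked the checkout archive.
export_dir = pathlib.Path("reddit_export")

for csv_path in sorted(export_dir.glob("*.csv")):
    with csv_path.open(newline="", encoding="utf-8") as f:
        # Count data rows only; DictReader consumes the header line.
        count = sum(1 for _ in csv.DictReader(f))
    print(f"{csv_path.name}: {count} rows")
```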
In conjunction with the official dumps, I recommend a few other tools, especially since the dumps aren't really magnificently usable on their own. One tool I found personally invaluable is reddit-user-to-sqlite, which lets you import Reddit data dumps and whatever live user data is still available (I think it does this by scraping or something; I'm pretty sure it still worked after the API shutdown) into an SQLite database, and Datasette is a nice frontend for browsing the posts.
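If you want a quick peek at whatever database the import produced before firing up Datasette, plain sqlite3 from the Python standard library is enough. The filename reddit.db below is just a guess; use whatever you pointed the tool at, and note that it only lists tables and counts rows rather than assuming any particular schema:

```python
import sqlite3

con = sqlite3.connect("reddit.db")  # assumed filename, adjust to your setup

# Ask SQLite itself which tables the import created.
tables = [row[0] for row in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

for table in tables:
    (count,) = con.execute(f'SELECT count(*) FROM "{table}"').fetchone()
    print(f"{table}: {count} rows")

con.close()
```

After that, `datasette reddit.db` serves the same file as a browsable web UI.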
As for scrubbing, there are tools for that which are supposed to work. I think.