• 0 Posts
  • 37 Comments
Joined 2 years ago
Cake day: August 16th, 2023



  • The way it does math is mostly as people have already assumed - approximating instead of doing it “manually”. It’s 2025 and at this point absolutely nobody should be surprised that AI “confidently describe[s] the standard grade-school method, concealing its actual, bizarre reasoning process”.

    As for poetry,

    Here, the model settled on the word “rabbit” as the word to rhyme with while it was processing “grab it.” Then, it appeared to construct the next line with that ending already decided, eventually spitting out the line “His hunger was like a starving rabbit.”

    this is exactly how many poets write rhymed poetry too, it’s not even remotely bizarre.

    Still, it is interesting and good to see some concrete advancement in the study of AI reasoning. Hopefully it will contribute towards reducing the mystification of the whole thing.




  • While I was looking for an alternative to Goodreads, which was widely known to be horrible long before the recent push against these big corpos, I tried BookWyrm (my first contact with the fediverse). I like their approach and wish them success, but what put me off is exactly what you say: the data they use is messy and lacks a lot of info. E.g. one of the things that makes (or at least made) GR satisfying is the visual aspect, you get these cool charts with the book covers, but Open Library doesn’t have covers for so many books. So should I go to Google Images and add covers for 80% of my “library” of like 500 books? Lots of work.

    For comparison, TMDb, which is the source of data for Letterboxd, seems to have data that is about as high-quality as, if not better than, that of IMDb, which it is an alternative to (idk if it’s FOSS though?).

    I’ve manually added many dozens of books to Goodreads, so I’m not against assisting a site I use and enjoy. (Ofc at this point I regret improving that garbage site.) But the lack of data on BookWyrm was just too much even for me.

    So in the end I just switched to the simplest solution: LibreOffice Calc. But we do need an alternative to GR. I came across BookBrainz a few years ago, it was still early in development. Today it might be better, I should give it a shot and maybe add some data there…




  • Tbh that is an overall minuscule number and I’d say it’s not representative (based on my own occasional visits to that shithole through xcancel.com). The question is what they even counted as hate speech. Openly calling for the death of some minority probably counted, but did all those “just noticing things” barely-concealed dogwhistles count?

    Wait, maybe I should read the article before replying to you…

    The study measured overt hate speech, the meaning of which was clear to anyone who saw it – speech attacking identity groups or using toxic language. It did not measure covert types of hate speech, such as coded language used by some extremist groups to spread hate but plausibly deny doing so.


  • Today, the court found (among other things), that a few thousand of the summaries that Ross’s AI produced are way too similar to Westlaw’s summaries for it to be a coincidence.

    This is probably just inevitable when your dataset is not large enough. I would be interested in seeing the LLM’s output compared against the original texts; I do remember the early ChatGPT producing some borderline copies of sentences that you could find online (with one or two words changed).



  • It’s not Meta vs us, but opensource vs Google and Openai.

    I never said it’s Meta vs us. It’s Meta vs (in this particular case) the book publishing industry. You can’t reduce the whole situation to open source vs closed source, there’s other “axes” at play here as well.

    They are being sued for copyright infringement when it’s clearly highly transformative

    They downloaded the entire Libgen and more. Going by the traditional explanations of piracy, that’s like stealing several hundred bookstores’ worth of books all at once, and then claiming it’s alright because your own writing is not plagiarised from any of the books you’ve stolen. (Piracy is not the same as actual stealing of course, but countless people have been legally bullied and ruined with that logic.) Meta also got its data from Internet Archive; unless they only obtained materials that are in the public domain or under a similar license, they’ve obtained a lot of material that IA was ruled against for allowing unlimited access to back in 2020 (if you’ve followed the Hachette v. Internet Archive case). The brainfucking conclusion of your and Facebook’s case is that using illegal services is perfectly legal as long as you sufficiently transform the results of the illegal activity.

    The rules are fine as is

    Actually they’re not. Copyright law is insanely restrictive, and I don’t think you’ve dealt much with media if you think it’s fine (but I don’t wish to delve into this further as it’s beyond the scope of discussion).

    Meta isn’t the one trying to change them

    Of course they’re not trying to change them, that’s the point, they will get away with breaking them while being perfectly fine with other actors not being able to do so.



  • If the existence of open source LLMs hinges on the benevolence of one of the few most cancerous tech companies in the world, maybe they’re not really worth it?

    This isn’t about “heroes” and “villains”. Facebook has been and has stayed the “villain”, they’ve done something colossally illegal that any mere mortal would be sued to death for (by another “villainous” instance, the media system that has made piracy a necessity in the first place), and they’re hoping to get away with it simply on technicalities and by having more money for better lawyers. Rules are rules, if you don’t like them maybe Facebook should try to change them (and not just for themselves, but for the rest of us too)?