While LLMs have been used for… a lot, it seems like this use might be one where it's not only reliable but it appears to outperform existing methods of image compression. Being able to cram more data into less space tends to lead to interesting developments, so I will be keeping my eye on this.

What do you guys think? Seem like it's deserving of less hype than I'm giving it? What kind of security holes do you think this could open?

  • Heresy_generator@kbin.social
    ↑ 9 · 1 year ago

    It's neat from a research and proof-of-concept perspective but practically speaking I'd like to see the CPU cycles required for the LLM compression compared to PNG or FLAC compression. We've always known we can increase compression by throwing more computing power at the problem but we settle on a happy medium at the intersection of "good enough" for compression and performance.
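
    A rough way to see that trade-off with tools everyone already has: the sketch below times a few standard-library compressors on the same input. The `measure` helper and the sample text are made up for illustration, and wall-clock time is only a crude stand-in for CPU cycles.

```python
import lzma
import time
import zlib


def measure(data: bytes) -> list[tuple[str, int, float]]:
    """Return (name, compressed size in bytes, seconds) for a few settings."""
    settings = [
        ("zlib level 1", lambda d: zlib.compress(d, 1)),
        ("zlib level 9", lambda d: zlib.compress(d, 9)),
        ("lzma preset 9", lambda d: lzma.compress(d, preset=9)),
    ]
    results = []
    for name, compress in settings:
        start = time.perf_counter()
        size = len(compress(data))
        results.append((name, size, time.perf_counter() - start))
    return results


if __name__ == "__main__":
    # Illustrative sample only; real comparisons need representative data.
    sample = b"We settle on a happy medium between ratio and speed. " * 20000
    for name, size, seconds in measure(sample):
        print(f"{name:14s} {size:8d} bytes  {seconds:.4f} s")
```

    The same kind of table with an LLM-based compressor as an extra row is the comparison I'd want to see.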

    • aard@kyu.de
      ↑ 1 · 1 year ago

      While that's generally true we might want to look into utilizing available cores more - but I guess with LLM it might be harder to scale that while keeping file size the same.

      A lot of the current compression programs only use one thread properly - which was still perfectly fine a few years ago, but thanks to AMD, cores have become cheap. A few years ago most notebooks would come with two cores and either two or four threads, with higher-end models at 4c/4t. Anything bigger pretty much didn't exist for notebooks, and was expensive for desktops.

      Nowadays you can get 16 cores in a reasonably priced notebook, and if it benefits your work you won't think much about spending a bit extra for a 32 or 64 core CPU in your workstation - where just 6 years ago you'd have had no option for such a notebook, and paid the equivalent of a not too shabby car for the workstation.
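
      A minimal sketch of what "using more cores" usually looks like for a classic compressor: cut the input into independent chunks and compress them in parallel. The function names and chunk size are hypothetical, and whether threads actually run concurrently here depends on CPython's zlib releasing the GIL while compressing, which I'm assuming rather than measuring.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_chunks(data: bytes, chunk_size: int, workers: int = 4) -> list[bytes]:
    """Compress fixed-size chunks independently so they can run in parallel."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.compress, chunks))


def decompress_chunks(blobs: list[bytes]) -> bytes:
    """Decompress each chunk and stitch the original bytes back together."""
    return b"".join(zlib.decompress(blob) for blob in blobs)


if __name__ == "__main__":
    data = bytes(range(256)) * 400
    blobs = compress_chunks(data, chunk_size=4096)
    assert decompress_chunks(blobs) == data
    print("chunked total:", sum(len(b) for b in blobs), "bytes")
    print("single stream:", len(zlib.compress(data)), "bytes")
```

      Each chunk starts with no history of the others, so the chunked total typically comes out larger than a single stream. That is the file-size cost of scaling out I mean above.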

  • skip0110@lemm.ee
    ↑ 6 · 1 year ago (edited)

    I think this model has billions of weights. So I believe that means the model itself is quite large. Since the receiver needs to already have this model, I’d suggest that rather than compressing the data, we have instead pre-encoded it, embedded it in the model weights, and thus the “compression” is just basically passing a primary key that points to the data to be compressed in the model.

    It’s like, if you already have a copy of a book, I can “compress” any text in that book into 2 numbers: a page offset, and a word offset on that page. But that’s cheating because, at some point, we had to transfer the book too!

    • puttputt@beehaw.org
      ↑ 5 · 1 year ago

      Yeah, it's like saying I can "compress" a png of the Mona Lisa to just the string "Mona Lisa" because I have a database of art.

    • Coffee Junky ❤️@beehaw.org
      ↑ 1 · 1 year ago

      I feel it's somewhere in the middle. Like your book example only works if you already have the book. If this is a model that is a few gigabytes of data, but it works for every movie or audio file it can still be useable. In that case it's not that you have to send the book first, but you do need to have the same dictionary.
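
      The "same dictionary on both ends" idea already exists in ordinary compressors. A small sketch using zlib's preset-dictionary support (the `zdict` argument of `zlib.compressobj` and `zlib.decompressobj`); the helper names and sample strings are made up for illustration, and a multi-gigabyte model is obviously a much bigger "dictionary" than this.

```python
import zlib


def compress_with_dict(data: bytes, shared: bytes) -> bytes:
    """Compress data, letting zlib reference bytes from a pre-shared dictionary."""
    compressor = zlib.compressobj(zdict=shared)
    return compressor.compress(data) + compressor.flush()


def decompress_with_dict(blob: bytes, shared: bytes) -> bytes:
    """Decompress; the receiver must hold the exact same dictionary."""
    decompressor = zlib.decompressobj(zdict=shared)
    return decompressor.decompress(blob) + decompressor.flush()


if __name__ == "__main__":
    shared = b"the quick brown fox jumps over the lazy dog. "
    message = (b"the lazy dog jumps over the quick brown fox. "
               b"the quick brown fox jumps over the lazy dog.")
    blob = compress_with_dict(message, shared)
    assert decompress_with_dict(blob, shared) == message
    print("with shared dictionary:", len(blob), "bytes")
    print("without:               ", len(zlib.compress(message)), "bytes")
```

      The dictionary is never sent with the message, but unlike the book example it helps with any message that resembles it, not just text copied out of it.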

  • EdgeOfToday@lemm.ee
    ↑ 3 · 1 year ago

    With a neural network, you wouldn't be able to mathematically prove that the signal is perfectly recovered 100% of the time for all possible inputs. With PNG and FLAC, you can. If you're just listening to music and need a good compression ratio, then sure, it won't be a big deal if a couple of bits are wrong. But that's also why we have lossy compression. If the goal is to make signal degradation imperceptible to a human, then you could get a much better compression ratio using neural networks. If it's truly critical that the signal isn't corrupted, it would probably be better to just use the original method.

    • astraeus@programming.dev
      ↑ 2 · 1 year ago

      Seems like another “hey, what if we used LLMs for this” scenario. It might be more effective, but exactly how many more resources are being used to make it do the same work as current compression algorithms? Effective doesn’t mean efficient, and I think for lossless applications efficiency is truly more important.

      • Butterbee (She/Her)@beehaw.org
        ↑ 1 · 1 year ago

        A LOT. You can barely run 13b parameter models on a 24gb gfx card and outputs are like a page or so of text. Translate that over to audio and it would have to be broken down into discrete chunks that the model could use as "prompts" to output a section of audio that fits into the model's available output. It might compress better, but it would be exceedingly painful and slow to extract even on AI-focused cards. And it would use OODLES of watts to get just a little bit better than FLAC.

    • ZickZack@kbin.social
      ↑ 1 · 1 year ago

      That's not what lossless data compression schemes do:
      In lossless compression the general idea is to create a codebook of commonly occurring patterns and use those as shorthand.
      For example, one of the simplest and now ancient algorithms, LZW, does the following:

      • Initialize the dictionary to contain all strings of length one.
      • Find the longest string W in the dictionary that matches the current input.
      • Emit the dictionary index for W to output and remove W from the input.
      • Add W followed by the next symbol in the input to the dictionary.
      • Repeat until the input is exhausted.

      Basically, instead of rewriting long sequences, it just writes down the index into an existing dictionary of already seen sequences.
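
      Those steps can be sketched in a few lines of Python. This is a textbook-style toy, not any particular tool's implementation: it emits plain integers instead of packed bits, lets the dictionary grow without limit, and the function names are my own.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Turn bytes into a list of dictionary indices."""
    # Start with every possible single-byte string in the dictionary.
    dictionary = {bytes([i]): i for i in range(256)}
    w = b""
    codes = []
    for byte in data:
        candidate = w + bytes([byte])
        if candidate in dictionary:
            w = candidate  # keep extending the longest match W
        else:
            codes.append(dictionary[w])              # emit the index for W
            dictionary[candidate] = len(dictionary)  # add W + next symbol
            w = bytes([byte])
    if w:
        codes.append(dictionary[w])
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the bytes; the dictionary is reconstructed on the fly."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    w = dictionary[codes[0]]
    out = bytearray(w)
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == len(dictionary):
            entry = w + w[:1]  # the code refers to the entry being built right now
        else:
            raise ValueError("invalid LZW code")
        out += entry
        dictionary[len(dictionary)] = w + entry[:1]
        w = entry
    return bytes(out)


if __name__ == "__main__":
    text = b"TOBEORNOTTOBEORTOBEORNOT"
    codes = lzw_compress(text)
    assert lzw_decompress(codes) == text
    print(len(text), "bytes ->", len(codes), "codes")
```

      Note that the dictionary is never transmitted: the decoder rebuilds exactly the same one from the codes it has already seen.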

      However, once this is done, you now need to find an encoding that takes your character set (the original characters + the new dictionary references) and turns it into bits.
      It turns out that we can do this optimally: Using an algorithm called Arithmetic coding we can align the length of a bitstring to the amount of information it contains.
      "Information" here means the statistical concept of information: the less likely a character is to be observed, the more information it carries.
      Logically this makes sense:
      Let's say you have a system that measures earthquakes. As one would expect, most of the time, let's say 99% of the time, you will see "no earthquake", while in 1% of the cases you will observe "earthquake".
      Since "no earthquake" is a lot more common, the information gain is relatively small (if I told you "the system said no earthquake", you could have guessed that with 99% confidence: not very surprising).
      However if I tell you "there is an earthquake" this is much more important and therefore is worth more information.

      From information theory (a branch of mathematics), we know that if we want to maximize the efficiency of our codec, we have to match the length of every character to its information content. Arithmetic coding now gives us a general way of doing this.
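
      To put numbers on the earthquake example: the information content of an event with probability p is -log2(p) bits, and the average over all events (the entropy) is the best any codec can do per symbol. A small sketch, with helper names of my own choosing:

```python
import math


def information_bits(p: float) -> float:
    """Information content (surprisal) of an event with probability p, in bits."""
    return -math.log2(p)


def entropy_bits(probabilities: list[float]) -> float:
    """Average information per symbol: the lower bound for a lossless code."""
    return sum(p * information_bits(p) for p in probabilities if p > 0)


if __name__ == "__main__":
    print("no earthquake:", information_bits(0.99), "bits")
    print("earthquake:   ", information_bits(0.01), "bits")
    print("average:      ", entropy_bits([0.99, 0.01]), "bits per reading")
```

      This gives roughly 0.0145 bits for "no earthquake", about 6.64 bits for "earthquake", and an average of about 0.081 bits per reading. A code that spends a whole bit on every reading wastes over 90% of its space, which is why a coder that can spend fractions of a bit per symbol matters.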

      However, we can do even better:
      Instead of just considering individual characters, we can also add in character pairs!
      Of course, it doesn't make sense to add in every possible character pair, but for some of them it makes a ton of sense:
      For example, if we want to compress English text, we could give a separate codebook entry to the entire sequence "the" and save a ton of bits!
      To do this for pairs of characters in the english alphabet, we have to consider 26*26=676 combinations.
      We can still do that: just scan the text 600 times.
      With 3 character combinations it becomes a lot harder: 26*26*26=17576 combinations.
      But with 4 characters it's impossible: you already have nearly half a million combinations!
      In reality, this is even worse, since you have way more than 26 characters: you have things like ", . ? ! and your codebook ids which blow up the size even more!

      So, how are we supposed to figure out which character pairs to combine and how many bits we should give them?
      We can try to predict it!
      This technique, called PPM (Prediction by partial matching), is already very old (~1980s), but still used in many compression algorithms.
      The important trick is now that with deep learning, we can train even more efficient estimators, without losing the lossless property:
      Remember, we only predict what things we want to combine, and how many bits we want to assign to them!
      The worst-case scenario is that your compression gets worse because the model predicts nonsensical character-combinations to store, but that never changes the actual information you store, just how close you can get to the optimal compression.
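
      Here is a toy that shows why the predictor can't break losslessness. It is not PPM and not what the deep-learning compressors do (those feed probabilities into an arithmetic coder); it's a much simpler stand-in where each byte is replaced by its rank in the model's guess list, and the decoder runs the identical model to undo that. All names, the order-1 counting model, and the Elias gamma cost measure are my own assumptions for illustration.

```python
from collections import defaultdict


def _ranking(counts: list[int], nonsense: bool) -> list[int]:
    """Order all 256 byte values from 'predicted most likely' to 'least likely'."""
    order = sorted(range(256), key=lambda b: (-counts[b], b))
    return order[::-1] if nonsense else order  # nonsense model: worst guess first


def encode(data: bytes, nonsense: bool = False) -> list[int]:
    """Replace each byte by its rank in the model's current prediction."""
    counts = defaultdict(lambda: [0] * 256)  # counts[previous byte][next byte]
    prev, ranks = 0, []
    for b in data:
        ranks.append(_ranking(counts[prev], nonsense).index(b))
        counts[prev][b] += 1
        prev = b
    return ranks


def decode(ranks: list[int], nonsense: bool = False) -> bytes:
    """Run the same model in lockstep and look each rank back up."""
    counts = defaultdict(lambda: [0] * 256)
    prev, out = 0, bytearray()
    for r in ranks:
        b = _ranking(counts[prev], nonsense)[r]
        out.append(b)
        counts[prev][b] += 1
        prev = b
    return bytes(out)


def cost_in_bits(ranks: list[int]) -> int:
    """Total length if each rank r is written as the Elias gamma code of r + 1."""
    return sum(2 * (r + 1).bit_length() - 1 for r in ranks)


if __name__ == "__main__":
    text = b"the cat and the hat and the bat " * 20
    for nonsense in (False, True):
        ranks = encode(text, nonsense)
        assert decode(ranks, nonsense) == text  # lossless either way
        label = "nonsense model" if nonsense else "sensible model"
        print(label, cost_in_bits(ranks), "bits")
```

      With the deliberately nonsensical model the output gets much longer, but decoding still returns the exact input, because encoder and decoder make the same (bad) predictions.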

      The state of the art in text compression has used this for a long time already (see the Hutter Prize); it's just now getting to a stage where systems become fast and accurate enough to also make the compression useful for other domains/general purpose compression.

      • firenzeleon@beehaw.org
        ↑ 1 · 1 year ago

        > That’s not what lossless data compression schemes do:

        Yes it is.

        You went into a lot of detail about how they do it, but it's still what they do.

        • BFrizzleFoShizzle@lemmy.nz
          ↑ 1 · 1 year ago

          I think the main point they're disagreeing with is this:

          > you wouldn’t be able to mathematically prove that the signal is perfectly recovered 100% of the time for all possible inputs

          They explain why you don't need 100% accuracy - most compression codecs would only use the network for a prediction, which doesn't actually have to be correct. It just has to be "more likely to be correct" than existing algorithms.

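          If you want to read up more on the context of these prediction functions, the general class of compression algorithms you'd use for this are called predictive codecs: predict the next value, then encode only the error. FLAC and arguably PNG are both predictive codecs: FLAC uses linear prediction, and PNG's filters predict each pixel from its neighbors.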
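
          The "predict, then store only the error" idea in its simplest form, as a rough sketch: guess that every sample equals the previous one and keep the difference. This is similar in spirit to PNG's Sub filter, but the function names and the test signal are mine, and real codecs use better predictors plus a dedicated entropy coder rather than plain zlib.

```python
import zlib


def delta_encode(samples: bytes) -> bytes:
    """Store each sample as its difference from the previous one (mod 256)."""
    prev, out = 0, bytearray()
    for s in samples:
        out.append((s - prev) % 256)  # residual: how wrong the prediction was
        prev = s
    return bytes(out)


def delta_decode(residuals: bytes) -> bytes:
    """Make the same prediction and add the stored residual back."""
    prev, out = 0, bytearray()
    for r in residuals:
        prev = (prev + r) % 256
        out.append(prev)
    return bytes(out)


if __name__ == "__main__":
    ramp = bytes(i % 256 for i in range(4096))  # a smooth, predictable signal
    residuals = delta_encode(ramp)
    assert delta_decode(residuals) == ramp
    print("raw + zlib:      ", len(zlib.compress(ramp)), "bytes")
    print("residuals + zlib:", len(zlib.compress(residuals)), "bytes")
```

          A wrong prediction only makes the residual bigger; the decoder makes the same prediction and adds the residual back, so the round trip is exact no matter how good or bad the predictor is.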