• Vivendi@lemmy.zip · +6 / −10 · 7 months ago

      Stochastic plagiarism machines, AKA LLMs, won’t replace anyone.

      They already have a 52% failure rate, and that’s with top-of-the-line training data.

      • Nevoic@lemm.ee · +7 / −5 · edited · 7 months ago

        In my line of work (programming) they absolutely do not have a 52% failure rate by any reasonable definition of the word “failure”. More than 9 times out of 10 they’ll produce code at a junior level or better. It won’t be the best code, and sometimes it’ll have trivial mistakes in it, but junior developers do the same thing.

        The main issue is confidence: it’s essentially like having a junior developer who is way overconfident, for 1/1000th of the cost. This is extremely manageable, and June 2024 is not the end-all-be-all of LLMs. Even if LLMs only got worse, and this is the literal peak, they will still reshape entire industries. Junior developers cannot find a job, and with the massive reduction in junior devs we’ll see a massive reduction in senior devs down the line.

        In the short term the same quality of work will be done with far, far fewer programmers required. In 10-20 years’ time, if we get literally no progress in the field of LLMs or other model architectures, then yeah, it’s going to be fucked. If there is advancement to the degree of replacing senior developers, then humans won’t be required anyway, and we’re still fucked (assuming we still live in a capitalist society). In a proper society less work would actually be a positive for humanity, but under capitalism less work is an existential threat.

        • Dangerhart@lemm.ee · +7 · 7 months ago

          This is the exact opposite of my experience. We’ve been using Codium in my org, and 9/10 times it’s garbage, and they will not allow anything that is not on-prem. I’m pretty consistently getting recommendations for methods that don’t exist, invalid class names, things that look like the wrong language, etc. To get those recommendations I have to cancel out of autocomplete, which is oftentimes much better. It seems like it can make up for someone who doesn’t have good workflows, shortcuts, and a premium IDE, but otherwise it’s been a waste of time and money.

        • Vivendi@lemmy.zip · +2 / −4 · 7 months ago

          There is literally a university study that proves over 50% failure on programming tasks. It’s not a rational model; deal with it, get off the Kool-Aid, and move on.

          • Nevoic@lemm.ee · +6 / −3 · 7 months ago

            If you didn’t have an agenda/preconceived idea you wanted proven, you’d understand that a single study has never been used by any credible scientist to say anything is proven, ever.

            Only people who don’t understand how data works will say a single study from a single university proves anything, let alone anything about a model trained on billions of parameters across a field as broad as “programming”.

            I could feed GPT “programming” tasks that I know it would fail on 100% of the time. I could also feed it “programming” tasks I know it would succeed on 100% of the time. If you think LLMs have nothing to offer programmers, you have no idea how to use them. I’ve been successfully using GPT-4T for months now, and it’s been very good. It’s better in static environments where it can be fed compiler errors to fix itself continually (if you ever looked at more than a headline about GPT performance, you’d know there’s a substantial difference between zero-shot and 3-shot performance).

            Bugs exist, but code heavily written by LLMs has not been proven to be any more or less buggy than code heavily written by junior devs. Our internal metrics have them within any reasonable margin of error (senior+GPT recently beating out senior+junior, but it’s been flipping back and forth), and senior+GPT tickets get done much faster. The downside is GPT doesn’t become a senior, where a junior does with years of training, though 2 years ago LLMs were at a 5th grade coding level on average, and going from 5th grade to surpassing college level and matching junior output is a massive feat, even if some luddites like yourself refuse to accept it.