A long-form response to the concerns, comments, and general principles many people raised in the post about authors suing companies creating LLMs.

  • Umbrias@beehaw.org
    English
    arrow-up
    13
    ·
    1 year ago

    This isn’t how it works at all. I can, and should, and do, read and consume all sorts of media with the intention of stealing from it for my own works. If you ask for writing advice, this is actually probably one of the first things you’ll hear: read how other people do it.

    So “the intent of the reading” does not work as an argument, because if it did, humans could never generate any new media either.

    • Peanut@sopuli.xyz
      English
      arrow-up
      10
      ·
      edit-2
      1 year ago

      This is the thing I kept shouting when diffusion models took off. People are effectively saying “make it illegal for neural nets to learn from anything creative or productive anywhere in any way.”

      Because despite the differences in architecture, I think it is parallel.

      If the intent and purpose of the tool were to make copies of the work in a way we would consider theft if done by a human, I would understand.

      In the same way, there isn’t any legal protection against neural nets learning from personal and abstract information to manipulate, predict, or control the public. It is the intended function of the tool that should make it illegal.

      But people are too self-focused and ignorant to riot en masse about that one.

      The dialogue should also be about creating a safety net as more and more people lose value in the face of new technology.

      But fuck any of that, what if an a.i. learned from a painting I made ten years ago, like every other artist who may have learned from it? Unforgivable.

      I don’t believe it’s reproducing my art, even if asked to do so, and I don’t think I’m entitled to anything.

      Also, copyright has been fucked for decades. It hasn’t served the people since long before the Mickey Mouse Protection Act.

      • flyingowlfox@beehaw.org
        English
        arrow-up
        7
        ·
        edit-2
        1 year ago

        Regardless of intent, let’s not pretend that the scale at which LLMs “process” information to generate new content is comparable to humans. Human-scale learning is obviously what copyright laws were intended for (so far).

        • Sas [she/her]@beehaw.org
          English
          arrow-up
          2
          ·
          1 year ago

          We don’t need to pretend though. People with speed reading skills are faster than most humans as well and could read a lot more books.

          It’s very probable that you have read at least one writer’s whole library, even if it’s as many stories as Terry Pratchett got published. That will always be possible with human-written books, as writing them takes longer than reading them.

          Obviously the stories have to be acquired in a legal way, and no actual passages should be stored in the model, but the amount of data processed should have no bearing on whether it can be used.

          And as others here have written, making copyright law stricter puts big corps at an advantage, because they have big legal teams and the money to just pay the copyright fee, while your regular user would not be able to.

        • Peanut@sopuli.xyz
          English
          arrow-up
          2
          ·
          1 year ago

          It’s comparing a bird to a plane, but I still think the process constitutes “learning,” which may sound anthropomorphic to some, but I don’t think we have a more accurate synonym. I think the plane is flying even if the wings aren’t flapping and the plane doesn’t do anything else birds do. I think LLMs, while different, reflect the subconscious aspect of human speech, and reflect the concept of learning from the data more than “copying” the data. It’s not copying and selling content unless you count being prompted into repeating something it was trained on heavily enough for accurate verbatim reconstruction. To me, that’s no more worrying than Disney being able to hire writers who have memorized some of their favorite material and can reconstruct it on demand. Even if you ask your intern to reproduce something verbatim with the intent of selling it, I still don’t think the training or “learning” was the issue.

          To accurately address the differences, we probably need new language and ideals for the specific situations that arise in the building of neural nets, but I still consider much of the backlash completely removed from any understanding of what has been done with the “copyrighted material.”

          I tend to think about it in terms of training these machines naturally on real-world content in the future. Should a neural net built to act in the real world be sued if an image of a Coca-Cola can was in the training data somewhere, and some of the machines end up being used to make cans for a competitor?

          How many layers of abstraction, or how much mixture with other training data do you need to not consider that bit of information to be comparable to the crime of someone intentionally and directly creating an identical logo and product to sell?

          Copyright laws already needed an overhaul prior to a.i.

          It’s no coincidence that Warner and Disney are so giant right now, own so much of other people’s ideas, and have the money to control what ideas get funded or not. How long has Walt Disney been dead? More than half a century. So why does his business own the rights of so many artists who came after?

          I don’t think the copyright system is ready to handle the complexity of artificial minds at any stage, whether it is the pareidolic aspect of retrieving visual concepts of images in diffusion models, or the complex abilities that arise from current-scale LLMs, which, again, I believe are able to resemble the subconscious aspect of word prediction that exists in our minds.

          We can’t even get people to confidently legislate a simple ethical issue like letting people have consensual relationships with the gender of their own choice. I don’t have hope we can accurately adjust at each stage of development of a technology so complex we don’t even have the language to properly describe how it functions. I just believe that limiting our future and an important technology for such grotesquely misdirected egoism would do far more harm than good.

          The greater focus should be in guaranteeing that technological or creative developments benefit the common people, not just the rich. This should have been the focus for the past half century. People refuse this conceptually because they’ve been convinced that any economic re-balancing is evil when it benefits the poor. Those with the ability to change anything are only incentivized to help themselves.

          But everyone is just mad at the machine because “what if it learned from my property?”

          I think the article even promotes Adobe as the ethical alternative. Congrats, you’ve limited the environment so that only the existing owners of everything can advance. I don’t want to pay Adobe a subscription for the rest of my life for the right to create on par with more wealthy individuals. How is this helping the world or creation of art?