I often find myself explaining the same things in real life and online, so I recently started writing technical blog posts.

This one is about why it was a mistake to call 1024 bytes a kilobyte. It’s about a 20-minute read, so thank you very much in advance if you find the time to read it.

Feedback is very much welcome. Thank you.

  • gens@programming.dev · 1 year ago

    The mistake is thinking that a 1000-byte file takes up 1000 bytes on any storage medium. The mistake is thinking that it even matters whether a kB means 1000 or 1024 bytes. It only matters for some programmers, and to those, 1024 is the number that matters.

    Disregarding reality in favor of pedantry is the real mistake.
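    As a quick sketch of that first point - assuming a hypothetical filesystem that allocates whole 4 KiB clusters (the cluster size here is illustrative, not any particular filesystem's) - a 1000-byte file still costs a full cluster:

```python
import math

def on_disk_size(file_bytes: int, cluster_bytes: int = 4096) -> int:
    """Bytes actually allocated on a filesystem that hands out
    whole clusters (4 KiB is a common default)."""
    if file_bytes == 0:
        return 0
    return math.ceil(file_bytes / cluster_bytes) * cluster_bytes

print(on_disk_size(1000))  # 4096 - the "1000 byte" file occupies 4096 bytes
```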

  • unreasonabro@lemmy.world · 1 year ago

    I mean, you can’t get to 1000 by doubling twos, so, no?

    Reality doesn’t care what you prefer, my dude.

  • smo@lemmy.sdf.org · 1 year ago

    This has been my pet rant for a long time, but I usually explain it … almost exactly the other way around from you.

    You can essentially start off with nothing using binary prefixes. IBM’s first magnetic hard drive (the IBM 350 - you’ve probably seen it in the famous “forklifting it into a plane” photo) stored 5 million characters. Not 5*1024*1024 characters: 5,000,000 characters. This isn’t some consumer-era marketing trick - this is 1956, when companies were paying half a million dollars a year (2023-inflation-adjusted) to lease a computer. I keep getting told this is some modern trick - doesn’t it blow your mind to realise HDD manufacturers have been using base 10 for nearly 70 years? Line speed was always base 10: 1200 baud laughs at your 2^n fetish (and for that matter, baud comes from telegraphs, and was defined before computers existed), 100Mbit Ethernet runs on a 25MHz clock, and speaking of clocks - kHz, MHz, MT/s, GT/s etc. are always specified in base 10. For some reason no one asks how we got 3GHz CPUs in between 2 & 4GHz ones.

    As you say, memory is the trouble-maker. RAM has two interesting properties for this discussion. One is that it heavily favours binary-prefixed “round numbers”, traditionally because no one wanted RAM with unused addresses, since that made address decoding nightmarish (tl;dr: when 8k of RAM was usually 8×1k chips, you’d use the first 3 bits of the address to select the chip, and the other 10 bits as the address on the chip - if chips didn’t use their entire address space, you’d need to actually calculate the address map, and this calculation would have to run multiple times faster than the CPU itself). The second is that RAM was the first place non-CSy types saw numbers big enough for k to start becoming useful. So for the entire generation that started on microcomputers rather than big iron, memory-flavoured k’s were the first k they ever tasted.
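    That chip-select trick can be sketched in a few lines (a hypothetical illustration of the scheme, not anyone’s actual hardware):

```python
def decode(addr: int) -> tuple[int, int]:
    """Split a 13-bit address for an 8 KiB array built from
    eight 1 KiB chips: the top 3 bits select the chip, the
    low 10 bits address a cell on that chip."""
    assert 0 <= addr < 8 * 1024
    chip = addr >> 10        # top 3 bits: which chip
    offset = addr & 0x3FF    # low 10 bits: which cell on the chip
    return chip, offset

print(decode(0x0000))  # (0, 0)    first cell of the first chip
print(decode(0x1FFF))  # (7, 1023) last cell of the last chip
```

    Because every chip uses its full 1024-cell address space, decoding is pure bit-slicing - no arithmetic address map required.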

    I mean, hands up: who had a computer with 8-64k of RAM and a cassette deck? You didn’t measure the size of your stored program in kB, but in seconds of tape.

    This shortcut then leaked into filesystems purely as an implementation detail - reading disk blocks into memory is much easier if you’re putting square pegs into square holes. So disk sectors are specified in binary sizes so that they fit efficiently into memory regions/pages. For example, CP/M has a 128-byte disk buffer between 0x080 and 0x100 - and its filesystem uses 128-byte sectors. Not a coincidence.
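    The square-pegs point is easy to see with a toy calculation (the sizes here are illustrative; the 4 KiB page is an assumption, not CP/M’s):

```python
def sectors_per_page(sector: int, page: int = 4096) -> tuple[int, int]:
    """How many whole sectors tile a memory page, and the leftover bytes."""
    return page // sector, page % sector

print(sectors_per_page(512))       # (8, 0)  binary sectors tile a page exactly
print(sectors_per_page(500))       # (8, 96) a "decimal" sector leaves slack
print(sectors_per_page(128, 256))  # (2, 0)  CP/M-style 128-byte sectors fit too
```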

    This is where we start getting into stuff like floppy disk sizes being utter madness. 360k & 720k were 720 and 1440 512-byte sectors. When they doubled up again, 2880 512-byte sectors gave us 1440k - and because nothing is ever allowed to make sense (or because 1.40625M looks stupid), we used base 10 to call this 1.44M.
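    The mixed-base arithmetic is worth spelling out:

```python
SECTOR = 512
size = 2880 * SECTOR     # the "1.44M" floppy holds 1_474_560 bytes

print(size / 1024)       # 1440.0  -> "1440k" (binary k)
print(size / 1024**2)    # 1.40625 -> honest binary megabytes
print(size / 1000**2)    # 1.47456 -> honest decimal megabytes
print(1440 / 1000)       # 1.44    -> binary k relabeled with a decimal M
```

    So “1.44M” is neither the binary nor the decimal size - it’s 1440 binary k’s divided by a decimal thousand.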

    So it’s never been that computers used 1024-shaped k’s. It should have been a simple story of “everything uses 1000s, except memory, because reasons”. But once we started dividing base-10-flavoured storage devices into base-2-flavoured sectors, we lost any hope of this ever looking logical.

  • billwashere@lemmy.world · 1 year ago

    Well, it’s because computer science has been around for 60+ years and computers are binary machines, so it was natural for everything to be base 2. The most infuriating part is why drive manufacturers arbitrarily started calling 1000 bytes a kilobyte, 1000 kilobytes a megabyte, 1000 megabytes a gigabyte, and 1000 gigabytes a terabyte, when until then 1 TB was 1,099,511,627,776 bytes. They did this simply because it made their drives appear 10% bigger. So, good ol’ shrinkflation: you could make drives 10% smaller and sell them for the same price.
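    (The size of that gap is easy to check - the decimal terabyte comes up roughly 9% short of the binary one:)

```python
TB_DECIMAL = 1000**4   # 1_000_000_000_000 bytes
TIB_BINARY = 1024**4   # 1_099_511_627_776 bytes

shortfall = 1 - TB_DECIMAL / TIB_BINARY
print(f"{shortfall:.1%}")  # 9.1%
```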

    • wischi@programming.dev (OP) · 1 year ago

      Pretty obvious that you didn’t read the article. If you find the time, I’d like to encourage you to read it. I hope it clears up some misconceptions and makes it clearer why, even in those 60+ years, it was always intellectually dishonest to call 1024 bytes a kilobyte.

      You should at least read “(Un)lucky coincidence”

      • billwashere@lemmy.world · 1 year ago

        Ok, so I did read the article. For one, I can’t take an article seriously that uses memes. For another, yes, drive manufacturers are at fault, because I’ve been in IT a very, very long time and I remember when HD manufacturers actually changed. And the reason was greed (shrinkflation). I mean, why change, why inject confusion where there wasn’t any before? Find the simplest, least complex reason and that is likely true (Occam’s razor). Or “follow the money” usually works too.

        It was never intellectually dishonest to call it a kilobyte; it was convenient and close enough. It’s what I would have done, and it was obviously accepted by lots of really smart people back then, so it stuck. If there was ever any confusion, it was created by the people who introduced the alternative (see above).

        If you wanna be upset you should be upset at the gibi, kibi, tebi nonsense that we have to deal with now because of said confusion (see above). I can tell you for a fact that no one in my professional IT career of over 30 years has ever used any of the **bi words.

        You can be upset if you want but it is never really a problem for folks like me.

        Hopefully this helps…

        • CallumWells@lemmy.ml · 1 year ago

          I just think that a kilobyte should have been 1000 (in binary, so 8 in decimal) bytes and so on. Just keep everything relating to binary storage in binary. That couldn’t ever become confusing, right?

          • rottingleaf@lemmy.zip · 1 year ago (edited)

            Because your byte is 10 decimal bits, right? EDIT: “Bit” is actually a contraction of “binary digit”, so the decimal version would be what, a DIT?.. Dits?..

      • λλλ@programming.dev · 1 year ago (edited)

        kilobit = 1000 bits. Kilobyte = 1000 bytes.

        How is anything about that intellectually dishonest??

        The only ones being dishonest are the drive manufacturers, like the person above said. They sell storage drives by advertising them in the byte quantity but they’re actually in the bit quantity.

    • wischi@programming.dev (OP) · 1 year ago

      If a hard drive has exactly 8’269’642’989’568 bytes what’s the benefit of using binary prefixes instead of decimal prefixes?

      There is a reason for memory: caches, buffer sizes and RAM. But we don’t count printer paper with binary prefixes just because the printer communication uses binary.

      There is no(!) reason to label hard drive sizes with binary prefixes.
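      For a drive of that size, the two labelings look like this (just the arithmetic on the number above):

```python
bytes_total = 8_269_642_989_568

print(round(bytes_total / 1000**4, 2))  # 8.27 -> "8.27 TB", matches the label
print(round(bytes_total / 1024**4, 2))  # 7.52 -> "7.52 TiB", helps no one
```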

      • billwashere@lemmy.world · 1 year ago

        So here’s the thing. I don’t necessarily disagree with you. And if this had been done from the start, it would never have been a problem. But it wasn’t, and THAT is what caused the confusion. You put a lot of thought and research into your post and I can very much respect that. It’s something you feel strongly about, and you took the time to write about your beef with this. The IEC changed the nomenclature in the late 90s. But the REASON they changed it was to avoid the confusion caused by the drive manufacturers (I bet you can guess who was on the committee that proposed the change).

        But I can tell you as a professional IT person we never really expect any drive (solid state or otherwise) to be any specific size. RAID, file system overhead, block size fragmentation, etc all take a cut. It’s basically just bistromathics (that’s a Hitchhiker’s reference) and the overall size of any storage system is only vaguely related to actual drive size.

        So I just want to basically apologize for being so flippant before. It’s important enough to you that you took the time to write this. It’s just that I’m getting rather cynical as I get older and just expect the enshittification of everything digital to continue ad infinitum.

  • Bazoogle@lemmy.world · 1 year ago

    You asked for feedback, so here is my feedback:

    The article is okay. I read most of it, but not all of it, because it seemed overly worded for the sentiment. It could have been condensed quite a bit. I would argue the focus should be more on the fact that there should be a standard in technical documentation, OSes, specification sheets, etc. That’s the part that impacts most people, and the reason they should care. But that kind of gets lost in all the text.

    Your replies here come off as pretty condescending. You should anticipate most people not reading the article before commenting. Just pay them no attention, or reiterate what you already stated in the article. You shouldn’t just say “did you read the article” and then “it’s in this section of the article”. Just like how people comment on youtube before watching the video, people will comment on the topic without reading the article.

    Maybe they didn’t realize it was an article, maybe they knew it was an article and chose not to read it, or maybe they read it and disagree with some of the things you said. It’s okay for people to disagree with something you said, even if you sincerely believe something you said isn’t a matter of opinion (even though it probably is). You can agree to disagree and move on with your life.

    • wischi@programming.dev (OP) · 1 year ago

      Thank you for taking the time to read it and your feedback.

      Your replies here come off as pretty condescending.

      That was definitely never my intention but a lot of people here said something similar. I should probably work on my English (I’m not a native speaker) to phrase things more carefully.

      You shouldn’t just say “did you read the article” and then “it’s in this section of the article”

      It never crossed my mind that this could be interpreted in a negative way. I tried to gauge whether someone read it and still disagreed, or didn’t read it and disagrees, because those are two different situations, at least for me. The hint about the sections was also meant as a pointer, because I know most people won’t read the entire thing but maybe have 5 minutes on their hands to read the relevant section.

      • Elias Griffin@lemmy.world · 1 year ago (edited)

        I feel bad for you OP, I get this a lot and I’m totally gonna go there because I feel your pain and your article was fantastic! I read almost every word ;p

        This phenomenon stems from an aversion that low-self-confidence people have to high-confidence people who make highly logical arguments - they make themselves feel unworthy/inadequate when justly critiqued/busted. It makes sense for them to feel that way too; I empathize. It’s hard to overcome the vapid rewarding and inflation in school. They should feel cheated and insolent at this whole situation.

        I’ll be honest in front of the internet: people (the majority, mind you - say 70-80% of Americans; I’m American) do not read every word of an article with full attention, because of ever-present and prevalent distractions, attention deficit, and motivation. They skip sentences or even paragraphs of things they expect they already know, apply bias before the conclusion, do not suspend their own perspective to understand yours for even a brief time, and come from a skeptical position whether they agree with it or not!

        In general, people also want to feel they have some valid perspective “truth” (as it’s all relative to them…) of their own to add and they want to be validated and acknowledged for it, as in school.

        Guess what, though: corporations, schools, market analysts, novelists, PR people, video game makers, communications managers, and small and medium businesses already know this! They even take a much more, ehh, progressive? approach about it, let’s say. That is, to simply not let people speak or give feedback at all. Nearly all comment sections are gone from websites, comment boxes are gone from retail shops, customer service is a bot, technical writers make videos now to go over what they just wrote, newspapers write for 4th graders, etc., etc.

        Nothing you said is even remotely condescending and nothing you said was out of order. Don’t defend yourself in these situations because it’s just encouragement for them to do it again. Don’t take it personally yourself, that is just the state of things.

        Improvise, Adapt, Re-engineer, Re-deploy, Overcome, repeat until done.