I was looking at something like this (a hierarchical index of existing Lemmy communities) a while back, but dropped the project when I realized just how slowly I was progressing and that I didn’t want to go through the hassle of setting it up for crowdsourcing. Lessons learned and questions asked:
Organizing hierarchies is hard, and they do have to be organized from the top down if you want to come out with something good: the number of categories at the topmost level has to be kept small for the hierarchy to remain usable. (The important, widely used groups on Usenet always fit under either the Big-8 hierarchies or alt; anything else was small, regional/local/server-specific, or non-English. I don’t think Usenet’s hierarchy is really appropriate for Lemmy. It’s kind of suboptimal even for Usenet, with the alt hierarchy containing a bunch of stuff that should be elsewhere.) I think I ended up with about a dozen top-level categories during my experiment.
Categorization can’t be usefully automated (at least, not at present). All categories have to be assigned by a human if we want them to make sense. At most, uncategorized communities can be auto-dumped into an “unsorted” top-level category for examination.
Many communities don’t fit neatly under one category. Usenet’s hierarchy was a strict tree, so everything had to be in one, and only one, place. I decided to allow three places in my experiment (plus a language marker), so that, say, a community for an amateur softball league in the greater Toronto area could be placed under all of (1) Lifestyle and culture > Fitness > Sports Leagues > Softball; (2) Sports > Amateur, kids, and local > Softball; and (3) Regional > North America > Canada > Ontario > Toronto. So that’s a thing that any categorization initiative has to think about.
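For anyone who wants to play with the idea, here is a minimal Python sketch of the "up to three places plus a language marker" scheme, with uncategorized communities falling back to an unsorted bucket. This is not the code from my experiment. The class, field, and function names, the "Unsorted" label, the "und" default language code, and the community names are all made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field

MAX_PLACEMENTS = 3        # the limit described above; any small number would do
UNSORTED = ("Unsorted",)  # assumed name for the catch-all top-level category


@dataclass
class Community:
    """One indexed community. All names here are illustrative assumptions."""
    name: str
    language: str = "und"  # language marker; "und" is the ISO 639 code for undetermined
    placements: list[tuple[str, ...]] = field(default_factory=list)

    def place(self, *path: str) -> None:
        """Record one human-assigned category path, top level first."""
        if not path:
            raise ValueError("a category path needs at least one level")
        if path in self.placements:
            return  # already filed under this path
        if len(self.placements) >= MAX_PLACEMENTS:
            raise ValueError(f"{self.name} already has {MAX_PLACEMENTS} placements")
        self.placements.append(path)

    def paths(self) -> list[tuple[str, ...]]:
        """Paths to list this community under; uncategorized ones go to Unsorted."""
        return list(self.placements) or [UNSORTED]


def build_index(communities: list[Community]) -> dict[tuple[str, ...], list[str]]:
    """Map each category path to the names of the communities filed under it."""
    index: dict[tuple[str, ...], list[str]] = {}
    for community in communities:
        for path in community.paths():
            index.setdefault(path, []).append(community.name)
    return index


# Hypothetical example data: the softball league from the text, plus one
# community that nobody has categorized yet.
softball = Community("gta_softball", language="en")
softball.place("Lifestyle and culture", "Fitness", "Sports Leagues", "Softball")
softball.place("Sports", "Amateur, kids, and local", "Softball")
softball.place("Regional", "North America", "Canada", "Ontario", "Toronto")
mystery = Community("mystery")

index = build_index([softball, mystery])
```

Storing each path as a tuple of levels rather than one delimited string avoids trouble with category names that contain commas or ">" themselves. A fourth call to place() on the softball community raises ValueError, which is one way of enforcing the limit; silently ignoring the extra placement would be another.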
Is it worth noting issues other than “NSFW” that may exist with a given community, since a human has to be looking at these anyway? This could be anything from laissez-faire free-speech-absolutism moderation to the presence of photos showing nipples in a nonsexual context (a breast-feeding community, for example).
Should communities on servers that almost no one federates with (lemmygrad) be included? Communities with illegal material (piracy links)? Communities with really illegal material (CSAM)? Where does the line get drawn?
This whole thing is a good-sized moving target, so a crowdsourcing effort is necessary.
Not all communities present enough information to make them easily categorizable by someone who is not a member.
Should a line be drawn regarding category abuse? The Usenet alt hierarchy contains a lot of what I would characterize as joking juvenile vandalism (alt.newgroup.for.fun.fun.fun is a SFW example of the kind of thing I’m talking about—many of the others have NSFW-ish names). Is this even worth guarding against?