join-lemmy.org regularly crawls all active Lemmy instances to keep the instance list updated. It also collects data from all Lemmy communities. The data is now publicly available in the following git repository:
https://github.com/LemmyNet/lemmy-statistics
Check the readme for details about the available data. Interestingly, the numbers are quite different from those shown on other websites:
| | join-lemmy.org | fediverse.observer | fedidb.com |
|---|---|---|---|
| Monthly Active Users | 42.170 | 36.336 | 50.063 |
| Instances | 512 | 376 | 446 |
Here are some ideas for what to do with the data:
- Recreate the Lemmymap, graphically showing the connections or defederations between instances.
- Render graphs, which could be added directly to join-lemmy.org (#532).
- Investigate what is causing the different numbers shown above.
- Run various types of analysis, like this one done by @malsadev.
- Build a tool to help users discover interesting and relevant communities.
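
As a starting point for any of these ideas, here is a minimal sketch of loading one of the gzipped JSON files from a local clone of the repository, such as `instances/full.json.gz`. The helper name `load_gzipped_json` is made up for illustration, and the sketch makes no assumption about the structure of the data; check the readme for the actual fields.

```python
import gzip
import json


def load_gzipped_json(path):
    """Read a gzip-compressed JSON file and return the parsed object.

    Hypothetical helper: it only assumes the file is UTF-8 encoded JSON
    compressed with gzip, nothing about the fields inside.
    """
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)
```

Calling `load_gzipped_json("instances/full.json.gz")` from the root of a clone should then give you plain Python lists and dicts to explore.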


Good job! I wonder why some of these are missing from fediverse.observer. There is an add instance page, and entering for example aussie.zone says it already exists, but the search doesn't find it.

Lemmy.cafe, fosscad.io are included in the statistics (file `instances/full.json.gz` in the git repo), but not currently shown on the website. Thanks to your comment I realized that we are only showing instances with registration applications on the official site, as there was concern that others would be overrun by spam bots. I had a look at it now, and instances with captcha are actually fine. Here you can see all that are newly listed on joinlemmy.
The instance list is sorted by monthly active users with slight randomization. As the instances you mention are among the larger ones, it is expected that they show up near the top. Is there a better sort method that you would suggest?
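
For anyone who wants to experiment with alternatives, here is a rough sketch of one way to sort by monthly active users with slight randomization. It is not necessarily how join-lemmy.org does it: the function name, the `monthly_active_users` key, the 10% jitter and the sample data are all assumptions made for illustration.

```python
import random


def sort_with_jitter(instances, jitter=0.1, rng=None):
    """Sort instances by active users, each count scaled by a random factor.

    `jitter` is the maximum relative change applied to a count, so with
    jitter=0.1 every count is multiplied by a factor in [0.9, 1.1] before
    comparing. Instances of similar size can swap places, while much
    larger instances still end up near the top.
    """
    if rng is None:
        rng = random.Random()

    def noisy_users(instance):
        factor = 1.0 + rng.uniform(-jitter, jitter)
        # "monthly_active_users" is a hypothetical field name.
        return instance["monthly_active_users"] * factor

    # sorted() calls the key function once per element, so each instance
    # gets exactly one random factor.
    return sorted(instances, key=noisy_users, reverse=True)


if __name__ == "__main__":
    # Made-up sample data, not taken from the statistics repository.
    sample = [
        {"name": "big.example", "monthly_active_users": 1000},
        {"name": "mid-a.example", "monthly_active_users": 105},
        {"name": "mid-b.example", "monthly_active_users": 100},
        {"name": "small.example", "monthly_active_users": 10},
    ]
    for instance in sort_with_jitter(sample):
        print(instance["name"])
```

With this approach, raising `jitter` gives smaller instances more chances to appear higher up, and setting it to 0 falls back to a plain sort by user count.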