Anatomy of Gmane v2

Many people have been asking what technology / hardware is behind Gmane these days so I thought I’d put pen to paper (so to speak) and explain what’s going on under the hood.

In mid-August we received a disk from Lars with the Gmane spool on it. We had already decided to go with ElasticSearch for the document store: it gives us great scalability and, as we rebuild the site, it will allow us to have a fast search engine.

We’ve currently set up:

  • 4 x ElasticSearch data servers (these are off-the-shelf Delimiter dedicated servers) each with Dual L5630, 48GB RAM, 2 x 2TB disk.
  • 2 x ElasticSearch routers (Delimiter Cloud) each with 4 Core KVM VM, 16GB RAM, 50GB NVMe accelerated storage (Ceph).
  • 2 x Nginx webservers (Delimiter Cloud) each with 4 Core KVM VM, 32GB RAM, 100GB NVMe accelerated storage (Ceph).
  • 2 x Redis servers (Delimiter Cloud) each with 4 Core KVM VM, 32GB RAM, 100GB NVMe accelerated storage (Ceph).
  • 10TB ObjSpace (S3 compatible object storage) which handles the ElasticSearch backups.

On the webservers we have a mix of Python and PHP handling the various lookup functions. Redis caches the hot data to relieve some of the pressure on ElasticSearch during busy periods, and the ElasticSearch routers handle the queries into ElasticSearch.
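As a rough illustration of that lookup path, here is a minimal cache-aside sketch in Python. It is not our production code: `TTLCache` is an in-memory stand-in for Redis, `ArticleLookup` and the `article:` key prefix are hypothetical names, and the document store is represented by any function that returns an article body or `None`.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in for Redis: string keys whose values expire."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._data[key]
            return None
        return value

    def setex(self, key: str, ttl_seconds: float, value: str) -> None:
        self._data[key] = (self._clock() + ttl_seconds, value)


class ArticleLookup:
    """Cache-aside lookup: try the cache first, fall back to the store."""

    def __init__(
        self,
        cache: TTLCache,
        fetch_from_store: Callable[[str], Optional[str]],
        ttl_seconds: float = 300.0,  # assumed TTL, purely illustrative
    ) -> None:
        self._cache = cache
        self._fetch = fetch_from_store
        self._ttl = ttl_seconds
        self.store_reads = 0  # how often we had to go to the store

    def get(self, article_id: str) -> Optional[str]:
        key = f"article:{article_id}"
        cached = self._cache.get(key)
        if cached is not None:
            return cached
        # Cache miss: read from the document store and remember the result.
        self.store_reads += 1
        body = self._fetch(article_id)
        if body is not None:
            self._cache.setex(key, self._ttl, body)
        return body


if __name__ == "__main__":
    documents = {"12345": "Hello from the spool"}  # pretend document store
    lookup = ArticleLookup(TTLCache(), documents.get)
    lookup.get("12345")  # first request reads the store
    lookup.get("12345")  # second request is served from the cache
    print(lookup.store_reads)
```

The point of the pattern is that repeated requests for the same hot article only reach the document store once per TTL window, which is where the pressure relief during busy periods comes from.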

We’re working on adding the NNRP functionality into this, and Martin is coding an NNRP server that will use ElasticSearch as a backend. It works but is not ready for prime time yet. For now, NNRP remains running off INN.

We’re working between two priorities at the moment: a new NNRP frontend and a new mailer front/backend. Once we have all the functionality restored, we can start looking at the web interface and fixing up some of the rushed scripting that was done to get the site back online.

We’d love to hear your feedback: what needs sorting, and what would you like to see?

~ Mark

18 thoughts on “Anatomy of Gmane v2”

  1. That is an impressive amount of hardware you’re throwing at the project!

    When you write NNRP, do you mean NNTP?

    When do you start routing new emails into the new system? And start adding new mailing lists?

    1. NNRP as in the reader protocol variant. We’re not looking to build the transit side of it because there are products like Diablo / Cyclone that do that so much better.

      Once we’ve completed this and completed the mail handler then adding new mailing lists will be easy.

      ~ Mark

  2. Hello

    I am grateful you took over gmane. This is an impressive amount of engagement (humans and money). I thank your directors for their help and you, Mark & Martin, for your willpower in this process.

    I hope the road will be long.

    Best wishes
    Mat

  3. The original gmane is just based on scripts and a spool on disk. I am not sure why ElasticSearch is so important in rebuilding gmane. Since all pages are hosted on the public internet, let the search engines do the job. This could simplify the architecture and reduce a lot of the operating cost. (maybe a 4G VM is enough?)

    1. A disk-based spool with one message per file doesn’t scale well. Last week we did 8,625,529 requests; doing that on a raw file system is not fun. ElasticSearch is not just about searching, it’s about having a scalable, fault-tolerant, distributed document store.

      ~ Mark

    1. rss and blog will probably go live this week. We’ve got a few kinks to fix in there.

      Right now:

      – article
      – mid
      – permalink
      – dir
      – thread

      Once we complete the NNRP functionality, it will move from news.gmane.org to nnrp.gmane.org, which will allow us to point news.gmane.org back to webservices (yes, I know we can PBR, but it just adds more complexity).

      Remaining:

      – news (as above)
      – blog
      – rss
      – search

      ~ Mark

  4. Thanks for taking over stewardship of GMANE.

    A suggestion: This blog’s theme makes it practically unreadable. Look at what a difference disabling the stylesheet makes:

    Before: http://i.imgur.com/xXLyKgD.png
    After: http://i.imgur.com/DhBXUNA.png

    I’ll take the second one every time. The first one is like some kind of artsy poster design, except it’s supposed to be an article for reading. The second one is like…a web site that I can actually read on my computer.

      1. Thanks. It doesn’t seem to be fixed, but I assume it’s probably on your TODO list. It might be cool to have a public issue tracker so that users can avoid annoying you with the same bugs you’re already working on.
