Subject: History (and Future) of BoardGameGeek Architecture

Scott Alden
United States
Dallas
Texas
admin
Aldie's Full of Love!
BoardGameGeek started off as a purely dynamic site. There were no static files, and every time you looked at a page it was dynamically generated and displayed 'on the fly'. We queried the database for what you wanted, built the page in memory, and displayed the results to the user. We didn't use any templates - all the markup was embedded directly in the PHP code. The database and the web server were co-located on one machine, and I ran it in my bedroom off a DSL business line. This machine was affectionately called "gamegeek." Occasionally my maid would turn it off while cleaning - argh!

This setup is fine if you don't get much traffic. I would guesstimate that it would work for about 500 visitors/day.

Our first step to increase performance was to remove the images from the web server and host them on a separate machine (files). We also moved the servers to a co-located ISP.

This really doesn't help things from a dynamic point of view, but it offloaded some work from the main web server so things did speed up a bit. As traffic increases, so does the load on the database. We were still doing everything dynamically.

As the next step, we offloaded the database to a separate machine. This was a huge performance increase - probably the biggest we ever got. The database ran on a server with 2GB RAM for many years. Once the database started getting overloaded, we had to look for other improvements.

I added in the caching subsystem to offload work from the database. Instead of generating everything dynamically on every page request, we saved off a version of the page on the local web server's disk. If you were looking at page "X", the system would check for that page locally first. If it wasn't there then we would dynamically generate it, and save it off so that the next person to look at it would find it on the disk.
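For anyone who wants to see the idea in code, here is a minimal sketch of that check-the-disk-first flow. It is written in Python rather than the site's PHP, and the names (`CACHE_DIR`, `get_page`, `purge`, the `render` callback) are made up for illustration; it is not the actual BGG code.

```python
import hashlib
import tempfile
from pathlib import Path

# Hypothetical cache directory; the post does not say where the files live.
CACHE_DIR = Path(tempfile.mkdtemp(prefix="page_cache_"))


def cache_path(page_key: str) -> Path:
    """Map a page key to a file name that is safe on any filesystem."""
    digest = hashlib.sha256(page_key.encode("utf-8")).hexdigest()
    return CACHE_DIR / f"{digest}.html"


def get_page(page_key: str, render) -> str:
    """Serve the page from disk if it is there, else render it and save it."""
    path = cache_path(page_key)
    if path.exists():
        return path.read_text(encoding="utf-8")
    html = render(page_key)  # the expensive, database-backed step
    path.write_text(html, encoding="utf-8")
    return html


def purge(page_key: str) -> None:
    """Drop the cached copy so the next request regenerates it."""
    cache_path(page_key).unlink(missing_ok=True)
```

The first visitor to a page pays for `render`; everyone after that gets the file from disk until something calls `purge`.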

This is a huge savings in processing power. Many parts of the site don't change much, so storing these off helps a lot. Eventually I modularized each page so that instead of having 1 big file, it would be several files - each file being 1 part of an entire page. This let me selectively update parts of the site without wrecking the global cache of all the other stuff.
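A rough sketch of the per-fragment idea, again in Python and with hypothetical names (`fragment_cache`, `get_fragment`, `build_page`, `purge_fragment`). A plain dict stands in for the files on disk to keep the example short.

```python
from typing import Callable, Dict, List, Tuple

# In-memory stand-in for the per-fragment files on disk (an assumption).
fragment_cache: Dict[str, str] = {}


def get_fragment(name: str, render: Callable[[], str]) -> str:
    """Return the cached fragment, rendering and storing it on a miss."""
    if name not in fragment_cache:
        fragment_cache[name] = render()
    return fragment_cache[name]


def build_page(parts: List[Tuple[str, Callable[[], str]]]) -> str:
    """Assemble a page from (fragment_name, render_function) pairs."""
    return "\n".join(get_fragment(name, render) for name, render in parts)


def purge_fragment(name: str) -> None:
    """Invalidate one fragment; every other cached fragment stays intact."""
    fragment_cache.pop(name, None)
```

Purging only the fragment that changed (say, the list of posts) means the rest of the page is still served from cache on the next request.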

The problem we're running into now is load on the web server. Each server can only serve so many pages per day based on its capacity.

So, let's add another web server! However, we run into the problem of cache synchronization.... ugh.

Here's the problem in a nutshell:

Say you are surfing BGG on the 'wallace' server and post a reply to a thread. Everything on 'wallace' is updated fine and dandy with the current cache system. We just purge the cache file, and the next time someone looks at the thread we regenerate the file and show that to them. If someone else is over on the other server, 'kramer', that server has no knowledge of the new post and would just keep on showing the stale version of the thread (the one that doesn't have the post you just made).

Some kind of synchronization is necessary to keep all the servers up to date as much as possible.

The first method I tried was a centralized cache. I set up a directory on the file server to serve as a centralized cache storage area. Every web server mounted this directory using NFS (Network File System), so files were read from and written to the cache over the network. Unless your network is very high speed, this can cause a bottleneck. It did for us: web server processes stalled while waiting on files to be loaded from the NFS cache before they could continue. This caused a downward spiral in which the web server got more and more overloaded.

So we scratched the NFS server method.

Here's our new method - let's call it the DB Sync method:

- Each web server keeps its own cached files on the local disk.
- A centralized database of cache status is kept. It has an entry for every file in the cache and also whether the file is fresh or stale for each server.
- The web server checks the central database to see if the file has become stale before delivering it to a user.
- If it's stale, we regenerate the cache and update the database status to being fresh - yum!
- Any web server can mark a cache as stale. Say someone replies to a thread: the server that handled the request can mark the cache stale for all other servers.

This could potentially cause a bottleneck on the database, since each file needs to be looked up. If we keep a good index for fast access, the lookup is close to constant time and should be quick.
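The steps above can be sketched roughly as follows. This is only an illustration under assumptions: an in-memory SQLite database stands in for the central status database, a dict stands in for each server's local disk, and the table name `cache_status` and class `WebServer` are invented for the example.

```python
import sqlite3

# One shared connection stands in for the central status database.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE cache_status ("
    " cache_key TEXT NOT NULL,"
    " server TEXT NOT NULL,"
    " fresh INTEGER NOT NULL,"
    " PRIMARY KEY (cache_key, server))"  # the index that keeps lookups fast
)


class WebServer:
    def __init__(self, name):
        self.name = name
        self.local_cache = {}  # stand-in for cached files on the local disk

    def is_fresh(self, key):
        row = db.execute(
            "SELECT fresh FROM cache_status WHERE cache_key = ? AND server = ?",
            (key, self.name),
        ).fetchone()
        return row is not None and row[0] == 1

    def get(self, key, render):
        """Serve the local copy only if the central table says it is fresh."""
        if key in self.local_cache and self.is_fresh(key):
            return self.local_cache[key]
        html = render(key)
        self.local_cache[key] = html
        db.execute(
            "INSERT OR REPLACE INTO cache_status VALUES (?, ?, 1)",
            (key, self.name),
        )
        return html

    def mark_stale(self, key):
        """Mark every server's copy stale (this one included, for simplicity)."""
        db.execute("UPDATE cache_status SET fresh = 0 WHERE cache_key = ?", (key,))
```

In the wallace/kramer scenario, a reply handled by 'wallace' calls `mark_stale`, and the next `get` on 'kramer' sees the stale flag and regenerates the thread. A real version would also have to deal with a write that lands while a page is being regenerated, which this sketch ignores.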

We'll see how it goes...
 
Last edited Thu Sep 7, 2006 1:19 am (Total Number of Edits: 5)
Posted Thu Sep 7, 2006 1:10 am
mrbass
United States
Las Vegas
Nevada
Clever method for sure.
 
Mark Wilder
United States
Evanston
Illinois
This post is like the rules to Reef Encounter: I don't really follow it, but I'm sure it's good!
 
David desJardins
United States
Burlingame
California
A shared cacheserver is the best and most scalable approach. In addition to making it easy to invalidate entries, you also get a higher cache hitrate (because you don't have the problem of a user looking on one server for a page that's only cached on a different server). Using local caches can mean that adding more webservers actually lowers performance!

Network speed shouldn't be a problem. The problem is that NFS isn't really up to the job, especially NFS on a generic Linux server as opposed to a custom NFS appliance (which is very expensive). You need custom software, and so this is a lot of work to implement. Building fast cacheservers at Google was a big effort (and that was easier, in some ways, because search results aren't edited; they only change when the underlying data or algorithms change).

Querying a database for cache status makes sense. That's part of what you would end up doing with a custom cacheserver anyway (first step would be to query a database to determine the cache status of a page, before fetching it---one of the things that's really slow and inefficient about NFS is how it checks to see whether files have changed since you accessed them last).

It's too bad that growing websites all have to reinvent the same tools. But everyone's problems are a bit different, still.
 
Accipe gaudium ex vestri victorias. Accipe lectiones ex vestri damnis.
United States
38.978164N 76.486881W
Maryland
My only concern is that you have taken the network weight of the NFS and transferred it onto your db. Network load is generally cheaper than db load, especially for small numbers of machines in close proximity.

Did you consider a dedicated network strictly for the cache? This would run on separate cards between the two machines and run no other traffic. 1 Gbps cards are pretty cheap.

Sag.


 
Last edited Thu Sep 7, 2006 2:14 am (Total Number of Edits: 2)
Posted Thu Sep 7, 2006 2:12 am
Nasty McHaggis
United States
Columbia
South Carolina
I'm pouring all of my points in sarcasm.
badge
Box next to hand.
So, is somebody going to get laid, or what?
 
Jeff M
United States
Winter Park
Florida
Instead of using a pull (poll) model to check for invalidated cached files, how about a push model?

Set up a subscription service on one of the servers... each web server (consumer) subscribes to the service to receive invalidate cache file events.

Each web server also publishes the invalidate events when it has processed a web request that requires one or more cache files to be invalidated.

Depending on your setup, instead of a service, you could use UDP/IP (or some reliable datagram protocol), and each web server could simply broadcast the events to your subnet and receive the event packets/datagrams on the same socket (endpoint). This leaves out the middle man (the subscription server acting as a relay).

PS (edit): of course this method does leave a small timing window, due to the propagation delay of the event(s), that the database model doesn't have (assuming the info is being written to the database atomically with the modified data that requires the invalidation in the first place)... but in this application the tiny window shouldn't make a difference.
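To make the push model concrete, here is a small in-process sketch. It only shows the publish/subscribe shape; the `InvalidationBus` and `PushWebServer` classes are hypothetical, and a real setup would replace the direct callback with a network hop (a relay service or datagrams on the subnet).

```python
class InvalidationBus:
    """Hypothetical relay: forwards each invalidation to every subscriber."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, cache_key):
        for callback in self.subscribers:
            callback(cache_key)


class PushWebServer:
    def __init__(self, name, bus):
        self.name = name
        self.bus = bus
        self.local_cache = {}  # stand-in for cached files on the local disk
        bus.subscribe(self.on_invalidate)

    def on_invalidate(self, cache_key):
        # Drop the local copy; the next request regenerates it.
        self.local_cache.pop(cache_key, None)

    def get(self, cache_key, render):
        if cache_key not in self.local_cache:
            self.local_cache[cache_key] = render(cache_key)
        return self.local_cache[cache_key]

    def post_reply(self, cache_key):
        # After the write is stored, tell every server (including this one).
        self.bus.publish(cache_key)
```

Note that `get` never asks anyone whether its copy is fresh. That is the appeal of pushing, and also why a missed event leaves a stale page behind.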
Aldie wrote:
Here's our new method - let's call it the DB Sync method:

- Each web server keeps its own cached files on the local disk.
- A centralized database of cache status is kept. It has an entry for every file in the cache and also whether the file is fresh or stale for each server.
- The web server checks the central database to see if the file has become stale before delivering it to a user.
- If it's stale, we regenerate the cache and update the database status to being fresh - yum!
- Any web server can mark a cache as stale. Say someone replies to a thread, the server that took the process can mark the cache stale for all other servers.

This could potentially cause a bottleneck on the database (each file needs to be looked up). If we keep a good index for fast access, the look up is constant time and should be quick.

We'll see how it goes...
 
Last edited Thu Sep 7, 2006 2:26 am (Total Number of Edits: 1)
Posted Thu Sep 7, 2006 2:16 am
David desJardins
United States
Burlingame
California
Re: Re: History (and Future) of BoardGameGeek Architecture
JeffyJeff wrote:
Instead of using a pull (poll) model to check for invalidated cached files, how about a push model?


The main downside is that if your webserver restarts, you don't have any way to check the validity of anything in your cache. So you have to throw it all away, when you restart. This reduces the volume of traffic you can support with a given number of webservers, especially if you might have bugs that cause them to restart occasionally (I don't know if Scott does, but I always have).

You also run the risk, if you get up to a really large number of webservers, that the load on each webserver from just processing all of the invalidation requests from other webservers will be significant. That isn't likely to happen for a while, but when it does, it's a huge problem, because at that point you don't get any further performance increase from adding more servers (the total work to process the invalidation requests grows in proportion to the number of servers, so if your servers are all busy processing invalidation requests, and you add more servers, then they will all still be busy).
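A toy model of that ceiling, with made-up parameters: suppose each server can do `capacity` page-equivalents of work per second, each page view triggers `writes_per_page` invalidations on average, and handling one invalidation costs `invalidation_cost` page-equivalents. Every server sees every invalidation, so its load at total page rate R across N servers is R/N + R * writes_per_page * invalidation_cost. These names and the model itself are assumptions for illustration, not measurements.

```python
def max_page_rate(servers, capacity, writes_per_page, invalidation_cost):
    """Highest total page rate the cluster can sustain under this toy model.

    Per-server load is rate / servers (its share of page views) plus
    rate * writes_per_page * invalidation_cost (every invalidation in the
    system). Setting that equal to `capacity` and solving for `rate`
    gives the expression below.
    """
    return capacity / (1.0 / servers + writes_per_page * invalidation_cost)


def ceiling(capacity, writes_per_page, invalidation_cost):
    """Limit of max_page_rate as the number of servers grows without bound."""
    return capacity / (writes_per_page * invalidation_cost)


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, round(max_page_rate(n, 100, 0.01, 0.1)))
    print("ceiling", round(ceiling(100, 0.01, 0.1)))
```

With few servers the 1/N term dominates and adding machines helps almost linearly. Once N is large, the invalidation term dominates and the sustainable rate flattens out below `ceiling`, no matter how many servers are added.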
 
Last edited Thu Sep 7, 2006 2:31 am (Total Number of Edits: 1)
Posted Thu Sep 7, 2006 2:22 am
Scott Alden
United States
Dallas
Texas
admin
Aldie's Full of Love!
Re: Re: History (and Future) of BoardGameGeek Architecture
JeffyJeff wrote:
Instead of using a pull (poll) model to check for invalidated cached files, how about a push model?

Setup a subscription service on one of the servers... each web server (consumer) subscribes to the service to receive invalidate cache file events.

Each web server also publishes the invalidate events when it has processed a web request that requires one or more cache files to be invalidated.

Depending on your setup, instead of a service, could use UDP/IP (or some reliable datagram protocol) and each web server could simply broadcast the events to your subnet and receive the event packets/datagrams on the same socket (endpoint). This leaves out the middle man (the subscription server acting as a relay).



I actually did consider this and wrote some code to test things out...for some reason the central database method appealed to me more.
 
Last edited Thu Sep 7, 2006 2:40 am (Total Number of Edits: 1)
Posted Thu Sep 7, 2006 2:23 am
David desJardins
United States
Burlingame
California
Aldie wrote:
- Each web server keeps its own cached files on the local disk.


By the way, one way to make this sort of approach significantly more efficient is if you can implement some sort of "affinity" among webservers. I.e., you want anyone who accesses file X to usually go to server Y. That way, file X will tend to be cached and current on server Y, so they will usually get it out of the cache. It doesn't matter whether it's cached on other servers, as they are rarely asked for it. You would still allow the request for X to go to some other server, if Y is too busy, and that other server can still handle the request correctly (but at higher cost).

But adding affinity may not be feasible unless you have a relatively sophisticated system that's balancing the load among your multiple webservers.
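One simple way to get this kind of affinity is to hash the file key to a "home" server and only fall back to another server when the home one is busy. The sketch below assumes the load balancer can run such a function and knows which servers are busy; `preferred_server`, `route`, and the `is_busy` callback are hypothetical names.

```python
import hashlib


def preferred_server(cache_key, servers):
    """Pick the same server for the same key every time, via a stable hash.

    hashlib is used instead of the built-in hash(), which is randomized
    per process for strings and so would not agree across machines.
    """
    digest = hashlib.sha256(cache_key.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


def route(cache_key, servers, is_busy):
    """Prefer the key's home server; fall back to the next non-busy one."""
    start = servers.index(preferred_server(cache_key, servers))
    for offset in range(len(servers)):
        candidate = servers[(start + offset) % len(servers)]
        if not is_busy(candidate):
            return candidate
    return servers[start]  # everything is busy: stay with the home server
```

With modulo hashing, adding or removing a server moves most keys to a new home, so the caches would need time to warm up again after a change in the server list.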
 
Steve Wood
United States
Stamford
Connecticut
Scott, I've been wondering...

Did you start building BGG by modifying an existing PHP forum application, or is the entire site built from scratch?

I wonder this because the framework is quite unique... with the entire site essentially hanging off of a core forum application. Most sites have a core application and then tack on a forum later, but BGG seems essentially the opposite, and it works well.



 
Scott Alden
United States
Dallas
Texas
admin
Aldie's Full of Love!
Re: Re: History (and Future) of BoardGameGeek Architecture
DaviddesJ wrote:
Aldie wrote:
- Each web server keeps its own cached files on the local disk.


By the way, one way to make this sort of approach significantly more efficient is if you can implement some sort of "affinity" among webservers. I.e., you want anyone who accesses file X to usually go to server Y. That way, file X will tend to be cached and current on server Y, so they will usually get it out of the cache. It doesn't matter whether it's cached on other servers, as they are rarely asked for it. You would still allow the request for X to go to some other server, if Y is too busy, and that other server can still handle the request correctly (but at higher cost).

But adding affinity may not be feasible unless you have a relatively sophisticated system that's balancing the load among your multiple webservers.


Right now I am trying to get users to stick to a certain server - since I do a lot with the user's session. I would lose that if I flip flop.
 
Scott Alden
United States
Dallas
Texas
admin
Aldie's Full of Love!
Re: Re: History (and Future) of BoardGameGeek Architecture
Skadar wrote:
Scott, I've been wondering...

Did you start building BGG by modifying an existing PHP forum application, or is the entire site built from scratch?

I wonder this because the framework is quite unique... with the entire site essentially hanging off of a core forum application. Most sites have a core application and then tack on a forum later, but BGG seems essentially the opposite, and it works well.


I started from scratch - and then added in different forums (ultimateBB and phpBB) at different times. But since they were on a different user system they never felt fully integrated, so I just gave up and rolled my own.
 
Last edited Thu Sep 7, 2006 2:45 am (Total Number of Edits: 1)
Posted Thu Sep 7, 2006 2:44 am
Jeff M
United States
Winter Park
Florida
Re: Re: Re: History (and Future) of BoardGameGeek Architecture
Assuming frequently rebooted web servers, and without additions to this model, you are right that you would have to invalidate your whole local cache. If, however, you are only restarting the web server process(es), the process receiving the invalidate events can still be running and keeping the cache synced.

I sure hope Scott's setup doesn't require frequent server reboots. However, if that were an issue, the subscription service could be set up so that subscriptions are persistent and delivery is reliable. I.e., if the service detects that the client went away, it would start storing the events, and when the consumer came back, it would push out the stored events to that consumer.

That is not as simple as the simple relay, but there are probably off-the-shelf implementations out there (not counting CORBA implementations).
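A bare-bones sketch of that store-and-forward behavior, with a hypothetical `PersistentBroker` class. The backlog here lives in memory, so it would not survive a restart of the broker itself; a real service would have to write it somewhere durable.

```python
from collections import deque


class PersistentBroker:
    """Hypothetical relay that queues events for consumers that are offline."""

    def __init__(self):
        self.consumers = {}  # name -> callback, or None while disconnected
        self.backlog = {}    # name -> events waiting for that consumer

    def connect(self, name, callback):
        """(Re)attach a consumer and replay anything it missed, in order."""
        self.consumers[name] = callback
        pending = self.backlog.setdefault(name, deque())
        while pending:
            callback(pending.popleft())

    def disconnect(self, name):
        """Keep the subscription, but start storing events for later."""
        self.consumers[name] = None

    def publish(self, event):
        for name, callback in self.consumers.items():
            if callback is None:
                self.backlog.setdefault(name, deque()).append(event)
            else:
                callback(event)
```

A web server that comes back after a restart calls `connect` again and receives the invalidations it missed before any new ones, so it does not have to throw away its whole local cache.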
DaviddesJ wrote:
JeffyJeff wrote:
Instead of using a pull (poll) model to check for invalidated cached files, how about a push model?


The main downside is that if your webserver restarts, you don't have any way to check the validity of anything in your cache. So you have to throw it all away, when you restart. This reduces the volume of traffic you can support with a given number of webservers, especially if you might have bugs that cause them to restart occasionally (I don't know if Scott does, but I always have).
 
Scott Alden
United States
Dallas
Texas
admin
Aldie's Full of Love!
Re: Re: History (and Future) of BoardGameGeek Architecture
Sagrilarus wrote:
My only concern is that you have taken the network weight of the NFS and transferred it onto your db. Network load is generally cheaper than db load, especially for small numbers of machines in close proximity.

Did you consider a dedicated network strictly for the cache? This would run on separate cards between the two machines and run no other traffic. 1Gbps cards are pretty cheap.


I didn't really consider it - my access to fiddling with hardware is very limited. It's ice cold in the server room - I don't like spending a lot of time in there... and I bet it would take me at least a full day to get that up and running.

Not to mention there's no bathroom access. I have to call the NOC anytime I want to pee!
 
Last edited Thu Sep 7, 2006 3:05 am (Total Number of Edits: 1)
Posted Thu Sep 7, 2006 2:49 am
Jeff M
United States
Winter Park
Florida
By the way, unrelated, but I noticed the new QuickReply/QuickQuote functionality. Nice, however the QuickQuote seems to prepend a "Re: " to the subject, even if it already starts with "Re: ". QuickReply (and non-quick reply/quote) don't do this.
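For illustration, the behavior I'd expect looks something like this. It is a guess at the intent, not the site's code, and `reply_subject` is a made-up name.

```python
def reply_subject(subject: str) -> str:
    """Prefix "Re: " only when the subject does not already start with it."""
    if subject.lower().startswith("re:"):
        return subject
    return f"Re: {subject}"
```

Applied repeatedly, this leaves a single "Re: " in front no matter how deep the reply chain gets.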
 
Jason Henke
United States
Maple Grove
Minnesota
Hey Aldie,

Thank you for sharing these experiences; I actually find it very interesting.
 
Scott Alden
United States
Dallas
Texas
admin
Aldie's Full of Love!
JeffyJeff wrote:
By the way, unrelated, but I noticed the new QuickReply/QuickQuote functionality. Nice, however the QuickQuote seems to prepend a "Re: " to the subject, even if it already starts with "Re: ". QuickReply (and non-quick reply/quote) don't do this.


Fixed... Thanks!
 
Scott Alden
United States
Dallas
Texas
admin
Aldie's Full of Love!
jhenke wrote:
Hey Aldie,

Thank you for sharing these experiences; I actually find it very interesting.


No problem - this writeup was very brief in the details - but I hope it conveyed a little of what goes on behind the scenes here.
 
Jared Heath
United States
Dallas
Texas
Sagrilarus wrote:
My only concern is that you have taken the network weight of the NFS and transferred it onto your db. Network load is generally cheaper than db load, especially for small numbers of machines in close proximity.

Did you consider a dedicated network strictly for the cache? This would run on separate cards between the two machines and run no other traffic. 1Gbps cards are pretty cheap.

Sag.




I agree with this... separating the caching onto its own isolated subnet would probably help the network bottleneck (if there is one).
 
Stephen Glenn
United States
Virginia Beach
Virginia
designer
Longbow wrote:
So, is somebody going to get laid, or what?


Sure, but you have to have lots of cache.

 
Jared Heath
United States
Dallas
Texas
Aldie, just curious:

Did you consider Squid with parent/child hierarchy caching at all? I've been playing with it at home some, but don't know the viability of it on this large a scale.

Can't you do something akin to database replication on caches as well so that they are always in synch?
 
David desJardins
United States
Burlingame
California
Re: Re: History (and Future) of BoardGameGeek Architecture
Sagrilarus wrote:
My only concern is that you have taken the network weight of the NFS and transferred it onto your db. Network load is generally cheaper than db load, especially for small numbers of machines in close proximity.


The network load isn't significant. The NFS server is slow because NFS is a slow, inefficient protocol (and also because the Linux implementation of the NFS server isn't so great). The database is much, much more efficient at storing and fetching change events than making changes by adding and removing files and looking them up through NFS accesses. This would be true even if the network were infinitely fast.
 
Last edited Thu Sep 7, 2006 3:32 am (Total Number of Edits: 1)
Posted Thu Sep 7, 2006 3:31 am
Scott Alden
United States
Dallas
Texas
admin
Aldie's Full of Love!
jaredh wrote:
Aldie, just curious:

Did you consider Squid with parent/child hierarchy caching at all? I've been playing with it at home some, but don't know the viability of it on this large a scale.

Can't you do something akin to database replication on caches as well so that they are always in synch?


This is a possibility - and I have one more server that I may turn into a Squid cache. I would need to figure out a way to make logged-in users bypass Squid. I assume this is possible because it's what Wikipedia does.
 
Paul McKinney
United States
Austin
Texas
From my experience, unless you have some really high-speed networking going and/or proprietary equipment/solutions, I've always known NFS as 'N'ot 'F'a'S't.
 