Scott Alden
United States
Dallas
Texas
admin
BoardGameGeek started off as a purely dynamic site. There were no static files, and every time you looked at a page it was dynamically generated and displayed 'on the fly'. We queried the database for what you wanted, built the page in memory, and displayed the results to the user. We didn't use any templates - all the HTML was embedded in the PHP code. The database and the web server were co-located on one machine and I ran it in my bedroom off a DSL business line. This machine was affectionately called "gamegeek." Occasionally my maid would turn it off while cleaning - argh!

This setup is fine if you don't get much traffic. I would guesstimate that it would work for about 500 visitors/day.

Our first step to increase performance was to remove the images from the web server and host them on a separate machine (files). We also moved the servers to a co-located ISP.

This really doesn't help things from a dynamic point of view, but it offloaded some work from the main web server so things did speed up a bit. As traffic increases, so does the load on the database. We were still doing everything dynamically.

As the next step, we offloaded the database to a separate machine. This was a huge performance increase - probably the biggest we ever got. The database ran on a server with 2GB RAM for many years. Once the database started getting overloaded, we had to look for other improvements.

I added in the caching subsystem to offload work from the database. Instead of generating everything dynamically on every page request, we saved off a version of the page on the local web server's disk. If you were looking at page "X", the system would check for that page locally first. If it wasn't there then we would dynamically generate it, and save it off so that the next person to look at it would find it on the disk.
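A minimal sketch of that check-the-disk-first flow, written in Python rather than the site's PHP. The cache directory, the `get_page`/`purge` names and the `render` callback are assumptions made for illustration, not the actual BGG code.

```python
import hashlib
import tempfile
from pathlib import Path

# Hypothetical cache location on the web server's local disk.
CACHE_DIR = Path(tempfile.gettempdir()) / "page_cache_sketch"


def cache_path(page_key: str) -> Path:
    """Map a page key to a file name that is safe to store on disk."""
    digest = hashlib.sha1(page_key.encode("utf-8")).hexdigest()
    return CACHE_DIR / f"{digest}.html"


def get_page(page_key: str, render) -> str:
    """Return the cached page if present; otherwise render it and save it."""
    path = cache_path(page_key)
    if path.exists():
        return path.read_text(encoding="utf-8")
    html = render(page_key)  # the expensive, database-backed step
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_text(html, encoding="utf-8")
    tmp.replace(path)  # rename into place so readers never see a partial file
    return html


def purge(page_key: str) -> None:
    """Drop the cached copy so the next request regenerates it."""
    cache_path(page_key).unlink(missing_ok=True)
```

The first visitor to a page pays for the render; everyone after that gets the file from disk until something calls `purge`.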

This is a huge savings in processing power. Many parts of the site don't change much, so storing these off helps a lot. Eventually I modularized each page so that instead of having 1 big file, it would be several files - each file being 1 part of an entire page. This let me selectively update parts of the site without wrecking the global cache of all the other stuff.
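The modular version can be sketched the same way: a page is assembled from named fragments, and purging one fragment leaves the rest of the cache alone. This is an in-memory Python illustration under assumed names (`get_fragment`, `build_page`, `purge_fragment`); the real system stored the parts as files.

```python
from typing import Callable, Dict, List, Tuple

Renderer = Callable[[], str]

# Hypothetical fragment store: fragment name -> rendered HTML.
fragment_cache: Dict[str, str] = {}


def get_fragment(name: str, render: Renderer) -> str:
    """Render a fragment only if it is not already cached."""
    if name not in fragment_cache:
        fragment_cache[name] = render()
    return fragment_cache[name]


def build_page(parts: List[Tuple[str, Renderer]]) -> str:
    """Assemble a page from its fragments, in order."""
    return "\n".join(get_fragment(name, render) for name, render in parts)


def purge_fragment(name: str) -> None:
    """Invalidate one fragment without touching any of the others."""
    fragment_cache.pop(name, None)
```

After a new reply, only the fragment holding the post list needs purging; the header, sidebar and so on keep being served from cache.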

The problem we're running into now is load on the web server. Each server can only serve so many pages per day based on its capacity.

So, let's add another web server! However, we run into the problem of cache synchronization.... ugh.

Here's the problem in a nutshell:

Say you are surfing BGG on the 'wallace' server and post a reply to a thread. Everything on 'wallace' is updated fine and dandy with the current cache system. We just purge the cache file, and the next time someone looks at the thread we regenerate the file and show that to them. But someone over on the other server, 'kramer', gets no notice of the new post: 'kramer' would just keep on showing the stale version of the thread (the one that doesn't have the post you just made).
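The stale-read problem is easy to reproduce in a toy model. In this Python sketch the `WebServer` class and its methods are hypothetical; the only point is that each server purges its own cache and nobody else's.

```python
class WebServer:
    """Toy web server with a private cache in front of a shared database."""

    def __init__(self, name, database):
        self.name = name
        self.database = database  # shared list of posts (the "database")
        self.cache = {}           # local to this server only

    def view_thread(self):
        # Serve from the local cache, filling it from the database on a miss.
        if "thread" not in self.cache:
            self.cache["thread"] = list(self.database)
        return self.cache["thread"]

    def post_reply(self, text):
        self.database.append(text)
        self.cache.pop("thread", None)  # only THIS server's cache is purged


posts = ["first post"]
wallace = WebServer("wallace", posts)
kramer = WebServer("kramer", posts)

wallace.view_thread()
kramer.view_thread()            # both servers now hold a cached copy
wallace.post_reply("my reply")

fresh = wallace.view_thread()   # regenerated, includes the reply
stale = kramer.view_thread()    # still the old cached copy
```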

Some kind of synchronization is necessary to keep all the servers up to date as much as possible.

The first method I tried was a centralized cache. I set up a directory on the file server which would serve as a centralized cache storage area. Every web server mounted this directory using NFS (Network File System), so files were read from and written to the cache over the network. Unless your network is very high speed, this can cause a bottleneck. It did for us: it caused a major bottleneck which forced web server processes to stall (waiting on files to be loaded from the NFS cache) before they could continue. This caused a downward spiral in which the web server got more and more overloaded.

So we scratched the NFS server method.

Here's our new method - let's call it the DB Sync method:

- Each web server keeps its own cached files on the local disk.
- A centralized database of cache status is kept. It has an entry for every file in the cache and also whether the file is fresh or stale for each server.
- The web server checks the central database to see if the file has become stale before delivering it to a user.
- If it's stale, we regenerate the cache and update the database status to being fresh - yum!
- Any web server can mark a cache as stale. Say someone replies to a thread: the server that handled the request can mark the cache stale for all other servers.

This could potentially cause a bottleneck on the database (each file needs to be looked up). If we keep a good index for fast access, the lookup is close to constant time and should be quick.
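A minimal sketch of the status table behind the DB Sync method, using Python's built-in `sqlite3` as a stand-in for the central database. The table name, column names and helper functions are assumptions for illustration only.

```python
import sqlite3


def open_status_db() -> sqlite3.Connection:
    """Create the (hypothetical) central cache-status table."""
    db = sqlite3.connect(":memory:")  # stand-in for the central database
    db.execute(
        "CREATE TABLE cache_status ("
        " cache_key TEXT NOT NULL,"
        " server TEXT NOT NULL,"
        " stale INTEGER NOT NULL DEFAULT 1,"
        " PRIMARY KEY (cache_key, server))"  # the key doubles as the lookup index
    )
    return db


def is_stale(db: sqlite3.Connection, cache_key: str, server: str) -> bool:
    """Check one file for one server; files with no entry count as stale."""
    row = db.execute(
        "SELECT stale FROM cache_status WHERE cache_key = ? AND server = ?",
        (cache_key, server),
    ).fetchone()
    return row is None or row[0] == 1


def mark_fresh(db: sqlite3.Connection, cache_key: str, server: str) -> None:
    """Record that this server has just regenerated its copy."""
    db.execute(
        "INSERT OR REPLACE INTO cache_status (cache_key, server, stale)"
        " VALUES (?, ?, 0)",
        (cache_key, server),
    )
    db.commit()


def mark_stale_everywhere(db: sqlite3.Connection, cache_key: str) -> None:
    """Called after a write: every server's copy of this file becomes stale."""
    db.execute("UPDATE cache_status SET stale = 1 WHERE cache_key = ?", (cache_key,))
    db.commit()
```

For simplicity this marks the writing server's own entry stale too, so it regenerates on its next read. Note that a B-tree index such as this primary key gives logarithmic rather than strictly constant lookups, which is still quick at this scale.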

We'll see how it goes...
Last edited on 2006-09-06 19:19:27 CST (Total Number of Edits: 5)
mrbass
United States
Las Vegas
Nevada
Clever method for sure.
Mark Wilder
United States
Evanston
Illinois
This post is like the rules to Reef Encounter: I don't really follow it, but I'm sure it's good!
David desJardins
United States
Burlingame
California
A shared cacheserver is the best and most scalable approach. In addition to making it easy to invalidate entries, you also get a higher cache hitrate (because you don't have the problem of a user looking on one server for a page that's only cached on a different server). Using local caches can mean that adding more webservers actually lowers performance!

Network speed shouldn't be a problem. The problem is that NFS isn't really up to the job, especially NFS on a generic Linux server as opposed to a custom NFS appliance (which is very expensive). You need custom software, and so this is a lot of work to implement. Building fast cacheservers at Google was a big effort (and that was easier, in some ways, because search results aren't edited; they only change when the underlying data or algorithms change).

Querying a database for cache status makes sense. That's part of what you would end up doing with a custom cacheserver anyway (first step would be to query a database to determine the cache status of a page, before fetching it---one of the things that's really slow and inefficient about NFS is how it checks to see whether files have changed since you accessed them last).

It's too bad that growing websites all have to reinvent the same tools. But everyone's problems are a bit different, still.
(Retired)
United States
38.978164N 76.486881W
Maryland
My only concern is that you have taken the network weight of the NFS and transferred it onto your db. Network load is generally cheaper than db load, especially for small numbers of machines in close proximity.

Did you consider a dedicated network strictly for the cache? This would run on separate cards between the two machines and run no other traffic. 1Gbps cards are pretty cheap.

Sag.
Last edited on 2006-09-06 20:14:24 CST (Total Number of Edits: 2)
Nasty McHaggis
United States
Columbia
South Carolina
So, is somebody going to get laid, or what?
Jeff M
United States
Windham
New Hampshire
Instead of using a pull (poll) model to check for invalidated cached files, how about a push model?

Set up a subscription service on one of the servers... each web server (consumer) subscribes to the service to receive invalidate cache file events.

Each web server also publishes the invalidate events when it has processed a web request that requires one or more cache files to be invalidated.

Depending on your setup, instead of a service, you could use UDP/IP (or some reliable datagram protocol), and each web server could simply broadcast the events to your subnet and receive the event packets/datagrams on the same socket (endpoint). This leaves out the middle man (the subscription server acting as a relay).

ps (edit): of course this method does leave a small timing window due to propagation delay of the event(s) that the database model doesn't have (assuming the info is being written to the database atomically with the modified data that requires the invalidation in the 1st place)... but in this application the tiny window shouldn't make a difference.
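A rough sketch of the datagram variant using Python's standard `socket` module. To keep it runnable on one machine it sends to an explicit list of peer addresses on loopback instead of broadcasting to a subnet, and the function names and the plain-text message format are assumptions. Plain UDP gives no delivery guarantee, so a lost datagram means a stale page until the next purge.

```python
import socket


def open_listener(host: str = "127.0.0.1", port: int = 0) -> socket.socket:
    """Open the UDP socket a web server listens on for invalidate events."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))  # port 0 lets the OS pick a free port
    sock.settimeout(2.0)     # do not block forever if a datagram is lost
    return sock


def publish_invalidate(cache_key: str, peers) -> None:
    """Send one invalidate event to every peer address in the list."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        for address in peers:
            sender.sendto(cache_key.encode("utf-8"), address)


def receive_invalidate(listener: socket.socket, local_cache: dict) -> str:
    """Wait for one event and drop the matching entry from the local cache."""
    data, _sender = listener.recvfrom(4096)
    cache_key = data.decode("utf-8")
    local_cache.pop(cache_key, None)
    return cache_key
```

Each web server would run `receive_invalidate` in a loop in a small background process and call `publish_invalidate` after handling a write.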
Aldie wrote:
Here's our new method - let's call it the DB Sync method:

- Each web server keeps its own cached files on the local disk.
- A centralized database of cache status is kept. It has an entry for every file in the cache and also whether the file is fresh or stale for each server.
- The web server checks the central database to see if the file has become stale before delivering it to a user.
- If it's stale, we regenerate the cache and update the database status to being fresh - yum!
- Any web server can mark a cache as stale. Say someone replies to a thread, the server that took the process can mark the cache stale for all other servers.

This could potentially cause a bottleneck on the database (each file needs to be looked up). If we keep a good index for fast access, the look up is constant time and should be quick.

We'll see how it goes...
Last edited on 2006-09-06 20:26:57 CST (Total Number of Edits: 1)
David desJardins
United States
Burlingame
California
Re: Re: History (and Future) of BoardGameGeek Architecture
JeffyJeff wrote:
Instead of using a pull (poll) model to check for invalidated cached files, how about a push model?


The main downside is that if your webserver restarts, you don't have any way to check the validity of anything in your cache. So you have to throw it all away, when you restart. This reduces the volume of traffic you can support with a given number of webservers, especially if you might have bugs that cause them to restart occasionally (I don't know if Scott does, but I always have).

You also run the risk, if you get up to a really large number of webservers, that the load on each webserver from just processing all of the invalidation requests from other webservers will be significant. That isn't likely to happen for a while, but when it does, it's a huge problem, because at that point you don't get any further performance increase from adding more servers (the total work to process the invalidation requests grows in proportion to the number of servers, so if your servers are all busy processing invalidation requests, and you add more servers, then they will all still be busy).
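One way to see the saturation is a deliberately simple model, which is an assumption of this sketch and not something measured on BGG: every server can do `capacity` page-sized units of work per second, every page served triggers `inv_per_page` invalidations on average, and every server must process every invalidation at `inv_cost` units each. If P is the total pages served per second across N servers, then P = N * (capacity - inv_per_page * inv_cost * P), which solves to the formula below.

```python
def max_pages_per_second(servers: int, capacity: float,
                         inv_per_page: float, inv_cost: float) -> float:
    """Total page throughput when every server handles every invalidation.

    Solves P = servers * (capacity - inv_per_page * inv_cost * P) for P.
    """
    return servers * capacity / (1 + servers * inv_per_page * inv_cost)


def throughput_ceiling(capacity: float, inv_per_page: float,
                       inv_cost: float) -> float:
    """Limit of max_pages_per_second as the number of servers grows."""
    return capacity / (inv_per_page * inv_cost)
```

Under this model, throughput grows almost linearly while the invalidation term is small and then flattens out toward the ceiling, at which point extra servers add almost nothing.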
Last edited on 2006-09-06 20:31:11 CST (Total Number of Edits: 1)
Scott Alden
United States
Dallas
Texas
admin
Re: Re: History (and Future) of BoardGameGeek Architecture
JeffyJeff wrote:
Instead of using a pull (poll) model to check for invalidated cached files, how about a push model?

Setup a subscription service on one of the servers... each web server (consumer) subscribes to the service to receive invalidate cache file events.

Each web server also publishes the invalidate events when it has processed a web request that requires one or more cache files to be invalidated.

Depending on your setup, instead of a service, could use UDP/IP (or some reliable datagram protocol) and each web server could simply broadcast the events to your subnet and receive the event packets/datagrams on the same socket (endpoint). This leaves out the middle man (the subscription server acting as a relay).



I actually did consider this and wrote some code to test things out...for some reason the central database method appealed to me more.
Last edited on 2006-09-06 20:40:01 CST (Total Number of Edits: 1)
David desJardins
United States
Burlingame
California
Aldie wrote:
- Each web server keeps its own cached files on the local disk.


By the way, one way to make this sort of approach significantly more efficient is if you can implement some sort of "affinity" among webservers. I.e., you want anyone who accesses file X to usually go to server Y. That way, file X will tend to be cached and current on server Y, so they will usually get it out of the cache. It doesn't matter whether it's cached on other servers, as they are rarely asked for it. You would still allow the request for X to go to some other server, if Y is too busy, and that other server can still handle the request correctly (but at higher cost).

But adding affinity may not be feasible unless you have a relatively sophisticated system that's balancing the load among your multiple webservers.
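A small sketch of what such affinity could look like inside a load balancer, in Python. Hashing the path to pick a preferred server and falling back to the least-loaded one are assumptions chosen for illustration; `preferred_server`, `route` and the load numbers are hypothetical.

```python
import hashlib


def preferred_server(path: str, servers: list) -> str:
    """Pick the same server for the same path every time (the 'affinity')."""
    digest = hashlib.md5(path.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]


def route(path: str, servers: list, load: dict, max_load: int) -> str:
    """Use the preferred server unless it is too busy, then the least loaded."""
    first = preferred_server(path, servers)
    if load.get(first, 0) < max_load:
        return first
    # Fallback: still correct, but more likely to miss the local cache.
    return min(servers, key=lambda s: load.get(s, 0))
```

Simple modulo hashing reshuffles most paths whenever a server is added or removed; consistent hashing is the usual way to limit that.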
Steve Wood
United States
Stamford
Connecticut
Scott, I've been wondering...

Did you start building BGG by modifying an existing PHP forum application, or is the entire site built from scratch?

I wonder this because the framework is quite unique... with the entire site essentially hanging off of a core forum application. Most sites have a core application and then tack on a forum later, but BGG seems essentially the opposite, and it works well.



Scott Alden
United States
Dallas
Texas
admin
Re: Re: History (and Future) of BoardGameGeek Architecture
DaviddesJ wrote:
Aldie wrote:
- Each web server keeps its own cached files on the local disk.


By the way, one way to make this sort of approach significantly more efficient is if you can implement some sort of "affinity" among webservers. I.e., you want anyone who accesses file X to usually go to server Y. That way, file X will tend to be cached and current on server Y, so they will usually get it out of the cache. It doesn't matter whether it's cached on other servers, as they are rarely asked for it. You would still allow the request for X to go to some other server, if Y is too busy, and that other server can still handle the request correctly (but at higher cost).

But adding affinity may not be feasible unless you have a relatively sophisticated system that's balancing the load among your multiple webservers.


Right now I am trying to get users to stick to a certain server - since I do a lot with the user's session. I would lose that if I flip flop.
Scott Alden
United States
Dallas
Texas
admin
Re: Re: History (and Future) of BoardGameGeek Architecture
Skadar wrote:
Scott, I've been wondering...

Did you start building BGG by modifying an existing PHP forum application, or is the entire site built from scratch?

I wonder this because the framework is quite unique... with the entire site essentially hanging off of a core forum application. Most sites have a core application and then tack on a forum later, but BGG seems essentially the opposite, and it works well.


I started from scratch - and then added in different forums (ultimateBB and phpBB) at different times. But since they were on a different user system they never felt fully integrated, so I just gave up and rolled my own.
Last edited on 2006-09-06 20:45:04 CST (Total Number of Edits: 1)
Jeff M
United States
Windham
New Hampshire
Re: Re: Re: History (and Future) of BoardGameGeek Architecture
Assuming frequently rebooted web servers and no additions to this model, you are right that you would have to invalidate your whole local cache. If, however, you are only restarting the web server process(es), the process receiving the invalidate events can still be operating and keeping the cache synced.

I sure hope Scott's setup doesn't require frequent server reboots. However, if it was an issue, then the subscription service could be set up so that subscriptions are persistent and delivery is reliable. I.e., if the service detects the client went away, it would start storing the events, and when the consumer came back, it would push out the stored events to that consumer.

Not as simple as the simple relay, but there are probably off-the-shelf implementations out there (not counting CORBA implementations).
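The store-and-replay idea can be sketched in a few lines of Python. The `InvalidationBroker` class is hypothetical and keeps everything in memory; a real service would need to persist the backlog so that it survives its own restarts.

```python
from collections import defaultdict, deque


class InvalidationBroker:
    """Relay that queues events for consumers that are currently offline."""

    def __init__(self):
        self.subscribers = set()           # every consumer ever subscribed
        self.online = {}                   # consumer name -> delivery callback
        self.pending = defaultdict(deque)  # consumer name -> missed events

    def subscribe(self, name, callback):
        """Connect (or reconnect) a consumer and replay anything it missed."""
        self.subscribers.add(name)
        self.online[name] = callback
        backlog = self.pending[name]
        while backlog:
            callback(backlog.popleft())

    def disconnect(self, name):
        """The consumer went away; keep its subscription and start queuing."""
        self.online.pop(name, None)

    def publish(self, cache_key):
        """Deliver to online consumers, store the event for the rest."""
        for name in self.subscribers:
            callback = self.online.get(name)
            if callback is not None:
                callback(cache_key)
            else:
                self.pending[name].append(cache_key)
```

A web server that reconnects after a restart receives the backlog first, so it only has to purge the entries that changed while it was away instead of throwing out its whole cache.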
DaviddesJ wrote:
JeffyJeff wrote:
Instead of using a pull (poll) model to check for invalidated cached files, how about a push model?


The main downside is that if your webserver restarts, you don't have any way to check the validity of anything in your cache. So you have to throw it all away, when you restart. This reduces the volume of traffic you can support with a given number of webservers, especially if you might have bugs that cause them to restart occasionally (I don't know if Scott does, but I always have).
Scott Alden
United States
Dallas
Texas
admin
Re: Re: History (and Future) of BoardGameGeek Architecture
Sagrilarus wrote:
My only concern is that you have taken the network weight of the NFS and transferred it onto your db. Network load is generally cheaper than db load, especially for small numbers of machines in close proximity.

Did you consider a dedicated network strictly for the cache? This would run on separate cards between the two machines and run no other traffic. 1Gbps cards are pretty cheap.


I didn't really consider it - my access to fiddling with hardware is very limited. It's ice cold in the server room - I don't like spending a lot of time in there... and I bet it would take me at least a full day to get that up and running.

Not to mention there's no bathroom access. I have to call the NOC anytime I want to pee!
Last edited on 2006-09-06 21:05:45 CST (Total Number of Edits: 1)
Jeff M
United States
Windham
New Hampshire
By the way, unrelated, but I noticed the new QuickReply/QuickQuote functionality. Nice, however the QuickQuote seems to prepend a "Re: " to the subject, even if it already starts with "Re: ". QuickReply (and non-quick reply/quote) don't do this.
Jason Henke
United States
Maple Grove
Minnesota
Hey Aldie,

Thank you for sharing these experiences; I actually find it very interesting.
Scott Alden
United States
Dallas
Texas
admin
JeffyJeff wrote:
By the way, unrelated, but I noticed the new QuickReply/QuickQuote functionality. Nice, however the QuickQuote seems to prepend a "Re: " to the subject, even if it already starts with "Re: ". QuickReply (and non-quick reply/quote) don't do this.


Fixed... Thanks!
Scott Alden
United States
Dallas
Texas
admin
jhenke wrote:
Hey Aldie,

Thank you for sharing these experiences; I actually find it very interesting.


No problem - this writeup was very brief in the details - but I hope it conveyed a little of what goes on behind the scenes here.
Jared Heath
United States
Dallas
Texas
Sagrilarus wrote:
My only concern is that you have taken the network weight of the NFS and transferred it onto your db. Network load is generally cheaper than db load, especially for small numbers of machines in close proximity.

Did you consider a dedicated network strictly for the cache? This would run on separate cards between the two machines and run no other traffic. 1Gbps cards are pretty cheap.

Sag.




I agree with this... separating the caching onto its own isolated subnet would probably help the network bottleneck (if there is one).
Stephen Glenn
United States
Virginia Beach
Virginia
designer
Longbow wrote:
So, is somebody going to get laid, or what?


Sure, but you have to have lots of cache.

Jared Heath
United States
Dallas
Texas
Aldie, just curious:

Did you consider Squid with parent/child hierarchy caching at all? I've been playing with it at home some, but don't know the viability of it on this large a scale.

Can't you do something akin to database replication on caches as well so that they are always in synch?
David desJardins
United States
Burlingame
California
Re: Re: History (and Future) of BoardGameGeek Architecture
Sagrilarus wrote:
My only concern is that you have taken the network weight of the NFS and transferred it onto your db. Network load is generally cheaper than db load, especially for small numbers of machines in close proximity.


The network load isn't significant. The NFS server is slow because NFS is a slow, inefficient protocol (and also because the Linux implementation of the NFS server isn't so great). The database is much, much more efficient at storing and fetching change events than if you make changes by adding and removing files and look them up through NFS accesses. This would be true even if the network were infinitely fast.
Last edited on 2006-09-06 21:32:00 CST (Total Number of Edits: 1)
Scott Alden
United States
Dallas
Texas
admin
jaredh wrote:
Aldie, just curious:

did you consider Squid with parent/child hierarchy caching at all? I've been playing with it at home some, but don't know the viablity of it on this large a scale.

Can't you do something akin to database replication on caches as well so that they are always in synch?


This is a possibility - and I have 1 more server that I may turn into a squid. I would need to figure out a way to make logged in users not use the squid. I assume this is possible because it's what wikipedia does.
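The "logged in users skip the cache" rule boils down to a check on the request. The sketch below is plain Python decision logic, not Squid configuration, and the `session_id` cookie name is a made-up placeholder for whatever cookie marks a logged-in user.

```python
from http.cookies import SimpleCookie

SESSION_COOKIE = "session_id"  # hypothetical name of the login cookie


def can_serve_from_shared_cache(method: str, cookie_header: str) -> bool:
    """Anonymous read-only requests may use the shared cache; others may not."""
    if method.upper() not in ("GET", "HEAD"):
        return False  # posts and edits always go to the web server
    cookies = SimpleCookie()
    cookies.load(cookie_header or "")
    return SESSION_COOKIE not in cookies
```

Anonymous visitors would then all share the same cached pages, while logged-in users keep getting personalized, freshly generated ones.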
Paul McKinney
United States
Austin
Texas
From my experience, unless you have some really high-speed networking going and/or proprietary equipment/solutions, I've always known NFS as 'N'ot 'F'a'S't.