@dormando, great response... this is almost exactly what i had in mind,
i.e. grouping all of your memcached servers into logical pools so as to
avoid hitting all of them for every request. in fact, a reasonable design
would aim for one node hit per request (or even fewer if you can manage it),
without losing any of the intended benefits of caching in the first place,
and stay practical too...
+1 for posting this in a wiki, Dormando.
-m.
Post by moses wejuli
@Les, you make a clear and concise point. thanks.
In this thread, i'm really keen on exploring a theoretical possibility
-- at what node count (for a given pool) may/could we start to
experience problems related to performance (server, network or even client),
assuming a near-perfect hardware/network set-up?
The benefit:
- If your request will easily fit in the TCP send buffer and immediately
transfer out the network card, it's best if it hits a single server.
- If your requests are large, you can get lower latency responses by not
waiting on the TCP socket.
- Then there's some fiddling in the middle.
- Each time a client runs "send" that's a syscall, so more do suck, but
keep in mind the above tradeoff: a little system CPU time vs waiting for
TCP ACKs.
In reality it doesn't tend to matter that much. The point of my response
to the facebook "multiget hole" is that you can tell clients to group keys
to specific or subsets of servers, (like all keys related to a particular
user), so you can have a massive pool and still generally avoid contacting
all of them on every request.
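That kind of key grouping can be sketched as a two-level hash (the server names, group rule, and replica count are all illustrative assumptions, not a recommendation from this thread):

```python
import hashlib

SERVERS = [f"mc{i}.example.com" for i in range(100)]  # hypothetical pool
REPLICAS_PER_GROUP = 3  # each user's keys live on a small subset

def server_for(key: str) -> str:
    # Group on the "user:<id>" prefix so all of one user's keys
    # land on the same small subset of servers.
    group = key.split(":")[1] if key.startswith("user:") else key
    base = int(hashlib.md5(group.encode()).hexdigest(), 16) % len(SERVERS)
    # Spread the group's individual keys across a few adjacent servers.
    offset = int(hashlib.md5(key.encode()).hexdigest(), 16) % REPLICAS_PER_GROUP
    return SERVERS[(base + offset) % len(SERVERS)]

# All of user 42's keys hit at most REPLICAS_PER_GROUP servers,
# no matter how large the overall pool is.
subset = {server_for(f"user:42:{f}") for f in ["name", "email", "prefs", "cart"]}
print(len(subset) <= REPLICAS_PER_GROUP)  # True
```

The point is that the pool can grow arbitrarily large while any single page request still touches only a bounded handful of servers.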
Post by moses wejuli
-- if a memcached client were to pool say, 2,000 or 20,000
connections (again, theoretical but not entirely impractical given the rate
of internet growth), would that not inject enough overhead -- connection or
otherwise -- on the client side to, say, warrant a direct fetch from the
database? in such a case, we would have established a *theoretical*
maximum number of nodes in a pool for that given client in near-perfect
conditions.
- Accessing the server hash takes no time (it's a hash); calculating it
is the time-consuming part. We've seen clients misbehave and seriously slow
things down by recalculating a consistent hash on every request. So long
as you're internally caching the continuum, the lookups are free.
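"Caching the continuum" means building the ring once and doing only a binary search per lookup. A minimal ketama-style sketch (the point count and hash choice are assumptions):

```python
import bisect
import hashlib

class Continuum:
    """Consistent-hash ring built once; lookups are O(log n) bisects."""
    def __init__(self, servers, points_per_server=160):
        ring = []
        for s in servers:
            for i in range(points_per_server):
                h = int(hashlib.md5(f"{s}-{i}".encode()).hexdigest()[:8], 16)
                ring.append((h, s))
        ring.sort()
        self._ring = ring
        self._hashes = [h for h, _ in ring]  # cached for bisect

    def lookup(self, key):
        h = int(hashlib.md5(key.encode()).hexdigest()[:8], 16)
        i = bisect.bisect(self._hashes, h) % len(self._ring)
        return self._ring[i][1]

ring = Continuum([f"mc{i}" for i in range(10)])  # built once, then reused
server = ring.lookup("user:42:name")             # cheap per-request lookup
```

The misbehaving clients dormando mentions effectively rebuild the `Continuum` constructor's work on every request, which is where the slowdown comes from.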
- Established TCP sockets mostly just waste RAM, but don't generally slow
things down. So for a client server, you can calculate the # of memcached
instances * number of apache procs or whatever * the amount of memory
overhead per TCP socket compared to the amount of RAM in the box and
there's your limit. If you're using persistent connections.
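That persistent-connection limit is simple arithmetic; all the numbers below are made-up examples, not measurements:

```python
# Hypothetical deployment numbers: adjust to your own environment.
memcached_instances = 500
procs_per_app_server = 64          # e.g. Apache worker processes
bytes_per_tcp_socket = 16 * 1024   # rough kernel-side overhead per socket
ram_budget = 2 * 1024**3           # RAM you can spare for sockets

sockets = memcached_instances * procs_per_app_server
socket_ram = sockets * bytes_per_tcp_socket
print(sockets, "sockets use", socket_ram // 1024**2, "MiB")
print("fits budget:", socket_ram <= ram_budget)
```

With these example numbers, 32,000 idle sockets cost about 500 MiB, which mostly just wastes RAM, as the list above says, rather than slowing anything down.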
- If you decide to not use persistent connections, and design your
application so satisfying a page read would hit at *most* something like 3
memcached instances, you can go much higher. Tune the servers for
TIME_WAIT reuse, higher local ports, etc, which deals with the TCP churn.
Connections are established on first use, then reused until the end of the
request, so the TCP SYN/ACK cycle for 1-3 (or even more) instances won't
add up to much. Pretending you can have an infinite number of servers on
the same L2 segment, you would likely be limited purely by bandwidth, or
by the amount of memory required to load the consistent hash for clients.
Probably tens of thousands.
- Or use UDP, if your data is tiny and you tune the fuck out of it.
Typically it doesn't seem to be much faster, but I may get a boost out of
it with some new linux syscalls.
- Or (Matt/Dustin correct me if I'm wrong) you use a client design like
spymemcached. The memcached binary protocol can actually allow many client
instances to use the same server connections. Each client stacks commands
in the TCP sockets like a queue (you could even theoretically add a few
more connections if you block too long waiting for space), then they get
responses routed to them off the same socket. This means you can use
persistent connections, and generally have one socket per server instance
for an entire app server. Many thousands should scale okay.
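The shared-connection design can be sketched using the binary protocol's opaque field, which lets replies be routed back to whichever logical client issued the request. A toy in-process model (not spymemcached's actual code; the frame-writing is elided to a comment):

```python
import itertools
import queue

class SharedConnection:
    """Many logical clients multiplex one TCP socket per server.
    Each request carries a unique opaque id; the single reader
    routes replies by that id, so wire ordering doesn't matter."""
    def __init__(self):
        self._opaque = itertools.count()
        self._pending = {}  # opaque id -> per-request reply queue

    def request(self, key):
        op = next(self._opaque)
        q = queue.Queue(maxsize=1)
        self._pending[op] = q
        # real code: write a binary-protocol GET frame carrying `op`
        # onto the one shared socket for this server
        return op, q

    def on_response(self, opaque, value):
        # called by the single thread draining the shared socket
        self._pending.pop(opaque).put(value)

conn = SharedConnection()
op_a, qa = conn.request("user:1:name")
op_b, qb = conn.request("user:2:name")
conn.on_response(op_b, "bob")    # replies can arrive out of order
conn.on_response(op_a, "alice")
va, vb = qa.get(), qb.get()
print(va, vb)  # alice bob
```

Because each caller blocks on its own queue rather than on the socket, one persistent connection per server instance can serve every client in the app server, which is the scaling property described above.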
- Remember Moore's law grows computers very quickly. Maybe not as fast
as the internet, but ten years ago you would have had 100-megabit, 2G-RAM
memcached instances and needed an awful lot of them. Right now 10GbE is
dropping in price, 100G+ RAM servers are more affordable, and the industry
is already looking toward higher network rates. So as your company grows,
you get opportunities to cut the instance count every few years.
Post by moses wejuli
-- also, i would think the hashing algo would deteriorate after a given
number of nodes.. admittedly, this number could be very large indeed, and
also, i know this is unlikely in probably 99.999% of cases, but it would
be great to factor in the maths behind the science.
I sorta answered this above. Should put this into a wiki page I guess...
-Dormando