Comments on Geeking with Greg: Replication, caching, and partitioning

Updated: 2017-12-16T21:51:47.424-08:00


If you are using "higher order" cache objects like page fragments or aggregated and personalized lists, the cache layer is not equivalent to the database's buffer or query(!) cache.

I think you are assuming the cache layer is a separate group of machines that could instead be dedicated to the database cluster. This may be common, but it is not necessarily correct.

Take the JVM-based Coherence as an example: the cache may share the same machines (even the same JVMs) as the application layer. This "layer" is not using extra machines; it is using spare RAM and under-utilised network IO on existing machines. Let the applications use the CPU (the cache uses very little), and let the cache use the spare RAM. This spare capacity cannot be utilised in the same way by database shards.
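The co-located topology described above can be sketched in a few lines. This is an illustrative Python stand-in (the class name and capacity are made up, and it is not Coherence's actual API): a bounded LRU cache that lives inside the application process and consumes only spare RAM.

```python
from collections import OrderedDict

class InProcessCache:
    """Bounded LRU cache living inside the application process,
    using spare RAM rather than a dedicated caching tier."""

    def __init__(self, max_entries=10_000):
        self.max_entries = max_entries
        self._data = OrderedDict()

    def get(self, key):
        if key in self._data:
            self._data.move_to_end(key)   # mark as most recently used
            return self._data[key]
        return None                       # miss

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict least recently used
```

Because the cache is in-process, a hit costs a dictionary lookup rather than a network round trip, at the price of each node only seeing its own slice of the cached data.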


Hi Greg

I was wondering if you're also familiar with In-Memory Data Grids (read/write distributed caching). It looks like many of your assumptions apply to Memcache and read-only, simple caching solutions, but not to a data grid, which is essentially an in-memory transactional read/write data store.

I recently wrote a detailed post on scaling out MySQL where I compare the pure database clustering approach with a combination of an In-Memory Data Grid as the front end and a database as the backend data store. In general, I think that decoupling the application from the underlying database, and keeping a pure in-memory data store as the front-end store that manages our data, is the most scalable and efficient solution. There are plenty of references; many of them serve mission-critical applications in the financial world today, where this model has proved successful.

While I think that pure database clustering is a viable solution, I don't think it is the best fit for every scenario; I actually see more cases in Web 2.0 and real-time applications where it would be the wrong solution.
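As a rough illustration of the data-grid model described above (all names here are hypothetical, and a real grid would flush asynchronously and replicate for durability), here is a toy Python sketch of an in-memory front-end store with read-through on misses and write-behind to a backing database:

```python
class DataGridFrontEnd:
    """Toy in-memory front-end data store with write-behind to a
    backing 'database' (here just a dict). Names are illustrative."""

    def __init__(self, backing_db):
        self.backing_db = backing_db      # slow system of record
        self.store = {}                   # in-memory working copy
        self.pending = []                 # writes not yet flushed

    def read(self, key):
        if key not in self.store:                 # read-through on miss
            self.store[key] = self.backing_db.get(key)
        return self.store[key]

    def write(self, key, value):
        self.store[key] = value                   # in-memory write is authoritative
        self.pending.append((key, value))         # defer the database write

    def flush(self):
        """Drain deferred writes to the database (asynchronous in practice)."""
        for key, value in self.pending:
            self.backing_db[key] = value
        self.pending.clear()
```

The point of the pattern is that reads and writes complete at memory speed while the database absorbs batched writes behind the scenes, which is exactly why it behaves differently from a read-only cache like Memcache.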

Yes; I've had this conversation many times over the years, and most recently last week. I've observed several efforts by people to outperform the database with memory caches, and all failed miserably. Of course, I've seen it succeed twice as well (once when I used a bloom filter instead of lookups), but doing it right is very niche and requires more knowledge of your architecture than is required to partition and tune your DB properly.
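The bloom-filter trick mentioned above can be sketched with just the standard library. This is an illustrative toy (the bit-array size and hash scheme are arbitrary choices, not the implementation the comment refers to): it answers "definitely not present" or "possibly present", so most lookups for absent keys never reach the backing store.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: no false negatives, rare false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0                     # integer used as a bit array

    def _positions(self, item):
        # Derive k bit positions by salting a cryptographic hash.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # False means definitely absent; True means probably present.
        return all(self.bits >> pos & 1 for pos in self._positions(item))
```

Doing this right still requires knowing your access pattern well enough to size the filter, which supports the point that beating a well-tuned database is niche work.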

gongfermor has a point that commodity caching layers can compensate for poor database design. But we're talking here about how to get maximum scale with minimum hardware, not how to use extra caching servers to compensate for a lack of DB skills.

"Rather than putting layers in front to hide your shame, why not fix that slow database?"

I agree that fixing your slow database should be required, but I think a caching layer between your applications and the database is still invaluable.

The caching layer is generally extremely simple, whereas the database layer is largely a black hole unless you know your database in depth (what happens with query X using storage engine Y in database Z on _this_ particular system?).

A caching layer will never get stuck on a table lock (except maybe on that miss).

Very very few sites 'use' their entire database. Keeping that whole thing in memory when you use 5% of it for 90% of requests is a waste of resources (and often impractical).

Obviously you can tune the database for all of these scenarios, but throwing a caching layer up saves you a lot of time trying to learn some esoteric detail of a specific database.

On the flip side, if you rely excessively on your caching system/layer and it dies, you are very likely to be in a horrible scenario if you have not 'tuned' your database.

"If the database layer can keep its working set in memory and its own caches are well designed, it should be able to exceed the performance and scalability of an architecture with a separate caching layer."

I'd consider that a huge 'If'. I think in the real world the caching layer will almost always win that argument from a performance standpoint (I say this from personal experience and not with anything quantitative to bring to the table).
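The cache-aside pattern this comment is weighing might look roughly like the following hypothetical Python sketch (`get_user` is made up; the dict-based `cache` and `db` stand in for, say, a memcached client and a real query):

```python
def get_user(user_id, cache, db):
    """Cache-aside read: check the cache first, fall back to the
    database on a miss, then populate the cache for next time."""
    key = f"user:{user_id}"
    value = cache.get(key)
    if value is None:                 # cache miss: hit the database
        value = db[user_id]           # stand-in for SELECT ... WHERE id = ?
        cache[key] = value            # populate for subsequent reads
    return value
```

Note how this illustrates the failure mode described above: if the cache is wiped, every request falls through to `db`, and an untuned database suddenly takes the full load.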

Hi!

Partitioning is definitely one way to go... the problem is that with that architecture you are still limited in the number of live queries you can run.

Writing an application for read through caching does not mean that you can just ignore the database (or any other data tiers) and hope to scale entirely via caching. I believe you still need to partition the data that you feed to the cache tier.

In the end though? Keep your front end working through the cache. Keep all of your data generation behind it.
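Feeding a partitioned cache tier from a data-generation layer behind it, as described above, might look roughly like this hypothetical Python sketch (`loader` stands in for a database query, and each dict stands in for one cache node):

```python
import hashlib

def pick_partition(key, num_partitions):
    """Map a key to one of N cache partitions by hashing, so each
    node in the cache tier owns a stable slice of the key space."""
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % num_partitions

class PartitionedCache:
    """Front-end view over a partitioned cache tier. All data
    generation stays behind loader(), per the comment above."""

    def __init__(self, num_partitions, loader):
        self.partitions = [{} for _ in range(num_partitions)]
        self.loader = loader              # e.g. a database query

    def get(self, key):
        node = self.partitions[pick_partition(key, len(self.partitions))]
        if key not in node:               # read-through on miss
            node[key] = self.loader(key)
        return node[key]
```

Each key maps to exactly one partition, so the front end only ever talks to the cache, and the expensive data tier is hit once per key rather than once per request.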


BTW I would give you some shit over "rambling" but you are dead on in the description :)