Subscribe: ACM Queue - All Queue Content
http://queue.acm.org/rss/feeds/queuecontent.xml
Added By: Feedage Forager
Feedage Grade: B
Language: English
Tags: analysis, cas, change, included, jam, power, reliability, scheduling systems, scheduling, system jam, system, systems, team, tenant systems

Preview: ACM Queue - All Queue Content

Watchdogs vs. Snowflakes

Tue, 10 Apr 2018 14:09:54 GMT

That a system can randomly jam doesn't just indicate a serious bug in the system; it is also a major source of risk. You don't say what your distributed job-control system controls, but let's just say I hope it's not something with significant, real-world side effects, like a power station, jet aircraft, or financial trading system. The risk, of course, is that the system will jam, not when it's convenient for someone to add a dummy job to clear the jam, but during some operation that could cause data loss or return incorrect results. I rather suspect that having a system like this jam while coordinating, for example, the balancing of electrical power across a power grid would have spectacular and perhaps fatal results.



Thou Shalt Not Depend on Me

Wed, 04 Apr 2018 12:55:55 GMT

Most websites use JavaScript libraries, and many of them are known to be vulnerable. Understanding the scope of the problem and the many unexpected ways that libraries are included is only the first step toward improving the situation. The goal is for the information in this article to inform better tooling, development practices, and educational efforts for the community.



How to Come up with Great Ideas

Thu, 29 Mar 2018 17:21:59 GMT

No matter what your profession, learning to think more innovatively and spark new ideas can help you. I have included some points and inspiration that have helped me, but the real key is changing your behavior and taking action.



Designing Cluster Schedulers for Internet-Scale Services

Tue, 20 Mar 2018 14:31:53 GMT

Engineers building scheduling systems should consider every failure mode of the underlying infrastructure and how operators of those systems can configure remediation strategies. The scheduler should also help keep tenant systems as stable as possible while the owners of those tenant systems troubleshoot.



Manual Work is a Bug

Wed, 14 Mar 2018 13:36:02 GMT

Every IT team should have a culture of constant improvement, or movement along the path toward the goal of automating whatever the team feels confident in automating, in ways that are easy to change as conditions change. As the needle moves to the right, team members learn from each other's experiences, and the system becomes easier to create and safer to operate. A good team has a structure in place that makes the process frictionless and collaborative.



Canary Analysis Service

Tue, 06 Mar 2018 13:44:37 GMT

It is unreasonable to expect engineers working on product development or reliability to have statistical knowledge; removing this hurdle led to widespread CAS adoption. CAS has proven useful even for basic cases that don't need configuration, and has significantly improved Google's rollout reliability. Impact analysis shows that CAS has likely prevented hundreds of postmortem-worthy outages, and the rate of postmortems among groups that do not use CAS is noticeably higher.
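
To make the idea of an automated canary verdict concrete, here is a minimal sketch in Python. It is not how CAS works internally (the summary above does not describe its method); it only illustrates how a tool can hide the statistics behind a PASS/FAIL answer. The function names, the permutation test, and the `alpha` threshold are all assumptions chosen for illustration, and the metric is assumed to be one where higher is worse (for example, an error rate).

```python
import random
from statistics import mean


def permutation_p_value(control, canary, rounds=2000, seed=0):
    """One-sided permutation test (illustrative, not CAS's method).

    Estimates how often a random relabeling of the pooled samples yields
    a mean increase at least as large as the one actually observed.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    observed = mean(canary) - mean(control)
    pooled = list(control) + list(canary)
    n = len(control)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        diff = mean(pooled[n:]) - mean(pooled[:n])
        if diff >= observed:
            hits += 1
    # Add-one smoothing keeps the estimate strictly above zero.
    return (hits + 1) / (rounds + 1)


def canary_verdict(control, canary, alpha=0.01):
    """Return "FAIL" if the canary metric is significantly higher
    (worse) than the control metric, otherwise "PASS".

    alpha is a hypothetical default; a real service would tune it.
    """
    p = permutation_p_value(control, canary)
    return "FAIL" if p < alpha else "PASS"
```

A caller passes two lists of per-task measurements, such as `canary_verdict(control_error_rates, canary_error_rates)`, and never has to choose a test or interpret a p-value directly.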