### Chapter 32 Planning Data Centers规划数据中心 > Having collected an army and concentrated his forces, he must blend andharmonize the different elements thereof before pitching his camp.—Sun Tzu > 在集结军队并集中兵力之后,他必须在扎营之前融合和协调其中的不同元素。——孙子 One of the biggest limitations to hyper-growth companies scaling effectively today isthe data center. You can absolutely nail everything else in this book including build-ing and incenting the right team and behaviors, developing and implementing theright processes, and architecting the best solutions, and completely fail your custom-ers and shareholders by limiting growth due to ineffective data center planning.Depending upon your needs, approach, and size, it can take anywhere from monthsto years to bring additional data center or collocation space online. When contrastedwith the weeks or months that it might take to make significant architecturalchanges, management and leadership changes, and process changes, it is easy to seethat the data center can very easily and very quickly become your greatest barrier toscale and success. 如今,高速增长的公司有效扩展的最大限制之一是数据中心。您绝对可以掌握本书中的所有其他内容,包括建立和激励正确的团队和行为、开发和实施正确的流程以及构建最佳解决方案,并且由于无效的数据中心而限制增长,从而彻底让您的客户和股东失望规划。根据您的需求、方法和规模,可能需要数月到数年的时间才能将额外的数据中心或并置空间上线。与进行重大架构变更、管理和领导层变更以及流程变更可能需要数周或数月的时间相比,很容易看出数据中心可以非常轻松且快速地成为规模和成功的最大障碍。 This chapter will not give you everything you need to know to plan and manage adata center or collocation build out or move; to do so in enough detail to be mean-ingful to your efforts would require an entire book. Instead, we want to reinforce theneed for long-term data center planning as part of your engineering team efforts. Wewill also highlight some approaches that we hope will be meaningful to reduce youroverall costs as you start to implement multiple data centers and mitigate your busi-ness risks with disaster recovery and business continuity plans. We will also cover ata high level some of the drivers of data center costs and constraints. 本章不会向您提供规划和管理数据中心或配置扩建或迁移所需了解的所有信息;要足够详细地做到这一点,使你的努力有意义,需要一整本书。相反,我们希望加强对长期数据中心规划的需求,作为工程团队工作的一部分。当您开始实施多个数据中心并通过灾难恢复和业务连续性计划减轻业务风险时,我们还将重点介绍一些方法,我们希望这些方法对降低您的总体成本有意义。我们还将深入探讨数据中心成本和限制的一些驱动因素。 ####Data Center Costs and Constraints 数据中心成本和限制 In the last 15 years, something in data centers changed so slowly that few if any of uscaught on until it was just too late. This slow and steady movement should have beenobvious to us all as the data was right under our noses if we had only bothered tolook at it. But just as a military sniper moves very slowly into a firing position even asthe enemy watches but remains unaware of his presence, so did power consumptionand constraints sneak up on us and make us scramble to change our data centercapacity planning models. 在过去 15 年里,数据中心的某些事情发生了如此缓慢的变化,以至于几乎没有人能够及时发现,直到为时已晚。这种缓慢而稳定的变化对我们所有人来说都应该是显而易见的,因为如果我们只花时间去观察的话,这些数据就在我们眼皮底下。但是,就像军事狙击手非常缓慢地进入射击位置一样,即使敌人在观察但仍然没有意识到他的存在,功耗和限制也悄悄地降临到我们身上,让我们争先恐后地改变我们的数据中心容量规划模型。 For years, processors have increased in speed as observed by Gordon Moore andas described in Moore’s Law. This incredible increase in speed resulted in computersand servers drawing more power over time. The ratio of clock speed to power con-sumption varies with the technologies employed and the types of chips. Some chipsemployed technology to reduce clock speed and hence power consumption when idle,and multicore processors allegedly have lower power consumption for higher clockspeeds. But given similar chip set architectures, a faster processor will typically meanmore power consumption. 多年来,正如戈登摩尔所观察到的和摩尔定律所描述的那样,处理器的速度不断提高。这种令人难以置信的速度提升导致计算机和服务器随着时间的推移消耗更多的能量。时钟速度与功耗的比率随所采用的技术和芯片类型的不同而变化。一些芯片采用技术来降低时钟速度,从而降低空闲时的功耗,而多核处理器据称在更高的时钟速度下具有更低的功耗。但考虑到类似的芯片组架构,更快的处理器通常意味着更多的功耗。 Until the mid 1990s, most data centers had enough power capacity that the pri-mary constraint was the number of servers one could shoehorn into the footprint orsquare footage of the data center. As computers decreased in size in rack units or U’sand increased in clock speed, the data center became increasingly efficient. Efficiencyhere is measured strictly against the computing capacity per square foot of the datacenter, with more computers crammed into the same square footage and with eachcomputer having more clock cycles per second to perform work. This increase incomputing density also increased power consumption on a per square foot basis. Notonly did the computers themselves draw more power per square foot, they alsorequired more HVAC to cool the area and as a result even more power was drawn. Ifyou were lucky enough to be in a collocation facility with a contract wherein youwere charged by square foot of space, you weren’t likely aware of this increase incost; the cost was eaten by the collocation facility owner causing decreased marginsfor their services. If you owned your own data center, more than likely, the facilitiesteam identified the steady but slow increase in cost but did not pass that informationalong to you until you needed to use more space and found out that you were out ofpower. 直到 20 世纪 90 年代中期,大多数数据中心都拥有足够的电力容量,因此主要的限制是可以塞入数据中心占地面积或平方英尺的服务器数量。随着计算机机架单位或 U 单位尺寸的减小以及时钟速度的提高,数据中心变得越来越高效。这里的效率是严格根据数据中心每平方英尺的计算能力来衡量的,同一平方英尺内挤满了更多的计算机,并且每台计算机每秒有更多的时钟周期来执行工作。输入计算密度的增加也增加了每平方英尺的功耗。不仅计算机本身每平方英尺消耗更多的电力,它们还需要更多的暖通空调来冷却该区域,因此消耗了更多的电力。如果您足够幸运,在托管设施中签订了按平方英尺空间收费的合同,那么您可能不会意识到成本的增加;该成本被配置设施所有者承担,导致其服务利润下降。如果您拥有自己的数据中心,那么设施团队很可能会发现成本稳定但缓慢的增长,但直到您需要使用更多空间并发现您停电时才将该信息传递给您。 #####Rack Units 机架单元 A rack unit is a measurement of height in a 19-inch or 23-inch wide rack. Typically labeled as aU and sometimes labeled as an RU, the unit equals 1.75 inches in height. A 2U server is there-fore 3.5 inches in height. The term half rack is an indication of width rather than height and theterm is applied to a 19-inch rack. A half rack server or component, then, is 9.5 inches wide. 机架单位是 19 英寸或 23 英寸宽机架的高度测量单位。通常标记为 aU,有时标记为 RU,单位高度等于 1.75 英寸。因此,2U 服务器的高度为 3.5 英寸。术语“半机架”表示宽度而不是高度,该术语适用于 19 英寸机架。那么,半机架服务器或组件的宽度为 9.5 英寸。 This shift where power utilization suddenly constrained our growth caused a num-ber of interesting problems. The first problem manifested itself in an industry oncebased upon square footage assumptions. Collocation and managed server providersfound themselves in contracts for square footage largely predicated on a number ofservers. As previously indicated, the increase in power utilization decreased the mar-gins for their services until the provider could renegotiate contracts. The buyers ofthe collocation facilities in turn looked to move to locations where power density wasgreater. Successful providers of services changed their contracts to charge for bothspace and power, or strictly upon power used. The former allowed companies to flexprices based on the price of power, thereby lowering the variability within their oper-ating margins, whereas the former attempted to model power costs over time toensure that they were always profitable. Often, both would reduce the power andspace component to a number of racks and power utilization per rack within a givenfootprint in a data center. 电力利用突然限制了我们的增长,这种转变引发了许多有趣的问题。第一个问题出现在曾经基于平方英尺假设的行业中。配置和托管服务器提供商发现自己签订的面积合同主要取决于服务器的数量。如前所述,电力利用率的增加降低了其服务的利润,直到提供商可以重新谈判合同。并置设施的买家反过来希望搬到功率密度更大的地方。成功的服务提供商改变了他们的合同,对空间和电力收费,或者严格按照使用的电力收费。前者允许公司根据电力价格灵活定价,从而降低其营业利润的波动性,而前者试图对一段时间内的电力成本进行建模,以确保他们始终盈利。通常,两者都会将电力和空间组件减少到数据中心给定占地面积内的多个机架以及每个机架的电力利用率。 Companies that owned their own data centers and found themselves constrainedby power could not build new data centers quickly enough to allow themselves togrow. As such, they would turn to implement their growth in services within colloca-tion facilities until such time as they could build new data centers, often with morepower per square foot than the previous data centers. 拥有自己的数据中心并发现自己受到电力限制的公司无法足够快地建立新的数据中心以实现自身发展。因此,他们将转向在并置设施内实现服务增长,直到他们能够建造新的数据中心,通常每平方英尺的功率比以前的数据中心更高。 Regardless of whether we owned or leased space, the world changed underneathour feet and the new world order was that power, and not space, dictated our capac-ity planning for data centers. This has led to some other important aspects of datacenter planning, not all of which has been fully embraced or recognized by everycompany. 无论我们拥有还是租赁空间,世界都在我们脚下发生变化,新的世界秩序是权力,而不是空间,决定了我们数据中心的容量规划。这导致了数据中心规划的一些其他重要方面,但并非所有这些都已被每个公司完全接受或认可。 ####Location, Location, Location 位置,位置,位置 Most of us have heard of the real estate mantra “location, location, location.”Nowhere is this mantra more important these days than in the planning of owned orrented space for data centers. Data center location has an impact on nearly every-thing you do from fixed and variable costs through quality of service and to the riskyou impose upon your business. 我们大多数人都听说过房地产口头禅“位置、位置、位置”。如今,在规划自有或租赁的数据中心空间时,没有什么比这个口头禅更重要的了。数据中心的位置几乎对您所做的一切都有影响,从固定和可变成本到服务质量,再到您给业务带来的风险。 With the shift in data center constraints from location to power, so came a shift inthe fixed and variable costs of our product offerings. Previously, when space was theprimary driver and limitation of data center costs and capacity, location was stillimportant, but for very different reasons. When power was not as large a concern asit is today, we would look to build data centers in a place where land and buildingmaterials were cheapest. This would result in data centers being built in major metro-politan areas where land was abundant and labor costs were low. Often, companieswould factor in an evaluation of geographic risk and as a result, areas such as Dallas,Atlanta, Phoenix, and Denver became very attractive. Each of these areas offeredplenty of space for data centers, skills within the local population to build and main-tain them, and low geographic risk. 随着数据中心限制从位置到电力的转变,我们产品的固定成本和可变成本也发生了变化。以前,当空间是主要驱动因素以及数据中心成本和容量的限制时,位置仍然很重要,但原因截然不同。当电力不像今天那么重要时,我们会寻求在土地和建筑材料最便宜的地方建造数据中心。这将导致数据中心建在土地丰富、劳动力成本低廉的主要大都市地区。通常,公司会考虑地理风险的评估,因此,达拉斯、亚特兰大、菲尼克斯和丹佛等地区变得非常有吸引力。这些区域都为数据中心提供了充足的空间、当地居民构建和维护数据中心的技能,并且地理风险较低。 Smaller companies that rented or leased space and services and were less con-cerned about risk would look to locate data centers close to their employee popula-tion. This led to collocation providers building or converting facilities in companydense areas like the Silicon Valley, Boston, Austin, and the New York/New Jerseyarea. These smaller companies favored ease of access to the servers supporting theirnew services over risk mitigation and cost. Although the price was higher for thespace as compared to the lower cost alternatives, many companies felt the benefit ofproximity overcame the increase in relative cost. 租用或租赁空间和服务且不太关心风险的小型公司会寻求将数据中心设在靠近员工的地方。这导致主机配置提供商在硅谷、波士顿、奥斯汀和纽约/新泽西地区等公司密集地区建设或改造设施。这些较小的公司更喜欢轻松访问支持其新服务的服务器,而不是降低风险和成本。尽管与成本较低的替代方案相比,该空间的价格较高,但许多公司认为邻近的好处克服了相对成本的增加。 When power became the constraining factor for data center planning and utiliza-tion, companies started shifting their focus to areas where they could not only pur-chase and build for an attractive price, but where they could obtain power at arelatively low price, and perhaps as importantly, where they could use that powermost efficiently. This last point actually leads to some counterintuitive locations fordata centers. 当电力成为数据中心规划和利用的制约因素时,公司开始将重点转向不仅可以以有吸引力的价格购买和建设的领域,而且可以以相对较低的价格获得电力的领域,也许还可以同样重要的是,他们可以在哪里最有效地利用这种力量。最后一点实际上导致了数据中心的一些违反直觉的位置。 Air conditioning, the AC portion of HVAC, operates most efficiently at lower ele-vations above sea level. We won’t go into the reasons why, as there is plenty of infor-mation available on the Internet to confirm the statement. Air conditioning alsooperates more efficiently in areas of lower humidity, as less work is performed by theair conditioner to remove humidity from the air being conditioned. Low elevationabove sea level and low humidity work together to produce a more efficient air con-ditioning system; the system in turn draws less power to perform a similar amount ofwork. This is why areas like Phoenix, Arizona, which is a net importer of power andas such sometimes has a higher cost of power, are still a favorite of companies build-ing data centers. Although the per unit cost of power is higher than some other areas,and cooling demands are high in the summer, the efficiency of the HVAC systemsthrough the course of the year and the low winter month demand reduces overallpower consumption and makes Phoenix an attractive area to consider. 空调是暖通空调的空调部分,在海拔较低的地方运行效率最高。我们不会深究原因,因为互联网上有大量信息可以证实这一说法。空调在湿度较低的区域也能更有效地运行,因为空调从被调节的空气中去除湿气的工作量较少。低海拔和低湿度共同作用,产生更高效的空调系统;反过来,系统在完成相似工作量时消耗的电量更少。这就是为什么像亚利桑那州菲尼克斯这样的电力净进口地区,因此有时电力成本较高的地区仍然是建设数据中心的公司的最爱。尽管单位电力成本高于其他一些地区,并且夏季制冷需求较高,但 HVAC 系统全年的效率以及冬季较低的需求降低了总体功耗,使菲尼克斯成为一个有吸引力的地区考虑。 Some other interesting areas started to become great candidates for data centersdue to an abundance of low-cost power. The area served by the Tennessee ValleyAuthority (TVA), including the state of Tennessee, parts of western North Carolina,northwest Georgia, northern Alabama, northeast Mississippi, southern Kentucky,and southwest Virginia is one such place. Another favorite of companies buildingdata centers is the region called the Columbia River Gorge between Oregon andWashington. Both places have an abundance of low-cost power thanks to theirhydroelectric power plants. 由于拥有丰富的低成本电力,其他一些有趣的领域开始成为数据中心的绝佳候选者。田纳西河谷管理局 (TVA) 服务的地区包括田纳西州、北卡罗来纳州西部部分地区、佐治亚州西北部、阿拉巴马州北部、密西西比州东北部、肯塔基州南部和弗吉尼亚州西南部就是这样的地方。建设数据中心的公司最喜欢的另一个地区是俄勒冈州和华盛顿州之间的哥伦比亚河峡谷地区。由于水力发电厂,这两个地方都拥有丰富的低成本电力。 Location also impacts our quality of service. We want to be in an area that haseasy access to quality bandwidth, an abundance of highly available power, and aneducated labor pool. We would like the presence of multiple carriers to reduce ourtransit or Internet pipe costs and increase the likelihood that we can pass traffic overat least one carrier if one of the carriers is having availability or quality problems. Wewant the power infrastructure not only to be comparatively low cost as described inthe preceding paragraphs, but to be highly available with a low occurrence of inter-ruptions due to age of the power infrastructure or environmental concerns. Lastly, weneed to have an educated labor pool that can help us build the data center and oper-ate it. 位置也会影响我们的服务质量。我们希望所在的地区能够轻松获得优质带宽、丰富的高可用电力以及受过良好教育的劳动力资源。我们希望多个运营商的存在能够降低我们的运输或互联网管道成本,并增加我们在其中一个运营商出现可用性或质量问题时通过至少一个运营商传递流量的可能性。我们希望电力基础设施不仅具有前面段落中所述的相对较低的成本,而且具有高可用性,并且由于电力基础设施的老化或环境问题而发生的中断发生率较低。最后,我们需要拥有一支受过教育的劳动力队伍来帮助我们建设和运营数据中心。 Finally, location affects our risk profile. If we are in a single location with a singledata center and that location has high geographic risk, the probability that we sufferan extended outage as a result of that geographic risk increases. A geographic riskcan be anything that causes structural damage to our data center, power infrastruc-ture failures, or network transport failures. The most commonly cited geographicrisks to data centers and businesses are earthquakes, floods, hurricanes, and torna-does. But there are other location specific risks to consider as well. Crime rates of anarea have the possibility of interrupting services. Extremely cold or hot weather thattaxes the location’s power infrastructure can cause an interruption to operations. 最后,地点会影响我们的风险状况。如果我们位于具有单个数据中心的单个位置,并且该位置具有很高的地理风险,那么我们因该地理风险而遭受长时间停机的可能性就会增加。地理风险可以是导致数据中心结构性损坏、电力基础设施故障或网络传输故障的任何风险。最常提到的数据中心和企业地理风险是地震、洪水、飓风和龙卷风。但还需要考虑其他特定地点的风险。安娜地区的犯罪率有可能中断服务。极冷或极热的天气会给该地区的电力基础设施带来负担,可能会导致运营中断。 Even within a general geography, some areas have higher risks than others. Prox-imity to freeways can cause an increase in the likelihood of a major accident causingdamage to our facility or may increase the likelihood that our facility is evacuateddue to a chemical spill. Do we have quick and easy access to fuel sources in the areashould we need to use our backup generators? Does the area allow for easy access forfuel trucks for our generators? How close are we to a fire department? 即使在一般地理范围内,某些地区的风险也高于其他地区。靠近高速公路可能会增加发生重大事故对我们的设施造成损坏的可能性,或者可能会增加我们的设施因化学品泄漏而被疏散的可能性。如果我们需要使用备用发电机,我们是否可以快速轻松地获取该地区的燃料来源?该区域是否可以方便地为我们的发电机加油车?我们离消防队有多远? Although location isn’t everything, it absolutely impacts several areas critical toour cost of operations, quality of service, and risk profile. The right location canreduce our fixed and variable costs associated with power usage and infrastructure,increase our quality of service, and reduce our risk profile. 尽管位置并不是一切,但它绝对会影响运营成本、服务质量和风险状况等几个关键领域。正确的位置可以降低我们与电力使用和基础设施相关的固定和可变成本,提高我们的服务质量,并降低我们的风险状况。 In considering cost, quality of service, and risk, there is no single panacea. Inchoosing to go to the Columbia River Gorge or TVA areas, you will be reducing yourcosts at the expense of needing to train the local talent or potentially bringing in yourown talent as many companies have done. In choosing Phoenix or Dallas, you willhave access to an experienced labor pool but will be paying more for power than youwould be in either the TVA or Columbia River Gorge areas. There is no single rightanswer for location; you should work to optimize your solution to fit your budgetand needs. There are, however, several wrong answers in our minds. We always sug-gest to our clients that they never choose an area of high geographic risk unless theresimply is no other choice. Should they choose an area of high geographic risk, wealways ask that they create a plan to get out of that area. It only takes one major out-age to make the decision a bad one, and the question is always when, rather than if,that outage will happen. 在考虑成本、服务质量和风险时,没有单一的灵丹妙药。选择前往哥伦比亚河谷或 TVA 地区,您将降低成本,但代价是需要培训当地人才或可能像许多公司那样引进自己的人才。选择菲尼克斯或达拉斯时,您将有机会获得经验丰富的劳动力资源,但比在 TVA 或哥伦比亚河峡谷地区支付的电力费用更高。对于位置来说,没有单一的正确答案;您应该努力优化您的解决方案以满足您的预算和需求。然而,我们心中有几个错误的答案。我们总是建议我们的客户永远不要选择地理风险高的地区,除非别无选择。如果他们选择地理风险较高的区域,我们通常会要求他们制定离开该区域的计划。只要发生一次重大停电,就会做出错误的决定,而且问题始终是停电何时会发生,而不是是否会发生。 #####Location Considerations 位置考虑因素 When considering a location for your data center or collocation partner, the following are somethings to ponder: 在考虑数据中心或配置合作伙伴的位置时,需要考虑以下事项 * What is the cost of power in the area? Is it high or low relative to other options? How effi-ciently will my HVAC run in the area? * 该地区的电力成本是多少?相对于其他选项是高还是低?我的 HVAC 在该区域的运行效率如何? * Is there an educated labor force in the area from which I can recruit employees? Are they educated and experienced in the building and operation of a data center? * 我可以从该地区招募员工吗?他们在数据中心的建设和运营方面是否受过教育并有经验? * Are there a number of transit providers in the area and how good has their service been to other consumers of their service? * 该地区是否有许多交通提供商?他们的服务对其他消费者的服务质量如何? * What is the geographic risk profile in the area? * 该地区的地理风险状况如何? Often, you will find yourself making tradeoffs between questions. You may find an area oflow power cost, but without an experienced labor pool. The one area we recommend neversacrificing is geographic risk. 通常,您会发现自己在问题之间进行权衡。您可能会发现一个电力成本低的地区,但没有经验丰富的劳动力资源。我们建议永远不要牺牲的一个领域是地理风险。 ####Data Centers and Incremental Growth 数据中心和增量增长 Data centers and collocation space present an interesting dilemma to incrementalgrowth, and interestingly that dilemma is more profound for companies of moderategrowth than it is for companies of rapid or hyper growth. Data centers are, for manyof our technology focused clients, what factories are for companies that manufactureproduct; they are a way to produce the product, they are a limitation on the quantitythat can be produced, and they are either accretive or dilutive to shareholder value,depending upon their level of utilization. 数据中心和配置空间给增量增长带来了一个有趣的困境,有趣的是,这种困境对于中等增长的公司比对于快速或超增长的公司更为深刻。对于我们许多以技术为重点的客户来说,数据中心就像工厂对于制造产品的公司一样;它们是生产产品的一种方式,是对可生产数量的限制,并且它们可以增加或稀释股东价值,具体取决于其利用水平。 Taking and simplifying the automotive industry as an example, new factories areinitially dilutive as they represent a new expenditure of cash by the company in ques-tion. The new factory probably has the newest available technology, which in turn isintended to decrease the cost per vehicle produced and ultimately increase the grossmargin and profit margins of the business. Initially, the amortized value of the factoryis applied to each of the cars produced and the net effect is dilutive as initial productionquantities are low and each car loses money for the company. As the production quan-tity is increased and the amortized fixed cost of the factory is divided by an increasingvolume of cars, the profit of those cars in aggregate start to offset the cost and finallyovercome it. The factory starts to become accretive when the cost of the factory islower per car produced than the next lowest cost per car factory. Unfortunately, tohit that point, we often have to be using the factory at a fairly high level of utilization. 以汽车行业为例,新工厂最初是稀释性的,因为它们代表了相关公司新的现金支出。新工厂可能拥有最新的可用技术,这反过来又旨在降低每辆车的生产成本,并最终提高业务的毛利率和利润率。最初,工厂的摊销价值适用于生产的每辆汽车,由于初始产量较低,每辆汽车都会给公司带来损失,因此净效应会被稀释。随着生产数量的增加,工厂的摊销固定成本除以汽车数量的增加,这些汽车的利润总计开始抵消成本并最终克服成本。当工厂生产的每辆汽车的成本低于每家汽车工厂的下一个最低成本时,工厂开始变得增值。不幸的是,为了达到这一点,我们通常必须以相当高的利用率使用工厂。 The same holds true with a data center. The building of a data center usually rep-resents a fairly large expenditure for any company. For smaller companies that areleasing new or additional data center space, that space still probably represents afairly large commitment for the company in question. In many, if not most, of thecases where new space, rather purchased or lease, is being considered, we will likelyput better and faster hardware in the space than we had in most of our other datacenter space. Although increasing the power utilization and the associated powercosts, the hope is that we will reduce our overall spending by doing more with lessequipment for our new space. Still, we have to be using some significant portion ofthis new space before the recurring lease and power costs or amortized property,plant, and equipment costs are offset by the new transactions. 数据中心也是如此。对于任何公司来说,数据中心的建设通常都是相当大的支出。对于正在租用新的或额外的数据中心空间的小型公司来说,该空间可能仍然代表着该公司相当大的承诺。在许多(如果不是大多数)正在考虑购买或租赁新空间的情况下,我们可能会在该空间中放置比我们在大多数其他数据中心空间中更好、更快的硬件。尽管增加了电力利用率和相关的电力成本,但我们希望通过为新空间使用更少的设备做更多的事情来减少总体支出。尽管如此,在经常性租赁和电力成本或摊销财产、厂房和设备成本被新交易抵消之前,我们必须使用这个新空间的很大一部分。 To illustrate this point, let’s make up some hypothetical numbers. For this discus-sion, reference Table 32.1.Let’s say that you run the operations of AllScale Networksand that you currently lease 500 square feet of data center and associated power at atotal cost of $3,000.00 per month “all in.” You are currently constrained by powerwithin your 500 square feet and need to consider additional space quickly beforeyour systems are overwhelmed with user demand. You have options to lease another500 square feet at $3,000.00 per month, or 1,000 square feet at $5,000.00 permonth. The costs to build out the racks and power infrastructure (but not the serveror network gear) within the 500 square feet are $10,000.00 and $20,000.00 for the1,000 square feet. The equipment that you expect to purchase and put into the newspace has been tested for your application, and you believe that it will handle about50% more traffic or requests than your current systems (indexed at 1.5, the original500 square foot Request Index in Table 32.1), but it will draw about 25% morepower to do so (indexed at 1.25% the original Request Index in Table 32.1). As such,given the power density is the same between the cages, you can only rack roughly80% of the systems you previously had as each system draws 1.25u the power of theprevious systems. These are represented as .8 and 1.6 under Space Index in Table32.1 for the 500 square foot and 1,000 square foot options, respectively. The result-ing performance efficiency is (.8 u 1.5 =) 1.2u the throughput for the 500 square footoption and 2.4u for the 1,000 square foot option referenced as the PerformanceIndex. Finally, the performance per dollar spent, as indexed to the original 500square foot cage, is 1.2 for the 500 square foot and 1.44 for the 1,000 square footoptions. This change is due to the addition of 500 square feet more at a reduced priceof $2,000 for the 1,000 square foot option. Note that we’ve ignored the originalbuild out cost in this calculation, but you could amortize that over the expected lifeof the cage and include it in all of the numbers that follow. We also assume that youare not going to replace the systems in the older 500 square foot cage and we havenot included the price of the new servers. 为了说明这一点,让我们编一些假设的数字。对于此讨论,请参考表 32.1。假设您运行 AllScale Networks 的运营,并且当前以每月 3,000.00 美元的总成本租赁 500 平方英尺的数据中心和相关电力。目前,您的 500 平方英尺内受到电源的限制,需要在您的系统因用户需求而不堪重负之前迅速考虑额外的空间。您可以选择以每月 3,000.00 美元的价格再租赁 500 平方英尺,或以每月 5,000.00 美元的价格租赁 1,000 平方英尺。在 500 平方英尺内建造机架和电力基础设施(但不包括服务器或网络设备)的成本为 10,000.00 美元,在 1,000 平方英尺内建造机架和电力基础设施的成本为 20,000.00 美元。您期望购买并放入新空间的设备已经针对您的应用程序进行了测试,并且您相信它将比您当前的系统处理大约 50% 的流量或请求(索引为 1.5,表 32.1 中的原始 500 平方英尺请求索引) ),但这样做会消耗大约 25% 的能量(索引为表 32.1 中原始请求索引的 1.25%)。因此,考虑到机柜之间的功率密度相同,您只能机架大约 80% 的先前系统,因为每个系统消耗的功率是先前系统的 1.25u。对于 500 平方英尺和 1,000 平方英尺选项,这些在表 32.1 的空间指数下分别表示为 0.8 和 1.6。最终的性能效率为 (.8 u 1.5 =) 1.2u(500 平方英尺选项的吞吐量)和 2.4u(1,000 平方英尺选项的吞吐量)(作为性能指数引用)。最后,按照原始 500 平方英尺笼子的指数,每美元支出的性能对于 500 平方英尺选项为 1.2,对于 1,000 平方英尺选项为 1.44。这一变化是由于 1,000 平方英尺选项以 2,000 美元的降价增加了 500 平方英尺。请注意,我们在此计算中忽略了原始建造成本,但您可以在笼子的预期寿命内摊销该成本,并将其包含在后面的所有数字中。我们还假设您不会更换旧的 500 平方英尺笼子中的系统,并且我们没有包括新服务器的价格。 ![](https://blog.baidu-google.com/usr/uploads/2024/06/1204680735.png) ![](https://blog.baidu-google.com/usr/uploads/2024/06/2061601244.png) It previously took you two years to fill up your 500 square feet, and the businessbelieves the growth rate has doubled and should stay on its current trajectory. Allindicators are that you will fill up another 500 square feet in about a year. What doyou do? 以前你花了两年时间才填满你的 500 平方英尺,而企业相信增长率已经翻了一番,应该保持目前的轨迹。所有指标都表明您将在大约一年内再填满 500 平方英尺。你做什么工作? We’re not going to answer that question, but rather leave it as an exercise for you.The answer, however, is financially based if answered properly. It should considerhow quickly the data center becomes accretive or margin positive for the businessand shareholders. You should also factor in considerations of lost opportunity forbuilding out data center space twice rather than once. It should be obvious by nowthat data center costs are “lumpy” in that they are high relative to many of yourother technology costs and take some planning to ensure that they do not negativelyimpact your operations. 我们不会回答这个问题,而是将其作为练习留给您。但是,如果回答正确,答案是基于财务的。它应该考虑数据中心为企业和股东带来增值或利润的速度有多快。您还应该考虑两次而不是一次扩建数据中心空间的机会的损失。现在应该很明显,数据中心成本是“不稳定的”,因为它们相对于您的许多其他技术成本而言较高,并且需要进行一些规划以确保它们不会对您的运营产生负面影响。 How about our previous assertion that the problem is larger for moderate-growthcompanies than hyper-growth companies? Can you see why we made that statement?The answer is that the same space purchased or leased by a hyper-growth company isgoing to become accretive faster than that of the moderate-growth company. As such,space considerations are much more important for slower growth companies unlessthey expect to exit other facilities and close them over time. The concerns of thehyper-growth company are more about staying well ahead of the demand for newspace than ensuring that the space hits the accretive point of utilization. 我们之前的断言,即中等增长公司的问题比高速增长公司更严重,怎么样?你能明白我们为什么这么说吗?答案是,高速增长的公司购买或租赁的相同空间将比中等增长的公司更快地增值。因此,空间考虑对于增长较慢的公司来说更为重要,除非他们希望随着时间的推移退出其他设施并关闭它们。高速增长的公司更关心的是如何远远领先于新空间的需求,而不是确保空间达到利用率的增长点。 ####Three Magic Rules of Three 三个三的魔法规则 We love simple, easily understood and communicated rules, and one of these is ourRules of Three as applied to data centers. There are three of these rules, hence “ThreeMagic Rules of Three.” The first rule has to do with the costs of data centers, the sec-ond has to do with the number of servers, and third has to do with the number ofdata centers a company should consider implementing. 我们喜欢简单、易于理解和沟通的规则,其中之一就是应用于数据中心的三规则。这些规则共有三个,因此称为“三魔法规则”。第一条规则与数据中心的成本有关,第二条规则与服务器的数量有关,第三条规则与公司应考虑实施的数据中心的数量有关。 #####The First Rule of Three: Three Magic Drivers of Data Center Costs 三第一法则:数据中心成本的三个神奇驱动因素 Our first rule of three concerns the cost of running a data center. The first and mostobvious cost within a data center is the cost of the servers and equipment that carriesrequests and acts upon them. These are the servers and network equipment necessaryto run your application or platform. The second cost is the power to run these serversand other pieces of equipment. The third and final cost is the power necessary to runthe HVAC for these servers. This isn’t a hard and fast rule. Rather, it is intended tofocus companies on the large costs of running a data center as too often these costsare hidden within different organizations and not properly evaluated. 我们的三条规则中的第一条涉及数据中心的运行成本。数据中心内第一个也是最明显的成本是承载请求并对其进行操作的服务器和设备的成本。这些是运行应用程序或平台所需的服务器和网络设备。第二个成本是运行这些服务器和其他设备的电力。第三个也是最后一个成本是为这些服务器运行 HVAC 所需的电力。这不是一个硬性规定。相反,它的目的是让公司关注运行数据中心的巨额成本,因为这些成本往往隐藏在不同的组织中并且没有得到适当的评估。 These costs tend to increase directly in relationship to the number of servers. Eachserver obviously has its own cost, as does the power it draws. The HVAC needs alsotypically increase linearly with the number of servers and the power consumption ofthose servers. More servers drawing more power create more heat that needs to bemoved or reduced. In many companies, especially larger companies, this relationshipis lost within organizational budget boundaries. 这些成本往往与服务器数量直接相关。每个服务器显然都有自己的成本,其消耗的电力也是如此。 HVAC 需求通常还随着服务器数量和这些服务器的功耗线性增加。更多的服务器消耗更多的电量会产生更多的热量,需要转移或减少。在许多公司,尤其是大公司中,这种关系在组织预算范围内消失了。 There are other obvious costs not included within this first rule of three. Forinstance, we need headcount to run the data center, or we are paying for a contractfor someone else to run the data center or it is included within our rental/lease agree-ment. There are also network transit costs as we are going to want to talk to someoneelse outside of our data center. There may also be security costs, the costs of main-taining certain pieces of equipment such as FM-200 fire suppression devices, and soon. These costs, however, tend to be well understood and are often either fixed by theamount of area, as in security and FM-200 maintenance, or clearly within a singleorganization’s budget such as network transit costs. 还有其他明显的成本未包含在第一个三条规则中。例如,我们需要人员来运行数据中心,或者我们正在为其他人运行数据中心的合同付费,或者它包含在我们的租赁/租赁协议中。还有网络传输成本,因为我们想要与数据中心之外的其他人交谈。还可能存在安全成本、维护某些设备(例如 FM-200 灭火装置)的成本等。然而,这些成本往往是很好理解的,并且通常要么由面积大小决定,如安全和 FM-200 维护,要么明确在单个组织的预算内,如网络传输成本。 #####The Second Rule of Three: Three Is the Magic Number for Servers 三的第二条规则:三是服务器的神奇数字 Many of our clients have included this “magic rule” as an architectural principle.Simply put, it means that the number of servers for any service should never fallbelow three and when planning data center capacity, you should consider all of theexisting services and future or planned services and expect that there will be at leastthree servers for any service. The thought here is that you build one for the customer,one for capacity and growth, and one to fail. Ideally, the service is built and the datacenter is planned in such a way that services can expand horizontally per our ScaleOut Not Up principle in Chapter 12, Exploring Architectural Principles. 我们的许多客户都将这一“神奇规则”作为架构原则。简单地说,这意味着任何服务的服务器数量永远不应低于三台,并且在规划数据中心容量时,应考虑所有现有服务以及未来或计划服务并期望任何服务至少有三台服务器。这里的想法是,您为客户构建一个,一个用于容量和增长,一个用于失败。理想情况下,服务的构建和数据中心的规划方式使得服务可以根据第 12 章“探索架构原则”中的“横向扩展而非向上”原则进行水平扩展。 Taken to its extreme for hyper-growth sites, this rule would be applied to data cen-ter capacity planning not only for front-end Web services, but for data storage ser-vices such as a database. If a service requires a database, and if finances allow, theservice should be architected such that at the very least there can be a write databasefor writes and load balanced reads, an additional read database, and a database thatcan serve as a logical standby in the case of corruption. In a fault isolative architec-ture or swim lane architecture by service, there may be several of these databaseimplementations. 对于高速增长的站点来说,这一规则的极端应用是,不仅适用于前端 Web 服务,还适用于数据库等数据存储服务的数据中心容量规划。如果服务需要数据库,并且财务允许,那么该服务的架构应该至少有一个用于写入和负载平衡读取的写入数据库、一个额外的读取数据库以及一个可以充当逻辑备用数据库的数据库。腐败案件。在故障隔离架构或服务泳道架构中,可能有多个这样的数据库实现。 It’s important to note that no data center decisions should be made without con-sulting the architects, product managers, and capacity planners responsible for defin-ing, designing, and planning new and existing systems. 值得注意的是,在制定数据中心决策时,必须咨询负责定义、设计和规划新系统和现有系统的架构师、产品经理和容量规划人员。 #####The Third Rule of Three: Three Is the Magic Number for Data Centers 第三个三法则:三是数据中心的神奇数字 “Whoa, hang on there!” you might say. “We are a young company attempting tobecome profitable and we simply cannot afford three data centers.” At first blush,this probably appears to be a ridiculous suggestion and we don’t blame you for hav-ing such an initial adverse reaction to the suggestion. But what if we told you thatyou can run out of three data centers for close to the cost that it takes you to run outof two data centers? Few of you would probably argue that you can afford to run outof a single data center forever as most of us recognize that running a single data cen-ter for mission critical or revenue critical transactions is just asking for trouble. Andif you are a public company, no one wants to make the public disclosure that “Anysignificant damage to our single data center would significantly hamper our ability toremain a going concern.” Let’s first discuss the primary architectural shift that allowsyou to run out of multiple data centers. 哇哦,坚持住!”你可能会说。 “我们是一家年轻的公司,试图实现盈利,但我们根本无力承担三个数据中心的费用。”乍一看,这可能是一个荒谬的建议,我们不会责怪您对该建议产生如此强烈的不良反应。但是,如果我们告诉您,您可以以接近耗尽两个数据中心所需的成本来耗尽三个数据中心,该怎么办?你们中很少有人会认为,您有能力永远使用单个数据中心,因为我们大多数人都认识到,为关键任务或收入关键交易运行单个数据中心只是自找麻烦。如果你是一家上市公司,没有人愿意公开披露“对我们单个数据中心的任何重大损害都会严重妨碍我们保持持续经营的能力。”让我们首先讨论允许您运行多个数据中心的主要架构转变。 In Chapter 12, we suggested designing for multiple live sites as an architecturalprinciple. To do this, you need either stateless systems; or systems that maintain statewithin the browser (say with a cookie) or pass state back and forth through the sameURL/URI. After you establish affinity with a data center and maintain state at thatdata center, it becomes very difficult to serve the transaction from other live data cen-ters. Another approach is to maintain affinity with a data center through the courseof a series of transactions, but allow a new affinity for new or subsequent sessions tobe maintained through the life of those sessions. Finally, you can consider segmentingyour customers by data center along a z-axis split, and then replicate the data foreach data center, split evenly through the remainder of the data centers. In thisapproach, should you have three data centers, 50% of the data from data center Awould move to data centers B and C. This approach is depicted in Figure 32.1.Theresult is that you have 200% of the data necessary to run the site in aggregate, buteach site only contains 66% of the necessary data as each site contains the copy forwhich it is a master (33% of the data necessary to run the site) and 50% of the copiesof each of the other sites (16.5% of the data necessary to run the site for a total of anadditional 33%). 在第 12 章中,我们建议将多个实时站点的设计作为架构原则。为此,您需要无状态系统;或在浏览器内维护状态(例如使用 cookie)或通过相同 URL/URI 来回传递状态的系统。在与数据中心建立关联并维护该数据中心的状态后,从其他实时数据中心提供事务服务变得非常困难。另一种方法是在一系列事务的过程中保持与数据中心的亲和力,但允许在新会话或后续会话的整个生命周期中保持对这些会话的新亲和力。最后,您可以考虑沿 z 轴分割按数据中心对客户进行细分,然后复制每个数据中心的数据,在其余数据中心中均匀分割。在这种方法中,如果您拥有三个数据中心,则来自数据中心 A 的 50% 的数据将移动到数据中心 B 和 C。这种方法如图 32.1 所示。结果是您拥有运行站点所需的 200% 的数据总的来说,但每个站点仅包含 66% 的必要数据,因为每个站点包含其作为主站点的副本(运行该站点所需的数据的 33%)和其他每个站点的 50% 的副本(16.5%运行网站所需的数据总共额外增加了 33%)。 Let’s discuss the math behind our assertion. We will first assume that you agreewith us that you need to have at least two data centers to help ensure that you cansurvive any disaster. If these data centers were labeled A and B, you might decide tooperate 100% of your traffic out of data center A and leave data center B for a warmstandby. Back-end databases might be replicated using native database replication ora third-party tool and may be several seconds behind. You would need 100% of yourcomputing and network assets in both data centers to include 100% of your Web andapplication servers, 100% of your database servers, and 100% of your networkequipment. Power needs would be similar and Internet connectivity would be similar.You probably keep slightly more than 100% of the capacity necessary to serve yourpeak demand in each location in order to handle surges in demand. So, let’s say thatyou keep 110% of your needs in both locations. Any time you buy additional serversfor one place, you have to buy for the next. You may also decide to connect the datacenters with your own dedicated circuits for the purposes of secure replication ofdata. Running live out of both sites would help you in the event of a major catastro-phe as only 50% of your transactions would initially fail until you transfer that traf-fic to the alternate site, but it won’t help you from a budget or financial perspective.A high-level diagram of the data centers may look as depicted in Figure 32.2. 让我们讨论一下我们的断言背后的数学原理。我们首先假设您同意我们的观点,即您需要至少两个数据中心来帮助确保您能够在任何灾难中幸存下来。如果这些数据中心被标记为 A 和 B,您可能会决定将 100% 的流量从数据中心 A 中运出,并让数据中心 B 进行热备用。后端数据库可能使用本机数据库复制或第三方工具进行复制,并且可能会落后几秒钟。您需要 100% 的计算和网络资产位于两个数据中心,以包括 100% 的 Web 和应用程序服务器、100% 的数据库服务器以及 100% 的网络设备。电力需求将相似,互联网连接也将相似。您可能会保留略高于 100% 的容量来满足每个位置的峰值需求,以应对需求激增。因此,假设您在两个位置都保留了 110% 的需求。每当您为一个地方购买额外的服务器时,您都必须为下一个地方购买服务器。您还可以决定使用自己的专用电路连接数据中心,以实现数据的安全复制。在这两个站点上实时运行将在发生重大灾难时为您提供帮助,因为只有 50% 的交易最初会失败,直到您将该流量转移到备用站点,但这对您的预算没有帮助数据中心的高级图可能如图 32.2 所示。 ![](https://blog.baidu-google.com/usr/uploads/2024/06/2359835479.png) However, if we have three sites and we run live out of all three sites at once, ourcost for systems goes down. This is because for all nondatabase systems, we onlyreally need 150% of our capacity in each location to run 100% of our traffic in theevent of a site failure. For databases, we definitely need 200% of the storage as com-pared to one site, but we only really need 150% of the processing power if we aresmart about how we allocate our database server resources. Power and facilities con-sumption should also be at roughly 150% of the need for a single site, though obvi-ously we will need slightly more people and there’s probably slightly more overheadthan 150% to handle three sites versus one. The only area that increases dispropor-tionately are the network interconnects as we need two additional connections (ver-sus one) for three sites versus two. Such a data center configuration is indicated inFigure 32.3 and Table 32.2 shows the relative costs of running three sites versus two.Note that in our Table 32.2, we have figured that each site has 50% of the servercapacity necessary to run everything, and 66% (66.66, but we’ve made it a roundnumber and rounded down rather than up in the figure) of the storage per Figure33.3.You would need 300% of the storage if you were to locate 100% of the data ineach of the three sites. 但是,如果我们拥有三个站点并且同时在所有三个站点上运行,那么我们的系统成本就会下降。这是因为对于所有非数据库系统,我们实际上只需要每个位置 150% 的容量来在站点发生故障时运行 100% 的流量。对于数据库,与一个站点相比,我们肯定需要 200% 的存储,但如果我们明智地分配数据库服务器资源,我们实际上只需要 150% 的处理能力。电力和设施消耗也应约为单个站点需求的 150%,尽管显然我们需要稍微多一点的人员,而且处理三个站点与处理一个站点的开销可能比 150% 稍多一些。唯一不成比例增加的领域是网络互连,因为我们需要两个额外的连接(相对于一个)三个站点而不是两个站点。这样的数据中心配置如图 32.3 所示,表 32.2 显示了运行三个站点与两个站点的相对成本。请注意,在表 32.2 中,我们计算出每个站点拥有运行所有内容所需的 50% 的服务器容量,以及 66% 的服务器容量。 (66.66,但我们将其设为整数,并在图中向下舍入而不是向上舍入)根据图 33.3 的存储空间。如果要在每个存储空间中查找 100% 的数据,则需要 300% 的存储空间。三个站点。 ![](https://blog.baidu-google.com/usr/uploads/2024/06/1169075869.png) ![](https://blog.baidu-google.com/usr/uploads/2024/06/264900397.png) ![](https://blog.baidu-google.com/usr/uploads/2024/06/1628517478.png) Note that we get this leverage in data centers because we expect that the data cen-ters are sufficiently far apart so as not to have two data centers simultaneously elimi-nated as a result of any geographically isolated event. You might decide to stick onenear the West Coast of the United States, one in the center of the U.S., and anothernear the East Coast. Remember, however, that you still want to reduce your data cen-ter power costs and reduce the risks to each of the three data centers so you still wantto be in areas of low relative cost of power and low geographic risk. 请注意,我们在数据中心中获得这种优势是因为我们希望数据中心相距足够远,以免两个数据中心因任何地理上孤立的事件而同时被淘汰。您可能决定将一个放在美国西海岸附近,一个放在美国中部,另一个放在东海岸附近。但请记住,您仍然希望降低数据中心的电力成本并降低三个数据中心各自的风险,因此您仍然希望处于相对电力成本较低且地理风险较低的区域。 Maybe now you are a convert to our three-site approach and you immediatelyjump to the conclusion that more is better! Why not four sites, or five, or 20? Well,more sites are better, and there are all sorts of games you can play to further reduceyour capital costs. But at some point, unless you are a very large company, the man-agement overhead of a large number of data centers is cost prohibitive. Each addi-tional data center will give you some reduction in the amount of equipment that youneed to have complete redundancy, but will increase the management overhead andnetwork connectivity costs. To arrive at the “right number” for your company, youshould take the example of Table 32.2 and add in the costs to run and manage thedata centers to determine the right number for your company. While you are per-forming your cost calculations, remember that there are other benefits to multipledata centers such as ensuring that those data centers are close to end customer con-centrations in order to reduce customer response times. Our point is that you shouldplan for at least three data centers to both give you disaster prevention and reduceyour costs relative to a two-site implementation. 也许现在您已经转变为我们的三站点方法,并且您立即得出结论:越多越好!为什么不是四个、五个或二十个站点?好吧,网站越多越好,而且您可以玩各种游戏来进一步降低资金成本。但在某些时候,除非您是一家非常大的公司,否则大量数据中心的管理开销是令人望而却步的。每个额外的数据中心都会在一定程度上减少实现完全冗余所需的设备数量,但会增加管理开销和网络连接成本。为了得出适合您公司的“正确数字”,您应该以表 32.2 为例,并添加运行和管理数据中心的成本,以确定适合您公司的正确数字。在进行成本计算时,请记住,多个数据中心还有其他好处,例如确保这些数据中心靠近最终客户集中地,以减少客户响应时间。我们的观点是,您应该规划至少三个数据中心,以便为您提供灾难预防并降低相对于两个站点实施的成本。 ####Multiple Active Data Center Considerations 多个活动数据中心注意事项 We hinted at some of the concerns of running multiple active data centers in our ear-lier discussion of why three is the magic number for data centers. We will first coversome of the benefits and concerns of running multiple live data centers and then wewill discuss three flavors of approaches and the concerns unique to each of thoseapproaches. 我们在之前关于为什么 3 是数据中心的神奇数字的讨论中暗示了运行多个活动数据中心的一些问题。我们将首先介绍运行多个实时数据中心的一些好处和问题,然后我们将讨论三种方法以及每种方法特有的问题。 The two greatest benefits of running multiple live data centers are the disasterrecovery, or as we prefer to call it, disaster prevention, aspects and the reduction incost when running three data centers versus two. Designing and running multipledata centers also gives you the flexibility of putting data centers closer to your cus-tomers and thereby reducing response times to their requests. A multidatacenterapproach does not eliminate the benefits you would receive by deploying a contentdelivery network as described in Chapter 25, Caching for Performance and Scale, butit does benefit those calls that are forced to go directly to the data center due to theirdynamic nature. If you are leasing data center space, you also get the benefit of beingable to multisource your collocation partners, and as a result, use the market to drivedown the negotiated price for your space. Should you ever need or desire to leave onelocation, you can run live out of two data centers and move to a lower cost or higherquality provider for your third data center. If you are a SaaS (Software as a Service)company, you may find it easier to roll out or push updates to your site by movingtraffic between data centers and upgrading sites one at a time during off peak hours.Finally, when you run live out of multiple data centers, you don’t find yourself ques-tioning the viability of your warm or cold “disaster recovery site;” your daily opera-tions proves that each of the sites is capable of handling requests from yourcustomers. 运行多个实时数据中心的两个最大好处是灾难恢复,或者我们更喜欢称之为灾难预防,以及运行三个数据中心相对于两个数据中心时的成本降低。设计和运行多个数据中心还可以让您灵活地将数据中心放置在离客户更近的地方,从而减少对客户请求的响应时间。多数据中心方法不会消除您通过部署内容交付网络所获得的好处(如第 25 章“性能和规模缓存”中所述),但它确实有利于那些由于其动态性质而被迫直接转至数据中心的呼叫。如果您租赁数据中心空间,您还可以获得能够多源托管合作伙伴的优势,从而利用市场来压低空间的协商价格。如果您需要或希望离开一个位置,您可以在两个数据中心中运行,并为您的第三个数据中心转移到成本更低或质量更高的提供商。如果您是一家 SaaS(软件即服务)公司,您可能会发现,通过在数据中心之间移动流量并在非高峰时段一次升级一个站点,可以更轻松地向您的站点推出或推送更新。最后,当您实时运行时在多个数据中心中,您不会发现自己质疑热或冷“灾难恢复站点”的可行性;您的日常运营证明每个站点都有能力处理客户的请求。 Multiple live data centers do add some complexity and will likely increase yourheadcount needs as compared to running a single data center and maybe even twodata centers. The increase in headcount should be moderate, potentially adding oneto a few people to manage the contracts and space depending upon the size of yourcompany. Some of your processes will need to change, such as how and when youroll code and how you ensure that multiple sites are roughly consistent with respectto configuration. You will likely also find members of your team travelling moreoften to visit and inspect sites should you not have full-time employees dedicated toeach of your centers. Network costs are also likely to be higher as you add networklinks between sites for intersite communication. 与运行单个数据中心甚至两个数据中心相比,多个实时数据中心确实会增加一些复杂性,并且可能会增加您的人员需求。员工人数的增加应该是适度的,根据公司的规模,可能会增加一个或几个人来管理合同和空间。您的某些流程需要更改,例如如何以及何时滚动代码以及如何确保多个站点在配置方面大致一致。如果您没有专门负责每个中心的全职员工,您可能还会发现您的团队成员会更频繁地访问和检查现场。当您在站点之间添加网络链接以进行站点间通信时,网络成本也可能会更高。 From an architecture perspective, to gain the full advantage of a multisite configu-ration, you should consider moving to a near stateless system with no affinity to adata center. You may of course decide that you are going to route customers based onproximity using a geo-locator service, but you want the flexibility of determiningwhen to route what traffic to what data center. In this configuration, where data ispresent at all three data centers and there is no state or session data held solely withina single data center, a failure of a service or failure of the entire data center allows theend user to nearly seamlessly fail over to the next available data center. This results inthe highest possible availability for any configuration possible. 从体系结构的角度来看,为了充分利用多站点配置,您应该考虑迁移到与数据中心没有关联的近乎无状态的系统。当然,您可能决定使用地理定位器服务根据邻近程度来路由客户,但您希望能够灵活地确定何时将哪些流量路由到哪个数据中心。在此配置中,数据存在于所有三个数据中心,并且不存在仅在单个数据中心内保存的状态或会话数据,服务故障或整个数据中心故障允许最终用户几乎无缝地故障转移到下一个数据中心。可用的数据中心。这使得任何可能的配置都具有最高的可用性。 In the case where some state or affinity is required, or the cost of architecting outstate and affinity is simply too high, you need to make the choice of whether to failall transactions or sessions and force them to restart should a data center or servicego down or find a way to replicate the state and session to at least one more data cen-ter. This increases the cost slightly as now you need to either build or buy a replica-tion engine for the user state information and you will need additional systems orstorage to handle it. 在需要某种状态或亲和力的情况下,或者构建外状态和亲和力的成本太高,您需要选择是否在数据中心或服务出现故障时使所有事务或会话失败并强制它们重新启动。找到一种方法将状态和会话复制到至少一个数据中心。这会稍微增加成本,因为现在您需要为用户状态信息构建或购买复制引擎,并且您将需要额外的系统或存储来处理它。 #####Multiple Live Site Considerations 多个实时站点注意事项 Multiple live site benefits include 多个实时站点的好处包括 * Higher availability as compared to a hot and cold site configuration * 与热站点和冷站点配置相比,可用性更高 * Lower costs compared to a hot and cold site configuration * 与热站点和冷站点配置相比,成本更低 * Faster customer response times if customers are routed to the closest data center for dynamic calls * 如果将客户路由到最近的数据中心进行动态呼叫,则客户响应时间会更快 * Greater flexibility in rolling out products in a SaaS environment * 在 SaaS 环境中推出产品具有更大的灵活性 * Greater confidence in operations versus a hot and cold site configuration * 与冷热站点配置相比,操作更有信心 Drawbacks or concerns of a multiple live site configuration include 多个实时站点配置的缺点或问题包括 * Greater operational complexity * 操作复杂性更高 * Small increase in headcount needs * 人员需求小幅增加 * Increase in travel and network costs * 差旅和网络成本增加 * Increase in operational complexity * 操作复杂性增加 Architectural considerations in moving to a multiple live site environment include 迁移到多实时站点环境时的架构考虑因素包括 * Eliminate the need for state and affinity wherever possible * 尽可能消除对状态和亲和力的需求 * Route customers to closest data center if possible to reduce dynamic call times * 如果可能,将客户路由到最近的数据中心,以减少动态呼叫时间 * Investigate replication technologies for databases and state if necessary * 研究数据库的复制技术并在必要时说明 ####Conclusion 结论 This chapter discussed the unique constraints that data centers create for hyper-growth companies, data center location considerations, and the benefit of designingfor and operating out of multiple live data centers. As such, when consideringoptions for where to locate data centers, one should consider areas that provide thelowest cost of power with high quality and availability of power. Another major loca-tion based criteria is the geographic risk in any given area. Companies should ideallylocate data centers in areas with low geographic risk, low cost of power, and highpower efficiency for air conditioning systems. 本章讨论了数据中心为高速增长的公司带来的独特限制、数据中心位置考虑因素以及设计和运营多个实时数据中心的好处。因此,在考虑选择数据中心的选址时,应考虑提供最低电力成本、高质量且可用的电力的区域。另一个主要的基于位置的标准是任何给定区域的地理风险。公司应该将数据中心理想地安置在地理风险低、电力成本低、空调系统能效高的地区。 Data center growth and capacity need to be evaluated and planned out months oreven years in advance based on whether you lease or purchase data center space andhow much space you need. Finding yourself in a position needing to immediatelyenter into contracts and occupy space puts your business at risk and at the very leastreduces your negotiating leverage and causes you to pay more money for space. Afailure to plan for data center space and power needs well in advance could hinderyour ability to grow. 数据中心的增长和容量需要根据您是否租赁或购买数据中心空间以及需要多少空间,提前数月甚至数年进行评估和规划。发现自己需要立即签订合同并占用空间,这会让您的业务面临风险,至少会降低您的谈判筹码,并导致您为空间支付更多费用。未能提前规划数据中心空间和电力需求可能会阻碍您的发展能力。 When planning data centers, remember to apply the three magic rules of three.The first rule is that there are three drivers of cost. The first driver is the cost of theserver, the second driver is the cost of power, and the third driver is the cost ofHVAC. The second rule is to always plan for at least three servers for any service andthe final rule is to plan for three or more live data centers. 在规划数据中心时,请记住应用三个神奇规则“三”。第一个规则是成本的三个驱动因素。第一个驱动因素是服务器成本,第二个驱动因素是电力成本,第三个驱动因素是 HVAC 成本。第二条规则是始终为任何服务规划至少三台服务器,最后一条规则是规划三个或更多实时数据中心。 Multiple active data centers provide a number of advantages for your company.You gain higher availability and lower overall costs relative to the typical hot andcold site disaster recovery configuration. They also allow you greater flexibility inproduct rollouts and greater negotiation leverage with leased space. Operational con-fidence in facilities increase as compared to the lack of faith most organizations havein a cold or warm disaster recovery facility. Finally, customer perceived responsetimes go down for dynamic calls when routed to the closest data center. 多个活动数据中心为您的公司提供了许多优势。相对于典型的热站点和冷站点灾难恢复配置,您可以获得更高的可用性和更低的总体成本。它们还使您在产品推出方面具有更大的灵活性,并通过租赁空间获得更大的谈判筹码。与大多数组织对冷或热灾难恢复设施缺乏信心相比,对设施的运营信心有所增加。最后,当动态呼叫路由到最近的数据中心时,客户感知的响应时间会缩短。 Drawbacks of the multiple live data center configuration include increased opera-tional complexity, increases in headcount and network costs, and an increase intravel cost. That said, our experience is that the benefits far outweigh the negativeaspects of such a configuration. 多个实时数据中心配置的缺点包括操作复杂性增加、人员数量和网络成本增加以及内部成本增加。也就是说,我们的经验是,这种配置的好处远远大于坏处。 When considering a multiple live data center configuration, you should attempt toeliminate state and affinity wherever possible. Affinity to a data center closest to thecustomer is preferred to reduce customer perceived response times, but ideally youwant the flexibility of seamlessly moving traffic. You will need to implement somemethod of replication for databases and should you need to maintain state for anyreason, you should also consider using that technology for state replication. 在考虑多个实时数据中心配置时,您应该尽可能尝试消除状态和关联性。首选与距离客户最近的数据中心有密切关系,以减少客户感知的响应时间,但理想情况下,您希望能够灵活地无缝移动流量。您将需要为数据库实现某种复制方法,并且如果您出于任何原因需要维护状态,您还应该考虑使用该技术进行状态复制。 #####Key Points 关键点 * Power is typically the constraining factor within most data centers today. * 电力通常是当今大多数数据中心内的限制因素。 * Cost of power, quality and availability of power, geographic risk, an experi-enced labor pool, and cost and quality of network transit are all location basedconsiderations for data centers. * 电力成本、电力质量和可用性、地理风险、经验丰富的劳动力资源以及网络传输的成本和质量都是数据中心基于位置的考虑因素。 * Data center planning has a long time horizon. It needs to be done months andyears in advance. * 数据中心规划具有较长的时间跨度。它需要提前数月甚至数年完成。 * The three magic drivers of data center costs are servers, power, and HVAC. * 数据中心成本的三个神奇驱动因素是服务器、电力和暖通空调。 * Three is the magic number for servers: Never plan for a service having less thanthree servers initially. * 三是服务器的神奇数字:永远不要计划最初拥有少于三台服务器的服务。 * Three is the magic number for data centers: Always attempt to design for atleast three live sites. * 三是数据中心的神奇数字:始终尝试为至少三个实时站点进行设计。 * Multiple live sites offer higher availability, greater negotiating leverage, higheroperational confidence, lower cost, and faster customer response times than tra-ditional hot/cold disaster recovery configurations. * 与传统的热/冷灾难恢复配置相比,多个实时站点可提供更高的可用性、更大的谈判筹码、更高的运营信心、更低的成本和更快的客户响应时间。 * Multiple live sites tend to increase operational complexity, costs associated withtravel and network connectivity, and headcount needs. * 多个实时站点往往会增加运营复杂性、与旅行和网络连接相关的成本以及人员需求。 * Attempt to eliminate affinity and state in a multiple life site design. * 尝试消除多生命站点设计中的亲和力和状态。
没有评论