###Chapter 17 Performance and Stress Testing

> If you know neither the enemy nor yourself, you will succumb in every battle.
>
> Sun Tzu

After mentioning performance and stress testing in passing in previous chapters, we now turn our attention to these tests and discuss how they differ in purpose and output and how they impact scalability. Your organization may currently be using neither, one, or both of these tests. Either way, this chapter should give you some fresh perspectives on the purpose and viability of testing that you can use to either revamp or initiate a testing process in your organization.

An important thing to remember up front is that no matter how good your testing is, including performance and stress testing, nothing will replace good design and proper development when it comes to building a quality, scalable product. Just as you cannot test quality into a product, you cannot load test scalability into one either. You need to establish very early in the product development life cycle that there will be a focus on scalability and quality from the start. This doesn’t mean that you should skip performance testing any more than you should skip quality testing; they are both essential, but they are verification and validation steps that ensure the proper work was done up front. You should not expect to build in the required quality or scalability at the end of the life cycle.
####Performing Performance Testing

Performance testing, as defined by Wikipedia, covers a broad range of engineering evaluations, where the emphasis is on the final measurable performance characteristics instead of the actual material or product. With respect to computer science, performance testing is focused on determining the speed, throughput, or effectiveness of a device or piece of software. Performance testing is often called load testing, and to us the terms are interchangeable. Some professionals will argue that performance testing and load testing have different goals but similar techniques. To avoid a pedantic argument, we will use a broader goal for defining performance testing so that it incorporates both.

By our definition, the goal of performance testing is to identify, document, and, where possible, eliminate bottlenecks in the system. This is done through a strictly controlled process of measurement and analysis. Load testing is utilized as a method in this process.

#####Handling the Load with Load Testing

Load testing is the process of putting load or user demand on a system to measure its response and stability. Its purpose is to verify that the application can meet the desired performance objectives, which are often specified as a service level agreement (SLA). A load test measures such things as response time, throughput, and resource utilization.
It is not intended to identify the system’s breaking point unless this point occurs below the peak load condition that is expected by the specifications, requirements, or normal operating conditions. If that should occur, you have a serious issue that must be addressed prior to release.

Example load tests include the following:

* Test a mail server with the load of the expected number of users’ email accounts.
* Test the same mail server with the expected load of email messages.
* Test a SaaS application by sending many and varied simulated user requests to the application over an extended period of time; the more like production traffic, the better.
* Test a load balanced pair of app servers with a scaled down load of user traffic.

#####Criteria

Before we can begin our performance testing to identify bottlenecks, we must first clearly identify the specifications of the system. This is the first step in performance testing: establishing the success criteria. For Web 2.0 and SaaS systems, these are often based on concurrent usage and response time metrics. Unless this is the very first time you are conducting performance testing, these specifications will have already been established. The first time you conducted performance testing, hopefully prior to the first release, you should have increased the load until the application either stopped responding or responded in an unpredictable manner, at which point you would have established a benchmark for the performance of the application.
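
As a minimal sketch of what written-down success criteria might look like, the following assumes the criteria are a required number of concurrent users and a ceiling on the 95th-percentile response time. The class and function names, the choice of the 95th percentile, and the numbers are all assumptions made for illustration; they do not come from any particular tool or from the specifications of a real system.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SuccessCriteria:
    """Hypothetical success criteria for one performance test (illustrative names)."""
    concurrent_users: int        # load the system is required to sustain
    max_p95_response_ms: float   # ceiling for the 95th-percentile response time


def percentile(samples, pct):
    """Return the nearest-rank percentile (pct between 0 and 100) of samples."""
    if not samples:
        raise ValueError("no samples recorded")
    ordered = sorted(samples)
    # Nearest-rank method: the smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]


def meets_criteria(criteria, users_simulated, response_times_ms):
    """True when the run reached the required load and stayed under the ceiling."""
    if users_simulated < criteria.concurrent_users:
        return False  # the run never reached the required load, so it proves nothing
    return percentile(response_times_ms, 95) <= criteria.max_p95_response_ms


# Assumed numbers, for illustration only.
criteria = SuccessCriteria(concurrent_users=1000, max_p95_response_ms=500.0)
print(meets_criteria(criteria, 1000, [120.0, 180.0, 240.0, 310.0, 450.0]))
```

Keeping the criteria in one small structure makes it easy to reuse the same pass or fail check from release to release.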

There are other ways that you can establish these benchmarks or requirements, such as having specifications detailed ahead of time for the particular project. This is often the case when developing a replacement system or doing a complete redesign. The old system may have handled a certain number of concurrent users, and to avoid purchasing more hardware, the project has a major requirement of maintaining or improving this metric. Other times, the business is growing beyond the current system and a decision is made to completely redesign the system from the ground up. In this case, the usage and response time requirements generally go way up, in line with the amount of investment necessary to completely redevelop the system.

#####Environment

After you have these benchmarks, the second step is to establish your environment. The environment encompasses the network, servers, operating system, and third-party software that the application is running on. It is typical to have separate environments for development, quality assurance testing, performance testing, staging, and production. The environment is important because you need a stable, consistent environment to conduct the test repeatedly over some extended duration. There is a wide variety of tests that we will discuss in the next step, defining the tests; for now, know that there can be many tests to cover the breadth of components. Additionally, some of these tests need to be run over certain time periods, such as 24 hours, to produce the load expected for batch routines.
The other reason that the environment is important is that for the test results to be accurate and meaningful, the environment must mirror production as much as possible.

The performance testing environment must mimic production as much as possible because environmental settings, configurations, different hardware, different firewall rules, and much more can all dramatically affect test results. Even different patch versions of the operating system, which seems trivial, can produce dramatically different performance characteristics for applications. This does not mean that you need a full copy of your production environment; although that would be nice, few companies can afford such a luxury. Instead, make wise tradeoffs but stick to the same basic architecture and implementation as much as possible. For example, pools of servers that in production have 40 servers in them can be scaled down in a test environment to only two or three servers. Databases are often very difficult to scale down because the amount of data affects the query performance. In some cases, you can “trick” the database into believing it has the same amount of data as the production database in order to ensure the queries execute with the same query plans. Spend some time deciding on a performance testing environment and discuss the tradeoffs that you are making. Balance the cost with the effectiveness and you will be able to make the best decisions in terms of what the environment should look like and how accurate the results will be.
#####Define Tests

The third step in performance planning is to define the tests. As mentioned earlier, there are a multitude of tests that can be performed on all the various services and features. If you try to run all of them, you may never release any products. The key is to use the Pareto Distribution, or 80/20 rule: find the 20% of the tests that will provide you with 80% of the information. System tests almost always follow some similar distribution when it comes to the amount or value of information provided. This is because the features are not all used equally, and some are more critical than others. A feature handling user payments is more important than one handling a user’s search for friends, and thus should be tested more vigorously.

#####Vilfredo Pareto

Vilfredo Federico Damaso Pareto was an Italian economist who lived from 1848 to 1923 and contributed several important advances to economics. One of his most notable insights, which almost everyone has heard of today, is the Pareto Distribution. Fascinated by power and wealth distribution in societies, he studied property ownership in Italy and observed in his 1909 publication that 20% of the population owned 80% of the land, thus giving rise to his Pareto Distribution.

Technically, the Pareto Distribution is a power law probability distribution, meaning that it has a special relationship between the frequency of an observed event and the size of the event. Another power law is Kleiber’s Law of metabolism, which states that the metabolic rate of an animal scales to the 3/4 power of its mass. As an example, a horse that is 50 times larger than a rabbit will have a metabolism 18.8 times greater than the rabbit, because 50 raised to the 3/4 power is approximately 18.8.

There are lots of other rules of thumb that you can use, but the Pareto Distribution is very useful, when it applies, for getting the majority of a result without the majority of the effort. The caution, of course, is to make sure the probability distribution applies before using it. If you have a scenario where the information is one for one with the action, you cannot get 80% of it by performing only 20% of the action; you will have to perform the same percentage of the work as the percentage of information that you need.

When you do define the tests, be sure to include tests of various types. Some types or categories of tests include endurance, load, most used, most visible, and component (app, network, database, cache, and storage). The endurance test is used to ensure that a standard load experienced over a prolonged period of time does not have any adverse effects due to such problems as memory leaks, data storage, log file creation, or batch jobs. A normal user load with traffic patterns and activities that are as realistic as possible is used here.
It is often difficult to come up with actual or close to actual user traffic. A minimum substitute for this is a series of actions, such as a sign-up process followed by a picture upload, a search for friends, and a log out, written into a script that can be executed over and over. A more ideal scenario is to gather actual users’ traffic from a network device or app server and replay the requests in exactly the same order while varying the time period. At first, you can run the test over the same time period in which the users generated the traffic, and then you can increase the speed and ensure the application performs as expected with the increased throughput.

#####Execute Tests

The load test essentially puts a user load on the system up to the expected or required level to ensure the application is stable and responsive according to internal or external service level agreements. A most used test scenario tests the most common path that users take through the application. In contrast, a most visible test scenario tests the part of the application that is seen the most, such as the home page or a new landing page. The component test category is a broad set of tests that are designed to test individual components in the system. One such test might be to exercise a particularly long running query on the database to ensure it can handle the prescribed amount of traffic. Similarly, traffic requests through a load balancer or firewall are other component tests that you might consider.
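
The replay approach described for the endurance test, where recorded requests are played back in the same order but over a shorter time period, can be sketched as follows. This is only an illustration under stated assumptions: the input is assumed to be a list of `(timestamp_in_seconds, request)` pairs already extracted from a log, and `replay_schedule` and `speedup` are hypothetical names. Actually sending the requests is left out.

```python
def replay_schedule(recorded, speedup=1.0):
    """Turn recorded traffic into (offset_seconds, request) pairs for replay.

    recorded: list of (timestamp_seconds, request) in the order the users sent them.
    speedup:  1.0 replays over the original time period; 2.0 replays twice as fast.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    if not recorded:
        return []
    start = recorded[0][0]
    # Keep the original order and compress the gaps between requests.
    return [((timestamp - start) / speedup, request) for timestamp, request in recorded]


# Assumed sample of a recorded session: sign up, upload a picture, search, log out.
session = [
    (100.0, "POST /signup"),
    (102.0, "POST /pictures"),
    (106.0, "GET /search?q=friends"),
    (107.0, "POST /logout"),
]
for offset, request in replay_schedule(session, speedup=2.0):
    print(f"{offset:5.1f}s  {request}")
```

Running the schedule first with `speedup=1.0` and then with larger values mirrors the progression from the original time period to increased throughput.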

After you have finalized your test plan based on the size of the system, the relative value of the information that you will gain from each test, the amount of time that you have available, and the amount of risk that the organization is willing to accept, you are ready to move on to step four, which is to actually execute the tests. In this step, you work through the test plan, executing the tests methodically in the environment established for this testing, and begin recording various measurements such as transaction times, response times, outputs, and behavior. Gather all available data; data is your friend in performance testing, and you really can’t have too much. It is important to keep this data from release to release. As we will discuss in the next step, comparison between various releases is critical to understanding the data and determining whether the data indicates normal operating ranges or a possible problem.

#####Analyze Data

Step five in the performance testing process is to analyze the data gathered. This analysis can be done in a variety of manners depending on such things as the expertise of the analyst, the expectations of thoroughness, the acceptable risk level, and the time allotted. Perhaps the simplest analysis is a comparison of this candidate release with past releases.
A query that can execute only 25 times per second without increased response time, when in the last release it could execute 50 times per second with no noticeable degradation in performance, indicates a potential problem. The fun begins in the next step, trying to figure out why this change has occurred. Although decreases in throughput capacity or increases in response time are clearly things that should be noted for further investigation, the opposite is true as well. A sudden dramatic increase in capacity might indicate that a particular code path has been dropped or SQL conditionals have been lost, and it should be noted for explanation as well. We hope that in these scenarios an engineer has refactored and improved the performance, but it is best to note this and ask the questions.

A more detailed analysis involves graphing the data for visual reference. Sometimes, it is much easier to recognize anomalies or differences when data is graphed on line, bar, or pie charts. Although these differences may or may not be truly significant, graphs are generally a quick way of making judgments about the release candidate. A further detailed analysis involves performing statistical analysis on the data. There is a multitude of statistical techniques that can be used, such as control charts, t-tests, factor analysis, main effects plots, analysis of variance, and interaction plots. The general purpose of conducting any of this analysis is to determine what factors are causing the observed behavior, whether it is statistically significantly different from other releases, and whether it will meet the service level agreements that are in place.
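
The two simplest analyses above can be sketched in a few lines. The first function flags a throughput change in either direction, as in the 50 versus 25 executions per second example; the 10% tolerance is an assumed threshold, not a recommendation. The second computes Welch's t statistic, one specific variant of the t-test that does not assume equal variances; it returns only the statistic, and turning it into a significance level would require a t distribution from a statistics package, which is not shown. All function names are illustrative.

```python
import math
import statistics


def throughput_change(previous_per_sec, candidate_per_sec):
    """Fractional change in throughput from the previous release to the candidate."""
    return (candidate_per_sec - previous_per_sec) / previous_per_sec


def needs_explanation(previous_per_sec, candidate_per_sec, tolerance=0.10):
    """Flag large changes in either direction; big gains deserve questions too."""
    return abs(throughput_change(previous_per_sec, candidate_per_sec)) > tolerance


def welch_t(previous_times, candidate_times):
    """Welch's t statistic for the difference in mean response time."""
    mean_prev = statistics.mean(previous_times)
    mean_cand = statistics.mean(candidate_times)
    # Sample variances; each sample needs at least two measurements.
    var_prev = statistics.variance(previous_times)
    var_cand = statistics.variance(candidate_times)
    std_err = math.sqrt(var_prev / len(previous_times) + var_cand / len(candidate_times))
    if std_err == 0:
        raise ValueError("no variation in either sample")
    return (mean_cand - mean_prev) / std_err


print(throughput_change(50, 25))   # the query example from the text
print(needs_explanation(50, 25))
print(needs_explanation(50, 100))  # a sudden doubling is flagged as well
```

A large positive value from `welch_t` suggests the candidate release responds more slowly than the previous one; how large is large enough depends on the sample sizes and the risk level the organization accepts.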
#####Report to Engineers

The sixth step in the performance testing process is to report the results to the engineering team responsible for the release. This is generally done in an informal manner and can be done either at one time with all parties present or in smaller teams. The goal of the meeting is to have each item that gets raised as a possible anomaly handled in one of three ways. In the first case, the engineer explains the anomaly. Here, the engineer must make a good enough argument for why the results of the test are different than expected to make the tester as well as the engineering leadership feel comfortable passing this test without investigating further. In the second case, a bug is filed against the engineer so that the engineer investigates the issue further and either fixes it or explains it. In the third case, the engineering team asks for additional tests with the expectation that more data will help narrow down the actual problem.

#####Repeat Tests and Analysis

The last step in the performance process is to repeat the testing and reanalyze the data. This can be either because a fix was provided for a bug that was logged in step six or because there is additional time and the code base is likely still changing due to functional bug fixes.
If time and resources are available, these tests should definitely be repeated to ensure the results have not changed dramatically from one build to another for the candidate release and to continue probing for potential anomalies.

#####Summary of Performance Testing Steps

When conducting performance testing, the following steps are critical to completing it properly. You can add steps as necessary to fit your organization’s needs, but these are the ones you must have to ensure you achieve the results that you expect.

1. Criteria. Establish what criteria are expected from the application, component, device, or system that is being tested.
2. Environment. Make sure your testing environment is as close to production as possible to ensure that your test results are accurate.
3. Define tests. There are many different categories of tests that you should consider for inclusion in the performance test. These include endurance, load, most used, most visible, and component.
4. Execute tests. This step is where the tests are actually executed in the environment established in Step 2.
5. Analyze data. Analyzing the data can take many forms, some as simple as comparing to previous releases, others as involved as stochastic models.
6. Report to engineers. Provide the analysis to the engineers and facilitate a discussion about the relevant points.
7. Repeat tests and analysis. As necessary to validate bug fixes or as time and resources permit, continue testing and analyzing the data.

Follow these seven steps and any others that you need to add for your specific situation and organization. The key to a successful process is making it fit the organization.

Performance testing covers a broad range of testing evaluations, but they share a focus on the necessary characteristics of the system rather than the individual materials, hardware, or code. Staying focused on ensuring the software meets or exceeds the specified requirements or service level agreements is what performance testing is all about. We covered the seven steps of a successful performance testing process and identified that the key to this, as with all processes, is a good fit within the organization. Additional important aspects of performance testing include a methodical approach, from the very beginning of establishing the benchmarks and success criteria to the very end of repeating the tests as often as possible for validation purposes. Because there are always necessary tradeoffs between testing, time, and monetary investments, a methodical, scientific approach is the way to ensure success with performance testing.

####Don’t Stress Over Stress Testing

Stress testing is a process that is used to determine an application’s stability when subjected to above normal loads. As opposed to load testing, where the load is only as much as specified or normal operations require, stress testing goes well beyond these, often to the breaking point of the application, in order to observe the behaviors. There are different methods of stress testing; the two most common are positive testing and negative testing.
Positive testing is where the load is progressively increased to overwhelm the system’s resources. Negative testing takes away resources such as memory, threads, or connections. Besides determining the exact point of demise or, in some instances, the degradation curve of the application, the other purpose is to drive the application beyond its capacity to make sure that when it fails it can recover gracefully. This is testing the application’s recoverability.

As an example, let’s revisit our fictitious AllScale human resources management (HRM) application. The application has a service that provides searching functionality for managers to find employees. This is particularly useful for HR managers who might be responsible for hundreds of employees in terms of HRM. Kevin Qualman, the director of quality assurance, has asked his team to develop a stress test for this service. One method that Kevin’s team has come up with is to subject the service to an increasing number of simultaneous requests. At each progressive step, the team would want to monitor and record response time, returned results, and the behavior of various components such as the buffer pool of the database or the freshness of a caching layer. When the team gets to a point where response time begins to degrade beyond the specifications, it makes note of this and continues to monitor specifically to ensure that the system degrades gracefully. Kevin’s team members do not want the service to topple over and stop serving any requests. If this is the case, there is a problem in the system that should be fixed.
Instead, they want to see the system handle this inability to service incoming requests in some acceptable manner, such as rejecting requests that exceed its capacity or queuing them to be serviced later. At this point in the test, they should begin tapering off the requests back to an acceptable and manageable level for the service. They should expect to see that as the requests are tapered off, the system will clean up the queued or rejected requests and continue processing. This is the recoverability that they expect in the service.

#####Identify Objectives

We have identified eight separate steps in a basic stress test. You may choose to add to these as required by the needs of your organization, but this basic outline will get you started or help refine your process if it exists already. The first step is to identify what you want to achieve with the test. As with all projects, time and resources are limited; therefore, by identifying the goals up front, you can narrow the field of tests that you will perform. This is crucial to avoid wasting a great deal of time or, worse, executing the tests and then finding that the data does not tell you what you need to know about the application or service.

A stress test can serve four categories of goals: establishing a baseline, testing failure and recoverability, negative testing, and testing system interactions. Establishing the baseline behavior of a service is usually the goal when you have never done stress testing and you need to establish the peak utilization possible or the degradation curve. The second category of goals is to test the service’s behavior during failure and then its subsequent recoverability. The service may have been modified or enhanced, and you want to ensure it still behaves properly during periods of extreme stress. These two goals involve positive stress testing because you are putting a continually increasing positive load on the service. The third category of goals that you might have for a stress test is negative testing. In this case, you are interested in determining what happens should you lose cache, have a memory leak, or have any other resource become limited or restricted. The final category of goals that you may have for your stress test is testing the interactivity of the system’s services. Here, you are trying to ensure that some given functionality continues to work when one or more other services are overloaded. Between these four categories of goals, you should be able to define specifically the purpose of your stress test.
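
To make the failure and recoverability goal more concrete, here is a toy simulation of the behavior a positive stress test hopes to observe: load ramps up past capacity, the service queues what it can and rejects the rest, and the backlog drains once the load tapers off. This is not a load generator and does not model any real service; `simulate_stress`, the per-step capacity, and the queue limit are assumptions chosen purely for illustration.

```python
def simulate_stress(load_steps, capacity_per_step, queue_limit):
    """Simulate a service that queues overflow up to queue_limit and rejects the rest.

    load_steps: requests arriving in each step of the test (ramp up, then taper off).
    Returns one record per step with the requests served, queued, and rejected.
    """
    queued = 0
    history = []
    for arriving in load_steps:
        waiting = queued + arriving
        served = min(waiting, capacity_per_step)
        backlog = waiting - served
        # Anything that does not fit in the queue is rejected rather than crashing the service.
        rejected = max(0, backlog - queue_limit)
        queued = backlog - rejected
        history.append(
            {"arriving": arriving, "served": served, "queued": queued, "rejected": rejected}
        )
    return history


# Assumed ramp: increase past a capacity of 100 per step, then taper back down.
for step in simulate_stress([50, 100, 150, 200, 100, 50, 50], capacity_per_step=100, queue_limit=80):
    print(step)
```

In a real test the same shape of data (served, queued, and rejected counts per step) would come from the monitors on the service rather than from a model, and the signal of recoverability is the queue returning to zero after the taper.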
#####Identify Key Services

After you have identified the goal or objective of your stress test, the second step is to identify the services that you will be testing. Again, we have limited time and resources and must choose which services are to be tested to ensure we achieve the goals established in the first step. Some factors that you should consider are criticality to the overall system, likelihood of affecting performance, and identification through load testing as a bottleneck. Let’s talk about each one individually. The first factor to use in determining which services should be selected for stress testing is the criticality of each service to the overall system performance. If there is a central service such as a data abstraction layer (DAL) or user authorization, it should be included as a candidate for stress testing because the stability of the entire application depends on this service. If you have architected your application into fault tolerant “swim lanes,” which will be discussed in Chapter 21, Creating Fault Isolative Architectural Structures, you still likely have core services that have been replicated across the lanes. The second consideration for determining services to stress test is the likelihood that a service affects performance. This decision will be influenced by knowledgeable engineers but should also be somewhat scientific. You can rank services by their usage of things such as synchronous calls, I/O, caching, locking, and so on. The more of these higher risk processes that are included in the service, the more likely it is to have an effect on performance.
The third factor for selecting services to be stress tested is whether a service was identified during load testing as a bottleneck. Hopefully, if a service has been identified as a bottleneck, this constraint will have already been fixed, but you should recheck such services during stress testing. These three factors should provide you with strong guidelines for selecting the services on which you should focus your time and resources to ensure you get the most out of your stress testing.

#####Determine Load

The third step in stress testing is to determine how much load is actually necessary. Determining the load is important for a variety of reasons. First, it is helpful to know approximately at what load the application will start exhibiting strange behaviors so that you don’t waste time on much lower loads. Second, you need to understand whether you have enough capacity on your test systems to generate the required load. The load that you decide to place upon a particular service should stress it sufficiently beyond the breaking point to enable you to observe the behavior and consequences of the stress. One way to accomplish this is to identify the load under which the service begins to exhibit poor behavior, and incrementally increase the load beyond this point. The important thing is to be methodical, record as much data as possible, and create a significant failure of the service.
Stress can be placed upon the service in a variety of ways, such as increasing the number of requests, shortening any delays, or reducing the hardware capacity. An important factor to remember is that loads, whether identified in production or in load testing, should always be scaled to the appropriate level based on the differences in hardware between the environments.

#####Environment

As with performance testing, establishing the appropriate environment is critical. This is the fourth step in stress testing. The environment must be stable, consistent, and as close to production as possible. This last item might be hard to accomplish unless you have an unlimited budget. If you are one of the less fortunate technology managers, constrained by a budget like the rest of us, you will have to scale this down. Large pools of servers in production can be scaled down to small pools of two or three servers, but the fact that there are multiple servers load balanced behind the same rules is important. The class of servers should be the same if at all possible; otherwise, a scale factor must be introduced. A production environment with 7.2K rpm SATA disks and a test environment with 5.4K rpm SATA disks may cause the application to have different performance characteristics and different load capacities. It is important to spend some time deciding on a stress testing environment, just as you did for your performance testing environment. Understand the tradeoffs that you are making with each difference between your production and testing environments. Balance the risks and rewards to make the best decisions about what the environment should look like and how useful the tests will be.
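The scale factor and the incremental ramp described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Environment` fields, the pool sizes, the 0.75 per-server factor for slower disks, and the 8,000 requests/second figure are all hypothetical, and real systems rarely scale as linearly as this arithmetic implies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    """Rough description of a server pool (all fields are illustrative)."""
    servers: int
    relative_capacity: float  # per-server capacity vs. a production server (1.0 = same class)


def scale_load(production_load: float, production: Environment, test: Environment) -> float:
    """Scale a load observed in production to an equivalent load for the test pool.

    Assumes capacity grows linearly with server count and per-server capacity,
    which is a simplification that should be checked against measurements.
    """
    production_capacity = production.servers * production.relative_capacity
    test_capacity = test.servers * test.relative_capacity
    return production_load * test_capacity / production_capacity


def ramp_steps(start_load: float, step_fraction: float, steps: int) -> list[float]:
    """Loads that start where the service begins to misbehave and climb past it."""
    return [round(start_load * (1 + step_fraction * i), 2) for i in range(steps + 1)]


# Hypothetical numbers: 40 production servers, 3 test servers with slower disks.
production = Environment(servers=40, relative_capacity=1.0)
test = Environment(servers=3, relative_capacity=0.75)

# Production starts degrading around 8,000 requests/second (assumed figure).
degradation_start = scale_load(8000.0, production, test)
plan = ramp_steps(degradation_start, step_fraction=0.25, steps=4)

print(degradation_start)  # 450.0
print(plan)               # [450.0, 562.5, 675.0, 787.5, 900.0]
```

Keeping the scale factor explicit in one place makes it easier to revisit when the test hardware changes, and the stepped plan gives one recorded data point per load level.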
#####Identify Monitors

The fifth step in the stress testing process is to identify what needs to be monitored or what data needs to be collected. Identifying what to monitor and capture is just as important as properly choosing the service, load, and tests. You certainly do not want to go to the trouble of performing the tests only to find that you did not capture the data you needed to perform a proper analysis. Potential data points to consider include the results or behavior of the service, response time, CPU load, memory usage, disk usage, thread deadlocks, SQL count, failed transactions, and so on. The results of the service matter because the application may return erroneous results under load. Comparing the expected and actual results is a very good measure of the behavior of the service under load.

#####Create Load

The next step in the process is to create the simulated load. This sixth step is important because it often takes more work than running the actual tests. Creating sufficient load to stress the service may be very difficult if your services have been well architected to handle especially high loads. The best loads are those that are replicated from real user traffic. Sometimes, it is possible to gather this from application or load balancer logs.
If this is possible and these logs are the source of your load data, you will likely need to coordinate other parts of the system, such as the database, to coincide with the load data. For example, if you are testing a signup service and plan on replaying actual user registrations from your production logs, you will need not only to extract the registration requests from your logs but also to set the data in the test database to a point before the user registrations began. The reason is that if the user is already registered in the database, a different code path will be executed than for a normal user registration. This difference will significantly skew your results and make the test inaccurate. If you cannot get real user traffic to simulate your load, you can fall back on writing scripts that simulate a series of steps exercising the service as closely to normal user traffic as possible.

#####Execute Tests

After you have finalized your test objectives, identified the key services to be tested, determined the load necessary, set up your environment, identified what needs to be monitored, and created the simulated load that will be used, you are ready for the seventh step, which is to actually execute the tests. In this step, you will methodically progress through your identified services, performing the stress tests under the loads determined and recording the data that you identified as important for a proper analysis. As with performance testing, you should keep data from release to release.
Comparison between releases is a great way to quickly understand the changes that have taken place from one release to another.

#####Analyze Data

The last step in stress testing is to perform the analysis on the data gathered during the tests. The analysis of the stress test data is similar to that done for the performance tests in that a variety of methods can be used depending on factors such as the amount of time allocated, the skills of the analyst, the acceptable amount of risk, and the level of detail expected. The other significant determinant of how the data should be analyzed is the objective determined in Step 1. If the objective is to establish a baseline, little analysis needs to be done, perhaps just enough to validate that the data accurately depicts the baseline, that it is statistically significant, and that it has only common cause variation. If the objective is to identify the failure behavior, the analysis should focus on comparing results from when the load was below the breaking point with results from when it was above it. This will help identify warning signs of an impending problem as well as any problem or inappropriate behavior of the system at certain loads. If the objective is to test the behavior when a resource is removed completely from the system, the analysis should probably include a comparison of response times and other system metrics between the various resource-constrained scenarios and post-load to ensure that the system has recovered as expected. For the interactivity objective, the data from many different services may have to be looked at together. This type of examination may include multivariate analysis such as principal component or factor analysis.
The objective identified in the very first step will be the guidepost for the analysis. A successful analysis will meet the objectives set forth for the tests. If a gap in the data or a missing test scenario prevents you from completing the analysis, you should reexamine your steps and ensure you have accurately followed the eight-step process outlined earlier.

#####Summary of Stress Testing Steps

When performing stress testing, the following steps are critical to completing it properly. As with performance testing, you can add steps as necessary to fit your organization’s needs.

1. Identify objectives. Identify why you are performing the test. These goals usually fall into one of four categories: establish a baseline, identify behavior during failure and recovery, identify behavior during loss of resources, and determine how the failure of one service will affect the entire system.
2. Identify key services. Time and resources are limited, so you must select only the most important services to test.
3. Determine load. Calculate or estimate the amount of load that will be required to stress the application to the breaking point.
4. Environment. The environment should mimic production as much as possible to ensure the validity of the tests.
5. Identify monitors. You don’t want to execute tests and then realize you are missing data. Plan ahead by using the objectives identified in Step 1 as criteria for what must be monitored.
6. Create load. Create the actual load data, preferably from user data.
7. Execute tests. This step is where the tests are actually executed in the environment established previously.
8. Analyze data. The last step is to analyze the data.

Follow these eight steps and any others that you need to add for your specific situation and organization. Ensure the process fits the needs of the organization.

We need to take a break in our description and praise of the stress testing process to discuss the downside. Although we encourage the use of stress testing, it is admittedly one of the hardest types of testing to perform properly, and if you don’t perform it properly, the effort is almost always wasted. As we discussed in Step 4 about setting up the proper environment, if you switch classes of storage or processor speeds, these can completely throw off the validity of the test results. Unfortunately, the environment is the relatively easy part to get right, especially when compared with the sixth step, creating the load. That step is by far the hardest and the most likely place for you or your team to mess up the process and produce erroneous or inaccurate results. It is very, very difficult to accurately capture and replay real user behavior. As we discussed, this often necessitates the synchronization of data within caches and stores, such as databases or files, because inconsistencies will exercise different code paths and render inaccurate results.
Additionally, creating a very large load can itself be problematic from a capacity standpoint, especially when trying to test the interactivity of multiple services. For reasons such as these, we caution against using stress testing as your only safety net. As we will discuss in the next chapter on go/no-go decisions and rollback, you must have multiple relief valves in the event that problems arise or disaster strikes. We will also cover this subject more in Part III, Architecting Scalable Solutions, with the discussion of how to use swim lanes and other application splitting methods to improve scalability and stability.

As we stated in the beginning, the purpose of stress testing is to determine an application’s stability when subjected to above normal loads. In contrast to load testing, where the load is only as much as specified, in stress testing we go well beyond this to the breaking point and watch the failure and the recovery of the service or application. To more thoroughly understand the stress testing process, we covered an eight-step process starting with defining objectives and ending with analyzing the data. Each step in the process is critical to ensuring a successful test that yields the results you desire. As with our other processes, we recommend starting with this one intact and adding to it as necessary for your organization’s needs.
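As a rough illustration of watching both the failure and the recovery, the sketch below scans recorded samples from a stress run for the first interval that violates a pair of thresholds and then checks whether the final interval is healthy again. The `Sample` fields, the threshold values, and the sample data are hypothetical, and the samples are assumed to be in time order with the load ramped up and then dropped back.

```python
from typing import NamedTuple, Optional


class Sample(NamedTuple):
    """One measurement interval from a stress run (illustrative fields)."""
    load: int          # offered requests/second during the interval
    p95_ms: float      # 95th percentile response time in milliseconds
    error_rate: float  # fraction of failed requests, 0.0 to 1.0


def is_healthy(sample: Sample, max_p95_ms: float, max_error_rate: float) -> bool:
    """A sample is healthy when it stays within both thresholds."""
    return sample.p95_ms <= max_p95_ms and sample.error_rate <= max_error_rate


def breaking_point(samples: list[Sample], max_p95_ms: float,
                   max_error_rate: float) -> Optional[int]:
    """Load of the first sample, in time order, that violates a threshold."""
    for sample in samples:
        if not is_healthy(sample, max_p95_ms, max_error_rate):
            return sample.load
    return None


def recovered(samples: list[Sample], max_p95_ms: float, max_error_rate: float) -> bool:
    """True if the service failed at some point and the last sample is healthy."""
    failed = any(not is_healthy(s, max_p95_ms, max_error_rate) for s in samples)
    return failed and is_healthy(samples[-1], max_p95_ms, max_error_rate)


# Hypothetical run: ramp from 200 to 1,000 requests/second, then back to 200.
run = [
    Sample(200, 120.0, 0.00),
    Sample(400, 150.0, 0.00),
    Sample(600, 240.0, 0.01),
    Sample(800, 900.0, 0.08),
    Sample(1000, 4000.0, 0.35),
    Sample(200, 130.0, 0.00),
]

print(breaking_point(run, max_p95_ms=500.0, max_error_rate=0.02))  # 800
print(recovered(run, max_p95_ms=500.0, max_error_rate=0.02))       # True
```

Keeping the thresholds as parameters lets the same recorded data be re-examined against a different objective without rerunning the test.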
####Performance and Stress Testing for Scalability

We usually lead off our chapters with the rhetorical question of how a particular process could possibly have anything to do with scalability. This time, we waited until we had covered the processes in depth to have this discussion; hopefully, as a result, you can already start listing the reasons that performance testing and stress testing have a prominent place among the multitude of factors that affect scalability. The three areas that we will focus on in exploring the relationship are headroom, change control, and risk management.

As we discussed in Chapter 11, Determining Headroom for Applications, it is critical to scalability that you know where you are in terms of capacity for a particular service within your system. This lets you calculate how much time and growth you have left before you must scale. This is fundamental for planning headroom or infrastructure projects, splitting databases or applications, and making budgets. The way to ensure your calculations remain accurate is to conduct performance testing on all your releases to ensure you are not introducing unexpected load increases. It is not uncommon for an organization to set a maximum load increase allowed per release. As you become more sophisticated in capacity planning, you will come to see the load added by new features and functionality as a cost that must be accounted for in the cost/benefit analysis. Additionally, stress testing is necessary to ensure that the expected breakpoint or degradation curve is still at the same point as previously identified.
It is possible to leave the normal usage load unchanged but decrease the total load capacity through new code paths or changes in logic. For instance, an increase of 90 milliseconds in a data structure lookup would likely go unnoticed in the total response time for a user’s request, but if this service is tied synchronously to other services, then as the load builds, hundreds or thousands of 90-millisecond delays add up to decrease the peak capacity that the services can handle.

When we talk about change management, as defined in Chapter 10, Controlling Change in Production Environments, we are discussing not the lightweight change identification process for small startup companies but the fuller-featured process by which a company attempts to actively manage the changes that occur in its production environment. We defined change management as consisting of the following components: change proposal, change approval, change scheduling, change implementation and logging, change validation, and change efficacy review. Performance testing and stress testing augment this change management process by providing a practice implementation and, most importantly, a validation of the change. You would never expect to make a change without verifying that it actually affected the system the way you think it should, such as fixing a bug or providing a new piece of functionality. As part of performance and stress testing, we validate the expected results in a controlled environment prior to production.
This is an additional step in ensuring that when the change is made in production, it will work under varying loads as it did during testing.

The most significant factor that we should consider when relating performance testing and stress testing to scalability is the management of risk. As outlined in Chapter 16, Determining Risk, risk management is one of the most important processes when it comes to ensuring your systems will scale. The precursor to risk management is risk analysis, which attempts to calculate the amount of risk in various actions or components. Performance testing and stress testing are two methods that can significantly decrease the risk associated with a particular service change. For example, if we were using a failure mode and effects analysis tool and identified a failure mode of a particular feature to be an increase in query time, the recommended mitigation could be to test this feature under actual load conditions, as with a performance test, to determine the actual behavior. This could also be done under extreme load conditions, as with a stress test, to observe behavior above normal conditions. Both of these would provide much more information about the actual performance of the feature and would therefore lower the amount of risk. These two testing processes are powerful tools when it comes to reducing, and thus managing, the amount of risk within the release or the overall system.
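To make the failure mode example concrete, here is a minimal sketch of how a test result might change a failure mode’s score. It assumes a common FMEA convention in which a score is the product of likelihood, severity, and detectability ratings; the 1-to-9 scale, the field names, and every number below are hypothetical rather than taken from this chapter.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FailureMode:
    """One row of a failure mode and effects analysis (illustrative fields)."""
    description: str
    likelihood: int     # 1 (rare) to 9 (almost certain), assumed scale
    severity: int       # 1 (minor) to 9 (outage), assumed scale
    detectability: int  # 1 (caught before release) to 9 (found only in production)

    def risk_score(self) -> int:
        """Product of the three ratings; a higher score means more risk."""
        return self.likelihood * self.severity * self.detectability


# Before testing: the query-time increase is plausible and hard to spot.
before = FailureMode(
    description="query time increases under load",
    likelihood=3,
    severity=9,
    detectability=9,
)

# After a performance test at expected load (hypothetical outcome): the
# measured behavior supports a lower likelihood, and the monitors added
# for the test make the problem easier to detect.
after = replace(before, likelihood=1, detectability=3)

print(before.risk_score())  # 243
print(after.risk_score())   # 27
```

The severity rating is left unchanged on purpose: testing tells you more about whether and when the failure happens, not how much it would hurt if it did.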
From these three areas (headroom, change control, and risk management), we can see the inherent relationship between the successful scalability of your system and the adoption of the performance and stress testing processes. As we cautioned previously in the discussion of the stress test, the creation of the test load is not easy and, if done poorly, can lead to erroneous data. However, this does not mean that it is not worth pursuing the understanding, implementation, and (ultimately) mastery of these processes.

####Conclusion

In this chapter, we discussed in detail the performance testing and stress testing processes. We also discussed how these processes relate to the scalability of the system.

For the performance testing process, we defined a seven-step process. The key to the process is to be methodical and scientific about the testing.

For the stress testing process, we defined an eight-step process. These were the basic steps we felt were necessary for a successful process. We suggested that other steps be added as necessary for the proper fit within your organization.

We concluded this chapter with a discussion on how performance testing and stress testing fit with scalability.
We concluded that because these testing processes are related to three factors (headroom, change control, and risk management) that have already been established as causal to scalability, these processes too are directly responsible for scalability.

#####Key Points

* Performance testing covers a broad range of engineering evaluations where the emphasis is on the final measurable performance characteristic.
* The goal of performance testing is to identify, document, and where possible eliminate bottlenecks in the system.
* Load testing is a process used in performance testing.
* Load testing is the process of putting load or user demand on a system in order to measure its response and stability.
* The purpose of load testing is to verify that the application can meet a desired performance objective often specified as a service level agreement (SLA).
* Load and performance testing are not substitutes for proper architecture.
* The seven steps of performance testing are as follows:
   1. Establish the criteria expected from the application.
   2. Establish the proper testing environment.
   3. Define the right test to perform.
   4. Execute the tests.
   5. Analyze the data.
   6. Report to the engineers.
   7. Repeat as necessary.
* Stress testing is a process that is used to determine an application’s stability when subjected to above normal loads.
* Stress testing, as opposed to load testing, goes well beyond the normal traffic, often to the breaking point of the application, in order to observe the behaviors.
* The eight steps of stress testing are as follows:
   1. Identify the objectives of the test.
   2. Choose the key services for testing.
   3. Determine how much load is required.
   4. Establish the proper test environment.
   5. Identify what must be monitored.
   6. Actually create the test load.
   7. Execute the tests.
   8. Analyze the data.
* Performance testing and stress testing impact scalability through the areas of headroom, change control, and risk management.