RPC and Performance are the two topics we will cover today.
What were the three fundamental abstractions?
Memory abstraction, interpreter abstraction, and communication link abstraction.
What were their APIs?
Memory -> READ(name), WRITE(name, value)
Interpreters -> loop(print(eval(read)))
Communication links -> SEND(linkName, outgoingMessageBuffer), RECEIVE(linkName, incomingMessageBuffer)
Must these abstractions be implemented in a single node or can they be distributed? Give an example.
They can be distributed. For memory, a distributed key-value store is an example.
For interpreters, a layered program whose layers run on different nodes is an example; this is exactly what RPCs make possible!
Why do we need layers and modules?
Interpreters are often organized in layers.
Modules: Components that can be separately designed/implemented/replaced. “Instructions” of higher-level interpreters. Recursive: can be whole interpreters themselves!
Example:
If the user is tweeting away, the whole server plus the front end acts as an interpreter: the user's input becomes a set of commands for the system to execute.
Of course, this is not a single program. When you interact with Twitter, there are asset services, layout services, and many other services, each with its own set of commands, its own little language that it understands. In this sense, Twitter is the entirety of these multiple interpreters structured together in a hierarchical way.
Continued example: a website such as Facebook has many components that implement very different algorithms and somehow need to interact with each other. The main thing you want to avoid when building such a system is that the failure of one component, say a single service, brings down the entire system. We want to isolate failures and make the entire system fault-tolerant. So, how can we isolate errors?
The basic idea of isolating errors is to use clients and services.
One way to separate concerns is to restrict communication to messages only: the client requests, the service responds (or replies), and conceptually the client and service run on different computers.
The other way to separate concerns is to further sandbox these applications into different parts of the operating system, so that they cannot interact through shared memory even if they run on the same machine. This is called virtualization: it confines the components to sandboxes where they cannot harm anything outside the sandbox.
Both techniques are commonly used. In this course, we will talk more about Remote Procedure Call (RPC).
Web services are a good example of where RPC applies.
RPC is a way for clients and services to communicate with each other, perhaps slightly breaking the abstraction that communication works only by messages. RPC pretends that the two applications run on the same computer even though in reality they do not: it tries to hide from the developers of these applications the fact that messages are being sent over the network.
Here is a good explanation of RPC in Chinese: https://www.jianshu.com/p/7d6853140e13. See also the Wikipedia definition of marshalling.
In the context of RPC, a stub is a piece of code that acts as a client for the remote service. It is responsible for marshaling (converting) the parameters of the call into a format that can be transmitted over the network to the remote service, and for unmarshaling (converting) the results of the call back into a form that can be understood by the calling program.
Stubs are used in RPC to allow client programs to call remote functions as if they were local, without having to worry about the details of network communications or the implementation of the remote service. This makes it easier to write distributed applications that can make use of services provided by other programs on different machines.
My understanding: the aim of RPC is to call a service on a remote server as if it were a local function. The client calls a remote function such as GET_CONNECTIONS(uid), mentioned above. The call goes to the local (client) stub, which packages (marshals) the parameters of the function call into a message and sends it over the network to the server. The server receives the message, the server stub unmarshals the request and executes the requested function, then marshals the reply and returns the results to the client in the form of another message. The client's stub receives this message, extracts the results, and passes them back to the calling program.
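A minimal sketch of this flow in Python, with a plain function call standing in for the network transport and JSON as an illustrative wire format (GET_CONNECTIONS and its data are hypothetical):

```python
import json

def service_get_connections(uid):
    # The "remote" implementation (hypothetical data).
    return {"alice": ["bob", "carol"]}.get(uid, [])

def server_stub(message):
    # Unmarshal the request, execute it, marshal the reply.
    request = json.loads(message)
    result = service_get_connections(*request["args"])
    return json.dumps({"result": result})

def client_stub_get_connections(uid):
    # Marshal the call into a message, "send" it, unmarshal the reply.
    message = json.dumps({"proc": "GET_CONNECTIONS", "args": [uid]})
    reply = server_stub(message)    # in reality: SEND/RECEIVE over the network
    return json.loads(reply)["result"]

print(client_stub_get_connections("alice"))   # ['bob', 'carol']
```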
Even though we can use a timeout to handle the case where a service does not respond, a timeout by itself only tells us that something went wrong while performing the service; it does not tell us whether the operation actually executed. Hence, we need other approaches to handle no-response cases.
At-least-once (because the function is pure, we need not worry about side effects; the stub keeps resending the request until it receives one or more responses.)
Operation is idempotent. Naturally occurs if side-effect free
Stub just retries the operation -> failures can still occur!
Example: calculate SQRT
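A sketch of an at-least-once stub under these assumptions (send_request is a hypothetical transport that may time out; SQRT is side-effect free, so resending is safe):

```python
import math, random

def send_request(x):
    # Stand-in for the network: the reply is "lost" half of the time.
    if random.random() < 0.5:
        raise TimeoutError("no response")
    return math.sqrt(x)

def at_least_once_sqrt(x, max_tries=10):
    for _ in range(max_tries):
        try:
            return send_request(x)   # safe to resend: SQRT has no side effects
        except TimeoutError:
            continue                 # just retry
    raise RuntimeError("service unreachable")  # failures can still occur!

print(at_least_once_sqrt(2.0))
```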
At-most-once (if the service does not respond before the timeout, the stub reports an error to the client; the operation, e.g., a money transfer, may or may not have happened, but the stub does not resend the request.)
Operation does have side-effects
Stub must ensure duplicate-free transmission
Example: transfer $100 from my account to yours
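A sketch of what the server side of at-most-once might look like, assuming the client stub tags every request with a unique id so the server can suppress duplicates (all names here are illustrative):

```python
import uuid

executed = {}                           # request id -> saved reply
balance = {"mine": 500, "yours": 0}

def server_transfer(request_id, src, dst, amount):
    if request_id in executed:          # duplicate: replay the saved reply
        return executed[request_id]
    balance[src] -= amount              # the side effect happens at most once
    balance[dst] += amount
    executed[request_id] = "ok"
    return "ok"

rid = str(uuid.uuid4())                 # client stub generates the id once
server_transfer(rid, "mine", "yours", 100)
server_transfer(rid, "mine", "yours", 100)  # retransmission: no second transfer
print(balance)                          # {'mine': 400, 'yours': 100}
```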
Exactly-once (the ideal case; impossible to implement in general.)
Possible for certain classes of failures
Stub & service keep track (durably) of requests and responses
Example: bank cannot develop amnesia!
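A sketch of the durable bookkeeping exactly-once would need: the server records each (request id, reply) pair on stable storage before answering, so it does not re-execute after a restart. The JSON file here is a simplification of a real durable log:

```python
import json, os

LOG = "rpc_log.json"

def load_log():
    if not os.path.exists(LOG):
        return {}
    with open(LOG) as f:
        return json.load(f)

def durable_execute(request_id, operation):
    log = load_log()                    # survives restarts: no amnesia
    if request_id in log:
        return log[request_id]          # already executed; replay saved reply
    reply = operation()
    log[request_id] = reply
    with open(LOG, "w") as f:           # record durably before replying
        json.dump(log, f)
    return reply

print(durable_execute("req-1", lambda: "transferred $100"))
print(durable_execute("req-1", lambda: "transferred $100"))  # replayed, not re-run
```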
Typically, we need to take care of naming when using RPCs: we want to know where the service sits, and there is usually a standard integration with a directory (name) service.
The server first registers the service under a name with the directory service. The client then looks up where the service sits; the directory service returns an address to the client, which sends the request to that address and gets the response.
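A sketch of this register/lookup dance, with an in-memory dict standing in for a real directory service such as DNS:

```python
directory = {}

def register(service_name, address):
    directory.setdefault(service_name, []).append(address)

def lookup(service_name):
    addresses = directory.get(service_name, [])
    if not addresses:
        raise LookupError(service_name)
    return addresses[0]     # a real directory might load-balance or fail over

register("GET_CONNECTIONS", "10.0.0.7:9090")   # server registers itself
address = lookup("GET_CONNECTIONS")            # client finds where it sits
print("send request to", address)
```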
Advantages
The directory used for development can be independent of the directory used for deployment.
More flexibility: if one of the services fails, the client can go back to the directory to find an alternative.
The directory can be extended; for example, a DNS name can host more than one record.
Scenario: Services are widely exposed on the web, accessible via HTTP.
Why is this a good idea?
Discuss how the following features of HTTP affect service interfaces, if at all:
Proxies: a good design can confine messages to a specific region, e.g., for country-level security requirements.
Persistent connections: repeated authentication is not needed, so the service can focus on its actual work.
Caching: similar to memory (RAM) caching, the cache stores the most popular requests and their answers, so it can reply to clients faster.
Consistency
How to deal with updates from multiple clients?
Coherence
How to refresh caches while respecting consistency?
Scalability
What happens to resource usage if we increase the #clients or the #operations?
Fault Tolerance
Under what circumstances will the service be unavailable?
Let I1 and I2 be two implementations of an abstraction
Examples
Web service with or without proxies
Virtual memory with or without paging
Transactions via concurrency or serialization
So, how do we choose between I1 and I2?
What do these metrics mean?
Capacity is a consistent measure of a service’s size or amount of resources.
Utilization is the percentage of capacity of a resource that is used for some given workload of requests.
Overhead is any combination of excess or indirect computation time, memory, bandwidth, or other resources that are required to perform a specific task.
Latency is the delay between a change at the input to a system and the corresponding change at its output. From the client/service perspective, the latency of a request is the time from issuing the request until the time the response is received from the service.
Throughput is a measure of the rate of useful work done by a service for some given workload of requests. In the camera example, the throughput we might care about is how many frames per second the system can process because it may determine what quality camera we want to buy.
How many operations can the service process in a given unit of time?
Example of Throughput
Consider a computer system with two stages: one that is able to process data at a rate of 1,000 kilobytes per second and a second one at a rate of 100 kilobytes per second. If the fast stage generates one byte of output for each byte of input, the overall throughput must be less than or equal to 100 kilobytes per second. If there is negligible overhead in passing requests between the two stages, then the throughput of the system is equal to the throughput of the bottleneck stage, 100 kilobytes per second. In this case, the utilization of stage 1 is 10% and that of stage 2 is 100%.
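The same arithmetic in a few lines of Python (the rates are the ones from the example):

```python
stage_rates = [1000, 100]              # kilobytes per second for each stage

throughput = min(stage_rates)          # the bottleneck stage limits the pipeline
utilization = [throughput / r for r in stage_rates]

print(throughput)                      # 100 kB/s
print(utilization)                     # [0.1, 1.0] -> 10% and 100%
```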
When a stage processes requests serially, the throughput and the latency of a stage are directly related. The average number of requests a stage handles per unit time is inversely proportional to the average time to process a single request:

\[throughput = \frac{1}{latency}\]

However, this relation does not hold when there is concurrency; it applies only to serial, single-threaded processing.
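A commonly used generalization (Little's law, stated here as a standard result rather than something from these notes): with n requests in flight on average,

\[throughput = \frac{n}{latency}\]

which reduces to the serial formula when n = 1.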
Scalability is the measure of a system’s ability to increase or decrease in performance and cost in response to changes in application and system processing demands. Examples would include how well a hardware system performs when the number of users is increased, how well a database withstands growing numbers of queries, or how well an operating system performs on different classes of hardware. Enterprises that are growing rapidly should pay special attention to scalability when evaluating hardware and software.
Properties of resources and/or services
Utilization, Capacity
How many requests or how many ports can be accepted or held? => Capacity
Overhead, throughput, latency
Scalability
Considering scalability alone is dangerous.
Here is an example where choosing B might be the best option.
CPU clock speed no longer improves much.
Memory size, memory bandwidth, disk size, and disk bandwidth still improve a lot (around 50%), while memory latency (around 1%) and disk latency (around 10%) improve far more slowly.
This is very close to a real computer; hence we need to think about how to make the most use of hardware features such as caches.
The ratios between the different levels of the storage (memory) hierarchy have not changed much: accessing RAM is extremely fast, while accessing the disk takes quite a long time.
What does that do to random access?
Actually, what we call Random Access Memory behaves more like Not-Quite-So-Random Access Memory because of the memory hierarchy: access to a nearby cell is much faster than access to a faraway cell.
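A small experiment that makes this visible (timings vary by machine; in CPython the effect mixes cache behavior with pointer chasing, so treat it as illustrative):

```python
import random, time

N = 5_000_000
data = list(range(N))
order = list(range(N))
random.shuffle(order)              # a "faraway cell" on every access

t0 = time.perf_counter()
s = 0
for i in range(N):
    s += data[i]                   # sequential: nearby cells, cache-friendly
t1 = time.perf_counter()
for i in order:
    s += data[i]                   # random: defeats the caches
t2 = time.perf_counter()

print(f"sequential: {t1 - t0:.2f}s   random: {t2 - t1:.2f}s")
```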
In the client/service architecture, fast-path coding is one technique. A designer may observe that certain requests are more common than others, and use that observation to improve the performance of the frequent operations by splitting the staged pipeline into a fast path for frequent requests and a slow path for all other requests.
Fast-path coding splits processing into two code paths.
One optimized path for common requests is a fast path.
One slow but comprehensive path for all other requests is a slow path.
Caching is an example of fast-path coding.
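A minimal fast-path sketch with caching as the fast path; slow_lookup is a hypothetical stand-in for an expensive computation or I/O:

```python
cache = {}

def slow_lookup(key):
    return key.upper()            # pretend this is expensive

def handle(key):
    if key in cache:              # fast path: frequent requests
        return cache[key]
    value = slow_lookup(key)      # slow path: everything else
    cache[key] = value
    return value

handle("alice")
handle("alice")                   # second call takes the fast path
```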
Batching runs multiple requests at once: collect requests and perform the shared work only once. Batched I/O with the elevator algorithm is a good example. Batching may improve both latency and throughput.
Batch processing is the method computers use to periodically complete high-volume, repetitive data jobs. Certain data-processing tasks, such as backups, filtering, and sorting, can be compute-intensive and inefficient to run on individual data transactions. Instead, data systems often process such tasks in batches during off-peak times when computing resources are more commonly available, such as at the end of the day or overnight. Consider, for example, an e-commerce system that receives orders throughout the day. Instead of processing every order as it occurs, the system might collect all orders at the end of each day and share them in one batch with the order-fulfillment team.
Reference: https://aws.amazon.com/cn/what-is/batch-processing/
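A minimal sketch of the idea, in the spirit of the order example above (process_batch is a hypothetical stand-in for, say, one batched disk I/O):

```python
pending = []

def submit(order):
    pending.append(order)                  # just collect; don't process yet

def process_batch():
    print(f"processing {len(pending)} orders in one pass")
    pending.clear()

for o in ["order-1", "order-2", "order-3"]:
    submit(o)
process_batch()                            # e.g., at end of day / off-peak
```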
Dallying is delaying a request on the chance that the operation won't be needed, or to create more opportunities for batching. For example, a stage may delay a request that overwrites a disk block in the hope that a second one will come along for the same block. If a second one comes along, the stage can delete the first request and perform just the second one. As applied to writes, this benefit is sometimes called write absorption.
Dallying also increases the opportunities for batching. It purposely increases the latency of some requests in the hope that more requests will come along that can be combined with the delayed requests to form a batch. In this case, dallying increases the latency of some requests to improve the average latency of all requests.
A key design question in dallying is to decide how long to wait. There is no generic answer to this question. The costs and benefits of dallying are application and system specific.
Typical example: group commit.
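A sketch of dallying with write absorption, using a dict as a simplified write buffer: a later write to the same block absorbs the earlier one, so only one disk write is performed:

```python
pending_writes = {}                # block number -> latest data

def write(block, data):
    pending_writes[block] = data   # a later write to the same block
                                   # absorbs the earlier one

def flush():
    for block, data in pending_writes.items():
        print(f"disk write block {block}: {data}")
    pending_writes.clear()

write(7, "v1")
write(7, "v2")                     # absorbs the first write
flush()                            # only one disk write for block 7
```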
Exploit the performance of the hardware and the increasing number of cores.
Run multiple requests in different threads
Example: different web requests run in different threads or even servers
May improve both throughput and latency, but we must be careful with locking correctness.
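A minimal concurrency sketch, assuming independent web-style requests served by a thread pool, with a lock protecting the shared counter for correctness:

```python
from concurrent.futures import ThreadPoolExecutor
import threading, time

served = 0
lock = threading.Lock()

def handle_request(i):
    global served
    time.sleep(0.1)                # pretend to do I/O
    with lock:                     # locking keeps the shared counter correct
        served += 1

with ThreadPoolExecutor(max_workers=8) as pool:
    pool.map(handle_request, range(16))
print(served)                      # 16, in roughly 0.2s instead of ~1.6s
```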
Predict the future: guess the next requests and run them in advance. Prefetching is an example. Speculation may overlap expensive operations instead of waiting for their completion.
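A sketch of prefetching under a sequential-access guess: while one block is being processed, the next one is fetched in the background (fetch and the block numbering are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fetch(block):
    time.sleep(0.1)                    # pretend this is an expensive disk read
    return f"data-{block}"

pool = ThreadPoolExecutor(max_workers=1)
prefetched = {}

def read(block):
    future = prefetched.pop(block, None)
    data = future.result() if future else fetch(block)     # hit or miss
    prefetched[block + 1] = pool.submit(fetch, block + 1)  # guess: sequential
    return data

for b in range(3):
    print(read(b))                     # prefetch overlaps the work below
    time.sleep(0.1)                    # pretend to process the block
```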