Subroutines and Control Abstraction

Subroutines are the principal mechanism for control abstraction in most programming languages.

  • A subroutine that returns a value is usually called a function.
  • A subroutine that does not return a value is usually called a procedure.

Review of Stack Layout(栈内存布局)

the stack pointer register contains the address of either the last used location at the top of the stack, or the first unused location

The frame pointer register contains an address within the frame.

Calling Sequences

Maintenance of the subroutine call stack is the responsibility of the calling sequence

Tasks that must be accomplished on the way into a subroutine include:

  • passing parameters
  • saving the return address
  • changing the program counter
  • changing the stack pointer to allocate space
  • saving registers (including the frame pointer) that contain important values and that may be overwritten by the callee
  • changing the frame pointer to refer to the new frame
  • executing initialization code for any objects in the new frame that require it

Tasks that must be accomplished on the way out include:

  • passing return parameters or function values
  • executing finalization code for any local objects that require it
  • deallocating the stack frame (restoring the stack pointer)
  • restoring other saved registers (including the frame pointer)
  • restoring the program counter
Saving and Restoring Registers

函数调用最棘手的部分就是保存和恢复寄存器。

Perhaps the trickiest division-of-labor issue pertains to saving registers.

A simpler solution is for the caller to save all registers that are in use, or for the callee to save all registers that it will overwrite.

调用约定:

strike something of a compromise: registers not reserved for special purposes are divided into two sets of approximately equal size. One set is the caller’s responsibility, the other is the callee’s responsibility. A callee can assume that there is nothing of value in any of the registers in the caller-saves set; a caller can assume that no callee will destroy the contents of any registers in the callee-saves set.

Maintaining the Static Chain

In languages with nested subroutines,at least part of the work required to maintain the static chain must be performed by the caller,rather than the callee,because this work depends on the lexical nesting depth of the caller.

A Typical Calling Sequence

The caller:

  1. saves any caller-saves registers whose values will be needed after the call
  2. computes the values of arguments and moves them into the stack or registers
  3. computes the static link (if this is a language with nested subroutines), and passes it as an extra, hidden argument
  4. uses a special subroutine call instruction to jump to the subroutine, simultaneously passing the return address on the stack or in a register

the callee:

  1. allocates a frame by subtracting an appropriate constant from the sp
  2. saves the old frame pointer into the stack, and assigns it an appropriate new value
  3. saves any callee-saves registers that may be overwritten by the current routine (including the static link and return address, if they were passed in registers)

After the subroutine has completed, the epilogue:

  1. moves the return value (if any) into a register or a reserved location in the stack
  2. restores callee-saves registers if needed
  3. restores the fp and the sp
  4. jumps back to the return address

Finally, the caller:

  1. moves the return value to wherever it is needed
  2. restores caller-saves registers if needed

Displays

One disadvantage of static chains is that access to an object in a scope k levels out requires that the static chain be dereferenced k times.

This number can be reduced to a constant by use of a display.

为了优化这一过程,可以引入一个叫做 display 的数据结构。display 是一个数组,其中的每个元素都是一个指针,指向不同嵌套层级的活动记录(activation record)。当进入一个函数时,编译器会更新 display 来反映当前的调用环境。具体来说,display[i] 会指向第 i 层嵌套的最近活动记录。

使用 display 可以直接通过数组索引快速定位到任何层级的活动记录,从而让访问外层变量的操作更加高效。这种方法减少了通过多个静态链指针进行跳转的需要,因此可以显著提高程序的运行速度,尤其是在函数嵌套层次较深的情况下。

Case Studies: C on the MIPS; Pascal on the x86

Calling sequences differ significantly from machine to machine and even compiler tocompiler

  • Compilers for CISC machines tend to pass arguments on the stack; compilers for RISC machines tend to pass arguments in registers.
  • Compilers for CISC machines usually dedicate a register to the frame pointer; compilers for RISC machines often do not.
  • Compilers for CISC machines often rely on special-purpose instructions to implement parts of the calling sequence; available instructions on a RISC machine are typically much simpler.

Register Windows

As an alternative to saving and restoring registers on subroutine calls and returns, the original Berkeley RISC machines introduced a hardware mechanism known as register windows.

The basic idea is to map the ISA’s limited set of register names onto some subset (window) of a much larger collection of physical registers, and to change the mapping when making subroutine calls.

In-Line Expansion

many language implementations allow certain subroutines to be expanded in-line at the point of call:

A copy of the “called” routine becomes a part of the “caller”; no actual subroutine calloccurs.

In-line expansion avoids a variety of overheads,including:

  • space allocation,
  • branch delays from the call and return,
  • maintaining the static chain or display,
  • and (often) saving and restoring registers.

It also allows the compiler to perform code improvements such as:

  • global register allocation
  • instruction scheduling
  • common subexpression elimination across the boundaries between subroutines

Parameter Passing

Most subroutines are parameterized: they take arguments that control certain aspects of their behavior, or specify the data on which they are to operate.

Parameter names that appear in the declaration of a subroutine are known as formal parameters.

Variables and expressions that are passed to a subroutine in a particular call are known as actual parameters.

Parameter Modes

The two most common parameter-passing modes, called:

  • call-by-value
  • call-by-reference

call-by-value只要在函数返回时把参数的值写回到调用方,就可以实现和call-by-reference类似的效果

Call-by-sharing

不是值传递。因为:

if we modify the object to which the formal parameter refers, the program will be able to see those changes through the actual parameter after the subroutine returns

也不是引用传递,因为:

although the called routine can change the value of the object to which the actual parameter refers, it cannot change the identity of that object.

Call-by-sharing is thus commonly implemented the same as call-by-value for objects of immutable type.

The Purpose of Call-by-Reference
  • 需要修改参数
  • 传递地址比复制参数节约时间

Call-by-Name

Explicit subroutine parameters are not the only language feature that requires a closure to be passed as a parameter.

In general, a language implementation must pass a closure whenever the eventual use of the parameter requires the restoration of a previous referencing environment.

Special-Purpose Parameters

  • Conformant Arrays
  • Default (Optional) Parameters
  • Named Parameters: A(name=’xxx’, age=24)
  • Variable Numbers of Arguments: fun(string…)

Generic Subroutines and Modules

需要泛型的原因:

In a language like Pascal or Fortran, this static declaration of item type means that the programmer must create separate copies of enqueue and dequeue for every type of item, even though the entire text of these copies (other than the type names in the procedure headers) is the same.

Implementation Options

Generics can be implemented several ways.

  • the compiler creates a separate copy of the code for every instance
  • guarantees that all instances of a given generic will share the same code at run time.

Generic Parameter Constraints(泛型约束)

避免使用隐式泛型参数:

To avoid surprises, it is best to avoid implicit use of the operations of a generic parameter type.

Exception Handling

exception handling generally requires the language implementation to “unwind” the subroutine call stack.

try catch语法:

all provide exception-handling facilities in which handlers are lexically bound to blocks of code, and in which the execution of the handler replaces the yet-to-be-completed portion of the block.

In practice, exception handlers tend to perform three kinds of operations:

  • First, ideally, a handler will compensate for the exception in a way that allows the program to recover and continue execution.
  • Second, when an exception occurs in a given block of code but cannot be handled locally, it is often important to declare a local handler that cleans up any resources allocated in the local block, and then “reraises”the exception, so that it will continue to propagate back to a handler that can (hopefully) recover.
  • Third, if recovery is not possible, a handler can at least print a helpful error message before the program terminates.

Defining Exceptions

In many languages, dynamic semantic errors automatically result in exceptions, which the program can then catch. The programmer can also define additional, application-specific exceptions.

Most languages use a throw or raise statement,embedded in an if statement, to raise an exception at run time.

已知和未知异常:

If a subroutine raises an exception but does not catch it internally, it may “return” in an unexpected way.

include in each subroutine header a list of the exceptions that may propagate out of the routine.

Unchecked exceptions are typically run-time errors that most programs will want to be fatal

Exception Propagation(异常传播)

When an exception arises, the handlers are examined in order; control is transferred to the first one that matches the exception.

Implementation of Exceptions

The most obvious implementation for exceptions maintains a linked-list stack of handlers. When control enters a protected block, the handler for that block is added to the head of the list.

Coroutines

vs continuation:

a continuation is a constant—it does not change once created—while a coroutine changes every time it runs.

coroutines are execution contexts that exist concurrently, but that execute one at a time, and that transfer control to each other explicitly, by name. Coroutines can be used to implement iterators and threads.

Events

An event is something to which a running program (a process) needs to respond, but which occurs outside the program, at an unpredictable time.

事件和回调:

Instead, the programmer usually wants a handler—a special subroutine—to be invoked when a given event occurs. Handlers are sometimes known as callback functions,because the run-time system calls back into the main program instead of being called from it.

Summary and Concluding Remarks

这一章主要关注了控制抽象的主题,特别是子程序。子程序允许程序员将代码封装在一个狭窄的接口后面,然后可以不考虑其实现方式进行使用。控制抽象对于任何大型软件系统的设计和维护都至关重要。从审美的角度来看,像Lisp和Smalltalk这样的语言中,内置和用户定义的控制结构使用相同的语法,这使得控制抽象特别有效。

我们在8.1节开始研究子程序,首先回顾了子程序调用堆栈的管理。然后我们考虑了用于维护堆栈的调用序列,PLP CD的额外部分专门讨论了展示;MIPSpro C编译器和GNU x86 Pascal编译器(gpc)的案例研究;以及SPARC的寄存器窗口。在简要考虑内联扩展之后,我们在8.3节转向了参数的主题。我们首先考虑了参数传递模式,所有这些模式都是通过传递值、引用或闭包来实现的。我们注意到,语义清晰和实现速度的目标有时会有冲突:通常通过引用传递大参数最有效,但是由此产生的别名可能会导致程序错误。在8.3.3节,我们考虑了特殊的参数传递机制,包括一致的数组、默认(可选)参数、命名参数和可变长度的参数列表。我们注意到,默认和命名参数提供了一种对动态范围的有吸引力的替代方案。在8.4节,我们考虑了泛型子程序和模块的设计和实现。泛型允许在编译时将控制抽象参数化,以参数的类型而不仅仅是它们的值为基础。

在最后的三个主要部分,我们考虑了异常处理机制,这些机制允许程序以良好的结构方式从嵌套的子程序调用序列中“解开”;协程,它允许程序维护(并在两个或更多执行上下文之间切换);以及事件,它允许程序响应异步外部活动。在PLP CD上,我们解释了协程如何用于离散事件模拟。我们还注意到,它们可以用来实现迭代器,但在这里存在更简单的替代方案。在第12章,我们将基于协程来实现线程,这些线程并行运行(或看起来并行运行)。

在几个情况下,我们可以看出关于语言应该提供哪些类型的控制抽象的观点正在形成共识。像Fortran和Algol 60这样的语言的有限参数传递模式已被更广泛或灵活的选项取代。Ada和C++等语言中,标准的位置记号法已被默认参数和命名参数所增强。较少结构化的错误处理机制,如标签参数、非局部goto和动态绑定处理器,已被结构化的异常处理器取代,这些异常处理器在子程序内部进行词法范围处理,并且在常见(无异常)情况下可以零成本实现。传统的信号处理机制中的自发子程序调用已被专用线程中的回调取代。在许多情况下,实现这些新特性需要编译器和运行时系统变得更复杂。偶尔,如call-by-name参数、标签参数或非局部goto的情况,语义上令人困惑的特性也难以实现,放弃它们使编译器变得更简单。在其他情况下,一些有用但难以实现的语言特性仍然在一些语言中出现,但在其他语言中则不出现。这一类别的例子包括一等子程序、协程、迭代器、续延和具有无限范围的局部对象。

DataTypes

Types serve two principal purposes:

  • Types provide implicit context for many operations, so that the programmer does not have to specify that context explicitly. 让编译器知道是整数还是浮点数加减、自定义类型申请正确的堆空间、执行用户自定义的类型构造器(constructor)
  • Types limit the set of operations that may be performed in a semantically valid program.

Type Systems

硬件层面所有bit没有区别:

Computer hardware can interpret bits in memory in several different ways: as instructions, addresses, characters, and integer and floating-point numbers of various lengths.

高级语言必须使用类型系统来提供某一块bit的类型上下文信息和错误检查(整数加浮点数。。。)

High-level languages, on the other hand, almost always associate types with values, to provide the contextual information and error checking alluded to above.

a type system consists:

  • a mechanism to define types and associate them with certain language constructs
  • a set of rules for type equivalence, type compatibility, and type inference.

Type Checking

Type checking is the process of ensuring that a program obeys the language’s type compatibility rules.

几种分类:

  • 强类型
  • 弱类型
  • 静态类型
  • 动态类型
  • 编译时绑定
  • 运行时绑定

Polymorphism(多态)

Polymorphism allows a single body of code to work with objects of multiple types.

多态、动态类型会造成运行时消耗并且延迟暴露类型错误

explicit parametric polymorphism(泛型)

The Meaning of “Type”

There are at least three ways to think about types, which we may call the denotational, constructive, and abstraction-based points of view.

  • denotational(表示意义):A value has a given type if it belongs to the set
  • constructive(构造):a type is either one of a small collection of built-in types, or a composite type created by applying a type constructor
  • abstraction-based(基于抽象):a type is an interface consisting of a set of operations with well-defined and mutually consistent semantics

Types are domains, and the meaning of an expression is a value from the domain that represents the expression’s type.

One of the nice things about the denotational view of types is that it allows us in many cases to describe user-defined composite types.

Classification of Types

Most languages provide built-in types similar to those supported in hardware by most processors: integers, characters, Booleans, and real (floating-point) numbers.

  • Numeric Types
  • Enumeration Types
  • Subrange Types: 0..100
  • Composite Types
    • Records (structures)
    • Variant records (unions)
    • Arrays
    • Sets
    • Pointers
    • Lists
    • Files

Orthogonality(正交性)

Orthogonality is equally important in type system design. A highly orthogonal language tends to be easier to understand, to use, and to reason about in a formal way.

Type Checking

Type Equivalence

there are two principal ways of defining type equivalence.

  • Structural equivalence is based on the content of type definitions(结构一样就一样)
  • Name equivalence is based on the lexical occurrence of type definitions(结构一样换个名字也不一样)

Type Compatibility

Most languages do not require equivalence of types in every context. Instead, they merely say that a value’s type must be compatible with that of the context in which it appears.

In general, modern compiled languages display a trend toward static typing and away from type coercion.

For systems programming,or to facilitate the writing of general-purpose container (collection) objects (lists, stacks, queues, sets, etc.) that hold references to other objects, several languages provide a universal reference type.

Records (Structures) and Variants (Unions)

Record types allow related data of heterogeneous types to be stored and manipulated together.

Memory Layout and Its Impact

The fields of a record are usually stored in adjacent locations in memory. In its symbol table,the compiler keeps track of the offset of each field within each record type.

Variant Records (Unions)

Many allowed the programmer to specify that certain variables (presumably ones that would never be used at the same time) should be allocated “on top of” one another, sharing the same bytes in memory.

Arrays

Arrays are the most common and important composite data types. They have been a fundamental part of almost every high-level language.

接下来的章节包含下面的内容:

  • Syntax and Operations
  • Dimensions, Bounds, and Allocation
    • Stack Allocation
    • Heap Allocation
  • Memory Layout
    • row-major
    • column-major
    • Row-Pointer Layout
    • Address Calculations

Strings

Sets

Pointers and RecursiveTypes

  • Dangling References
  • Garbage Collection
    • Reference Counts
    • Tracing Collection
    • Mark-and-Sweep
    • Pointer Reversal
    • Stop-and-Copy
    • Generational Collection
    • Conservative Collection

Lists

Files and Input/Output

EqualityTesting and Assignment

Summary and Concluding Remarks

这一节结束了我们关于语言设计五个核心章节中的第三章(第一部分的名称、控制流、类型、子程序和类)。在前两节中,我们讨论了类型系统和类型检查的一般问题。在剩余的章节中,我们检查了最重要的复合类型:记录和变体、数组和字符串、集合、指针和递归类型、列表和文件。我们注意到类型有两个主要目的:它们为许多操作提供了隐式上下文,使程序员无需明确指定该上下文,并且它们允许编译器捕捉各种常见的编程错误。类型系统包括一组内置类型、定义新类型的机制以及类型等价、类型兼容性和类型推断的规则。类型等价确定两个名称或值何时具有相同的类型。类型兼容性确定何时可以在“期望”另一类型的上下文中使用某一类型的值。类型推断基于其组件的类型或(有时)周围上下文来确定表达式的类型。如果一个语言从不允许对不支持它的对象进行操作,则称该语言为强类型语言;如果一个语言在编译时强制执行强类型,则称该语言为静态类型语言。

在我们关于类型的一般讨论中,我们区分了表示性、构造性和基于抽象的观点,分别从它们的值、它们的子结构以及它们支持的操作来看待类型。我们为常见的内置类型、枚举、子范围以及常见的类型构造器引入了术语。我们讨论了几种不同的类型等价、兼容性和推断方法,包括(在PLP CD上)对ML的推断规则的详细检查。我们还检查了类型转换、强制和非转换类型转换。在类型等价的领域内,我们对比了结构性和基于名称的方法,指出虽然名称等价似乎越来越受欢迎,但结构等价仍有其拥护者。

在我们对复合类型的调查中,我们花了最多的时间在记录、数组和递归类型上。记录的关键问题包括变体记录的语法和语义、整个记录的操作、类型安全以及每个与内存布局的交互。内存布局对数组也很重要,在其中它与形状的绑定时间相互作用;静态、栈和基于堆的分配策略;数值应用中高效的数组遍历;C中指针和数组的互操作性;以及可用的整个数组和基于切片的操作集。

对于递归数据类型,很多都取决于变量/名称的值模型和引用模型之间的选择。递归类型是引用模型的自然结果;使用值模型时,它们需要指针的概念:其值为引用的变量。从实现角度来看,值与引用之间的区别很重要:将内置类型实现为引用是浪费的,因此具有引用模型的语言通常会以不同方式实现内置和用户定义的类型。Java在语言语义中反映了这种区别,要求内置类型采用值模型,用户定义的类类型的对象采用引用模型。

递归类型通常用于创建链接数据结构。在大多数情况下,这些结构必须从堆中分配。在一些语言中,程序员负责释放不再需要的堆对象。在其他语言中,语言运行时系统自动识别并回收这种垃圾。显式释放是对程序员的一种负担,并会导致内存泄漏和悬挂引用的问题。尽管语言实现几乎从不尝试捕捉内存泄漏(参见探索3.32和练习7.36,不过,有一些关于这个主题的想法),但有时会使用墓碑或锁和钥匙来捕捉悬挂引用。自动垃圾回收可能代价高昂,但已被证明越来越受欢迎。大多数垃圾回收技术要么依赖于引用计数,要么依赖于某种形式的递归探索(追踪)当前可访问的结构。这一类别中的技术包括标记-清扫、停止-复制和分代收集。

语言设计的几个领域中,I/O显示出了极大的变化。我们的讨论(主要在PLP CD上)区分了交互式I/O,这往往非常具有平台特定性,以及基于文件的I/O,后者又细分为临时文件,用于单次程序运行中的大量数据,以及用于离线存储的持久文件。文件还细分为那些以二进制形式表示信息的文件,这些文件模仿内存中的布局,以及那些转换为字符基础文本和从字符基础文本转换回来的文件。与二进制文件相比,文本文件通常会产生时间和空间的开销,但它们具有可移植性和人类可读性的重要优势。

在我们对类型的检查中,我们看到了许多语言创新的例子,这些创新有助于提高程序的清晰度和可维护性,而且通常几乎没有或根本没有性能开销。例子包括用户定义类型的原始想法(Algol 68)、枚举和子范围类型(Pascal)、记录和变体的整合(Pascal)以及Ada中子类型和派生类型之间的区别。在第9章中,我们将检查许多人认为过去30年中最重要的创新,即面向对象。

在某些情况下,语言之间的区别不太是进化的问题,而是哲学上的根本差异。我们已经提到了变量/名称的值模型和引用模型之间的选择。同样地,大多数语言采用了静态类型,但Smalltalk、Lisp和许多脚本语言则与动态类型配合得很好。大多数静态类型语言采用了名称等价,但ML和Modula-3则与结构等价配合得很好。大多数语言已经远离了类型强制转换,但C++却接受了它们:结合运算符重载,它们使得在语言本身之外定义简洁、类型安全的I/O例程成为可能。

正如上一章中所看到的,为了简化编译器,或使编译后的程序更小或更快,一种语言的便利性、正交性或类型安全似乎已经受到了妥协。例子包括大多数语言中缺乏对记录的相等性测试,Pascal和Ada要求记录的变体部分位于末尾的要求,许多语言对集合最大尺寸的限制,C语言中缺乏对I/O的类型检查,以及许多语言实现中通常缺乏动态语义检查。我们还看到了几个至少部分是为了高效实现而引入的语言特性的例子。这些包括打包类型、多长度数值类型、with语句、十进制算术和C风格的指针算术。

与此同时,可以看出语言设计者和用户越来越愿意接受语言实现中的复杂性和成本,以改善语义。这里的例子包括Ada的类型安全变体记录;Java和C#的标准长度数值类型;Icon、Java和C#的变长字符串和字符串操作符;Ada中数组边界的后期绑定;以及Fortran 90中丰富的整个数组和基于切片的数组操作。也可以包括ML的多态类型推断。当然,还应该包括自动垃圾回收的趋势。曾经被认为对于生产质量的命令式语言来说太昂贵的垃圾回收,现在不仅在Clu和Cedar等实验性语言中是标准配置,而且在Ada、Modula-3、Java和C#中也是如此。许多这样的特性,包括变长字符串、切片和垃圾回收,已被脚本语言所采纳。

Control Flow

the language mechanisms used to specify ordering into several categories:

  • Sequencing
  • Selection
  • Iteration
  • Procedural abstraction
  • Recursion
  • Concurrency
  • Exception handling and speculation
  • Nondeterminacy

Expression Evaluation

An expression generally consists of either a simple object (e.g., a literal constant, or a named variable or constant) or an operator or function applied to a collection of operands or arguments, each of which in turn is an expression.

Precedence and Associativity

运算符的优先级和结合性

Assignments

computation typically consists of an ordered series of changes to the values of variables in memory. Assignments provide the principal means by which to make the changes.

side effect

Assignment is perhaps the most fundamental side effect: while the evaluation of an assignment may sometimes yield a value, what we really care about is the fact that it changes the value of a variable, thereby influencing the result of any later computation in
which the variable appears.

References and Values

there are some subtle but important differences in the semantics of assignment in different imperative languages. These differences are often invisible, because they do not affect the behavior of simple programs.

  • 值传递和引用传递
  • 左值和右值
Boxing

基础类型的拆装箱

Initialization

There are several reasons, however, why such initial values may be useful:

  • 静态变量需要初始化
  • 编译器预分配初始化值优化(Java Integer预分配)
  • 使用未初始化变量引起的问题

It should be emphasized that initialization saves time only for variables that are statically allocated.

If a variable is not given an initial value explicitly in its declaration, the language may specify a default value.

Dynamic Checks

Instead of giving every uninitialized variable a default value, a language or implementation can choose to define the use of an uninitialized variable as a dynamic semantic error, and can catch these errors at run time.

Definite Assignment

通过控制流分析静态检测未初始化变量:

This notion is based on the control flow of the program, and can be statically checked by the compiler.

Constructors

构造器默认初始化:

Many object-oriented languages (Java and C# among them) allow the programmer to define types for which initialization of dynamically allocated variables occurs automatically, even when no initial value is specified in the declaration.

Ordering within Expressions

表达式内求值重要的原因:

  • Side effects: 表达式内函数求值副作用
  • Code improvement: 求值顺序和编译器优化

Applying Mathematical Identities(数学恒等式)

表达式优化影响求值顺序会导致精度问题

Short-Circuit Evaluation(短路表达式)

A compiler that performs short-circuit evaluation of Boolean expressions will generate code that skips the second half of both of these computations when the overall value can be determined from the first half.

Structured and Unstructured Flow

汇编中使用的goto(非结构化)编程被高级语言抛弃(结构化编程)

Structured Alternatives to goto

Where once a goto might have been used to escape from the middle of a loop, most modern languages provide a break or exit statement for this purpose.

Multilevel Returns

return和 local goto都可以从当前子程序中返回,但是 nonlocal goto会破坏这种情况,如果直接goto到其他子程序中会破坏当前子程序堆栈,还需要立刻加载另一个子程序的堆栈。如果使用return这些都是在执行到return关键字时才发生的。

Errors and Other Exceptions

需要Exceptions的原因:

In a related and arguably more common situation, a deeply nested block or subroutine may discover that it is unable to proceed with its usual function, and moreover lacks the contextual information it would need to recover in any graceful way.

As a structured alternative, many modern languages provide an exception-handling mechanism for convenient, nonlocal recovery from exceptions.

大多数语言只提供异常机制而不提供从多级调用中直接return的机制(想直接跳出多级调用只能抛异常)。

Continuations

The notion of nonlocal gotos that unwind the stack can be generalized by defining what are known as continuations.

call/cc => 控制流变换:

在 Scheme 中,假设 call/cc 捕捉到的 current continuation 为 cc(位于 lambda 中),如果 cc 作为过程 直接或间接地被调用(即给它传值),call/cc 会立即返回,返回值即为传入 cc 的值。即一旦 current continuation 被调用,控制流会跳到 call/cc 处。因此,利用 call/cc,我们可以摆脱顺序执行的限制,在程序中跳来跳去,非常灵活。

Sequencing

Like assignment, sequencing is central to imperative programming. It is the principal means of controlling the order in which side effects (e.g.,assignments) occur

顺序和副作用,无副作用函数可以进行指令重排提升性能。

Selection

Short-Circuited Conditions

虽然if else语句中包含一个布尔表达式,但是通常不需要将其结果解析出来放在寄存器中,而是拆分布尔表达式用来控制代码跳转(通过短路表达式来优化代码)。

Case/Switch Statements

if else语句过长时可能会被优化成 switch/case 语句(通过查表来优化代码性能)。

case查表也需要优化性能(如果静态值范围稀疏。。。)

Iteration

Iteration and recursion are the two mechanisms that allow a computer to perform similar operations repeatedly.

Enumeration-Controlled Loops(for循环)

在编译器生成代码时的一个可能的优化是预先计算迭代次数:max([last-first+step]/step, 0)

枚举控制循环的几个设计问题:

  1. Can control enter or leave the loop in any way other than through the enumeration mechanism? => break/exit
  2. What happens if the loop body modifies variables that were used to compute the end-of-loop bound? => 不允许
  3. What happens if the loop body modifies the index variable itself? => 禁止在循环体内自己更新索引
  4. Can the program read the index variable after the loop has completed, and if so, what will its value be? => 循环结束后索引值是未定义的

Combination Loops

自定义步长的for循环

Iterators

  • range()
  • 对象迭代器
  • 惰性迭代流

Recursion 递归

Iteration and Recursion

Iteration is in some sense the more “natural” of the two in imperative languages, because it is based on the repeated modification of variables.

Recursion is the more natural of the two in functional languages, because it does not change variables.

通常认为迭代比递归更高效,但是编译器优化也可以生成高效的递归代码,特别是尾递归代码的优化(不需要额外生成新的函数堆栈,可以重用当前函数的栈空间)

编译器也可以优化非尾递归代码,转换为尾递归代码执行。

使用递归思想思考代码(没有副作用的递归)可以产生更少的bug。

Applicative- and Normal-Order Evaluation

Throughout the discussion so far we have assumed implicitly that arguments are evaluated before passing them to a subroutine. This need not be the case. It is possible to pass a representation of the unevaluated arguments to the subroutine instead, and to evaluate them only when (if) the value is actually needed.

惰性求值

Nondeterminacy

不确定的控制流(运行时才能确定)。

Summary and Concluding Remarks

在本章中,我们介绍了编程语言中发现的主要控制流形式:顺序、选择、迭代、程序抽象、递归、并发、异常处理和推测以及不确定性。顺序指定某些操作按顺序发生,一个接一个。选择表达了在两个或更多控制流替代方案之间的选择。迭代和递归是重复执行操作的两种方式。递归以自身的简单实例定义操作;它依赖于程序抽象。迭代重复一个操作以获得其副作用。顺序和迭代是命令式(特别是冯·诺依曼)编程的基础。递归是函数式编程的基础。不确定性允许程序员故意不指定控制流的某些方面。我们只简要地触及了并发;它将是第12章的主题。程序抽象(子程序)是第8章的主题。异常处理和推测将在第8.5节和第12.4.4节中介绍。

在我们对控制流机制的调查之前,我们讨论了表达式评估。我们考虑了左值和右值之间的区别,以及变量的值模型和引用模型之间的区别,在值模型中,变量是数据的命名容器,在引用模型中,变量是对数据对象的引用。我们考虑了表达式内的优先级、结合性和排序问题。我们检查了短路布尔评估及其通过跳转代码的实现,这既是一个影响表达式正确性的语义问题(其子部分并非总是定义良好),也是一个影响评估复杂布尔表达式所需时间的实现问题。

构造的发展受到了许多目标的驱动,包括编程的便利性、语义的优雅性、实现的便利性和运行时效率。在某些情况下,这些目标被证明是互补的。例如,我们已经看到短路评估既导致更快的代码,也(在许多情况下)导致更清晰的语义。同样地,为枚举控制循环的索引变量引入一个新的局部作用域,避免了循环后索引值的语义问题和(在某种程度上)潜在溢出的实现问题。

在其他情况下,语言语义的改进被认为值得在运行时效率上付出一点小代价。我们在迭代器的发展中看到了这一点:像许多形式的抽象一样,它们在许多情况下增加了适度的运行时成本(例如,与显式嵌入枚举集合的实现到循环的控制流中相比),但却带来了模块化、清晰性和代码重用机会的巨大回报。同样地,Java的开发者会辩称,对于许多应用程序来说,通过广泛的语义检查、标准格式的数值类型等提供的可移植性和安全性远比速度更重要。

在一些情况下,编译器技术的进步或设计者构建更复杂编译器的简单意愿,使得整合曾被认为过于昂贵的功能成为可能。Ada中的案例语句的标签范围要求编译器准备生成采用二进制搜索的代码。C++中的内联函数消除了在微小函数的低效率和宏的混乱语义之间选择的需要。异常(如我们将在第8.5.3节中看到)可以以这样一种方式实现,即在常见情况下(当它们不发生时)不会产生任何成本,但实现相当复杂。迭代器、装箱、泛型(第8.4节)和一级函数同样相当复杂,但在主流命令式语言中越来越常见。

一些实现技术(例如,重新排列表达式以发现公共子表达式,或者在找到可接受的选择后避免在不确定性构造中评估守卫)足够重要,以至于可以证明对程序员施加适度负担是合理的(例如,在必要时添加括号以避免溢出或确保数值稳定性,或确保守卫中的表达式没有副作用)。其他在语义上有用的机制(例如,惰性评估、续体或真正随机的不确定性)通常被认为足够复杂或昂贵,只在特殊情况下才值得使用(如果有的话)。

在相对原始的语言中,我们通常可以通过编程约定获得一些缺失功能的好处。例如,在早期的Fortran方言中,我们可以限制goto的使用,以模仿更现代语言的控制流。在没有短路评估的语言中,我们可以编写嵌套的选择语句。在没有迭代器的语言中,我们可以编写一组提供等效功能的子程序。

0%