起源

这个事故来源于某个PHPer转职Javaer时发生的灵异事件,在云原生应用脚手架中选择了应用架构为MVC架构之后生成的脚手架打包的jar包启动之后无法访问Controller中配置的路径。

现象

据某PHPer所说,在IDEA中直接启动项目没有问题但是打包成jar之后就访问不到controller中定义的请求路径了。

第一次排障

面对这个bug的第一直觉就是maven打包有问题,应该是代码没有打包进去。但是在解压jar包之后发现代码确实已经在lib文件夹里,而且解压之后也能找到对应的class文件。

这时候怀疑对象就转到了spring bean装配有问题上,于是便使用ApplicationContext的getBean方法尝试在spring启动完成之后获取basicController的bean,然后发现果然是bean并没有成功装载。

于是死马当活马医的给启动类加上了@ComponentScan注解尝试抢救一下,结果抢救失败仍然没有装载。

这时候已经是万策尽放弃治疗了,直接劝退了PHPer。

第二次排障

一天之后这个bug迎来了专家会诊,某Java领域大神准备对其进行一番操作,但是失败了。于是又将项目clone下来看了一看,因为病因是IDEA启动时可以正确加载而maven打包之后有问题,于是又将嫌疑人锁定到了maven上,但是根据本人浅薄的maven知识并没有发现配置有任何有问题的地方。

于是又开始死马当活马医,准备使用SPI机制给spring强制注入bean试试。在spring.factories文件中配置 org.springframework.boot.autoconfigure.EnableAutoConfiguration 为controller之后发现IDEA正常启动。但是maven打包时springboot启动直接报错:找不到该class文件。

嗯,有错误是好事,说明并不是spring装配的时候有问题而是项目启动时根本没有这个class的文件。这时候随便找了一个自己项目里的SDK jar包解压看了一下目录,又看了一下多模块中web模块的jar包:BOOT-INF。凶手已经找到了。

破案

按照配置给web模块和service模块增加了<skip>true</skip>配置,跳过了springboot打包插件,项目正常启动并可以访问请求路径。

很明显,在创建模板的时候pom.xml直接被复制了三份放到了三个模块当中,导致了spring-boot-maven-plugin插件把三个子项目都打包成了可部署的jar包。

一开始解压jar包的时候并没有关注到这个细节导致多次抢救无效。

令人疑惑的是 https://github.com/alibaba/cloud-native-app-initializer/issues/65 官方仓库里竟然有这种issue,但是<skip>true</skip>在最外层完全是反作用导致打包失败,而内部web模块明明需要这个配置却没有填写。属实神秘。

前言

又又又到了self-hosted engineer最喜欢的建站时间,由于上次使用凉心云新用户薅的服务器的手机号废弃了,导致控制台无法登录,也不知道还剩多久时间到期。于是只能重新买了一台服务器再折腾一次。好在现代建站技术足够发达,迁移全部内容基本只是复制粘贴一下compose文件(没想到备案才是最麻烦的)。

原先还想给网站加一个雷池WAF,略微折腾了一下没搞定就放弃了。下次有时间再好好研究研究。

建站速通

docker

1
2
curl -fsSL get.docker.com -o get-docker.sh
sudo sh get-docker.sh --mirror Aliyun

BBR算法

1
2
3
4
echo net.core.default_qdisc=fq >> /etc/sysctl.conf
echo net.ipv4.tcp_congestion_control=bbr >> /etc/sysctl.conf
sysctl -p
sysctl net.ipv4.tcp_available_congestion_control

portainer

1
2
docker network create traefik-public
docker volume create portainer_data

docker-compose.yml

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
version: "3.3"

services:
portainer:
image: portainer/portainer-ce:latest
command: -H unix:///var/run/docker.sock
restart: always
ports:
- 9000:9000
volumes:
- /var/run/docker.sock:/var/run/docker.sock
- portainer_data:/data
networks:
- traefik-public
labels:
- "traefik.enable=true"
- "traefik.http.routers.portainer.rule=Host(`portainer.oxcafebabe.cn`)"
- "traefik.http.routers.portainer.entrypoints=websecure"
- "traefik.http.services.portainer.loadbalancer.server.port=9000"
- "traefik.http.routers.portainer.service=portainer"
- "traefik.http.routers.portainer.tls.certresolver=leresolver"
volumes:
portainer_data:
external: true

networks:
traefik-public:
external: true

traefik

启动portainer后通过端口访问,然后在portainer里创建stack:

注意: acme.json文件需要手动创建并chmod 600

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
version: "3.3"

services:
traefik:
container_name: traefik
image: traefik:latest
command:
- --providers.docker
- --log.level=INFO
- --accesslog=true
- --api.dashboard=true
- --entrypoints.web.address=:80
- --entrypoints.websecure.address=:443
- --certificatesresolvers.leresolver.acme.httpchallenge=true
- --certificatesresolvers.leresolver.acme.email=tangbin97@outlook.com
- --certificatesresolvers.leresolver.acme.storage=./acme.json
- --certificatesresolvers.leresolver.acme.httpchallenge.entrypoint=web
ports:
- 80:80
- 443:443
networks:
- traefik-public
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
- /root/acme/acme.json:/acme.json
labels:
- traefik.http.routers.http-catchall.rule=hostregexp(`{host:.+}`)
- traefik.http.routers.http-catchall.entrypoints=web
- traefik.http.routers.http-catchall.middlewares=redirect-to-https
- traefik.http.middlewares.redirect-to-https.redirectscheme.scheme=https
- traefik.http.routers.dashboard.rule=Host(`traefik.oxcafebabe.cn`) && (PathPrefix(`/api`) || PathPrefix(`/dashboard`))
- traefik.http.routers.dashboard.service=api@internal
- traefik.http.routers.dashboard.entrypoints=websecure
- traefik.http.routers.dashboard.tls.certresolver=leresolver
environment:
- TZ=Asia/Shanghai

networks:
traefik-public:
external: true

vaultwarden

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
version: "3"

services:
bitwarden:
image: vaultwarden/server
container_name: bitwarden-server
volumes:
- bitwarden:/data
environment:
TZ: "Asia/Shanghai"
WEBSOCKET_ENABLE: "true"
SIGNUPS_ALLOWED: "false"
WEB_VAULT_ENABLE: "true"
DOMAIN: "https://bitwarden.oxcafebabe.cn"
ADMIN_TOKEN: "!!!!!!!!!!!!!!!"
networks:
- traefik-public
labels:
- "traefik.enable=true"
- "traefik.http.routers.bitwarden.rule=Host(`bitwarden.oxcafebabe.cn`)"
- "traefik.http.routers.bitwarden.entrypoints=websecure"
- "traefik.http.services.bitwarden.loadbalancer.server.port=80"
- "traefik.http.routers.bitwarden.service=bitwarden"
- "traefik.http.routers.bitwarden.tls.certresolver=leresolver"
- "traefik.http.routers.bitwarden-websocket.rule=Host(`bitwarden.oxcafebabe.cn`) && Path(`/notifications/hub`)"
- "traefik.http.routers.bitwarden-websocket.entrypoints=websecure"

volumes:
bitwarden:

networks:
traefik-public:
external: true

RSSHUB

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
version: '3'

services:
rsshub:
image: diygod/rsshub
environment:
ACCESS_KEY: !!!!!!!!!
NODE_ENV: production
CACHE_TYPE: redis
REDIS_URL: 'redis://redis:6379/'
PUPPETEER_WS_ENDPOINT: 'ws://browserless:3000'
PROXY_URI: socks5h://172.17.0.1:20173
depends_on:
- redis
- browserless
networks:
- traefik-public

labels:
- "traefik.enable=true"
- "traefik.http.routers.rsshub.rule=Host(`rsshub.oxcafebabe.cn`)"
- "traefik.http.routers.rsshub.entrypoints=websecure"
- "traefik.http.services.rsshub.loadbalancer.server.port=1200"
- "traefik.http.routers.rsshub.service=rsshub"
- "traefik.http.routers.rsshub.tls.certresolver=leresolver"

browserless:
image: browserless/chrome
ulimits:
core:
hard: 0
soft: 0
redis:
image: redis:alpine
volumes:
- redis-data:/data

volumes:
redis-data:

networks:
traefik-public:
external: true

其余项目同理迁移,非常的迅速。

数据恢复

只有bitwarden和rss数据需要恢复,直接在网页上导入导出一次即可。

Programming Language Syntax

Unlike natural languages such as English or Chinese,computer languages must be precise. Both their form (syntax) and meaning (semantics) must be specified without ambiguity, so that both programmers and computers can tell what a program is supposed to do. To provide the needed degree of precision, language designers and implementors use formal syntactic and semantic notation. To facilitate the discussion of language features in later chapters, we will cover this notation first: syntax in the current chapter and semantics in Chapter 4.

与英语或中文等自然语言不同,计算机语言必须要精确。它们的形式(语法)和含义(语义)必须要明确定义,以便程序员和计算机都能理解程序的预期行为。为了提供所需的精确度,语言设计者和实现者使用正式的语法和语义符号。为了便于后续章节中讨论语言特性,我们将首先介绍这些符号:本章介绍语法,第四章介绍语义。

Specifying Syntax: Regular Expressions and Context-Free Grammars

编程语言的语法之所以使用BNF(巴科斯-诺尔范式)而不是正则表达式,主要是因为编程语言的语法复杂性通常超出了正则表达式的表达能力。以下是几个关键点来解释这一选择:

  1. 表达能力:

    • 正则表达式:它们是用来描述正规语言的,只能表达有限的语法模式,通常用于简单的字符串匹配。正则表达式不能很好地处理递归模式,这是编程语言常见的结构,如嵌套的括号、if-else结构和函数调用等。
    • BNF范式:它是一种用于表示上下文无关文法(CFG)的符号,能够描述更复杂的语言结构,包括递归,这使得BNF非常适合表示编程语言的语法。
  2. 递归结构:

    • 编程语言中经常会有递归结构,比如在语言中可以有无限嵌套的函数调用或者循环结构。BNF范式自然而然地支持递归定义,而正则表达式则不支持。
  3. 上下文无关文法:

    • BNF是上下文无关文法的一种表示方法,它能够描述大多数编程语言的语法。上下文无关文法比正则表达式的表达能力更强,因为它允许使用非终结符和规则来定义语言的语法结构。
  4. 解析算法:

    • BNF范式定义的语法可以直接被转换成解析器,如LL或LR解析器,这些解析器可以构建出源代码的抽象语法树(AST)。正则表达式不适合这种用途,因为它们缺乏构建复杂结构的能力。
  5. 可读性和维护性:

    • BNF范式提供了一种清晰的方式来描述语法规则,这让语言设计者更容易定义、阅读和维护语言的语法。相比之下,复杂的正则表达式可以难以理解和维护。

总之,正则表达式在处理简单的文本模式时非常有用,但对于定义编程语言这样结构复杂、层次多、需要递归定义的语法来说,BNF范式提供了更强大、更灵活的语法描述能力。

Scanning

By grouping input characters into tokens, the scanner dramatically reduces the number of individual items that must be inspected by the more computationally intensive parser. In addition, the scanner typically removes comments (so the parser doesn’t have to worry about them appearing throughout the context-free grammar); saves the text of “interesting” tokens like identifiers, strings, and numeric literals; and tags tokens with line and column numbers, to make it easier to generate high-quality error messages in subsequent phases.

通过将输入字符分组成标记,扫描器大大减少了需要由计算密集型的解析器检查的单个项的数量。此外,扫描器通常会移除注释(这样解析器就不必担心它们出现在上下文无关语法中);保存“有趣”的标记的文本,比如标识符、字符串和数字字面量;并给标记添加行号和列号,以便在后续阶段生成高质量的错误消息变得更加容易。

Generating a Finite Automaton

While a finite automaton can in principle be written by hand, it is more common to build one automatically from a set of regular expressions, using a scanner generator tool.

有限自动机原则上可以手动编写,但更常见的做法是利用扫描器生成工具从一组正则表达式自动构建。

As it turns out, however, there is no obvious one-step algorithm to convert a set of regular expressions into an equivalent deterministic finite automaton (DFA). The typical scanner generator implements the conversion as a series of three separate steps.

然而,事实证明,并没有明显的一步算法可以将一组正则表达式转换为等价的确定有限自动机(DFA)。典型的扫描器生成器将转换实现为三个单独步骤的系列。

  • The first step converts the regular expressions into a nondeterministic finite automaton (NFA).
  • the second step of a scanner generatortranslates the NFA into an equivalentDFA
  • The third step is a space optimization that generates a final DFA with the minimum possible number of states.

Scanner Code

We can implement a scanner that explicitly captures the “circles-and-arrows” structure of a DFA in either of two main ways. One embeds the automaton in the control flow of the program using gotos or nested case (switch) statements; the other, described in the following subsection, uses a table and a driver.

我们可以通过两种主要方式之一,明确地捕获DFA的“圆圈和箭头”结构来实现扫描器。一种方式是利用goto或嵌套的case(switch)语句将自动机嵌入程序的控制流中;另一种方式在下一小节中描述,它使用表和驱动程序。

Parsing

The parser is the heart of a typical compiler. It calls the scanner to obtain the tokens of the input program, assembles the tokens together into a syntax tree, and passes the tree (perhaps one subroutine at a time) to the later phases of the compiler, which perform semantic analysis and code generation and improvement. In effect, the parser is “in charge” of the entire compilation process; this style of compilation is sometimes referred to as syntax-directed translation.

解析器是典型编译器的核心。它调用扫描器以获取输入程序的标记,将这些标记组装成语法树,然后将语法树(可能一次一个子程序)传递给编译器的后续阶段,这些阶段执行语义分析、代码生成和优化。实际上,解析器“主导”整个编译过程;这种编译风格有时被称为语法定向翻译。

解析算法:

  • LL(自顶向下)
  • LR(自底向上)
  • 递归下降
  • 表驱动自顶向下解析

Syntax Errors

In general, the term syntax error recovery is applied to any technique that allows the compiler, in the face of a syntax error, to continue looking for other errors later in the program. High-quality syntax error recovery is essential in any production-quality compiler. The better the recovery technique, the more likely the compiler will be to recognize additional errors (especially nearby errors) correctly, and the less likely it will be to become confused and announce spurious cascading errors later in the program.

一般来说,“语法错误恢复”这个术语适用于任何允许编译器在面对语法错误时继续寻找程序中其他错误的技术。在任何生产级编译器中,高质量的语法错误恢复都是至关重要的。恢复技术越好,编译器识别额外错误(尤其是附近的错误)的准确性就越高,它混淆并在程序后期宣布虚假的级联错误的可能性就越小。

Theoretical Foundations

Our understanding of the relative roles and computational power of scanners, parsers, regular expressions, and context-free grammars is based on the formalisms of automata theory. In automata theory, a formal language is a set of strings of symbols drawn from a finite alphabet. A formal language can be specified either by a set of rules (such as regular expressions or a context-free grammar) that generates the language, or by a formal machine that accepts (recognizes) the language. A formal machine takes strings of symbols as input and outputs either “yes” or “no.” A machine is said to accept a language if it says “yes” to all and only those strings that are in the language. Alternatively, a language can be defined as the set of strings for which a particular machine says “yes.”

我们对扫描仪、解析器、正则表达式和上下文无关文法的相对作用和计算能力的理解是基于自动机理论的形式主义。在自动机理论中,形式语言是从有限字母表中提取的一组符号串。形式语言可以通过一组规则(如正则表达式或上下文无关文法)来指定,这些规则生成语言,也可以通过一个接受(识别)语言的形式机来指定。形式机将符号串作为输入,并输出“是”或“否”。如果机器对于所有属于语言的字符串都回答“是”,则称该机器接受该语言。或者,语言可以被定义为一组字符串,对于这些字符串,特定的机器回答“是”。

Summary and Concluding Remarks

  • 正则表达式
  • 上下文无关文法
  • 实际编译器中的扫描和解析算法
  • 语法错误恢复(painc)
  • 自顶向下解析
  • 自底向上解析
0%