sox-audit-tips

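What it does, roughly

Every quarter an external accounting firm audits the insurance premium. The result depends on whether the commits actually delivered match what each CHG on CM tools says it should contain.

In other words, the tool pulls from CM tools what each CHG is supposed to contain, checks GitLab for what was actually committed, and emails the audit result to Dev.

It has been neglected for a long time, though: no email has gone out for more than half a year, and two quarters have passed without anything seeming to go wrong...

And if the commits actually delivered exceed what was required, isn't that great?

The current problem: the original Cassandra cluster, all three nodes included, went down completely. A single master was rebuilt in KVM, with no other nodes.

On that lone node a delete run takes three hours; on the old cluster it took five minutes.

All I know so far is that when Cassandra deletes data it first writes a small tombstone marker, keeps it for a grace period (gc_grace_seconds, not a TTL), and only after that period can the data really be removed. The default is 10 days.

How it works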

1

Spring Boot + Cassandra

TODO: make this directory tree look nicer...

src/
  main/
    java/
      com.soxaudit/
        conf/
          BeanInit
        db/
          config/
            CassandraConfig
          model/
            AuditKey
            RecordDB
          service/
            RecordService
            RecordDAO
        exception/
          BaseException
          CMNotFoundException
          GitSourceNotFoundException
        helper/
          jira/
            client/
              IssueGetter
            model/
              JiraRestClientConfig
          model/
            DefaultAuthenticator
          CMToolClient
          COllectionTool
          DiffHandler
          DiffHelper
          FileHelper
          GitTools
          JiraUtils
          MailReportHelper
        inf/
          Compare
        model/
          AuditConstants
          GitLogBean
          Merge
        App
        SoxauditApplication
    resources/
      shell/
        testaudit.sh
      sql/
        audit.sql
      application.properties
      log4j2.xml
  test/
    java/
      com.soxaudit/
        SoxauditApplicationTests

2

The entry point is SoxauditApplication.java

What I don't quite understand is why it needs to create an App object, call getBean, and then pass the arguments in with bean.app(args)

And what is @ComponentScan("com.soxaudit") doing?

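import org.springframework.boot.ExitCodeGenerator;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.ConfigurableApplicationContext;
import org.springframework.context.annotation.ComponentScan;

@SpringBootApplication
@ComponentScan("com.soxaudit")
public class SoxauditApplication {
    // 1. Marshal xml, rm merges with 006
    // 2. Get cmtool info
    // 3. Diff
    public static void main(String[] args) {
        // Start the Spring context, then hand the CLI arguments to the App bean.
        ConfigurableApplicationContext run = SpringApplication.run(SoxauditApplication.class, args);
        App bean = run.getBean(App.class);
        bean.app(args);

        // Shut the context down and report exit code 0.
        SpringApplication.exit(run, new ExitCodeGenerator() {
            @Override
            public int getExitCode() {
                return 0;
            }
        });
    }
}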

3. App.java

@Component

@Autowired

app(String[] args)

app(String[] args) checks whether it received exactly 4 arguments, then hands them to auditBranch()
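The argument check described above can be sketched roughly as follows. This is only an illustration: the class name AppArgsSketch, the positional order of the four arguments (taken from the auditBranch signature), and what happens when the count is wrong are all assumptions, not the real App code.

```java
/** Hypothetical stand-in for the argument check in App.app(String[] args). */
final class AppArgsSketch {

    /** True only when exactly four arguments were supplied. */
    static boolean hasExpectedArity(String[] args) {
        return args != null && args.length == 4;
    }

    /** Describes the call that would follow; the positional order is an assumption. */
    static String describe(String[] args) {
        if (!hasExpectedArity(args)) {
            // Assumed behaviour: the notes do not say what the real code does here.
            return "skipped: expected exactly 4 arguments";
        }
        return String.format("auditBranch(source=%s, branch=%s, chgBranch=%s, cr=%s)",
                args[0], args[1], args[2], args[3]);
    }

    public static void main(String[] args) {
        // Placeholder values, not real branch or CHG names.
        System.out.println(describe(new String[] {"src", "branch", "chg-branch", "cr"}));
        System.out.println(describe(new String[] {"only-one"}));
    }
}
```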

auditBranch(String source, String branch, String chgBranch, String cr)

It first calls unmarshal() from GitTools to get a GitLogBean; if that contains no merge records, it returns immediately

GitLogBean

@XmlRootElement(name = "root")

@XmlAccessorType(XmlAccessType.NONE)

@XmlElement(name = "merge")

Holds just a single List

GitTools

Roughly, it first wipes the data in Cassandra and then makes a run for it (just kidding)

The three-hour hang happened at service.removeAll(cr)

Then start()

RecordService.removeAll

@Transactional

First logs a line with the chgNum

Then uses dao.findAll to fetch every record in the DB

Then iterates over each record

If the CHG matches, calls dao.delete
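To make the shape of that loop concrete, here is a minimal in-memory sketch. It is not the real RecordService: the Record class and the plain list stand in for RecordDB and the DAO, and the field names are borrowed from the hash_code and cm_num columns on the assumption that cm_num holds the CHG number.

```java
import java.util.ArrayList;
import java.util.List;

/** In-memory sketch of the scan-and-delete loop; nothing here talks to Cassandra. */
final class RemoveAllSketch {

    /** Hypothetical stand-in for RecordDB. */
    static final class Record {
        final String hashCode;
        final String cmNum;

        Record(String hashCode, String cmNum) {
            this.hashCode = hashCode;
            this.cmNum = cmNum;
        }
    }

    /** Reads every row, deletes the ones whose CHG matches, returns how many were deleted. */
    static int removeAll(List<Record> table, String chgNum) {
        System.out.println("removing records for " + chgNum);
        List<Record> all = new ArrayList<>(table); // plays the role of dao.findAll(): a full scan
        int deleted = 0;
        for (Record r : all) {
            if (chgNum.equals(r.cmNum)) {
                table.remove(r); // plays the role of dao.delete(r): one delete per matching row
                deleted++;
            }
        }
        return deleted;
    }

    public static void main(String[] args) {
        List<Record> table = new ArrayList<>();
        table.add(new Record("a1", "CHG-A"));
        table.add(new Record("b2", "CHG-B"));
        table.add(new Record("c3", "CHG-A"));
        System.out.println(removeAll(table, "CHG-A") + " deleted, " + table.size() + " left");
    }
}
```

In this shape the work scales with the size of the whole table rather than with the number of matching rows, since every record is read before any filtering happens. That may be worth keeping in mind when looking at the three-hour run.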

start()

sendReport(List opsMerges, List ownerMerges)

log(String info, Collection o)

Writes the size() of each element of o to the log
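A small sketch of what such a helper might look like. It assumes that every element of o is itself a collection (the signature in the notes is a raw Collection), and the class name, method name, and output format are made up for illustration.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.stream.Collectors;

/** Hypothetical sketch of a log helper that reports the size of each element. */
final class LogSketch {

    /** Builds the log line: the label followed by the size() of every element of o. */
    static String format(String info, Collection<? extends Collection<?>> o) {
        String sizes = o.stream()
                .map(c -> String.valueOf(c.size()))
                .collect(Collectors.joining(", "));
        return info + ": [" + sizes + "]";
    }

    public static void main(String[] args) {
        // Two placeholder groups, of sizes 2 and 1.
        System.out.println(format("merges", Arrays.asList(
                Arrays.asList("m1", "m2"),
                Arrays.asList("m3"))));
    }
}
```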

test-day-134

Today’s Task

  • Hide Sox Audit and Catalog MySQL Magics into Jenkins Credentials
  • Investigate Sox Audit code

Additional Task

Afterwork Task

  • Spring Boot RESTful Web Service tutorial
  • Navbar of CardCaptor
  • Try some Vue3 Components API if needed

Thought

1

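Nearly hit an AMG at the Binjiang intersection this morning

Still shaken

Going to adjust the brakes and the rear tire pressure as soon as I get home

If necessary, I'll take it to the shop for a full service this weekend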

test-day-133

Today’s Task

  • Release Support

Additional Task

Thought

1

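You can follow the script and run the ClearStagingCatalog series by hand: roughly, log in to the corresponding machine, scp some SQL files over, then execute them with magic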

2

Rerun after rerun, failure after failure

Is there any way to improve this?

test-day-132

Today’s Task

Additional Task

Thought

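Spent a long time reading about Cassandra and still don't really understand why each option is set the way it is

Deleting data takes a very long time, and I have no idea how to fix it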

test-day-130

Today’s Task

  • Add more components to Create Emergency Branch
  • Modify error message of nucleus PCT

Additional Task

Thought

1

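Hard to tell whether the create emergency branch pipeline fails because of the code I added or because it was already broken

Also spent two hours on the PCT check part and in the end changed it back to how it was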

test-day-129

Today’s Task

  • Fix Sox Audit
    • figure out cassandra settings
    • Bypass security check on password in properties
  • Add missing components to create-emergency-branch pipeline

Additional Task

Thought

1

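Spent all of yesterday setting up an image release pipeline for the single-page frontend project I hacked together, and I'm pretty happy with it

Kind of want to look into GitLab runners and GitHub Actions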

2

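I'm actually not familiar with the language details of either TS or JS

Need to go through the syntax and the concepts

At the very least, read Professional JavaScript for Web Developers once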

test-day-128

Today’s Task

  • Deploy onebox command generator

Additional Task

  • create a branch from the current image on PRD and build images containing Dev’s changes
  • Fix nucleus building scripts

Thought

1

[Warning] One or more build-args [env] were not consumed

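ARG is gone once the Dockerfile has finished building

ENV is not

For some reason docker build --build-arg env=cn just would not pass the value in

Maybe env does not carry across stages? Not sure

Later I split it into two Dockerfiles to get around it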

2

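The container exits the instant it starts

Docker Exited 127

Checked the log and found that the script that starts nginx had not been COPYed in

docker logs -t onebox.setup.cn

2022-02-28T07:43:04.208781427Z /bin/sh: 1: ./startup.sh: not found

...and later I also forgot to fix its permissions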

3

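Which port is nginx actually using...

On Friday it looked like 5000

Now, once it is up, it is back to the default 80 inside

The full-cycle script runs it with -p 80:5000: 80 on the host, 5000 inside the container

So just ship an nginx.conf that sets the port explicitly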

4

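After setting the port, why does it go looking in /etc/nginx/html ???

So I also set root /usr/share/nginx/html; at the same time

The semicolon at the end of the line really matters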

cassandra-cql-tips

Cassandra CQL Sample

CREATE KEYSPACE sox WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'} AND durable_writes = true;

CREATE TABLE sox.audit_history (
    hash_code text,
    cm_num text,
    message text,
    tag text,
    PRIMARY KEY (hash_code, cm_num)
) WITH CLUSTERING ORDER BY (cm_num ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

Cassandra FAQ

Simple Strategy

Assign the same replication factor to the entire cluster. For Test and Dev envs only.

Replication Factor

How many copies of the data are kept across the cluster

Need at least 3 nodes if this value is set to 3.

Durable Writes

Bypass the commit log when writing to the keyspace if set to false.

NEVER disable it when using Simple Strategy.

Bloom Filter FP Chance

False-positive probability for the SSTable bloom filter.

The bloom filter checks whether a row may exist in an SSTable before doing disk I/O.

0 enables the largest possible bloom filter, which uses the most memory.

1.0 disables it.

Default: 0.01

Recommended: 0.1
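To get a feel for why a lower false-positive chance costs more memory, here is a small sketch using the textbook bloom filter formula, bits per key = -ln(p) / (ln 2)^2. This is the general theory only; Cassandra's own sizing may round or cap differently, and the class and method names are made up.

```java
/** Textbook bloom filter sizing; not Cassandra's exact implementation. */
final class BloomSizingSketch {

    /** Bits needed per stored key for a target false-positive probability p, with 0 < p < 1. */
    static double bitsPerKey(double fpChance) {
        if (fpChance <= 0.0 || fpChance >= 1.0) {
            throw new IllegalArgumentException("fpChance must be strictly between 0 and 1");
        }
        double ln2 = Math.log(2);
        return -Math.log(fpChance) / (ln2 * ln2);
    }

    public static void main(String[] args) {
        // Compare the default (0.01) with the recommended value (0.1).
        System.out.printf("p=0.01 -> %.2f bits per key%n", bitsPerKey(0.01));
        System.out.printf("p=0.10 -> %.2f bits per key%n", bitsPerKey(0.1));
    }
}
```

By this formula 0.01 needs roughly 9.6 bits per key and 0.1 roughly 4.8, so moving from the default to the recommended value about halves the filter size in exchange for more unnecessary SSTable reads.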

Caching

Optimize the memory usage of tables.

Cached data is weighted by size and access frequency.

Works together with the global cache settings in cassandra.yaml.

Compaction

Compression

CRC Check Chance

dclocal_read_repair_chance

Probability that a successful read operation triggers a read repair

limited to replicas in the same DC as the coordinator

default_time_to_live

max: 20 years in seconds

disabled: 0

A new TTL timestamp is calculated whenever the data is updated, and the row is removed once all of its data has expired.

gc_grace_seconds

GC: Garbage Collection

Tombstone: Deletion marker

Data is marked with a tombstone -> waits for some time -> becomes eligible for GC

This setting controls how long that wait is

Default: 10 days in seconds (864000)
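The timeline above comes down to simple arithmetic, sketched here. The class and method names are made up, and the example timestamp is arbitrary. Note that being eligible does not mean the data disappears at that instant; the actual removal happens when a later compaction processes the SSTable.

```java
import java.time.Duration;
import java.time.Instant;

/** Sketch of the gc_grace_seconds arithmetic; not Cassandra code. */
final class TombstoneSketch {

    /** Earliest moment a tombstone written at deletedAt may be dropped by compaction. */
    static Instant purgeableAfter(Instant deletedAt, long gcGraceSeconds) {
        return deletedAt.plusSeconds(gcGraceSeconds);
    }

    public static void main(String[] args) {
        long gcGraceSeconds = 864000L; // value from the table definition
        Instant deletedAt = Instant.parse("2022-03-01T00:00:00Z"); // arbitrary example time
        System.out.println("grace period in days: " + Duration.ofSeconds(gcGraceSeconds).toDays());
        System.out.println("eligible for GC after: " + purgeableAfter(deletedAt, gcGraceSeconds));
    }
}
```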

memtable_flush_period_in_ms

Memtables:

Milliseconds before memtables associated with the table are flushed

max_index_interval

min_index_interval

read_repair_chance

similar to dclocal_read_repair_chance, but this repair is not limited to replicas in the same DC as the coordinator

speculative_retry

not configured variables

dclocal_read_repair_chance

read_repair_chance

memtable_flush_period_in_ms

Reference

https://docs.datastax.com/en/cql-oss/3.3/cql/cql_reference/cqlCreateTable.html#cqlCreateTable