lonelystar777 发表于 2024-8-5 17:18:38

【解决方式关闭PQ】升级到GreatSQL执行一个简单的SQL导致crash

错误日志如下:
Assertion failure: row0pread.cc:1350:is_queue_empty() thread 139918854407936
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/ ... nnodb-recovery.html
InnoDB: about forcing recovery.
2024-08-05T08:46:57Z UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
BuildID=531c3f9a0bd2b9927ad15adeeb1c651bf5df0d72

Build ID: Not Available
Server Version: 8.0.32-26 GreatSQL, Release 26, Revision a68b3034c3d

Thread pointer: 0x7f41f62f35a0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f41659d2ff0 thread_stack 0x80000
/usr/local/GreatSQL-8.0.32-26-Linux-glibc2.17-x86_64-minimal/bin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x3d)
/usr/local/GreatSQL-8.0.32-26-Linux-glibc2.17-x86_64-minimal/bin/mysqld(print_fatal_signal(int)+0x3cf)
/usr/local/GreatSQL-8.0.32-26-Linux-glibc2.17-x86_64-minimal/bin/mysqld(my_server_abort()+0x7e)
/usr/local/GreatSQL-8.0.32-26-Linux-glibc2.17-x86_64-minimal/bin/mysqld(my_abort()+0xa)
/usr/local/GreatSQL-8.0.32-26-Linux-glibc2.17-x86_64-minimal/bin/mysqld(ut_dbg_assertion_failed(char const*, char const*, unsigned long)+0x31f)
/usr/local/GreatSQL-8.0.32-26-Linux-glibc2.17-x86_64-minimal/bin/mysqld(Parallel_reader::dispatch_ctx(row_prebuilt_t*)+0x405)
/usr/local/GreatSQL-8.0.32-26-Linux-glibc2.17-x86_64-minimal/bin/mysqld(ha_innobase::pq_worker_scan_next(void*, unsigned char*)+0x12b)
/usr/local/GreatSQL-8.0.32-26-Linux-glibc2.17-x86_64-minimal/bin/mysqld(handler::ha_pq_next(unsigned char*, void*)+0x270)
/usr/local/GreatSQL-8.0.32-26-Linux-glibc2.17-x86_64-minimal/bin/mysqld(PQblockScanIterator::Read()+0x25)
/usr/local/GreatSQL-8.0.32-26-Linux-glibc2.17-x86_64-minimal/bin/mysqld(FilterIterator::Read()+0x14)
/usr/local/GreatSQL-8.0.32-26-Linux-glibc2.17-x86_64-minimal/bin/mysqld(AggregateIterator::Read()+0x183)
/usr/local/GreatSQL-8.0.32-26-Linux-glibc2.17-x86_64-minimal/bin/mysqld(Query_expression::ExecuteIteratorQuery(THD*)+0xa6a)
/usr/local/GreatSQL-8.0.32-26-Linux-glibc2.17-x86_64-minimal/bin/mysqld(pq_worker_exec(void*)+0x90)
/usr/local/GreatSQL-8.0.32-26-Linux-glibc2.17-x86_64-minimal/bin/mysqld()
/lib64/libpthread.so.0(+0x7ea5)
/lib64/libc.so.6(clone+0x6d)

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7f419bc196f0): SELECT COUNT(*) AS total FROM dept d WHERE d.delete_flag = 0
Connection ID (thread ID): 1218
Status: NOT_KILLED

Please help us make Percona Server better by reporting any
bugs at https://bugs.percona.com/

You may download the Percona Server operations manual by visiting
http://www.percona.com/software/percona-server/. You may find information
in the manual which will help you identify the cause of the crash.


从日志里看到用count(*)统计一下dept表的数据就导致了整个服务崩溃。
dept表结构如下:


CREATE TABLE `dept` (
`id` int NOT NULL AUTO_INCREMENT COMMENT '主键',
`company_code` varchar(20) DEFAULT NULL COMMENT '所属公司code',
`company_name` varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL COMMENT '所属公司名称',
`dept_code` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL COMMENT '部门编码',
`dept_name` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL COMMENT '部门名称',
`abbreviation` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL COMMENT '别名,简称',
`parent_id` int DEFAULT '0' COMMENT '上级部门id',
`full_path_id` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL COMMENT '部门全路径id(1/2/3)',
`full_path_name` varchar(1024) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL COMMENT '部门全路径名称(中心/部门/组)',
`dept_large_attribute` int DEFAULT NULL COMMENT '部门属性(职能;业务)',
`dept_large_attribute_name` varchar(10) DEFAULT NULL COMMENT '部门属性(职能;业务)',
`dept_middle_attribute` int DEFAULT NULL COMMENT '部门属性中类(职能(中心、区、部、组);业务(一手、二手))',
`dept_middle_attribute_name` varchar(10) DEFAULT NULL COMMENT '部门属性中类(职能(中心、区、部、组);业务(一手、二手))',
`dept_small_attribute` int DEFAULT NULL COMMENT '部门属性小类',
`dept_small_attribute_name` varchar(10) DEFAULT NULL COMMENT '部门属性小类',
`level` int DEFAULT NULL COMMENT '部门层级',
`level_name` varchar(10) DEFAULT NULL COMMENT '层级名称',
`dept_status` int DEFAULT NULL COMMENT '部门状态1:预开;2:使用中;3:预关;4:关闭',
`dept_status_name` varchar(10) DEFAULT NULL COMMENT '部门状态名称',
`start_date` date DEFAULT NULL COMMENT '开始日期',
`end_date` date DEFAULT NULL COMMENT '结束日期',
`real_number` int DEFAULT NULL COMMENT '部门总人数',
`work_id` int DEFAULT NULL COMMENT '办公地id',
`description` varchar(255) DEFAULT NULL COMMENT '部门描述',
`remark` varchar(500) DEFAULT NULL COMMENT '备注',
`sort` int DEFAULT '0' COMMENT '排序字段',
`delete_flag` int DEFAULT '0' COMMENT '删除标识;删除标识(0:未删除;1:已删除)',
`create_user` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL COMMENT '创建人',
`create_user_name` varchar(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL COMMENT '创建人姓名',
`create_date` datetime(3) DEFAULT NULL COMMENT '创建时间',
`update_user` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL COMMENT '更新人',
`update_user_name` varchar(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL COMMENT '修改人姓名',
`update_date` datetime(3) DEFAULT NULL COMMENT '更新时间',
PRIMARY KEY (`id`) USING BTREE,
KEY `idx_parent_id` (`parent_id`) USING BTREE,
KEY `idx_dept_code` (`dept_code`)
) ENGINE=InnoDB AUTO_INCREMENT=28565 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci COMMENT='部门表';



lonelystar777 发表于 2024-8-5 17:23:16

回滚到8.0.32-25后,没有再发现类似问题

yejr 发表于 2024-8-5 17:56:19

lonelystar777 发表于 2024-8-5 17:23
回滚到8.0.32-25后,没有再发现类似问题

我的测试环境:
CentOS 8, x86_64, glibc 2.28,RPM包

用相同表结构测试,用 mysql_random_data_load 灌入1000条测试数据,未能重现。

不知道您是否方便提供完整的测试数据,可能和具体数据这个有关系。

yejr 发表于 2024-8-6 09:31:59

刚注意到,您可能是启用了PQ(InnoDB并行查询),请关闭这个开关应该就好了。

另外,可以参考我们提供的 my.cnf 模板:https://gitee.com/GreatSQL/GreatSQL-Doc/blob/master/docs/my.cnf-example-greatsql-8.0.32-26#L90 ,在这份模板里也是默认关闭PQ的


# Disalbe InnoDB PQ
loose-force_parallel_execute = OFF

yejr 发表于 2024-8-6 09:38:13

另外,如果您是从GreatSQL 8.0.32-25升级到8.0.32-26的话,最好是先设置 upgrade=FORCE,强制升级所有表,包括系统表。

lonelystar777 发表于 2024-8-6 12:56:51

yejr 发表于 2024-8-6 09:31
刚注意到,您可能是启用了PQ(InnoDB并行查询),请关闭这个开关应该就好了。

另外,可以参考我们提供的 m ...

测试后发现,如果是在单节点实例上执行,无论开启并行查询与否,这个SQL都不会造成崩溃。
但如果在MGR集群的节点上执行的话,只要开启并行查询,则必然会造成这个节点崩溃。

KAiTO 发表于 2024-8-6 13:43:14

lonelystar777 发表于 2024-8-6 12:56
测试后发现,如果是在单节点实例上执行,无论开启并行查询与否,这个SQL都不会造成崩溃。
但如果在MGR集 ...

关闭并行查询即可{:5_200:}

yejr 发表于 2024-8-6 14:23:08

lonelystar777 发表于 2024-8-6 12:56
测试后发现,如果是在单节点实例上执行,无论开启并行查询与否,这个SQL都不会造成崩溃。
但如果在MGR集 ...

并行查询bug较多,未来我们计划将其从代码中删除,请尽量不要再使用哈
页: [1]
查看完整版本: 【解决方式关闭PQ】升级到GreatSQL执行一个简单的SQL导致crash