Starting the database cluster with crsctl start crs failed with the following errors:
[root@db02 ~]# crsctl start crs
CRS-41053: checking Oracle Grid Infrastructure for file permission issues
CRS-4124: Oracle High Availability Services startup failed.
CRS-4000: Command Start failed, or completed with errors.
A permission problem? But nothing had been changed.
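I asked an AI assistant to troubleshoot, and it blamed directory and file permissions, which sent me in the wrong direction at first. Comparing the permissions with node 1 showed no differences.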
Next, I checked alert.log:
2026-07-02 09:19:13.054 [OCSSD(6397)]CRS-1601: CSSD Reconfiguration complete. Active nodes are db02.
2026-07-02 09:19:13.067 [CRSD(11119)]CRS-5504: Node down event reported for node 'tyyh01'.
2026-07-02 09:19:13.074 [CRSD(11119)]CRS-2773: Server 'tyyh01' has been removed from pool 'Generic'.
2026-07-02 09:19:13.077 [CRSD(11119)]CRS-2773: Server 'tyyh01' has been removed from pool 'ora.cscsetyyh'.
2026-07-02 09:19:23.370 [GIPCD(5157)]CRS-7503: The Oracle Grid Infrastructure process 'gipcd' observed communication issues between node 'db02' and node 'tyyh01', interface list of local node 'db02' is '10.0.2.31:23313;', interface list of remote node 'tyyh01' is '10.0.2.30:19583;'.
2026-07-02 09:20:27.984 [CRSD(11119)]CRS-2757: Command 'Stop' timed out waiting for response from the resource 'ora.qosmserver'. Details at (:CRSPE00221:) {2:37109:12318} in /u01/app/grid/diag/crs/db02/crs/trace/crsd.trc.
2026-07-02 09:20:27.889 [SCRIPTAGENT(12454)]CRS-5818: Aborted command 'stop' for resource 'ora.qosmserver'. Details at (:CRSAGF00113:) {2:37109:12318} in /u01/app/grid/diag/crs/db02/crs/trace/crsd_scriptagent_grid.trc.
2026-07-02 09:20:31.847 [ORAROOTAGENT(11752)]CRS-5822: Agent '/u01/app/19.0/grid/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:1:12} in /u01/app/grid/diag/crs/db02/crs/trace/crsd_orarootagent_root.trc.
2026-07-02 09:20:34.754 [MDNSD(4280)]CRS-5602: mDNS service stopping by request.
2026-07-02 09:20:35.366 [MDNSD(4280)]CRS-8504: Oracle Clusterware MDNSD process with operating system process ID 4280 is exiting
2026-07-02 09:20:49.096 [OCTSSD(8486)]CRS-2405: The Cluster Time Synchronization Service on host db02 is shutdown by user
2026-07-02 09:20:49.097 [OCTSSD(8486)]CRS-8504: Oracle Clusterware OCTSSD process with operating system process ID 8486 is exiting
2026-07-02 09:20:50.112 [OCSSD(6397)]CRS-1603: CSSD on node db02 has been shut down.
2026-07-02 09:20:53.119 [GPNPD(4480)]CRS-2329: GPNPD on node db02 shut down.
2026-07-02 09:20:54.126 [OHASD(3333)]CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'db02' has completed
2026-07-02 09:20:54.140 [ORAROOTAGENT(4059)]CRS-5822: Agent '/u01/app/19.0/grid/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:1:9} in /u01/app/grid/diag/crs/db02/crs/trace/ohasd_orarootagent_root.trc.
2026-07-02 09:22:23.158 [CLSECHO(30305)]CRS-10131: Failure to create named pipe /var/tmp/.oracle/npohasd2. Details [mkfifo: cannot create fifo ‘/var/tmp/.oracle/npohasd2’: No space left on device].
2026-07-02 09:28:40.622 [CLSECHO(777)]CRS-10131: Failure to create named pipe /var/tmp/.oracle/npohasd2. Details [mkfifo: cannot create fifo ‘/var/tmp/.oracle/npohasd2’: No space left on device].
2026-07-02 10:05:17.139 [CLSECHO(16785)]CRS-10131: Failure to create named pipe /var/tmp/.oracle/npohasd2. Details [mkfifo: cannot create fifo ‘/var/tmp/.oracle/npohasd2’: No space left on device].
So the real cause was "No space left on device": from 09:22 onward, every start attempt failed to create the named pipe /var/tmp/.oracle/npohasd2 because the disk was full.
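To catch this faster next time, alert.log can be searched for these signatures before chasing permissions. This is a minimal sketch assuming bash; scan_alert_log is a hypothetical helper name (not an Oracle tool), and the alert.log path in the usage comment is assumed from the trace directory shown in the messages above.

```shell
#!/usr/bin/env bash
# Minimal sketch: search a Clusterware alert.log for the disk-full signatures
# seen above. scan_alert_log is a hypothetical helper, not an Oracle tool.
# Prints matching lines with their line numbers.
# Returns 0 if anything matched, 1 if nothing matched, 2 if unreadable.
scan_alert_log() {
    local log="$1"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 2
    fi
    # CRS-10131: failure to create a named pipe.
    # "No space left on device": the underlying ENOSPC error text.
    grep -nE 'CRS-10131|No space left on device' "$log"
}

# Example (path assumed from the trace directory in the messages above):
# scan_alert_log /u01/app/grid/diag/crs/db02/crs/trace/alert.log
```

A non-empty result points straight at disk space rather than file permissions.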
Checking disk usage:
[root@db02 ~]# df -h
Filesystem               Size  Used Avail Use% Mounted on
devtmpfs                  32G     0   32G   0% /dev
tmpfs                     32G     0   32G   0% /dev/shm
tmpfs                     32G  2.0G   30G   7% /run
tmpfs                     32G     0   32G   0% /sys/fs/cgroup
/dev/mapper/rhel-root     50G   50G   60K 100% /
/dev/vda1               1014M  171M  844M  17% /boot
/dev/mapper/vgora-lvora  200G   22G  179G  11% /u01
/dev/mapper/rhel-home     46G  8.1G   38G  18% /home
tmpfs                    6.3G     0  6.3G   0% /run/user/1101
tmpfs                    6.3G     0  6.3G   0% /run/user/0
The root filesystem was 100% full, with only 60K left. In /opt/oracle.ahf/orachk there was a 35 GB orachk.zip file, and since /opt is not a separate mount, it was all counted against /.
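The two checks above (which filesystem is full, and what is filling it) can be scripted. This is a minimal sketch assuming bash, GNU find (for -printf) and POSIX "df -P" output; full_filesystems and largest_files are hypothetical helper names. "df -P" is used instead of "df -h" because it keeps every filesystem on a single line with fixed columns, which makes it safe to parse.

```shell
#!/usr/bin/env bash
# Print "<use%> <mount point>" for every filesystem at or above a threshold.
# Reads POSIX "df -P" output on stdin, so a saved snapshot can be fed in too.
# Usage: df -P | full_filesystems 90
full_filesystems() {
    local threshold="${1:-90}"
    awk -v t="$threshold" 'NR > 1 {
        use = $5; sub(/%/, "", use)      # "100%" -> "100"
        if (use + 0 >= t) print use "%", $6
    }'
}

# List the largest regular files under a directory, biggest first, without
# crossing into other filesystems (-xdev). Sizes are allocated KiB (%k).
# Usage: largest_files / 10
largest_files() {
    local dir="${1:-/}" count="${2:-10}"
    find "$dir" -xdev -type f -printf '%k\t%p\n' 2>/dev/null \
        | sort -rn | head -n "$count"
}
```

On db02, "df -P | full_filesystems 90" would have flagged /, and "largest_files / 10" should list orachk.zip near the top. Walking all of / with find can take a while on a large filesystem.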
After removing it, I started CRS again:
[root@db02 orachk]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
Even so, CRS on node 2 still would not come up.
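Since CRS still did not start after the cleanup, one quick check is whether /var/tmp/.oracle, where the alert.log showed mkfifo failing, can accept a new named pipe again. This is a minimal sketch assuming bash, run as a user allowed to write there (root); can_create_fifo is a hypothetical helper, and the probe file name is arbitrary and removed immediately. ENOSPC can come from running out of either blocks or inodes, so "df -i /" is also worth a look.

```shell
#!/usr/bin/env bash
# Check whether a directory can still hold a new named pipe, which is the
# operation that failed above (mkfifo returned ENOSPC in /var/tmp/.oracle).
# Creates a throwaway FIFO with an arbitrary name and removes it right away.
# Usage: can_create_fifo /var/tmp/.oracle
can_create_fifo() {
    local dir="$1"
    local probe="$dir/.fifo_probe.$$"
    if mkfifo "$probe" 2>/dev/null; then
        rm -f "$probe"
        echo "ok: $dir can hold a new named pipe"
        return 0
    fi
    echo "FAIL: cannot create a named pipe in $dir" >&2
    return 1
}
```

If this succeeds but CRS still refuses to start, the problem is likely state left over from the failed attempts rather than free space.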
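After rebooting the host, CRS came up normally. My guess is that some temporary files under /tmp (or the named pipes under /var/tmp/.oracle that the alert.log showed failing) had not been created successfully, and the reboot released everything so that it recovered on its own.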
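That leaves one more question: why wasn't /tmp given its own filesystem? When the root filesystem is this small, problems like this happen easily.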
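To see on a given host whether /tmp and /var/tmp (which holds /var/tmp/.oracle) share the root filesystem, df can report the mount point each path lives on. This is a minimal sketch assuming bash and POSIX "df -P" output; mount_point_of and shares_root_fs are hypothetical helper names, and every path passed in must exist.

```shell
#!/usr/bin/env bash
# Report which mount point a path actually lives on.
mount_point_of() {
    # -P keeps each filesystem on one line; field 6 is the mount point.
    df -P "$1" | awk 'NR == 2 { print $6 }'
}

# Warn for each path that sits on the same filesystem as /.
# Usage: shares_root_fs /tmp /var/tmp /var/tmp/.oracle
shares_root_fs() {
    local root_mp p
    root_mp=$(mount_point_of /)
    for p in "$@"; do
        if [ "$(mount_point_of "$p")" = "$root_mp" ]; then
            echo "WARN: $p is on the root filesystem ($root_mp)"
        fi
    done
}
```

On db02, where df -h showed no separate /tmp or /var mount, all three paths in the usage example would be reported.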