ceph-diary-3-pg-state-unknown.md

Name

Environment

This post is based on ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable).

Situation

After recovering from a failed disk, some PGs in my Ceph cluster were left in state unknown. Running ceph health detail showed:

[root@storage02-ib ~]# ceph health detail
HEALTH_WARN 211801/3035471 objects misplaced (6.978%); Reduced data availability: 4 pgs inactive
OBJECT_MISPLACED 211801/3035471 objects misplaced (6.978%)
PG_AVAILABILITY Reduced data availability: 4 pgs inactive
    pg 20.16 is stuck inactive for 12365.100778, current state unknown, last acting []
    pg 20.29 is stuck inactive for 12365.100778, current state unknown, last acting []
    pg 20.2d is stuck inactive for 12365.100778, current state unknown, last acting []
    pg 20.37 is stuck inactive for 12365.100778, current state unknown, last acting []

So these 4 PGs are stuck.
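To avoid copying the stuck PG IDs by hand, they can be pulled out of the health output. This is only a minimal sketch: the function name stuck_inactive_pgs is made up for illustration, and it assumes the line layout shown in the output above.

```shell
# Print the ID of every PG that `ceph health detail` reports as stuck inactive.
# The command output is read on stdin, so saved output works as well.
stuck_inactive_pgs() {
    # Matching lines look like: "pg 20.16 is stuck inactive for ..."
    awk '$1 == "pg" && /is stuck inactive/ { print $2 }'
}

# Usage on a live cluster (not run here):
#   ceph health detail | stuck_inactive_pgs
```

For the output above this should list 20.16, 20.29, 20.2d and 20.37, one per line.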

Running pg query to look at one of these PGs in detail gave:

root@storage01-ib:~# ceph pg 20.16 query
Error ENOENT: i don't have pgid 20.16

The PG ID cannot be found.

Running pg dump_stuck unclean showed:

root@storage01-ib:~#  ceph pg dump_stuck unclean
ok
PG_STAT STATE                         UP     UP_PRIMARY ACTING ACTING_PRIMARY 
21.f9   active+remapped+backfill_wait  [7,4]          7  [4,9]              4 
21.f7     active+remapped+backfilling  [4,8]          4  [4,2]              4 
21.d9   active+remapped+backfill_wait [10,0]         10  [0,9]              0 
21.c9   active+remapped+backfill_wait  [8,4]          8  [4,7]              4 
21.b8     active+remapped+backfilling  [2,7]          2  [6,7]              6 
21.af   active+remapped+backfill_wait  [0,5]          0  [0,8]              0 
21.a3   active+remapped+backfill_wait  [4,8]          4  [4,9]              4 
21.9c   active+remapped+backfill_wait [9,10]          9 [5,10]              5 
21.cb   active+remapped+backfill_wait  [5,3]          5  [2,3]              2 
21.95     active+remapped+backfilling [10,4]         10 [10,7]             10 
21.f3   active+remapped+backfill_wait [2,10]          2  [2,5]              2 
21.3f   active+remapped+backfill_wait  [4,2]          4  [2,6]              2 
21.3a   active+remapped+backfill_wait  [3,6]          3  [6,9]              6 
20.37                         unknown     []         -1     []             -1 
21.b4   active+remapped+backfill_wait  [2,7]          2 [2,10]              2 
21.56   active+remapped+backfill_wait  [8,7]          8  [8,5]              8 
21.68     active+remapped+backfilling  [7,2]          7  [0,2]              0 
21.2e   active+remapped+backfill_wait [10,6]         10  [5,6]              5 
21.7f   active+remapped+backfill_wait  [9,8]          9  [9,7]              9 
21.e9   active+remapped+backfill_wait  [4,9]          4 [9,10]              9 
21.55   active+remapped+backfill_wait  [9,4]          9  [9,6]              9 
21.5e   active+remapped+backfill_wait  [0,6]          0  [0,9]              0 
21.87   active+remapped+backfill_wait  [7,8]          7  [7,4]              7 
21.76   active+remapped+backfill_wait [10,8]         10  [4,8]              4 
21.a1   active+remapped+backfill_wait  [2,5]          2  [5,6]              5 
21.43   active+remapped+backfill_wait  [3,0]          3  [0,5]              0 
21.82   active+remapped+backfill_wait  [7,3]          7  [7,9]              7 
20.2d                         unknown     []         -1     []             -1 
21.2d   active+remapped+backfill_wait  [4,8]          4  [4,2]              4 
21.1c   active+remapped+backfill_wait  [8,0]          8  [0,6]              0 
21.22   active+remapped+backfill_wait  [4,9]          4  [4,7]              4 
21.79   active+remapped+backfill_wait  [4,6]          4  [4,0]              4 
21.28   active+remapped+backfill_wait  [7,4]          7  [0,4]              0 
20.29                         unknown     []         -1     []             -1 
20.16                         unknown     []         -1     []             -1 
29.4    active+remapped+backfill_wait  [6,1]          6  [6,4]              6 
21.23   active+remapped+backfill_wait [10,0]         10  [0,3]              0 
21.63   active+remapped+backfill_wait  [9,0]          9  [0,2]              0 
29.5    active+remapped+backfill_wait  [1,9]          1  [0,9]              0 
21.6c   active+remapped+backfill_wait  [6,5]          6  [6,8]              6 
21.e    active+remapped+backfill_wait  [0,7]          0  [4,7]              4 
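The list above mixes two very different situations: PGs that are still backfilling, which is ordinary recovery, and PGs in state unknown with no acting OSD. A rough way to see the split is to count PGs per state. The helper name summarize_pg_states is hypothetical, and the sketch assumes the column layout shown above (PG ID first, state second).

```shell
# Count PGs per state from `ceph pg dump_stuck unclean` output read on stdin.
summarize_pg_states() {
    # Only rows whose first column looks like a PG ID (pool.hex) are counted,
    # which skips the "ok" line and the column header.
    awk '$1 ~ /^[0-9]+\.[0-9a-f]+$/ { count[$2]++ }
         END { for (s in count) print count[s], s }' | LC_ALL=C sort -k2
}

# Usage on a live cluster (not run here):
#   ceph pg dump_stuck unclean | summarize_pg_states
```

On the dump above, the unknown line should show a count of 4, and those are the only PGs that need manual attention.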

Diagnosis

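It looks like these PG IDs are gone for good. I have three OSD pools, named l1 (1 replica), l2 (2 replicas), and l3 (3 replicas).
My guess is that data written to the 1-replica pool was lost when the disk died.
Since that pool keeps only 1 replica, I never expected it to be reliable, and it only held half-finished downloads anyway, so the loss does not matter.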
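The guess can be checked: the part of a PG ID before the dot is the pool ID, and ceph osd pool ls detail prints each pool's ID, name and replica size. The sketch below assumes the usual "pool <id> '<name>' replicated size <n> ..." line layout; the function name pool_size_for_pg is made up, and which pool actually has ID 20 is not stated in this post.

```shell
# Print "<pool name> size=<replica count>" for the pool a PG belongs to.
# $1 is a PG ID such as 20.16; `ceph osd pool ls detail` output is read on stdin.
pool_size_for_pg() {
    # ${1%%.*} strips everything from the first dot, leaving the pool ID.
    awk -v id="${1%%.*}" '
        $1 == "pool" && $2 == id {
            for (i = 3; i < NF; i++)
                if ($i == "size") { print $3, "size=" $(i + 1); exit }
        }'
}

# Usage on a live cluster (not run here):
#   ceph osd pool ls detail | pool_size_for_pg 20.16
```

If the result reports size=1 for the pool behind 20.16, the unknown PGs do sit in the single-replica pool.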

Fix

Reading the official Ceph PG troubleshooting documentation turned up a solution:

POOL SIZE = 1
If you have the osd pool default size set to 1, you will only have one copy of the object. OSDs rely on other OSDs to tell them which objects they should have. If a first OSD has a copy of an object and there is no second copy, then no second OSD can tell the first OSD that it should have that copy. For each placement group mapped to the first OSD (see ceph pg dump), you can force the first OSD to notice the placement groups it needs by running:

ceph osd force-create-pg <pgid>

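In other words, OSDs holding multiple replicas can tell each other about their PGs, but with a single replica that information is simply lost. To get the PG back, we can force-create it.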

root@storage01-ib:~# ceph osd force-create-pg 20.37
Error EPERM: This command will recreate a lost (as in data lost) PG with data in it, such that the cluster will give up ever trying to recover the lost data.  Do this only if you are certain that all copies of the PG are in fact lost and you are willing to accept that the data is permanently destroyed.  Pass --yes-i-really-mean-it to proceed.

The create command warns that running it will permanently discard the PG's data, and that --yes-i-really-mean-it must be added to proceed.

root@storage01-ib:~# ceph osd force-create-pg 20.37 --yes-i-really-mean-it
pg 20.37 now creating, ok

The command succeeded.
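Only 20.37 is shown here, but the health output listed four unknown PGs. A small wrapper with a dry-run default keeps the destructive command from being run by accident. This is a sketch: the function name force_create_pgs and the APPLY variable are my own, and only the ceph osd force-create-pg command itself comes from the session above.

```shell
# Recreate each listed PG as an empty one. This DESTROYS whatever the PG held.
# By default the commands are only printed; set APPLY=1 to really run them.
force_create_pgs() {
    for pgid in "$@"; do
        if [ "${APPLY:-0}" = "1" ]; then
            ceph osd force-create-pg "$pgid" --yes-i-really-mean-it
        else
            echo "ceph osd force-create-pg $pgid --yes-i-really-mean-it"
        fi
    done
}

# Dry run for the remaining three PGs from the health output:
#   force_create_pgs 20.16 20.29 20.2d
# Real run, once the printed commands look right:
#   APPLY=1 force_create_pgs 20.16 20.29 20.2d
```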

Check the newly created PG:

root@storage01-ib:~# ceph pg 20.37 query
{
    "state": "active+clean",
    "snap_trimq": "[]",
    "snap_trimq_len": 0,
    "epoch": 814,
    "up": [
        6
    ],
    "acting": [
        6
    ],
    "acting_recovery_backfill": [
        "6"
    ],
    "info": {
        "pgid": "20.37",
        "last_update": "0'0",
        "last_complete": "0'0",
        "log_tail": "0'0",
        "last_user_version": 0,
        "last_backfill": "MAX",
        "last_backfill_bitwise": 0,
        "purged_snaps": [],
        "history": {
            "epoch_created": 795,
            "epoch_pool_created": 795,
            "last_epoch_started": 797,
            "last_interval_started": 795,
            "last_epoch_clean": 797,
            "last_interval_clean": 795,
            "last_epoch_split": 0,
            "last_epoch_marked_full": 0,
            "same_up_since": 795,
            "same_interval_since": 795,
            "same_primary_since": 795,
            "last_scrub": "0'0",
            "last_scrub_stamp": "2019-03-05 03:02:04.611341",
            "last_deep_scrub": "0'0",
            "last_deep_scrub_stamp": "2019-03-05 03:02:04.611341",
            "last_clean_scrub_stamp": "2019-03-05 03:02:04.611341"
        },
        "stats": {
            "version": "0'0",
            "reported_seq": "25",
            "reported_epoch": "814",
            "state": "active+clean",
            "last_fresh": "2019-03-05 03:07:59.418140",
            "last_change": "2019-03-05 03:02:06.260474",
            "last_active": "2019-03-05 03:07:59.418140",
            "last_peered": "2019-03-05 03:07:59.418140",
            "last_clean": "2019-03-05 03:07:59.418140",
            "last_became_active": "2019-03-05 03:02:06.260323",
            "last_became_peered": "2019-03-05 03:02:06.260323",
            "last_unstale": "2019-03-05 03:07:59.418140",
            "last_undegraded": "2019-03-05 03:07:59.418140",
            "last_fullsized": "2019-03-05 03:07:59.418140",
            "mapping_epoch": 795,
            "log_start": "0'0",
            "ondisk_log_start": "0'0",
            "created": 795,
            "last_epoch_clean": 797,
            "parent": "0.0",
            "parent_split_bits": 0,
            "last_scrub": "0'0",
            "last_scrub_stamp": "2019-03-05 03:02:04.611341",
            "last_deep_scrub": "0'0",
            "last_deep_scrub_stamp": "2019-03-05 03:02:04.611341",
            "last_clean_scrub_stamp": "2019-03-05 03:02:04.611341",
            "log_size": 0,
            "ondisk_log_size": 0,
            "stats_invalid": false,
            "dirty_stats_invalid": false,
            "omap_stats_invalid": false,
            "hitset_stats_invalid": false,
            "hitset_bytes_stats_invalid": false,
            "pin_stats_invalid": false,
            "manifest_stats_invalid": false,
            "snaptrimq_len": 0,
            "stat_sum": {
                "num_bytes": 0,
                "num_objects": 0,
                "num_object_clones": 0,
                "num_object_copies": 0,
                "num_objects_missing_on_primary": 0,
                "num_objects_missing": 0,
                "num_objects_degraded": 0,
                "num_objects_misplaced": 0,
                "num_objects_unfound": 0,
                "num_objects_dirty": 0,
                "num_whiteouts": 0,
                "num_read": 0,
                "num_read_kb": 0,
                "num_write": 0,
                "num_write_kb": 0,
                "num_scrub_errors": 0,
                "num_shallow_scrub_errors": 0,
                "num_deep_scrub_errors": 0,
                "num_objects_recovered": 0,
                "num_bytes_recovered": 0,
                "num_keys_recovered": 0,
                "num_objects_omap": 0,
                "num_objects_hit_set_archive": 0,
                "num_bytes_hit_set_archive": 0,
                "num_flush": 0,
                "num_flush_kb": 0,
                "num_evict": 0,
                "num_evict_kb": 0,
                "num_promote": 0,
                "num_flush_mode_high": 0,
                "num_flush_mode_low": 0,
                "num_evict_mode_some": 0,
                "num_evict_mode_full": 0,
                "num_objects_pinned": 0,
                "num_legacy_snapsets": 0,
                "num_large_omap_objects": 0,
                "num_objects_manifest": 0
            },
            "up": [
                6
            ],
            "acting": [
                6
            ],
            "blocked_by": [],
            "up_primary": 6,
            "acting_primary": 6,
            "purged_snaps": []
        },
        "empty": 1,
        "dne": 0,
        "incomplete": 0,
        "last_epoch_started": 797,
        "hit_set_history": {
            "current_last_update": "0'0",
            "history": []
        }
    },
    "peer_info": [],
    "recovery_state": [
        {
            "name": "Started/Primary/Active",
            "enter_time": "2019-03-05 03:02:06.252112",
            "might_have_unfound": [],
            "recovery_progress": {
                "backfill_targets": [],
                "waiting_on_backfill": [],
                "last_backfill_started": "MIN",
                "backfill_info": {
                    "begin": "MIN",
                    "end": "MIN",
                    "objects": []
                },
                "peer_backfill_info": [],
                "backfills_in_flight": [],
                "recovering": [],
                "pg_backend": {
                    "pull_from_peer": [],
                    "pushing": []
                }
            },
            "scrub": {
                "scrubber.epoch_start": "0",
                "scrubber.active": false,
                "scrubber.state": "INACTIVE",
                "scrubber.start": "MIN",
                "scrubber.end": "MIN",
                "scrubber.max_end": "MIN",
                "scrubber.subset_last_update": "0'0",
                "scrubber.deep": false,
                "scrubber.waiting_on_whom": []
            }
        },
        {
            "name": "Started",
            "enter_time": "2019-03-05 03:02:05.356513"
        }
    ],
    "agent_state": {}
}
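The two fields worth reading in that long output are the top-level "state" and "num_objects": active+clean with 0 objects means the PG is healthy again but was recreated empty. A rough text-based filter is sketched below, assuming the pretty-printed one-key-per-line layout shown above; the function name pg_state_and_objects is hypothetical, and a real JSON parser would be more robust.

```shell
# Print "<state> <num_objects>" from `ceph pg <pgid> query` output read on stdin.
pg_state_and_objects() {
    # Fields are split on ':' and ','; the first match of each key wins, so
    # the top-level "state" is used. "num_objects" is matched with its closing
    # quote, so keys such as "num_objects_missing" are skipped.
    awk -F'[:,]' '
        /"state"/ && state == ""   { gsub(/[ "]/, "", $2); state = $2 }
        /"num_objects"/ && n == "" { gsub(/ /, "", $2); n = $2 }
        END { print state, n }'
}

# Usage on a live cluster (not run here):
#   ceph pg 20.37 query | pg_state_and_objects
```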

Conclusion

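That completes the fix.
A later check showed that the 1-replica download folder had lost a few videos [smirk].
Only do this once you know exactly what the problem is. For important data, do not use 1 replica, and make a backup before trying anything.
If you hit pg state unknown on a pool with 2 or 3 replicas, brace yourself... the data is very likely gone.
For other kinds of stuck PGs, analyze the situation carefully before acting.
That's all.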

Reference