[ext4] 12 Allocation Mechanism: Key Data Structures

Date: 2023-06-02 16:16:41

Several key data structures are involved in the ext4 block allocation mechanism.

A block request is described by ext4_allocation_request. Whether an allocation is actually performed is then decided from the result of the block lookup and the needs of the caller.

During allocation, some state has to be recorded so that the allocation can be carried out well. The structure ext4_allocation_context describes this allocation behavior.

While searching for free space the allocator may use preallocated space, so a descriptor for the size and other attributes of a preallocation is also needed: ext4_prealloc_space.

Each of these key structures is analyzed in detail below.

1. Block request descriptor: ext4_allocation_request

The attributes of a block allocation request are described by the request descriptor ext4_allocation_request:

    struct ext4_allocation_request {
        /* target inode for block we're allocating */
        struct inode *inode;
        /* how many blocks we want to allocate */
        unsigned int len;
        /* logical block in target inode */
        ext4_lblk_t logical;
        /* the closest logical allocated block to the left */
        ext4_lblk_t lleft;
        /* the closest logical allocated block to the right */
        ext4_lblk_t lright;
        /* phys. target (a hint) */
        ext4_fsblk_t goal;
        /* phys. block for the closest logical allocated block to the left */
        ext4_fsblk_t pleft;
        /* phys. block for the closest logical allocated block to the right */
        ext4_fsblk_t pright;
        /* flags. see above EXT4_MB_HINT_* */
        unsigned int flags;
    };

This request descriptor is initialized in ext4_ext_map_blocks() (note: ext4_ext_map_blocks() looks up or allocates the requested blocks and maps them to the buffer).

Of these members, goal is the one worth a closer look. goal holds a physical block number, and its implied meaning matters: although it is only a physical block number, a good choice of goal goes a long way toward preserving the file's locality and the contiguity of its physical addresses.

goal is computed by ext4_ext_find_goal():

    static ext4_fsblk_t ext4_ext_find_goal(struct inode *inode,
                                           struct ext4_ext_path *path,
                                           ext4_lblk_t block)
    {
        if (path) {
            int depth = path->p_depth;
            struct ext4_extent *ex;

            /*
             * Try to predict block placement assuming that we are
             * filling in a file which will eventually be
             * non-sparse --- i.e., in the case of libbfd writing
             * an ELF object sections out-of-order but in a way
             * the eventually results in a contiguous object or
             * executable file, or some database extending a table
             * space file.  However, this is actually somewhat
             * non-ideal if we are writing a sparse file such as
             * qemu or KVM writing a raw image file that is going
             * to stay fairly sparse, since it will end up
             * fragmenting the file system's free space.  Maybe we
             * should have some hueristics or some way to allow
             * userspace to pass a hint to file system,
             * especially if the latter case turns out to be
             * common.
             */
            ex = path[depth].p_ext;
            if (ex) {
                ext4_fsblk_t ext_pblk = ext4_ext_pblock(ex);
                ext4_lblk_t ext_block = le32_to_cpu(ex->ee_block);

                if (block > ext_block)
                    return ext_pblk + (block - ext_block);
                else
                    return ext_pblk - (ext_block - block);
            }

            /* it looks like index is empty;
             * try to find starting block from index itself */
            if (path[depth].p_bh)
                return path[depth].p_bh->b_blocknr;
        }

        /* OK. use inode's group */
        return ext4_inode_to_goal_block(inode);
    }

Reading this code carefully: if a path from the root of the extent tree to the requested logical block exists, the target physical block is computed from that path.

(1) If the path ends at a data extent, the path runs from the root all the way to a leaf. Let ext_block be the starting logical block of that leaf extent and pblk its physical block. When the requested block is greater than ext_block, the logical distance is (block - ext_block); to keep the physical addresses as contiguous as possible, the allocator only needs to return the free physical block closest to pblk + (block - ext_block). When the requested block is less than ext_block, it looks for the free physical block closest to pblk - (ext_block - block). The goal is therefore set to pblk + (block - ext_block) in the first case and pblk - (ext_block - block) in the second.

(2) If the path exists but has no leaf extent, the answer is simple: goal is set to the physical block number of the extent tree block at the end of the path, path[depth].p_bh->b_blocknr.

(3) The remaining case is that no path is given. In my opinion this corresponds to an inode that has just been created. It is handled by a dedicated function, ext4_inode_to_goal_block():

    ext4_fsblk_t ext4_inode_to_goal_block(struct inode *inode)
    {
        struct ext4_inode_info *ei = EXT4_I(inode);
        ext4_group_t block_group;
        ext4_grpblk_t colour;
        int flex_size = ext4_flex_bg_size(EXT4_SB(inode->i_sb));
        ext4_fsblk_t bg_start;
        ext4_fsblk_t last_block;

        block_group = ei->i_block_group;
        if (flex_size >= EXT4_FLEX_SIZE_DIR_ALLOC_SCHEME) {
            /*
             * If there are at least EXT4_FLEX_SIZE_DIR_ALLOC_SCHEME
             * block groups per flexgroup, reserve the first block
             * group for directories and special files.  Regular
             * files will start at the second block group.  This
             * tends to speed up directory access and improves
             * fsck times.
             */
            block_group &= ~(flex_size - 1);
            if (S_ISREG(inode->i_mode))
                block_group++;
        }
        bg_start = ext4_group_first_block_no(inode->i_sb, block_group);
        last_block = ext4_blocks_count(EXT4_SB(inode->i_sb)->s_es) - 1;

        /*
         * If we are doing delayed allocation, we don't need take
         * colour into account.
         */
        if (test_opt(inode->i_sb, DELALLOC))
            return bg_start;

        if (bg_start + EXT4_BLOCKS_PER_GROUP(inode->i_sb) <= last_block)
            colour = (current->pid % 16) *
                     (EXT4_BLOCKS_PER_GROUP(inode->i_sb) / 16);
        else
            colour = (current->pid % 16) * ((last_block - bg_start) / 16);
        return bg_start + colour;
    }

The idea is this: if a flex group contains at least EXT4_FLEX_SIZE_DIR_ALLOC_SCHEME block groups, then for a regular file the starting physical block bg_start is the first block of the second block group in the inode's flex group.

If every file in that flex group used bg_start as its goal, they would obviously contend with one another. That is what colour is for: it adds an offset derived from the pid of the current process (one of 16 slots within the block group), which reduces the likely contention. With delayed allocation enabled, colour is skipped and bg_start is returned directly.

So in this last case, goal ends up at a pid-dependent position inside the chosen block group of the inode's flex group.

[Note: only when flex_size is not less than EXT4_FLEX_SIZE_DIR_ALLOC_SCHEME is the first group of the flex group set aside for directories and some special files, with regular files allocated starting from the second group. This tends to speed up directory access and improve fsck times.]

2. Allocation context descriptor: ext4_allocation_context

During allocation, some state has to be recorded so that the allocation can be carried out well. The structure that describes this allocation behavior is ext4_allocation_context:

    struct ext4_allocation_context {
        struct inode *ac_inode;
        struct super_block *ac_sb;

        /* original request */
        struct ext4_free_extent ac_o_ex;

        /* goal request (normalized ac_o_ex) */
        struct ext4_free_extent ac_g_ex;

        /* the best found extent */
        struct ext4_free_extent ac_b_ex;

        /* copy of the best found extent taken before preallocation efforts */
        struct ext4_free_extent ac_f_ex;

        __u16 ac_groups_scanned;
        __u16 ac_found;
        __u16 ac_tail;
        __u16 ac_buddy;
        __u16 ac_flags;     /* allocation hints */
        __u8 ac_status;
        __u8 ac_criteria;
        __u8 ac_2order;     /* if request is to allocate 2^N blocks and
                             * N > 0, the field stores N, otherwise 0 */
        __u8 ac_op;         /* operation, for history only */
        struct page *ac_bitmap_page;
        struct page *ac_buddy_page;
        struct ext4_prealloc_space *ac_pa;
        struct ext4_locality_group *ac_lg;
    };

This structure describes the attributes of the allocation context. It is initialized from an ext4_allocation_request by ext4_mb_initialize_context().

The main job of ext4_mb_initialize_context() is to fill ac->ac_o_ex from the request descriptor: the requested logical block fe_logical, the group that contains goal, and the cluster number of goal (for now, think of it as a physical block number). It then copies ac_o_ex into ac_g_ex.

ext4_mb_normalize_request() then normalizes the ext4_allocation_context:

1. Compute the file size: size is the larger of i_size_read(ac->ac_inode) and (offset + requested length), where offset is derived from the requested logical block.

2. Estimate the probable file size using a fixed table:

    #define NRL_CHECK_SIZE(req, size, max, chunk_size) \
            (req <= (size) || max <= (chunk_size))

        /* first, try to predict filesize */
        /* XXX: should this table be tunable? */
        start_off = 0;
        if (size <= 16 * 1024) {
            size = 16 * 1024;
        } else if (size <= 32 * 1024) {
            size = 32 * 1024;
        } else if (size <= 64 * 1024) {
            size = 64 * 1024;
        } else if (size <= 128 * 1024) {
            size = 128 * 1024;
        } else if (size <= 256 * 1024) {
            size = 256 * 1024;
        } else if (size <= 512 * 1024) {
            size = 512 * 1024;
        } else if (size <= 1024 * 1024) {
            size = 1024 * 1024;
        } else if (NRL_CHECK_SIZE(size, 4 * 1024 * 1024, max, 2 * 1024)) {
            start_off = ((loff_t)ac->ac_o_ex.fe_logical >>
                         (21 - bsbits)) << 21;
            size = 2 * 1024 * 1024;
        } else if (NRL_CHECK_SIZE(size, 8 * 1024 * 1024, max, 4 * 1024)) {
            start_off = ((loff_t)ac->ac_o_ex.fe_logical >>
                         (22 - bsbits)) << 22;
            size = 4 * 1024 * 1024;
        } else if (NRL_CHECK_SIZE(ac->ac_o_ex.fe_len,
                                  (8 << 20) >> bsbits, max, 8 * 1024)) {
            start_off = ((loff_t)ac->ac_o_ex.fe_logical >>
                         (23 - bsbits)) << 23;
            size = 8 * 1024 * 1024;
        } else {
            start_off = (loff_t)ac->ac_o_ex.fe_logical << bsbits;
            size = ac->ac_o_ex.fe_len << bsbits;
        }
        size = size >> bsbits;
        start = start_off >> bsbits;

As the code shows, the normalized size is generally larger than the original request, while start is either 0 (for files up to 1 MiB), the requested offset rounded down to a 2 MiB, 4 MiB or 8 MiB boundary, or the requested offset itself in the final branch.

3. Check whether the range overlaps existing prealloc space. (If the requested block already lies inside a preallocation, that is a BUG.)

4. Update ac_g_ex from the size and start obtained in step 2:

        ac->ac_g_ex.fe_logical = start;
        ac->ac_g_ex.fe_len = EXT4_NUM_B2C(sbi, size);

So the main effect of ext4_mb_normalize_request() is to update the ac->ac_g_ex member.

ac->ac_b_ex is initialized in ext4_mb_regular_allocator(). It represents the best extent that can be allocated; in other words, this is the extent the allocation will actually use.

ac->ac_f_ex is a copy of ac_b_ex kept from before the prealloc space is initialized; it is defined in ext4_mb_new_inode_pa() or ext4_mb_new_group_pa().

3. Preallocation descriptor: ext4_prealloc_space

The descriptor for the size and other attributes of a preallocated space is ext4_prealloc_space:

    struct ext4_prealloc_space {
        struct list_head pa_inode_list;
        struct list_head pa_group_list;
        union {
            struct list_head pa_tmp_list;
            struct rcu_head pa_rcu;
        } u;
        spinlock_t pa_lock;
        atomic_t pa_count;
        unsigned pa_deleted;
        ext4_fsblk_t pa_pstart;     /* phys. block */
        ext4_lblk_t pa_lstart;      /* log. block */
        ext4_grpblk_t pa_len;       /* len of preallocated chunk */
        ext4_grpblk_t pa_free;      /* how many blocks are free */
        unsigned short pa_type;     /* pa type. inode or group */
        spinlock_t *pa_obj_lock;
        struct inode *pa_inode;     /* hack, for history only */
    };

Four of its members are especially important:

pa_lstart -> starting logical address of the prealloc space (relative to the file);
pa_pstart -> starting physical address of the prealloc space;
pa_len -> length of the prealloc space;
pa_free -> remaining free length of the prealloc space.

This structure is initialized in ext4_mb_new_inode_pa() or ext4_mb_new_group_pa().

That is all for these structures for now.

Author: Younger Liu

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
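To make the goal selection from section 1 concrete, here is a minimal userspace sketch of cases (1) and (3). It is not kernel code: toy_extent, toy_find_goal and toy_colour_goal are hypothetical names, the typedefs only stand in for ext4_fsblk_t and ext4_lblk_t, and the flex group and delayed-allocation handling is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t fsblk_t;   /* stand-in for ext4_fsblk_t (physical block) */
typedef uint32_t lblk_t;    /* stand-in for ext4_lblk_t (logical block)   */

/* Hypothetical, simplified leaf extent: logical ee_block maps to ee_start. */
struct toy_extent {
    lblk_t  ee_block;       /* first logical block covered by the extent */
    fsblk_t ee_start;       /* physical block backing ee_block           */
};

/*
 * Case (1): project the requested logical block onto the physical
 * address space using the nearest extent, so that the new block would
 * be contiguous with the existing mapping. With no extent available,
 * return the caller-supplied fallback (the per-inode goal of case (3)).
 */
static fsblk_t toy_find_goal(const struct toy_extent *ex, lblk_t block,
                             fsblk_t fallback)
{
    if (ex != NULL) {
        if (block > ex->ee_block)
            return ex->ee_start + (block - ex->ee_block);
        return ex->ee_start - (ex->ee_block - block);
    }
    return fallback;
}

/*
 * Case (3) without delayed allocation: spread different processes over
 * 16 slots of the block group, as the quoted kernel code does with
 * current->pid. The pid is passed in explicitly here.
 */
static fsblk_t toy_colour_goal(fsblk_t bg_start, fsblk_t blocks_per_group,
                               unsigned int pid)
{
    return bg_start + (pid % 16) * (blocks_per_group / 16);
}
```

For example, with an extent mapping logical block 100 to physical block 5000, a request for logical block 110 yields goal 5010 and a request for logical block 90 yields goal 4990. The goal is only a hint: the allocator still has to find a free block at or near it.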

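The size-prediction table in step 2 of ext4_mb_normalize_request() can also be tried out in userspace. The sketch below follows the quoted table under some assumptions: toy_goal and toy_normalize are hypothetical names, max is passed in as a plain parameter, one cluster is assumed to equal one block (no bigalloc), and the later trimming against existing preallocations is not modeled. The chain of comparisons up to 1 MiB is written as a loop that rounds up to the next power-of-two size starting at 16 KiB, which selects the same values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: normalized goal extent, in blocks. */
struct toy_goal {
    uint64_t start;   /* first logical block of the normalized request */
    uint64_t len;     /* length of the normalized request in blocks    */
};

/*
 * size:    predicted file size in bytes (max of i_size and request end)
 * logical: requested logical block (ac_o_ex.fe_logical in the article)
 * len:     requested length in blocks (ac_o_ex.fe_len)
 * bsbits:  log2 of the block size, e.g. 12 for 4 KiB blocks
 * max:     the "max" operand of NRL_CHECK_SIZE, taken as given
 */
static struct toy_goal toy_normalize(uint64_t size, uint32_t logical,
                                     uint32_t len, unsigned int bsbits,
                                     uint64_t max)
{
    uint64_t off = logical;
    uint64_t start_off = 0;

    if (size <= 1024 * 1024) {
        /* Round up to 16K, 32K, ..., 1M; start stays at 0. */
        uint64_t rounded = 16 * 1024;
        while (rounded < size)
            rounded <<= 1;
        size = rounded;
    } else if (size <= 4 * 1024 * 1024 || max <= 2 * 1024) {
        start_off = (off >> (21 - bsbits)) << 21;   /* align to 2 MiB */
        size = 2 * 1024 * 1024;
    } else if (size <= 8 * 1024 * 1024 || max <= 4 * 1024) {
        start_off = (off >> (22 - bsbits)) << 22;   /* align to 4 MiB */
        size = 4 * 1024 * 1024;
    } else if (len <= ((8u << 20) >> bsbits) || max <= 8 * 1024) {
        start_off = (off >> (23 - bsbits)) << 23;   /* align to 8 MiB */
        size = 8 * 1024 * 1024;
    } else {
        start_off = off << bsbits;                  /* keep the request */
        size = (uint64_t)len << bsbits;
    }

    return (struct toy_goal){ start_off >> bsbits, size >> bsbits };
}
```

With 4 KiB blocks (bsbits = 12), a 3 MiB file with a request at logical block 700 is normalized to the 2 MiB window starting at block 512 and spanning 512 blocks, so the requested block still falls inside the normalized range while start has moved down, not up.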