CSM, TET, and CS

2025-12-14 12:57:34 +08:00 · 2025-12-07 22:30:46 +08:00 · 2025-12-07 22:19:50 +08:00 · 2025-12-07 20:08:19 +08:00 · 2025-12-07 17:55:25 +08:00 · 2025-12-07 16:01:42 +08:00
20 changed files with 1870 additions and 182 deletions
--- a/.idea/Screen.iml
+++ b/.idea/Screen.iml
@@ -2,7 +2,7 @@
 <module type="PYTHON_MODULE" version="4">
  <component name="NewModuleRootManager">
    <content url="file://$MODULE_DIR$" />
-    <orderEntry type="jdk" jdkName="C:\Users\PC20230606\Miniconda3" jdkType="Python SDK" />
+    <orderEntry type="jdk" jdkName="Python 3.11" jdkType="Python SDK" />
    <orderEntry type="sourceFolder" forTests="false" />
  </component>
 </module>
--- a/main.sh
+++ b/main.sh
@@ -1,30 +1,64 @@
 #!/bin/bash
-#修改权限
-chmod -R u+w ../Screen
-#启用screen环境
-conda activate screen
+
+# =====================
+# Automated project screening script
+# =====================
+
+# 1. Initial setup
+# Make the directory one level up writable
+chmod -R u+w ../screen
+
+# Activate the screen environment (Python 3.11)
+source "$(conda info --base)/etc/profile.d/conda.sh"
+conda activate ~/anaconda3/envs/screen
+
 # 设置当前目录为 PYTHONPATH
 cd py/
 export PYTHONPATH=$(pwd):$PYTHONPATH
-#调用预筛选
+
+echo "============ Stage 1: Pre-process & Basic Screening ============"
+# Run pre-screening (processes input_pre into input)
 python pre_process.py
-#调用第一步筛选
+
+# Run step 1 screening
+# Reads the CIF files, runs basic checks, classifies by anion, and creates one folder per material (Anion/ID/ID.cif)
 python step1.py

-#为第一步筛出的所有材料制作脚本
+# Generate Zeo++ analysis scripts for every material
+# Writes analyze.sh in each folder, with output redirected to log.txt
 python make_sh.py
-#调整环境
-conda deactivate
-conda activate zeo
-#运行脚本
-cd ../data/after_step1
-source sh_all.sh
-rm sh_all.sh
-#调整conda回到screen
-conda deactivate
-conda activate screen

-#启用不同的python文件做分析
+# 2. Switch environments and run Zeo++
+echo "============ Stage 2: Zeo++ Calculations ============"
+conda deactivate
+conda activate ~/anaconda3/envs/zeo
+
+# Enter the data directory and run all generated shell scripts
+cd ../data/after_step1
+if [ -f "sh_all.sh" ]; then
+    source sh_all.sh
+    rm sh_all.sh
+else
+    echo "Error: sh_all.sh not found!"
+fi
+
+# 3. Post-processing and screening (Step 2-4)
+echo "============ Stage 3: Data Extraction & Advanced Screening ============"
+# Switch back to the screen environment
+conda deactivate
+conda activate ~/anaconda3/envs/screen
 cd ../../py
-python step2-5-file_process.py
-#python step6.py
+
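+# Extract data from the logs
+# Walks every log.txt, extracts pore diameter, distance, and related parameters, and writes summary CSVs to ../output
+python extract_data.py
+
+# Combined screening (Step 2, 3, 4)
+# Reads the CSVs, filters by threshold, and symlinks passing materials into ../data/after_screening
+python step2_4_combined.py
+
+# Step 5 (supercell expansion and practical checks)
+# Note: this step has not yet been adapted to the new symlink layout; to be handled later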
+# python step5.py
+
+echo "Done! Check results in ../data/after_screening"
--- a/main_property.sh
+++ b/main_property.sh
@@ -0,0 +1,85 @@
+#!/bin/bash
+
+# ==========================================
+# Full-pipeline automation script (direct-pass screening version)
+# ==========================================
+
+# 1. Environment initialization
+echo "============ Stage 0: Initialization ============"
+chmod -R u+w ../Screen
+source $(conda info --base)/etc/profile.d/conda.sh
+
+# Activate the screen environment
+conda activate ~/anaconda3/envs/screen
+cd py/
+export PYTHONPATH=$(pwd):$PYTHONPATH
+
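+# 2. Pre-processing and file organization (replaces the original Step 1)
+echo "============ Stage 1: File Organization (Direct Pass) ============"
+# Run pre-processing (optional; ensures the input folder is ready)
+python pre_process.py
+
+# Run the direct-pass organization script
+# Reads input, identifies anions, and copies each structure to after_step1/Anion/ID/ID.cif
+# Skips check_basic and other time-consuming checks
+python step1_direct.py
+
+# Generate the Zeo++ run scripts
+# Walks after_step1 and generates analyze.sh and sh_all.sh
+python make_sh.py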
+
+# 3. Run the Zeo++ calculations
+echo "============ Stage 2: Zeo++ Calculations ============"
+conda deactivate
+conda activate ~/anaconda3/envs/zeo
+
+# Enter the data directory
+cd ../data/after_step1
+if [ -f "sh_all.sh" ]; then
+    # Run all calculations
+    source sh_all.sh
+    # Clean up the master script (optional)
+    # rm sh_all.sh
+else
+    echo "Error: sh_all.sh not found! Please check Stage 1."
+    exit 1
+fi
+
+# 4. Data extraction and advanced analysis
+echo "============ Stage 3: Data Extraction & Analysis ============"
+# Switch back to the screen environment
+conda deactivate
+conda activate ~/anaconda3/envs/screen
+cd ../../py
+
+# 3.1 Extract the basic Zeo++ data
+# Output: ../output/Anion/Anion.csv (with Perc, Min_d, Max_node)
+python extract_data.py
+
+# 3.2 Corner-sharing analysis
+# Output: updates the CSV, adding an Is_Only_Corner_Sharing column
+echo "Running Corner Sharing Analysis..."
+python analyze_cs.py
+
+# 3.3 Combined screening
+# Reads the CSV, filters by threshold, and creates symlinks/files in ../data/after_screening
+python step2_4_combined.py
+
+# 3.4 CSM analysis (screened materials only)
+# Output: ../output/CSM/Anion/ID.dat
+echo "Running CSM Analysis..."
+python analyze_csm.py
+
+# 3.5 Tetrahedral occupancy statistics
+# Reads the .dat files and updates the CSV, adding a Tet_Li_Ratio column
+echo "Updating Tetrahedral Li Ratio..."
+python update_tet_occupancy.py
+
+# 5. Done
+echo "========================================================"
+echo "All tasks completed!"
+echo "Results stored in:"
+echo "  - CSV Data:    ../output/"
+echo "  - Screened:    ../data/after_screening/"
+echo "  - CSM Details: ../output/CSM/"
+echo "========================================================"
--- a/py/CSM_reconstruct.py
+++ b/py/CSM_reconstruct.py
@@ -0,0 +1,224 @@
+import os
+import sys
+import numpy as np
+import argparse
+from tqdm import tqdm
+from scipy.spatial import ConvexHull
+from pymatgen.core import Structure
+from pymatgen.core.periodic_table import Element
+from pymatgen.analysis.chemenv.coordination_environments.coordination_geometry_finder import LocalGeometryFinder
+
+# ================= Configuration =================
+# An absolute path is recommended so the folder can always be found
+INPUT_DIR = "../../solidstate-tools/corner-sharing/data/1209/input"  # make sure your .cif files are here
+OUTPUT_DIR = "../output/CSM"
+TARGET_ELEMENT = 'Li'
+ENV_TYPE = 'both'
+
+
+# ===========================================
+
+class HiddenPrints:
+    '''Suppress pymatgen's verbose stdout output.'''
+
+    def __enter__(self):
+        self._original_stdout = sys.stdout
+        sys.stdout = open(os.devnull, 'w')
+
+    def __exit__(self, exc_type, exc_val, exc_tb):
+        sys.stdout.close()
+        sys.stdout = self._original_stdout
+
+
+def non_elements(struct):
+    """
+    [Key fix] Keep the halogens (F, Cl, Br, I) and other anions so that chloride structures are not emptied.
+    """
+    # The set below includes F, Cl, Br, I, P, Se, Te, and others
+    anions_to_keep = {"O", "S", "N", "F", "Cl", "Br", "I", "P", "Se", "Te", "As", "Sb", "C"}
+    stripped = struct.copy()
+    species_to_remove = [el.symbol for el in stripped.composition.elements
+                         if el.symbol not in anions_to_keep]
+    if species_to_remove:
+        stripped.remove_species(species_to_remove)
+    return stripped
+
+
+def site_env(coord, struct, sp="Li", envtype='both'):
+    stripped = non_elements(struct)
+
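+    # If nothing is left after stripping (e.g. pure lithium metal), return early
+    if len(stripped) == 0:
+        return {'csm': np.nan, 'vol': np.nan, 'type': 'Error_NoAnions'}
+
+    with_li = stripped.copy()
+    # Insert a probe atom of species sp (Li by default)
+    with_li.append(sp, coord, coords_are_cartesian=False, validate_proximity=False)
+
+    # Try to sort (the code below assumes the probe ends up at index 0); if sorting fails because of partial occupancy, keep the original order
+    try:
+        with_li = with_li.get_sorted_structure()
+    except Exception:
+        pass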
+
+    tet_oct_competition = []
+
+    # ---------------- Tetrahedral (Tet) detection ----------------
+    if envtype == 'both' or envtype == 'tet':
+        for dist in np.linspace(1, 4, 601):  # scan cutoff radii from 1 Å to 4 Å
+            neigh = with_li.get_neighbors(with_li.sites[0], dist)
+            if len(neigh) < 4:
+                continue
+            elif len(neigh) > 4:
+                break
+
+            neigh_coords = [i.coords for i in neigh]
+            try:
+                with HiddenPrints():
+                    lgf = LocalGeometryFinder(only_symbols=["T:4"])
+                    lgf.setup_structure(structure=with_li)
+                    lgf.setup_local_geometry(isite=0, coords=neigh_coords)
+
+                site_volume = ConvexHull(neigh_coords).volume
+                # Get the CSM
+                csm_val = lgf.get_coordination_symmetry_measures()['T:4']['csm']
+
+                tet_env = {'csm': csm_val, 'vol': site_volume, 'type': 'tet'}
+                tet_oct_competition.append(tet_env)
+            except Exception:
+                pass
+            if len(neigh) == 4: break
+
+    # ---------------- Octahedral (Oct) detection ----------------
+    if envtype == 'both' or envtype == 'oct':
+        for dist in np.linspace(1, 4, 601):
+            neigh = with_li.get_neighbors(with_li.sites[0], dist)
+            if len(neigh) < 6:
+                continue
+            elif len(neigh) > 6:
+                break
+
+            neigh_coords = [i.coords for i in neigh]
+            try:
+                with HiddenPrints():
+                    lgf = LocalGeometryFinder(only_symbols=["O:6"], permutations_safe_override=False)
+                    lgf.setup_structure(structure=with_li)
+                    lgf.setup_local_geometry(isite=0, coords=neigh_coords)
+
+                site_volume = ConvexHull(neigh_coords).volume
+                csm_val = lgf.get_coordination_symmetry_measures()['O:6']['csm']
+
+                oct_env = {'csm': csm_val, 'vol': site_volume, 'type': 'oct'}
+                tet_oct_competition.append(oct_env)
+            except Exception:
+                pass
+            if len(neigh) == 6: break
+
+    # ---------------- Final decision ----------------
+    if len(tet_oct_competition) == 0:
+        return {'csm': np.nan, 'vol': np.nan, 'type': 'Non_' + envtype}
+    elif len(tet_oct_competition) == 1:
+        return tet_oct_competition[0]
+    elif len(tet_oct_competition) >= 2:
+        return min(tet_oct_competition, key=lambda x: x['csm'])
+
+
+def extract_sites(struct, sp="Li", envtype='both'):
+    envlist = []
+    # Iterate over all sites looking for the target species
+    for i, site in enumerate(struct):
+        site_elements = [el.symbol for el in site.species.elements]
+        if sp in site_elements:
+            try:
+                # Pass a copy so the original structure is not modified
+                singleenv = site_env(site.frac_coords, struct.copy(), sp, envtype)
+                envlist.append({
+                    'site_index': i,
+                    'frac_coords': site.frac_coords,
+                    'type': singleenv.get('type', 'unknown'),
+                    'csm': singleenv.get('csm', np.nan),
+                    'volume': singleenv.get('vol', np.nan)
+                })
+            except Exception as e:
+                # Catch per-site calculation errors without aborting the run
+                # print(f"  [Warn] Site {i} calculation failed: {e}")
+                pass
+    return envlist
+
+
+def export_envs(envlist, sp, envtype, fname):
+    with open(fname, 'w') as f:
+        f.write('List of environment information\n')
+        f.write(f'Species : {sp}\n')
+        f.write(f'Envtype : {envtype}\n')
+        for item in envlist:
+            # Formatted output that stays readable even when values are missing
+            f.write(f"Site index {item['site_index']}: {item}\n")
+
+
+# ================= Main program =================
+def run_csm_analysis():
+    # 1. Check the input directory
+    if not os.path.exists(INPUT_DIR):
+        print(f"Error: input directory does not exist -> {os.path.abspath(INPUT_DIR)}")
+        return
+
+    cif_files = []
+    for root, dirs, files in os.walk(INPUT_DIR):
+        for file in files:
+            if file.endswith(".cif"):
+                cif_files.append(os.path.join(root, file))
+
+    if not cif_files:
+        print(f"No .cif files found in {INPUT_DIR}.")
+        return
+
+    print(f"Analyzing {len(cif_files)} files (target element: {TARGET_ELEMENT}, anions included: F,Cl,Br,I,O,S,N...)")
+
+    success_count = 0
+
+    for cif_path in tqdm(cif_files, desc="Calculating CSM"):
+        try:
+            # Prepare paths
+            rel_path = os.path.relpath(cif_path, INPUT_DIR)
+            rel_dir = os.path.dirname(rel_path)
+            file_base = os.path.splitext(os.path.basename(cif_path))[0]
+
+            target_dir = os.path.join(OUTPUT_DIR, rel_dir)
+            if not os.path.exists(target_dir):
+                os.makedirs(target_dir)
+
+            target_dat_path = os.path.join(target_dir, f"{file_base}.dat")
+
+            # Optionally skip files that already exist and are non-empty
+            # if os.path.exists(target_dat_path) and os.path.getsize(target_dat_path) > 0:
+            #     continue
+
+            # Read the structure
+            struct = Structure.from_file(cif_path)
+
+            # Check that the structure contains the target element
+            if Element(TARGET_ELEMENT) not in struct.composition.elements:
+                continue
+
+            # Compute the environments
+            env_list = extract_sites(struct, sp=TARGET_ELEMENT, envtype=ENV_TYPE)
+
+            # Write results (a marker file is written even when env_list is empty, to help debugging)
+            if env_list:
+                export_envs(env_list, sp=TARGET_ELEMENT, envtype=ENV_TYPE, fname=target_dat_path)
+                success_count += 1
+            else:
+                with open(target_dat_path, 'w') as f:
+                    f.write(f"No {TARGET_ELEMENT} environments found (Check connectivity or anion types).")
+
+        except Exception as e:
+            print(f"\n[Error] File: {os.path.basename(cif_path)} -> {e}")
+            continue
+
+    print(f"\nAnalysis complete. Generated {success_count} files.")
+    print(f"Output directory: {os.path.abspath(OUTPUT_DIR)}")
+
+
+if __name__ == "__main__":
+    run_csm_analysis()
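The neighbor-shell scan in site_env can be illustrated without pymatgen. The sketch below is a simplified stand-in, not the code from this commit: it takes a plain list of probe-to-anion distances (hypothetical input), steps the cutoff over the same 1 Å to 4 Å grid, and accepts a shell only when the neighbor count hits the target exactly. The helper names radius_grid, first_shell, and pick_environment are illustrative assumptions.

```python
def radius_grid(start=1.0, stop=4.0, num=601):
    """Evenly spaced cutoff radii, mirroring np.linspace(1, 4, 601)."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def first_shell(distances, target_cn):
    """Return (radius, count) at the first cutoff that encloses exactly
    target_cn neighbors, or None if the count never equals target_cn."""
    for radius in radius_grid():
        count = sum(1 for d in distances if d <= radius)
        if count < target_cn:
            continue       # shell not complete yet: widen the cutoff
        if count > target_cn:
            return None    # count jumped past the target: no clean shell
        return radius, count
    return None            # never reached target_cn within 4 Å


def pick_environment(candidates):
    """Choose the candidate with the lowest CSM, as the final step of site_env does."""
    if not candidates:
        return None
    return min(candidates, key=lambda env: env["csm"])
```

With four anions at 2.0 Å and two more at 3.5 Å, first_shell finds a 4-neighbor shell near 2.0 Å and a 6-neighbor shell near 3.5 Å; six equidistant anions give no 4-neighbor shell because the count jumps from 0 to 6.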
--- a/py/CS_catulate.py
+++ b/py/CS_catulate.py
@@ -0,0 +1,118 @@
+import os
+import pandas as pd
+from pymatgen.core import Structure
+# Make sure the utils folder is under py/ and contains CS_analyse.py
+from utils.CS_analyse import CS_catulate, check_only_corner_sharing
+from tqdm import tqdm
+
+# Path configuration
+CSV_ROOT_DIR = "../output"
+DATA_SOURCE_DIR = "../data/after_step1"
+
+
+def get_cif_path(group_name, anion_name, material_id):
+    """
+    Build the absolute path to a CIF file from the CSV's group/anion hierarchy.
+    """
+    # Path layout: ../data/after_step1/Group/Anion/ID/ID.cif
+    # Handle the single-anion case (Group == Anion)
+    if group_name == anion_name:
+        # Path: ../data/after_step1/S/123/123.cif
+        rel_path = os.path.join(DATA_SOURCE_DIR, group_name, material_id, f"{material_id}.cif")
+    else:
+        # Path: ../data/after_step1/S+O/S/123/123.cif
+        rel_path = os.path.join(DATA_SOURCE_DIR, group_name, anion_name, material_id, f"{material_id}.cif")
+
+    return os.path.abspath(rel_path)
+
+
+def process_single_csv(csv_path, group_name, anion_name):
+    """
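+    Process one CSV file: read -> compute corner sharing -> add column -> save.
+    """
+    print(f"Processing CSV: {csv_path}")
+
+    # Read the CSV, forcing the ID column to string
+    try:
+        df = pd.read_csv(csv_path, dtype={'Filename': str})
+    except Exception as e:
+        print(f"Failed to read CSV: {e}")
+        return
+
+    # Check whether the column already exists; it could be dropped or skipped, but here it is overwritten
+    if 'Is_Only_Corner_Sharing' in df.columns:
+        print("  - Column 'Is_Only_Corner_Sharing' already exists; it will be overwritten.")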
+
+    results = []
+
+    # Show progress with tqdm
+    for index, row in tqdm(df.iterrows(), total=df.shape[0], desc=f"Analyzing {anion_name}"):
+        material_id = str(row['Filename']).removesuffix('.0')  # strip only a trailing float-style ".0"
+        cif_path = get_cif_path(group_name, anion_name, material_id)
+
+        cs_result = None  # default value
+
+        if os.path.exists(cif_path):
+            try:
+                # 1. Load the structure
+                struct = Structure.from_file(cif_path)
+
+                # 2. Compute sharing relations (Li against common anions by default)
+                # Adjust the anion list as needed, or use anion_name dynamically
+                target_anions = ['O', 'S', 'Cl', 'F', 'Br', 'I', 'N', 'P']
+                sharing_details = CS_catulate(struct, sp='Li', anion=target_anions)
+
+                # 3. Check for corner sharing only (returns 1 or 0, or True/False)
+                # Judging from the provided screenshot, it appears to return 0 or 1
+                is_only_corner = check_only_corner_sharing(sharing_details)
+
+                cs_result = is_only_corner
+
+            except Exception as e:
+                # print(f"Calculation failed for {material_id}: {e}")
+                cs_result = "Error"
+        else:
+            print(f"  - Warning: CIF file not found: {cif_path}")
+            cs_result = "File_Not_Found"
+
+        results.append(cs_result)
+
+    # Add the results as a new column
+    df['Is_Only_Corner_Sharing'] = results
+
+    # Save, overwriting the original file
+    df.to_csv(csv_path, index=False)
+    print(f"  - Updated CSV: {csv_path}")
+
+
+def run_cs_analysis():
+    """
+    Walk all CSV files and run the analysis on each.
+    """
+    if not os.path.exists(CSV_ROOT_DIR):
+        print(f"CSV root directory does not exist: {CSV_ROOT_DIR}")
+        return
+
+    for root, dirs, files in os.walk(CSV_ROOT_DIR):
+        for file in files:
+            if file.endswith(".csv"):
+                csv_path = os.path.join(root, file)
+
+                # Parse Group and Anion (used to locate the CIF)
+                rel_root = os.path.relpath(root, CSV_ROOT_DIR)
+                path_parts = rel_root.split(os.sep)
+
+                if len(path_parts) == 1:
+                    group_name = path_parts[0]
+                    anion_name = path_parts[0]
+                elif len(path_parts) >= 2:
+                    group_name = path_parts[0]
+                    anion_name = path_parts[1]
+                else:
+                    continue
+
+                process_single_csv(csv_path, group_name, anion_name)
+
+
+if __name__ == "__main__":
+    run_cs_analysis()
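The mapping from a CSV's folder to its CIF location is the part of this script most worth checking in isolation. The sketch below restates it with PurePosixPath so the results are platform-independent; split_group_and_anion and cif_relative_path are hypothetical helper names, not functions from the commit. One deliberate difference: a CSV sitting directly in the root folder (relative path ".") yields None here, whereas the committed code would treat "." as both group and anion.

```python
from pathlib import PurePosixPath


def split_group_and_anion(rel_root):
    """Map a CSV folder, relative to the CSV root, to (group, anion)."""
    parts = PurePosixPath(rel_root).parts
    if not parts:
        return None                   # CSV directly in the root: nothing to resolve
    if len(parts) == 1:
        return parts[0], parts[0]     # single anion, e.g. "S"
    return parts[0], parts[1]         # mixed anion, e.g. "S+O/S"


def cif_relative_path(group, anion, material_id):
    """Relative CIF path under after_step1: Group[/Anion]/ID/ID.cif."""
    base = PurePosixPath(group) if group == anion else PurePosixPath(group, anion)
    return str(base / material_id / f"{material_id}.cif")
```

For example, the folder "S+O/S" resolves to the group "S+O" and anion "S", and material 123 then maps to "S+O/S/123/123.cif".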
--- a/py/csm.py
+++ b/py/csm.py
@@ -0,0 +1,90 @@
+import os
+from pymatgen.core import Structure
+from pymatgen.core.periodic_table import Element
+# Import the CSM calculation helpers from the utils package
+try:
+    from utils.analyze_env_st import extract_sites, export_envs
+except ImportError:
+    print("Error: cannot find the utils.analyze_env_st module; check the utils folder.")
+    raise SystemExit(1)
+
+from tqdm import tqdm
+
+# ================= Configuration =================
+# Input directory: use the screened directory so only qualifying materials are computed
+INPUT_DIR = "../../solidstate-tools/corner-sharing/data/1209/input"
+# Output directory
+OUTPUT_DIR = "../output/CSM"
+# Analysis parameters
+TARGET_ELEMENT = 'Na'
+ENV_TYPE = 'both'  # one of 'tet', 'oct', 'both'
+# ===========================================
+
+def run_csm_analysis():
+    """
+    Walk the after_screening folder, compute CSM, and write .dat files to output/CSM.
+    """
+    if not os.path.exists(INPUT_DIR):
+        print(f"Input directory does not exist: {INPUT_DIR}; run the screening step first.")
+        return
+
+    # Collect every CIF file to process
+    cif_files = []
+    for root, dirs, files in os.walk(INPUT_DIR):
+        for file in files:
+            if file.endswith(".cif"):
+                # Store the full path
+                cif_files.append(os.path.join(root, file))
+
+    print(f"Starting CSM analysis: found {len(cif_files)} screened materials...")
+
+    for cif_path in tqdm(cif_files, desc="Calculating CSM"):
+        try:
+            # 1. Determine the output path, preserving the directory structure
+            # Relative path (e.g. S/195819.cif or S+O/S/195819.cif)
+            rel_path = os.path.relpath(cif_path, INPUT_DIR)
+            # Containing folder (e.g. S or S+O/S)
+            rel_dir = os.path.dirname(rel_path)
+            # File name without extension (e.g. 195819)
+            file_base = os.path.splitext(os.path.basename(cif_path))[0]
+
+            # Target folder: ../output/CSM/S/
+            target_dir = os.path.join(OUTPUT_DIR, rel_dir)
+            if not os.path.exists(target_dir):
+                os.makedirs(target_dir)
+
+            # Target file path: ../output/CSM/S/195819.dat
+            target_dat_path = os.path.join(target_dir, f"{file_base}.dat")
+
+            # 2. Optionally skip files that already exist (the default here is to overwrite)
+            # if os.path.exists(target_dat_path):
+            #     continue
+
+            # 3. Read the structure
+            struct = Structure.from_file(cif_path)
+
+            # Check that the structure contains the target element (TARGET_ELEMENT)
+            if Element(TARGET_ELEMENT) not in struct.composition.elements:
+                # print(f"Skipping {file_base}: No {TARGET_ELEMENT}")
+                continue
+
+            # 4. Compute CSM (using the functions from utils)
+            # extract_sites returns the list of environments
+            env_list = extract_sites(struct, sp=TARGET_ELEMENT, envtype=ENV_TYPE)
+
+            # 5. Export the results (using the functions from utils)
+            # export_envs writes the results to a .dat file
+            if env_list:
+                export_envs(env_list, sp=TARGET_ELEMENT, envtype=ENV_TYPE, fname=target_dat_path)
+            else:
+                # If no environments were extracted (e.g. no coordination environment), write a placeholder file
+                with open(target_dat_path, 'w') as f:
+                    f.write("No environments found.")
+
+        except Exception as e:
+            print(f"Error while processing {cif_path}: {e}")
+
+    print(f"CSM analysis complete; results saved to {OUTPUT_DIR}")
+
+if __name__ == "__main__":
+    run_csm_analysis()
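The output-path step above mirrors the input tree under the output folder and swaps the extension. A minimal sketch of that mapping, using PurePosixPath so it behaves the same on any platform, could look like this; dat_path_for is a hypothetical helper name and the example paths are made up.

```python
from pathlib import PurePosixPath


def dat_path_for(cif_path, input_dir, output_dir):
    """Mirror a CIF's location relative to input_dir under output_dir, as a .dat file."""
    rel = PurePosixPath(cif_path).relative_to(input_dir)   # e.g. S+O/S/195819.cif
    return str(PurePosixPath(output_dir) / rel.with_suffix(".dat"))
```

Note that any intermediate folder in the input tree is carried over: an input laid out as Anion/ID/ID.cif produces Anion/ID/ID.dat under the output folder.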
--- a/py/extract_data.py
+++ b/py/extract_data.py
@@ -0,0 +1,154 @@
+import os
+import re
+import pandas as pd
+
+
+def extract_parameters_from_log(log_path):
+    """
+    Extract three key parameters from log.txt.
+    Returns None for any parameter that is not found.
+    """
+    if not os.path.exists(log_path):
+        return None, None, None
+
+    with open(log_path, 'r', encoding='utf-8') as f:
+        content = f.read()
+
+    # Regular expression definitions
+    # 1. Percolation diameter (formerly Step 2)
+    # Matches: #     Percolation diameter (A): 1.06
+    re_percolation = r"Percolation diameter \(A\):\s*([\d\.]+)"
+
+    # 2. Minimum of d (formerly Step 3)
+    # Matches: the minium of d \n 3.862140561244235 (the pattern keeps the log's "minium" spelling)
+    re_min_d = r"the minium of d\s*\n\s*([\d\.]+)"
+
+    # 3. Maximum node length (formerly Step 4)
+    # Matches: #     Maximum node length detected: 1.332 A
+    re_max_node = r"Maximum node length detected:\s*([\d\.]+)\s*A"
+
+    # Extract the data
+    match_perc = re.search(re_percolation, content)
+    match_d = re.search(re_min_d, content)
+    match_node = re.search(re_max_node, content)
+
+    # Take the captured value, or None when there is no match
+    val_perc = match_perc.group(1) if match_perc else None
+    val_d = match_d.group(1) if match_d else None
+    val_node = match_node.group(1) if match_node else None
+
+    return val_perc, val_d, val_node
+
+
+def process_folder_recursively(base_input_folder, base_output_folder):
+    """
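+    Walk the folder tree, extract data, and generate CSV files.
+    Logic:
+    1. Iterate over the first-level subfolders of base_input_folder (usually anion classes such as O, S, O+S).
+    2. For a single anion (e.g. O), process the material folders directly beneath it.
+    3. For mixed anions (e.g. O+S), descend one more level (e.g. O+S/O and O+S/S) and process each separately.
+    4. Results are saved under base_output_folder with the same directory structure.
+    """
+
+    # Get all top-level directories under after_step1 (e.g. O, S, Cl, S+O ...)
+    if not os.path.exists(base_input_folder):
+        print(f"Input directory {base_input_folder} does not exist")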
+        return
+
+    top_dirs = [d for d in os.listdir(base_input_folder) if os.path.isdir(os.path.join(base_input_folder, d))]
+
+    for top_dir in top_dirs:
+        top_path = os.path.join(base_input_folder, top_dir)
+
+        # Check whether this is a mixed-anion directory (name contains +)
+        if "+" in top_dir:
+            # Mixed-anion case, e.g. S+O
+            # Iterate over its subdirectories: S+O/S and S+O/O
+            sub_anions = [d for d in os.listdir(top_path) if os.path.isdir(os.path.join(top_path, d))]
+            for sub_anion in sub_anions:
+                # Input path: ../data/after_step1/S+O/S
+                current_process_path = os.path.join(top_path, sub_anion)
+                # Output CSV path: ../output/S+O/S/S.csv (following the O+S/O/O.csv layout rather than S+O_S.csv)
+                # Output directory: ../output/S+O/S
+                output_dir = os.path.join(base_output_folder, top_dir, sub_anion)
+                csv_filename = f"{sub_anion}.csv"
+
+                extract_and_save(current_process_path, output_dir, csv_filename)
+        else:
+            # Single-anion case, e.g. O
+            # Path: ../data/after_step1/O
+            current_process_path = top_path
+            # Output directory: ../output/O
+            output_dir = os.path.join(base_output_folder, top_dir)
+            csv_filename = f"{top_dir}.csv"
+
+            extract_and_save(current_process_path, output_dir, csv_filename)
+
+
+def extract_and_save(input_dir, output_dir, csv_name):
+    """
+    Perform the actual extraction and saving.
+    input_dir: directory containing the material folders (e.g. .../O/)
+    output_dir: directory where the CSV is saved
+    csv_name: CSV file name
+    """
+    data_list = []
+
+    # input_dir should contain one folder per material, e.g. 141, 142 ...
+    if not os.path.exists(input_dir):
+        return
+
+    # Iterate over all material folders
+    material_folders = [f for f in os.listdir(input_dir) if os.path.isdir(os.path.join(input_dir, f))]
+
+    print(f"Processing directory: {input_dir}, found {len(material_folders)} material folders")
+
+    for material_id in material_folders:
+        material_path = os.path.join(input_dir, material_id)
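+        # Under the new step1 layout the log file is named log.txt
+        log_path = os.path.join(material_path, "log.txt")
+
+        # Extract the data
+        perc, min_d, max_node = extract_parameters_from_log(log_path)
+
+        # Record the row if any value is present (this could be tightened to require all three)
+        # Keeping partial rows makes errors easier to trace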
+        if perc or min_d or max_node:
+            data_list.append({
+                "Filename": material_id,
+                "Percolation Diameter (A)": perc,
+                "Minimum of d": min_d,
+                "Maximum Node Length (A)": max_node
+            })
+        else:
+            # If log.txt is missing or nothing could be extracted, record empty values
+            data_list.append({
+                "Filename": material_id,
+                "Percolation Diameter (A)": None,
+                "Minimum of d": None,
+                "Maximum Node Length (A)": None
+            })
+
+    # Save to CSV if there is any data
+    if data_list:
+        if not os.path.exists(output_dir):
+            os.makedirs(output_dir)
+
+        csv_path = os.path.join(output_dir, csv_name)
+        df = pd.DataFrame(data_list)
+        # Fix the column order
+        df = df[["Filename", "Percolation Diameter (A)", "Minimum of d", "Maximum Node Length (A)"]]
+
+        df.to_csv(csv_path, index=False)
+        print(f"Data saved to: {csv_path}")
+    else:
+        print(f"No valid data extracted from directory {input_dir}")
+
+
+if __name__ == "__main__":
+    # Base input path (data after step1 processing)
+    input_base = "../data/after_step1"
+    # Base output path (the output folder)
+    output_base = "../output"
+
+    process_folder_recursively(input_base, output_base)
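To see what the three patterns in extract_parameters_from_log capture, the following standalone sketch applies the same regular expressions to an in-memory string. The sample log text is made up from the example lines quoted in the comments above, and parse_log_text is a hypothetical helper; unlike the committed function, it converts the captures to float.

```python
import re

# Same patterns as in extract_data.py, keyed by a short label.
PATTERNS = {
    "perc": r"Percolation diameter \(A\):\s*([\d\.]+)",
    "min_d": r"the minium of d\s*\n\s*([\d\.]+)",
    "max_node": r"Maximum node length detected:\s*([\d\.]+)\s*A",
}

# Hypothetical log excerpt assembled from the examples in the comments.
SAMPLE_LOG = (
    "#     Percolation diameter (A): 1.06\n"
    "the minium of d\n"
    "3.862140561244235\n"
    "#     Maximum node length detected: 1.332 A\n"
)


def parse_log_text(content):
    """Return a dict of floats; a parameter is None when its pattern does not match."""
    values = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, content)
        values[name] = float(match.group(1)) if match else None
    return values
```

Calling parse_log_text on an empty or truncated log returns None for the missing parameters, which corresponds to the empty-valued rows written to the CSV.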
--- a/py/make_sh.py
+++ b/py/make_sh.py
@@ -1,63 +1,13 @@
 import os


-def creat_sh(input_folder, anion, sh_file_path='analyze.sh'):
-    """
-    创建shell脚本，只处理两类CIF文件：
-    1. 纯数字命名的CIF文件 (例如: 123.cif)
-    2. 数字-坐标格式的CIF文件 (例如: 123-x1y2z3.cif)
-
-    参数:
-    input_folder: 输入文件夹路径
-    anion: 阴离子类型
-    sh_file_path: 生成的shell脚本路径
-    """
-    # 文件夹路径
-    folder_path = input_folder
-
-    import re
-
-    # 定义两种文件名模式的正则表达式
-    pattern1 = re.compile(r'^\d+\.cif$')  # 纯数字.cif
-    pattern2 = re.compile(r'^\d+-x\d+y\d+z\d+\.cif$')  # 数字-x数字y数字z数字.cif
-
-    # 打开SH脚本文件用于写入
-    with open(sh_file_path, 'w') as sh_file:
-        # 写入脚本头部
-        sh_file.write('#!/bin/bash\n')
-
-        # 遍历文件夹中的所有文件
-        for filename in os.listdir(folder_path):
-            file_path = os.path.join(folder_path, filename)
-
-            # 只处理文件(不处理文件夹)
-            if os.path.isfile(file_path):
-                # 检查文件名是否匹配两种模式之一
-                if pattern1.match(filename) or pattern2.match(filename):
-                    # 生成对应的命令
-                    command = f"python ../../../tool/analyze_voronoi_nodes.py {filename} -i ../../../tool/{anion}.yaml > {filename}.txt\n"
-                    # 将命令写入SH脚本文件
-                    sh_file.write(command)
-
-    print(f"SH脚本已生成：{sh_file_path}")
-
-
-import os
-
-
 def create_sh_recursive(base_folder, tool_path="tool", relative_depth=2):
    """
-    递归遍历文件夹，为每个包含.cif文件的文件夹生成analyze.sh脚本，
+    递归遍历文件夹,为每个包含.cif文件的文件夹生成analyze.sh脚本,
    并在基础文件夹下创建一个sh_all.sh来执行所有脚本。
-
-    参数:
-        base_folder: 起始文件夹路径
-        tool_path: 工具目录的基本路径
-        relative_depth: 基础相对深度，用于计算正确的相对路径
    """
    # 用于收集所有生成的analyze.sh脚本的相对路径
    analyze_sh_paths = []
-    base_folder_name = os.path.basename(base_folder)

    def process_folder(folder_path, current_depth=0):
        print(f"处理文件夹: {folder_path}")
@@ -66,17 +16,31 @@ def create_sh_recursive(base_folder, tool_path="tool", relative_depth=2):
        folder_name = os.path.basename(folder_path)

        # 检查当前文件夹是否包含.cif文件
-        has_cif_files = any(
-            f.endswith('.cif') for f in os.listdir(folder_path) if os.path.isfile(os.path.join(folder_path, f)))
+        # Note: only .cif files directly in the current folder are considered
+        cif_files = [f for f in os.listdir(folder_path) if
+                     f.endswith('.cif') and os.path.isfile(os.path.join(folder_path, f))]
+        has_cif_files = len(cif_files) > 0

-        # 如果当前文件夹包含.cif文件，生成脚本
+        # If the current folder contains .cif files, generate the script
        if has_cif_files:
-            # 计算正确的工具路径（根据深度增加../）
+            # Compute the tool path (one ../ per level of depth)
            dots = "../" * (relative_depth + current_depth)
            tool_relative_path = f"{dots}{tool_path}"

-            # 确定anion参数（使用文件夹名）
-            anion = folder_name
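+            # --- Change start: fix the anion detection logic ---
+            # With the layout Anion/MaterialID/file.cif, folder_name is the MaterialID here
+            # The parent directory name is needed as the anion (e.g. 'O', 'S', 'Cl') to locate the matching .yaml file
+            # Simple heuristic: a folder with a mostly numeric (ID-style) name that contains .cif files is assumed to sit under an anion folder
+            # More directly: in the new layout, a folder containing .cif files is always a leaf, and its parent is always the anion
+            parent_dir_name = os.path.basename(os.path.dirname(folder_path))
+
+            # Safeguard: if the .cif files sit directly in a first-level anion folder (e.g. O), keep that folder name; otherwise use the parent name
+            # In the new layout the CIF is always at '.../O/141/141.cif', so anion should be parent_dir_name ('O')
+            if parent_dir_name in ['O', 'S', 'Cl', 'Br'] or folder_name not in ['O', 'S', 'Cl', 'Br']:
+                anion = parent_dir_name
+            else:
+                anion = folder_name
+            # --- Change end ---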

            # 生成脚本文件路径
            sh_file_path = os.path.join(folder_path, "analyze.sh")
@@ -84,18 +48,17 @@ def create_sh_recursive(base_folder, tool_path="tool", relative_depth=2):
            # 创建脚本
            with open(sh_file_path, 'w') as sh_file:
                sh_file.write('#!/bin/bash\n')
-                for filename in os.listdir(folder_path):
-                    file_path = os.path.join(folder_path, filename)
-                    if os.path.isfile(file_path) and filename.endswith('.cif'):
-                        command = f"python {tool_relative_path}/analyze_voronoi_nodes.py {filename} -i {tool_relative_path}/{anion}.yaml > {filename}.txt\n"
-                        sh_file.write(command)
+                for filename in cif_files:
+                    # --- Change start: redirect output to log.txt ---
+                    command = f"python {tool_relative_path}/analyze_voronoi_nodes.py {filename} -i {tool_relative_path}/{anion}.yaml > log.txt\n"
+                    # --- Change end ---
+                    sh_file.write(command)

            # 将此脚本添加到收集器中
-            # 计算相对于基础文件夹的路径
            rel_path = os.path.relpath(folder_path, base_folder)
            analyze_sh_paths.append(rel_path)

-            print(f"生成脚本: {sh_file_path} (工具路径: {tool_relative_path})")
+            print(f"Generated script: {sh_file_path} (tool path: {tool_relative_path}, Anion: {anion})")

        # 获取子文件夹列表
        subdirs = [d for d in os.listdir(folder_path) if os.path.isdir(os.path.join(folder_path, d))]
@@ -105,7 +68,7 @@ def create_sh_recursive(base_folder, tool_path="tool", relative_depth=2):
            elements = folder_name.split("+")
            for element in elements:
                element_dir = os.path.join(folder_path, element)
-                # 如果对应元素的子文件夹不存在，创建它
+                # Create the element subfolder if it does not exist
                if not os.path.exists(element_dir):
                    os.makedirs(element_dir)
                    print(f"创建子文件夹: {element_dir}")
@@ -144,13 +107,8 @@ def create_sh_recursive(base_folder, tool_path="tool", relative_depth=2):
    # 修改权限使脚本可执行
    os.chmod(sh_all_path, 0o755)
    print(f"生成总执行脚本: {sh_all_path}")
-    print("所有脚本生成完成！")
-# 示例调用
-# create_sh_recursive("../data/after_step1")
+    print("All scripts generated!")
+

 if __name__ == '__main__':
-    # creat_sh("../data/after_step1/O","O","../data/after_step1/O/analyze.sh")
-    # creat_sh("../data/after_step1/S","S","../data/after_step1/S/analyze.sh")
-    # creat_sh("../data/after_step1/Cl","Cl","../data/after_step1/Cl/analyze.sh")
-    # creat_sh("../data/after_step1/Br","Br","../data/after_step1/Br/analyze.sh")
    create_sh_recursive("../data/after_step1")
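The anion rule and the relative tool path are the two pieces of create_sh_recursive that decide which .yaml file each analyze.sh points at. The sketch below isolates them as pure functions for illustration; resolve_anion and tool_relative_path are hypothetical names, and the defaults simply repeat the values visible in the diff (tool_path="tool", relative_depth=2).

```python
import os

KNOWN_ANIONS = ("O", "S", "Cl", "Br")


def resolve_anion(folder_path):
    """Pick the anion name used to select the .yaml file for a folder holding .cif files."""
    folder_name = os.path.basename(folder_path)
    parent_name = os.path.basename(os.path.dirname(folder_path))
    # Leaf layout Anion/ID/ID.cif: the parent folder is the anion.
    if parent_name in KNOWN_ANIONS or folder_name not in KNOWN_ANIONS:
        return parent_name
    # .cif files directly inside a first-level anion folder: keep the folder name.
    return folder_name


def tool_relative_path(current_depth, tool_path="tool", relative_depth=2):
    """Prefix tool_path with one '../' per directory level."""
    return "../" * (relative_depth + current_depth) + tool_path
```

For a folder such as ../data/after_step1/S+O/S/141 this resolves to "S", so the generated command would reference S.yaml.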
--- a/py/step1.py
+++ b/py/step1.py
@@ -1,54 +1,113 @@
 from pymatgen.core import Structure
-from pymatgen.core.periodic_table import Element, Specie
-from pymatgen.io.cif import CifWriter
-
 from crystal_2 import crystal
-import crystal_2
 import os
 import shutil
-from pymatgen.io.cif import CifWriter
+
+
+def get_anion_type(structure):
+    """
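+    Determine the anion type.
+    Only O, S, Cl, Br and their combinations are recognized.
+    Other non-metal elements (such as P, N, F) are ignored.
+    """
+    # Keep only these four target anions
+    valid_anions = {'O', 'S', 'Cl', 'Br'}
+
+    # Collect all element symbols in the structure
+    elements = set([e.symbol for e in structure.composition.elements])
+
+    # Intersect to find the target anions present in this structure
+    found_anions = elements.intersection(valid_anions)
+
+    if not found_anions:
+        return "Unknown"
+
+    # With several anions, sort alphabetically and join with '+'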
+    sorted_anions = sorted(list(found_anions))
+    return "+".join(sorted_anions)
+

 def read_files_check_basic(folder_path):
-    file_contents = []
+    """
+    Read the CIF files, run the basic checks (check_basic),
+    then classify passing files by the custom anion rules and organize them under after_step1.
+    """
+    # Base output path
+    output_base = "../data/after_step1"

    if not os.path.exists(folder_path):
         print(f"Folder {folder_path} does not exist")
-        return file_contents
+        return

-    for filename in os.listdir(folder_path):
+    # Make sure the output directory exists
+    if not os.path.exists(output_base):
+        os.makedirs(output_base)
+
+    cif_files = [f for f in os.listdir(folder_path) if f.endswith(".cif")]
+    print(f"Found {len(cif_files)} CIF files in {folder_path}; starting screening and sorting...")
+
+    count_pass = 0
+
+    for filename in cif_files:
        file_path = os.path.join(folder_path, filename)

-        if os.path.isfile(file_path):
-            try:
-                temp = crystal(file_path)
-                file_contents.append(temp)
-            except Exception as e:
-                print(e)
-            print(f"Processing {filename}")
+        # 1. Run the basic screening from crystal_2
+        try:
+            temp = crystal(file_path)
+            # Basic checks (charge balance, chemical formula, etc.)
            temp.check_basic()
-            if temp.check_basic_result:
-                if not "+" in temp.anion:
-                    target_folder = os.path.join("../data/after_step1",f"{temp.anion}")
+
+            if not temp.check_basic_result:
+                print(f"Skipped: {filename} (failed check_basic)")
+                continue
+
+        except Exception as e:
+            print(f"Error checking {filename}: {e}")
+            continue
+
+        # 2. Passed screening: classify and file the structure
+        try:
+            print(f"Processing: {filename} (Passed)")
+            count_pass += 1
+
+            # Re-read the structure to determine the anion so the classification matches the Direct version
+            # (ignoring any naming inside crystal_2 that may be based on elements such as P/N)
+            struct = Structure.from_file(file_path)
+            anion_type = get_anion_type(struct)
+
+            # File name without extension (the ID)
+            file_base_name = os.path.splitext(filename)[0]
+
+            # --- Build the target path (Anion/ID/ID.cif) ---
+
+            if "+" in anion_type:
+                # Mixed-anion case (e.g. O+S; names are sorted alphabetically)
+                # Copy into both O+S/O and O+S/S
+                sub_anions = anion_type.split("+")
+                for sub in sub_anions:
+                    # Path: ../data/after_step1/O+S/S/123/123.cif
+                    target_folder = os.path.join(output_base, anion_type, sub, file_base_name)
                    if not os.path.exists(target_folder):
                        os.makedirs(target_folder)

-                    # Target file path
-                    target_file_path = os.path.join(target_folder, filename)
+                    target_file = os.path.join(target_folder, filename)
+                    shutil.copy(file_path, target_file)
+            else:
+                # Single anion or Unknown: ../data/after_step1/S/123/123.cif
+                target_folder = os.path.join(output_base, anion_type, file_base_name)
+                if not os.path.exists(target_folder):
+                    os.makedirs(target_folder)

-                    # Copy the file to the target folder
-                    shutil.copy(file_path, target_file_path)
-                    print(f"File {filename} passed basic screening and was copied to {target_folder}")
-                else:
-                    anions = temp.anion.split("+")
-                    for anion in anions:
-                        target_folder = os.path.join("../data/after_step1", f"{temp.anion}")
-                        target_folder = os.path.join(target_folder, anion)
-                        if not os.path.exists(target_folder):
-                            os.makedirs(target_folder)
+                target_file = os.path.join(target_folder, filename)
+                shutil.copy(file_path, target_file)

-                        # Target file path
-                        target_file_path = os.path.join(target_folder, filename)
-                        # Copy the file to the target folder
-                        shutil.copy(file_path, target_file_path)
-                        print(f"File {filename} passed basic screening and was copied to {target_folder}")
-read_files_check_basic("../data/input")
+        except Exception as e:
+            print(f"Error copying {filename}: {e}")
+
+    print(f"Done. {len(cif_files)} files in total, {count_pass} passed screening.")
+
+
+if __name__ == "__main__":
+    # Per the readme, MP data lives in input_pre and ICSD data in input
+    # An input folder is read by default here; adjust the path as needed
+    read_files_check_basic("../../solidstate-tools/corner-sharing/data/1209/input")
--- a/py/step1_direct.py
+++ b/py/step1_direct.py
@@ -0,0 +1,103 @@
+import os
+import shutil
+from pymatgen.core import Structure
+
+
+def get_anion_type(structure):
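+    """
+    Determine the anion type of a structure.
+    Only O, S, Cl, Br and their combinations are recognised.
+    Other non-metal elements (e.g. P, N, F) are ignored:
+    - Li3PS4 (contains P, S) -> classified as S
+    - LiFePO4 (contains P, O) -> classified as O
+    - Li3P (P only) -> classified as Unknown
+    """
+    # --- Changed here: only these four target anions are kept ---
+    valid_anions = {'O', 'S', 'Cl', 'Br'}
+
+    # Collect the symbols of all elements in the structure
+    elements = set([e.symbol for e in structure.composition.elements])
+
+    # Intersect to find which target anions the structure contains
+    found_anions = elements.intersection(valid_anions)
+
+    if not found_anions:
+        return "Unknown"
+
+    # With several anions, sort alphabetically and join with '+' (e.g. "O+S")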
+    sorted_anions = sorted(list(found_anions))
+    return "+".join(sorted_anions)
+
+
+def organize_files_direct(input_folder, output_base):
+    if not os.path.exists(input_folder):
+        print(f"Input folder does not exist: {input_folder}")
+        return
+
+    # Make sure the output directory exists
+    if not os.path.exists(output_base):
+        os.makedirs(output_base)
+
+    cif_files = [f for f in os.listdir(input_folder) if f.endswith(".cif")]
+    print(f"Found {len(cif_files)} CIF files; sorting them directly...")
+
+    count_dict = {}
+
+    for filename in cif_files:
+        file_path = os.path.join(input_folder, filename)
+
+        try:
+            # Read the structure and classify it
+            struct = Structure.from_file(file_path)
+            anion_type = get_anion_type(struct)
+
+            # Tally the classification (optional)
+            count_dict[anion_type] = count_dict.get(anion_type, 0) + 1
+
+            # File name without extension (the ID)
+            file_base_name = os.path.splitext(filename)[0]
+
+            # --- Build the target path ---
+            # Target: ../data/after_step1 / AnionType / ID / ID.cif
+
+            if "+" in anion_type:
+                # Mixed-anion case (e.g. O+S; names are sorted alphabetically)
+                # Copy the file into each sub-anion folder under O+S (O+S/O/ID/ID.cif and O+S/S/ID/ID.cif)
+                # This keeps the combination visible and lets later scripts look up files by element
+                sub_anions = anion_type.split("+")
+                for sub in sub_anions:
+                    # Path: after_step1/O+S/S/123/123.cif
+                    target_folder = os.path.join(output_base, anion_type, sub, file_base_name)
+                    if not os.path.exists(target_folder):
+                        os.makedirs(target_folder)
+
+                    target_file = os.path.join(target_folder, filename)
+                    shutil.copy(file_path, target_file)
+
+                # print(f"Sorted: {filename} -> {anion_type} (Split)")
+
+            else:
+                # Single anion or Unknown: after_step1/S/123/123.cif
+                target_folder = os.path.join(output_base, anion_type, file_base_name)
+                if not os.path.exists(target_folder):
+                    os.makedirs(target_folder)
+
+                target_file = os.path.join(target_folder, filename)
+                shutil.copy(file_path, target_file)
+                # print(f"Sorted: {filename} -> {anion_type}")
+
+        except Exception as e:
+            print(f"Failed to process {filename}: {e}")
+
+    print("Sorting finished. Counts per class:")
+    for k, v in count_dict.items():
+        print(f"  {k}: {v}")
+
+
+if __name__ == "__main__":
+    # Input path
+    input_dir = "../../solidstate-tools/corner-sharing/data/1209/input"  # For MP data, change this to ../data/input_pre
+    # Output path
+    output_dir = "../data/after_step1"
+
+    organize_files_direct(input_dir, output_dir)
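
The classification and folder layout used by `step1.py` and `step1_direct.py` can be sketched without pymatgen. This is a minimal illustration, assuming the element symbols are already available as a plain set; `classify_anions` and `target_dirs` are hypothetical names, not functions from the repository. Note that `sorted` puts `O` before `S`, so a structure with both lands in `O+S`.

```python
import posixpath

# The four target anions, copied from get_anion_type in the diff
VALID_ANIONS = {"O", "S", "Cl", "Br"}


def classify_anions(symbols):
    """Return e.g. 'S', 'O+S' or 'Unknown' for a set of element symbols."""
    found = VALID_ANIONS.intersection(symbols)
    if not found:
        return "Unknown"
    # Alphabetical order, joined with '+'
    return "+".join(sorted(found))


def target_dirs(output_base, anion_type, material_id):
    """List the folders a CIF is copied into (hypothetical helper)."""
    if "+" in anion_type:
        # Mixed anions: one copy per sub-anion, e.g. O+S/O/ID and O+S/S/ID
        return [
            posixpath.join(output_base, anion_type, sub, material_id)
            for sub in anion_type.split("+")
        ]
    # Single anion or Unknown: Anion/ID
    return [posixpath.join(output_base, anion_type, material_id)]


if __name__ == "__main__":
    for symbols in ({"Li", "P", "S"}, {"Li", "S", "O"}, {"Li", "P"}):
        kind = classify_anions(symbols)
        print(sorted(symbols), kind, target_dirs("after_step1", kind, "123"))
```

`posixpath` is used here only to keep the printed paths identical across platforms; the scripts themselves use `os.path.join`.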
--- a/py/step2_4_combined.py
+++ b/py/step2_4_combined.py
@@ -0,0 +1,162 @@
+import os
+import pandas as pd
+import math
+import shutil
+
+# ================= Configuration =================
+# Screening thresholds for each anion
+THRESHOLDS = {
+    "O": {"perc": 0.50, "min_d": 3.0, "node": 2.2},
+    "S": {"perc": 0.55, "min_d": 3.0, "node": 2.2},
+    "Cl": {"perc": 0.45, "min_d": 3.0, "node": 2.0},
+    "Br": {"perc": 0.45, "min_d": 3.0, "node": 2.0}
+}
+
+# Path configuration
+CSV_ROOT_DIR = "../output"  # Root directory containing the CSV files
+DATA_SOURCE_DIR = "../data/after_step1"  # Root directory of the original CIF files
+TARGET_DIR = "../data/after_screening"  # Target directory for the screened CIF copies
+
+
+# ===========================================
+
+def check_requirements(row, anion_type):
+    """
+    Check whether a single CSV row meets the thresholds for its anion.
+    """
+    config = THRESHOLDS.get(anion_type)
+    if not config:
+        return False
+
+    try:
+        perc = float(row["Percolation Diameter (A)"])
+        min_d = float(row["Minimum of d"])
+        node = float(row["Maximum Node Length (A)"])
+
+        if math.isnan(perc) or math.isnan(min_d) or math.isnan(node):
+            return False
+
+        # Screening criteria
+        c1 = perc > config["perc"]
+        c2 = min_d < config["min_d"]
+        c3 = node > config["node"]
+
+        return c1 and c2 and c3
+
+    except (ValueError, TypeError):
+        return False
+
+
+def create_result_file(group_name, anion_name, material_id):
+    """
+    Create the result file (copied directly here; symlinks can be unreliable across file systems or in some environments, so copying is safer).
+    If a symlink is really needed, shutil.copy can be swapped back to os.symlink.
+    """
+    # 1. Build the source file path
+    # Expected path: ../data/after_step1/Group/MaterialID/MaterialID.cif
+    # Example: ../data/after_step1/S/195819/195819.cif
+    # Note: a layout such as O+S/S/ID... is handled by the branch below
+
+    # Take care with this path logic: it depends on how extract_data.py lays out the CSV directories
+    # If the CSV is at output/S/S.csv -> the source files are under after_step1/S/...
+    # If the CSV is at output/O+S/S/S.csv -> the source files are under after_step1/O+S/S/...
+
+    if group_name == anion_name:
+        # Single-anion case (e.g. output/S/S.csv -> after_step1/S/ID/ID.cif)
+        rel_source_path = os.path.join(DATA_SOURCE_DIR, group_name, material_id, f"{material_id}.cif")
+    else:
+        # Mixed-anion case (e.g. output/O+S/S/S.csv -> after_step1/O+S/S/ID/ID.cif)
+        rel_source_path = os.path.join(DATA_SOURCE_DIR, group_name, anion_name, material_id, f"{material_id}.cif")
+
+    abs_source_path = os.path.abspath(rel_source_path)
+
+    if not os.path.exists(abs_source_path):
+        print(f"Source file does not exist: {abs_source_path}")
+        return
+
+    # 2. Build the target folder path ../data/after_screening/Group/Anion
+    if group_name == anion_name:
+        target_subdir = os.path.join(TARGET_DIR, group_name)
+    else:
+        target_subdir = os.path.join(TARGET_DIR, group_name, anion_name)
+
+    if not os.path.exists(target_subdir):
+        os.makedirs(target_subdir)
+
+    # 3. Build the target file path
+    target_file_path = os.path.join(target_subdir, f"{material_id}.cif")
+
+    # 4. Copy the file (copying keeps the results independent of the source tree)
+    try:
+        if os.path.exists(target_file_path):
+            os.remove(target_file_path)
+
+        shutil.copy(abs_source_path, target_file_path)
+        # If a symlink is really wanted, comment out the line above and uncomment the next one:
+        # os.symlink(abs_source_path, target_file_path)
+
+    except OSError as e:
+        print(f"Failed to create file {material_id}: {e}")
+
+
+def process_all_csvs():
+    """
+    Walk every CSV under the output folder and process it.
+    """
+    if not os.path.exists(CSV_ROOT_DIR):
+        print(f"CSV directory does not exist: {CSV_ROOT_DIR}")
+        return
+
+    print("Starting combined Step 2-4 screening...")
+
+    for root, dirs, files in os.walk(CSV_ROOT_DIR):
+        for file in files:
+            if file.endswith(".csv"):
+                csv_path = os.path.join(root, file)
+
+                # --- Key fix 1: path parsing ---
+                # Take the part of the path relative to the output root
+                # e.g. root = ../output/S -> rel_root = S
+                # e.g. root = ../output/O+S/S -> rel_root = O+S/S
+                rel_root = os.path.relpath(root, CSV_ROOT_DIR)
+                path_parts = rel_root.split(os.sep)
+
+                # Resolve Group and Anion
+                if len(path_parts) == 1:
+                    # One level: output/S -> Group=S, Anion=S
+                    group_name = path_parts[0]
+                    anion_name = path_parts[0]
+                elif len(path_parts) >= 2:
+                    # Two levels: output/O+S/S -> Group=O+S, Anion=S
+                    group_name = path_parts[0]
+                    anion_name = path_parts[1]
+                else:
+                    # Not reached in practice: a CSV directly under the root gives rel_root "." and is skipped by the THRESHOLDS check below
+                    continue
+
+                if anion_name not in THRESHOLDS:
+                    print(f"Skipping unsupported anion type: {anion_name}")
+                    continue
+
+                print(f"Processing: Group={group_name}, Anion={anion_name} ({file})")
+
+                # --- Key fix 2: keep Filename from being read as a float ---
+                # dtype={'Filename': str} forces the Filename column to be read as a string
+                df = pd.read_csv(csv_path, dtype={'Filename': str})
+
+                pass_count = 0
+                total_count = len(df)
+
+                for index, row in df.iterrows():
+                    # Strip a trailing .0 only (in case the CSV already stored the ID as a float); IDs containing ".0" elsewhere are left intact
+                    material_id = str(row['Filename']).removesuffix('.0')
+
+                    if check_requirements(row, anion_name):
+                        create_result_file(group_name, anion_name, material_id)
+                        pass_count += 1
+
+                print(f"  - Done: {pass_count}/{total_count} materials passed screening and were saved to {TARGET_DIR}.")
+
+
+if __name__ == "__main__":
+    process_all_csvs()
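
The screening rule in `check_requirements` can be exercised on plain numbers. The sketch below copies the `THRESHOLDS` table from the diff and assumes the three values have already been read from a CSV row; `passes_screening` is a hypothetical name used only for this illustration.

```python
import math

# Thresholds copied from step2_4_combined.py in the diff
THRESHOLDS = {
    "O": {"perc": 0.50, "min_d": 3.0, "node": 2.2},
    "S": {"perc": 0.55, "min_d": 3.0, "node": 2.2},
    "Cl": {"perc": 0.45, "min_d": 3.0, "node": 2.0},
    "Br": {"perc": 0.45, "min_d": 3.0, "node": 2.0},
}


def passes_screening(perc, min_d, node, anion):
    """Apply the three strict inequalities for one material."""
    config = THRESHOLDS.get(anion)
    if config is None:
        return False  # unsupported anion
    try:
        perc, min_d, node = float(perc), float(min_d), float(node)
    except (ValueError, TypeError):
        return False  # non-numeric cell
    if any(math.isnan(v) for v in (perc, min_d, node)):
        return False  # missing value
    # Percolation diameter and node length must exceed their thresholds;
    # the minimum distance must stay below its threshold.
    return perc > config["perc"] and min_d < config["min_d"] and node > config["node"]


if __name__ == "__main__":
    print(passes_screening(0.60, 2.5, 2.3, "O"))
    print(passes_screening(0.50, 2.5, 2.3, "O"))
```

All three comparisons are strict, so a value exactly equal to a threshold does not pass.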
--- a/py/update_tet_occupancy.py
+++ b/py/update_tet_occupancy.py
@@ -0,0 +1,129 @@
+import os
+import pandas as pd
+from tqdm import tqdm
+
+# ================= Configuration =================
+# Root directory containing the CSV files
+CSV_ROOT_DIR = "../output"
+# Root directory containing the CSM .dat files
+CSM_ROOT_DIR = "../output/CSM"
+
+
+# ===========================================
+
+def calculate_tet_ratio_from_dat(dat_path):
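+    """
+    Parse a .dat file and compute the fraction of Li on tetrahedral sites.
+    Returns: float in [0.0, 1.0] (0.0 if no tet/oct lines are found), or None if the file is missing, reports "No environments found", or fails to parse.
+    """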
+    if not os.path.exists(dat_path):
+        return None
+
+    tet_count = 0
+    total_count = 0
+
+    try:
+        with open(dat_path, 'r', encoding='utf-8') as f:
+            lines = f.readlines()
+
+            # Quick check for a "No environments found" marker on the first line
+            if len(lines) > 0 and "No environments found" in lines[0]:
+                return None
+
+            for line in lines:
+                # Per the screenshot, each line describes one site
+                # Plain substring matching is safer than eval and fast enough
+                if "'type': 'tet'" in line:
+                    tet_count += 1
+                    total_count += 1
+                elif "'type': 'oct'" in line:
+                    total_count += 1
+                # Other site types could be added here, or every site line could count toward total
+
+        if total_count == 0:
+            return 0.0
+
+        return round(tet_count / total_count, 4)
+
+    except Exception as e:
+        print(f"Parse error {dat_path}: {e}")
+        return None
+
+
+def process_single_csv(csv_path, group_name, anion_name):
+    """
+    Read the CSV -> locate the matching CSM .dat file -> compute the ratio -> update the CSV
+    """
+    print(f"Updating CSV: {csv_path}")
+
+    # Read the CSV, making sure the ID stays a string
+    try:
+        df = pd.read_csv(csv_path, dtype={'Filename': str})
+    except Exception as e:
+        print(f"Failed to read CSV: {e}")
+        return
+
+    tet_ratios = []
+
+    # Iterate over every row of the CSV
+    for index, row in tqdm(df.iterrows(), total=df.shape[0], desc="Updating Occupancy"):
+        material_id = str(row['Filename']).removesuffix('.0')
+
+        # Build the path of the matching .dat file
+        # Layout: ../output/CSM/Group/Anion/ID.dat
+        # Note: this Group/Anion layout must match what analyze_csm.py generates
+
+        if group_name == anion_name:
+            # Single anion: ../output/CSM/S/123.dat
+            dat_rel_path = os.path.join(group_name, f"{material_id}.dat")
+        else:
+            # Mixed anions: ../output/CSM/O+S/S/123.dat
+            dat_rel_path = os.path.join(group_name, anion_name, f"{material_id}.dat")
+
+        dat_path = os.path.join(CSM_ROOT_DIR, dat_rel_path)
+
+        # Compute the ratio
+        ratio = calculate_tet_ratio_from_dat(dat_path)
+        tet_ratios.append(ratio)
+
+    # Add or update the column
+    df['Tet_Li_Ratio'] = tet_ratios
+
+    # Save
+    df.to_csv(csv_path, index=False)
+    print(f"  - Saved updated data to: {csv_path}")
+
+
+def run_update():
+    """
+    Entry point: walk the CSV files under the output directory
+    """
+    if not os.path.exists(CSV_ROOT_DIR):
+        print(f"CSV directory does not exist: {CSV_ROOT_DIR}")
+        return
+
+    for root, dirs, files in os.walk(CSV_ROOT_DIR):
+        for file in files:
+            if file.endswith(".csv"):
+                csv_path = os.path.join(root, file)
+
+                # Parse the path to get Group and Anion
+                # root: ../output/S  --> rel: S
+                rel_root = os.path.relpath(root, CSV_ROOT_DIR)
+                path_parts = rel_root.split(os.sep)
+
+                if len(path_parts) == 1:
+                    group_name = path_parts[0]
+                    anion_name = path_parts[0]
+                elif len(path_parts) >= 2:
+                    group_name = path_parts[0]
+                    anion_name = path_parts[1]
+                else:
+                    continue
+
+                # Optionally, this could be limited to groups with a matching folder under the CSM directory (not implemented here)
+                process_single_csv(csv_path, group_name, anion_name)
+
+
+if __name__ == "__main__":
+    run_update()
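
The two string-level steps in this script, normalising a `Filename` value and counting `tet`/`oct` lines, can be checked in isolation. This is a minimal sketch under stated assumptions: `normalize_id` and `tet_ratio_from_lines` are hypothetical helpers, the sample lines only imitate the `'type': ...` fragments matched in the diff, and stripping only a trailing `.0` is an assumption about the intent (a global `replace('.0', '')` would also alter IDs that contain `.0` elsewhere).

```python
def normalize_id(raw):
    """Drop a trailing '.0' left by a float round-trip; leave other text alone."""
    text = str(raw)
    return text[:-2] if text.endswith(".0") else text


def tet_ratio_from_lines(lines):
    """Fraction of tetrahedral sites among tet/oct lines, or None if flagged empty."""
    if lines and "No environments found" in lines[0]:
        return None
    tet = sum(1 for line in lines if "'type': 'tet'" in line)
    octa = sum(1 for line in lines if "'type': 'oct'" in line)
    total = tet + octa
    if total == 0:
        return 0.0
    return round(tet / total, 4)


if __name__ == "__main__":
    print(normalize_id("195819.0"), normalize_id("10.05"), "10.05".replace(".0", ""))
    sample = [
        "{'csm': 0.5, 'vol': 3.1, 'type': 'tet'}",
        "{'csm': 1.2, 'vol': 9.8, 'type': 'oct'}",
        "{'csm': 0.9, 'vol': 9.5, 'type': 'oct'}",
    ]
    print(tet_ratio_from_lines(sample))
```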
--- a/py/utils/CS_analyse.py
+++ b/py/utils/CS_analyse.py
@@ -0,0 +1,356 @@
+from typing import List, Dict
+
+from pymatgen.core.structure import Structure
+from pymatgen.analysis.local_env import VoronoiNN
+import numpy as np
+
+def check_real(nearest):
+    real_nearest = []
+    for site in nearest:
+        if np.all((site.frac_coords >= 0) & (site.frac_coords <= 1)):
+             real_nearest.append(site)
+
+    return real_nearest
+
+def special_check_for_3(site, nearest):
+    real_nearest = []
+    distances = []
+    for site2 in nearest:
+        distance = np.linalg.norm(np.array(site.frac_coords) - np.array(site2.frac_coords))
+        distances.append(distance)
+
+    sorted_indices = np.argsort(distances)
+    for index in sorted_indices[:3]:
+        real_nearest.append(nearest[index])
+
+    return real_nearest
+
+
+def CS_catulate(
+        struct,
+        sp: str = 'Li',
+        anion: List[str] = ['O'],
+        tol: float = 0,
+        cutoff: float = 3.0,
+        notice: bool = False
+) -> Dict[str, Dict[str, int]]:
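+    """
+    Count the anion-sharing relations (corner, edge, face) between cation polyhedra in a structure.
+
+    Sharing counts are computed separately for three cases:
+    1. target atom vs target atom (e.g., Li-Li)
+    2. target atom vs other cation (e.g., Li-X)
+    3. other cation vs other cation (e.g., X-Y)
+
+    Args:
+        struct (Structure): input pymatgen Structure.
+        sp (str): target element symbol, 'Li' by default.
+        anion (list): list of anion element symbols, ['O'] by default.
+        tol (float): VoronoiNN tolerance. Usually 0 for Li.
+        cutoff (float): VoronoiNN cutoff distance. Usually 3.0 for Li.
+        notice (bool): whether to print detailed sharing information.
+
+    Returns:
+        dict: statistics for the three kinds of sharing relation.
+              The keys "sp_vs_sp", "sp_vs_other", "other_vs_other" correspond to the three cases above.
+              Each value is another dict counting the pairs that share 1 (corner), 2 (edge), 3 (face) or 4 anions.
+              Example: {'sp_vs_sp': {'1': 10, '2': 4}, 'sp_vs_other': ...}
+              Sharing 1 anion is corner sharing, 2 is edge sharing, 3 is face sharing.
+    """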
+    # Initialise the VoronoiNN object
+    voro_nn = VoronoiNN(tol=tol, cutoff=cutoff)
+
+    # 1. Collect the neighbouring anions of every cation, split by category
+    target_sites_info = []
+    other_cation_sites_info = []
+
+    for index, site in enumerate(struct.sites):
+        # Skip the anions themselves
+        if site.species.chemical_system in anion:
+            continue
+
+        # Get the neighbouring anions of the current site
+        try:
+            # get_nn_info is the more direct call
+            nn_info = voro_nn.get_nn_info(struct, index)
+            nearest_anions = [
+                nn["site"] for nn in nn_info
+                if nn["site"].species.chemical_system in anion
+            ]
+        except Exception as e:
+            print(f"Warning: Could not get neighbors for site {index} ({site.species_string}): {e}")
+            continue
+
+        if not nearest_anions:
+            continue
+
+        # Assemble the site record
+        site_info = {
+            'index': index,
+            'element': site.species.chemical_system,
+            'nearest_anion_indices': {nn.index for nn in nearest_anions}
+        }
+
+        # Sort into target atoms and other cations
+        if site.species.chemical_system == sp:
+            target_sites_info.append(site_info)
+        else:
+            other_cation_sites_info.append(site_info)
+
+    # 2. Initialise the result dict
+    # Shared-count keys: 1 = corner, 2 = edge, 3 = face
+    results = {
+        "sp_vs_sp": {"1": 0, "2": 0, "3": 0, "4": 0},
+        "sp_vs_other": {"1": 0, "2": 0, "3": 0, "4": 0},
+        "other_vs_other": {"1": 0, "2": 0, "3": 0, "4": 0},
+    }
+
+    # 3. Count sharing between the categories
+
+    # 3.1 target atom vs target atom (sp_vs_sp)
+    for i in range(len(target_sites_info)):
+        for j in range(i + 1, len(target_sites_info)):
+            atom_i = target_sites_info[i]
+            atom_j = target_sites_info[j]
+
+            shared_anions = atom_i['nearest_anion_indices'].intersection(atom_j['nearest_anion_indices'])
+            shared_count = len(shared_anions)
+
+            if shared_count > 0 and str(shared_count) in results["sp_vs_sp"]:
+                results["sp_vs_sp"][str(shared_count)] += 1
+                if notice:
+                    print(
+                        f"[Li-Li] Atom {atom_i['index']} and {atom_j['index']} share {shared_count} anions: {shared_anions}")
+
+    # 3.2 target atom vs other cation (sp_vs_other)
+    for atom_sp in target_sites_info:
+        for atom_other in other_cation_sites_info:
+            shared_anions = atom_sp['nearest_anion_indices'].intersection(atom_other['nearest_anion_indices'])
+            shared_count = len(shared_anions)
+
+            if shared_count > 0 and str(shared_count) in results["sp_vs_other"]:
+                results["sp_vs_other"][str(shared_count)] += 1
+                if notice:
+                    print(
+                        f"[Li-Other] Atom {atom_sp['index']} and {atom_other['index']} share {shared_count} anions: {shared_anions}")
+
+    # 3.3 other cation vs other cation (other_vs_other)
+    for i in range(len(other_cation_sites_info)):
+        for j in range(i + 1, len(other_cation_sites_info)):
+            atom_i = other_cation_sites_info[i]
+            atom_j = other_cation_sites_info[j]
+
+            shared_anions = atom_i['nearest_anion_indices'].intersection(atom_j['nearest_anion_indices'])
+            shared_count = len(shared_anions)
+
+            if shared_count > 0 and str(shared_count) in results["other_vs_other"]:
+                results["other_vs_other"][str(shared_count)] += 1
+                if notice:
+                    print(
+                        f"[Other-Other] Atom {atom_i['index']} and {atom_j['index']} share {shared_count} anions: {shared_anions}")
+
+    return results
+
+def CS_catulate_old(struct, sp='Li', anion=['O'], tol=0, cutoff=3.0,notice=False,ID=None):
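+    """
+    Count anion sharing between the non-target cations of a structure (legacy version).
+
+    Args:
+        struct (Structure): input structure.
+        sp (str): target element symbol, 'Li' by default; sites of this element are skipped.
+        anion (list): list of anions, ['O'] by default.
+        tol (float): VoronoiNN tolerance, 0 by default.
+        cutoff (float): VoronoiNN cutoff distance, 3.0 by default.
+
+    Returns:
+        dict: number of cation pairs keyed by how many anions they share ("2" to "6").
+    """
+    # Initialise the VoronoiNN object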
+    if sp=='Li':
+        tol = 0
+        cutoff = 3.0
+    voro_nn = VoronoiNN(tol=tol, cutoff=cutoff)
+    # Initialise the dict that tallies sharing relations
+    shared_count = {"2": 0, "3": 0,"4":0,"5":0,"6":0}
+    # List that stores the per-site results
+    atom_dice = []
+
+    # Iterate over every site in the structure
+    for index,site in enumerate(struct.sites):
+        # Skip anion sites
+        if site.species.chemical_system in anion:
+            continue
+        # Skip the target species (Li by default)
+        if site.species.chemical_system == sp:
+            continue
+        # Get the Voronoi polyhedron information
+        voro_info = voro_nn.get_voronoi_polyhedra(struct, index)
+
+        # Find the nearest anion sites
+        nearest_anions = [
+            nn_info["site"] for nn_info in voro_info.values()
+            if nn_info["site"].species.chemical_system in anion
+        ]
+
+        # Skip the site if no nearest anion was found
+        if not nearest_anions:
+            print(f"No nearest anions found for {ID} site {index}.")
+            continue
+        if site.species.chemical_system == 'B' or site.species.chemical_system == 'N':
+            nearest_anions = special_check_for_3(site,nearest_anions)
+        nearest_anions = check_real(nearest_anions)
+        # Append the result to the atom_dice list
+        atom_dice.append({
+            'index': index,
+            'nearest_index': [nn.index for nn in nearest_anions]
+        })
+
+
+
+
+
+        # Enumerate every pair of atoms in atom_dice
+    for i, atom_i in enumerate(atom_dice):
+        for j, atom_j in enumerate(atom_dice[i + 1:], start=i + 1):
+            # Nearest-anion indices of the two atoms
+            nearest_i = set(atom_i['nearest_index'])
+            nearest_j = set(atom_j['nearest_index'])
+
+            # Size of the intersection of the two nearest-anion sets
+            shared_count_key = str(len(nearest_i & nearest_j))
+
+            # Update the tally
+            if shared_count_key in shared_count:
+                shared_count[shared_count_key] += 1
+                if notice:
+                    if shared_count_key=='2':
+                        print(f"{atom_j['index']} and {atom_i['index']} share an edge")
+                        print(f"Anions on the shared edge: {nearest_i & nearest_j}")
+                    if shared_count_key=='3':
+                        print(f"{atom_j['index']} and {atom_i['index']} share a face")
+                        print(f"Anions on the shared face: {nearest_i & nearest_j}")
+
+    # # Finally halve the values, because each sharing relation was counted twice
+    # for key in shared_count.keys():
+    #     shared_count[key] //= 2
+
+    return shared_count
+
+
+def CS_count(struct, sharing_results: Dict[str, Dict[str, int]], sp: str = 'Li') -> float:
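+    """
+    Analyse the polyhedron-sharing results and compute the average number of shared anions per target atom.
+
+    This is a companion to CS_catulate.
+
+    Args:
+        struct (Structure): input pymatgen Structure, used to count the target atoms.
+        sharing_results (dict): output of CS_catulate.
+        sp (str): target element symbol, 'Li' by default.
+
+    Returns:
+        float: average number of shared anions per target atom sp.
+               For example, 2.5 means that each Li atom on average shares 2.5 anions
+               with other cations (both Li and non-Li).
+    """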
+    # 1. Count the target atoms in the structure
+    target_atom_count = 0
+    for site in struct.sites:
+        if site.species.chemical_system == sp:
+            target_atom_count += 1
+
+    # With no target atoms, return 0 to avoid dividing by zero
+    if target_atom_count == 0:
+        return 0.0
+
+    # 2. Compute the weighted total of shared anions
+    total_shared_anions = 0
+
+    # Handle sp_vs_sp (e.g. Li-Li) sharing
+    # Each relation involves two target atoms, so the weight is multiplied by 2
+    if "sp_vs_sp" in sharing_results:
+        sp_vs_sp_counts = sharing_results["sp_vs_sp"]
+        for num_shared_str, count in sp_vs_sp_counts.items():
+            num_shared = int(num_shared_str)
+            # weight = shared anions * target atoms involved (2) * occurrences
+            total_shared_anions += num_shared * 2 * count
+
+    # Handle sp_vs_other (e.g. Li-X) sharing
+    # Each relation involves one target atom, so the weight is multiplied by 1
+    if "sp_vs_other" in sharing_results:
+        sp_vs_other_counts = sharing_results["sp_vs_other"]
+        for num_shared_str, count in sp_vs_other_counts.items():
+            num_shared = int(num_shared_str)
+            # weight = shared anions * target atoms involved (1) * occurrences
+            total_shared_anions += num_shared * 1 * count
+
+    # 3. Compute the average
+    # average shared anions per target atom = weighted total / number of target atoms
+    average_sharing_per_atom = total_shared_anions / target_atom_count
+
+    return average_sharing_per_atom
+def CS_count_old(struct, shared_count, sp='Li'):
+    count = 0
+    for site in struct.sites:
+        if site.species.chemical_system == sp:
+            count += 1  # count the matching atoms
+
+    CS_count = 0
+    for i in range(2, 7):  # iterates over [2, 3, 4, 5, 6]
+        if str(i) in shared_count:  # check that the key exists
+            CS_count += shared_count[str(i)] * i  # accumulate the weighted count
+
+    if count > 0:  # avoid dividing by zero
+        CS_count /= count  # average over the target atoms
+    else:
+        CS_count = 0  # with count == 0, return 0 directly
+
+    return CS_count
+
+
+def check_only_corner_sharing(sharing_results: Dict[str, Dict[str, int]]) -> int:
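+    """
+    Check whether the target atom (sp) takes part only in corner sharing (exactly 1 shared anion).
+
+    This is a companion to CS_catulate.
+
+    Args:
+        sharing_results (dict): output of CS_catulate.
+
+    Returns:
+        int:
+        - 1: if sp has no edge sharing (2), face sharing (3) or higher,
+             and at least one corner-sharing relation (1) exists.
+        - 0: if sp has any edge or face sharing, or no sharing relation at all.
+    """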
+    # Extract the sharing data that involves the target atom sp
+    sp_vs_sp_counts = sharing_results.get("sp_vs_sp", {})
+    sp_vs_other_counts = sharing_results.get("sp_vs_other", {})
+
+    # 1. Look for any edge or face sharing (shared count > 1)
+    # Check sp-sp sharing
+    for num_shared_str, count in sp_vs_sp_counts.items():
+        if int(num_shared_str) > 1 and count > 0:
+            return 0  # edge/face sharing found, return 0 immediately
+
+    # Check sp-other sharing
+    for num_shared_str, count in sp_vs_other_counts.items():
+        if int(num_shared_str) > 1 and count > 0:
+            return 0  # edge/face sharing found, return 0 immediately
+
+    # 2. Check that at least one corner-sharing relation exists (shared count == 1)
+    # Reaching this point means there is no edge or face sharing.
+    # Now confirm that corner sharing actually exists, rather than no sharing at all.
+    corner_share_sp_sp = sp_vs_sp_counts.get("1", 0) > 0
+    corner_share_sp_other = sp_vs_other_counts.get("1", 0) > 0
+
+    if corner_share_sp_sp or corner_share_sp_other:
+        return 1  # only corner sharing is present
+    else:
+        return 0  # no sharing relation at all also returns 0
+
+# structure = Structure.from_file("../raw/0921/wjy_001.cif")
+# a = CS_catulate(structure,notice=True)
+# b = CS_count(structure,a)
+# print(f"{a}\n{b}")
+# print(check_only_corner_sharing(a))
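
The pair counting in `CS_catulate` and the weighting in `CS_count` reduce to set intersections and a weighted sum. The sketch below illustrates both on hand-written neighbour sets, assuming the anion indices per cation are already known; `count_sharing` and `average_sharing` are hypothetical names, and no Voronoi analysis is performed here.

```python
from itertools import combinations


def count_sharing(neighbour_sets):
    """Count cation pairs by number of shared anions (1 corner, 2 edge, 3 face)."""
    counts = {"1": 0, "2": 0, "3": 0, "4": 0}
    for a, b in combinations(neighbour_sets, 2):
        key = str(len(a & b))
        if key in counts:  # pairs sharing 0 or more than 4 anions are ignored
            counts[key] += 1
    return counts


def average_sharing(sp_vs_sp, sp_vs_other, n_target):
    """Weighted shared anions per target atom, following the CS_count weighting."""
    if n_target == 0:
        return 0.0
    # A target-target pair involves two target atoms, so it is weighted twice
    total = sum(int(k) * 2 * v for k, v in sp_vs_sp.items())
    total += sum(int(k) * v for k, v in sp_vs_other.items())
    return total / n_target


if __name__ == "__main__":
    # Three cations with made-up anion index sets
    sets = [{1, 2, 3, 4}, {4, 5, 6, 7}, {3, 4, 8, 9}]
    print(count_sharing(sets))
    print(average_sharing({"1": 2, "2": 1}, {"1": 3}, 4))
```

Because each unordered pair is visited once, no halving of the counts is needed afterwards.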
--- a/py/utils/init.py
+++ b/py/utils/init.py
--- a/py/utils/analyze_env_st.py
+++ b/py/utils/analyze_env_st.py
@@ -0,0 +1,210 @@
+#!/usr/bin/env python
+# This code extracts the lithium environment of all of lithium sites provided in a structure file.
+import os, sys
+import numpy as np
+import scipy
+import argparse
+from scipy.spatial import ConvexHull
+from itertools import permutations
+from pymatgen.core.structure import Structure
+from pymatgen.core.periodic_table import *
+from pymatgen.core.composition import *
+from pymatgen.ext.matproj import MPRester
+from pymatgen.io.vasp.outputs import *
+from pymatgen.analysis.chemenv.coordination_environments.coordination_geometry_finder import LocalGeometryFinder
+from pymatgen.analysis.chemenv.coordination_environments.structure_environments import LightStructureEnvironments
+from pymatgen.analysis.chemenv.coordination_environments.chemenv_strategies import SimplestChemenvStrategy
+from pymatgen.analysis.chemenv.coordination_environments.coordination_geometries import *
+
+__author__ = "KyuJung Jun"
+__version__ = "0.1"
+__maintainer__ = "KyuJung Jun"
+__email__ = "kjun@berkeley.edu"
+__status__ = "Development"
+
+'''
+Input for the script : path to the structure file supported by Pymatgen
+Structures with partial occupancy should be ordered or modified to full occupancy by Pymatgen.
+'''
+parser = argparse.ArgumentParser()
+parser.add_argument('structure', help='path to the structure file supported by Pymatgen', nargs='?')
+parser.add_argument('envtype', help='both, tet, oct, choosing which perfect environment to reference to', nargs='?')
+args = parser.parse_args()
+
+
+class HiddenPrints:
+    '''
+    class to reduce the output lines
+    '''
+
+    def __enter__(self):
+        self._original_stdout = sys.stdout
+        sys.stdout = open(os.devnull, 'w')
+
+    def __exit__(self, exc_type, exc_val, exc_tb):
+        sys.stdout.close()
+        sys.stdout = self._original_stdout
+
+
+def non_elements(struct, sp='Li'):
+    """
+    struct : must be an ordered structure
+    sp : the mobile species (currently unused by this function)
+    returns a new structure containing only the framework anions (O, S, N, Br, Cl).
+    """
+    anions_to_keep = {"O", "S", "N","Br","Cl"}
+    stripped = struct.copy()
+    species_to_remove = [el.symbol for el in stripped.composition.elements
+                         if el.symbol not in anions_to_keep]
+
+    if species_to_remove:
+        stripped.remove_species(species_to_remove)
+
+    return stripped
+
+
+def site_env(coord, struct, sp="Li", envtype='both'):
+    '''
+    coord : Fractional coordinate of the target atom
+    struct : structure object from Pymatgen
+    sp : the mobile specie
+    envtype : This sets the reference perfect structure. 'both' compares CSM_tet and CSM_oct and assigns to the lower one.
+    'tet' refers to the perfect tetrahedron and 'oct' refers to the perfect octahedron
+    result : a dictionary of environment information
+    '''
+    stripped = non_elements(struct)
+    with_li = stripped.copy()
+    with_li.append(sp, coord, coords_are_cartesian=False, validate_proximity=False)
+    with_li = with_li.get_sorted_structure()
+    tet_oct_competition = []
+    if envtype == 'both' or envtype == 'tet':
+        for dist in np.linspace(1, 4, 601):
+            neigh = with_li.get_neighbors(with_li.sites[0], dist)
+            if len(neigh) < 4:
+                continue
+            elif len(neigh) > 4:
+                break
+            neigh_coords = [i.coords for i in neigh]
+            with HiddenPrints():
+                lgf = LocalGeometryFinder(only_symbols=["T:4"])
+                lgf.setup_structure(structure=with_li)
+                lgf.setup_local_geometry(isite=0, coords=neigh_coords)
+            try:
+                site_volume = ConvexHull(neigh_coords).volume
+                tet_env_list = []
+                for i in range(20):
+                    tet_env = {'csm': lgf.get_coordination_symmetry_measures()['T:4']['csm'], 'vol': site_volume,
+                               'type': 'tet'}
+                    tet_env_list.append(tet_env)
+                tet_env = min(tet_env_list, key=lambda x: x['csm'])
+                tet_oct_competition.append(tet_env)
+
+            except Exception as e:
+                print(e)
+                print("This site cannot be recognized as tetrahedral site")
+            if len(neigh) == 4:
+                break
+    if envtype == 'both' or envtype == 'oct':
+        for dist in np.linspace(1, 4, 601):
+            neigh = with_li.get_neighbors(with_li.sites[0], dist)
+            if len(neigh) < 6:
+                continue
+            elif len(neigh) > 6:
+                break
+            neigh_coords = [i.coords for i in neigh]
+            with HiddenPrints():
+                lgf = LocalGeometryFinder(only_symbols=["O:6"], permutations_safe_override=False)
+                lgf.setup_structure(structure=with_li)
+                lgf.setup_local_geometry(isite=0, coords=neigh_coords)
+            try:
+                site_volume = ConvexHull(neigh_coords).volume
+                oct_env_list = []
+                for i in range(20):
+                    '''
+                    Sample 20 times in case the "APPROXIMATE_FALLBACK" algorithm is used. It tries a large number of
+                    permutations, but the default value in "coordination_geometry_symmetry_measures_fallback_random"
+                    (NRANDOM=10) is often too small. This is not a problem when the "SEPARATION_PLANE" algorithm is used.
+                    '''
+                    oct_env = {'csm': lgf.get_coordination_symmetry_measures()['O:6']['csm'], 'vol': site_volume,
+                               'type': 'oct'}
+                    oct_env_list.append(oct_env)
+                oct_env = min(oct_env_list, key=lambda x: x['csm'])
+                tet_oct_competition.append(oct_env)
+
+            except Exception as e:
+                print(e)
+                print("This site cannot be recognized as octahedral site")
+            if len(neigh) == 6:
+                break
+
+    if len(tet_oct_competition) == 0:
+        return {'csm': np.nan, 'vol': np.nan, 'type': 'Non_' + envtype}
+    elif len(tet_oct_competition) == 1:
+        return tet_oct_competition[0]
+    elif len(tet_oct_competition) == 2:
+        csm1 = tet_oct_competition[0]
+        csm2 = tet_oct_competition[1]
+        if csm1['csm'] > csm2['csm']:
+            return csm2
+        else:
+            return csm1
+
+
+def extract_sites(struct, sp="Li", envtype='both'):
+    """
+    struct : structure object from Pymatgen
+    envtype : 'tet', 'oct', or 'both'
+    sp : target element to analyze environment
+    """
+    envlist = []
+
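+    # --- Key change: iterate over the original structure directly, even if it is disordered ---
+    # We no longer call get_sorted_structure() before the loop;
+    # we only care about sites that contain the target element sp.
+
+    # Iterate over every site
+    for i, site in enumerate(struct):
+        # Check whether this site's composition (site.species) contains the element of interest (sp).
+        # site.species.elements returns the elements on the site, e.g. [Element Li, Element Fe];
+        # the list comprehension converts them to symbols, e.g. ['Li', 'Fe'].
+        site_elements = [el.symbol for el in site.species.elements]
+
+        if sp in site_elements:
+            # sp was found on this site, so analyze its environment.
+            # Downstream functions (non_elements in particular) expect an ordered
+            # structure; see the 'ordered structures only' errors hit earlier.
+
+            # A temporary sorted copy of struct is therefore passed to site_env.
+            # Note: get_sorted_structure() only reorders sites; it does not resolve
+            # partial occupancy, so disordered inputs should be ordered beforehand.
+            temp_ordered_struct = struct.get_sorted_structure()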
+
+            singleenv = site_env(site.frac_coords, temp_ordered_struct, sp, envtype)
+
+            envlist.append({'frac_coords': site.frac_coords, 'type': singleenv['type'], 'csm': singleenv['csm'],
+                            'volume': singleenv['vol']})
+
+    if not envlist:
+        print(f"Warning: no sites containing element {sp} were found in the structure.")
+
+    return envlist
+
+def export_envs(envlist, sp='Li', envtype='both', fname=None):
+    '''
+    envlist : list of dictionaries of environment information
+    fname : Output file name
+    '''
+    if not fname:
+        fname = "extracted_environment_info" + "_" + sp + "_" + envtype + ".dat"
+    with open(fname, 'w') as f:
+
+        f.write('List of environment information\n')
+        f.write('Species : ' + sp + "\n")
+        f.write('Envtype : ' + envtype + "\n")
+        for index, i in enumerate(envlist):
+            f.write("Site index " + str(index) + ": " + str(i) + '\n')
+
+
+# struct = Structure.from_file("../raw/0921/wjy_475.cif")
+# site_info = extract_sites(struct, envtype="both")
+# export_envs(site_info, sp="Li", envtype="both")
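The closing branch of `site_env` picks the lower-CSM candidate when both a tetrahedral and an octahedral fit exist, and returns a NaN placeholder when neither does. A minimal sketch of that selection in isolation is below; `pick_environment` is a hypothetical helper name used only for illustration and is not part of this diff.

```python
import math


def pick_environment(candidates, envtype="both"):
    """Return the candidate environment with the lowest CSM.

    candidates : list of dicts with keys 'csm', 'vol', 'type'
    envtype    : only used to label the placeholder when nothing matched
    """
    if not candidates:
        # Same placeholder shape that site_env returns for an unmatched site.
        return {"csm": math.nan, "vol": math.nan, "type": "Non_" + envtype}
    # min() returns the first minimum, so a tie keeps the earlier entry,
    # matching the `csm1['csm'] > csm2['csm']` comparison in the diff.
    return min(candidates, key=lambda env: env["csm"])


if __name__ == "__main__":
    tet = {"csm": 2.0, "vol": 3.1, "type": "tet"}
    oct_ = {"csm": 0.5, "vol": 9.8, "type": "oct"}
    print(pick_environment([tet, oct_])["type"])
```

Written this way, the function also covers any number of candidates, whereas the three-way `if/elif` in the diff returns `None` implicitly if the list ever holds more than two entries.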
--- a/readme.md
+++ b/readme.md
@@ -1,68 +1,74 @@
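-# High-Throughput Screening
+# High-Throughput Screening and Supercell Expansion Project

-## Configuration Requirements
+## Environment Requirements

-Two conda environments are required, named **screen** and **zeo**
+The project requires two Conda environments, named **screen** and **zeo**.

-### zeo
+### 1. zeo environment (for geometric structure analysis)
+*   **Python**: 2
+*   **Core libraries**: `zeo++` (must be compiled), `pymatgen==2018.12.12`, `numpy==1.16.6`
+*   **Other**: `os`, `argparse`, `PrettyTable`, `monty`, `future`

-#### Runtime library requirements
+### 2. screen environment (for screening logic and data processing)
+*   **Python**: 3.11.4
+*   **Core libraries**: `pymatgen==2024.11.13`, `pandas` (newly added, for CSV handling)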

-``` 2018.12.12
-python == 2
-pymatgen == 2018.12.12
-Numpy = 1.16.6
-os
-argparse = 1.4.0
-PrettyTable = 1.01
-monty = 1.0.0
-future = 1.0.0
-```
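+## Quick Start

-#### zeo++ software requirements
+1.  **Data preparation**:
+    *   If the data comes from the **Materials Project (MP)**, put the CIF files in `data/input_pre`.
+    *   If the data comes from **ICSD**, put the CIF files directly in `data/input`.
+2.  **Run**:
+    *   Make sure both Conda environments above have been created.
+    *   Run the automation script from the repository root:
+        ```bash
+        bash main.sh
+        ```

-Must be compiled and then placed in the Python library
+## Pipeline in Detail

-### screen
+### Stage 1: Pre-processing and Basic Screening (Step 1)
+*   **Pre-process**: Clean the data and collect it in the `input` folder.
+*   **Step 1**:
+    *   Read the CIF files and use the `crystal_2` library to check charge balance and the chemical formula.
+    *   **File reorganization**: Sort the files that pass screening by anion type.
+    *   **New layout**: Each material gets its own folder (e.g. `after_step1/O/141/141.cif`), which makes the later calculation logs easier to manage.
+*   **Make SH**: Automatically generate the `analyze.sh` scripts that call Zeo++.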

-```
-python == 3.11.4
-pymatgen == 2024.11.13
-```
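+### Stage 2: Zeo++ Calculations
+*   Switch to the `zeo` environment.
+*   Compute geometric parameters of each material, such as pore size (percolation diameter) and specific surface area.
+*   Results are written to **`log.txt`** in each folder.

-## Usage
+### Stage 3: Data Extraction and Combined Screening (Step 2-4)
+*   **Data extraction (`extract_data.py`)**:
+    *   Automatically walks the `log.txt` file in every folder.
+    *   Extracts the key parameters: `Percolation Diameter` (Step 2), `Minimum of d` (Step 3), `Maximum Node Length` (Step 4).
+    *   Aggregates the results into CSV files saved under `output/` (e.g. `output/O/O.csv`).
+*   **Combined screening (`step2_4_combined.py`)**:
+    *   Reads the CSV files and filters by preset thresholds (e.g. O: Perc>0.5, Min_d<3.0, Node>2.2).
+    *   **Result**: Materials that meet all criteria are gathered into the `data/after_screening` folder as **symbolic links**.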

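-If the conda environments are configured with the same names, just run **main.sh**
+---

-When the data comes from MP, place it in input_pre
+## Supercell Expansion Logic (Step 5, to be run later)

-If the data comes from ICSD, just place it in input
+The supercell expansion logic is currently unchanged and operates on the screened structures.

+### Algorithm Breakdown
+1.  **Read the structure**: Parse the CIF file.
+2.  **Tally Occupation**:
+    *   Group atoms with the same Occupation value into one class.
+    *   Build the `Occupation_list` dictionary.
+3.  **Compute the expansion factor**:
+    *   From the numerator and denominator of each Occupation value (e.g. 0.5 corresponds to 1/2), compute the common divisor.
+4.  **Generate the structure list**:
+    *   Build `structure_list` from the numerators and denominators.
+5.  **Symmetry handling and expansion**:
+    *   Based on the symmetry of the structure, generate the list of expansions along the three directions (e.g. `{"x":1, "y":2, "z":1}`).
+6.  **Generate new files**:
+    *   Combine `structure_list` with the expansion factors to produce the final supercell CIF.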

-
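-# Supercell Expansion
-## Step-by-step breakdown
-### Step1
-Read the CIF file
-### Step2
-Tally the Occupation values: group atoms with the same Occupation value into one class and create a dictionary keyed by the Occupation value; one entry of this dictionary, atom_serial, is a list recording the serial numbers of the atoms that share that Occupation value
-Append these dictionaries to a list, Occupation_list; each dictionary reserves two parameters, numerator and denominator
-Required function:
-```angular2html
-def process_cif_file(struct)
-    return Occupation_list 
-```
-### step3
-Compute the expansion factor from Occupation_list\\
-First compute the numerator and denominator of each dictionary from its key; for example, if the first key is 0.5, its numerator is 1 and its denominator is 2
-Merge every dictionary, examine each fraction to find the common divisor and the corresponding numerators, and update the values in each dictionary
-### step4
-Generate structure_list from the numerators and denominators, where the sum of the number fields of the Occupation_list elements is the numerator and the total count is the denominator
-### step5
-Determine the symmetry from the material's structure; different symmetries give different equivalent cases
-From the symmetry and the final supercell, generate the list of expansions along the three directions, where each element is a dictionary in the format {["x":1,"y":2,"z":1]}
-### step5
-Generate and save the new CIF from structure_list and Occupation_list
-### Some assumptions
-Only two atoms sharing the same site are considered; cases with three or more atoms are not considered for now
-Co-occupancy involving Li atoms is not considered; Li atoms are left untouched
+### Assumptions
+*   Only co-occupancy of two atoms on the same site is considered.
+*   Co-occupancy involving Li atoms is not considered; Li atoms are left untouched.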
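The README's expansion-factor step converts each occupancy to a fraction (0.5 becomes 1/2) and then combines the fractions. The sketch below shows one way this might be done, under the assumption that the factor wanted is the least common multiple of the denominators, which the README does not state explicitly. The function names and the `max_denominator` cutoff are hypothetical and do not come from the repository.

```python
from fractions import Fraction
from math import lcm  # available in Python 3.9+, so fine for the 3.11 screen env


def occupancy_fractions(occupancies, max_denominator=12):
    """Map each occupancy value to the nearest small fraction.

    max_denominator is an assumed cutoff so that rounded CIF values
    such as 0.333 are read as 1/3 rather than 333/1000.
    """
    return {occ: Fraction(occ).limit_denominator(max_denominator)
            for occ in occupancies}


def supercell_multiplier(occupancies, max_denominator=12):
    """Smallest cell multiple in which every occupancy is an integer count
    (assumption: this is the LCM of the fraction denominators)."""
    fracs = occupancy_fractions(occupancies, max_denominator)
    return lcm(*(f.denominator for f in fracs.values()))


if __name__ == "__main__":
    occs = [0.5, 0.333, 0.25]
    print(occupancy_fractions(occs))
    print(supercell_multiplier(occs))
```

The multiplier found this way would still need to be split across the x, y and z directions according to symmetry, which is the separate step the README describes.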
--- a/tool/Li/Br.yaml
+++ b/tool/Li/Br.yaml
--- a/tool/Li/Cl.yaml
+++ b/tool/Li/Cl.yaml
--- a/tool/Li/O.yaml
+++ b/tool/Li/O.yaml
--- a/tool/Li/S.yaml
+++ b/tool/Li/S.yaml
Author	SHA1	Message	Date
koko	da26e0c619	CSM and TET, CS	2025-12-14 12:57:34 +08:00
koko	cea5ab6d3f	CSM and TET, CS	2025-12-07 22:30:46 +08:00
koko	e885893484	CSM and TET, CS	2025-12-07 22:19:50 +08:00
koko	3d44b31194	CSM and TET, CS	2025-12-07 20:08:19 +08:00
koko	08f5a51fc4	Start adding CSM value calculation	2025-12-07 17:55:25 +08:00
koko	1d416d4dd8	calc—_v2	2025-12-07 16:01:42 +08:00
koko	b9da6d9592	calc—_v1	2025-12-07 15:22:36 +08:00
koko	35a4bf640f	calc	2025-12-07 14:02:24 +08:00