---
license: mit
language: en
tags:
- materials science
- synthesis prediction
- lightgbm
- cheminformatics
datasets: []
metrics:
- accuracy
- f1
---

# Synthesis Condition Predictor

This model predicts temperature bins and atmosphere categories for inorganic material synthesis. It was trained on a dataset of text-mined synthesis procedures, described in Kononova et al.: https://www.nature.com/articles/s41597-019-0224-1

**Models Included:**

* Temperature Bin Prediction (LightGBM)
* Atmosphere Category Prediction (LightGBM)

**Intended Use:**

To assist researchers in designing synthesis experiments by predicting key process parameters. Provide a target material, its precursors, and basic operational details to get predictions.

**How to Use:**

```python
# Ensure the inference script and its dependencies are on the PYTHONPATH.
# If the repo is importable as a package, use this import instead:
# from synthesis_predictor_hf_repo.src.inference import predict_synthesis_outcome, load_all_artifacts_once
# If running from a cloned repo where 'src' is a subdirectory:
from src.inference import predict_synthesis_outcome, load_all_artifacts_once

# Load the model artifacts once before requesting predictions.
if not load_all_artifacts_once():
    print("Failed to load model artifacts.")
else:
    raw_input_example = {
        'target_formula_raw': "YBa2Cu3O7",
        'precursor_formulas_raw': ["Y2O3", "BaCO3", "CuO"],
        'operations_simplified_list': [
            {'type': 'MixingOperation', 'string': 'Ball milling for 2h',
             'conditions': {'duration': [{'value': 2, 'unit': 'h'}]}},
            {'type': 'HeatingOperation', 'string': 'Calcined at 920C for 10h in air',
             'conditions': {'heating_temperature': [{'value': 920}],
                            'heating_time': [{'value': 10}], 'atmosphere': 'air'}},
            {'type': 'HeatingOperation', 'string': 'Sintered at 950C for 20h in O2',
             'conditions': {'heating_temperature': [{'value': 950}],
                            'heating_time': [{'value': 20}], 'atmosphere': 'Oxygen'}}
        ],
        'reactants_coeffs': [("Y2O3", 0.5), ("BaCO3", 2.0), ("CuO", 3.0)],  # Example, adjust as needed
        'products_coeffs': [("YBa2Cu3O7", 1.0)]  # Example
    }
    predictions = predict_synthesis_outcome(raw_input_example)
    print(predictions)
```

**Limitations:**

* Test-set accuracy is about 68% for the temperature bin model and about 72% for the atmosphere category model.
* Predictions are based on patterns in the training data and may not generalize to all chemical systems.
* The feature engineering for process parameters in the inference script relies on the user providing an `operations_simplified_list` that the internal logic can parse. The quality of these inputs directly affects prediction accuracy.

**Training Data:**

The model was trained on a proprietary dataset of text-mined inorganic synthesis procedures (Kononova et al., https://www.nature.com/articles/s41597-019-0224-1).

**Evaluation Results:**

The models were evaluated on a hold-out test set.

**1. Tuned Temperature Bin Prediction Model:**

* **Overall Test Set Accuracy:** 0.6821
* **Overall Test Set F1 Score (Weighted):** 0.6785
* **Per-Class Performance (Test Set):**

```
                          precision  recall  f1-score  support
TempBin_1_(1_to_900]           0.77    0.79      0.78      954
TempBin_2_(900_to_1100]        0.62    0.53      0.57      743
TempBin_3_(1100_to_1300]       0.58    0.58      0.58      768
TempBin_4_(1300_to_3000]       0.72    0.80      0.76      715

accuracy                                         0.68     3180
macro avg                      0.67    0.68      0.67     3180
weighted avg                   0.68    0.68      0.68     3180
```

**2. Tuned Atmosphere Category Prediction Model:**

* **Overall Test Set Accuracy:** 0.7193
* **Overall Test Set F1 Score (Weighted):** 0.7174
* **Per-Class Performance (Test Set):**

```
                      precision  recall  f1-score  support
Inert                      0.59    0.38      0.46      139
Other_Atm_Target           1.00    0.44      0.62        9
Oxidizing                  0.67    0.71      0.69     1552
Reducing                   0.70    0.47      0.56      100
Unknown_Atm_Category       0.76    0.76      0.76     2098

accuracy                                     0.72     3898
macro avg                  0.74    0.55      0.62     3898
weighted avg               0.72    0.72      0.72     3898
```
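
**Reading the Macro vs. Weighted Averages:**

The atmosphere report shows a macro-average recall (0.55) well below the weighted average (0.72), which reflects class imbalance. The sketch below recomputes both averages from the per-class recall and support values in the report. It is an illustration only, not part of the inference code; the names `atmosphere_report`, `macro_average`, and `weighted_average` are hypothetical.

```python
# Per-class (recall, support) pairs copied from the atmosphere classification report.
atmosphere_report = {
    "Inert": (0.38, 139),
    "Other_Atm_Target": (0.44, 9),
    "Oxidizing": (0.71, 1552),
    "Reducing": (0.47, 100),
    "Unknown_Atm_Category": (0.76, 2098),
}


def macro_average(report):
    """Unweighted mean of the per-class scores: every class counts equally."""
    return sum(score for score, _ in report.values()) / len(report)


def weighted_average(report):
    """Mean of the per-class scores weighted by each class's support."""
    total_support = sum(support for _, support in report.values())
    weighted_sum = sum(score * support for score, support in report.values())
    return weighted_sum / total_support


macro_recall = macro_average(atmosphere_report)
weighted_recall = weighted_average(atmosphere_report)

print(f"macro recall:    {macro_recall:.2f}")     # 0.55
print(f"weighted recall: {weighted_recall:.2f}")  # 0.72
```

Support-weighted recall is the same quantity as overall accuracy, so it is dominated by the two large classes (`Oxidizing` and `Unknown_Atm_Category`). The macro average gives equal weight to the small `Inert`, `Reducing`, and `Other_Atm_Target` classes, where recall is below 0.5, so it is the more informative number when a minority atmosphere matters.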