---
title: Path_Planning_evaluate
datasets:
- GeoBenchmark
tags:
- evaluate
- metric
description: 'TODO: add a description here'
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
---

# Metric Card for Path_Planning_evaluate

This metric evaluates path planning tasks in which a language model has to generate a valid path from a starting point to one or more end points in a grid while avoiding all obstacles.

## Metric Description

This metric evaluates path planning tasks in which a language model has to generate a valid path from a starting point to one or more end points in a grid while avoiding all obstacles.

## How to Use

This metric takes 5 mandatory arguments:

- `generations`: a list of strings, the paths generated by the model.
- `golds`: a list of lists of `(x, y)` coordinate tuples, the gold paths.
- `obstacles`: a list of lists of `(x, y)` coordinate tuples, the obstacle coordinates for each question.
- `ends`: a list of lists of `(x, y)` coordinate tuples, the end-point coordinates for each question.
- `n`: a list of integers, the grid size for each question.

```python
import evaluate

pp_eval = evaluate.load("rfr2003/path_planning_evaluate")
results = pp_eval.compute(
    generations=['[(0,0), (0,1), (1,1)]', '[(0,0), (1,0), (1,1)]', '[(0,0), (1,0), (1,1), (0,1)]', '(0,0'],
    golds=[[(0,0), (0,1), (1,1)], [(0,0), (0,1), (1,1)], [(0,0), (0,1)], []],
    obstacles=[[(1,0)], [(1,0)], [], []],
    ends=[[(1,1)], [(1,1)], [(0,1)], [(0,1)]],
    n=[2, 2, 2, 2]
)
print(results)
# {'compliance_ratio': 0.75, 'success_ratio': 0.6666666666666666, 'optimal_ratio': 0.3333333333333333, 'feasible_ratio': 0.6666666666666666, 'distance': 0, 'unreachable_acc': 1.0}
```

This metric doesn't take any optional arguments.

### Output Values

This metric outputs a dictionary with the following values:

- `compliance_ratio`: the ratio of `generations` that comply with the list format across all questions; ranges from 0.0 to 1.0.
- `feasible_ratio`: the ratio of `generations` that are feasible among all reachable questions; ranges from 0.0 to 1.0.
- `success_ratio`: the ratio of `generations` that are correct among all reachable questions; ranges from 0.0 to 1.0.
- `optimal_ratio`: the ratio of `generations` that are optimal among all reachable questions; ranges from 0.0 to 1.0.
- `distance`: the mean distance to the end point for feasible paths that are not correct; a non-negative real number.
- `unreachable_acc`: the ratio of detected unreachable paths among all unreachable paths; ranges from 0.0 to 1.0.

#### Values from Popular Papers

### Examples

```python
import evaluate

pp_eval = evaluate.load("rfr2003/path_planning_evaluate")
results = pp_eval.compute(
    generations=['[(0,0), (0,1), (1,1)]', '[(0,0), (1,0), (1,1)]', '[(0,0), (1,0), (1,1), (0,1)]', '(0,0'],
    golds=[[(0,0), (0,1), (1,1)], [(0,0), (0,1), (1,1)], [(0,0), (0,1)], []],
    obstacles=[[(1,0)], [(1,0)], [], []],
    ends=[[(1,1)], [(1,1)], [(0,1)], [(0,1)]],
    n=[2, 2, 2, 2]
)
print(results)
# {'compliance_ratio': 0.75, 'success_ratio': 0.6666666666666666, 'optimal_ratio': 0.3333333333333333, 'feasible_ratio': 0.6666666666666666, 'distance': 0, 'unreachable_acc': 1.0}
```

## Limitations and Bias

## Citation

## Further References
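As an illustration of the kind of check behind a feasibility score, here is a minimal sketch of a path-feasibility test on an `n × n` grid. This is not the metric's actual implementation, and the function name `is_feasible` is hypothetical; it only shows the idea: every step must stay on the grid, avoid obstacles, and move exactly one cell horizontally or vertically.

```python
def is_feasible(path, obstacles, n):
    """Sketch of a feasibility check (hypothetical helper, not the metric's code).

    A path is feasible if it is non-empty, every cell lies inside the
    n x n grid and off the obstacles, and consecutive cells are
    adjacent (one step up, down, left, or right).
    """
    if not path:
        return False
    obstacle_set = set(obstacles)
    # Every visited cell must be on the grid and not an obstacle.
    for (x, y) in path:
        if not (0 <= x < n and 0 <= y < n) or (x, y) in obstacle_set:
            return False
    # Consecutive cells must differ by exactly one unit step.
    for (x1, y1), (x2, y2) in zip(path, path[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            return False
    return True
```

For instance, with the first question of the example above, `is_feasible([(0, 0), (0, 1), (1, 1)], [(1, 0)], 2)` holds, while a path stepping onto the obstacle `(1, 0)` or jumping diagonally would not.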