This project is not yet taken by anyone.
It might, however, be that I don't have any capacity left in general. Please check the main page on research projects to see whether I'm maxed out. (I don't state that here so that I don't have to update all my project descriptions in case I stop accepting more students.)
TLDR:
In a nutshell, your task is to run multiple planners multiple times with different random seeds and to evaluate the resulting performance differences, thus showing the impact of randomness. In a second step, depending on the outcome, this should be exploited by restarting the respective planner after a certain (well-chosen) time. The usefulness of restarts should be put into perspective by investigating other problem-solving techniques, such as SAT solving.
The slightly longer version:
The performance of planners that perform search depends on many factors, such as dead-end detectors, the search algorithm, and the quality of heuristics. But if the search space contains millions (or more) of search nodes, some decisions -- e.g., which node to select next -- are bound not to be unique. Some programs are made deterministic artificially by choosing the newest or oldest node as a tie-breaker (although this can be regarded as a questionable strategy, since such a tie-breaker does not always carry clear semantics). Even if such a strategy is not deployed, we still need deterministic results to make programs comparable. It would be quite bad if we tried to reproduce results reported in a scientific paper and got very different outcomes -- just due to randomness. Thus, in such cases programs are "artificially" made deterministic by always making the same random decision whenever multiple equivalent options exist. This is done with a random seed, which can often be specified as an input parameter (or is otherwise fixed in the code).
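Just to illustrate how such a seed is typically passed in, here is a minimal Python sketch; the planner binary ./my-planner and its --random-seed flag are made-up placeholders, not the interface of any particular system:

```python
import subprocess

def run_planner(domain_file, problem_file, seed, timeout=1800):
    # NOTE: "./my-planner" and the "--random-seed" flag are hypothetical
    # placeholders; real planners expose the seed in planner-specific ways
    # (command-line option, configuration string, or a constant in the code).
    cmd = ["./my-planner", "--random-seed", str(seed), domain_file, problem_file]
    return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
```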
Since a search procedure (including the chosen heuristic and all other parameters) practically never leads to unique decisions throughout the entire search, it would be good practice to call each planner many times and report the average as well as the standard deviation. Since time is always limited, nobody ever does this (sadly) -- the experiments would simply take too long. So, currently, it is just not known how well a given search configuration actually performs on average, and what impact randomness has.
This project aims to close this knowledge gap. The task is thus to conduct a comprehensive empirical evaluation on a large range of planning problems with multiple state-of-the-art planners. Both classical (non-hierarchical) as well as hierarchical planning systems should be considered. The research may focus on planning via search, but other approaches, such as planning via compilation to SAT, can also be evaluated. In this evaluation, each planner is called many times, so that the mean and standard deviation (most importantly of the number of generated search nodes) can be reported.
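To give an idea of the kind of aggregation meant here, a minimal Python sketch could look as follows; the callback generated_nodes is a hypothetical helper that runs one planner configuration with a given seed and parses the node count from its output:

```python
import statistics

def evaluate_randomness(domain_file, problem_file, seeds, generated_nodes):
    # `generated_nodes` is a hypothetical callback: it runs one planner
    # configuration with the given seed and returns the parsed count of
    # generated search nodes for that run.
    counts = [generated_nodes(domain_file, problem_file, s) for s in seeds]
    return statistics.mean(counts), statistics.stdev(counts)

if __name__ == "__main__":
    import random
    # Stand-in for a real planner run: node counts drawn from a seeded RNG.
    fake = lambda dom, prob, seed: random.Random(seed).randint(10_000, 1_000_000)
    mean, std = evaluate_randomness("domain.pddl", "problem.pddl", range(30), fake)
    print(f"generated nodes: mean = {mean:.0f}, std = {std:.0f}")
```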
Depending on these values, it shall be investigated whether restarts might be beneficial. That is, rather than investing more time into the current search episode, it could be cancelled and started again with a different random seed, in the hope that different (random) choices might solve the problem more quickly than before. In principle, this can also be investigated in combination with iterative deepening search, so that different planning episodes have different search horizons (search depths) available.
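One simple restart policy could be sketched as follows; try_solve is a hypothetical function that runs a single planning episode with a given seed and time budget:

```python
import random
import time

def solve_with_restarts(try_solve, episode_budget, total_budget):
    # `try_solve(seed, time_budget)` is a hypothetical function that runs a
    # single planning episode with the given random seed and time budget and
    # returns a plan, or None if the episode timed out or failed.
    deadline = time.monotonic() + total_budget
    rng = random.Random(42)  # seed of the seed generator, for reproducibility
    while time.monotonic() < deadline:
        remaining = deadline - time.monotonic()
        plan = try_solve(rng.randrange(2**31), min(episode_budget, remaining))
        if plan is not None:
            return plan
    return None  # overall budget exhausted without finding a plan
```

An iterative-deepening variant would additionally grow a depth bound from one episode to the next, which this sketch does not model.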
Your task is: