algbench package
Subpackages
Submodules
algbench.benchmark module
- class algbench.benchmark.Benchmark(path: str, save_output: bool = True, hide_output: bool = True, save_output_with_time: bool = True)[source]
Bases: object
This is the heart of the library. It allows you to run, save, and load a benchmark.
The function add runs a configuration if it is not already in the database. You can also split this into a check (exists) and run, which may be advisable if you want to distribute the execution.
The following functions are thread-safe:
exists
run
add
insert
front
capture_logger
unlink_logger
__iter__
Don’t call any of the other functions while the benchmark is running. It could lead to data loss.
- __init__(path: str, save_output: bool = True, hide_output: bool = True, save_output_with_time: bool = True) None[source]
Just specify the path where the database should be stored; everything else is handled automatically. Make sure not to use the same path for different databases, as their entries would get mixed.
- Parameters:
path – The path to the database.
save_output – If true, all output (stdout and stderr) will be saved. If set to false, the output will be discarded. This is useful if you have a lot of output and don’t want to waste disk space. However, you will not be able to see the output of the algorithm afterwards. Note that the output can only be saved if the code acquires the Python sys.stdout and sys.stderr streams during the execution, as the corresponding streams are replaced by the benchmark. Normal print statements do so, but logging.StreamHandler does not. For the latter, use Benchmark.capture_logger.
hide_output – If true, all output (stdout and stderr) will be hidden. This is useful if you have a lot of output and don’t want to clutter your console. However, you will not be able to see the output of the algorithm while it is running. Code that acquired handles to the Python sys.stdout and sys.stderr streams earlier will still be able to print to the console, as these handles circumvent the replacement.
save_output_with_time – If true, all output (stdout and stderr) will be saved with the time it was written. This gives you more insights on the runtime of the algorithm, but also increases the size of the database. This option is ignored if save_output is set to false.
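The difference between print and an earlier-created logging.StreamHandler can be reproduced with the standard library alone. The following is a minimal sketch, assuming the benchmark replaces sys.stdout in a way comparable to contextlib.redirect_stdout; the two StringIO objects and the logger name are stand-ins chosen for illustration and are not part of algbench.

```python
import contextlib
import io
import logging
import sys


def demo_stream_replacement():
    console = io.StringIO()   # stand-in for the real console stream
    captured = io.StringIO()  # stand-in for the stream installed during a run

    with contextlib.redirect_stdout(console):
        # The handler stores a reference to the *current* sys.stdout object.
        handler = logging.StreamHandler(sys.stdout)
        logger = logging.getLogger("demo_stream_replacement")
        logger.setLevel(logging.INFO)
        logger.propagate = False
        logger.addHandler(handler)
        try:
            # Replace sys.stdout, as a benchmark run is assumed to do.
            with contextlib.redirect_stdout(captured):
                print("from print")         # looks up sys.stdout at call time
                logger.info("from logger")  # writes to the stored stream
        finally:
            logger.removeHandler(handler)

    return console.getvalue(), captured.getvalue()


console_text, captured_text = demo_stream_replacement()
```

Here the print output ends up in the replacement stream, while the handler still writes to the stream it acquired earlier. This is the situation in which capture_logger is the suggested alternative.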
- capture_logger(logger_name: str, level=0)[source]
Capture the logs of a logger of the Python logging module. This allows you to precisely control which logs you want to capture. Prefer logging to stdout/stderr, as just using print will not allow you to control the output of sub-algorithms. The logging module also allows you to search more easily for specific log entries, if used correctly. However, it is more expensive than just using print, as more metadata is created. Don’t overuse it; only log important events in the algorithm.
- Parameters:
logger_name – The name of the logger to capture.
level – The level of the logger to capture. The logger will automatically be set to this level while capturing and reset afterwards. NOTSET will not change the level.
- Returns:
None
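The level handling described for the level parameter (set while capturing, reset afterwards, NOTSET leaves it untouched) could look roughly like the sketch below. The helper temporary_level and the logger name are hypothetical and only illustrate the described behavior with the standard logging module; this is not the algbench implementation.

```python
import contextlib
import logging


@contextlib.contextmanager
def temporary_level(logger_name, level=logging.NOTSET):
    """Hypothetical helper: set a logger's level for the duration of a block."""
    logger = logging.getLogger(logger_name)
    previous = logger.level
    if level != logging.NOTSET:
        # NOTSET (0) means: leave the logger's level as it is.
        logger.setLevel(level)
    try:
        yield logger
    finally:
        # Restore the level that was active before capturing.
        logger.setLevel(previous)


demo_logger = logging.getLogger("demo_temporary_level")
demo_logger.setLevel(logging.WARNING)

with temporary_level("demo_temporary_level", logging.DEBUG) as lg:
    level_inside = lg.level

level_after = demo_logger.level
```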
- unlink_logger(logger_name: str)[source]
Stop capturing the logs of a logger of the Python logging module while the benchmark is running.
- exists(func: Callable, *args, **kwargs) bool[source]
Use this function to check whether an entry already exists and thus does not have to be run again. If you want to have multiple samples, add a sample index argument.
Caveat: This function may have false negatives, i.e., it may report that an entry does not exist although it does (only for fresh data).
- run(func: Callable, *args, **kwargs)[source]
Will add the function call with the arguments to the benchmark.
The output of stdout and stderr will be captured and stored, but not printed to the console.
- add(func: Callable, *args, **kwargs)[source]
Will add the function call with the arguments to the benchmark if not yet contained.
Combination of check and run. Will only call run if the arguments are not yet in the benchmark.
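The relationship between exists, run, and add, and the role of a sample index argument, can be illustrated with a small in-memory stand-in. TinyBench below is a hypothetical class written for this sketch; it assumes calls are identified by function name and arguments and does not reflect how algbench stores or fingerprints entries.

```python
import json


class TinyBench:
    """Hypothetical in-memory stand-in, not the algbench implementation."""

    def __init__(self):
        self._results = {}

    def _key(self, func, args, kwargs):
        # Identify a call by function name and JSON-encoded arguments.
        return json.dumps([func.__name__, args, kwargs], sort_keys=True)

    def exists(self, func, *args, **kwargs):
        return self._key(func, args, kwargs) in self._results

    def run(self, func, *args, **kwargs):
        self._results[self._key(func, args, kwargs)] = func(*args, **kwargs)

    def add(self, func, *args, **kwargs):
        # add is the combination of the check and run.
        if not self.exists(func, *args, **kwargs):
            self.run(func, *args, **kwargs)


calls = []


def solve(n, sample=0):
    calls.append((n, sample))
    return {"value": n * n}


bench = TinyBench()
bench.add(solve, 3, sample=0)
bench.add(solve, 3, sample=0)  # skipped: same arguments already present
bench.add(solve, 3, sample=1)  # runs: the sample index makes it a new entry
```

Because the sample index is part of the arguments, repeated measurements of the same configuration are stored as separate entries instead of being skipped.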
- delete()[source]
Delete the benchmark and all its files. Do not use it afterwards, as there are no files left to write results into. If you just want to delete the content, use clear.
NOT THREAD-SAFE!
- front() Dict | None[source]
Return the first entry of the benchmark. Useful for checking its content.
- clear()[source]
Clears all entries of the benchmark, without deleting the benchmark itself. You can continue to use it afterwards.
NOT THREAD-SAFE!
- delete_if(condition: Callable[[Dict], bool])[source]
Delete entries for which the condition returns True. The internal ‘results’ folder is recreated for this purpose. Use front to get a preview of what an entry passed to the condition looks like.
NOT THREAD-SAFE!
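A condition for delete_if is an ordinary function from an entry dict to bool. The sketch below assumes entries carry a "runtime" key, as in the read_as_pandas example on this page; the function is_slow, the threshold, and the sample entries are made up for illustration, and the list comprehension only mimics the effect on plain dicts.

```python
def is_slow(entry, limit_seconds=60.0):
    """Hypothetical condition: returning True marks the entry for deletion."""
    return entry.get("runtime", 0.0) > limit_seconds


# Made-up entries that only mimic the assumed layout.
entries = [
    {"runtime": 1.5, "result": {"n_colors": 4}},
    {"runtime": 120.0, "result": {"n_colors": 3}},
]

# Mimic the effect of delete_if on a plain list.
kept = [entry for entry in entries if not is_slow(entry)]
```

With a real benchmark object, such a function would be passed as the condition argument, for example benchmark.delete_if(is_slow).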
- apply(func: Callable[[Dict], Dict | None])[source]
Allows you to modify all entries (in place!) inside this benchmark, based on the provided callable. The callable is called for every entry in the database, and the returned entry is stored instead. If None is returned, the entry is deleted from the database.
NOT THREAD-SAFE, execute this while no other instance is accessing the file system.
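The contract of the callable passed to apply (return the entry to store, or None to delete it) can be sketched on a plain list. The helper apply_to_list and the "stdout" key are assumptions made for this illustration; only the return-value semantics are taken from the description above.

```python
def drop_output(entry):
    """Hypothetical callable: remove stored output, delete entries without result."""
    if entry.get("result") is None:
        return None  # None means: delete this entry
    updated = dict(entry)
    updated.pop("stdout", None)  # "stdout" is an assumed key name
    return updated


def apply_to_list(entries, func):
    """Stand-in that mimics the documented semantics on a plain list."""
    stored = []
    for entry in entries:
        new_entry = func(entry)
        if new_entry is not None:
            stored.append(new_entry)
    return stored


entries = [
    {"result": {"n_colors": 4}, "stdout": "log text"},
    {"result": None, "stdout": ""},
]
modified = apply_to_list(entries, drop_output)
```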
algbench.benchmark_db module
algbench.environment module
Gathers information about the environment the code is running in.
- algbench.environment.get_git_revision() str | None[source]
Return the git revision of the current working directory.
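One common way to obtain such a revision is to ask the git command line tool. The sketch below is an assumption about how a function with this signature might behave; git_revision is a hypothetical name, and nothing here is taken from the algbench source.

```python
import subprocess


def git_revision(cwd=None):
    """Hypothetical sketch: return the current commit hash, or None if unavailable."""
    try:
        completed = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=cwd,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, or the directory is not inside a repository.
        return None
    return completed.stdout.strip() or None


revision = git_revision()
```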
algbench.fingerprint module
algbench.pandas module
- algbench.pandas.read_as_pandas(path: str, row_creator: Callable[[Dict], Dict | None]) pandas.DataFrame[source]
Read the benchmark as a pandas table. For this, you have to tell the function which data should go into which column. If you want to skip an entry, return None (or an empty dict) in the row_creator.
An example could look like this:
t = read_as_pandas(
    "./03_benchmark_data/",
    lambda result: {
        "instance": result["parameters"]["args"]["instance_name"],
        "strategy": result["parameters"]["args"]["alg_params"]["strategy"],
        "interchange": result["parameters"]["args"]["alg_params"].get(
            "interchange", None
        ),
        "colors": result["result"]["n_colors"],
        "runtime": result["runtime"],
        "num_vertices": result["result"]["num_vertices"],
        "num_edges": result["result"]["num_edges"],
    },
)
- Parameters:
path – Path to the benchmark
row_creator – Function that creates a row from an entry
- Returns:
Pandas DataFrame
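For anything beyond a one-liner, the row_creator can be written as a named function, which also makes it easy to skip entries. The sketch below reuses the keys from the example above; the function name, the skip rule, and the sample entry are made up for illustration, and the function is exercised on a plain dict rather than on a real benchmark.

```python
def row_creator(entry):
    """Hypothetical row creator: one flat dict per entry, or None to skip."""
    result = entry.get("result")
    if result is None:
        return None  # skip entries without a result
    args = entry["parameters"]["args"]
    return {
        "instance": args["instance_name"],
        "strategy": args["alg_params"]["strategy"],
        "colors": result["n_colors"],
        "runtime": entry["runtime"],
    }


# Made-up entry that only mimics the layout used in the example above.
sample_entry = {
    "parameters": {
        "args": {"instance_name": "graph_01", "alg_params": {"strategy": "dsatur"}}
    },
    "result": {"n_colors": 4},
    "runtime": 0.25,
}

row = row_creator(sample_entry)
skipped = row_creator({"parameters": {"args": {}}, "result": None})
```

Such a function would then be passed as the second argument of read_as_pandas in place of the lambda.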
Module contents
AlgBench is designed to perform benchmarks on algorithms. It saves a lot of the information automatically, reducing the usual boilerplate code.
from algbench import Benchmark

benchmark = Benchmark("./test_benchmark")

def f(x, _test=2, default="default"):
    print("Run Algorithm!")
    x = x + x
    return {"r1": x, "r2": "test"}

benchmark.add(f, 1, _test=None)
benchmark.add(f, 2)
benchmark.add(f, 3, _test=None)
benchmark.compress()
for entry in benchmark:
    print(entry["parameters"], entry["result"])
benchmark.delete()
- class algbench.Benchmark(path: str, save_output: bool = True, hide_output: bool = True, save_output_with_time: bool = True)[source]
Bases: object
This is the heart of the library. It allows you to run, save, and load a benchmark.
The function add runs a configuration if it is not already in the database. You can also split this into a check (exists) and run, which may be advisable if you want to distribute the execution.
The following functions are thread-safe:
exists
run
add
insert
front
capture_logger
unlink_logger
__iter__
Don’t call any of the other functions while the benchmark is running. It could lead to data loss.
- __init__(path: str, save_output: bool = True, hide_output: bool = True, save_output_with_time: bool = True) None[source]
Just specify the path where the database should be stored; everything else is handled automatically. Make sure not to use the same path for different databases, as their entries would get mixed.
- Parameters:
path – The path to the database.
save_output – If true, all output (stdout and stderr) will be saved. If set to false, the output will be discarded. This is useful if you have a lot of output and don’t want to waste disk space. However, you will not be able to see the output of the algorithm afterwards. Note that the output can only be saved if the code acquires the Python sys.stdout and sys.stderr streams during the execution, as the corresponding streams are replaced by the benchmark. Normal print statements do so, but logging.StreamHandler does not. For the latter, use Benchmark.capture_logger.
hide_output – If true, all output (stdout and stderr) will be hidden. This is useful if you have a lot of output and don’t want to clutter your console. However, you will not be able to see the output of the algorithm while it is running. Code that acquired handles to the Python sys.stdout and sys.stderr streams earlier will still be able to print to the console, as these handles circumvent the replacement.
save_output_with_time – If true, all output (stdout and stderr) will be saved with the time it was written. This gives you more insights on the runtime of the algorithm, but also increases the size of the database. This option is ignored if save_output is set to false.
- capture_logger(logger_name: str, level=0)[source]
Capture the logs of a logger of the Python logging module. This allows you to precisely control which logs you want to capture. Prefer logging to stdout/stderr, as just using print will not allow you to control the output of sub-algorithms. The logging module also allows you to search more easily for specific log entries, if used correctly. However, it is more expensive than just using print, as more metadata is created. Don’t overuse it; only log important events in the algorithm.
- Parameters:
logger_name – The name of the logger to capture.
level – The level of the logger to capture. The logger will automatically be set to this level while capturing and reset afterwards. NOTSET will not change the level.
- Returns:
None
- unlink_logger(logger_name: str)[source]
Stop capturing the logs of a logger of the Python logging module while the benchmark is running.
- exists(func: Callable, *args, **kwargs) bool[source]
Use this function to check whether an entry already exists and thus does not have to be run again. If you want to have multiple samples, add a sample index argument.
Caveat: This function may have false negatives, i.e., it may report that an entry does not exist although it does (only for fresh data).
- run(func: Callable, *args, **kwargs)[source]
Will add the function call with the arguments to the benchmark.
The output of stdout and stderr will be captured and stored, but not printed to the console.
- add(func: Callable, *args, **kwargs)[source]
Will add the function call with the arguments to the benchmark if not yet contained.
Combination of check and run. Will only call run if the arguments are not yet in the benchmark.
- delete()[source]
Delete the benchmark and all its files. Do not use it afterwards, as there are no files left to write results into. If you just want to delete the content, use clear.
NOT THREAD-SAFE!
- front() Dict | None[source]
Return the first entry of the benchmark. Useful for checking its content.
- clear()[source]
Clears all entries of the benchmark, without deleting the benchmark itself. You can continue to use it afterwards.
NOT THREAD-SAFE!
- delete_if(condition: Callable[[Dict], bool])[source]
Delete entries for which the condition returns True. The internal ‘results’ folder is recreated for this purpose. Use front to get a preview of what an entry passed to the condition looks like.
NOT THREAD-SAFE!
- apply(func: Callable[[Dict], Dict | None])[source]
Allows you to modify all entries (in place!) inside this benchmark, based on the provided callable. The callable is called for every entry in the database, and the returned entry is stored instead. If None is returned, the entry is deleted from the database.
NOT THREAD-SAFE, execute this while no other instance is accessing the file system.
- algbench.read_as_pandas(path: str, row_creator: Callable[[Dict], Dict | None]) pandas.DataFrame[source]
Read the benchmark as a pandas table. For this, you have to tell the function which data should go into which column. If you want to skip an entry, return None (or an empty dict) in the row_creator.
An example could look like this:
t = read_as_pandas(
    "./03_benchmark_data/",
    lambda result: {
        "instance": result["parameters"]["args"]["instance_name"],
        "strategy": result["parameters"]["args"]["alg_params"]["strategy"],
        "interchange": result["parameters"]["args"]["alg_params"].get(
            "interchange", None
        ),
        "colors": result["result"]["n_colors"],
        "runtime": result["runtime"],
        "num_vertices": result["result"]["num_vertices"],
        "num_edges": result["result"]["num_edges"],
    },
)
- Parameters:
path – Path to the benchmark
row_creator – Function that creates a row from an entry
- Returns:
Pandas DataFrame
- class algbench.JsonLogHandler(level=0)[source]
Bases: Handler
A logging handler that stores log entries in a list of JSON compatible dictionaries.
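The idea of a handler that collects log records as JSON compatible dictionaries can be sketched with the standard logging module. ListLogHandler and the chosen dictionary keys are hypothetical and serve only as an illustration of the concept; the fields stored by JsonLogHandler may differ.

```python
import json
import logging


class ListLogHandler(logging.Handler):
    """Hypothetical handler that keeps records as JSON compatible dicts."""

    def __init__(self, level=logging.NOTSET):
        super().__init__(level)
        self.entries = []

    def emit(self, record):
        # Keep only plain strings and numbers so the entry can be serialized.
        self.entries.append(
            {
                "name": record.name,
                "level": record.levelname,
                "message": record.getMessage(),
                "created": record.created,
            }
        )


logger = logging.getLogger("demo_list_handler")
logger.setLevel(logging.INFO)
logger.propagate = False

handler = ListLogHandler()
logger.addHandler(handler)
logger.info("found %d colors", 4)
logger.debug("below the logger level, not recorded")
logger.removeHandler(handler)

entries = handler.entries
encoded = json.dumps(entries)  # succeeds because all values are JSON compatible
```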
- class algbench.JsonLogCapture(logger_name: str, level=0, handler: JsonLogHandler | None = None)[source]
Bases: object
A context manager that captures logs and returns them as a list of JSON
- __init__(logger_name: str, level=0, handler: JsonLogHandler | None = None) None[source]
- Parameters:
logger_name – The name of the logger to catch.
level – The level of the logger to catch.