From a1edccd6474e16c35ad0992f18bc02d03d4e8365 Mon Sep 17 00:00:00 2001
From: Zihao Yu <yuzihao@ict.ac.cn>
Date: Mon, 28 Sep 2020 13:42:17 +0800
Subject: [PATCH] benchmarks: readd README.md

---
 .gitignore                      |   1 +
 benchmarks/coremark/README.md   | 231 ++++++++++++++++++++++++++++++++
 benchmarks/microbench/README.md |  69 ++++++++++
 3 files changed, 301 insertions(+)
 create mode 100644 benchmarks/coremark/README.md
 create mode 100644 benchmarks/microbench/README.md

diff --git a/.gitignore b/.gitignore
index 8653c2c..5170a12 100644
--- a/.gitignore
+++ b/.gitignore
@@ -6,6 +6,7 @@
 !*.S
 !Makefile
 !README
+!README.md
 !LICENSE
 .*
 _*
diff --git a/benchmarks/coremark/README.md b/benchmarks/coremark/README.md
new file mode 100644
index 0000000..209e4c2
--- /dev/null
+++ b/benchmarks/coremark/README.md
@@ -0,0 +1,231 @@
+# CoreMark
+
+'''
+File: CoreMark
+
+Topic: Welcome
+Copyright © 2009 EEMBC All rights reserved.
+CoreMark is a trademark of EEMBC and EEMBC is a registered trademark of the Embedded Microprocessor Benchmark Consortium.
+
+CoreMark's primary goals are simplicity and providing a method for testing only a processor's core features.
+
+For more information about EEMBC's comprehensive embedded benchmark suites, please see www.eembc.org.
+
+Topic: Building and running
+	Download the release files from www.coremark.org.
+	You can verify the download using the coremark_<version>.md5 file
+	> md5sum -c coremark_<version>.md5
+	
+	Unpack the distribution (tar -vzxf coremark_<version>.tgz && tar -vzxf coremark_<version>_docs.tgz) 
+	then change to the coremark_<version> folder.
+	
+	To build and run the benchmark, type 
+	> make
+	Full results are available in the files run1.log and run2.log.
+	CoreMark result can be found in run1.log.
+	
+	For self hosted Linux or Cygwin platforms, a simple make should work.
+	
+	Cross Compile:
+	For cross compile platforms please adjust <core_portme.mak>, <core_portme.h> (and possibly <core_portme.c>) 
+	according to the specific platform used.
+	When porting to a new platform, it is recommended to copy one of the default port folders 
+	(e.g. mkdir <platform> && cp linux/* <platform>), adjust the porting files, and run 
+	> make PORT_DIR=<platform>
+	
+	Systems without make:
+	The following files need to be compiled:
+	- <core_list_join.c> 
+	- <core_main.c> 
+	- <core_matrix.c> 
+	- <core_state.c>	
+	- <core_util.c>	
+	- <PORT_DIR>/<core_portme.c>
+	
+	For example
+	> gcc -O2 -o coremark.exe core_list_join.c core_main.c core_matrix.c core_state.c core_util.c simple/core_portme.c -DPERFORMANCE_RUN=1 -DITERATIONS=1000 
+	> ./coremark.exe > run1.log
+	The above will compile the benchmark for a performance run and 1000 iterations. Output is redirected to run1.log.
+	
+	Make targets:
+	run - Default target, creates run1.log and run2.log.
+	run1.log - Run the benchmark with performance parameters, and output to run1.log
+	run2.log - Run the benchmark with validation parameters, and output to run2.log
+	run3.log - Run the benchmark with profile generation parameters, and output to run3.log
+	compile - compile the benchmark executable 
+	link - link the benchmark executable
+	check - test MD5 of sources that may not be modified
+	clean - clean temporary files
+	
+	ITERATIONS: 
+	By default, the benchmark will run for between 10 and 100 seconds.
+	To override, use ITERATIONS=N
+	> make ITERATIONS=10 
+	Will run the benchmark for 10 iterations. 
+	It is recommended to set a specific number of iterations in certain situations e.g.:
+	- Running with a simulator
+	- Measuring power/energy
+	- Timing cannot be restarted
+	
+	Minimum required run time: 
+	Results are only valid for reporting if the benchmark ran for at least 10 secs!
+	
+	XCFLAGS:
+	To add compiler flags from the command line, use XCFLAGS e.g.
+	> make XCFLAGS="-g -DMULTITHREAD=4 -DUSE_FORK=1"
+	
+	o CORE_DEBUG
+	
+	Define to compile for a debug run if you get incorrect CRC.
+	> make XCFLAGS="-DCORE_DEBUG=1"
+	
+	o Parallel Execution
+	
+	Use XCFLAGS=-DMULTITHREAD=N where N is number of threads to run in parallel.
+	Several implementations are available to execute in multiple contexts,
+	or you can implement your own in <core_portme.c>.
+	> make XCFLAGS="-DMULTITHREAD=4 -DUSE_PTHREAD" 
+	Above will compile the benchmark for execution on 4 cores, using POSIX Threads API.
+	
+	REBUILD:
+	To force rebuild, add the flag REBUILD to the command line
+	> make REBUILD=1
+	
+	Check core_portme.mak for more important options.
+
+	Run parameters for the benchmark executable:
+	The CoreMark executable takes several parameters as follows (if main accepts arguments).
+	1st - A seed value used for initialization of data.
+	2nd - A seed value used for initialization of data.
+	3rd - A seed value used for initialization of data.
+	4th - Number of iterations (0 for auto : default value)
+	5th - Reserved for internal use. 
+	6th - Reserved for internal use. 
+	7th - For malloc users only, override the size of the input data buffer.
+	
+	The run target from make will run coremark with 2 different data initialization seeds.
+
+	Alternative parameters: 
+	If not using malloc or command line arguments are not supported, the buffer size
+	for the algorithms must be defined via the compiler define TOTAL_DATA_SIZE.
+	TOTAL_DATA_SIZE must be set to 2000 bytes (default) for standard runs.
+	The default for such a target when testing different configurations could be ...
+	> make XCFLAGS="-DTOTAL_DATA_SIZE=6000 -DMAIN_HAS_NOARGC=1"
+	
+Topic: Documentation
+	When you unpack the documentation (tar -vzxf coremark_<version>_docs.tgz) a docs folder will be created.
+	Check the file docs/html/index.html and the website http://www.coremark.org for more info.
+	
+Topic: Submitting results
+	CoreMark results can be submitted on the web.
+	
+	Open a web browser and go to http://www.coremark.org/benchmark/index.php?pg=benchmark
+	Select the link to add a new score and follow the instructions.
+	
+Topic: Run rules
+	What is and is not allowed.
+	
+	Required:
+	1 - The benchmark needs to run for at least 10 seconds.
+	2 - All validation must succeed for seeds 0,0,0x66 and 0x3415,0x3415,0x66, 
+		buffer size of 2000 bytes total.
+		o If not using command line arguments to main:
+		> make XCFLAGS="-DPERFORMANCE_RUN=1" REBUILD=1 run1.log
+		> make XCFLAGS="-DVALIDATION_RUN=1" REBUILD=1 run2.log
+	3 - If using profile guided optimization, profile must be generated using seeds of 8,8,8,
+		and buffer size of 1200 bytes total.
+		> make XCFLAGS="-DTOTAL_DATA_SIZE=1200 -DPROFILE_RUN=1" REBUILD=1 run3.log
+	4 - All source files must be compiled with the same flags.
+	5 - All data type sizes must match size in bits such that:
+		o ee_u8 is an 8-bit datatype.
+		o ee_s16 is a 16-bit datatype.
+		o ee_u16 is a 16-bit datatype.
+		o ee_s32 is a 32-bit datatype.
+		o ee_u32 is a 32-bit datatype.
+	
+	Allowed:
+	- Changing number of iterations
+	- Changing toolchain and build/load/run options
+	- Changing method of acquiring a data memory block
+	- Changing the method of acquiring seed values
+	- Changing implementation in core_portme.c
+	- Changing configuration values in core_portme.h
+	- Changing core_portme.mak
+	
+	Not allowed:
+	- Changing any source file other than core_portme* (use make check to validate)
+
+Topic: Reporting rules
+	How to report results on a data sheet?
+
+	CoreMark 1.0 : N / C [/ P] [/ M]
+	
+	N - Number of iterations per second (with seeds 0,0,0x66, size=2000)
+	C - Compiler version and flags
+	P - Parameters such as data and code allocation specifics
+		- This parameter *may* be omitted if all data was allocated on the heap in RAM.
+		- This parameter *may not* be omitted when reporting CoreMark/MHz
+	M - Type of parallel execution (if used) and number of contexts
+		This parameter may be omitted if parallel execution was not used.
+
+	e.g. 
+	> CoreMark 1.0 : 128 / GCC 4.1.2 -O2 -fprofile-use / Heap in TCRAM / FORK:2 
+	or
+	> CoreMark 1.0 : 1400 / GCC 3.4 -O4 
+	
+	If reporting scaling results, the results must be reported as follows:
+	
+	CoreMark/MHz 1.0 : N / C / P [/ M]
+	
+	P - When reporting scaling results, memory parameter must also indicate memory frequency:core frequency ratio.
+		- If the core has cache and cache frequency to core frequency ratio is configurable, that must also be included.
+	
+	e.g.
+	> CoreMark/MHz 1.0 : 1.47 / GCC 4.1.2 -O2 / DDR3(Heap) 30:1 Memory 1:1 Cache
+
+	
+Topic: Log File Format
+	The log files have the following format
+(start example)
+2K performance run parameters for coremark.	(Run type)
+CoreMark Size    	: 666					(Buffer size)
+Total ticks			: 25875					(platform dependent value)
+Total time (secs) 	: 25.875000				(actual time in seconds)
+Iterations/Sec 		: 3864.734300			(Performance value to report)
+Iterations			: 100000				(number of iterations used)
+Compiler version	: GCC3.4.4				(Compiler and version)	
+Compiler flags		: -O2					(Compiler and linker flags)
+Memory location		: Code in flash, data in on chip RAM
+seedcrc				: 0xe9f5				(identifier for the input seeds)
+[0]crclist			: 0xe714				(validation for list part)
+[0]crcmatrix		: 0x1fd7				(validation for matrix part)
+[0]crcstate			: 0x8e3a				(validation for state part)
+[0]crcfinal			: 0x33ff				(iteration dependent output)
+Correct operation validated. See readme.txt for run and reporting rules.  (*Only when run is successful*)
+CoreMark 1.0 : 6508.490622 / GCC3.4.4 -O2 / Heap 						  (*Only on a successful performance run*)		
+(end example)
+
+Topic: Legal
+See LICENSE.txt or the word document file under docs/LICENSE.doc.
+For more information on your legal rights to use this benchmark, please see
+http://www.coremark.org/download/register.php?pg=register	
+
+Topic: Credits
+Many thanks to all of the individuals who helped with the development or testing of CoreMark including (Sorted by company name)
+o Alan Anderson, ADI
+o Adhikary Rajiv, ADI
+o Elena Stohr, ARM
+o Ian Rickards, ARM
+o Andrew Pickard, ARM
+o Trent Parker, CAVIUM
+o Shay Gal-On, EEMBC
+o Markus Levy, EEMBC
+o Ron Olson, IBM
+o Eyal Barzilay, MIPS
+o Jens Eltze, NEC
+o Hirohiko Ono, NEC
+o Ulrich Drees, NEC
+o Frank Roscheda, NEC
+o Rob Cosaro, NXP
+o Shumpei Kawasaki, RENESAS
+'''
diff --git a/benchmarks/microbench/README.md b/benchmarks/microbench/README.md
new file mode 100644
index 0000000..b332e2a
--- /dev/null
+++ b/benchmarks/microbench/README.md
@@ -0,0 +1,69 @@
+# MicroBench
+
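+Benchmark programs for testing CPU correctness and performance. Requirements on AbstractMachine:
+
+1. The TRM and IOE APIs must be implemented.
+2. The benchmarks still run when every IOE implementation is left empty. With a correct `AM_TIMER_UPTIME`, the reported timing statistics are correct. If it is not implemented (returns `0`), correctness testing still works.
+3. Output goes through `putch(ch)`.
+4. The heap region `heap` must be initialized (it may be empty). If `heap.start == heap.end`, that is, an empty heap was allocated, only benchmarks that do not use the heap can run. Each benchmark declares its heap size in advance; benchmarks whose heap requirement is not met are skipped.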
+
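+## Usage
+
+Each program comes in three settings: test, train, and ref.
+test uses a very small data size and is meant for testing; it is neither timed nor scored.
+train uses a medium data size, suitable for studying microarchitectural behavior in a simulation environment; it is timed but not scored.
+ref uses a large data size and is meant for measuring CPU performance; it is timed and scored.
+
+The ref data size runs by default. Use
+```bash
+make ARCH=native run mainargs=test
+```
+to run the test data size, and use
+```bash
+make ARCH=native run mainargs=train
+```
+to run the train data size.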
+
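+## Scoring
+
+Each benchmark records its running time in microseconds as measured on `REF_CPU`. A benchmark's score is its speed relative to `REF_CPU`; a processor exactly as fast as the reference processor scores `REF_SCORE=100000`.
+
+The overall score is the average of all benchmark scores.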
+
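+## Available benchmarks
+
+| Name  | Description                                              | Heap usage (ref) |
+| ----- | -------------------------------------------------------- | ---------------- |
+| qsort | Quicksort on an array of random integers                 | 640KB |
+| queen | n-queens problem solved with bit operations              | 0     |
+| bf    | Brainf**k interpreter running a quicksort on its input string | 32KB  |
+| fib   | Matrix solution of the Fibonacci-like sequence f(n)=f(n-1)+…+f(n-m) | 256KB |
+| sieve | Prime finding with the sieve of Eratosthenes             | 2MB   |
+| 15pz  | 4x4 sliding puzzle solved with the A* algorithm          | 2MB   |
+| dinic | Maximum flow on a bipartite graph with Dinic's algorithm | 1MB   |
+| lzip  | Lzip data compression                                    | 4MB   |
+| ssort | Suffix sorting with the Skew algorithm                   | 4MB   |
+| md5   | MD5 checksum of a long random string                     | 16MB  |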
+
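+## Adding a benchmark `foo`
+
+Create a directory named `foo` under `src/` and put the source files in it.
+
+Each benchmark must implement three functions:
+
+* `void bench_foo_prepare();`: performs setup, such as seeding the random number generator and allocating memory for arrays. The runtime environment does not guarantee the initial values of global variables or the heap, so all global data the benchmark uses must be initialized explicitly.
+* `void bench_foo_run();`: runs the benchmark proper. Only this function is timed.
+* `int bench_foo_validate();`: checks the benchmark's result. Returns 1 if correct, 0 if wrong.
+
+Add a corresponding `def` entry to `BENCHMARK_LIST` in `benchmark.h`, following the format of the existing benchmarks.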
+
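+## Library functions available to benchmarks
+
+klib provides some functions, but different klib implementations would make performance results differ.
+MicroBench therefore ships a few simple built-in library functions:
+
+* `bench_memcpy(void *dst, const void *src, size_t n)`: copies memory.
+* `bench_srand(uint seed)`: seeds the random number generator with seed.
+* `bench_rand()`: returns a random number in the range 0..32767.
+* `bench_alloc`/`bench_free`: allocate/free memory. Freeing is currently a no-op.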
+
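
The three-function contract that the MicroBench README describes for a new benchmark `foo` can be sketched roughly as follows. This is a minimal, hypothetical example rather than code from the repository: the workload (a sum of squares), the names `N`, `data`, `state`, `result`, and `next_value()` are all assumptions made for illustration, and a real benchmark would presumably use `bench_srand`/`bench_rand` and `bench_alloc` instead of the stand-ins here. The `def` entry in `BENCHMARK_LIST` is not shown because its format is not given in the README.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical benchmark "foo": sums the squares of N pseudo-random values.
 * All names below except the three bench_foo_* functions are illustrative. */
#define N 1024

static uint32_t data[N];  /* global input data, filled in prepare */
static uint32_t state;    /* generator state, set in prepare */
static uint64_t result;   /* output of the timed run */

/* Stand-in generator (assumption); yields values in 0..32767. */
static uint32_t next_value(void) {
  state = state * 1103515245u + 12345u;
  return (state >> 16) & 0x7fff;
}

/* Setup: every global is assigned here, because the runtime does not
 * guarantee initial values for globals or the heap. */
void bench_foo_prepare(void) {
  state = 1;
  result = 0;
  for (size_t i = 0; i < N; i++) data[i] = next_value();
}

/* The workload itself: the only function that is timed. */
void bench_foo_run(void) {
  uint64_t sum = 0;
  for (size_t i = 0; i < N; i++) sum += (uint64_t)data[i] * data[i];
  result = sum;
}

/* Returns 1 if the result is correct, 0 otherwise. The expected value is
 * recomputed from the same seed instead of being hard-coded. */
int bench_foo_validate(void) {
  uint64_t expected = 0;
  state = 1;  /* regenerate the same input sequence */
  for (size_t i = 0; i < N; i++) {
    uint64_t v = next_value();
    expected += v * v;
  }
  return result == expected;
}
```

A harness would call `bench_foo_prepare()`, time `bench_foo_run()`, and then check that `bench_foo_validate()` returns 1. Keeping all initialization in `prepare` means the timed region covers only the workload.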