ggml : add AArch64 optimized GEMV and GEMM Q4 kernels (#5780)

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-06-26 19:55:04 +00:00

* Arm AArch64: optimized GEMV and GEMM kernels for q4_0_q8_0 and q8_0_q8_0 quantization

* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions

* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions

* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions

* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions

* Arm AArch64: add copyright claim only to ggml-aarch64.cpp and ggml-aarch64.h files

* Arm AArch64: minor code refactoring for rebase

* Arm AArch64: minor code refactoring for resolving a build issue with cmake

* Arm AArch64: minor code refactoring to split the Q4_0_AARCH64 type into three separate types: Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8

* Arm AArch64: minor code change for resolving a build issue with server-windows

* retrigger checks

* Arm AArch64: minor code changes for rebase

* Arm AArch64: minor changes to skip the pr#7433 vec_dot code for arm cpus with SVE VL not equal to 256 bits

* Arm AArch64: remove stale LLAMA_QKK_64 from CMakeLists.txt and delete build.zig

* Arm AArch64: add reference scalar gemm and gemv, and avoid dynamic memory allocations during quantization for Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8

* Arm AArch64: add multithreaded quantization support for the new types: Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8

* Arm AArch64: minor code refactoring

* Arm AArch64: simplify logic for calling gemm and gemv functions in ggml_compute_forward_mul_mat

* Arm AArch64: minimize changes in ggml_compute_forward_mul_mat

* Arm AArch64: minor code refactoring, and add reference scalar code to quantize routines for new quant types

* Arm AArch64: minor code refactoring

* Arm AArch64: minor code refactoring

* Arm AArch64: minor code refactoring

* rebase on the latest master commit 3fd62a6 and adapt to the new directory structure

* Arm AArch64: remove a redundant comment

* Arm AArch64: add pragma in ggml-aarch64.c to turn -Woverlength-strings warning off

* Arm AArch64: use __aarch64__ check to guard 64-bit neon kernels

* Arm AArch64: update docs/build.md README to include compile time flags for building the Q4_0_4_4 quant type
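
For orientation, the operation the reference scalar GEMV mentioned above computes can be sketched roughly as follows. This is an illustrative sketch only, not the code added by this commit: the names blk_q4, blk_q8, dot_q4_q8 and gemv_q4_q8_ref are hypothetical, the block scale is stored as a plain float (ggml stores it as fp16), and the weights use the plain Q4_0 nibble order rather than the interleaved Q4_0_4_4, Q4_0_4_8 or Q4_0_8_8 layouts.

```c
#include <assert.h>
#include <stdint.h>

#define QK 32 /* values per quantization block, as in ggml's Q4_0 and Q8_0 */

/* Hypothetical, simplified stand-ins for ggml's block_q4_0 and block_q8_0.
 * Assumption: the scale d is a plain float here; ggml stores it as fp16. */
typedef struct { float d; uint8_t qs[QK / 2]; } blk_q4; /* two 4-bit values per byte */
typedef struct { float d; int8_t  qs[QK];     } blk_q8; /* one 8-bit value per byte  */

/* Dot product of one row of 4-bit weight blocks with 8-bit activation blocks. */
static float dot_q4_q8(const blk_q4 *x, const blk_q8 *y, int nblocks) {
    float sum = 0.0f;
    for (int b = 0; b < nblocks; b++) {
        int32_t acc = 0; /* integer accumulator for this block */
        for (int j = 0; j < QK / 2; j++) {
            /* Low nibble holds element j, high nibble holds element j + QK/2;
             * both are stored with an offset of 8. */
            int lo = (x[b].qs[j] & 0x0F) - 8;
            int hi = (x[b].qs[j] >> 4) - 8;
            acc += lo * y[b].qs[j] + hi * y[b].qs[j + QK / 2];
        }
        /* Apply both block scales once per block. */
        sum += x[b].d * y[b].d * (float)acc;
    }
    return sum;
}

/* Scalar GEMV: out[r] = dot(row r of w, v), with nblocks blocks per row. */
static void gemv_q4_q8_ref(const blk_q4 *w, const blk_q8 *v, float *out,
                           int nrows, int nblocks) {
    assert(nrows > 0 && nblocks > 0);
    for (int r = 0; r < nrows; r++) {
        out[r] = dot_q4_q8(w + r * nblocks, v, nblocks);
    }
}
```

A GEMM over several activation rows would repeat the same inner product per output element; optimized kernels typically gain speed by processing several weight rows per pass so that each loaded activation block is reused.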

Author: Dibakar Gope
Date: 2024-07-10 07:14:51 -05:00
Committed by: GitHub
Parent: 83321c6958
Commit: 0f1a39f343

14 changed files with 2534 additions and 53 deletions

									
Makefile
@@ -835,7 +835,8 @@ OBJ_GGML += \
 	ggml/src/ggml.o \
 	ggml/src/ggml-alloc.o \
 	ggml/src/ggml-backend.o \
-	ggml/src/ggml-quants.o
+	ggml/src/ggml-quants.o \
+	ggml/src/ggml-aarch64.o
 
 OBJ_LLAMA = \
 	src/llama.o \
@@ -969,6 +970,13 @@ ggml/src/ggml-quants.o: \
 	ggml/src/ggml-common.h
 	$(CC) $(CFLAGS)    -c $< -o $@
 
+ggml/src/ggml-aarch64.o: \
+	ggml/src/ggml-aarch64.c \
+	ggml/include/ggml.h \
+	ggml/src/ggml-aarch64.h \
+	ggml/src/ggml-common.h
+	$(CC) $(CFLAGS)    -c $< -o $@
+
 ggml/src/ggml-blas.o: \
 	ggml/src/ggml-blas.cpp \
 	ggml/include/ggml-blas.h
