References
I cite many sources across my projects. Some fit naturally as inline hyperlinks, but others, especially key paper citations, belong in an end-of-project bibliography. Rather than maintain one per post, this page is a running list of references shared by all my quantization-related posts.
Widrow, B., & Kollár, I. (2008). Quantization Noise: Roundoff Error in Digital Computation, Signal Processing, Control, and Communications. Cambridge University Press.
Wu, H., Judd, P., Zhang, X., Isaev, M., & Micikevicius, P. (2020). Integer quantization for deep learning inference: Principles and empirical evaluation. arXiv preprint arXiv:2004.09602.
Dettmers, T., Pagnoni, A., Holtzman, A., & Zettlemoyer, L. (2023). QLoRA: Efficient finetuning of quantized LLMs. Advances in Neural Information Processing Systems, 36, 10088-10115.
Dettmers, T., Lewis, M., Belkada, Y., & Zettlemoyer, L. (2022). GPT3.int8(): 8-bit matrix multiplication for transformers at scale. Advances in Neural Information Processing Systems, 35, 30318-30332.
Dudley, R. (2023). The Shapiro–Wilk test for normality.
Meller, E., Finkelstein, A., Almog, U., & Grobman, M. (2019, May). Same, same but different: Recovering neural network quantization error through weight factorization. In International Conference on Machine Learning (pp. 4486-4495). PMLR.
Noune, B., Jones, P., Justus, D., Masters, D., & Luschi, C. (2022). 8-bit numerical formats for deep neural networks. arXiv preprint arXiv:2206.02915.