The auto-awq model loading error occurs when your LLM inference software can’t find the AutoAWQ package while trying to load an AWQ-quantized model. Since AutoAWQ was officially deprecated in May 2025, with its functionality largely adopted by the vLLM project, you can fix the error by manually installing the AutoAWQ library, but you might also consider switching to another model format, such as GGUF.
Quick Solution: Install the Missing AutoAWQ Library
The simplest fix for the error message ImportError: Loading an AWQ quantized model requires auto-awq library (pip install autoawq) is to install the missing AutoAWQ package. This Python library is needed to load models quantized in the AWQ format, which is exactly what the software you’re using is attempting to do.
To quickly fix this error, open your terminal or command prompt in the environment in which you’re attempting to load the model and run the following pip command:
pip install autoawq
Keep in mind that because the AutoAWQ library is deprecated, you might run into compatibility issues with other packages such as torch in some environments. Make sure the software you’re using works with the latest available version of AutoAWQ.
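If you want to confirm that the installation actually worked before loading a full model, a quick import check is enough. The sketch below assumes you run it in the same Python environment you installed into; note that the package installs as autoawq, but the importable module is named awq:

# Minimal sanity check: can the AutoAWQ loader and torch be imported together?
# (the PyPI package is "autoawq", but the Python module is "awq")
import torch
from awq import AutoAWQForCausalLM

print("AutoAWQ import OK, running against torch", torch.__version__)

If this import fails or prints a torch version your inference software doesn’t support, that’s the compatibility issue described above.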
Why This Error Happens
This import error occurs because the model you’re trying to load is in AWQ (Activation-aware Weight Quantization) format, a 4-bit model compression method. AWQ models require the AutoAWQ loader library to work.
If this library isn’t present, the Transformers backend of the software you’re using raises the error shown above, telling you to install it.
This can happen in many LLM setups: for instance, when you interface with the transformers library directly, or when you use a text generation WebUI such as Oobabooga and try to load an AWQ model without first adding the AutoAWQ dependency manually.
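To make the first case concrete, here is a minimal sketch of the kind of call that triggers the error when autoawq is missing. The model ID is just an example AWQ repository; any AWQ-quantized checkpoint behaves the same way, and device_map="auto" additionally assumes the accelerate package is installed:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Mistral-7B-Instruct-v0.2-AWQ"  # example AWQ model repo

# Without autoawq installed, from_pretrained fails with:
# ImportError: Loading an AWQ quantized model requires auto-awq library (pip install autoawq)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")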
AWQ Support in Oobabooga WebUI (Deprecated)
If you’re using the Oobabooga Text Generation WebUI, note that AWQ model support has been deprecated in newer versions of the application in response to the sunsetting of the mainline AutoAWQ project in May 2025.
As of release 1.15 of “text-generation-webui”, the developers removed the built-in AutoAWQ support due to incompatibilities with newer versions of PyTorch and the CUDA toolkit. This is why the WebUI no longer lists an “AutoAWQ” loader option, and why selecting an AWQ model triggers an import error.
While you can still manually install the AutoAWQ library if you really want to use AWQ models, this is an unofficial workaround, and you will most likely run into errors caused by other packages being incompatible with the final release of the discontinued library.
That’s why the most practical solution for Oobabooga users is to switch to another supported model format.
Better Solution: Switch to Supported Model Formats (GGUF, GGML, GPTQ, ExLlama)
While the AWQ quantization algorithm and model format aren’t deprecated themselves, the deprecation of the standalone AutoAWQ loader makes AWQ models much harder to use in tools that relied on that library.
You can still use AWQ models without any trouble in software that ships its own compatible loader, such as vLLM or MLX-LM. If you don’t use such software (for instance, because you want to stay with the Oobabooga Text Generation WebUI or KoboldCpp), it might be best to switch to a more modern model format.
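As a point of reference, here is a minimal sketch of loading an AWQ model in vLLM, which ships its own AWQ kernels and does not need the AutoAWQ package. The model ID is again only an example, and the sampling settings are arbitrary:

from vllm import LLM, SamplingParams

# vLLM loads AWQ weights with its own kernels, so no autoawq install is needed.
llm = LLM(model="TheBloke/Mistral-7B-Instruct-v0.2-AWQ", quantization="awq")

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Explain AWQ quantization in one sentence."], params)
print(outputs[0].outputs[0].text)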
Most modern LLM tools and UIs now favor formats like GGUF (and the older GGML), GPTQ, or ExLlama v2 (EXL2) over AWQ. For example, if you are running models with CPU-focused backends like llama.cpp or Kobold, the recommended format is GGUF (the unified successor to GGML) for quantized models.
Most popular new models are released both as GGUF files (which load directly in llama.cpp, the Oobabooga WebUI and KoboldCpp) and as GPTQ or EXL2 files for GPU inference (which the Oobabooga WebUI handles through its bundled ExLlama-based loaders), in both cases without installing any extra software libraries.
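To illustrate how little setup a GGUF model needs outside of a full WebUI, here is a small sketch using the llama-cpp-python bindings. The file name is a placeholder for whatever GGUF file you have downloaded, and the generation settings are arbitrary:

from llama_cpp import Llama

# Point model_path at any local GGUF file (placeholder name below).
llm = Llama(
    model_path="./mistral-7b-instruct-v0.2.Q4_K_M.gguf",
    n_ctx=4096,       # context window size
    n_gpu_layers=-1,  # offload all layers to the GPU if one is available
)

result = llm("Q: What does AWQ stand for?\nA:", max_tokens=48)
print(result["choices"][0]["text"])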
So, if you’re able to successfully install the AutoAWQ package without any compatibility issues, it should resolve the immediate error and let you use an AWQ model. However, since the AutoAWQ loader is no longer officially supported in the latest tools, it might be wise to consider switching to a more widely supported quantization format like GGUF or GPTQ. That’s pretty much it!