On August 26, Alibaba Cloud launched Qwen-VL, a large-scale vision-language model, and released it as open source immediately. Qwen-VL is built on Qwen-7B, the 7-billion-parameter model in the Tongyi Qianwen family, as its base language model. It accepts both image and text input and can understand multimodal information. In mainstream multimodal task benchmarks and multimodal chat evaluations, Qwen-VL far outperforms general-purpose models of the same scale. Compared with earlier vision-language models, Qwen-VL not only offers basic image and text recognition, description, question answering, and dialogue, but also adds capabilities such as visual localization and understanding of Chinese text in images.