With the advancement of social media and mobile technology, any smartphone user can easily become a seller on social media and e-commerce platforms, such as Instagram and Carousell in Hong Kong, or Taobao in China. A seller shows images of their products and annotates the images with suitable tags so that others can find them easily. The images may be taken by the seller or reused from other sellers. Some sellers sell counterfeit goods, and they may use disguised tags and language, which makes detecting them difficult. This paper proposes a framework that detects counterfeit sellers by using deep learning to discover connections among sellers from their shared images. Experiments on 473K shared images from Taobao, Instagram and Carousell show that the proposed framework can detect counterfeit sellers, and that it is 30% better at doing so than approaches based on object recognition. To the best of our knowledge, this is the first work to detect online counterfeit sellers from their shared images. Commercialization of the project is under discussion.

Data Collection

The data is collected from Taobao, a popular Chinese e-commerce platform. Taobao hosts both individual sellers and famous brands, so it covers a wide range of products. The collected data falls into two categories, shoes and cosmetics, both of which are commonly found on Taobao. Product information, including prices and images, is collected using Octopus. For each seller, 80 products are selected from the seller's product list, and all images of each product are collected. To exclude thumbnails and advertising images, the minimum size of the captured images is set to 400x400 pixels. In total, 101,090 images are collected from 93 shoe sellers, and 51,870 images from 100 cosmetics sellers. Among the shoe sellers, 38 are counterfeit and 55 are non-counterfeit; among the cosmetics sellers, 23 are counterfeit and 77 are non-counterfeit. The sellers are labelled manually by surveying 40 experienced online shoppers, who each marked the sellers independently based on the seller's pages, images and prices.
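The size filter described above can be sketched as follows. This is a minimal illustration, not the collection code used for the dataset. It assumes that "minimum size 400x400" means both the width and the height must be at least 400 pixels, and the function names and the `(name, width, height)` record layout are hypothetical.

```python
# Minimal sketch of the minimum-size filter (assumption: both sides >= 400 px).
MIN_WIDTH = 400
MIN_HEIGHT = 400


def is_large_enough(width, height, min_width=MIN_WIDTH, min_height=MIN_HEIGHT):
    """Return True if an image meets the minimum width and height."""
    return width >= min_width and height >= min_height


def filter_images(records):
    """Keep the names of images that pass the size check.

    `records` is an iterable of (name, width, height) tuples; this layout
    is a hypothetical one chosen for the example.
    """
    return [name for name, width, height in records if is_large_enough(width, height)]


if __name__ == "__main__":
    sample = [
        ("product.jpg", 800, 800),   # regular product photo
        ("thumb.jpg", 120, 120),     # thumbnail, dropped
        ("banner.jpg", 1200, 90),    # wide advertising banner, dropped
    ]
    print(filter_images(sample))
```

In practice the width and height would be read from each downloaded file with an image library before the check is applied.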

The data is available here:

Dataset 1 - images of sellers encoded by ResNet. The encoded images of each seller are stored in the corresponding folder (SellerIDXX, where XX is the seller ID).

Dataset 2a, Dataset 2b - images of sellers. The images of each seller are stored in the corresponding folder (SellerIDXX, where XX is the seller ID).

Groundtruth - the label of each seller: 1 for counterfeit sellers and -1 for others.
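A short sketch of how the folder naming and the labels might be used together is given below. The on-disk format of the groundtruth file is not described here, so the example assumes the labels have already been loaded into a dictionary that maps seller ID to label. The helper names are hypothetical, and whether the ID in the folder name is zero-padded is also an assumption.

```python
from pathlib import Path

COUNTERFEIT = 1   # label for counterfeit sellers
OTHER = -1        # label for all other sellers


def seller_folder(root, seller_id):
    """Build the folder path for one seller, following the SellerIDXX pattern.

    Assumption: the ID is appended without zero-padding.
    """
    return Path(root) / f"SellerID{seller_id}"


def split_by_label(labels):
    """Split seller IDs into (counterfeit, others) using the 1 / -1 labels.

    `labels` maps seller ID -> label; how this mapping is read from the
    groundtruth file is left out because the file format is not specified.
    """
    counterfeit = sorted(s for s, y in labels.items() if y == COUNTERFEIT)
    others = sorted(s for s, y in labels.items() if y == OTHER)
    return counterfeit, others


if __name__ == "__main__":
    labels = {1: 1, 2: -1, 3: 1}  # made-up labels for illustration only
    counterfeit, others = split_by_label(labels)
    print(counterfeit, others)
    print(seller_folder("Dataset1", counterfeit[0]))
```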


[1] Cheung, M., She, J., & Liu, L. (2018, April). Deep learning-based online counterfeit-seller detection. In IEEE INFOCOM 2018-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS) (pp. 51-56). IEEE.

[2] Cheung, M., She, J., Sun, W., & Zhou, J. (2019). Detecting online counterfeit-goods seller using connection discovery. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 15(2), 1-16.