Among indoor localization schemes, image-based localization stands out for its infrastructure-free deployment and high positioning accuracy. It is usually deployed with a lightweight client, which captures pictures, and a heavyweight server, which searches an image database to determine the location. This scheme suffers from two major drawbacks. (1) Uploading and searching the captured pictures places a heavy burden on the server. (2) The image map degrades over time, for example through features missing from the initial map and continuous changes in the scene. Keeping the image map up to date requires labor-intensive site surveys with dedicated devices, and frequent updates further exhaust the server's limited resources. Together, these drawbacks have greatly hindered the large-scale deployment of image-based indoor localization. To address them, we study how to use crowdsourced data to mitigate map defects and how to optimize location computation and map storage through an edge architecture. In this paper, we propose CrowdLoc, an edge-assisted image localization architecture with lightweight multi-view localization on clients and a crowdsourcing map-cache on the edge. CrowdLoc makes three main contributions. First, it provides lightweight multi-view localization on the client side, and low-confidence localization requests can be offloaded to the edge server, which uses the crowdsourcing map for a more accurate calculation. Second, CrowdLoc uses a map defect perception algorithm to detect scene updates during user localization and automatically updates the crowdsourcing map-cache on the edge side. Third, to improve cache utilization, we propose a cache elimination strategy that runs periodically on the edge side. It classifies cached data by cache-hit frequency, removes invalidated data, and persists long-term updates to the client map, reducing the amount of computation that clients must offload.
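The periodic cache elimination pass can be sketched as follows. This is a minimal illustration under stated assumptions, not CrowdLoc's actual algorithm: the entry layout (`CacheEntry`), the function name `eliminate`, and the thresholds `low_hits`, `high_hits`, and `persist_after` are hypothetical, and "invalidated" data is approximated here simply as an entry that received too few hits during the last period.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class CacheEntry:
    """One crowdsourced map update held in the edge cache (hypothetical layout)."""
    entry_id: str
    hits: int = 0         # cache hits counted in the current period
    hot_periods: int = 0  # consecutive periods in which the entry was frequently hit


def eliminate(
    cache: Dict[str, CacheEntry],
    low_hits: int = 1,
    high_hits: int = 10,
    persist_after: int = 3,
) -> Tuple[List[str], List[str]]:
    """Run one periodic elimination pass over the edge cache (illustrative sketch).

    Entries hit fewer than `low_hits` times are treated as invalidated and removed.
    Entries hit at least `high_hits` times for `persist_after` consecutive periods
    are treated as long-term updates: they leave the cache and are returned so they
    can be merged into the client map. All thresholds are assumptions.
    """
    removed: List[str] = []
    persisted: List[str] = []
    for key in list(cache):  # copy the keys so entries can be deleted while iterating
        entry = cache[key]
        if entry.hits < low_hits:
            removed.append(key)
            del cache[key]
            continue
        if entry.hits >= high_hits:
            entry.hot_periods += 1
        else:
            entry.hot_periods = 0  # streak broken: keep as ordinary cache data
        if entry.hot_periods >= persist_after:
            persisted.append(key)
            del cache[key]
        else:
            entry.hits = 0  # start counting afresh for the next period
    return removed, persisted


cache = {
    "stale": CacheEntry("stale", hits=0),
    "warm": CacheEntry("warm", hits=5),
    "hot": CacheEntry("hot", hits=12, hot_periods=2),
}
removed, persisted = eliminate(cache)
print(removed, persisted, sorted(cache))  # ['stale'] ['hot'] ['warm']
```

In this sketch, dropping entries with almost no hits frees cache space, while promoting entries that stay hot across several periods moves stable scene changes into the client map, so later queries in those areas need not be offloaded.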
We evaluate our method on real data collected in the lobby of an experimental building, with a map missing rate of 20% and a map update rate of 40%. The results show that CrowdLoc achieves a mean localization error of 1.54 meters, which is 59.4% lower than that of multi-view localization without crowdsourcing maps and only 0.56 meters higher than that of localization on non-defective maps.
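The two comparisons also imply the reference errors. Assuming the 59.4% reduction is measured relative to the baseline without crowdsourcing maps, a quick check gives the following, where $e_{\text{base}}$ and $e_{\text{clean}}$ are labels introduced here for the baseline error and the non-defective-map error:

```latex
e_{\text{base}} = \frac{1.54\ \text{m}}{1 - 0.594} = \frac{1.54\ \text{m}}{0.406} \approx 3.79\ \text{m},
\qquad
e_{\text{clean}} = 1.54\ \text{m} - 0.56\ \text{m} = 0.98\ \text{m}.
```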