Current radio communication systems that adopt amplitude and phase modulations demand high linearity and high efficiency. The cascade connection of a digital baseband pre-distorter (DPD) and a power amplifier (PA) can be a cost-effective solution to guarantee the required linearity without compromising efficiency. In the design of a DPD for a single-band PA, direct learning can be used to extract the pre-inverse parameters or, alternatively, indirect learning can be employed by exchanging the positions of the systems during the identification procedure, which avoids the need for a PA model within a closed-loop process. The performance of direct learning depends substantially on the accuracy of the behavioral model that replaces the PA. Furthermore, in a practical environment where only an approximation to the inverse is achieved, the linearization capability of indirect learning is affected by shifting the post-inverse placed after the PA to a pre-inverse located before the PA. For concurrent dual-band PAs, an additional advantage of the indirect approach is that the post-inverse identifications for each band are completely independent of each other. In a previous work by the authors, a comparative analysis between the two learning architectures applied to the linearization of concurrent dual-band PAs was performed based on DPDs modeled by polynomials with memory. The contributions of this work are the extension of that comparative analysis to DPDs modeled by artificial neural networks, the development of complex-valued three-layer perceptrons suitable for concurrent dual-band DPDs, and the introduction of a modified indirect approach that improves the accuracy of the previous direct and indirect learning approaches. Spectre-RF transient simulations are performed on a circuit under test consisting of a wideband 130 nm CMOS PA concurrently stimulated by 2.4 GHz Wi-Fi and 3.5 GHz LTE signals.
The reported simulation results show that, compared with the previous direct and indirect learning approaches at similar mean output powers of about 60 mW, the modified indirect approach provides superior linearity. The modified indirect learning reduces the error vector magnitude (EVM) to 0.87% and 1.13% for the Wi-Fi and LTE bands, respectively, whereas the indirect and direct learning approaches achieve EVMs equal to or larger than 1.05% and 1.49% for the Wi-Fi and LTE bands, respectively.
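As a rough illustration of the indirect learning idea summarized above, the following Python sketch identifies a post-inverse from PA output to PA input and then copies it in front of the PA as the pre-inverse. It is a single-band toy under stated assumptions, not the method of this work: the PA (`toy_pa`), the memoryless odd-order polynomial basis (`basis`), the Gaussian test signal, and the waveform-level EVM estimate (`evm_percent`) are all hypothetical stand-ins for the neural-network DPDs, the CMOS PA, and the Wi-Fi and LTE signals considered here.

```python
import numpy as np

def basis(x, order=5):
    # Odd-order memoryless polynomial basis: x, x|x|^2, x|x|^4 (assumed model)
    return np.column_stack(
        [x * np.abs(x) ** (2 * k) for k in range((order + 1) // 2)]
    )

def toy_pa(x):
    # Hypothetical PA: unit small-signal gain with mild third-order compression
    return x - 0.1 * x * np.abs(x) ** 2

def evm_percent(ref, meas):
    # Waveform-level EVM estimate: RMS error over RMS reference,
    # computed after removing the least-squares complex gain
    g = np.vdot(ref, meas) / np.vdot(ref, ref)
    err = meas / g - ref
    return 100 * np.sqrt(np.mean(np.abs(err) ** 2) / np.mean(np.abs(ref) ** 2))

# Complex Gaussian baseband test signal (assumption, not a Wi-Fi or LTE waveform)
rng = np.random.default_rng(0)
x = 0.25 * (rng.standard_normal(4000) + 1j * rng.standard_normal(4000))

# Step 1: identify the post-inverse, mapping PA output back to PA input
y = toy_pa(x)
coeffs, *_ = np.linalg.lstsq(basis(y), x, rcond=None)

# Step 2: copy the post-inverse before the PA, where it acts as the pre-inverse
y_dpd = toy_pa(basis(x) @ coeffs)

evm_without = evm_percent(x, y)
evm_with = evm_percent(x, y_dpd)
print(f"EVM without DPD: {evm_without:.3f}%  with DPD: {evm_with:.3f}%")
```

Because the fitted post-inverse is only an approximation, moving it ahead of the PA does not cancel the distortion exactly, which is the limitation of indirect learning noted above. No PA model is needed inside the identification step itself; `toy_pa` only plays the role of the measured device.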