In this paper, we study game dynamics and learning schemes for heterogeneous 4G networks. We introduce a novel learning scheme, called cost-to-learn, that incorporates the cost of switching, the switching delay, and the cost of adopting a new action, and that captures the realistic behavior of users as observed in our OPNET simulations. Considering a dynamic and uncertain environment in which users and operators know only a numerical value of their own payoffs, we construct various heterogeneous combined fully distributed payoff and strategy reinforcement learning (CODIPAS-RL) schemes, in which each user learns its optimal payoff and its optimal strategy simultaneously. We establish the asymptotic pseudo-trajectories as solutions of differential equations. Using evolutionary game dynamics, we prove convergence and stability properties for specific classes of dynamic robust games. We provide various numerical examples and OPNET simulations in the context of network selection in wireless local area networks (WLAN) and Long Term Evolution (LTE).
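The combined payoff-and-strategy idea behind CODIPAS-RL can be illustrated with a minimal sketch: each user keeps a per-action payoff estimate updated by stochastic approximation from its observed numerical payoff, and moves its mixed strategy toward a Boltzmann-Gibbs response to those estimates. This is an illustrative sketch only, not the paper's exact scheme; the function names (`codipas_step`, `boltzmann_gibbs`), the learning rates, and the congestion-style network-selection payoffs are assumptions made here for the example.

```python
import numpy as np

def boltzmann_gibbs(est, eps=0.1):
    """Softmax over estimated payoffs (Boltzmann-Gibbs choice rule)."""
    z = np.exp((est - est.max()) / eps)
    return z / z.sum()

def codipas_step(x, est, action, payoff, lam=0.05, mu=0.1, eps=0.1):
    """One combined payoff-and-strategy update for a single user.

    x      : current mixed strategy over actions
    est    : current payoff estimates, one per action
    action : action played this step
    payoff : numerical payoff observed (the only feedback used)
    """
    # Payoff learning: update only the estimate of the action actually played.
    est = est.copy()
    est[action] += mu * (payoff - est[action])
    # Strategy learning: move toward the Boltzmann-Gibbs response to estimates.
    x = (1 - lam) * x + lam * boltzmann_gibbs(est, eps)
    return x / x.sum(), est

# Toy network-selection game (an assumed example): two users, two networks
# (0 = WLAN, 1 = LTE); a network's payoff is halved when both users pick it.
rng = np.random.default_rng(0)
base = np.array([1.0, 0.8])            # standalone quality of WLAN, LTE
x = [np.full(2, 0.5), np.full(2, 0.5)] # initial mixed strategies
est = [np.zeros(2), np.zeros(2)]       # initial payoff estimates
for t in range(2000):
    acts = [rng.choice(2, p=xi) for xi in x]
    for i in range(2):
        crowd = 2 if acts[0] == acts[1] else 1
        u = base[acts[i]] / crowd
        x[i], est[i] = codipas_step(x[i], est[i], acts[i], u)
```

With small learning rates the iterates track the associated mean differential equation, which is the sense in which the asymptotic pseudo-trajectory results apply; here the two users tend to anti-coordinate onto different networks.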