Combinatorial optimization problems are common in the real world but difficult to solve, and metaheuristics are among the most successful algorithms for tackling them. Metaheuristic search relies on two basic behaviors, exploration and exploitation, and its success largely depends on balancing the two. Machine learning techniques have provided considerable support for improving data-driven optimization algorithms. One technique that stands out is Q-Learning, a reinforcement learning method that rewards or penalizes actions according to the consequences they entail. In this work, we propose a general discretization framework in which Q-Learning adapts a continuous metaheuristic to discrete domains. In particular, we use Q-Learning so that the algorithm learns a binarization scheme selection policy; the policy is updated dynamically based on the performance of each binarization scheme at every iteration. Preliminary experiments applying our framework to the Sine Cosine Algorithm show that the proposal yields promising results compared with other algorithms.
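The selection mechanism described above can be sketched as a small tabular Q-Learning loop. This is only an illustrative sketch, not the paper's implementation: the scheme labels (`"V1"`, `"S1"`, etc.), the single-state formulation, the epsilon-greedy policy, and the +1/-1 reward for improving the best fitness are all assumptions made for the example.

```python
import random

# Assumed action set: candidate binarization (transfer-function) schemes.
SCHEMES = ["V1", "V2", "S1", "S2"]

class SchemeSelector:
    """Single-state tabular Q-Learning that picks a binarization scheme
    each iteration and reinforces schemes that improve the search."""

    def __init__(self, schemes, alpha=0.1, gamma=0.9, epsilon=0.2, seed=None):
        self.q = {s: 0.0 for s in schemes}  # one Q-value per scheme (action)
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration probability
        self.rng = random.Random(seed)

    def select(self):
        # Epsilon-greedy: explore a random scheme, otherwise exploit
        # the scheme with the highest current Q-value.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.q))
        return max(self.q, key=self.q.get)

    def update(self, scheme, reward):
        # Q-Learning update with a single state:
        # Q(a) <- Q(a) + alpha * (r + gamma * max_a' Q(a') - Q(a))
        best_next = max(self.q.values())
        self.q[scheme] += self.alpha * (reward + self.gamma * best_next
                                        - self.q[scheme])
```

In use, the metaheuristic would call `select()` once per iteration to choose how to binarize its continuous solutions, then call `update()` with, e.g., a reward of `1.0` if the best fitness improved and `-1.0` otherwise, so that the policy shifts toward the schemes that perform well.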