Empirical data collection for software fault prediction using open-source Java projects