Quantifying discriminability of evaluation metrics in link prediction for real networks

Which evaluation metrics work best at discriminating between methods?