Extracting spectral features from a music clip is computationally expensive
because accurate features require processing the clip over its whole length.
This preprocessing creates a large overhead and slows the extraction process.
We show how formatting a dataset in a certain way can make the process more
efficient by eliminating the need to process the clip for its whole duration
while still extracting the features accurately. In addition, we discuss the
possibility of defining set generic durations for analyzing a certain type of
music clip during training. In doing so, we cut the duration of the clip that
must be processed to just 10% of the global average.
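
As a rough illustration of the idea of analyzing only part of a clip, the sketch below computes one spectral feature (the spectral centroid) over a whole clip and over a short excerpt of it. Everything in it is an assumption made for illustration: the choice of feature, the centred excerpt, the synthetic two-tone signal, the sample rate, and the helper names `spectral_centroid` and `middle_excerpt` are all hypothetical. It also takes 10% of the clip's own length, not 10% of a dataset-wide average. It is not the dataset formatting or the extraction pipeline described above.

```python
import cmath
import math


def spectral_centroid(samples, sample_rate):
    """Magnitude-weighted mean frequency (Hz), computed with a naive DFT."""
    n = len(samples)
    weighted = 0.0
    total = 0.0
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        magnitude = abs(coeff)
        weighted += (k * sample_rate / n) * magnitude
        total += magnitude
    return weighted / total if total else 0.0


def middle_excerpt(samples, fraction):
    """Return a centred slice covering `fraction` of the samples (assumed placement)."""
    length = max(1, round(len(samples) * fraction))
    start = (len(samples) - length) // 2
    return samples[start:start + length]


SAMPLE_RATE = 800  # Hz; hypothetical value, kept small so the naive DFT stays fast

# One second of a synthetic stand-in for a clip: equal-amplitude tones at 100 and 200 Hz.
clip = [
    math.sin(2 * math.pi * 100 * t / SAMPLE_RATE)
    + math.sin(2 * math.pi * 200 * t / SAMPLE_RATE)
    for t in range(SAMPLE_RATE)
]

full = spectral_centroid(clip, SAMPLE_RATE)
partial = spectral_centroid(middle_excerpt(clip, 0.10), SAMPLE_RATE)
print(f"full clip: {full:.1f} Hz, 10% excerpt: {partial:.1f} Hz")
```

For this stationary test signal the two values should agree closely, because the excerpt contains the same tones as the full clip. Real music is not stationary, so this agreement only illustrates the mechanics of excerpt-based extraction and is not evidence for the accuracy claim.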