Thread-level speculation (TLS) allows a compiler to create speculative threads from a sequential program without having to prove that the threads are independent. However, TLS performance may suffer from frequent inter-thread data dependences. A previous study showed that scheduling instructions for scalar-variable communication is beneficial. We extend that scheduling algorithm to optimize non-scalar-variable communication. Intra-thread data and control speculation with recovery support is also used to further increase the overlap between threads. Even with such aggressive scheduling techniques, some parallelized loops may still lose performance because they are inherently sequential. It is therefore important to identify those loops before applying compiler optimizations. We develop a loop selection method that automatically selects a set of loop candidates suitable for parallelization under TLS. It has three key parts: context-sensitive loop profiling, thread-level parallelism (TLP) estimation, and optimal loop set selection.