Obtaining spatially continuous, high-resolution thermal images is crucial for effectively analyzing heat-related phenomena in urban areas, which exhibit high spatial and temporal variability. Spatiotemporal Fusion (STF) methods can enhance spatial and temporal resolution simultaneously, but most STF approaches for generating Land Surface Temperature (LST) have not focused specifically on urban regions. This study therefore proposes a two-phase approach using Landsat 8 and MODIS images acquired over a study area in Beijing: first, to investigate sharpening the fine-resolution input image with urban-related spectral indices, and second, to explore the potential of feeding the sharpened results into the Spatiotemporal Adaptive Data Fusion Algorithm for Temperature Mapping (SADFAT) to generate LST images of high spatiotemporal resolution in urban areas. For this purpose, five urban indices were selected based on their correlation with brightness temperature. In the thermal sharpening phase, the Fractional Urban Cover (FUC) index delineated spatial details in urban regions while maintaining its correlation with the original brightness temperature image. In the STF phase, however, the FUC-sharpened results achieved relatively high correlation coefficient (CC) values of up to 0.689 but also the highest Root Mean Squared Error (RMSE) and Average Absolute Difference (AAD) values, at 4.260 K and 2.928 K, respectively. In contrast, the Normalized Difference Building Index (NDBI) sharpened results recorded the lowest RMSE and AAD values, at 3.126 K and 2.325 K, but also the lowest CC values. Nevertheless, the STF results delineated fine spatial details effectively, demonstrating the potential of sharpened urban or built-up indices both for generating sharpened thermal images of urban areas and as input images for the SADFAT algorithm.
The results from this study can be used to further improve STF approaches for daily and spatially continuous mapping of LST in urban areas.
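The three accuracy metrics reported above (CC, RMSE, AAD) can be illustrated with a minimal NumPy sketch. It assumes CC is the Pearson correlation coefficient and AAD is the mean absolute difference between fused and reference LST pixels; the function name `evaluate_lst` and the masking of NaN (e.g. cloud-contaminated) pixels are illustrative choices of ours, not the study's implementation.

```python
import numpy as np


def evaluate_lst(predicted, reference):
    """Return (CC, RMSE, AAD) between a fused LST image and a reference LST image.

    Both inputs are arrays of the same shape in kelvin. Pixels that are NaN in
    either image (assumed here to mark masked or missing data) are ignored.
    """
    pred = np.asarray(predicted, dtype=float).ravel()
    ref = np.asarray(reference, dtype=float).ravel()

    # Keep only pixels that are valid in both images.
    valid = ~(np.isnan(pred) | np.isnan(ref))
    pred, ref = pred[valid], ref[valid]

    diff = pred - ref
    cc = np.corrcoef(pred, ref)[0, 1]   # Pearson correlation coefficient
    rmse = np.sqrt(np.mean(diff ** 2))  # squares each error, so large errors weigh more
    aad = np.mean(np.abs(diff))         # average magnitude of the error
    return float(cc), float(rmse), float(aad)


# Toy 2x2 example with made-up values (not data from the study).
ref = np.array([[300.0, 302.0], [305.0, 310.0]])
fused = ref + np.array([[1.0, -1.0], [3.0, -3.0]])
cc, rmse, aad = evaluate_lst(fused, ref)
print(f"CC={cc:.3f}  RMSE={rmse:.3f} K  AAD={aad:.3f} K")
# CC=0.807  RMSE=2.236 K  AAD=2.000 K
```

Because RMSE squares each per-pixel difference before averaging, it is always at least as large as AAD and grows faster when a few pixels have large errors; reporting both alongside CC therefore separates overall error magnitude from error spread and from how well the spatial pattern is reproduced.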