Recent research has utilized text-to-image diffusion models in conjunction with natural image datasets to achieve photo-realistic image reconstruction from brain activity. In this project, we deliver ongoing progress reports detailing our evaluation and characterization of these models and datasets. Thus far, our findings indicate that the accuracy of reconstructions based on text-to-image diffusion models often diminishes when tested on the Deeprecon dataset (Shen et al., 2017/2019) supplemented with captions, as opposed to the NSD dataset employed in the original studies. UMAP visualization of NSD images reveals only ~40 clusters in the semantic feature space of the CLIP model, which is used for brain decoding and image generation. These clusters show substantial overlap between the NSD training and test image sets: for each test image, visually similar images can be found within the same cluster of the training set. Conversely, the Deeprecon dataset displays more diverse and less overlapping distributions, aligning with its intended design. Notably, one of the studies engages in post-generation cherry-picking, which yields misleadingly accurate outcomes. These findings invite a critical appraisal of the widely publicized reports, questioning whether the results truly represent reconstruction that generalizes beyond training examples. We remain committed to delivering further results from our ongoing evaluations and characterizations, and to proposing guidelines for generalizable methods and reliable evaluations.
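The train/test overlap described above can be quantified with a simple nearest-neighbor analysis in the semantic feature space: for each test image's feature vector, find the most similar training-set vector by cosine similarity. The following is a minimal sketch of that check; it uses random arrays as stand-ins for actual CLIP embeddings of NSD images, and the function name and array shapes are illustrative assumptions, not part of the original studies' code.

```python
import numpy as np

def nearest_train_similarity(train_feats, test_feats):
    """For each test feature vector, return the cosine similarity
    of its nearest neighbor in the training set."""
    # L2-normalize rows so that dot products equal cosine similarities
    train = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    test = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    sims = test @ train.T      # (n_test, n_train) cosine-similarity matrix
    return sims.max(axis=1)    # nearest-neighbor similarity per test item

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical placeholders for CLIP embeddings of training/test images
    train_feats = rng.standard_normal((1000, 512))
    test_feats = rng.standard_normal((50, 512))
    nn_sims = nearest_train_similarity(train_feats, test_feats)
    # A distribution concentrated near 1 would indicate heavy train/test overlap
    print(f"mean nearest-neighbor similarity: {nn_sims.mean():.3f}")
```

On real embeddings, comparing this distribution between NSD and Deeprecon would make the claimed difference in overlap concrete.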